If a collection of interconnected phi nodes is only ever loaded, stored
or bitcast then we can convert the whole set to the bitcast type,
potentially helping to reduce the number of register moves needed as the
phi's are passed across basic block boundaries. This has to be done in
CodegenPrepare as it naturally straddles basic blocks.
The alorithm just looks from phi nodes, looking at uses and operands for
a collection of nodes that all together are bitcast between float and
integer types. We record visited phi nodes to not have to process them
more than once. The whole subgraph is then replaced with a new type.
Loads and Stores are bitcast to the correct type, which should then be
folded into the load/store, changing it's type.
This comes up in the biquad testcase due to the way MVE needs to keep
values in integer registers. I have also seen it come up from aarch64
partner example code, where a complicated set of sroa/inlining produced
integer phis, where float would have been a better choice.
I also added undef and extract element handling which increased the
potency in some cases.
This adds it with an option that defaults to off, and disabled for 32bit
X86 due to potential issues around canonicalizing NaNs.
Differential Revision: https://reviews.llvm.org/D81827
When the zext gets promoted, it used to retain the original location,
which pessimizes the debugging experience causing an unexpected
jump in stepping at -Og.
Fixes https://bugs.llvm.org/show_bug.cgi?id=46120 (which also
contains a full C repro).
Differential Revision: https://reviews.llvm.org/D81437
The promotion machinery in CGP moves instructions retaining
debug locations. When the transformation is local, this is mostly
correct, but when instructions are moved cross-BBs, this is not
always true and causes jumpiness in line tables. This is the first
of a series of commits. sext(s) and zext(s) need to be treated
similarly.
Differential Revision: https://reviews.llvm.org/D81879
AddressingModeMatcher::matchScaledValue was calling getSExtValue for a constant before ensuring that we can actually represent the value as int64_t
Fixes OSSFuzz#22723 which is a followup to rGc479052a74b2 (PR46004 / OSSFuzz#22357)
Now that all of the statepoint related routines have classes with isa support, let's cleanup.
I'm leaving the (dead) utitilities in tree for a few days so that I can do the same cleanup downstream without breakage.
AddressingModeMatcher::matchAddr was calling getSExtValue for a constant before ensuring that we can actually represent the value as int64_t
Fixes PR46004 / OSSFuzz#22357
Along the lines of D77454 and D79968. Unlike loads and stores, the
default alignment is getPrefTypeAlign, to match the existing handling in
various places, including SelectionDAG and InstCombine.
Differential Revision: https://reviews.llvm.org/D80044
This is basically the same patch as D63233, but converted to
funnel shifts rather than regular shifts. I did not see a
way to effectively share code for these 2 cases though.
This follows D79718 and D79827 to re-fix PR37426 because
that gets canonicalized to funnel shift intrinsics in IR.
I did draft an alternative patch as an enhancement to
"shouldSinkOperands()", but that was awkward because
we have to key the transform from the select, but then
look at both its users and its operands.
Expands on the enablement of the shouldSinkOperands() TLI hook in:
D79718
The last codegen/IR test diff shows what I suspected could happen - we were
sinking all splat shift operands into a loop. But that's not what we want in
general; we only want to sink the *shift amount* operand if it is a splat.
Differential Revision: https://reviews.llvm.org/D79827
Under MVE a vdup will always take a gpr register, not a floating point
value. During DAG combine we convert the types to a bitcast to an
integer in an attempt to fold the bitcast into other instructions. This
is OK, but only works inside the same basic block. To do the same trick
across a basic block boundary we need to convert the type in
codegenprepare, before the splat is sunk into the loop.
This adds a convertSplatType function to codegenprepare to do that,
putting bitcasts around the splat to force the type to an integer. There
is then some adjustment to the code in shouldSinkOperands to handle the
extra bitcasts.
Differential Revision: https://reviews.llvm.org/D78728
them in a special text section.
For sampleFDO, because the optimized build uses profile generated from
previous release, previously we couldn't tell a function without profile
was truely cold or just newly created so we had to treat them conservatively
and put them in .text section instead of .text.unlikely. The result was when
we persuing the best performance by locking .text.hot and .text in memory,
we wasted a lot of memory to keep cold functions inside.
In https://reviews.llvm.org/D66374, we introduced profile symbol list to
discriminate functions being cold versus functions being newly added.
This mechanism works quite well for regular use cases in AutoFDO. However,
in some case, we can only have a partial profile when optimizing a target.
The partial profile may be an aggregated profile collected from many targets.
The profile symbol list method used for regular sampleFDO profile is not
applicable to partial profile use case because it may be too large and
introduce many false positives.
To solve the problem for partial profile use case, we provide an option called
--profile-unknown-in-special-section. For functions without profile, we will
still treat them conservatively in compiler optimizations -- for example,
treat them as warm instead of cold in inliner. When we use profile info to
add section prefix for functions, we will discriminate functions known to be
not cold versus functions without profile (being unknown), and we will put
functions being unknown in a special text section called .text.unknown.
Runtime system will have the flexibility to decide where to put the special
section in order to achieve a balance between performance and memory saving.
Differential Revision: https://reviews.llvm.org/D62540
Summary:
This helps detect some missed BFI updates during CodeGenPrepare.
This is debug build only and disabled behind a flag.
Fix a missed update in CodeGenPrepare::dupRetToEnableTailCallOpts().
Reviewers: davidxl
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77417
Make the kind of cost explicit throughout the cost model which,
apart from making the cost clear, will allow the generic parts to
calculate better costs. It will also allow some backends to
approximate and correlate the different costs if they wish. Another
benefit is that it will also help simplify the cost model around
immediate and intrinsic costs, where we currently have multiple APIs.
RFC thread:
http://lists.llvm.org/pipermail/llvm-dev/2020-April/141263.html
Differential Revision: https://reviews.llvm.org/D79002
There are several different types of cost that TTI tries to provide
explicit information for: throughput, latency, code size along with
a vague 'intersection of code-size cost and execution cost'.
The vectorizer is a keen user of RecipThroughput and there's at least
'getInstructionThroughput' and 'getArithmeticInstrCost' designed to
help with this cost. The latency cost has a single use and a single
implementation. The intersection cost appears to cover most of the
rest of the API.
getUserCost is explicitly called from within TTI when the user has
been explicit in wanting the code size (also only one use) as well
as a few passes which are concerned with a mixture of size and/or
a relative cost. In many cases these costs are closely related, such
as when multiple instructions are required, but one evident diverging
cost in this function is for div/rem.
This patch adds an argument so that the cost required is explicit,
so that we can make the important distinction when necessary.
Differential Revision: https://reviews.llvm.org/D78635
This method has been commented as deprecated for a while. Remove
it and replace all uses with the equivalent getCalledOperand().
I also made a few cleanups in here. For example, to removes use
of getElementType on a pointer when we could just use getFunctionType
from the call.
Differential Revision: https://reviews.llvm.org/D78882
Summary:
Remove asserting vector getters from Type in preparation for the
VectorType refactor. The existence of these functions complicates the
refactor while adding little value.
Reviewers: dexonsmith, sdesmalen, efriedma
Reviewed By: efriedma
Subscribers: cfe-commits, hiraditya, llvm-commits
Tags: #llvm, #clang
Differential Revision: https://reviews.llvm.org/D77278
I've always found the "findValue" a little odd and
inconsistent with other things in SDB.
This simplfifies the code in SDB to just handle a splat constant
address or a 2 operand GEP in the same BB. This removes the
need for "findValue" since the operands to the GEP are
guaranteed to be available. The splat constant handling is
new, but was needed to avoid regressions due to constant
folding combining GEPs created in CGP.
CGP is now responsible for canonicalizing gather/scatters into
this form. The pattern I'm using for scalarizing, a scalar GEP
followed by a GEP with an all zeroes index, seems to be subject
to constant folding that the insertelement+shufflevector was not.
Differential Revision: https://reviews.llvm.org/D76947
Summary:
Remove usages of asserting vector getters in Type in preparation for the
VectorType refactor. The existence of these functions complicates the
refactor while adding little value.
Reviewers: stoklund, sdesmalen, efriedma
Reviewed By: sdesmalen
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77272
Instead, represent the mask as out-of-line data in the instruction. This
should be more efficient in the places that currently use
getShuffleVector(), and paves the way for further changes to add new
shuffles for scalable vectors.
This doesn't change the syntax in textual IR. And I don't currently plan
to change the bitcode encoding in this patch, although we'll probably
need to do something once we extend shufflevector for scalable types.
I expect that once this is finished, we can then replace the raw "mask"
with something more appropriate for scalable vectors. Not sure exactly
what this looks like at the moment, but there are a few different ways
we could handle it. Maybe we could try to describe specific shuffles.
Or maybe we could define it in terms of a function to convert a fixed-length
array into an appropriate scalable vector, using a "step", or something
like that.
Differential Revision: https://reviews.llvm.org/D72467
The attached test case is simplified from tcmalloc. Both function calls should be optimized as tailcall. But llvm can only optimize the first call. The second call can't be optimized because function dupRetToEnableTailCallOpts failed to duplicate ret into block case2.
There 2 problems blocked the duplication:
1 Intrinsic call llvm.assume is not handled by dupRetToEnableTailCallOpts.
2 The control flow is more complex than expected, dupRetToEnableTailCallOpts can only duplicate ret into its predecessor, but here we have an intermediate block between call and ret.
The solutions:
1 Since CodeGenPrepare is already at the end of LLVM IR phase, we can simply delete the intrinsic call to llvm.assume.
2 A general solution to the complex control flow is hard, but for this case, after exit2 is duplicated into case1, exit2 is the only successor of exit1 and exit1 is the only predecessor of exit2, so they can be combined through eliminateFallThrough. But this function is called too late, there is no more dupRetToEnableTailCallOpts after it. We can add an earlier call to eliminateFallThrough to solve it.
Differential Revision: https://reviews.llvm.org/D76539
Summary:
This is a simple fix for CodeGenPrepare that freezes branch condition when transforming select to branch.
If it is not frozen, instsimplify or the later pipeline can potentially exploit undefined behavior.
The diff shows optimized form becase D75859 and D76048 already made a few changes to CodeGenPrepare for optimizing freeze(cmp).
Reviewers: jdoerfert, spatel, lebedev.ri, efriedma
Reviewed By: lebedev.ri
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76179
Summary:
This is a simple fix for CodeGenPrepare that freezes branch condition when transforming select to branch.
If it is not freezed, instsimplify or the later pipeline can potentially exploit undefined behavior.
The diff shows optimized form becase D75859 and D76048 already made a few changes to CodeGenPrepare for optimizing freeze(cmp).
Reviewers: jdoerfert, spatel, lebedev.ri, efriedma
Reviewed By: lebedev.ri
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76179
Summary:
This is a simple patch that expands https://reviews.llvm.org/D75859 to pointer comparison and fcmp
Checked with Alive2
Reviewers: reames, jdoerfert
Reviewed By: jdoerfert
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76048
Summary:
Support ConstantInt::get() and Constant::getAllOnesValue() for scalable
vector type, this requires ConstantVector::getSplat() to take in 'ElementCount',
instead of 'unsigned' number of element count.
This change is needed for D73753.
Reviewers: sdesmalen, efriedma, apazos, spatel, huntergr, willlovett
Reviewed By: efriedma
Subscribers: tschuett, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74386
Summary:
This patch helps CodeGenPrepare move freeze into the icmp when it is used by branch.
It reenables generation of efficient conditional jumps.
This is only done when at least one of icmp's operands is constant to prevent the transformation from increasing # of freeze instructions.
Performance degradation of MultiSource/Benchmarks/Ptrdist/yacr2/yacr2.test is resolved with this patch.
Checked with Alive2
Reviewers: reames, fhahn, nlopes
Reviewed By: reames
Subscribers: jdoerfert, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75859
As the test case shows if there is an ExtractValueInst in the Ret block, function dupRetToEnableTailCallOpts can't duplicate it into the block containing call. So later no tail call is generated in CodeGen.
This patch adds the ExtractValueInst handling code in function dupRetToEnableTailCallOpts and FoldReturnIntoUncondBranch, and later tail call can be generated for this case.
Differential Revision: https://reviews.llvm.org/D74242
This version fixes a buildbot failure cause by picking the wrong insert
point for XORs. We cannot pick the XOR binary operator as insert point,
as it is not guaranteed that both input operands for the overflow
intrinsic are defined before it.
This reverts the revert commit
c7fc0e5da6.
This changes the SimplifyLibCalls utility to accept an IRBuilderBase,
which allows us to pass through the IRBuilder used by InstCombine.
This will ensure that new instructions get added to the worklist.
The annotated test-case drops from 4 to 2 InstCombine iterations thanks
to this.
To achieve this, I'm adding an IRBuilderBase::OperandBundlesGuard,
which is basically the same as the existing InsertPointGuard and
FastMathFlagsGuard, but for operand bundles. Also add a
setDefaultOperandBundles() method so these can be set outside the
constructor.
Differential Revision: https://reviews.llvm.org/D74792
Instcombine folds (a + b <u a) to (a ^ -1 <u b) and that does not match
the expected pattern in CodeGenPerpare via UAddWithOverflow.
This causes a regression over Clang 7 on both X86 and AArch64:
https://gcc.godbolt.org/z/juhXYV
This patch extends UAddWithOverflow to also catch the XOR case, if the
XOR is only used in the ICMP. This covers just a single case, but I'd
like to make sure I am not missing anything before tackling the other
cases.
Reviewers: nikic, RKSimon, lebedev.ri, spatel
Reviewed By: nikic, lebedev.ri
Differential Revision: https://reviews.llvm.org/D74228
On some targets, like SPARC, forming overflow ops is only profitable if
the math result is used: https://godbolt.org/z/DxSmdB
This patch adds a new MathUsed parameter to allow the targets
to make the decision and defaults to only allowing it
if the math result is used. That is the conservative choice.
This patch also updates AArch64ISelLowering, X86ISelLowering,
ARMISelLowering.h, SystemZISelLowering.h to allow forming overflow
ops if the math result is not used. On those targets using the
overflow intrinsic for the overflow check only generates better code.
Reviewers: nikic, RKSimon, lebedev.ri, spatel
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D74722
Summary:
Right now the alignment of the lower half of a store is computed as
align/2, which fails for unaligned stores (align = 1), and is overly
pessimitic for, e.g. a 8 byte store aligned to 4 bytes.
Fixes PR44851
Fixes PR44877
Reviewers: gchatelet, spatel, lebedev.ri
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74311
shouldOptimizeForSize is showing up in a profile, spending around 10%
of the pass time in one function. This should probably not be so slow,
but the much cheaper attribute check should be done first anyway.
The code paths in the absence of TargetMachine, TargetLowering or
TargetRegisterInfo are poorly tested. As rL285987 said, requiring
TargetPassConfig allows us to delete many (untested) checks littered
everywhere.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D73754
Summary:
Without the BFI update, some hot blocks are incorrectly treated as cold code.
This fixes a FDO perf regression in the TSVC benchmark from D71288.
Reviewers: davidxl
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73146
This patch also fixes up a number of cases in DAGCombine and
SelectionDAGBuilder where the size of a scalable vector is used in a
fixed-width context (thus triggering an assertion failure).
Reviewers: efriedma, c-rhodes, rovka, cameron.mcinally
Reviewed By: efriedma
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71215
In LLVM IR, vscale can be represented with an intrinsic. For some targets,
this is equivalent to the constexpr:
getelementptr <vscale x 1 x i8>, <vscale x 1 x i8>* null, i32 1
This can be used to propagate the value in CodeGenPrepare.
In ISel we add a node that can be legalized to one or more
instructions to materialize the runtime vector length.
This patch also adds SVE CodeGen support for VSCALE, which maps this
node to RDVL instructions (for scaled multiples of 16bytes) or CNT[HSD]
instructions (scaled multiples of 2, 4, or 8 bytes, respectively).
Reviewers: rengolin, cameron.mcinally, hfinkel, sebpop, SjoerdMeijer, efriedma, lattner
Reviewed by: efriedma
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68203
Summary:
Guard against a potential crash observed in https://github.com/JuliaLang/julia/issues/32994#issuecomment-524249628
If two branches are collapsed we can encounter a degenerate conditional branch `TBB==FBB`.
The subsequent code assumes that they differ, so we exit out early.
Reviewers: ributzka, spatel
Subscribers: loladiro, dexonsmith, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66657
This has two main effects:
- Optimizes debug info size by saving 221.86 MB of obj file size in a
Windows optimized+debug build of 'all'. This is 3.03% of 7,332.7MB of
object file size.
- Incremental step towards decoupling target intrinsics.
The enums are still compact, so adding and removing a single
target-specific intrinsic will trigger a rebuild of all of LLVM.
Assigning distinct target id spaces is potential future work.
Part of PR34259
Reviewers: efriedma, echristo, MaskRay
Reviewed By: echristo, MaskRay
Differential Revision: https://reviews.llvm.org/D71320
Summary:
Split off of D67120.
Add the profile guided size optimization instrumentation / queries in the code
gen or target passes. This doesn't enable the size optimizations in those passes
yet as they are currently disabled in shouldOptimizeForSize (for non-IR pass
queries).
A second try after reverted D71072.
Reviewers: davidxl
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71149
CodeGenPrepare::placeDebugValues moves variable location intrinsics to be
immediately after the Value they refer to. This makes tracking of locations
very easy; but it changes the order in which assignments appear to the
debugger, from the source programs order to the order in which the
optimised program computes values. This then leads to PR43986 and PR38754,
where variable locations that were in a conditional block are made
unconditional, which is highly misleading.
This patch adjusts placeDbgValues to only re-order variable location
intrinsics if they use a Value before it is defined, significantly reducing
the damage that it does. This is still not 100% safe, but the rest of
CodeGenPrepare needs polishing to correctly update debug info when
optimisations are performed to fully fix this.
This will probably break downstream debuginfo tests -- if the
instruction-stream position of variable location changes isn't the focus of
the test, an easy fix should be to manually apply placeDbgValues' behaviour
to the failing tests, moving dbg.value intrinsics next to SSA variable
definitions thus:
%foo = inst1
%bar = ...
%baz = ...
void call @llvm.dbg.value(metadata i32 %foo, ...
to
%foo = inst1
void call @llvm.dbg.value(metadata i32 %foo, ...
%bar = ...
%baz = ...
This should return your test to exercising whatever it was testing before.
Differential Revision: https://reviews.llvm.org/D58453
Summary:
Split off of D67120.
Add the profile guided size optimization instrumentation / queries in the code
gen or target passes. This doesn't enable the size optimizations in those passes
yet as they are currently disabled in shouldOptimizeForSize (for non-IR pass
queries).
Reviewers: davidxl
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71072
One of CodeGenPrepare's optimizations is to duplicate address calculations
into basic blocks, so that as much information as possible can be folded
into memory addressing operands. This is great -- but the dbg.value
variable location intrinsics are not updated in the same way. This can lead
to dbg.values referring to address computations in other blocks that will
never be encoded into the DAG, while duplicate address computations are
performed locally that could be used by the dbg.value. Some of these (such
as non-constant-offset GEPs) can't be salvaged past.
Fix this by, whenever we duplicate an address computation into a block,
looking for dbg.value users of the original memory address in the same
block, and redirecting those to the local computation.
Differential Revision: https://reviews.llvm.org/D58403
This file lists every pass in LLVM, and is included by Pass.h, which is
very popular. Every time we add, remove, or rename a pass in LLVM, it
caused lots of recompilation.
I found this fact by looking at this table, which is sorted by the
number of times a file was changed over the last 100,000 git commits
multiplied by the number of object files that depend on it in the
current checkout:
recompiles touches affected_files header
342380 95 3604 llvm/include/llvm/ADT/STLExtras.h
314730 234 1345 llvm/include/llvm/InitializePasses.h
307036 118 2602 llvm/include/llvm/ADT/APInt.h
213049 59 3611 llvm/include/llvm/Support/MathExtras.h
170422 47 3626 llvm/include/llvm/Support/Compiler.h
162225 45 3605 llvm/include/llvm/ADT/Optional.h
158319 63 2513 llvm/include/llvm/ADT/Triple.h
140322 39 3598 llvm/include/llvm/ADT/StringRef.h
137647 59 2333 llvm/include/llvm/Support/Error.h
131619 73 1803 llvm/include/llvm/Support/FileSystem.h
Before this change, touching InitializePasses.h would cause 1345 files
to recompile. After this change, touching it only causes 550 compiles in
an incremental rebuild.
Reviewers: bkramer, asbirlea, bollu, jdoerfert
Differential Revision: https://reviews.llvm.org/D70211
For example:
long long test(long long a, long long b) {
if (a << b > 0)
return b;
if (a << b < 0)
return a;
return a*b;
}
Produces:
sld. 5, 3, 4
ble 0, .LBB0_2
mr 3, 4
blr
.LBB0_2: # %if.end
cmpldi 5, 0
li 5, 1
isel 4, 4, 5, 2
mulld 3, 4, 3
blr
But the compare (cmpldi 5, 0) is redundant and can be removed (CR0 already
contains the result of that comparison).
The root cause of this is that LLVM converts signed comparisons into equality
comparison based on dominance. Equality comparisons are unsigned by default, so
we get either a record-form or cmp (without the l for logical) feeding a cmpl.
That is the situation we want to avoid here.
Differential Revision: https://reviews.llvm.org/D60506
Add a pass to lower is.constant and objectsize intrinsics
This pass lowers is.constant and objectsize intrinsics not simplified by
earlier constant folding, i.e. if the object given is not constant or if
not using the optimized pass chain. The result is recursively simplified
and constant conditionals are pruned, so that dead blocks are removed
even for -O0. This allows inline asm blocks with operand constraints to
work all the time.
The new pass replaces the existing lowering in the codegen-prepare pass
and fallbacks in SDAG/GlobalISEL and FastISel. The latter now assert
on the intrinsics.
Differential Revision: https://reviews.llvm.org/D65280
llvm-svn: 374784
This pass lowers is.constant and objectsize intrinsics not simplified by
earlier constant folding, i.e. if the object given is not constant or if
not using the optimized pass chain. The result is recursively simplified
and constant conditionals are pruned, so that dead blocks are removed
even for -O0. This allows inline asm blocks with operand constraints to
work all the time.
The new pass replaces the existing lowering in the codegen-prepare pass
and fallbacks in SDAG/GlobalISEL and FastISel. The latter now assert
on the intrinsics.
Differential Revision: https://reviews.llvm.org/D65280
llvm-svn: 374743
The static analyzer is warning about potential null dereferences, but in these cases we should be able to use cast<> directly and if not assert will fire for us.
llvm-svn: 374085
Summary:
An erroneously negated if-statement by an earlier (March 2019) bugfix left phi replacement/simplification under optimizeMemoryInst() in CodeGenPrepare largely inactivated. The error was found when csmith found that the same assert as in the original bug report could still be triggered in a different way. This patch fixes the bugfix. The original bug was:
https://bugs.llvm.org/show_bug.cgi?id=41052
... and the previous fix was D59358.
Reviewers: aprantl, skatkov
Reviewed By: skatkov
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67838
llvm-svn: 373084
The changes here are based on the corresponding diffs for allowing FMF on 'select':
D61917
As discussed there, we want to have fast-math-flags be a property of an FP value
because the alternative (having them on things like fcmp) leads to logical
inconsistency such as:
https://bugs.llvm.org/show_bug.cgi?id=38086
The earlier patch for select made almost no practical difference because most
unoptimized conditional code begins life as a phi (based on what I see in clang).
Similarly, I don't expect this patch to do much on its own either because
SimplifyCFG promptly drops the flags when converting to select on a minimal
example like:
https://bugs.llvm.org/show_bug.cgi?id=39535
But once we have this plumbing in place, we should be able to wire up the FMF
propagation and start solving cases like that.
The change to RecurrenceDescriptor::AddReductionVar() is required to prevent a
regression in a LoopVectorize test. We are intersecting the FMF of any
FPMathOperator there, so if a phi is not properly annotated, new math
instructions may not be either. Once we fix the propagation in SimplifyCFG, it
may be safe to remove that hack.
Differential Revision: https://reviews.llvm.org/D67564
llvm-svn: 372866
In MVE, as of rL371218, we are attempting to sink chains of instructions such as:
%l1 = insertelement <8 x i8> undef, i8 %l0, i32 0
%broadcast.splat26 = shufflevector <8 x i8> %l1, <8 x i8> undef, <8 x i32> zeroinitializer
In certain situations though, we can end up breaking the dominance relations of
instructions. This happens when we sink the instruction into a loop, but cannot
remove the originals. The Use is updated, which might in fact be a Use from the
second instruction to the first.
This attempts to fix that by reversing the order of instruction that are sunk,
and ensuring that we update the uses on new instructions if they have already
been sunk, not the old ones.
Differential Revision: https://reviews.llvm.org/D67366
llvm-svn: 371743
Up to now, we've decided whether to sink address calculations using GEPs or
normal arithmetic based on the useAA hook, but there are other reasons GEPs
might be preferred. So this patch splits the two questions, with a default
implementation falling back to useAA.
llvm-svn: 371721
Summary:
This is the first change to enable the TLI to be built per-function so
that -fno-builtin* handling can be migrated to use function attributes.
See discussion on D61634 for background. This is an enabler for fixing
handling of these options for LTO, for example.
This change should not affect behavior, as the provided function is not
yet used to build a specifically per-function TLI, but rather enables
that migration.
Most of the changes were very mechanical, e.g. passing a Function to the
legacy analysis pass's getTLI interface, or in Module level cases,
adding a callback. This is similar to the way the per-function TTI
analysis works.
There was one place where we were looking for builtins but not in the
context of a specific function. See FindCXAAtExit in
lib/Transforms/IPO/GlobalOpt.cpp. I'm somewhat concerned my workaround
could provide the wrong behavior in some corner cases. Suggestions
welcome.
Reviewers: chandlerc, hfinkel
Subscribers: arsenm, dschuff, jvesely, nhaehnle, mehdi_amini, javed.absar, sbc100, jgravelle-google, eraman, aheejin, steven_wu, george.burgess.iv, dexonsmith, jfb, asbirlea, gchatelet, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66428
llvm-svn: 371284
I don't think anything in this loop modifies the control flow and we don't restart any iteration after setting the flag.
This code was added in http://reviews.llvm.org/D16893 but looking at the test case added there the code that caused the dominator tree to change was merging blocks with their predecessor not the bitreverse optimization.
Differential Revision: https://reviews.llvm.org/D66366
llvm-svn: 369283
If OptimizeExtractBits() encountered a shift instruction with no operands at all,
it would erase the instruction, but still return false.
This previously didn’t matter because its caller would always return after
processing the instruction, but https://reviews.llvm.org/D63233 changed the
function’s caller to fall through if it returned false, which would then cause
a use-after-free detectable by ASAN.
This change makes OptimizeExtractBits return true if it removes a shift
instruction with no users, terminating processing of the instruction.
Patch by: @brentdax (Brent Royal-Gordon)
Differential Revision: https://reviews.llvm.org/D66330
llvm-svn: 369168
Now that we've moved to C++14, we no longer need the llvm::make_unique
implementation from STLExtras.h. This patch is a mechanical replacement
of (hopefully) all the llvm::make_unique instances across the monorepo.
llvm-svn: 369013
Summary:
The old code can be simplified to define the element type of TailCalls as `BasicBlock` not `CallInst`. Also I use the for-range loop instead the for loop.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D64905
llvm-svn: 367644
Some GEPs were not being split, presumably because that split would just be
undone by the DAGCombiner. Not performing those splits can prevent important
optimizations, such as preventing the element indices / member offsets from
being (partially) folded into load/store instruction immediates. This patch:
- Makes the splits also occur in the cases where the base address and the GEP
are in the same BB.
- Ensures that the DAGCombiner doesn't reassociate them back again.
Differential Revision: https://reviews.llvm.org/D60294
llvm-svn: 363544
This is based on the example/discussion in PR37428:
https://bugs.llvm.org/show_bug.cgi?id=37428
Proper vector shift instructions don't appear until AVX2, so we may generate several
extra instructions within a loop trying to compensate for that. It's difficult to
recover from that shift expansion later than this, so use the existing TLI hook and
splat analysis to enable better codegen.
This extends CGP functionality introduced with:
rL201655
Differential Revision: https://reviews.llvm.org/D63233
llvm-svn: 363511
For some reason multiple places need to do this, and the variant the
loop unroller and inliner use was not handling it.
Also, introduce a new wrapper to be slightly more precise, since on
AMDGPU some addrspacecasts are free, but not no-ops.
llvm-svn: 362436
Just a minor refactoring to use the new helper method
DataLayout::typeSizeEqualsStoreSize(). This is done when
checking if getTypeSizeInBits is equal/non-equal to
getTypeStoreSizeInBits.
llvm-svn: 361613
Summary:
It is a common thing to loop over every `PHINode` in some `BasicBlock`
and change old `BasicBlock` incoming block to a new `BasicBlock` incoming block.
`replaceSuccessorsPhiUsesWith()` already had code to do that,
it just wasn't a function.
So outline it into a new function, and use it.
Reviewers: chandlerc, craig.topper, spatel, danielcdh
Reviewed By: craig.topper
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61013
llvm-svn: 359996
Summary:
There is `PHINode::getBasicBlockIndex()`, `PHINode::setIncomingBlock()`
and `PHINode::getNumOperands()`, but no function to replace every
specified `BasicBlock*` predecessor with some other specified `BasicBlock*`.
Clearly, there are a lot of places that could use that functionality.
Reviewers: chandlerc, craig.topper, spatel, danielcdh
Reviewed By: craig.topper
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61011
llvm-svn: 359995
This is a subset of the original commit from rL359879
which was reverted because it could crash when using the 'RemovedInstructions'
structure that enables delayed deletion of dead instructions. The motivating
compile-time win does not require that change though. We should get most of
that win from this change alone.
Using/updating a dominator tree to match math overflow patterns may be very
expensive in compile-time (because of the way CGP uses a DT), so just handle
the single-block case.
See post-commit thread for rL354298 for more details:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20190422/646276.html
Differential Revision: https://reviews.llvm.org/D61075
llvm-svn: 359969
Using/updating a dominator tree to match math overflow patterns may be very
expensive in compile-time (because of the way CGP uses a DT), so just handle
the single-block case.
Also, we were restarting the iterator loops when doing the overflow intrinsic
transforms by marking the dominator tree for update. That was done to prevent
iterating over a removed instruction. But we can postpone the deletion using
the existing "RemovedInsts" structure, and that means we don't need to update
the DT.
See post-commit thread for rL354298 for more details:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20190422/646276.html
Differential Revision: https://reviews.llvm.org/D61075
llvm-svn: 359879
The simple case of:
```
int *callee();
void *caller(void *a) {
if (a == NULL)
return callee();
return a;
}
```
would generate a regular call instead of a tail call because we don't
look through the bitcast of the call to `callee` when duplicating the
return blocks.
Differential Revision: https://reviews.llvm.org/D60837
llvm-svn: 359041
Summary:
A recent fix (r355751) caused a compile time regression because setting
the ModifiedDT flag in optimizeSelectInst means that each time a select
instruction is optimized the function walk in runOnFunction stops and
restarts again (which was needed to build a new DT before we started
building it lazily in r356937). Now that the DT is built lazily, a
simple fix is to just reset the DT at this point, rather than restarting
the whole function walk.
In the future other places that set ModifiedDT may want to switch to
just resetting the DT directly. But that will require an evaluation to
ensure that they don't otherwise need to restart the function walk.
Reviewers: spatel
Subscribers: jdoerfert, llvm-commits, xur
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59889
llvm-svn: 357111
Summary:
In r355512 CGP was changed to build the DominatorTree only once per
function traversal, to avoid repeatedly building it each time it was
accessed. This solved one compile time issue but introduced another. In
the second case, we now were building the DT unnecessarily many times
when we performed many function traversals (i.e. more than once per
function when running CGP because of changes made each time).
Change to saving the DT in the CodeGenPrepare object, and building it
lazily when needed. It is reset whenever we need to rebuild it.
The case that exposed the issue there are 617 functions, and we walk
them (i.e. execute the "while (MadeChange)" loop in runOnFunction) a
total of 12083 times (so previously we were building the DT 12083
times). With this patch we only build the DT 844 times (average of 1.37
times per function). We dropped the total time to compile this file from
538.11s without this patch to 339.63s with it.
There is still an issue as CGP is taking much longer than all other
passes even with this patch, and before a recent compiler release cut at
r355392 the total time to this compile was only 97 sec with a huge
reduction in CGP time. I suspect that one of the other recent changes to
CGP led to iterating each function many more times on average, but I
need to do some more investigation.
Reviewers: spatel
Subscribers: jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59696
llvm-svn: 356937
This is probably a bigger limitation than necessary, but since we don't have any evidence yet
that this transform led to real-world perf improvements rather than regressions, I'm making a
quick, blunt fix.
In the motivating x86 example from:
https://bugs.llvm.org/show_bug.cgi?id=41129
...and shown in the regression test, we want to avoid an extra instruction in the dominating
block because that could be costly.
The x86 LSR test diff is reversing the changes from D57789. There's no evidence that 1 version
is any better than the other yet.
Differential Revision: https://reviews.llvm.org/D59602
llvm-svn: 356665
This should be extended, but CGP does some strange things,
so I'm intentionally not changing the potential order of
any transforms yet.
llvm-svn: 356566
Summary:
This is a fix to bug 41052:
https://bugs.llvm.org/show_bug.cgi?id=41052
While trying to optimize a memory instruction in a dead basic block, we end up registering the same phi for replacement twice. This patch avoids registering more than the first replacement candidate for a phi.
Patch by: JesperAntonsson
Reviewers: skatkov, aprantl
Reviewed By: aprantl
Subscribers: jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59358
llvm-svn: 356260
This is almost the same as:
rL355345
...and should prevent any potential crashing from examples like:
https://bugs.llvm.org/show_bug.cgi?id=41064
...although the bug was masked by:
rL355823
...and I'm not sure how to repro the problem after that change.
llvm-svn: 356218
Targets can potentially emit more efficient code if they know address
computations never overflow. For example ILP32 code on AArch64 (which only has
64-bit address computation) can ignore the possibility of overflow with this
extra information.
llvm-svn: 355926
Inserting an overflowing arithmetic intrinsic can increase register
pressure by producing two values at a point where only one is needed,
while the second use maybe several blocks away. This increase in
pressure is likely to be more detrimental on performance than
rematerialising one of the original instructions.
So, check that the arithmetic and compare instructions are no further
apart than their immediate successor/predecessor.
Differential Revision: https://reviews.llvm.org/D59024
llvm-svn: 355823
r44412 fixed a huge compile time regression but it needed ModifiedDT flag to be
maintained correctly in optimizations in optimizeBlock() and optimizeInst().
Function optimizeSelectInst() does not update the flag.
This patch propagates the flag in optimizeSelectInst() back to
optimizeBlock().
This patch also removes ModifiedDT in CodeGenPrepare class (which is not used).
The property of ModifiedDT is now recorded in a ref parameter.
Differential Revision: https://reviews.llvm.org/D59139
llvm-svn: 355751
Summary:
In r354298 a DominatorTree construction was added via new function
combineToUSubWithOverflow, which was subsequently restructured into
replaceMathCmpWithIntrinsic in r354689. We are hitting a very long
compile time due to this repeated construction, once per math cmp in
the function.
We shouldn't need to build the DominatorTree more than once per
function, except when a transformation invalidates it. There is already
a boolean flag that is returned from these methods indicating whether
the DT has been modified. We can simply build the DT once per
Function walk in CodeGenPrepare::runOnFunction, since any time a change
is made we break out of the Function walk and restart it.
I modified the code so that both replaceMathCmpWithIntrinsic as well as
mergeSExts (which was also building a DT) use the DT constructed by the
run method.
From -mllvm -time-passes:
Before this patch: CodeGen Prepare user time is 328s
With this patch: CodeGen Prepare user time is 21s
Reviewers: spatel
Subscribers: jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58995
llvm-svn: 355512
The test is reduced from an example in the post-commit thread for:
rL354746
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20190304/632396.html
While we must avoid dying here, the real question should be:
Why is non-canonical and/or degenerate code making it to CGP when
using the new pass manager?
llvm-svn: 355345
There's likely a missed IR canonicalization for at least 1 of these
patterns. Otherwise, we wouldn't have needed the pattern-matching
enhancement in D57516.
Note that -- unlike usubo added with D57789 -- the TLI hook for
this transform defaults to 'on'. So if there's any perf fallout
from this, targets should look at how they're lowering the uaddo
node in SDAG and/or override that hook.
The x86 diffs suggest that there's some missing pattern-matching
for forming inc/dec.
This should fix the remaining known problems in:
https://bugs.llvm.org/show_bug.cgi?id=40486https://bugs.llvm.org/show_bug.cgi?id=31754
llvm-svn: 354746
We need to enhance the uaddo matching to handle special-cases
as seen in PR40486 and PR31754. That means we won't necessarily
have a def-use pattern, so we'll need to check dominance to
determine where to place the intrinsic (as we already do for
usubo). This preliminary patch is just rearranging the code,
so the planned follow-up to improve uaddo will be more clear.
llvm-svn: 354689
The motivating x86 cases for forming the intrinsic are shown in PR31754 and PR40487:
https://bugs.llvm.org/show_bug.cgi?id=31754https://bugs.llvm.org/show_bug.cgi?id=40487
..and those are shown in the IR test file and x86 codegen file.
Matching the usubo pattern is harder than uaddo because we have 2 independent values rather than a def-use.
This adds a TLI hook that should preserve the existing behavior for uaddo formation, but disables usubo
formation by default. Only x86 overrides that setting for now although other targets will likely benefit
by forming usbuo too.
Differential Revision: https://reviews.llvm.org/D57789
llvm-svn: 354298
This patch accompanies the RFC posted here:
http://lists.llvm.org/pipermail/llvm-dev/2018-October/127239.html
This patch adds a new CallBr IR instruction to support asm-goto
inline assembly like gcc as used by the linux kernel. This
instruction is both a call instruction and a terminator
instruction with multiple successors. Only inline assembly
usage is supported today.
This also adds a new INLINEASM_BR opcode to SelectionDAG and
MachineIR to represent an INLINEASM block that is also
considered a terminator instruction.
There will likely be more bug fixes and optimizations to follow
this, but we felt it had reached a point where we would like to
switch to an incremental development model.
Patch by Craig Topper, Alexander Ivchenko, Mikhail Dvoretckii
Differential Revision: https://reviews.llvm.org/D53765
llvm-svn: 353563
This patch improves code generation for some AArch64 ACLE intrinsics. It adds
support to CGP to duplicate and sink operands to their user, if they can be
folded into a target instruction, like zexts and sub into usubl. It adds a
TargetLowering hook shouldSinkOperands, which looks at the operands of
instructions to see if sinking is profitable.
I decided to add a new target hook, as for the sinking to be profitable,
at least on AArch64, we have to look at multiple operands of an
instruction, instead of looking at the users of a zext for example.
The sinking is done in CGP, because it works around an instruction
selection limitation. If instruction selection is not limited to a
single basic block, this patch should not be needed any longer.
Alternatively this could be done in the LoopSink pass, which tries to
undo LICM for instructions in blocks that are not executed frequently.
Note that we do not force the operands to sink to have a single user,
because we duplicate them before sinking. Therefore this is only
desirable if they really can be done for free. Additionally we could
consider the impact on live ranges later on.
This should fix https://bugs.llvm.org/show_bug.cgi?id=40025.
As for performance, we have internal code that uses intrinsics and can
be speed up by 10% by this change.
Reviewers: SjoerdMeijer, t.p.northover, samparker, efriedma, RKSimon, spatel
Reviewed By: samparker
Differential Revision: https://reviews.llvm.org/D57377
llvm-svn: 353152
This is no-functional-change-intended although there could
be intermediate variations caused by a difference in the
debug info produced by setting that from the builder's
insertion point.
I'm updating the IR test file associated with this code just
to show that the naming differences from using the builder
are visible.
The motivation for adding a helper function is that we are
likely to extend this code to deal with other overflow ops.
llvm-svn: 353056
There are 2 changes visible here:
1. There's no reason to limit this transform based on number
of condition registers. That diff allows PPC to produce
slightly better (dot-instructions should be generally good)
code.
Note: someone that cares about PPC codegen might want to
look closer at that output because it seems like we could
still improve this.
2. We (probably?) should not bother trying to form uaddo (or
other overflow ops) when there's no target support for such
an op. This goes beyond checking whether the op is expanded
because both PPC and AArch64 show better codegen for standard
types regardless of whether the op is legal/custom.
llvm-svn: 353001
This is not truly NFC because we are bailing out without
a TLI now. That should not be a real concern though because
there should be a TLI in any real-world scenario.
That seems better than passing around a pointer and then
checking it for null-ness all over the place.
The motivation is to fix what appears to be an unintended
restriction on the uaddo transform -
hasMultipleConditionRegisters() shouldn't be reason to limit
the transform.
llvm-svn: 352988
This cleans up all CallInst creation in LLVM to explicitly pass a
function type rather than deriving it from the pointer's element-type.
Differential Revision: https://reviews.llvm.org/D57170
llvm-svn: 352909
This ensures that if we make it to the backend w/o lowering widenable_conditions first, that we generate correct code. Doing it in CGP - instead of isel - let's us fold control flow before hitting block local instruction selection.
Differential Revision: https://reviews.llvm.org/D57473
llvm-svn: 352779
This change reverts r351626.
The changes in r351626 cause quadratic work in several cases. (See r351626 thread on llvm-commits for details.)
llvm-svn: 352722
This is meant to be used with clang's __builtin_dynamic_object_size.
When 'true' is passed to this parameter, the intrinsic has the
potential to be folded into instructions that will be evaluated
at run time. When 'false', the objectsize intrinsic behaviour is
unchanged.
rdar://32212419
Differential revision: https://reviews.llvm.org/D56761
llvm-svn: 352664
This patch makes sure that a debug value that is after the bitcast in
dupRetToEnableTailCallOpts() is also skipped.
The reduced test case is from SPEC-2006 on SystemZ.
Review: Vedant Kumar, Wolfgang Pieb
https://reviews.llvm.org/D57050
llvm-svn: 352462
to reflect the new license.
We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.
llvm-svn: 351636
This reverts commit r351618.
Compiler RT + ASAN tests are failing for PowerPC. Not sure
how would I reproduce these on macOS, so reverting (again)
until I do.
llvm-svn: 351619
Make sure CodeGenPrepare doesn't emit multiple inttoptr instructions of
the same integer value while sinking address computations, but rather
CSEs them on the fly: excessive inttoptr's confuse SCEV into thinking
that related pointers have nothing to do with each other.
This problem blocks LoadStoreVectorizer from vectorizing some of the
loads / stores in a downstream target.
Reviewed By: hfinkel
Differential Revision: https://reviews.llvm.org/D56838
llvm-svn: 351582
Creating the IR builder, then modifying the CFG, leads to an IRBuilder
where the BB and insertion point are inconsistent, so new instructions
have the wrong parent.
Modified an existing test because the test wasn't covering anything
useful (the "invoke" was not actually an invoke by the time we hit the
code in question).
Differential Revision: https://reviews.llvm.org/D55729
llvm-svn: 349693
This fixes PR39845. CodeGenPrepare employs a transactional model when
performing optimizations, i.e. it changes the IR to attempt an optimization
and rolls back the change when it finds the change inadequate. It is during
the rollback that references to locals were dropped from debug value
intrinsics. This patch reinstates debuginfo references during rollbacks.
Reviewers: aprantl, vsk
Differential Revision: https://reviews.llvm.org/D55396
llvm-svn: 348896
This is a fix for PR39625 with improvement the compile time
by reducing the number of intermediate Phi nodes created.
Reviewers: john.brawn, reames
Reviewed By: john.brawn
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D54932
llvm-svn: 347839
Every Analysis pass has a get method that returns a reference of the Result of
the Analysis, for example, BlockFrequencyInfo
&BlockFrequencyInfoWrapperPass::getBFI(). I believe that
ProfileSummaryInfo::getPSI() is the only exception to that, as it was returning
a pointer.
Another change is renaming isHotBB and isColdBB to isHotBlock and isColdBlock,
respectively. Most methods use BB as the argument of variable names while
methods usually refer to Basic Blocks as Blocks, instead of BB. For example,
Function::getEntryBlock, Loop:getExitBlock, etc.
I also fixed one of the comments.
Patch by Rodrigo Caetano Rocha!
Differential Revision: https://reviews.llvm.org/D54669
llvm-svn: 347182
Summary:
D44571 changed SimplificationTracker to use SmallSetVector to keep phi nodes. As a result, when the number of phi nodes is large, the build time performance suffers badly. When building for power pc, we have a case where there are more than 600.000 nodes, and it takes too long to compile.
In this change, I partially revert D44571 to use SmallPtrSet, which does an acceptable job with any number of elements. In the original patch, having a deterministic iteration order was mentioned as a motivation, however I think it only applies to the nodes already matched in MatchPhiSet method, which I did not touch.
Reviewers: bjope, skatkov
Reviewed By: bjope, skatkov
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D54007
llvm-svn: 346710
This adds the llvm-side support for post-inlining evaluation of the
__builtin_constant_p GCC intrinsic.
Also fixed SCCPSolver::visitCallSite to not blow up when seeing a call
to a function where canConstantFoldTo returns true, and one of the
arguments is a struct.
Updated from patch initially by Janusz Sobczak.
Differential Revision: https://reviews.llvm.org/D4276
llvm-svn: 346322
Clearing LargeOffsetGEPMap at the end fixes a bug where if a large
offset GEP is in a dead basic block, we fail an assertion when trying
to delete the block due to the asserting VH in LargeOffsetGEPMap.
Differential Revision: https://reviews.llvm.org/D53464
llvm-svn: 345082
Summary:
Large GEP splitting, introduced in rL332015, uses a `DenseMap<AssertingVH<Value>, ...>`. This causes an assertion to fail (in debug builds) or undefined behaviour to occur (in release builds) when a value is RAUWed.
This manifested itself in the 7zip benchmark from the llvm test suite built on ARM with `-fstrict-vtable-pointers` enabled while RAUWing invariant group launders and splits in CodeGenPrepare.
This patch merges the large offsets of the argument and the result of an invariant.group strip/launder intrinsic before RAUWing.
Reviewers: Prazek, javed.absar, haicheng, efriedma
Reviewed By: Prazek, efriedma
Subscribers: kristof.beyls, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D51936
llvm-svn: 344802
Adding NonNull as attributes to returned pointers has the unfortunate side
effect of disabling tail calls. This patch ignores the NonNull attribute when
we decide whether to tail merge, in the same way that we ignore the NoAlias
attribute, as it has no affect on the call sequence.
Differential Revision: https://reviews.llvm.org/D52238
llvm-svn: 343091
CodeGenPrepare has a transform that sinks {lshr, trunc} pairs to make it
easier for the backend to emit fancy extract-bits instructions (e.g UBFX).
Teach it to preserve debug locations and salvage debug values.
llvm-svn: 342319
The output of splitLargeGEPOffsets does not appear to be deterministic because
of the way that we iterate over a DenseMap. I've changed it to a MapVector for
consistent output.
The test here isn't particularly great, only showing a consmetic difference in
output. The original reproducer is much larger but show a diffierence in
instruction ordering, leading to different codegen.
Differential Revision: https://reviews.llvm.org/D51851
llvm-svn: 342043
This causes crashes due to the interleaved dbg.value intrinsics being
left at the end of basic blocks, causing the actual terminators (br,
etc) to be not where they should be (not at the end of the block),
leading to later crashes.
Further discussion on the original commit thread.
This reverts commit r340368.
llvm-svn: 340794
CodeGenPrepare has a strategy for moving dbg.values so that a value's
definition always dominates its debug users. This cleanup was happening
too early (before certain CGP transforms were run), resulting in some
dbg.value use-before-def errors.
Perform this cleanup as late as possible to avoid use-before-def.
llvm-svn: 340370
In optimizeSelectInst, when scanning for candidate selects to rewrite
into branches, scan past debug intrinsics. This makes the debug-enabled
and non-debug paths through optimizeSelectInst more congruent.
NFC because every select is eventually visited either way.
llvm-svn: 340368
This patch fixes PR38125.
Instruction extension types are recorded in PromotedInsts, it can be used later in function canGetThrough. If an instruction has two users with different extension types, it will be inserted into PromotedInsts two times in function promoteOperandForOther. The second one overwrites the first one, and the final extension type is wrong, later causes problem in canGetThrough.
This patch changes the simple bool extension type to 2-bit enum type, add a BothExtension type in addition to zero/sign extension. When an user sees BothExtension for an instruction, it actually knows nothing about how that instruction is extended.
Differential Revision: https://reviews.llvm.org/D49512
llvm-svn: 339822
This fixes an inconsistency in code generation when compiling with or
without debug information (-g). When debug information is available in
an empty block, the original test would fail, resulting in possibly
different code.
Patch by: Jeroen Dobbelaere
Differential revision: https://reviews.llvm.org/D49467
llvm-svn: 339129
It's a bit neater to write T.isIntOrPtrTy() over `T.isIntegerTy() ||
T.isPointerTy()`.
I used Python's re.sub with this regex to update users:
r'([\w.\->()]+)isIntegerTy\(\)\s*\|\|\s*\1isPointerTy\(\)'
llvm-svn: 336462
Summary:
Replace use of a SmallPtrSet with a SmallSetVector to make the worklist
iteration order deterministic. This is done as the order the blocks are
removed may affect whether or not PHI nodes in successor blocks are
removed.
For example, consider the following case where %bb1 and %bb2 are
removed:
bb1:
br i1 undef, label %bb3, label %bb4
bb2:
br i1 undef, label %bb4, label %bb3
bb3:
pv1 = phi type [ undef, %bb1 ], [ undef, %bb2], [ v0, %other ]
br label %bb4
bb4:
pv2 = phi type [ undef, %bb1 ], [ undef, %bb2 ],
[ pv1, %bb3 ], [ v0, %other ]
If %bb2 is removed before %bb1, the incoming values from %bb1 and %bb2
to pv1 will be removed before %bb1 is removed as a predecessor to %bb4.
The pv1 node will thus be optimized out (to v0) at the time %bb1 is
removed as a predecessor to %bb4, leaving the blocks as following when
the incoming value from %bb1 has been removed:
bb3: ; pv1 optimized out, incoming value to pv2 is v0
br label %bb4
bb4:
pv2 = phi type [ v0, %bb3 ], [ v0, %other ]
The pv2 PHI node will be optimized away by removePredecessor() as all
incoming values are identical.
In case %bb2 is removed after %bb1, pv1 will not be optimized out at the
time %bb2 is removed as a predecessor to %bb4, leaving the blocks as
following when the incoming value from %bb2 to pv2 has been removed:
bb3:
pv1 = phi type [ undef, %bb2 ], [ v0, %other ]
br label %bb4
bb4:
pv2 = phi type [ pv1, %bb3 ], [ v0, %other ]
The pv2 PHI node will thus not be removed in this case, ultimately
leading to the following output
bb3: ; pv1 optimized out, incoming value to pv2 is v0
br label %bb4
bb4:
pv2 = phi type [ v0, %bb3 ], [ v0, %other ]
I have not looked into changing DeleteDeadBlock() so that the redundant
PHI nodes are removed.
I have not added a test case, as I was not able to create a particularly
small and (not messy) reproducer. This is likely due to SmallPtrSet
behaving deterministically when in small mode.
Reviewers: void, dexonsmith, spatel, skatkov, fhahn, bkramer, nhaehnle
Reviewed By: fhahn
Subscribers: mgrang, llvm-commits
Differential Revision: https://reviews.llvm.org/D48369
llvm-svn: 336109
Summary:
This patch introduce new intrinsic -
strip.invariant.group that was described in the
RFC: Devirtualization v2
Reviewers: rsmith, hfinkel, nlopes, sanjoy, amharc, kuhar
Subscribers: arsenm, nhaehnle, JDevlieghere, hiraditya, xbolva00, llvm-commits
Differential Revision: https://reviews.llvm.org/D47103
Co-authored-by: Krzysztof Pszeniczny <krzysztof.pszeniczny@gmail.com>
llvm-svn: 336073
Summary:
Two utils methods have essentially the same functionality. This is an attempt to merge them into one.
1. lib/Transforms/Utils/Local.cpp : MergeBasicBlockIntoOnlyPred
2. lib/Transforms/Utils/BasicBlockUtils.cpp : MergeBlockIntoPredecessor
Prior to the patch:
1. MergeBasicBlockIntoOnlyPred
Updates either DomTree or DeferredDominance
Moves all instructions from Pred to BB, deletes Pred
Asserts BB has single predecessor
If address was taken, replace the block address with constant 1 (?)
2. MergeBlockIntoPredecessor
Updates DomTree, LoopInfo and MemoryDependenceResults
Moves all instruction from BB to Pred, deletes BB
Returns if doesn't have a single predecessor
Returns if BB's address was taken
After the patch:
Method 2. MergeBlockIntoPredecessor is attempting to become the new default:
Updates DomTree or DeferredDominance, and LoopInfo and MemoryDependenceResults
Moves all instruction from BB to Pred, deletes BB
Returns if doesn't have a single predecessor
Returns if BB's address was taken
Uses of MergeBasicBlockIntoOnlyPred that need to be replaced:
1. lib/Transforms/Scalar/LoopSimplifyCFG.cpp
Updated in this patch. No challenges.
2. lib/CodeGen/CodeGenPrepare.cpp
Updated in this patch.
i. eliminateFallThrough is straightforward, but I added using a temporary array to avoid the iterator invalidation.
ii. eliminateMostlyEmptyBlock(s) methods also now use a temporary array for blocks
Some interesting aspects:
- Since Pred is not deleted (BB is), the entry block does not need updating.
- The entry block was being updated with the deleted block in eliminateMostlyEmptyBlock. Added assert to make obvious that BB=SinglePred.
- isMergingEmptyBlockProfitable assumes BB is the one to be deleted.
- eliminateMostlyEmptyBlock(BB) does not delete BB on one path, it deletes its unique predecessor instead.
- adding some test owner as subscribers for the interesting tests modified:
test/CodeGen/X86/avx-cmp.ll
test/CodeGen/AMDGPU/nested-loop-conditions.ll
test/CodeGen/AMDGPU/si-annotate-cf.ll
test/CodeGen/X86/hoist-spill.ll
test/CodeGen/X86/2006-11-17-IllegalMove.ll
3. lib/Transforms/Scalar/JumpThreading.cpp
Not covered in this patch. It is the only use case using the DeferredDominance.
I would defer to Brian Rzycki to make this replacement.
Reviewers: chandlerc, spatel, davide, brzycki, bkramer, javed.absar
Subscribers: qcolombet, sanjoy, nemanjai, nhaehnle, jlebar, tpr, kbarton, RKSimon, wmi, arsenm, llvm-commits
Differential Revision: https://reviews.llvm.org/D48202
llvm-svn: 335183
CodeGenPrepare pass move extension instructions close to load instructions in different BB, so they can be combined later. But the extension instructions can't move through logical and shift instructions in current implementation. This patch enables this enhancement, so we can eliminate more extension instructions.
Differential Revision: https://reviews.llvm.org/D45537
This is re-commit of r331783, which was reverted by r333305. The performance regression was caused by some unlucky alignment, not a code generation problem.
llvm-svn: 334049
Review feedback from r328165. Split out just the one function from the
file that's used by Analysis. (As chandlerc pointed out, the original
change only moved the header and not the implementation anyway - which
was fine for the one function that was used (since it's a
template/inlined in the header) but not in general)
llvm-svn: 333954
When a GEP with all zero indices is converted to bitcast, its DI wasn't
copied over to the newly created instruction. This patch fixes that bug.
Patch by Kareem Ergawy!
Differential Revision: https://reviews.llvm.org/D47347
llvm-svn: 333235
When a bitcast is being sunk in -codegenprepare pass, its DI wasn't
copied over to the newly created instruction. This patch fixes that
bug.
Patch by Kareem Ergawy!
Differential Revision: https://reviews.llvm.org/D47282
llvm-svn: 333133
The DEBUG() macro is very generic so it might clash with other projects.
The renaming was done as follows:
- git grep -l 'DEBUG' | xargs sed -i 's/\bDEBUG\s\?(/LLVM_DEBUG(/g'
- git diff -U0 master | ../clang/tools/clang-format/clang-format-diff.py -i -p1 -style LLVM
- Manual change to APInt
- Manually chage DOCS as regex doesn't match it.
In the transition period the DEBUG() macro is still present and aliased
to the LLVM_DEBUG() one.
Differential Revision: https://reviews.llvm.org/D43624
llvm-svn: 332240
This commit adds a wrapper for std::distance() which works with ranges.
As it would be a common case to write `distance(predecessors(BB))`, this
also introduces `pred_size()` and `succ_size()` helpers to make that
easier to write.
Differential Revision: https://reviews.llvm.org/D46668
llvm-svn: 332057
Accessing the members of a large data structures needs a lot of GEPs which
usually have large offsets due to the size of the underlying data structure. If
the offsets are too large to fit into the r+i addressing mode, these GEPs cannot
be sunk to their users' blocks and many extra registers are needed then to carry
the values of these GEPs.
This patch tries to split a large data struct starting from %base like the
following.
Before:
BB0:
%base =
BB1:
%gep0 = gep %base, off0
%gep1 = gep %base, off1
%gep2 = gep %base, off2
BB2:
%load1 = load %gep0
%load2 = load %gep1
%load3 = load %gep2
After:
BB0:
%base =
%new_base = gep %base, off0
BB1:
%new_gep0 = %new_base
%new_gep1 = gep %new_base, off1 - off0
%new_gep2 = gep %new_base, off2 - off0
BB2:
%load1 = load i32, i32* %new_gep0
%load2 = load i32, i32* %new_gep1
%load3 = load i32, i32* %new_gep2
In the above example, the struct is split into two parts. The first part still
starts from %base and the second part starts from %new_base. After the
splitting, %new_gep1 and %new_gep2 have smaller offsets and then can be sunk to
BB2 and folded into their users.
The algorithm to split data structure is simple and very similar to the work of
merging SExts. First, it collects GEPs that have large offsets when iterating
the blocks. Second, it splits the underlying data structures and updates the
collected GEPs to use smaller offsets.
Differential Revision: https://reviews.llvm.org/D42759
llvm-svn: 332015
CodeGenPrepare pass move extension instructions close to load instructions in different BB, so they can be combined later. But the extension instructions can't move through logical and shift instructions in current implementation. This patch enables this enhancement, so we can eliminate more extension instructions.
Differential Revision: https://reviews.llvm.org/D45537
llvm-svn: 331783
We've been running doxygen with the autobrief option for a couple of
years now. This makes the \brief markers into our comments
redundant. Since they are a visual distraction and we don't want to
encourage more \brief markers in new code either, this patch removes
them all.
Patch produced by
for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done
Differential Revision: https://reviews.llvm.org/D46290
llvm-svn: 331272
See r331124 for how I made a list of files missing the include.
I then ran this Python script:
for f in open('filelist.txt'):
f = f.strip()
fl = open(f).readlines()
found = False
for i in xrange(len(fl)):
p = '#include "llvm/'
if not fl[i].startswith(p):
continue
if fl[i][len(p):] > 'Config':
fl.insert(i, '#include "llvm/Config/llvm-config.h"\n')
found = True
break
if not found:
print 'not found', f
else:
open(f, 'w').write(''.join(fl))
and then looked through everything with `svn diff | diffstat -l | xargs -n 1000 gvim -p`
and tried to fix include ordering and whatnot.
No intended behavior change.
llvm-svn: 331184
Currently EVT is in the IR layer only because of Function.cpp needing a very small piece of the functionality of EVT::getEVTString(). The rest of EVT is used in codegen making CodeGen a better place for it.
The previous code converted a Type* to EVT and then called getEVTString. This was only expected to handle the primitive types from Type*. Since there only a few primitive types, we can just print them as strings directly.
Differential Revision: https://reviews.llvm.org/D45017
llvm-svn: 328806
This is used by llvm tblgen as well as by LLVM Targets, so the only
common place is Support for now. (maybe we need another target for these
sorts of things - but for now I'm at least making them correct & we can
make them better if/when people have strong feelings)
llvm-svn: 328395
Remove #include of Transforms/Scalar.h from Transform/Utils to fix layering.
Transforms depends on Transforms/Utils, not the other way around. So
remove the header and the "createStripGCRelocatesPass" function
declaration (& definition) that is unused and motivated this dependency.
Move Transforms/Utils/Local.h into Analysis because it's used by
Analysis/MemoryBuiltins.cpp.
llvm-svn: 328165
Summary:
Made PHI node simplifiations more robust in several ways:
- Minor refactoring to let the SimplificationTracker own the
sets with new PHI/Select nodes that are introduced. This is
maybe not mapping to the original intention with the
SimplificationTracker, but IMHO it encapsulates the logic behind
those sets a little bit better.
- MatchPhiNode can sometimes populate the Matched set with
several entries, where it maps one PHI node to different candidates
for replacement. The Matched set is changed into a SmallSetVector
to make sure we get a deterministic iteration when doing
the replacements.
- As described above we may get several different replacements
for a single PHI node. The loop in MatchPhiSet that is doing
the replacements could end up calling eraseFromParent several
times for the same PHI node, resulting in segmentation faults.
This problem was supposed to be fixed in rL327250, but due to
the non-determinism(?) it only appeared to be fixed (I still
got crashes sometime when turning on/off -print-after-all etc
to get different iteration order in the DenseSets).
With this patch we follow the deterministic ordering in the
Matched set when replacing the PHI nodes. If we find a new
replacement for an already replaced PHI node we replace the
new replacement by the old replacement instead. This is quite
similar to what happened in the rl327250 patch, but here we
also recursively verify that the old replacement hasn't been
replaced already.
- It was really hard to track down the fault described above
(segementation fault due to doing eraseFromParent multiple
times for the same instruction). The fault was intermittent and
small changes in the code, or simply turning on -print-after-all
etc could make the problem go away. This was basically due to
the iteration over PhiNodesToMatch in MatchPhiSet no being
deterministic. Therefore I've changed the data structure for
the SimplificationTracker::AllPhiNodes into an SmallSetVector.
This gives a deterministic behavior.
Reviewers: skatkov, john.brawn
Reviewed By: skatkov
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D44571
llvm-svn: 327961
splitMergedValStore will split a store into two if target prefers this, or if
-force-split-store is passed.
This patch adds the missing handling for endianness in this function along
with a test case.
Review: Eli Friedman
https://reviews.llvm.org/D44396
llvm-svn: 327375
When we replace the Phi we created with matched ones it is possible that
there are two identical phi nodes in IR. And matcher is smart enough to find that
new created phi matches both of them. So we try to replace our phi node with
matched ones twice and what is bad we delete our phi node twice causing a crash.
As soon as we found that we have two identical Phi nodes it makes sense to do
a clean-up and replace one phi node by other one.
The patch implements it.
Reviewers: john.brawn, reames
Reviewed By: john.brawn
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D43758
llvm-svn: 327250
Making a width of GEP Index, which is used for address calculation, to be one of the pointer properties in the Data Layout.
p[address space]:size:memory_size:alignment:pref_alignment:index_size_in_bits.
The index size parameter is optional, if not specified, it is equal to the pointer size.
Till now, the InstCombiner normalized GEPs and extended the Index operand to the pointer width.
It works fine if you can convert pointer to integer for address calculation and all registered targets do this.
But some ISAs have very restricted instruction set for the pointer calculation. During discussions were desided to retrieve information for GEP index from the Data Layout.
http://lists.llvm.org/pipermail/llvm-dev/2018-January/120416.html
I added an interface to the Data Layout and I changed the InstCombiner and some other passes to take the Index width into account.
This change does not affect any in-tree target. I added tests to cover data layouts with explicitly specified index size.
Differential Revision: https://reviews.llvm.org/D42123
llvm-svn: 325102
Summary:
This change is part of step five in the series of changes to remove alignment argument from
memcpy/memmove/memset in favour of alignment attributes. In particular, this changes the
CodeGenPrepare pass to be more aggressive in improving the source and destination alignments
of memcpy/memmove/memset by exploiting our new ability to record independent alignments
for each argument.
Steps:
Step 1) Remove alignment parameter and create alignment parameter attributes for
memcpy/memmove/memset. ( rL322965, rC322964, rL322963 )
Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing
source and dest alignments. ( rL323597 )
Step 3) Update Clang to use the new IRBuilder API. ( rC323617 )
Step 4) Update Polly to use the new IRBuilder API. ( rL323618 )
Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API,
and those that use use MemIntrinsicInst::[get|set]Alignment() to use [get|set]DestAlignment()
and [get|set]SourceAlignment() instead. ( rL323886 )
Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the
MemIntrinsicInst::[get|set]Alignment() methods.
Reference
http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.htmlhttp://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html
llvm-svn: 323891
If in complex addressing mode the difference is in GV then
base reg should not be installed because we plan to use
base reg as a merge point of different GVs.
This is a fix for PR35980.
Reviewers: reames, john.brawn, santosh
Reviewed By: john.brawn
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D42230
llvm-svn: 323192
Summary:
In preparation for https://reviews.llvm.org/D41675 this NFC changes this
prototype of MemIntrinsicInst::setAlignment() to accept an unsigned instead
of a Constant.
llvm-svn: 322403
If the offset is differ in two addressing mode we can continue only if
ScaleReg is not set due to we will use it as merge of different offsets.
It should fix PR35799 and PR35805.
Reviewers: john.brawn, reames
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41227
llvm-svn: 322056
Summary:
The function section prefix for PGO based layout (e.g. hot/unlikely)
should look at the hotness of all blocks not just the entry BB.
A function with a cold entry but a very hot loop should be placed in the
hot section, for example, so that it is located close to other hot
functions it may call. For SamplePGO it was already looking at the
branch weights on calls, and I made that code conditional on whether
this is SamplePGO since it was essentially a noop for instrumentation
PGO anyway.
Reviewers: davidxl
Subscribers: eraman, llvm-commits
Differential Revision: https://reviews.llvm.org/D41395
llvm-svn: 321197
When we put the value in select placeholder we must pass
the value through simplification tracker due to the value might
be already simplified and erased.
This is a fix for PR35658.
Reviewers: john.brawn, uabelho
Reviewed By: john.brawn
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41251
llvm-svn: 320956
Summary:
Move splitIndirectCriticalEdges() from CodeGenPrepare to BasicBlockUtils.h so
that it can be called from other places.
Reviewers: davidxl
Reviewed By: davidxl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D40750
llvm-svn: 319689