I have no idea what's going on here. This code was moved
around/introduced in change cb26b01d57 and starts crashing with a NULL
dereference once I apply https://reviews.llvm.org/D123090. I assume that
I've unwittingly taught the attributor enough that it's able to do more
clever things than in the past, and it's able to trip on this case. I
make no claims about the correctness of this patch, but it passes tests
and seems to fix all the crashes I've been seeing.
Differential Revision: https://reviews.llvm.org/D129589
For the longest time we used `AAValueSimplify` and
`genericValueTraversal` to determine "potential values". This was
problematic for many reasons:
- We recomputed the result a lot as there was no caching for the 9
locations calling `genericValueTraversal`.
- We added the idea of "intra" vs. "inter" procedural simplification
only as an afterthought. `genericValueTraversal` did offer an option
but `AAValueSimplify` did not. Thus, we might end up with "too much"
simplification in certain situations and then gave up on it.
- Because `genericValueTraversal` was not a real `AA` we ended up with
problems like the infinite recursion bug (#54981) as well as code
duplication.
This patch introduces `AAPotentialValues` and replaces the
`AAValueSimplify` uses with it. `genericValueTraversal` is folded into
`AAPotentialValues` as are the instruction simplifications performed in
`AAValueSimplify` before. We further distinguish "intra" and "inter"
procedural simplification now.
`AAValueSimplify` was not deleted as we haven't ported the
re-materialization of instructions yet. There are other differences over
the former handling, e.g., we may not fold trivially foldable
instructions right now, e.g., `add i32 1, 1` is not folded to `i32 2`
but if an operand would be simplified to `i32 1` we would fold it still.
We are also even more aware of function/SCC boundaries in CGSCC passes,
which is good even if some tests look like they regress.
Fixes: https://github.com/llvm/llvm-project/issues/54981
Note: A previous version was flawed and consequently reverted in
6555558a80.
We recently learned to place the alloca during the heap2stack
transformation in the entry block but we did not account for other
concurrent modifications. We need to record our decision rather than
checking (then outdated) passes during the manifest stage. This will
also allow us to use a custom (=optimistic) "loop info" in the future.
As integer div/rem constant expressions are no longer supported,
constants can no longer trap and are always safe to speculate.
Remove the Constant::canTrap() method and its usages.
If we are certainly not in a loop we can directly emit the heap2stack
allocas in the function entry block. This will help to get rid of them
(SROA) and avoid stacksave/restore intrinsics when the function is
inlined.
AARGetter is an abstraction over a source of the `AAResults` introduced
to support the legacy pass manager as well as the modern one. Since the
Argument Promotion pass doesn't support the legacy pass manager anymore,
the abstraction is not required and `AAResults` may be used directly.
The instance of the `FunctionAnalysisManager` is passed through the
functions to get all the required analyses just wherever they are
required and do not use the awkward getter callbacks.
The `ReplaceCallSite` parameter was required for the legacy pass manager
only and isn't used anymore, so the parameter has been eliminated.
Differential Revision: https://reviews.llvm.org/D128727
The `isDenselyPacked` static member of the `ArgumentPromotionPass` class
is not used in the class itself anymore. The single known user of the
function is in the `AttributorAttributes.cpp` file, so the function has
been moved into the file.
Differential Revision: https://reviews.llvm.org/D128725
It makes sense to handle byval promotion in the same way as non-byval
but also allowing `store` instructions. However, these should
use the same checks as the `load` instructions do, i.e. be part of the
`ArgsToPromote` collection. For these instructions, the check for
interfering modifications can be disabled, though. The promotion
algorithm itself has been modified a lot: all the accesses (i.e. loads
and stores) are rewritten to the emitted `alloca` instructions. To
optimize these new `alloca`s out, the `PromoteMemToReg` function from
`Transforms/Utils/PromoteMemoryToRegister.cpp` file is invoked after
promotion.
In order to let the `PromoteMemToReg` promote as many `alloca`s as it
is possible, there should be no `GEP`s from the `alloca`s. To
eliminate the `GEP`s, its own `alloca` is generated for every argument
part because a single `alloca` for the whole argument (that
significantly simplifies the code of the pass though) unfortunately
cannot be used.
The idea comes from the following discussion:
https://reviews.llvm.org/D124514#3479676
Differential Revision: https://reviews.llvm.org/D125485
This is the followup patch to https://reviews.llvm.org/D125246 for the `SampleContextTracker` part. Before the promotion and merging of the context is based on the SampleContext(the array of frame), this causes a lot of cost to the memory. This patch detaches the tracker from using the array ref instead to use the context trie itself. This can save a lot of memory usage and benefit both the compiler's CS inliner and llvm-profgen's pre-inliner.
One structure needs to be specially treated is the `FuncToCtxtProfiles`, this is used to get all the functionSamples for one function to do the merging and promoting. Before it search each functions' context and traverse the trie to get the node of the context. Now we don't have the context inside the profile, instead we directly use an auxiliary map `ProfileToNodeMap` for profile , it initialize to create the FunctionSamples to TrieNode relations and keep updating it during promoting and merging the node.
Moreover, I was expecting the results before and after remain the same, but I found that the order of FuncToCtxtProfiles matter and affect the results. This can happen on recursive context case, but the difference should be small. Now we don't have the context, so I just used a vector for the order, the result is still deterministic.
Measured on one huge size(12GB) profile from one of our internal service. The profile similarity difference is 99.999%, and the running time is improved by 3X(debug mode) and the memory is reduced from 170GB to 90GB.
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D127031
Summary:
Currently in OpenMPOpt we strip `noinline` attributes from runtime
functions. This is here because the device bitcode library that we link
has problems with needed definitions getting prematurely optimized out.
This is only necessary for OpenMP offloading to GPUs so we should narrow
the scope for where we spend time doing this. In the future this
shouldn't be necessary as we move to using a linked library rather than
pulling in a bitcode library in Clang.
Support for the legacy pass manager in ArgPromotion causes
complications in D125485. As the legacy pass manager for middle-end
optimizations is unsupported, drop ArgPromotion from the legacy
pipeline, rather than introducing additional complexity to deal
with it.
Differential Revision: https://reviews.llvm.org/D128536
Drop the requirement that getInitialValueOfAllocation() must be
passed an allocator function, shifting the responsibility for
checking that into the function (which it does anyway). The
motivation is to avoid some calls to isAllocationFn(), which has
somewhat ill-defined semantics (given the number of
allocator-related attributes we have floating around...)
(For this function, all we eventually need is an allockind of
zeroed or uninitialized.)
Differential Revision: https://reviews.llvm.org/D127274
The code has been reformatted in accordance with the code style. Some
function comments were extended to the Doxygen ones and reworded a bit
to eliminate the duplication of the function's/class' name in the
comment.
Differential Revision: https://reviews.llvm.org/D128168
This avoid creating empty bins in AAPointerInfo which can lead to
segfaults. Also ensure we do not try to translate from callee to caller
except if we really take the argument state and move it to the call site
argument state.
Fixes: https://github.com/llvm/llvm-project/issues/55726
When determining liveness via Attributor::isAssumedDead(...) we might
end up without a liveness AA or with one pointing into another function.
Neither is helpful and we will avoid both from now on.
Reapplied after fixing the ASAN error which caused the revert:
db68a25ca9
`llvm::max(Align, MaybeAlign)` and `llvm::max(MaybeAlign, Align)` are
not used often enough to be required. They also make the code more opaque.
Differential Revision: https://reviews.llvm.org/D128121
We wanted to check if all uses of the function are direct calls, but the
code didn't account for passing the function as a parameter.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D128104
llvm.used and llvm.compiler.used are often used with inline assembly
that refers to a specific symbol so that the symbol is kept through to
the linker even though there are no references to it from LLVM IR.
This fixes the MergeFunctions pass to preserve references to these
symbols in llvm.used/llvm.compiler.used so they are not deleted from the
IR. This doesn't prevent these functions from being merged, but
guarantees that an alias or thunk with the expected symbol name is kept
in the IR.
Differential Revision: https://reviews.llvm.org/D127751
Previously if the inliner split an SCC such that an empty one remained, the MLInlineAdvisor could potentially lose track of the EdgeCount if a subsequent CGSCC pass modified the calls of a function that was initially in the SCC pre-split. Saving the seen nodes in onPassEntry resolves this.
Reviewed By: mtrofin
Differential Revision: https://reviews.llvm.org/D127693
Adding the `DW_CC_nocall` calling convention to the function debug metadata is needed when either the return values or the arguments of a function are removed as this helps in informing debugger that it may not be safe to call this function or try to interpret the return value.
This translates to setting `DW_AT_calling_convention` with `DW_CC_nocall` for appropriate DWARF DIEs.
The DWARF5 spec (section 3.3.1.1 Calling Convention Information) says:
If the `DW_AT_calling_convention` attribute is not present, or its value is the constant `DW_CC_normal`, then the subroutine may be safely called by obeying the `standard` calling conventions of the target architecture. If the value of the calling convention attribute is the constant `DW_CC_nocall`, the subroutine does not obey standard calling conventions, and it may not be safe for the debugger to call this subroutine.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D127134
Adds option to print the contents of the Inline Advisor after each SCC Inliner pass
Reviewed By: mtrofin
Differential Revision: https://reviews.llvm.org/D127689
This patch improves the fix in D110529 to prevent from crashing on value
with byval attribute that is not added in SCCP solver.
Authored-by: sinan.lin@linux.alibaba.com
Reviewed By: ChuanqiXu
Differential Revision: https://reviews.llvm.org/D126355
Per the documentation in Support/InstructionCost.h, the purpose of an invalid cost is so that clients can change behavior on impossible to cost inputs. CodeMetrics was instead asserting that invalid costs never occurred.
On a target with an incomplete cost model - e.g. RISCV - this means that transformations would crash on (falsely) invalid constructs - e.g. scalable vectors. While we certainly should improve the cost model - and I plan to do so in the near future - we also shouldn't be crashing. This violates the explicitly stated purpose of an invalid InstructionCost.
I updated all of the "easy" consumers where bailouts were locally obvious. I plan to follow up with loop unroll in a following change.
Differential Revision: https://reviews.llvm.org/D127131
For the longest time we used `AAValueSimplify` and
`genericValueTraversal` to determine "potential values". This was
problematic for many reasons:
- We recomputed the result a lot as there was no caching for the 9
locations calling `genericValueTraversal`.
- We added the idea of "intra" vs. "inter" procedural simplification
only as an afterthought. `genericValueTraversal` did offer an option
but `AAValueSimplify` did not. Thus, we might end up with "too much"
simplification in certain situations and then gave up on it.
- Because `genericValueTraversal` was not a real `AA` we ended up with
problems like the infinite recursion bug (#54981) as well as code
duplication.
This patch introduces `AAPotentialValues` and replaces the
`AAValueSimplify` uses with it. `genericValueTraversal` is folded into
`AAPotentialValues` as are the instruction simplifications performed in
`AAValueSimplify` before. We further distinguish "intra" and "inter"
procedural simplification now.
`AAValueSimplify` was not deleted as we haven't ported the
re-materialization of instructions yet. There are other differences over
the former handling, e.g., we may not fold trivially foldable
instructions right now, e.g., `add i32 1, 1` is not folded to `i32 2`
but if an operand would be simplified to `i32 1` we would fold it still.
We are also even more aware of function/SCC boundaries in CGSCC passes,
which is good.
Fixes: https://github.com/llvm/llvm-project/issues/54981
When determining liveness via Attributor::isAssumedDead(...) we might
end up without a liveness AA or with one pointing into another function.
Neither is helpful and we will avoid both from now on.
Clang-format InstructionSimplify and convert all "FunctionName"s to
"functionName". This patch does touch a lot of files but gets done with
the cleanup of InstructionSimplify in one commit.
This is the alternative to the less invasive clang-format only patch: D126783
Reviewed By: spatel, rengolin
Differential Revision: https://reviews.llvm.org/D126889
We can use constant to allow undef and there is no need to force
integers in the API anyway. The user can decide if a non integer
constant is fine or not.
We need to be careful replacing values as call site arguments
(IRPosition::IRP_CALL_SITE_ARGUMENT) is representing a use and not a
value. This patch replaces the interface to take a IR position instead
making it harder to misuse accidentally. It does not change our tests
right now but a follow up exposed the potential footgun.
We used to be very conservative when integer states were merged.
Instead of adding the known range (which is large due to uncertainty)
into the assumed range (which is hopefully small), we can also only
allow to merge in both at the same time into their respective
counterpart. This will ensure we keep the invariant that assumed is part
of known.
When we recreate instructions as part of simplification we need to take
care of debug metadata and replacing the value multiple times. For now,
we handle both conservatively.
Some cl::ZeroOrMore were added to avoid the `may only occur zero or one times!`
error. More were added due to cargo cult. Since the error has been removed,
cl::ZeroOrMore is unneeded.
Also remove cl::init(false) while touching the lines.
This patch introduces the abstract base class InlinePriority to serve as
the comparison function for the priority queue. A derived class, such
as SizePriority, may choose to cache the priorities for different
functions for performance reasons.
This design shields the type used for the priority away from classes
outside InlinePriority and classes derived from it. In turn,
PriorityInlineOrder no longer needs to be a template class.
Reviewed By: kazu
Differential Revision: https://reviews.llvm.org/D126300
This patch introduces the abstract base class InlinePriority to serve as
the comparison function for the priority queue. A derived class, such
as SizePriority, may choose to cache the priorities for different
functions for performance reasons.
This design shields the type used for the priority away from classes
outside InlinePriority and classes derived from it. In turn,
PriorityInlineOrder no longer needs to be a template class.
Reviewed By: kazu
Differential Revision: https://reviews.llvm.org/D126300
There are a few places where we use report_fatal_error when the input is broken.
Currently, this function always crashes LLVM with an abort signal, which
then triggers the backtrace printing code.
I think this is excessive, as wrong input shouldn't give a link to
LLVM's github issue URL and tell users to file a bug report.
We shouldn't print a stack trace either.
This patch changes report_fatal_error so it uses exit() rather than
abort() when its argument GenCrashDiag=false.
Reviewed by: nikic, MaskRay, RKSimon
Differential Revision: https://reviews.llvm.org/D126550
This patch adds !nosanitize metadata to FixedMetadataKinds.def, !nosanitize indicates that LLVM should not insert any sanitizer instrumentation.
Reviewed By: vitalybuka
Differential Revision: https://reviews.llvm.org/D126294
All callers pass true.
select-unfold-freeze.ll is now a subset of select.ll so delete it.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D126501
It doesn't matter which value we use for dead args, so let's switch
to poison, so we can eventually kill undef.
Reviewed By: aeubanks, fhahn
Differential Revision: https://reviews.llvm.org/D125983
The re-apply includes fixes to clang tests that were missed in
the original commit.
Original message:
Prior to this patch we would only set to undef the unused arguments of the
external functions. The rationale was that unused arguments of internal
functions wouldn't need to be turned into undef arguments because they
should have been simply eliminated by the time we reach that code.
This is actually not true because there are plenty of cases where we can't
remove unused arguments. For instance, if the internal function is used in
an indirect call, it may not be possible to change the function signature.
Yet, for statically known call-sites we would still like to mark the unused
arguments as undef.
This patch enables the "set undef arguments" optimization on internal
functions when we encounter cases where internal functions cannot be
optimized. I.e., whenever an internal function is marked "live".
Differential Revision: https://reviews.llvm.org/D124699
It makes sense to make a non-byval promotion attempt first and then
fall back to the byval one. The non-byval ('usual') promotion is
generally better, for example it does promotion even when a structure
has more elements than 'MaxElements' but not all of them are actually
used in the function.
Differential Revision: https://reviews.llvm.org/D124514