The previous version of the patch would incorrectly convert an
existing argmemonly attribute into an inaccessiblemem_or_argmemonly
attribute.
-----
This updates checkFunctionMemoryAccess() to infer a precise
FunctionModRefBehavior, rather than an approximation split into
read/write and argmemonly.
Afterwards, we still map this back to imprecise function attributes.
This still allows us to infer some cases that we previously did not
handle, namely inaccessiblememonly and inaccessiblemem_or_argmemonly.
In practice, this means we get better memory attributes in the
presence of intrinsics like @llvm.assume.
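A hedged sketch of the attribute mapping described above; the FunctionModRefBehavior query helpers, the Function attribute setters, and the AliasAnalysis.h header are assumptions based on the LLVM API of this era and are not copied from the actual checkFunctionMemoryAccess() code:

    #include "llvm/Analysis/AliasAnalysis.h"
    #include "llvm/IR/Function.h"

    // Sketch only: map a precise FunctionModRefBehavior back onto the coarse
    // memory attributes available on functions. Helper names are assumed and
    // may differ slightly across LLVM versions.
    static void applyCoarseMemoryAttrs(llvm::Function &F,
                                       llvm::FunctionModRefBehavior FMRB) {
      if (FMRB.doesNotAccessMemory())
        F.setDoesNotAccessMemory();
      else if (FMRB.onlyAccessesInaccessibleMem())
        F.setOnlyAccessesInaccessibleMemory();      // inaccessiblememonly
      else if (FMRB.onlyAccessesArgPointees())
        F.setOnlyAccessesArgMemory();               // argmemonly
      else if (FMRB.onlyAccessesInaccessibleOrArgMem())
        F.setOnlyAccessesInaccessibleMemOrArgMem(); // inaccessiblemem_or_argmemonly
      if (FMRB.onlyReadsMemory())
        F.setOnlyReadsMemory();                     // readonly
    }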
Differential Revision: https://reviews.llvm.org/D134527
A User like the PHINode may be visited multiple times for the same pointer along
different def-use edges. The uninitialized state of OffsetInfo at the first
visit needs to be distinct from the Unknown value that may be assigned after
processing the PHINode. Without that, a PHINode with all inputs Unknown is never
followed to its uses. This results in incorrect optimization because some
interfering accesses are missed.
Differential Revision: https://reviews.llvm.org/D134704
Second patch in the series to remove the legacy PM and the
associated -enable-new-pm=0 flag. This one targets a pass that
has not been ported to the new PM: PruneEH.
Discussion about this can be found in D44415.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D134686
MemoryLocation::getOrNone() already has the necessary logic to
handle different instruction types. Use it, rather than repeating
a subset of the logic. This adds support for previously unhandled
instructions like atomicrmw.
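A hedged usage sketch, assuming the MemoryLocation::getOrNone() and AAResults::isNoAlias() signatures shown here (the optional type returned by getOrNone() differs between LLVM versions):

    #include "llvm/Analysis/AliasAnalysis.h"
    #include "llvm/Analysis/MemoryLocation.h"
    #include "llvm/IR/Instruction.h"

    // Sketch: conservatively decide whether Inst may access Loc.
    // getOrNone() already knows about loads, stores, atomicrmw, cmpxchg, etc.
    // and returns an empty optional for instructions it cannot summarize.
    static bool mayTouch(llvm::AAResults &AA, const llvm::Instruction *Inst,
                         const llvm::MemoryLocation &Loc) {
      if (auto InstLoc = llvm::MemoryLocation::getOrNone(Inst))
        return !AA.isNoAlias(*InstLoc, Loc);
      return true; // unknown instruction: assume it may touch Loc
    }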
With the recent addition of the new parameter MergeAttributes (D134117),
callers need to spell out several default parameters before they can
specify the new parameter.
This patch reorders the parameters so that callers do not have to
specify as many default parameters.
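A purely hypothetical illustration of the ergonomic point (the functions below are made up and are not the real InlineFunction signature): when a commonly-set flag trails several defaulted parameters, every caller must restate those defaults; moving it forward removes that burden.

    // Hypothetical signatures, for illustration only.
    struct Info {};
    struct Result {};

    // Old order: the frequently-used flag comes last.
    Result doInlineOld(Info &, void *AA = nullptr, bool InsertLifetime = true,
                       bool MergeAttributes = false) { return {}; }

    // New order: MergeAttributes moved forward, defaults stay untouched.
    Result doInlineNew(Info &, bool MergeAttributes = false, void *AA = nullptr,
                       bool InsertLifetime = true) { return {}; }

    int main() {
      Info I;
      doInlineOld(I, nullptr, true, /*MergeAttributes=*/true); // defaults spelled out
      doInlineNew(I, /*MergeAttributes=*/true);                // single argument
    }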
Differential Revision: https://reviews.llvm.org/D134125
InlineOrder::front is a remnant from the era when we had nested
"while" loops in the module inliner, with the inner one grouping the
call sites with the same caller.
Now that we have a simple "while" loop draining the priority queue, we
can just use InlineOrder::pop.
Differential Revision: https://reviews.llvm.org/D134121
In the past, we've had a bug resulting in a compiler crash after
forgetting to merge function attributes (D105729).
This patch teaches InlineFunction to merge function attributes. This
way, we minimize the window during which the IR is valid but the function
attributes have not yet been merged.
Differential Revision: https://reviews.llvm.org/D134117
UseInlinePriority specifies the priority function. This patch
simplifies the code by moving UseInlinePriority closer to the actual
consumer -- the switch statement inside getInlineOrder.
Differential Revision: https://reviews.llvm.org/D134100
DefaultInlineOrder was largely an exercise in generalizing the
traversal order of call sites within the inliner.
Now that the module inliner is starting to form its shape, there is no
point in sharing DefaultInlineOrder between the module inliner and the
CGSCC inliner. DefaultInlineOrder and all the other inline orders are
mutually exclusive in the following sense:
- The use of DefaultInlineOrder doesn't make sense in the module
inliner because there is no priority inherent in the order in which
call sites are added to the list of call sites -- SmallVector.
- The use of any other inline order doesn't make sense in the CGSCC
inliner because little prioritization can be done within one CGSCC.
This patch essentially reverts the addition of DefaultInlineOrder so
that the loop structure of Inliner.cpp looks like the state just
before we started working on the module inliner (circa June 2021).
At the same time, we remove the choice of DefaultInlineOrder from
UseInlinePriority.
Differential Revision: https://reviews.llvm.org/D134080
This fixes https://github.com/llvm/llvm-project/issues/57616.
Type test lowering in ThinLTO modules relies on having type id
summaries set up for the referenced types, which provide the type
test resolution. If there is no summary, the type tests are lowered
to false. At the very least, a default type id summary gives the
type tests a resolution of Unknown, which is handled correctly (ignored
by the first invocation of LTT, and lowered to true by the second).
WPD sets up the type id summaries (with a default type test resolution)
as it is processing the type tests, but only does this for the patterns
handled by WPD, which is a type test directly feeding an assume. In the
case of type tests feeding an assume via a phi, the type id summary was
not being set up, leading to the type tests being lowered to false
incorrectly.
Fix this by adding the default type id summary entries for all type ids
used on globals during index-only WPD.
This is not an issue for hybrid (split-lto-unit) LTO, as in that case
the type test resolution is determined and set up during LTT, since the
type definitions are in the regular LTO split module, and exported via
the summary to the ThinLTO split module.
Differential Revision: https://reviews.llvm.org/D134012
These comments refer to the nested loop in the module inliner where
the inner loop grouped call sites from the same caller. We don't
group call sites anymore, so the comments have become stale.
In the CGSCC inliner, DidInline was used as an indicator to update the call graph.
In the module inliner, DidInline is always true at the end of the
"while" loop, so can just drop it.
In the bottom-up inliner, we have a two-level nested "while" loop,
with the inner one grouping call sites with the same caller. We need
to do so to keep CGSCC up to date.
Now, with the module inliner, we don't have any per-caller work. We
don't update CGSCC. Plus, the caller will likely keep changing as we
pop call sites in some priority order.
This patch simply removes the inner "while" loop and re-indents its
body. Further cleanup is possible, but that's left for follow-up
patches.
Differential Revision: https://reviews.llvm.org/D133969
Parallel regions are outlined as functions, with captured variables explicitly generated as distinct parameters in the function's argument list. That complicates the fork_call interface in the OpenMP runtime:
(1) the fork_call is variadic, since there is a variable number of arguments to forward to the outlined function;
(2) wrapping/unwrapping arguments happens in the OpenMP runtime, which is sub-optimal, has been a source of ABI bugs, and has a hardcoded limit (16) on the number of arguments;
(3) forwarded arguments must be cast to pointer types, which complicates debugging.
This patch avoids those issues by aggregating captured arguments in a struct to pass to the fork_call.
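A hypothetical, runtime-agnostic sketch of the aggregation idea (the names below are made up; this is not the actual __kmpc_fork_call interface): the captured variables are packed into one struct, and the runtime only ever forwards a single pointer to the outlined body.

    #include <cstdio>

    // Hypothetical captured-variable aggregate for one parallel region.
    struct CapturedArgs {
      int *A;
      int N;
    };

    // Outlined parallel region body: takes one pointer instead of N parameters.
    static void outlinedRegion(void *Args) {
      auto *C = static_cast<CapturedArgs *>(Args);
      for (int I = 0; I < C->N; ++I)
        C->A[I] += 1;
    }

    // Stand-in for a fixed-arity fork call: no varargs, no per-argument casts.
    static void forkCall(void (*Fn)(void *), void *Args) { Fn(Args); }

    int main() {
      int Data[4] = {0, 1, 2, 3};
      CapturedArgs C{Data, 4};
      forkCall(outlinedRegion, &C);
      std::printf("%d %d %d %d\n", Data[0], Data[1], Data[2], Data[3]);
    }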
Reviewed By: jdoerfert, jhuber6, ABataev
Differential Revision: https://reviews.llvm.org/D102107
Currently, FunctionModRefBehavior tracks whether the function reads
or writes memory (ModRefInfo) and which locations it can access
(argmem, inaccessiblemem and other). This patch changes it to track
ModRef information per-location instead.
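A simplified sketch of what tracking ModRef per location means (an illustration of the representation only, not the actual FunctionModRefBehavior class): two ModRef bits per location instead of one ModRef value plus a single location kind.

    #include <cstdint>

    // Illustrative per-location ModRef encoding: two bits per location.
    enum class Loc : unsigned { ArgMem = 0, InaccessibleMem = 1, Other = 2 };
    enum class ModRefKind : uint8_t { NoModRef = 0, Ref = 1, Mod = 2, ModRef = 3 };

    struct PerLocModRef {
      uint8_t Data = 0;

      void set(Loc L, ModRefKind MR) {
        unsigned Shift = 2 * static_cast<unsigned>(L);
        Data = static_cast<uint8_t>((Data & ~(0x3u << Shift)) |
                                    (static_cast<unsigned>(MR) << Shift));
      }
      ModRefKind get(Loc L) const {
        return static_cast<ModRefKind>(
            (Data >> (2 * static_cast<unsigned>(L))) & 0x3u);
      }
    };

    // The memcpy-with-deopt-bundle example from the text: may read anything,
    // but only writes argument memory.
    inline PerLocModRef memcpyWithDeoptBundle() {
      PerLocModRef ME;
      ME.set(Loc::ArgMem, ModRefKind::ModRef);
      ME.set(Loc::InaccessibleMem, ModRefKind::Ref);
      ME.set(Loc::Other, ModRefKind::Ref);
      return ME;
    }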
To give two examples of why this is useful:
* D117095 highlights a weakness of ModRef modelling in the presence
of operand bundles. For a memcpy call with deopt operand bundle,
we want to say that it can read any memory, but only write argument
memory. This would allow them to be treated like any other calls.
However, we currently can't express this and have to say that it
can read or write any memory.
* D127383 would ideally be modelled as a separate threadid location,
where threadid Refs outside pre-split coroutines can be ignored
(like other accesses to constant memory). The current representation
does not allow modelling this precisely.
The patch as implemented is intended to be NFC, but there are some
obvious opportunities for improvements and simplification. To fully
capitalize on this we would also want to change the way we represent
memory attributes on functions, but that's a larger change, and I
think it makes sense to separate out the FunctionModRefBehavior
refactoring.
Differential Revision: https://reviews.llvm.org/D130896
Revert "[Attributor] Teach AAPointerInfo to look into aggregates"
This reverts commit 844f6c5d03 and
4ed0a88cd8 as they broke the buildbots
that run openmp/libomptarget/test/offloading/bug49021.cpp.
If we have a constant aggregate, e.g., as an initializer, we would usually
fail to extract the proper value/type from it. This patch provides the
size and offset information necessary to extract the right part of the
constant.
In situations when a submodule is extracted from a big module (i.e. using
CloneModule), a lot of debug info is copied via metadata nodes. Even though
part of that info is not linked to any instruction in the extracted IR file,
the StripDeadDebugInfo pass doesn't drop it.
Strengthen the criteria for debug info that should be kept in a module:
- Only those compile units are kept that are referenced by a subprogram debug
info node attached to a function definition in the module or to an instruction
in the module that belongs to an inlined function.
Signed-off-by: Mikhail Lychkov <mikhail.lychkov@intel.com>
Differential Revision: https://reviews.llvm.org/D122163
Remove ctx redeclaration.
Format code.
Remove parallel check. Modify tests. Clean-up code.
Fix another test.
Move code to helper functions.
Format file.
Minor fixes.
hasOnlyColdCalls skipped over calls to intrinsics, but it did so after
checking the linkage of the called function. This meant that the presence
of a call to a debug intrinsic could affect the outcome of the
optimization.
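A simplified sketch of the shape of the fix (not the actual hasOnlyColdCalls implementation; it only illustrates the ordering): intrinsic calls, including debug intrinsics, are skipped before any property of the callee, such as its linkage or calling convention, can influence the result.

    #include "llvm/IR/CallingConv.h"
    #include "llvm/IR/Function.h"
    #include "llvm/IR/InstIterator.h"
    #include "llvm/IR/Instructions.h"
    #include "llvm/IR/IntrinsicInst.h"

    // Sketch of the ordering only: ignoring intrinsic calls before any
    // callee-based checks keeps the verdict independent of whether -g
    // inserted dbg.declare/dbg.value calls.
    static bool allCallsAreColdSketch(const llvm::Function &F) {
      for (const llvm::Instruction &I : llvm::instructions(F)) {
        const auto *CB = llvm::dyn_cast<llvm::CallBase>(&I);
        if (!CB)
          continue;
        if (llvm::isa<llvm::IntrinsicInst>(CB)) // skip before the linkage check
          continue;
        const llvm::Function *Callee = CB->getCalledFunction();
        if (!Callee || !Callee->hasLocalLinkage())
          return false;
        if (CB->getCallingConv() != llvm::CallingConv::Cold)
          return false;
      }
      return true;
    }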
In my original reproducer (for an out of tree target) it was particularly
interesting, because the actual IR after GlobalOpt was not different with
debug intrinsics present, so -print-after-all printouts didn't show
anything there.
However, without debuginfo, GlobalOpt went further and ran
BlockFrequencyAnalysis and (more importantly) LoopAnalysis, and later on in
the pipeline, instcombine behaved in different ways when LoopInfo was
present.
So a call to a dbg.declare prevented running LoopAnalysis in
GlobalOpt, which later prevented InstCombine from doing an optimization.
The dbg-intrinsic-loopanalysis.ll testcase tries to expose this.
Then I also noted that adding a dbg.declare actually made the existing
testcase colccc_coldsites.ll generate different code, so I modified it to
check that it behaves the same way with and without the dbg.declare.
Reviewed By: nikic, fhahn
Differential Revision: https://reviews.llvm.org/D133193
This patch replaces calls to greatestCommonDivisor with std::gcd where
both arguments are known to be unsigned. This means that
std::common_type_t of the two argument types should just be the wider
one of the two.
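A small self-contained example of the equivalence relied on here: with two unsigned argument types, std::common_type_t is the wider unsigned type, so std::gcd produces the expected result without any signed conversions.

    #include <cstdint>
    #include <cstdio>
    #include <numeric>
    #include <type_traits>

    int main() {
      uint32_t A = 36;
      uint64_t B = 60;
      // std::common_type_t<uint32_t, uint64_t> is uint64_t, the wider of the two.
      static_assert(
          std::is_same_v<std::common_type_t<uint32_t, uint64_t>, uint64_t>);
      uint64_t G = std::gcd(A, B); // both operands are converted to uint64_t
      std::printf("gcd = %llu\n", static_cast<unsigned long long>(G)); // prints 12
    }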
MisExpect was occasionally crashing under SampleProfiling, due to a division by zero.
We worked around that in D124302 by changing the assert to an early return.
This patch is intended to add a test case for the crashing scenario and
re-enable MisExpect for SampleProfiling.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D124481
The SROA algorithm won't work for Scalable Vectors, since we don't
know how many bytes are loaded/stored. Bail out if a Scalable
Vector is seen.
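A hedged sketch of the kind of guard this implies (simplified, not the actual SROA change): the store size of a scalable vector is only known up to a runtime multiple of vscale, so byte-based slicing has to give up on it.

    #include "llvm/IR/DataLayout.h"
    #include "llvm/IR/Type.h"

    // Sketch: true if Ty occupies a fixed, compile-time-known number of bytes.
    // For scalable vectors such as <vscale x 4 x i32>, TypeSize::isScalable()
    // is true and SROA-style byte slicing cannot apply.
    static bool hasKnownFixedSize(const llvm::DataLayout &DL, llvm::Type *Ty) {
      return !DL.getTypeStoreSize(Ty).isScalable();
    }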
Differential Revision: https://reviews.llvm.org/D132417
* Replace getUserCost with getInstructionCost, covering all cost kinds.
* Remove getInstructionLatency, it's not implemented by any backends, and we should fold the functionality into getUserCost (now getInstructionCost) to make it easier for targets to handle the cost kinds with their existing cost callbacks.
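A hedged usage sketch, assuming the post-patch two-argument overload getInstructionCost(const User *, TargetCostKind) shown here (not re-verified against a specific LLVM revision): one entry point queried with different cost kinds.

    #include "llvm/Analysis/TargetTransformInfo.h"
    #include "llvm/IR/Instruction.h"

    // Sketch: query the unified entry point with different cost kinds instead
    // of going through separate getUserCost/getInstructionLatency calls.
    static void queryCosts(const llvm::TargetTransformInfo &TTI,
                           const llvm::Instruction *I) {
      using TTIKind = llvm::TargetTransformInfo;
      llvm::InstructionCost Throughput =
          TTI.getInstructionCost(I, TTIKind::TCK_RecipThroughput);
      llvm::InstructionCost Latency =
          TTI.getInstructionCost(I, TTIKind::TCK_Latency);
      llvm::InstructionCost Size =
          TTI.getInstructionCost(I, TTIKind::TCK_CodeSize);
      (void)Throughput;
      (void)Latency;
      (void)Size;
    }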
Original Patch by @samparker (Sam Parker)
Differential Revision: https://reviews.llvm.org/D79483
Using the legacy PM for the optimization pipeline is deprecated and in
the process of being removed. This is a small step in that direction.
For an example of migrating to the new PM:
853b57fe80
We're seeing non-determinism with loading sample profiles. It seems to
be related to the order in which we merge FunctionSamples in
promoteMergeNotInlinedContextSamples(). Use a MapVector to iterate over
NonInlinedCallSites in the order entries were inserted.
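A minimal example of the property being relied on: llvm::MapVector iterates in insertion order, unlike a hash map whose iteration order can vary from run to run.

    #include "llvm/ADT/MapVector.h"
    #include "llvm/ADT/StringRef.h"
    #include <cstdio>

    int main() {
      llvm::MapVector<llvm::StringRef, int> Samples;
      Samples["zoo"] += 1; // insertion order: zoo, bar, foo
      Samples["bar"] += 2;
      Samples["foo"] += 3;
      // Iteration visits entries in the order they were first inserted,
      // which makes downstream merging deterministic across runs.
      for (const auto &KV : Samples)
        std::printf("%s -> %d\n", KV.first.str().c_str(), KV.second);
    }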
Reviewed By: wenlei, davidxl
Differential Revision: https://reviews.llvm.org/D131592
The property of allocation functions that is of interest here is
their uniqueness (in the sense of disjoint provenance), which is
encoded by the noalias return attribute.
Differential Revision: https://reviews.llvm.org/D130225