This is a small patch to make FoldBranchToCommonDest poison-safe by default.
After fc3f0c9c, only two syntactic changes are needed to fix unit tests.
This does not cause any assembly difference in testsuite as well (-O3, X86-64 Manjaro).
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D99452
This *only* changes the cases where we *really* don't care
about the iteration order of the underlying contained,
namely when we will use the values from it to form DTU updates.
Lookup tables generate non PIC-friendly code, which requires dynamic relocation as described in:
https://bugs.llvm.org/show_bug.cgi?id=45244
This patch adds a new pass that converts lookup tables to relative lookup tables to make them PIC-friendly.
Differential Revision: https://reviews.llvm.org/D94355
`FoldBranchToCommonDest()` has a certain budget (`-bonus-inst-threshold=`)
for bonus instruction duplication. And currently it calculates the cost
as-if it will actually duplicate into each predecessor.
But ignoring the budget, it won't always duplicate into each predecessor,
there are some correctness and profitability checks.
So when calculating the cost, we should first check into which blocks
will we *actually* duplicate, and only then use that block count
to do budgeting.
We clone bonus instructions to the end of the predecessor block,
and then use `SSAUpdater::RewriteUseAfterInsertions()`.
But that only deals with the cases where the use-to-be-rewritten
are either in different block from the def, or come after the def.
But in some loop cases, the external use may be in the beginning of
predecessor block, before the newly cloned bonus instruction.
`SSAUpdater::RewriteUseAfterInsertions()` does not deal with that.
Notably, the external use can't happen to be both in the same block
and *after* the newly-cloned instruction, because of the fold preconditions.
To properly handle these cases, when the use is in the same block,
we should instead use `SSAUpdater::RewriteUse()`.
TBN, they do the same thing for PHI users.
Fixes https://bugs.llvm.org/show_bug.cgi?id=49510
Likely Fixes https://bugs.llvm.org/show_bug.cgi?id=49689
2nd try (original: 27ae17a6b0) with fix/test for crash. We must make
sure that TTI is available before trying to use it because it is not
required (might be another bug).
Original commit message:
This is one step towards solving:
https://llvm.org/PR49336
In that example, we disregard the recommended usage of builtin_expect,
so an expensive (unpredictable) branch is folded into another branch
that is guarding it.
Here, we read the profile metadata to see if the 1st (predecessor)
condition is likely to cause execution to bypass the 2nd (successor)
condition before merging conditions by using logic ops.
Differential Revision: https://reviews.llvm.org/D98898
- Give unwieldy repeated expression a name
- Use a ranged `for` basic block iterator
Reviewed by: nikic, dexonsmith
Differential Revisision: https://reviews.llvm.org/D98957
Hoist early return for decl-only clones to before DIFinder
calculation.
Also fix an out of date assert message after invariants changed in
22a52dfddc.
Reviewed by: nikic, dexonsmith
Differential Revisision: https://reviews.llvm.org/D98957
Lookup tables generate non PIC-friendly code, which requires dynamic relocation as described in:
https://bugs.llvm.org/show_bug.cgi?id=45244
This patch adds a new pass that converts lookup tables to relative lookup tables to make them PIC-friendly.
Differential Revision: https://reviews.llvm.org/D94355
This reverts commit 27ae17a6b0.
There are bot failures that end with:
#4 0x00007fff7ae3c9b8 CrashRecoverySignalHandler(int) CrashRecoveryContext.cpp:0:0
#5 0x00007fff84e504d8 (linux-vdso64.so.1+0x4d8)
#6 0x00007fff7c419a5c llvm::TargetTransformInfo::getPredictableBranchThreshold() const (/home/buildbots/ppc64le-clang-multistage-test/clang-ppc64le-multistage/stage1.install/bin/../lib/libLLVMAnalysis.so.13git+0x479a5c)
...but not sure how to trigger that yet.
This is one step towards solving:
https://llvm.org/PR49336
In that example, we disregard the recommended usage of builtin_expect,
so an expensive (unpredictable) branch is folded into another branch
that is guarding it.
Here, we read the profile metadata to see if the 1st (predecessor)
condition is likely to cause execution to bypass the 2nd (successor)
condition before merging conditions by using logic ops.
Differential Revision: https://reviews.llvm.org/D98898
This attribute represents the minimum and maximum values vscale can
take. For now this attribute is not hooked up to anything during
codegen, this will be added in the future when such codegen is
considered stable.
Additionally hook up the -msve-vector-bits=<x> clang option to emit this
attribute.
Differential Revision: https://reviews.llvm.org/D98030
When eliminating comparisons, we can use common dominator of
all its users as context. This gives better results when ICMP is not
computed right before the branch that uses it.
Differential Revision: https://reviews.llvm.org/D98924
Reviewed By: lebedev.ri
Now that intrinsic name mangling can cope with unnamed types, the custom name mangling in PredicateInfo (introduced by D49126) can be removed.
(See D91250, D48541)
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D91661
We can prove more predicates when we have a context when eliminating ICmp.
As first (and very obvious) approximation we can use the ICmp instruction itself,
though in the future we are going to use a common dominator of all its users.
Need some refactoring before that.
Observed ~0.5% negative compile time impact.
Differential Revision: https://reviews.llvm.org/D98697
Reviewed By: lebedev.ri
Fixed section of code that iterated through a SmallDenseMap and added
instructions in each iteration, causing non-deterministic code; replaced
SmallDenseMap with MapVector to prevent non-determinism.
This reverts commit 01ac6d1587.
This caused non-deterministic compiler output; see comment on the
code review.
> This patch updates the various IR passes to correctly handle dbg.values with a
> DIArgList location. This patch does not actually allow DIArgLists to be produced
> by salvageDebugInfo, and it does not affect any pass after codegen-prepare.
> Other than that, it should cover every IR pass.
>
> Most of the changes simply extend code that operated on a single debug value to
> operate on the list of debug values in the style of any_of, all_of, for_each,
> etc. Instances of setOperand(0, ...) have been replaced with with
> replaceVariableLocationOp, which takes the value that is being replaced as an
> additional argument. In places where this value isn't readily available, we have
> to track the old value through to the point where it gets replaced.
>
> Differential Revision: https://reviews.llvm.org/D88232
This reverts commit df69c69427.
This is a patch to add nonnull and align to assume's operand bundle
only if noundef exists.
Since nonnull and align in fn attr have poison semantics, they should be
paired with noundef or noundef-implying attributes to be immediate UB.
Reviewed By: jdoerfert, Tyker
Differential Revision: https://reviews.llvm.org/D98228
The test is reduced from a C source example in:
https://llvm.org/PR49541
It's possible that the test could be reduced further or
the predicate generalized further, but it seems to require
a few ingredients (including the "late" SimplifyCFG options
on the RUN line) to fall into the infinite-loop trap.
For CGSCC inline, we need to scale down a function's branch weights and entry counts when thee it's inlined at a callsite. This is done through updateCallProfile. Additionally, we also scale the weigths for the inlined clone based on call site count in updateCallerBFI. Neither is needed for inlining during sample profile loader as it's using context profile that is separated from inlinee's own profile. This change skip the inlinee profile scaling for sample loader inlining.
Differential Revision: https://reviews.llvm.org/D98187
This patch improves salvageDebugInfoImpl by allowing it to salvage arithmetic
operations with two or more non-const operands; this includes the GetElementPtr
instruction, and most Binary Operator instructions. These salvages produce
DIArgList locations and are only valid for dbg.values, as currently variadic
DIExpressions must use DW_OP_stack_value. This functionality is also only added
for salvageDebugInfoForDbgValues; other functions that directly call
salvageDebugInfoImpl (such as in ISel or Coroutine frame building) can be
updated in a later patch.
Differential Revision: https://reviews.llvm.org/D91722
D96109 was recently submitted which contains the refactored implementation of
-funique-internal-linakge-names by adding the unique suffixes in clang rather
than as an LLVM pass. Deleting the former implementation in this change.
Differential Revision: https://reviews.llvm.org/D98234
This patch refactors out the salvaging of GEP and BinOp instructions into
separate functions, in preparation for further changes to the salvaging of these
instructions coming in another patch; there should be no functional change as a
result of this refactor.
Differential Revision: https://reviews.llvm.org/D92851
See: https://bugs.llvm.org/show_bug.cgi?id=47613
There was an extra sqrt call because shrinking emitted a new powf and at the same time optimizePow replaces the previous pow with sqrt and as the result we have two instructions that will be in worklist of InstCombie despite the fact that %powf is not used by anyone (it is alive because of errno).
As the result we have two instructions:
%powf = call fast float @powf(float %x, float 5.000000e-01)
%sqrt = call fast double @sqrt(double %dx)
%powf will be converted to %sqrtf on a later iteration.
As a quick fix for that I moved shrinking to the end of optimizePow so that pow is replaced with sqrt at first that allows not to emit a new shrunk powf.
Differential Revision: https://reviews.llvm.org/D98235
We encountered an issue where LTO running on IR that used the DSOLocalEquivalent
constant would result in bad codegen. The underlying issue was ValueMapper wasn't
properly handling DSOLocalEquivalent, so this just adds the machinery for handling
it. This code path is triggered by a fix to DSOLocalEquivalent::handleOperandChangeImpl
where DSOLocalEquivalent could potentially not have the same type as its underlying GV.
This updates DSOLocalEquivalent::handleOperandChangeImpl to change the type if
the GV type changes and handles this constant in ValueMapper.
Differential Revision: https://reviews.llvm.org/D97978
This patch updates the various IR passes to correctly handle dbg.values with a
DIArgList location. This patch does not actually allow DIArgLists to be produced
by salvageDebugInfo, and it does not affect any pass after codegen-prepare.
Other than that, it should cover every IR pass.
Most of the changes simply extend code that operated on a single debug value to
operate on the list of debug values in the style of any_of, all_of, for_each,
etc. Instances of setOperand(0, ...) have been replaced with with
replaceVariableLocationOp, which takes the value that is being replaced as an
additional argument. In places where this value isn't readily available, we have
to track the old value through to the point where it gets replaced.
Differential Revision: https://reviews.llvm.org/D88232
The code used for propagating equalities (e.g. assume facts) was conservative in two ways - one of which this patch fixes. Specifically, it shifts the code reasoning about whether a use is dominated by the end of the assume block to consider phi uses to exist on the predecessor edge. This matches the dominator tree handling for dominates(Edge, Use), and simply extends it to dominates(BB, Use).
Note that the decision to use the end of the block is itself a conservative choice. The more precise option would be to use the later of the assume and the value, and replace all uses after that. GVN handles that case separately (with the replace operand mechanism) because it used to be expensive to ask dominator questions within blocks. With the new instruction ordering support, we should probably rewrite this code at some point to simplify.
Differential Revision: https://reviews.llvm.org/D98082
This patch updates DbgVariableIntrinsics to support use of a DIArgList for the
location operand, resulting in a significant change to its interface. This patch
does not update all IR passes to support multiple location operands in a
dbg.value; the only change is to update the DbgVariableIntrinsic interface and
its uses. All code outside of the intrinsic classes assumes that an intrinsic
will always have exactly one location operand; they will still support
DIArgLists, but only if they contain exactly one Value.
Among other changes, the setOperand and setArgOperand functions in
DbgVariableIntrinsic have been made private. This is to prevent code from
setting the operands of these intrinsics directly, which could easily result in
incorrect/invalid operands being set. This does not prevent these functions from
being called on a debug intrinsic at all, as they can still be called on any
CallInst pointer; it is assumed that any code directly setting the operands on a
generic call instruction is doing so safely. The intention for making these
functions private is to prevent DIArgLists from being overwritten by code that's
naively trying to replace one of the Values it points to, and also to fail fast
if a DbgVariableIntrinsic is updated to use a DIArgList without a valid
corresponding DIExpression.
This reverts commit 99108c791d.
Clang is miscompiling LLVM with this change, a stage-2 build hits
multiple failures.
As a repro, I built clang in a stage1 directory and used it this way:
cmake -G Ninja ../llvm \
-DCMAKE_CXX_COMPILER=`pwd`/../build-stage1/bin/clang++ \
-DCMAKE_C_COMPILER=`pwd`/../build-stage1/bin/clang \
-DLLVM_TARGETS_TO_BUILD="X86;NVPTX;AMDGPU" \
-DLLVM_ENABLE_PROJECTS=mlir \
-DLLVM_BUILD_EXAMPLES=ON \
-DCMAKE_BUILD_TYPE=Release \
-DLLVM_ENABLE_ASSERTIONS=On
ninja check-mlir
This patch makes FoldBranchToCommonDest merge branch conditions into `select i1` rather than `and/or i1` when it is called by SimplifyCFG.
It is known that merging conditions into and/or is poison-unsafe, and this is towards making things *more* correct by removing possible miscompilations.
Currently, InstCombine simply consumes these selects into and/or of i1 (which is also unsafe), so the visible effect would be very small. The unsafe select -> and/or transformation will be removed in the future.
There has been efforts for updating optimizations to support the select form as well, and they are linked to D93065.
The safe transformation is fired when it is called by SimplifyCFG only. This is done by setting the new `PoisonSafe` argument as true.
Another place that calls FoldBranchToCommonDest is LoopSimplify. `PoisonSafe` flag is set to false in this case because enabling it has a nontrivial impact in performance because SCEV is more conservative with select form and InductiveRangeCheckElimination isn't aware of select form of and/or i1.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D95026
These intrinsics, not the icmp+select are the canonical form nowadays,
so we might as well directly emit them.
This should not cause any regressions, but if it does,
then then they would needed to be fixed regardless.
Note that this doesn't deal with `SCEVExpander::isHighCostExpansion()`,
but that is a pessimization, not a correctness issue.
Additionally, the non-intrinsic form has issues with undef,
see https://reviews.llvm.org/D88287#2587863
This patch adds a new metadata node, DIArgList, which contains a list of SSA
values. This node is in many ways similar in function to the existing
ValueAsMetadata node, with the difference being that it tracks a list instead of
a single value. Internally, it uses ValueAsMetadata to track the individual
values, but there is also a reasonable amount of DIArgList-specific
value-tracking logic on top of that. Similar to ValueAsMetadata, it is a special
case in parsing and printing due to the fact that it requires a function state
(as it may reference function-local values).
This patch should not result in any immediate functional change; it allows for
DIArgLists to be parsed and printed, but debug variable intrinsics do not yet
recognize them as a valid argument (outside of parsing).
Differential Revision: https://reviews.llvm.org/D88175
This is included from IR files, and IR doesn't/can't depend on Analysis
(because Analysis depends on IR).
Also fix the implementation - don't use non-member static in headers, as
it leads to ODR violations, inaccurate "unused function" warnings, etc.
And fix the header protection macro name (we don't generally include
"LIB" in the names, so far as I can tell).
This enhances the auto-init remark with information about the variable
that is auto-initialized.
This is based of debug info if available, or alloca names (mostly for
development purposes).
```
auto-init.c:4:7: remark: Call to memset inserted by -ftrivial-auto-var-init. Memory operation size: 4096 bytes.Variables: var (4096 bytes). [-Rpass-missed=annotation-remarks]
int var[1024];
^
```
This allows to see things like partial initialization of a variable that
the optimizer won't be able to completely remove.
Differential Revision: https://reviews.llvm.org/D97734
explicitly emitting retainRV or claimRV calls in the IR
This reapplies ed4718eccb, which was reverted
because it was causing a miscompile. The bug that was causing the miscompile
has been fixed in 75805dce5f.
Original commit message:
Background:
This fixes a longstanding problem where llvm breaks ARC's autorelease
optimization (see the link below) by separating calls from the marker
instructions or retainRV/claimRV calls. The backend changes are in
https://reviews.llvm.org/D92569.
https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue
What this patch does to fix the problem:
- The front-end adds operand bundle "clang.arc.attachedcall" to calls,
which indicates the call is implicitly followed by a marker
instruction and an implicit retainRV/claimRV call that consumes the
call result. In addition, it emits a call to
@llvm.objc.clang.arc.noop.use, which consumes the call result, to
prevent the middle-end passes from changing the return type of the
called function. This is currently done only when the target is arm64
and the optimization level is higher than -O0.
- ARC optimizer temporarily emits retainRV/claimRV calls after the calls
with the operand bundle in the IR and removes the inserted calls after
processing the function.
- ARC contract pass emits retainRV/claimRV calls after the call with the
operand bundle. It doesn't remove the operand bundle on the call since
the backend needs it to emit the marker instruction. The retainRV and
claimRV calls are emitted late in the pipeline to prevent optimization
passes from transforming the IR in a way that makes it harder for the
ARC middle-end passes to figure out the def-use relationship between
the call and the retainRV/claimRV calls (which is the cause of
PR31925).
- The function inliner removes an autoreleaseRV call in the callee if
nothing in the callee prevents it from being paired up with the
retainRV/claimRV call in the caller. It then inserts a release call if
claimRV is attached to the call since autoreleaseRV+claimRV is
equivalent to a release. If it cannot find an autoreleaseRV call, it
tries to transfer the operand bundle to a function call in the callee.
This is important since the ARC optimizer can remove the autoreleaseRV
returning the callee result, which makes it impossible to pair it up
with the retainRV/claimRV call in the caller. If that fails, it simply
emits a retain call in the IR if retainRV is attached to the call and
does nothing if claimRV is attached to it.
- SCCP refrains from replacing the return value of a call with a
constant value if the call has the operand bundle. This ensures the
call always has at least one user (the call to
@llvm.objc.clang.arc.noop.use).
- This patch also fixes a bug in replaceUsesOfNonProtoConstant where
multiple operand bundles of the same kind were being added to a call.
Future work:
- Use the operand bundle on x86-64.
- Fix the auto upgrader to convert call+retainRV/claimRV pairs into
calls with the operand bundles.
rdar://71443534
Differential Revision: https://reviews.llvm.org/D92808
Same dangling probes are redundant since they all have the same semantic that is to rely on the counts inference tool to get reasonable count for the same original block. Therefore, there's no need to keep multiple copies of them. I've seen jump threading created tons of redundant dangling probes that slowed down the compiler dramatically. Other optimization passes can also result in redundant probes though without an observed impact so far.
This change removes block-wise redundant dangling probes specifically introduced by jump threading. To support removing redundant dangling probes caused by all other passes, a final function-wise deduplication is also added.
An 18% size win of the .pseudo_probe section was seen for SPEC2017. No performance difference was observed.
Differential Revision: https://reviews.llvm.org/D97482
This change fixes a couple places where the pseudo probe intrinsic blocks optimizations because they are not naturally removable. To unblock those optimizations, the blocking pseudo probes are moved out of the original blocks and tagged dangling, instead of allowing pseudo probes to be literally removed. The reason is that when the original block is removed, we won't be able to sample it. Instead of assigning it a zero weight, moving all its pseudo probes into another block and marking them dangling should allow the counts inference a chance to assign them a more reasonable weight. We have not seen counts quality degradation from our experiments.
The optimizations being unblocked are:
1. Removing conditional probes for if-converted branches. Conditional probes are tagged dangling when their homing branch arms are folded so that they will not be over-counted.
2. Unblocking jump threading from removing empty blocks. Pseudo probe prevents jump threading from removing logically empty blocks that only has one unconditional jump instructions.
3. Unblocking SimplifyCFG and MIR tail duplicate to thread empty blocks and blocks with redundant branch checks.
Since dangling probes are logically deleted, they should not consume any samples in LTO postLink. This can be achieved by setting their distribution factors to zero when dangled.
Reviewed By: wmi
Differential Revision: https://reviews.llvm.org/D97481
This caused miscompiles of Chromium tests for iOS due clobbering of live
registers. See discussion on the code review for details.
> Background:
>
> This fixes a longstanding problem where llvm breaks ARC's autorelease
> optimization (see the link below) by separating calls from the marker
> instructions or retainRV/claimRV calls. The backend changes are in
> https://reviews.llvm.org/D92569.
>
> https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue
>
> What this patch does to fix the problem:
>
> - The front-end adds operand bundle "clang.arc.attachedcall" to calls,
> which indicates the call is implicitly followed by a marker
> instruction and an implicit retainRV/claimRV call that consumes the
> call result. In addition, it emits a call to
> @llvm.objc.clang.arc.noop.use, which consumes the call result, to
> prevent the middle-end passes from changing the return type of the
> called function. This is currently done only when the target is arm64
> and the optimization level is higher than -O0.
>
> - ARC optimizer temporarily emits retainRV/claimRV calls after the calls
> with the operand bundle in the IR and removes the inserted calls after
> processing the function.
>
> - ARC contract pass emits retainRV/claimRV calls after the call with the
> operand bundle. It doesn't remove the operand bundle on the call since
> the backend needs it to emit the marker instruction. The retainRV and
> claimRV calls are emitted late in the pipeline to prevent optimization
> passes from transforming the IR in a way that makes it harder for the
> ARC middle-end passes to figure out the def-use relationship between
> the call and the retainRV/claimRV calls (which is the cause of
> PR31925).
>
> - The function inliner removes an autoreleaseRV call in the callee if
> nothing in the callee prevents it from being paired up with the
> retainRV/claimRV call in the caller. It then inserts a release call if
> claimRV is attached to the call since autoreleaseRV+claimRV is
> equivalent to a release. If it cannot find an autoreleaseRV call, it
> tries to transfer the operand bundle to a function call in the callee.
> This is important since the ARC optimizer can remove the autoreleaseRV
> returning the callee result, which makes it impossible to pair it up
> with the retainRV/claimRV call in the caller. If that fails, it simply
> emits a retain call in the IR if retainRV is attached to the call and
> does nothing if claimRV is attached to it.
>
> - SCCP refrains from replacing the return value of a call with a
> constant value if the call has the operand bundle. This ensures the
> call always has at least one user (the call to
> @llvm.objc.clang.arc.noop.use).
>
> - This patch also fixes a bug in replaceUsesOfNonProtoConstant where
> multiple operand bundles of the same kind were being added to a call.
>
> Future work:
>
> - Use the operand bundle on x86-64.
>
> - Fix the auto upgrader to convert call+retainRV/claimRV pairs into
> calls with the operand bundles.
>
> rdar://71443534
>
> Differential Revision: https://reviews.llvm.org/D92808
This reverts commit ed4718eccb.
This patch updates LV to generate the runtime checks just after cost
modeling, to allow a more precise estimate of the actual cost of the
checks. This information will be used in future patches to generate
larger runtime checks in cases where the checks only make up a small
fraction of the expected scalar loop execution time.
The runtime checks are created up-front in a temporary block to allow better
estimating the cost and un-linked from the existing IR. After deciding to
vectorize, the checks are moved backed. If deciding not to vectorize, the
temporary block is completely removed.
This patch is similar in spirit to D71053, but explores a different
direction: instead of delaying the decision on whether to vectorize in
the presence of runtime checks it instead optimistically creates the
runtime checks early and discards them later if decided to not
vectorize. This has the advantage that the cost-modeling decisions
can be kept together and can be done up-front and thus preserving the
general code structure. I think delaying (part) of the decision to
vectorize would also make the VPlan migration a bit harder.
One potential drawback of this patch is that we speculatively
generate IR which we might have to clean up later. However it seems like
the code required to do so is quite manageable.
Reviewed By: lebedev.ri, ebrevnov
Differential Revision: https://reviews.llvm.org/D75980
In the example based on:
https://llvm.org/PR49218
...we are crashing because poison is a subclass of undef, so we merge blocks and create:
PHI node has multiple entries for the same basic block with different incoming values!
%k3 = phi i64 [ poison, %entry ], [ %k3, %g ], [ undef, %entry ]
If both poison and undef values are incoming, we soften the poison values to undef.
Differential Revision: https://reviews.llvm.org/D97495
collectBitParts uses int8_t for the bit indices, leaving a 128-bit limit.
We already test for this before calling collectBitParts, but rGb94c215592bd added truncate handling which meant we could end up processing wider integers.
Thanks to @manojgupta for the repro.
This now analyzes calls to both intrinsics and functions.
For intrinsics, grab the ones we know and care about (mem* family) and
analyze the arguments.
For calls, use TLI to get more information about the libcalls, then
analyze the arguments if known.
```
auto-init.c:4:7: remark: Call to memset inserted by -ftrivial-auto-var-init. Memory operation size: 4096 bytes. [-Rpass-missed=annotation-remarks]
int var[1024];
^
```
Differential Revision: https://reviews.llvm.org/D97489
This adds support for analyzing the instruction with the !annotation
"auto-init" in order to generate a more user-friendly remark.
For now, support the store size, and whether it's atomic/volatile.
Example:
```
auto-init.c:4:7: remark: Store inserted by -ftrivial-auto-var-init.Store size: 4 bytes. [-Rpass-missed=annotation-remarks]
int var;
^
```
Differential Revision: https://reviews.llvm.org/D97412
This is a follow up to 22a52dfddc and a
revert of df763188c9.
With this change, we only skip cloning distinct nodes in
MDNodeMapper::mapDistinct if RF_ReuseAndMutateDistinctMDs, dropping the
no-longer-needed local helper `cloneOrBuildODR()`. Skipping cloning in
other cases is unsound and breaks CloneModule, which is why the textual
IR for PR48841 didn't pass previously. This commit adds the test as:
Transforms/ThinLTOBitcodeWriter/cfi-debug-info-cloned-type-references-global-value.ll
Cloning less often exposed a hole in subprogram cloning in
CloneFunctionInto thanks to df763188c9a1ecb1e7e5c4d4ea53a99fbb755903's
test ThinLTO/X86/Inputs/dicompositetype-unique-alias.ll. If a function
has a subprogram attachment whose scope is a DICompositeType that
shouldn't be cloned, but it has no internal debug info pointing at that
type, that composite type was being cloned. This commit plugs that hole,
calling DebugInfoFinder::processSubprogram from CloneFunctionInto.
As hinted at in 22a52dfddcefad4f275eb8ad1cc0e200074c2d8a's commit
message, I think we need to formalize ownership of metadata a bit more
so that ValueMapper/CloneFunctionInto (and similar functions) can deal
with cloning (or not) metadata in a more generic, less fragile way.
This fixes PR48841.
Differential Revision: https://reviews.llvm.org/D96734
This is a simple patch to update SimplifyCFG's passingValueIsAlwaysUndefined to inspect more attributes.
A new function `CallBase::isPassingUndefUB` checks attributes that imply noundef.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D97244
This is a patch to explicitly mark the size parameter of allocator functions like malloc/realloc/... as noundef.
For C/C++: undef can be created from reading an uninitialized variable or padding.
Calling a function with uninitialized variable is already UB.
Calling malloc with padding value is.. something that's not expected. Padding bits may appear in a coerced aggregate, which doesn't apply to malloc's size.
Therefore, malloc's size can be marked as noundef.
For transformations that introduce malloc/realloc/..: I ran LLVM unit tests with an updated Alive2 semantics, and found no regression, so it seems okay.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D97045
When cloning instructions during jump threading, also clone and
adapt any declared scopes. This is primarily important when
threading loop exits, because we'll end up with two dominating
scope declarations in that case (at least after additional loop
rotation). This addresses a loose thread from
https://reviews.llvm.org/rG2556b413a7b8#975012.
Differential Revision: https://reviews.llvm.org/D97154
Refines the fix in 3c4c205060 to only
put globals whose defs were cloned into the split regular LTO module
on the cloned llvm*.used globals. This avoids an issue where one of the
attached values was a local that was promoted in the original module
after the module was cloned. We only need to have the values defined in
the new module on those globals.
Fixes PR49251.
Differential Revision: https://reviews.llvm.org/D97013
I think we can use here same logic as for nonnull.
strlen(X) - X must be noundef => valid pointer.
for libcalls with size arg, we add noundef only if size is known and greater than 0 - so pointers must be noundef (valid ones)
Reviewed By: jdoerfert, aqjune
Differential Revision: https://reviews.llvm.org/D95122
This moves the willReturn() helper from CallBase to Instruction,
so that it can be used in a more generic manner. This will make
it easier to fix additional passes (ADCE and BDCE), and will give
us one place to change if additional instructions should become
non-willreturn (e.g. there has been talk about handling volatile
operations this way).
I have also included the IntrinsicInst workaround directly in
here, so that it gets applied consistently. (As such this change
is not entirely NFC -- FuncAttrs will now use this as well.)
Differential Revision: https://reviews.llvm.org/D96992
As discussed on the RFC [0], I am sharing the set of patches that
enables checking of original Debug Info metadata preservation in
optimizations. The proof-of-concept/proposal can be found at [1].
The implementation from the [1] was full of duplicated code,
so this set of patches tries to merge this approach into the existing
debugify utility.
For example, the utility pass in the original-debuginfo-check
mode could be invoked as follows:
$ opt -verify-debuginfo-preserve -pass-to-test sample.ll
Since this is very initial stage of the implementation,
there is a space for improvements such as:
- Add support for the new pass manager
- Add support for metadata other than DILocations and DISubprograms
[0] https://groups.google.com/forum/#!msg/llvm-dev/QOyF-38YPlE/G213uiuwCAAJ
[1] https://github.com/djolertrk/llvm-di-checker
Differential Revision: https://reviews.llvm.org/D82545
The test that was failing is now forced to use the old PM.
As discussed on the RFC [0], I am sharing the set of patches that
enables checking of original Debug Info metadata preservation in
optimizations. The proof-of-concept/proposal can be found at [1].
The implementation from the [1] was full of duplicated code,
so this set of patches tries to merge this approach into the existing
debugify utility.
For example, the utility pass in the original-debuginfo-check
mode could be invoked as follows:
$ opt -verify-debuginfo-preserve -pass-to-test sample.ll
Since this is very initial stage of the implementation,
there is a space for improvements such as:
- Add support for the new pass manager
- Add support for metadata other than DILocations and DISubprograms
[0] https://groups.google.com/forum/#!msg/llvm-dev/QOyF-38YPlE/G213uiuwCAAJ
[1] https://github.com/djolertrk/llvm-di-checker
Differential Revision: https://reviews.llvm.org/D82545
Apply the patch for the third time after fixing buildbot failures.
Refactor SampleProfile.cpp to use the core code in CodeGen.
The main changes are:
(1) Move SampleProfileLoaderBaseImpl class to a header file.
(2) Split SampleCoverageTracker to a head file and a cpp file.
(3) Move the common codes (common options and callsiteIsHot())
to the common cpp file.
(4) Add inline keyword to avoid duplicated symbols -- they will
be removed later when the class is changed to a template.
Differential Revision: https://reviews.llvm.org/D96455
Revert "[SampleFDO] Add missing #includes to unbreak modules build after D96455"
This reverts commit c73cbf218a.
Revert "[SampleFDO] Fix MSVC "namespace uses itself" warning (NFC)"
This reverts commit a23e6b321c.
Revert "[SampleFDO] Reapply: Refactor SampleProfile.cpp"
This reverts commit 6fd5ccff72.
Still seeing link failures when building llc (or other tools), due to
the new SampleProfileLoaderBaseImpl.h containing definitions that get
duplicated across multiple TU's.
```
duplicate symbol 'llvm::SampleProfileLoaderBaseImpl::findEquivalenceClasses(llvm::Function&)' in:
tools/llc/CMakeFiles/llc.dir/llc.cpp.o
lib/libLLVMInstCombine.a(InstCombineVectorOps.cpp.o)
duplicate symbol 'llvm::SampleProfileLoaderBaseImpl::buildEdges(llvm::Function&)' in:
tools/llc/CMakeFiles/llc.dir/llc.cpp.o
lib/libLLVMInstCombine.a(InstCombineVectorOps.cpp.o)
duplicate symbol 'llvm::SampleProfileLoaderBaseImpl::computeDominanceAndLoopInfo(llvm::Function&)' in:
tools/llc/CMakeFiles/llc.dir/llc.cpp.o
lib/libLLVMInstCombine.a(InstCombineVectorOps.cpp.o)
duplicate symbol 'llvm::SampleProfileLoaderBaseImpl::getFunctionLoc(llvm::Function&)' in:
tools/llc/CMakeFiles/llc.dir/llc.cpp.o
lib/libLLVMInstCombine.a(InstCombineVectorOps.cpp.o)
duplicate symbol 'llvm::SampleProfileLoaderBaseImpl::getBlockWeight(llvm::BasicBlock const*)' in:
tools/llc/CMakeFiles/llc.dir/llc.cpp.o
lib/libLLVMInstCombine.a(InstCombineVectorOps.cpp.o)
duplicate symbol 'llvm::SampleProfileLoaderBaseImpl::printBlockWeight(llvm::raw_ostream&, llvm::BasicBlock const*) const' in:
tools/llc/CMakeFiles/llc.dir/llc.cpp.o
lib/libLLVMInstCombine.a(InstCombineVectorOps.cpp.o)
duplicate symbol 'llvm::SampleProfileLoaderBaseImpl::printBlockEquivalence(llvm::raw_ostream&, llvm::BasicBlock const*)' in:
tools/llc/CMakeFiles/llc.dir/llc.cpp.o
lib/libLLVMInstCombine.a(InstCombineVectorOps.cpp.o)
duplicate symbol 'llvm::SampleProfileLoaderBaseImpl::printEdgeWeight(llvm::raw_ostream&, std::__1::pair<llvm::BasicBlock const*, llvm::BasicBlock const*>)' in:
tools/llc/CMakeFiles/llc.dir/llc.cpp.o
lib/libLLVMInstCombine.a(InstCombineVectorOps.cpp.o)
```
Reapply patch after fixing buildbot failure.
Refactor SampleProfile.cpp to use the core code in CodeGen.
The main changes are:
(1) Move SampleProfileLoaderBaseImpl class to a header file.
(2) Split SampleCoverageTracker to a head file and a cpp file.
(3) Move the common codes (common options and callsiteIsHot())
to the common cpp file.
Differential Revision: https://reviews.llvm.org/D96455
This patch changes costAndCollectOperands to use InstructionCost for
accumulated cost values.
isHighCostExpansion will return true if the cost has exceeded the budget.
Reviewed By: CarolineConcatto, ctetreau
Differential Revision: https://reviews.llvm.org/D92238
This commit fixes how metadata is handled in CloneModule to be sound,
and improves how it's handled in CloneFunctionInto (although the latter
is still awkward when called within a module).
Ruiling Song pointed out in PR48841 that CloneModule was changed to
unsoundly use the RF_ReuseAndMutateDistinctMDs flag (renamed in
fa35c1f80f for clarity). This flag papered
over a crash caused by other various changes made to CloneFunctionInto
over the past few years that made it unsound to use cloning between
different modules.
(This commit partially addresses PR48841, fixing the repro from
preprocessed source but not textual IR. MDNodeMapper::mapDistinctNode
became unsound in df763188c9 and this
commit does not address that regression.)
RF_ReuseAndMutateDistinctMDs is designed for the IRMover to use,
avoiding unnecessary clones of all referenced metadata when linking
between modules (with IRMover, the source module is discarded after
linking). It never makes sense to use when you're not discarding the
source. This commit drops its incorrect use in CloneModule.
Sadly, the right thing to do with metadata when cloning a function is
complicated, and this patch doesn't totally fix it.
The first problem is that there are two different types of referenceable
metadata and it's not obvious what to with one of them when remapping.
- `!0 = !{!1}` is metadata's version of a constant. Programatically it's
called "uniqued" (probably a better term would be "constant") because,
like `ConstantArray`, it's stored in uniquing tables. Once it's
constructed, it's illegal to change its arguments.
- `!0 = distinct !{!1}` is a bit closer to a global variable. It's legal
to change the operands after construction.
What should be done with distinct metadata when cloning functions within
the same module?
- Should new, cloned nodes be created?
- Should all references point to the same, old nodes?
The answer depends on whether that metadata is effectively owned by a
function.
And that's the second problem. Referenceable metadata's ownership model
is not clear or explicit. Technically, it's all stored on an
LLVMContext. However, any metadata that is `distinct`, that transitively
references a `distinct` node, or that transitively references a
GlobalValue is specific to a Module and is effectively owned by it. More
specifically, some metadata is effectively owned by a specific Function
within a module.
Effectively function-local metadata was introduced somewhere around
c10d0e5ccd, which made it illegal for two
functions to share a DISubprogram attachment.
When cloning a function within a module, you need to clone the
function-local debug info and suppress cloning of global debug info (the
status quo suppresses cloning some global debug info but not all). When
cloning a function to a new/different module, you need to clone all of
the debug info.
Here's what I think we should do (eventually? soon? not this patch
though):
- Distinguish explicitly (somehow) between pure constant metadata owned
by the LLVMContext, global metadata owned by the Module, and local
metadata owned by a GlobalValue (such as a function).
- Update CloneFunctionInto to trigger cloning of all "local" metadata
(only), perhaps by adding a bit to RemapFlag. Alternatively, split
out a separate function CloneFunctionMetadataInto to prime the
metadata map that callers are updated to call ahead of time as
appropriate.
Here's the somewhat more isolated fix in this patch:
- Converted the `ModuleLevelChanges` parameter to `CloneFunctionInto` to
an enum called `CloneFunctionChangeType` that is one of
LocalChangesOnly, GlobalChanges, DifferentModule, and ClonedModule.
- The code maintaining the "functions uniquely own subprograms"
invariant is now only active in the first two cases, where a function
is being cloned within a single module. That's necessary because this
code inhibits cloning of (some) "global" metadata that's effectively
owned by the module.
- The code maintaining the "all compile units must be explicitly
referenced by !llvm.dbg.cu" invariant is now only active in the
DifferentModule case, where a function is being cloned into a new
module in isolation.
- CoroSplit.cpp's call to CloneFunctionInto in CoroCloner::create
uses LocalChangeOnly, since fa635d730f
only set `ModuleLevelChanges` to trigger cloning of local metadata.
- CloneModule drops its unsound use of RF_ReuseAndMutateDistinctMDs
and special handling of !llvm.dbg.cu.
- Fixed some outdated header docs and left a couple of FIXMEs.
Differential Revision: https://reviews.llvm.org/D96531
Instcombine will convert the nonnull and alignment assumption that use the boolean condtion
to an assumption that uses the operand bundles when knowledge retention is enabled.
Differential Revision: https://reviews.llvm.org/D82703
Perform DSOLocal propagation within summary list of every GV. This
avoids the repeated query of this information during function
importing.
Differential Revision: https://reviews.llvm.org/D96398
explicitly emitting retainRV or claimRV calls in the IR
Background:
This fixes a longstanding problem where llvm breaks ARC's autorelease
optimization (see the link below) by separating calls from the marker
instructions or retainRV/claimRV calls. The backend changes are in
https://reviews.llvm.org/D92569.
https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue
What this patch does to fix the problem:
- The front-end adds operand bundle "clang.arc.attachedcall" to calls,
which indicates the call is implicitly followed by a marker
instruction and an implicit retainRV/claimRV call that consumes the
call result. In addition, it emits a call to
@llvm.objc.clang.arc.noop.use, which consumes the call result, to
prevent the middle-end passes from changing the return type of the
called function. This is currently done only when the target is arm64
and the optimization level is higher than -O0.
- ARC optimizer temporarily emits retainRV/claimRV calls after the calls
with the operand bundle in the IR and removes the inserted calls after
processing the function.
- ARC contract pass emits retainRV/claimRV calls after the call with the
operand bundle. It doesn't remove the operand bundle on the call since
the backend needs it to emit the marker instruction. The retainRV and
claimRV calls are emitted late in the pipeline to prevent optimization
passes from transforming the IR in a way that makes it harder for the
ARC middle-end passes to figure out the def-use relationship between
the call and the retainRV/claimRV calls (which is the cause of
PR31925).
- The function inliner removes an autoreleaseRV call in the callee if
nothing in the callee prevents it from being paired up with the
retainRV/claimRV call in the caller. It then inserts a release call if
claimRV is attached to the call since autoreleaseRV+claimRV is
equivalent to a release. If it cannot find an autoreleaseRV call, it
tries to transfer the operand bundle to a function call in the callee.
This is important since the ARC optimizer can remove the autoreleaseRV
returning the callee result, which makes it impossible to pair it up
with the retainRV/claimRV call in the caller. If that fails, it simply
emits a retain call in the IR if retainRV is attached to the call and
does nothing if claimRV is attached to it.
- SCCP refrains from replacing the return value of a call with a
constant value if the call has the operand bundle. This ensures the
call always has at least one user (the call to
@llvm.objc.clang.arc.noop.use).
- This patch also fixes a bug in replaceUsesOfNonProtoConstant where
multiple operand bundles of the same kind were being added to a call.
Future work:
- Use the operand bundle on x86-64.
- Fix the auto upgrader to convert call+retainRV/claimRV pairs into
calls with the operand bundles.
rdar://71443534
Differential Revision: https://reviews.llvm.org/D92808
The vector reduction intrinsics started life as experimental ops, so backend support
was lacking. As part of promoting them to 1st-class intrinsics, however, codegen
support was added/improved:
D58015
D90247
So I think it is safe to now remove this complication from IR.
Note that we still have an IR-level codegen expansion pass for these as discussed
in D95690. Removing that is another step in simplifying the logic. Also note that
x86 was already unconditionally forming reductions in IR, so there should be no
difference for x86.
I spot checked a couple of the tests here by running them through opt+llc and did
not see any asm diffs.
If we do find functional differences for other targets, it should be possible
to (at least temporarily) restore the shuffle IR with the ExpandReductions IR
pass.
Differential Revision: https://reviews.llvm.org/D96552
This patch changes the VecDesc struct to use ElementCount
instead of an unsigned VF value, in preparation for
future work that adds support for vectorized versions of
math functions using scalable vectors. Since all I'm doing
in this patch is switching the type I believe it's a
non-functional change. I changed getWidestVF to now return
both the widest fixed-width and scalable VF values, but
currently the widest scalable value will be zero.
Differential Revision: https://reviews.llvm.org/D96011
This reverts commit b7d870eae7 and the
subsequent fix "[Polly] Fix build after AssumptionCache change (D96168)"
(commit e6810cab09).
It caused indeterminism in the output, such that e.g. the
polly-x86_64-linux buildbot failed accasionally.
Rename the `RF_MoveDistinctMDs` flag passed into `MapValue` and
`MapMetadata` to `RF_ReuseAndMutateDistinctMDs` in order to more
precisely describe its effect and clarify the header documentation.
Found this while helping to investigate PR48841, which pointed out an
unsound use of the flag in `CloneModule()`. For now I've just added a
FIXME there, but I'm hopeful that the new (more precise) name will
prevent other similar errors.
Instcombine will convert the nonnull and alignment assumption that use the boolean condtion
to an assumption that uses the operand bundles when knowledge retention is enabled.
Differential Revision: https://reviews.llvm.org/D82703
PR49043 exposed a problem when it comes to RAUW llvm.assumes. While
D96106 would fix it for GVNSink, it seems a more general concern. To
avoid future problems this patch moves away from the vector of weak
reference model used in the assumption cache. Instead, we track the
llvm.assume calls with a callback handle which will remove itself from
the cache if the call is deleted.
Fixes PR49043.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D96168
Summary:
This resolves an issue posted on Bugzilla. https://bugs.llvm.org/show_bug.cgi?id=48764
In this issue, the loop had multiple exit blocks, which resulted in the
function getExitBlock to return a nullptr, which resulted in hitting the assert.
This patch ensures that loops which only have one exit block as allowed to be
unrolled and jammed.
Reviewed By: Whitney, Meinersbur, dmgreen
Differential Revision: https://reviews.llvm.org/D95806
emitting retainRV or claimRV calls in the IR
This reapplies 3fe3946d9a without the
changes made to lib/IR/AutoUpgrade.cpp, which was violating layering.
Original commit message:
Background:
This patch makes changes to the front-end and middle-end that are
needed to fix a longstanding problem where llvm breaks ARC's autorelease
optimization (see the link below) by separating calls from the marker
instructions or retainRV/claimRV calls. The backend changes are in
https://reviews.llvm.org/D92569.
https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue
What this patch does to fix the problem:
- The front-end adds operand bundle "clang.arc.rv" to calls, which
indicates the call is implicitly followed by a marker instruction and
an implicit retainRV/claimRV call that consumes the call result. In
addition, it emits a call to @llvm.objc.clang.arc.noop.use, which
consumes the call result, to prevent the middle-end passes from changing
the return type of the called function. This is currently done only when
the target is arm64 and the optimization level is higher than -O0.
- ARC optimizer temporarily emits retainRV/claimRV calls after the calls
with the operand bundle in the IR and removes the inserted calls after
processing the function.
- ARC contract pass emits retainRV/claimRV calls after the call with the
operand bundle. It doesn't remove the operand bundle on the call since
the backend needs it to emit the marker instruction. The retainRV and
claimRV calls are emitted late in the pipeline to prevent optimization
passes from transforming the IR in a way that makes it harder for the
ARC middle-end passes to figure out the def-use relationship between
the call and the retainRV/claimRV calls (which is the cause of
PR31925).
- The function inliner removes an autoreleaseRV call in the callee if
nothing in the callee prevents it from being paired up with the
retainRV/claimRV call in the caller. It then inserts a release call if
the call is annotated with claimRV since autoreleaseRV+claimRV is
equivalent to a release. If it cannot find an autoreleaseRV call, it
tries to transfer the operand bundle to a function call in the callee.
This is important since ARC optimizer can remove the autoreleaseRV
returning the callee result, which makes it impossible to pair it up
with the retainRV/claimRV call in the caller. If that fails, it simply
emits a retain call in the IR if the implicit call is a call to
retainRV and does nothing if it's a call to claimRV.
Future work:
- Use the operand bundle on x86-64.
- Fix the auto upgrader to convert call+retainRV/claimRV pairs into
calls annotated with the operand bundles.
rdar://71443534
Differential Revision: https://reviews.llvm.org/D92808
emitting retainRV or claimRV calls in the IR
Background:
This patch makes changes to the front-end and middle-end that are
needed to fix a longstanding problem where llvm breaks ARC's autorelease
optimization (see the link below) by separating calls from the marker
instructions or retainRV/claimRV calls. The backend changes are in
https://reviews.llvm.org/D92569.
https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue
What this patch does to fix the problem:
- The front-end adds operand bundle "clang.arc.rv" to calls, which
indicates the call is implicitly followed by a marker instruction and
an implicit retainRV/claimRV call that consumes the call result. In
addition, it emits a call to @llvm.objc.clang.arc.noop.use, which
consumes the call result, to prevent the middle-end passes from changing
the return type of the called function. This is currently done only when
the target is arm64 and the optimization level is higher than -O0.
- ARC optimizer temporarily emits retainRV/claimRV calls after the calls
with the operand bundle in the IR and removes the inserted calls after
processing the function.
- ARC contract pass emits retainRV/claimRV calls after the call with the
operand bundle. It doesn't remove the operand bundle on the call since
the backend needs it to emit the marker instruction. The retainRV and
claimRV calls are emitted late in the pipeline to prevent optimization
passes from transforming the IR in a way that makes it harder for the
ARC middle-end passes to figure out the def-use relationship between
the call and the retainRV/claimRV calls (which is the cause of
PR31925).
- The function inliner removes an autoreleaseRV call in the callee if
nothing in the callee prevents it from being paired up with the
retainRV/claimRV call in the caller. It then inserts a release call if
the call is annotated with claimRV since autoreleaseRV+claimRV is
equivalent to a release. If it cannot find an autoreleaseRV call, it
tries to transfer the operand bundle to a function call in the callee.
This is important since ARC optimizer can remove the autoreleaseRV
returning the callee result, which makes it impossible to pair it up
with the retainRV/claimRV call in the caller. If that fails, it simply
emits a retain call in the IR if the implicit call is a call to
retainRV and does nothing if it's a call to claimRV.
Future work:
- Use the operand bundle on x86-64.
- Fix the auto upgrader to convert call+retainRV/claimRV pairs into
calls annotated with the operand bundles.
rdar://71443534
Differential Revision: https://reviews.llvm.org/D92808