Add methods to appropriately extend KnownBits/ConstantRange there,
same as with APInt. Also clean up the known bits handling by
actually doing that extension rather than checking ZExtBits. This
doesn't matter now, but becomes relevant once truncation is
involved.
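As a rough illustration (a sketch, not the actual BasicAA change), such helpers could extend all three representations in lockstep using the existing APInt/KnownBits/ConstantRange APIs:

```cpp
#include "llvm/ADT/APInt.h"
#include "llvm/IR/ConstantRange.h"
#include "llvm/Support/KnownBits.h"
using namespace llvm;

// Extend an offset together with its KnownBits/ConstantRange to a wider
// width, mirroring the APInt extension instead of tracking a ZExtBits flag.
static void extendTo(unsigned BitWidth, APInt &Offset, KnownBits &Known,
                     ConstantRange &Range, bool IsSigned) {
  Offset = IsSigned ? Offset.sext(BitWidth) : Offset.zext(BitWidth);
  Known = IsSigned ? Known.sext(BitWidth) : Known.zext(BitWidth);
  Range = IsSigned ? Range.signExtend(BitWidth) : Range.zeroExtend(BitWidth);
}
```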
The information can be implicit (from `ValueTracking`) or explicit.
This implements the backend part of the following RFC:
https://groups.google.com/g/llvm-dev/c/T9o51zB1JY
We still need to settle on how to best represent the information in the
IR, but this is a separate discussion.
Differential Revision: https://reviews.llvm.org/D109746
Rather than separately handling subtraction of offset and variable
indices, make this one operation. Also rewrite the implementation
to use range-based for loops.
There are two sets of new/delete functions, one with Windows/MSVC
mangling and one with Itanium mangling. Mark one set or the other
as unavailable depending on the target.
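A minimal sketch of that kind of target-dependent marking (the LibFunc entries shown are only a sample of the full new/delete set):

```cpp
#include "llvm/ADT/Triple.h"
#include "llvm/Analysis/TargetLibraryInfo.h"
using namespace llvm;

// Keep only the new/delete set matching the target's mangling scheme.
static void markNewDeleteAvailability(TargetLibraryInfoImpl &TLI,
                                      const Triple &T) {
  if (T.isWindowsMSVCEnvironment()) {
    TLI.setUnavailable(LibFunc_Znwm);  // Itanium operator new(unsigned long)
    TLI.setUnavailable(LibFunc_ZdlPv); // Itanium operator delete(void*)
  } else {
    TLI.setUnavailable(LibFunc_msvc_new_longlong); // ??2@YAPEAX_K@Z
    TLI.setUnavailable(LibFunc_msvc_delete_ptr64); // ??3@YAXPEAX@Z
  }
}
```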
Split the test malloc-free-delete.ll into three parts: malloc-free.ll
for the C API tests, and new-delete-itanium.ll and new-delete-msvc.ll for
the target-specific new/delete tests.
Differential Revision: https://reviews.llvm.org/D110419
I was looking at some missed optimizations in CHERI-enabled targets and
noticed that we weren't removing vtable indirection for calls via known
pointers-to-members. The underlying reason for this is that we represent
pointers-to-function-members as {i8 addrspace(200)*, i64} and generate the
constant offsets using (gep i8 null, <index>). We use a constant GEP here
since inttoptr should be avoided for CHERI capabilities. The pointer-to-member
call uses ptrtoint to extract the index, and due to this missing fold we can't
infer the actual value loaded from the vtable.
This is the initial constant folding change for this pattern; I will add
an InstCombine fold as a follow-up.
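Roughly, the fold looks like this (a hypothetical standalone helper; the actual change lives in ConstantFolding):

```cpp
#include "llvm/ADT/APInt.h"
#include "llvm/IR/Constants.h"
#include "llvm/IR/DataLayout.h"
#include "llvm/IR/Operator.h"
using namespace llvm;

// Fold ptrtoint(gep(i8, null, C)) -> C: a constant GEP on a null base
// encodes the byte offset itself, so the ptrtoint just extracts it.
static Constant *foldPtrToIntOfNullGEP(Constant *PtrOp, Type *IntTy,
                                       const DataLayout &DL) {
  auto *GEP = dyn_cast<GEPOperator>(PtrOp);
  if (!GEP || !isa<ConstantPointerNull>(GEP->getPointerOperand()))
    return nullptr;
  APInt Offset(DL.getIndexTypeSizeInBits(GEP->getType()), 0);
  if (!GEP->accumulateConstantOffset(DL, Offset))
    return nullptr;
  return ConstantInt::get(IntTy->getContext(),
                          Offset.sextOrTrunc(IntTy->getIntegerBitWidth()));
}
```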
We could fold all inbounds GEPs on null (and therefore the ptrtoint to
zero), since zero is the only valid offset for an inbounds GEP. If the
offset is not zero, that GEP is poison and therefore returning 0 is valid
(https://alive2.llvm.org/ce/z/Gzb5iH). However, Clang currently generates
inbounds GEPs on NULL for hand-written offsetof() expressions, so this
could lead to miscompilations.
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D110245
When using a datalayout that has pointer index width != pointer size, this
code triggers an assertion in Value::stripAndAccumulateConstantOffsets().
I encountered this while compiling FreeBSD for CHERI-RISC-V.
Also update LoadsTest.cpp to use a DataLayout with index width != pointer
width to ensure this case is tested.
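For reference, a minimal sketch of such a layout (the string follows the documented p:&lt;size&gt;:&lt;abi&gt;:&lt;pref&gt;:&lt;idx&gt; form; the actual test contents may differ):

```cpp
#include "llvm/IR/DataLayout.h"
#include <cassert>
using namespace llvm;

void indexWidthExample() {
  // 64-bit pointers whose index (offset) width is only 32 bits, similar to
  // some CHERI configurations.
  DataLayout DL("e-p:64:64:64:32");
  assert(DL.getPointerSizeInBits(0) == 64);
  assert(DL.getIndexSizeInBits(0) == 32);
}
```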
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D110406
When TargetLibraryInfoImpl::isValidProtoForLibFunc is checking
function signatures to detect lib calls it may check that a parameter
or return value matches the "size_t" type. For this to work it
has to derive the IR type matching "size_t". Depending on whether
a DataLayout is provided, this has been done in two different
ways: either a stricter check based on IntPtrType (which is
given by the DataLayout), or a more relaxed check assuming that any
integer type matches "size_t".
Given that the stricter approach exists, it seems we do not want
to trigger rewrites etc. if we aren't sure that a function call
actually matches the library function. This raised the question of
why we had the more relaxed approach at all when unable to derive
an IR type for "size_t". This patch takes the more defensive
approach, requiring that a DataLayout is passed to
isValidProtoForLibFunc.
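The stricter check amounts to something like this (helper name hypothetical):

```cpp
#include "llvm/IR/DataLayout.h"
#include "llvm/IR/Type.h"
using namespace llvm;

// With a DataLayout in hand, "size_t" can be identified precisely as the
// pointer-sized integer type instead of accepting any integer type.
static bool isSizeTTy(Type *Ty, const DataLayout &DL) {
  return Ty == DL.getIntPtrType(Ty->getContext());
}
```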
Differential Revision: https://reviews.llvm.org/D110584
This reverts commit 8fdac7cb7a.
The issue causing the revert has been fixed a while ago in 60b852092c.
Original message:
Now that SCEVExpander can preserve LCSSA form,
we do not have to worry about LCSSA form when
trying to look through PHIs. SCEVExpander will take
care of inserting LCSSA PHI nodes as required.
This increases precision of the analysis in some cases.
Reviewed By: mkazantsev, bmahjour
Differential Revision: https://reviews.llvm.org/D71539
The thin link provides an opportunity to propagate function attributes across modules, enabling additional propagation opportunities.
This change propagates (currently off by default; turn on with `disable-thinlto-funcattrs=0`) noRecurse and noUnwind based on the function summaries of the prevailing functions in bottom-up call-graph order. Testing on a clang self-build:
1. There's a 35-40% increase in noUnwind functions due to the additional propagation opportunities.
2. Thin link time increases by 10-15%, and the thin link itself is about 1.5% of end-to-end link time.
Implementation-wise this adds the following summary function attributes:
1. noUnwind: the function is noUnwind
2. mayThrow: the function contains a non-call instruction that `Instruction::mayThrow` returns true on (e.g. Windows SEH instructions)
3. hasUnknownCall: the function contains calls that don't make it into the summary call-graph and thus should not be propagated from (e.g. indirect calls for now; no-opt functions could be added as well)
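These map onto new bits in the summary flags, roughly as follows (an illustrative subset; see FunctionSummary::FFlags for the real definition):

```cpp
// Illustrative subset of the per-function summary flags.
struct FFlags {
  unsigned ReadNone : 1;
  unsigned ReadOnly : 1;
  unsigned NoRecurse : 1;
  unsigned ReturnDoesNotAlias : 1;
  // Added by this change:
  unsigned NoUnwind : 1;       // Function is known not to unwind.
  unsigned MayThrow : 1;       // Has a non-call instruction that may throw.
  unsigned HasUnknownCall : 1; // Has calls outside the summary call graph.
};
```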
Testing:
A clang self-build passes, and the 2nd-stage build passes check-all.
ninja check-all passes with the newly added tests.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D36850
This is a followup to D109844 (and alternative to D109907), which
integrates the new "earliest escape" tracking into AliasAnalysis.
This is done by replacing the pre-existing context-free capture
cache in AAQueryInfo with a replaceable (virtual) object with two
implementations: SimpleCaptureInfo implements the previous
behavior (checking whether the object is captured at all), while
EarliestEscapeInfo implements the new behavior from DSE.
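Roughly, the replaceable object looks like this (a sketch following the description above; details may differ from the committed code):

```cpp
#include "llvm/IR/Instruction.h"
#include "llvm/IR/Value.h"
using namespace llvm;

// Virtual interface for capture queries used by AAQueryInfo.
struct CaptureInfo {
  virtual ~CaptureInfo() = default;
  // Returns true if Object is known not to be captured before or at I.
  virtual bool isNotCapturedBeforeOrAt(const Value *Object,
                                       const Instruction *I) = 0;
};

// Context-free behavior: is the object captured anywhere at all?
struct SimpleCaptureInfo final : CaptureInfo {
  bool isNotCapturedBeforeOrAt(const Value *Object,
                               const Instruction *I) override;
};

// Context-sensitive behavior from DSE: does the earliest escape of the
// object come only after I (in dominance terms)?
struct EarliestEscapeInfo final : CaptureInfo {
  bool isNotCapturedBeforeOrAt(const Value *Object,
                               const Instruction *I) override;
};
```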
This combines the "earliest escape" analysis with the full power of
BasicAA: It subsumes the call handling from D109907, considers a
wider range of escape sources, and works with AA recursion. The
compile-time cost is slightly higher than with D109907.
Differential Revision: https://reviews.llvm.org/D110368
The case of an Argument and an identified function local is already
handled earlier, because we don't care about captures in that case.
As such, we don't need to additionally consider the combination of
an Argument with a non-escaping identified function local.
This ensures that isEscapeSource() only returns true for
instructions, which is necessary for D110368.
Other <math>_finite calls are marked as unavailable except on GNU/Linux;
it looks like the sqrt set was just overlooked.
Differential Revision: https://reviews.llvm.org/D110418
This reverts the revert commit df56fc6ebb.
This version of the patch adjusts the location where the EarliestEscapes
cache is cleared when an instruction gets removed. The earliest escaping
instruction does not have to be a memory instruction.
It could be a ptrtoint instruction like in the added test
@earliest_escape_ptrtoint, which subsequently gets removed. We need to
invalidate the EarliestEscape entry referring to the ptrtoint when
deleting it.
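The invalidation is essentially the following (member names assumed for the sketch):

```cpp
#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/TinyPtrVector.h"
#include "llvm/IR/Instruction.h"
using namespace llvm;

struct EarliestEscapeCache {
  DenseMap<const Value *, Instruction *> EarliestEscapes; // object -> escape
  DenseMap<Instruction *, TinyPtrVector<const Value *>> Inst2Obj; // inverse

  // Called for *any* erased instruction (e.g. a ptrtoint), not just memory
  // instructions: drop entries that name it as the earliest escape.
  void removeInstruction(Instruction *I) {
    auto Iter = Inst2Obj.find(I);
    if (Iter == Inst2Obj.end())
      return;
    for (const Value *Obj : Iter->second)
      EarliestEscapes.erase(Obj);
    Inst2Obj.erase(I);
  }
};
```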
This fixes the crash mentioned in
https://bugs.chromium.org/p/chromium/issues/detail?id=1252762#c6
There are several places in the code that are currently broken as
they assume an Instruction always has a parent Function when
attempting to get the vscale_range attribute. This patch adds checks
that an Instruction has a parent.
I've added a test for a parentless @llvm.vscale intrinsic call here:
unittests/Analysis/ValueTrackingTest.cpp
Differential Revision: https://reviews.llvm.org/D110158
At the moment, DSE only considers whether a pointer may be captured at
all in a function. This leads to cases where we fail to remove stores to
local objects because we do not check if they escape before potential
read-clobbers or after.
Doing context-sensitive escape queries in isReadClobber was removed
a while ago in d1a1cce5b1 to save compile-time. See PR50220 for more
context.
This patch introduces a new capture tracker, which keeps track of the
'earliest' capture. An instruction A is considered earlier than instruction
B if A dominates B. If two escapes do not dominate each other, the
terminator of their common dominator is chosen. If not all uses can be
analyzed, the earliest escape is set to the first instruction in the
function entry block.
If the query instruction dominates the earliest escape and is not in a
cycle, then the pointer does not escape before the query instruction.
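A minimal sketch of such a tracker (assumed names, not the committed implementation):

```cpp
#include "llvm/Analysis/CaptureTracking.h"
#include "llvm/IR/Dominators.h"
#include "llvm/IR/Function.h"
using namespace llvm;

struct EarliestCaptureTracker : public CaptureTracker {
  const DominatorTree &DT;
  Function &F;
  Instruction *EarliestCapture = nullptr;

  EarliestCaptureTracker(const DominatorTree &DT, Function &F)
      : DT(DT), F(F) {}

  void tooManyUses() override {
    // Not all uses could be analyzed: conservatively treat the first
    // instruction of the entry block as the earliest escape.
    EarliestCapture = &*F.getEntryBlock().begin();
  }

  bool captured(const Use *U) override {
    auto *I = cast<Instruction>(U->getUser());
    if (!EarliestCapture || DT.dominates(I, EarliestCapture))
      EarliestCapture = I; // Strictly earlier capture found.
    else if (!DT.dominates(EarliestCapture, I))
      // Neither capture dominates the other: fall back to the terminator
      // of their common dominator.
      EarliestCapture =
          DT.findNearestCommonDominator(EarliestCapture->getParent(),
                                        I->getParent())->getTerminator();
    return false; // Keep scanning; we want the earliest capture, not any.
  }
};
```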
This patch uses this information when checking if a load of a loaded
underlying object may alias a write to a stack object. If the stack
object does not escape before the load, they do not alias.
I will share a follow-up patch to also use the information for call
instructions to fix PR50220.
In terms of compile-time, the impact is low in general,
NewPM-O3: +0.05%
NewPM-ReleaseThinLTO: +0.05%
NewPM-ReleaseLTO-g: +0.03%
with the largest change being tramp3d-v4 (+0.30%)
http://llvm-compile-time-tracker.com/compare.php?from=1a3b3301d7aa9ab25a8bdf045c77298b087e3930&to=bc6c6899cae757c3480f4ad4874a76fc1eafb0be&stat=instructions
Compared to always computing the capture information on demand, we get
the following benefits from the caching:
NewPM-O3: -0.03%
NewPM-ReleaseThinLTO: -0.08%
NewPM-ReleaseLTO-g: -0.04%
The biggest speedup is tramp3d-v4 (-0.21%).
http://llvm-compile-time-tracker.com/compare.php?from=0b0c99177d1511469c633282ef67f20c851f58b1&to=bc6c6899cae757c3480f4ad4874a76fc1eafb0be&stat=instructions
Overall there is a small but noticeable benefit from caching. I am not
entirely sure whether the speedups warrant the extra complexity of caching.
The way the caching works also means that we might miss a few cases, as
it is less precise. Also, there may be a better way to cache things.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D109844
The return type of strlen is size_t, not just any integer.
This is a partial fix for an example based on:
https://llvm.org/PR50836
There's another bug here: we can still crash while
processing a real strlen, or something that looks like one.
This comment references behavior that was removed in
ccae43a247, which is a commit from 5 years
ago. It seems safe to assume that that behavior won't be coming back
soon. If it does, we can readd this part of the comment :)
isValidAssumeForContext can provide better results with access to the
dominator tree in some cases. This patch adjusts computeConstantRange to
allow passing through a dominator tree.
Its use in VectorCombine is updated to pass through the DT, enabling
additional scalarization.
Note that similar APIs like computeKnownBits already accept optional dominator
tree arguments.
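For example (argument order illustrative; see ValueTracking.h for the exact signature):

```cpp
#include "llvm/Analysis/AssumptionCache.h"
#include "llvm/Analysis/ValueTracking.h"
#include "llvm/IR/Dominators.h"
using namespace llvm;

// Passing the dominator tree lets isValidAssumeForContext accept more
// assumes, which can tighten the computed range.
ConstantRange rangeWithDT(const Value *V, AssumptionCache &AC,
                          const Instruction *CtxI, const DominatorTree &DT) {
  return computeConstantRange(V, /*UseInstrInfo=*/true, &AC, CtxI, &DT);
}
```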
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D110175
The logic in howManyLessThans is fishy. It first checks invariance of
RHS, and then uses OrigRHS as the argument for isLoopEntryGuardedByCond, which
is, strictly speaking, a different thing. We are seeing a very rare intermittent
failure of availability checks, and it looks like this precondition is
sometimes broken. Until we can figure out what's going on, add asserts
that all involved values that may possibly be passed to isLoopEntryGuardedByCond
are available at loop entry.
If either of these asserts fails (OrigRHS is the most likely suspect), it
means that the logic here is flawed.
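The asserts amount to something like this (wrapped in a standalone helper for illustration; the real checks sit inside howManyLessThans):

```cpp
#include "llvm/Analysis/ScalarEvolution.h"
#include <cassert>
using namespace llvm;

static void assertAvailableAtLoopEntry(ScalarEvolution &SE, const SCEV *S,
                                       const Loop *L) {
  // Values handed to isLoopEntryGuardedByCond must be computable at the
  // loop entry, otherwise the guard check itself is meaningless.
  assert(SE.isAvailableAtLoopEntry(S, L) &&
         "Value is expected to be available at loop entry!");
  (void)SE; (void)S; (void)L; // No-op in release builds.
}
```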
The implication logic for two values that are both negative or both non-negative
says that it doesn't matter whether their predicate is signed or unsigned,
but it only flips unsigned into signed for further inference. This patch adds
support for flipping a signed predicate into unsigned as well.
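A sketch of the mapping (standalone; the in-tree code works on predicate pairs inside SCEV):

```cpp
#include "llvm/IR/InstrTypes.h"
using namespace llvm;

// Map a signed predicate to its unsigned counterpart; this is only valid
// when both operands are known to have the same sign (both negative or
// both non-negative).
static CmpInst::Predicate flipToUnsigned(CmpInst::Predicate P) {
  switch (P) {
  case CmpInst::ICMP_SLT: return CmpInst::ICMP_ULT;
  case CmpInst::ICMP_SLE: return CmpInst::ICMP_ULE;
  case CmpInst::ICMP_SGT: return CmpInst::ICMP_UGT;
  case CmpInst::ICMP_SGE: return CmpInst::ICMP_UGE;
  default: return P; // Equality and unsigned predicates are unchanged.
  }
}
```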
Differential Revision: https://reviews.llvm.org/D109959
Reviewed By: nikic
We implement logic to convert a byte offset into a sequence of GEP
indices for that offset in a number of places. This patch adds a
DataLayout::getGEPIndicesForOffset() method, which implements the
core logic. I've updated SROA, ConstantFolding and InstCombine to
use it, and there are a few more places where it looks relevant.
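Example usage (types and values illustrative):

```cpp
#include "llvm/ADT/APInt.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/IR/DataLayout.h"
#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/LLVMContext.h"
using namespace llvm;

void gepIndicesExample(const DataLayout &DL, LLVMContext &Ctx) {
  // For %T = { i32, [4 x i32] }, ask for indices reaching byte offset 8.
  Type *I32 = Type::getInt32Ty(Ctx);
  Type *Elts[] = {I32, ArrayType::get(I32, 4)};
  Type *Ty = StructType::get(Ctx, Elts);
  APInt Offset(64, 8);
  SmallVector<APInt> Indices = DL.getGEPIndicesForOffset(Ty, Offset);
  // With natural layout this yields [0, 1, 1] (the array element at byte 8);
  // Ty is updated to the result element type (i32) and Offset to the
  // residual offset (0 here).
  (void)Indices;
}
```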
Differential Revision: https://reviews.llvm.org/D110043
In ValueTracking.cpp we use a function called
computeKnownBitsFromOperator to determine the known bits of a value.
For the vscale intrinsic, if the function has the vscale_range
attribute we can use the minimum and maximum values of vscale to
determine some known zero and one bits. This should help to improve
code quality by allowing certain optimisations to take place.
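A sketch of the idea (hypothetical standalone helper; the real logic lives in computeKnownBitsFromOperator):

```cpp
#include "llvm/ADT/APInt.h"
#include "llvm/Support/KnownBits.h"
#include "llvm/Support/MathExtras.h"
using namespace llvm;

// Given vscale_range(Min, Max) with 1 <= Min <= Max < 2^BitWidth: all bits
// above the highest bit that can be set in Max are known zero, and if
// Min == Max then vscale is a known constant.
static KnownBits vscaleKnownBits(unsigned BitWidth, uint64_t Min,
                                 uint64_t Max) {
  KnownBits Known(BitWidth);
  if (Min == Max) {
    APInt C(BitWidth, Min);
    Known.One = C;
    Known.Zero = ~C;
  } else {
    Known.Zero.setBitsFrom(Log2_64(Max) + 1);
  }
  return Known;
}
```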
Tests added here:
Transforms/InstCombine/icmp-vscale.ll
Differential Revision: https://reviews.llvm.org/D109883
isPotentiallyReachable can use LoopInfo to return earlier. This patch
allows passing an optional LI to PointerMayBeCapturedBefore. Used in
D109844.
Reviewed By: nikic, asbirlea
Differential Revision: https://reviews.llvm.org/D109978
There is a piece of logic that uses the fact that signed and unsigned
versions of the same predicate are equivalent when both values are
non-negative. It's also true when both of them are negative.
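A quick standalone check of that fact:

```cpp
#include "llvm/ADT/APInt.h"
#include <cassert>
using namespace llvm;

int main() {
  // Both negative: signed and unsigned comparisons agree.
  APInt A(8, -10, /*isSigned=*/true); // 0xF6
  APInt B(8, -3, /*isSigned=*/true);  // 0xFD
  assert(A.slt(B) == A.ult(B));
  // Both non-negative: they agree as well.
  APInt C(8, 10), D(8, 100);
  assert(C.slt(D) == C.ult(D));
  return 0;
}
```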
Differential Revision: https://reviews.llvm.org/D109957
Reviewed By: nikic
Nobody has complained about this, and the documentation for
LLVMContext::yield() states that LLVM is allowed to never call it.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D110008
A couple of tweaks to:
1. Allow more ThinLTO importing by excluding probe intrinsics from the IR size in the module summary.
2. Allow general default attributes (nofree nosync nounwind) for the pseudo probe intrinsic. Without those attributes, pseudo probes will basically be treated as unknown calls, which will in turn block their containing functions from being annotated with those attributes.
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D109976
getMetadata() currently uses a weird API where it populates a
structure passed to it, and optionally merges into it. Instead,
we can return the AAMDNodes and provide a separate merge() API.
This makes usages more compact.
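The new usage pattern looks roughly like this (sketch based on the description above):

```cpp
#include "llvm/IR/Instruction.h"
#include "llvm/IR/Metadata.h"
using namespace llvm;

// Instead of populating a caller-provided AAMDNodes (optionally merging
// in place), fetch both sets and merge them explicitly.
static void copyMergedAAMetadata(Instruction &A, Instruction &B,
                                 Instruction &Dest) {
  AAMDNodes AATags = A.getAAMetadata().merge(B.getAAMetadata());
  Dest.setAAMetadata(AATags);
}
```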
Differential Revision: https://reviews.llvm.org/D109852