Clang-format InstructionSimplify and convert all "FunctionName"s to
"functionName". This patch does touch a lot of files but gets done with
the cleanup of InstructionSimplify in one commit.
This is the alternative to the less invasive clang-format only patch: D126783
Reviewed By: spatel, rengolin
Differential Revision: https://reviews.llvm.org/D126889
When shifting by a byte-multiple:
bswap (shl X, Y) --> lshr (bswap X), Y
bswap (lshr X, Y) --> shl (bswap X), Y
This was limited to constants as a first step in D122010 / 60820e53ec,
but issue #55327 shows a source example (and there's a test based on that here)
where a variable shift amount is used in this pattern.
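For example (a minimal IR sketch, assuming the shift amount is known to be a
byte multiple because it is built as n*8; the function names are hypothetical):

  declare i32 @llvm.bswap.i32(i32)

  define i32 @src(i32 %x, i32 %n) {
    %y = shl i32 %n, 3                 ; shift amount = 8 * %n, a byte multiple
    %s = shl i32 %x, %y
    %b = call i32 @llvm.bswap.i32(i32 %s)
    ret i32 %b
  }

  ; can be rewritten as:
  define i32 @tgt(i32 %x, i32 %n) {
    %y = shl i32 %n, 3
    %b = call i32 @llvm.bswap.i32(i32 %x)
    %s = lshr i32 %b, %y
    ret i32 %s
  }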
If a constrained intrinsic call was replaced by some value, it was not
removed in some cases. The dangling instruction resulted in useless
instructions being executed at runtime. This happened because constrained
intrinsics usually have a side effect, which is used to model the
interaction with the floating-point environment. In some cases the side
effect is actually absent or can be ignored.
This change adds specific treatment of constrained intrinsics so that
their side effect can be removed when it is actually absent.
Differential Revision: https://reviews.llvm.org/D118426
If a constrained intrinsic call was replaced by some value, it was not
removed in some cases. The dangling instruction resulted in useless
instructions being executed at runtime. This happened because constrained
intrinsics usually have a side effect, which is used to model the
interaction with the floating-point environment. In some cases that is the
correct behavior, but often the side effect is actually absent or can be
ignored.
This change adds specific treatment of constrained intrinsics so that
their side effect can be removed when it is actually absent.
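For example (a minimal IR sketch; the operands are hypothetical): with static
rounding and "fpexcept.ignore", the call does not observe or modify the
floating-point environment, so if its result is unused the call can be erased:

  %r = call double @llvm.experimental.constrained.fadd.f64(double %a, double %b, metadata !"round.tonearest", metadata !"fpexcept.ignore")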
Differential Revision: https://reviews.llvm.org/D118426
This check is in the related fold for binops, but it was missed when the
code was adapted for intrinsics in 432c199e84. The new test would crash
when trying to create a new intrinsic with mismatched types.
This extends 432c199e84 and 9c4770eaab with an intrinsic
cited directly in issue #46238.
Eventually, we will want to use llvm::isTriviallyVectorizable()
or create some new API for this list, but for now, I am intentionally
making a minimum change to reduce risk and only affect an intrinsic
with regression tests in place.
https://alive2.llvm.org/ce/z/sD-JVv
This extends 432c199e84 with a 3 arg intrinsic to demonstrate
that the code works with the extra operand.
Eventually, we will want to use llvm::isTriviallyVectorizable()
or create some new API for this list, but for now, I am intentionally
making a minimum change to reduce risk and only affect an intrinsic
with regression tests in place.
This is an intrinsic version of the existing fold for binops.
As a first step, I only allowed min/max, but the code is set
up to make adding more intrinsics easy (with more or less than
2 arguments).
This (and possible follow-ups) are discussed in issue #46238.
This removes memset with an undef char. We already do this for stores
of undef values.
This comes with the caveat that this optimization is not, strictly
speaking, legal for undef values, because we might be overwriting
a poison value. However, our entire load/store model currently still
operates on undef values, so we need to support undef here as well
for internal consistency.
Once https://github.com/llvm/llvm-project/issues/52930 is resolved,
these and related folds can be limited to poison -- I've added
FIXMEs to that effect.
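For example (a minimal IR sketch; %p and the length are hypothetical), the
following memset can now be deleted, matching what we already do for a plain
store of undef to %p:

  call void @llvm.memset.p0.i64(ptr %p, i8 undef, i64 16, i1 false)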
Differential Revision: https://reviews.llvm.org/D124173
This patch actually implements support for seeing through loads, using alias
analysis to refine the result.
This is rather limited, but I didn't want to rely on more than the analyses
available at that point (to be gentle with compilation time), and it does seem
to catch common scenarios, as showcased by the included tests.
Differential Revision: https://reviews.llvm.org/D122431
Sometimes we can infer an align from an allocalign but the function
already promised it'd be more-aligned than the allocalign and there's an
existing align that we shouldn't reduce. Make sure we handle that
correctly.
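A minimal IR sketch of the situation (the call and attribute values are
hypothetical): the allocalign argument would let us infer align 32, but the
call site already carries the stronger align 64 promise, which must not be
reduced:

  %p = call align 64 ptr @aligned_alloc(i64 32, i64 128)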
Differential Revision: https://reviews.llvm.org/D121642
Before, we gave up if a call through a bitcast had parameter attributes.
Interestingly, we allowed attributes for the return value already. We
now handle both the same way, namely, we drop the ones that are
incompatible with the new type and keep the rest. This cannot cause
"more UB" than initially present.
Differential Revision: https://reviews.llvm.org/D119967
The motivation for this is that while both memcpyopt and dse will catch this case, both are limited by MSSA's walk back threshold when finding clobbers. As such, if you have a memcpy of an otherwise dead alloca placed towards the end of a long basic block with lots of other memory instructions, it would be missed. This is a bit undesirable for such an "obviously" useless bit of code.
As noted in comments, we should probably generalize instcombine's escape analysis peephole (see visitAllocaInst) to allow read xor write. Doing that would subsume this code in a more general way, but is also a more involved change. For the moment, I went with the easiest fix.
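A minimal IR sketch of the case being handled (names are hypothetical): the
alloca is only ever written, never read, so the memcpy into it is dead and can
be removed directly:

  %a = alloca [256 x i8]
  call void @llvm.memcpy.p0.p0.i64(ptr %a, ptr %src, i64 256, i1 false)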
When shifting by a byte-multiple:
bswap (shl X, C) --> lshr (bswap X), C
bswap (lshr X, C) --> shl (bswap X), C
This is an IR implementation of a transform suggested in D120648.
The "swaps cancel" test models the motivating optimization from
that proposal.
Alive2 checks (as noted in the other review, we could use
knownbits to handle shift-by-variable-amount, but that can be an
enhancement patch):
https://alive2.llvm.org/ce/z/pXUaRf
https://alive2.llvm.org/ce/z/ZnaMLf
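A minimal IR sketch of the "swaps cancel" case (values are hypothetical):

  %b1 = call i32 @llvm.bswap.i32(i32 %x)
  %sh = shl i32 %b1, 16
  %b2 = call i32 @llvm.bswap.i32(i32 %sh)

  ; bswap (shl (bswap %x), 16) --> lshr (bswap (bswap %x)), 16 --> lshr %x, 16
  %b2 = lshr i32 %x, 16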
Differential Revision: https://reviews.llvm.org/D122010
To make this actually trigger, we also need to check whether the
function types differ, which is a hidden cast under opaque pointers.
The transform is somewhat less relevant there because it is
primarily about pointer bitcasts, but it can also happen with other
bit- or pointer-castable types.
Byval handling is easier with opaque pointers because there is no
need to adjust the byval type; we only need to make sure that it's
still a pointer.
The logic for handling this was fixed in
8d7f118ab2, but the check for byval
on the callee was retained. This resulted in a weird situation
where the transform would work depending on whether the byval
was only on the call or on both the call and the function.
Only call it for intrinsic min/max. The moved implementation is
unchanged apart from the one-use check: It is now hardcoded to
one-use, without the two-use special case for SPF.
A generalization like this was suggested in D119754.
This is the inverse direction of D119851,
and we get all of the folds there plus the one that was missed.
There is precedent for this kind of transform in instcombine
with "or" instructions (but strangely only with that one opcode AFAICT).
Similar justification as in the other patch:
The line between instcombine and reassociate for these kinds of folds
is blurry. This doesn't appear to have much cost and gives us the
expected wins from repeated folds as seen in the last set of test diffs.
Differential Revision: https://reviews.llvm.org/D119955
Integer min/max operations are associative:
max (max X, C0), C1 --> max X, (max C0, C1) --> max X, NewC
https://alive2.llvm.org/ce/z/wW5HVM
This would avoid a regression when we canonicalize to min/max intrinsics
(see D98152).
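For example (a minimal IR sketch with hypothetical constants):

  %m1 = call i32 @llvm.smax.i32(i32 %x, i32 3)
  %m2 = call i32 @llvm.smax.i32(i32 %m1, i32 7)

  ; smax (smax %x, 3), 7 --> smax %x, (smax 3, 7) --> smax %x, 7
  %m2 = call i32 @llvm.smax.i32(i32 %x, i32 7)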
Differential Revision: https://reviews.llvm.org/D119754
We have an InstCombine rule to remove identical consecutive fences.
We can extend this to remove a weaker fence when we have a consecutive
stronger fence.
As stated in the LangRef, a fence with a stronger ordering also implies the
orderings weaker than itself: "A fence which has seq_cst ordering, in addition to
having both acquire and release semantics specified above, participates in the
global program order of other seq_cst operations and/or fences."
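For example (a minimal IR sketch): the acquire fence is implied by the
adjacent seq_cst fence, so only the stronger fence needs to remain:

  fence acquire
  fence seq_cst

  ; -->
  fence seq_cst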
Reviewed-By: reames
Differential Revision: https://reviews.llvm.org/D118607
Instead use either Type::getPointerElementType() or
Type::getNonOpaquePointerElementType().
This is part of D117885, in preparation for deprecating the API.
The behavior in Analysis (knownbits) implements poison semantics already,
and we expect the transforms (for example, in instcombine) derived from
those semantics, so this patch changes the LangRef and remaining code to
be consistent. This is one more step in removing "undef" from LLVM.
Without this, I think https://github.com/llvm/llvm-project/issues/53330
has a legitimate complaint because that report wants to allow subsequent
code to mask off bits, and that is allowed with undef values. The clang
builtins are not actually documented anywhere AFAICT, but we might want
to add that to remove more uncertainty.
Differential Revision: https://reviews.llvm.org/D117912
This patch handles a masked gather whose first operand (the vector of
pointers) is a splat and whose mask is all ones; such a gather reloads the
same value for every lane. This patch replaces this pattern with a scalar
load of the value followed by a splat into a vector.
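A minimal IR sketch of the fold (the types and names are hypothetical):

  %ptr.ins = insertelement <4 x ptr> poison, ptr %p, i64 0
  %ptr.splat = shufflevector <4 x ptr> %ptr.ins, <4 x ptr> poison, <4 x i32> zeroinitializer
  %g = call <4 x i32> @llvm.masked.gather.v4i32.v4p0(<4 x ptr> %ptr.splat, i32 4, <4 x i1> <i1 true, i1 true, i1 true, i1 true>, <4 x i32> poison)

  ; -->
  %v = load i32, ptr %p, align 4
  %v.ins = insertelement <4 x i32> poison, i32 %v, i64 0
  %g = shufflevector <4 x i32> %v.ins, <4 x i32> poison, <4 x i32> zeroinitializer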
Differential Revision: https://reviews.llvm.org/D115726
Use the AttributeSet constructor instead. There's no good reason
why AttrBuilder itself should extract the AttributeSet from the
AttributeList. Moving this out of the AttrBuilder generally results
in cleaner code.
When a masked scatter intrinsic does a uniform store to a destination
address from a source vector and the mask is all ones, this patch replaces
the masked scatter with an extract of the last lane of the source vector
and a store of that element to the destination pointer.
This patch also folds the case where the value in the masked scatter is a
splat. In this case the mask cannot be all zeros, and it folds to a scalar
store of the value to the destination pointer.
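A minimal IR sketch of the uniform-address fold (types and names are
hypothetical):

  %ptr.ins = insertelement <4 x ptr> poison, ptr %p, i64 0
  %ptr.splat = shufflevector <4 x ptr> %ptr.ins, <4 x ptr> poison, <4 x i32> zeroinitializer
  call void @llvm.masked.scatter.v4i32.v4p0(<4 x i32> %val, <4 x ptr> %ptr.splat, i32 4, <4 x i1> <i1 true, i1 true, i1 true, i1 true>)

  ; -->
  %last = extractelement <4 x i32> %val, i64 3
  store i32 %last, ptr %p, align 4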
Differential Revision: https://reviews.llvm.org/D115724
Not all allocation functions are removable if unused. An example of a non-removable allocation would be a direct call to the replaceable global allocation function in C++. An example of a removable one - at least according to historical practice - would be malloc.
There are a few places where the alignment argument for AlignedAllocLike functions was previously hardcoded. This patch adds a getAllocAlignment function and a change to the MemoryBuiltin table to allow alignment arguments to be found generically.
This will shortly allow alignment inference on operator new's with align_val params and an extension to Attributor's HeapToStack. The former will follow shortly - I split Bryce's patch for the purpose of having the large change be NFC. The latter will be reviewed separately.
Differential Revision: https://reviews.llvm.org/D116851 (part 1 of 2)
This change removes a previous restriction where we had to prove the allocation performed by aligned_alloc was non-zero in size before using the align parameter to annotate the result. I believe this was conservatism around the C11 specification of this routine which allowed UB when size was not a multiple of alignment, but if so, it was a partial one at best. (ex: align 32, size 16 was equally UB, but not restricted) The spec has since been clarified to require nullptr return, not UB.
A nullptr - the documented return for this function on failure, now that the UB mentioned above has been removed from the spec - is trivially aligned for any power of two. This isn't totally new behavior even for this transform; we'd previously annotate potentially failing allocs (e.g. huge sizes), meaning we were putting align on potentially null pointers anyway. This change simply does the same for all failure modes.
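A minimal IR sketch (hypothetical arguments): the size is not a multiple of
the alignment, so the allocation may fail, but a nullptr result is still
trivially aligned, so the annotation remains correct:

  %p = call ptr @aligned_alloc(i64 32, i64 16)
  ; -->
  %p = call align 32 ptr @aligned_alloc(i64 32, i64 16)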
Goal is to remove use of isOpNewLike. I looked at a couple approaches to this, and this turned out to be the cheapest one. Just letting deref_or_null be generated causes a bunch of test diffs, and I couldn't convince myself there wasn't a real regression somewhere. A generic instcombine to convert deref_or_null + nonnull to deref is annoyingly complicated since you have to mix facts from the callsite and declaration while manipulating only existing call site attributes. It just wasn't worth the code complexity.
Note that the change in new-delete-itanium.ll is a real regression. If you have a callsite which overrides the builtin status of a nobuiltin declaration, *and* you don't put the appropriate attributes on that callsite, you may lose the deref fact. I decided this didn't matter; if anyone disagrees, you can add this case to the generic non-null inference.
InstCombine appears to duplicate the allocation size logic used inside getObjectSize when figuring out which attributes are safe to place on the callsite. We can use the existing utility function instead.
The test change is correct. With aligned_alloc, an alignment of zero requires a nullptr return. As such, deref_or_null is a correct attribute to use.
Differential Revision: https://reviews.llvm.org/D116816
I noticed we weren't propagating tail flags on calls when
FortifiedLibCallSimplifier.optimizeCall() was replacing calls to the
runtime-checked routines with calls to the non-checked routines (when safe
to do so). Make sure to check this before replacing the original calls!
Also, avoid any libcall transforms when notail/musttail is present.
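A minimal IR sketch (assuming the checked call can be simplified to the plain
libcall; names and sizes are hypothetical):

  %r = tail call ptr @__strcpy_chk(ptr %dst, ptr %src, i64 -1)
  ; the simplified call should keep the tail marker:
  %r = tail call ptr @strcpy(ptr %dst, ptr %src)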
PR46734
Fixes: https://github.com/llvm/llvm-project/issues/46079
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D107872