The load/store instruction will be transformed to amx intrinsics in the
pass of AMX type lowering. Prohibiting the pointer cast make that pass
happy.
Differential Revision: https://reviews.llvm.org/D94372
The x86_amx is used for AMX intrisics. <256 x i32> is bitcast to x86_amx when
it is used by AMX intrinsics, and x86_amx is bitcast to <256 x i32> when it
is used by load/store instruction. So amx intrinsics only operate on type x86_amx.
It can help to separate amx intrinsics from llvm IR instructions (+-*/).
Thank Craig for the idea. This patch depend on https://reviews.llvm.org/D87981.
Differential Revision: https://reviews.llvm.org/D91927
This reverts commit 899faa50f2.
Upon further consideration, this does not fix the right issue.
Doing this fold for non-inbounds GEPs is legal, because the
resulting pointer is still based-on null, which has no associated
address range, and as such and access to it is UB.
https://bugs.llvm.org/show_bug.cgi?id=48577#c3
Effectively, this is what we were previously already doing when
the GEP was used in conjunction with a load or store, but this
fold can also be applied more generally:
> The only in bounds address for a null pointer in the default
> address-space is the null pointer itself.
If the GEP isn't inbounds, then accessing a GEP of null location
is generally not UB.
While this is a minimal fix, the GEP of null handling should
probably be its own fold.
The warning would fire when calling canReplaceGEPIdxWithZero on a GEP
whose source element type is a scalable vector. The size of scalable
vector types is not known, so this optimization cannot be performed.
This patch fixes the issue by:
- bailing out early in this routine if the GEP instruction's source
element type is a scalable vector.
- making use of getFixedSize -- this removes the dependency on the
deprecated interface.
Reviewed By: fpetrogalli
Differential Revision: https://reviews.llvm.org/D89968
This patch adds metadata !noundef and makes load instructions can optionally have it.
A load with !noundef always return a well-defined value (has no undef bit or isn't poison).
If the loaded value isn't well defined, the behavior is undefined.
This metadata can be used to encode the assumption from C/C++ that certain reads of variables should have well-defined values.
It is helpful for optimizing freeze instructions away, because freeze can be removed when its operand has well-defined value, and showing that a load from arbitrary location is well-defined is usually hard otherwise.
The same information can be encoded with llvm.assume with operand bundle; using metadata is chosen because I wasn't sure whether code motion can be freely done when llvm.assume is inserted from clang instead.
The existing codebase already is stripping unknown metadata when doing code motion, so using metadata is UB-safe as well.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D89050
This was broken by 16295d521e, when
instructions started being handled and not just constant
expressions. This was re-inserting an equivalent bitcast to the
original memcpy operand, which made a non-functional IR change on
every iteration.
This also fixes a secondary problem where it was inserting
addrspacecasts which may not have been legal (i.e. it changed the
source address space). Start visiting all pointer users and fail out
if we can't process them. Also start handling the relevant memory
intrinsic users. These cases can be dealt with by running
InferAddressSpaces separately.
And another step towards transforms not introducing inttoptr and/or
ptrtoint casts that weren't there already.
As we've been establishing (see D88788/D88789), if there is a int<->ptr cast,
it basically must stay as-is, we can't do much with it.
I've looked, and the most source of new such casts being introduces,
as far as i can tell, is this transform, which, ironically,
tries to reduce count of casts..
On vanilla llvm test-suite + RawSpeed, @ `-O3`, this results in
-33.58% less `IntToPtr`s (19014 -> 12629)
and +76.20% more `PtrToInt`s (18589 -> 32753),
which is an increase of +20.69% in total.
However just on RawSpeed, where i know there are basically
none `IntToPtr` in the original source code,
this results in -99.27% less `IntToPtr`s (2724 -> 20)
and +82.92% more `PtrToInt`s (4513 -> 8255).
which is again an increase of 14.34% in total.
To me this does seem like the step in the right direction,
we end up with strictly less `IntToPtr`, but strictly more `PtrToInt`,
which seems like a reasonable trade-off.
See https://reviews.llvm.org/D88860 / https://reviews.llvm.org/D88995
for some more discussion on the subject.
(Eventually, `CastInst::isNoopCast()`/`CastInst::isEliminableCastPair`
should be taught about this, yes)
Reviewed By: nlopes, nikic
Differential Revision: https://reviews.llvm.org/D88979
(it was introduced in https://lists.llvm.org/pipermail/llvm-dev/2015-January/080956.html)
This canonicalization seems dubious.
Most importantly, while it does not create `inttoptr` casts by itself,
it may cause them to appear later, see e.g. D88788.
I think it's pretty obvious that it is an undesirable outcome,
by now we've established that seemingly no-op `inttoptr`/`ptrtoint` casts
are not no-op, and are no longer eager to look past them.
Which e.g. means that given
```
%a = load i32
%b = inttoptr %a
%c = inttoptr %a
```
we likely won't be able to tell that `%b` and `%c` is the same thing.
As we can see in D88789 / D88788 / D88806 / D75505,
we can't really teach SCEV about this (not without the https://bugs.llvm.org/show_bug.cgi?id=47592 at least)
And we can't recover the situation post-inlining in instcombine.
So it really does look like this fold is actively breaking
otherwise-good IR, in a way that is not recoverable.
And that means, this fold isn't helpful in exposing the passes
that are otherwise unaware of these patterns it produces.
Thusly, i propose to simply not perform such a canonicalization.
The original motivational RFC does not state what larger problem
that canonicalization was trying to solve, so i'm not sure
how this plays out in the larger picture.
On vanilla llvm test-suite + RawSpeed, this results in
increase of asm instructions and final object size by ~+0.05%
decreases final count of bitcasts by -4.79% (-28990),
ptrtoint casts by -15.41% (-3423),
and of inttoptr casts by -25.59% (-6919, *sic*).
Overall, there's -0.04% less IR blocks, -0.39% instructions.
See https://bugs.llvm.org/show_bug.cgi?id=47592
Differential Revision: https://reviews.llvm.org/D88789
For a long time, the InstCombine pass handled target specific
intrinsics. Having target specific code in general passes was noted as
an area for improvement for a long time.
D81728 moves most target specific code out of the InstCombine pass.
Applying the target specific combinations in an extra pass would
probably result in inferior optimizations compared to the current
fixed-point iteration, therefore the InstCombine pass resorts to newly
introduced functions in the TargetTransformInfo when it encounters
unknown intrinsics.
The patch should not have any effect on generated code (under the
assumption that code never uses intrinsics from a foreign target).
This introduces three new functions:
TargetTransformInfo::instCombineIntrinsic
TargetTransformInfo::simplifyDemandedUseBitsIntrinsic
TargetTransformInfo::simplifyDemandedVectorEltsIntrinsic
A few target specific parts are left in the InstCombine folder, where
it makes sense to share code. The largest left-over part in
InstCombineCalls.cpp is the code shared between arm and aarch64.
This allows to move about 3000 lines out from InstCombine to the targets.
Differential Revision: https://reviews.llvm.org/D81728
Summary:
As @nikic is pointing out in https://bugs.llvm.org/show_bug.cgi?id=46680#c5,
InstCombine should not have forward instruction scans,
so let's move this transform into the proper place.
This is pretty much NFCI.
Reviewers: nikic, spatel
Reviewed By: nikic
Subscribers: hiraditya, llvm-commits, nikic
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D83670
We can happen to have a situation with many stores eligible for transform,
but due to our visitation order (top to bottom), when we have processed
the first eligible instruction, we would not try to reprocess the previous
instructions that are now also eligible.
So after we've successfully merged a store that was second-to-last instruction
into successor, if the now-second-to-last instruction is also a such store
that is eligible, add it to worklist to be revisited.
Fixes https://bugs.llvm.org/show_bug.cgi?id=46661
Along the lines of D77454 and D79968. Unlike loads and stores, the
default alignment is getPrefTypeAlign, to match the existing handling in
various places, including SelectionDAG and InstCombine.
Differential Revision: https://reviews.llvm.org/D80044
This is D77454, except for stores. All the infrastructure work was done
for loads, so the remaining changes necessary are relatively small.
Differential Revision: https://reviews.llvm.org/D79968
The fact that loads and stores can have the alignment missing is a
constant source of confusion: code that usually works can break down in
rare cases. So fix the LoadInst API so the alignment is never missing.
To reduce the number of changes required to make this work, IRBuilder
and certain LoadInst constructors will grab the module's datalayout and
compute the alignment automatically. This is the same alignment
instcombine would eventually apply anyway; we're just doing it earlier.
There's a minor risk that the way we're retrieving the datalayout
could break out-of-tree code, but I don't think that's likely.
This is the last in a series of patches, so most of the necessary
changes have already been merged.
Differential Revision: https://reviews.llvm.org/D77454
Consider any constant memory type, not just global constants. AMDGPU
kernel parameters are effectively global constants, but appear as
either reads from an intrinsic derived pointer or function argument.
Summary:
Remove usages of asserting vector getters in Type in preparation for the
VectorType refactor. The existence of these functions complicates the
refactor while adding little value.
Reviewers: sdesmalen, rriddle, efriedma
Reviewed By: sdesmalen
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77263
Summary:
DataLayout::getTypeAllocSize() return TypeSize. For cases where scalable
property doesn't matter (check for zero-sized alloca), we should explicitly
call getKnownMinSize() to avoid implicit type conversion to uint64_t, which is
invalid for scalable vector type.
Reviewers: sdesmalen, efriedma, spatel, apazos
Reviewed By: efriedma
Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76386
Fixes https://bugs.llvm.org/show_bug.cgi?id=44835. Skip the transform
if it wouldn't actually do anything (apart from removing and reinserting
the same instructions).
Note that the test case doesn't loop on current master anymore, only
on the LLVM 10 release branch. The issue is already mitigated on master
due to worklist order fixes, but we should fix the root cause there as well.
As a side note, we should probably assert in combineLoadToNewType()
that it does not combine to the same type. Not doing this here, because
this assertion would also be triggered in another place right now.
Differential Revision: https://reviews.llvm.org/D74278
Adds a replaceOperand() helper, which is like Instruction.setOperand()
but adds the old operand to the worklist. This reduces the amount of
missing or incorrect worklist management.
This only applies the helper to a relatively small subset of
setOperand() calls in InstCombine, namely those of the pattern
`I.setOperand(); return &I;`, where it is most obviously applicable.
Differential Revision: https://reviews.llvm.org/D73803
This renames Worklist.AddDeferred() to Worklist.add() and
Worklist.Add() to Worklist.push(). The intention here is that
Worklist.add() should be the go-to method for explicit worklist
management, while the raw Worklist.push() is mostly for
InstCombine internals. I will then migrate uses of Worklist.push()
to Worklist.add() in followup changes.
As suggested by spatel on D73411 I'm also changing the remaining
method names to lowercase first character, in line with current
coding standards.
Differential Revision: https://reviews.llvm.org/D73745
Fixes https://bugs.llvm.org/show_bug.cgi?id=44552. We need to make
sure that the store is reprocessed, because performing DSE may
expose more DSE opportunities.
There is a slight caveat here though: We need to make sure that we
add back the store the worklist first, because that means it will
be processed after the operands of the removed store have been
processed. This is a general bug in InstCombine worklist management
that I hope to address at some point, but for now it means we need
to do this manually rather than just returning the instruction as
changed.
Differential Revision: https://reviews.llvm.org/D72807
Summary:
If aligment on `LoadInst` isn't specified, load is assumed to be ABI-aligned.
And said aligment may be different for different types.
So if we change load type, but don't pay extra attention to the aligment
(i.e. keep it unspecified), we may either overpromise (if the default aligment
of the new type is higher), or underpromise (if the default aligment
of the new type is smaller).
Thus, if no alignment is specified, we need to manually preserve the implied ABI alignment.
This addresses https://bugs.llvm.org/show_bug.cgi?id=44543 by making combineLoadToNewType preserve ABI alignment of the load.
Reviewers: spatel, lebedev.ri
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72710
Don't try to canonicalize loads to scalable vector types to loads
of integers.
This removes one assertion when trying to use a TypeSize as a parameter
to DataLayout::isLegalInteger. It does not handle the second part of the
function (which looks at bitcasts).
This patch also contains a NFC fix for Load Analysis, where a variable
initialization that would cause the same assertion is moved closer to
its use. This allows us to run the new test for InstCombine without
having to teach LocationSize to play nicely with scalable vectors.
Differential Revision: https://reviews.llvm.org/D70075
This makes the functions in Loads.h require a type to be specified
independently of the pointer Value so that when pointers have no structure
other than address-space, it can still do its job.
Most callers had an obvious memory operation handy to provide this type, but a
SROA and ArgumentPromotion were doing more complicated analysis. They get
updated to merge the properties of the various instructions they were
considering.
llvm-svn: 365468
Just a minor refactoring to use the new helper method
DataLayout::typeSizeEqualsStoreSize(). This is done when
checking if getTypeSizeInBits is equal/non-equal to
getTypeStoreSizeInBits.
llvm-svn: 361613