When using instructions which have a MetadataAsValue argument
(e.g. some target-specific intrinsics) MD canonicalization strips
internal MDNodes with a single ConstantAsMetadata child. That
prevented IRTranslator from the proper translation of such a calls.
Add generic helper function that matches constant splat. It has option to
match constant splat with undef (some elements can be undef but not all).
Add util function and matcher for G_FCONSTANT splat.
Differential Revision: https://reviews.llvm.org/D104410
For artifacts excluding G_TRUNC/G_SEXT, which have IR counterparts, we don't
seem to have debug users of defs. However, in the legalizer we're always calling
MachineInstr::eraseFromParentAndMarkDBGValuesForRemoval() which is expensive.
In some rare cases, this contributes significantly to unreasonably long compile
times when we have lots of artifact combiner activity.
To verify this, I added asserts to that function when it actually replaced a debug
use operand with undef for these artifacts. On CTMark with both -O0 and -Os and
debug info enabled, I didn't see a single case where it triggered.
In my measurements I saw around a 0.5% geomean compile-time improvement on -g -O0
for AArch64 with this change.
Differential Revision: https://reviews.llvm.org/D109750
Add eraseInstr(s) utility functions. Before deleting an instruction
collects its use instructions. After deletion deletes use instructions
that became trivially dead.
This patch clears all dead instructions in existing legalizer mir tests.
Differential Revision: https://reviews.llvm.org/D109154
Rework getConstantstVRegValWithLookThrough in order to make it clear if we
are matching integer/float constant only or any constant(default).
Add helper functions that get DefVReg and APInt/APFloat from constant instr
getIConstantVRegValWithLookThrough: integer constant, only G_CONSTANT
getFConstantVRegValWithLookThrough: float constant, only G_FCONSTANT
getAnyConstantVRegValWithLookThrough: either G_CONSTANT or G_FCONSTANT
Rename getConstantVRegVal and getConstantVRegSExtVal to getIConstantVRegVal
and getIConstantVRegSExtVal. These now only match G_CONSTANT as described
in comment.
Relevant matchers now return both DefVReg and APInt/APFloat.
Replace existing uses of getConstantstVRegValWithLookThrough and
getConstantVRegVal with new helper functions. Any constant match is
only required in:
ConstantFoldBinOp: for constant argument that was bit-cast of float to int
getAArch64VectorSplat: AArch64::G_DUP operands can be any constant
amdgpu select for G_BUILD_VECTOR_TRUNC: operands can be any constant
In other places use integer only constant match.
Differential Revision: https://reviews.llvm.org/D104409
getMetadata() currently uses a weird API where it populates a
structure passed to it, and optionally merges into it. Instead,
we can return the AAMDNodes and provide a separate merge() API.
This makes usages more compact.
Differential Revision: https://reviews.llvm.org/D109852
The doc comment for isPredecessor says:
Returns true if \p DefMI precedes \p UseMI or they are the same
instruction.
And dominates relies on that behavior for its own:
Returns true if \p DefMI dominates \p UseMI. By definition an
instruction dominates itself.
Make both statements correct by fixing isPredecessor.
Found by inspection.
Always use the byval/inalloca/preallocated type (which is required
nowadays), don't fall back on the pointer element type.
This requires adding Function::getParamPreallocatedType() to
mirror the CallBase API, so that the templated code can work with
both.
Follow up to suggestions in D109103 via hans:
I think UnreachableDefault (or UnreachableFallthrough) would be a
better name now, since it doesn't just omit the range check, it also
omits the last bit test.
Reviewed By: hans
Differential Revision: https://reviews.llvm.org/D109455
This renames the primary methods for creating a zero value to `getZero`
instead of `getNullValue` and renames predicates like `isAllOnesValue`
to simply `isAllOnes`. This achieves two things:
1) This starts standardizing predicates across the LLVM codebase,
following (in this case) ConstantInt. The word "Value" doesn't
convey anything of merit, and is missing in some of the other things.
2) Calling an integer "null" doesn't make any sense. The original sin
here is mine and I've regretted it for years. This moves us to calling
it "zero" instead, which is correct!
APInt is widely used and I don't think anyone is keen to take massive source
breakage on anything so core, at least not all in one go. As such, this
doesn't actually delete any entrypoints, it "soft deprecates" them with a
comment.
Included in this patch are changes to a bunch of the codebase, but there are
more. We should normalize SelectionDAG and other APIs as well, which would
make the API change more mechanical.
Differential Revision: https://reviews.llvm.org/D109483
This is a port of a combine which matches a pattern where a wide type scalar
value is stored by several narrow stores. It folds it into a single store or
a BSWAP and a store if the targets supports it.
Assuming little endian target:
i8 *p = ...
i32 val = ...
p[0] = (val >> 0) & 0xFF;
p[1] = (val >> 8) & 0xFF;
p[2] = (val >> 16) & 0xFF;
p[3] = (val >> 24) & 0xFF;
=>
*((i32)p) = val;
On CTMark AArch64 -Os this results in a good amount of savings:
Program before after diff
SPASS 412792 412788 -0.0%
kc 432528 432512 -0.0%
lencod 430112 430096 -0.0%
consumer-typeset 419156 419128 -0.0%
bullet 475840 475752 -0.0%
tramp3d-v4 367760 367628 -0.0%
clamscan 383388 383204 -0.0%
pairlocalalign 249764 249476 -0.1%
7zip-benchmark 570100 568860 -0.2%
sqlite3 287628 286920 -0.2%
Geomean difference -0.1%
Differential Revision: https://reviews.llvm.org/D109419
Otherwise we end up with an extra conditional jump, following by an
unconditional jump off the end of a function. ie.
bb.0:
BT32rr ..
JCC_1 %bb.4 ...
bb.1:
BT32rr ..
JCC_1 %bb.2 ...
JMP_1 %bb.3
bb.2:
...
bb.3.unreachable:
bb.4:
...
Should be equivalent to:
bb.0:
BT32rr ..
JCC_1 %bb.4 ...
JMP_1 %bb.2
bb.1:
bb.2:
...
bb.3.unreachable:
bb.4:
...
This can occur since at the higher level IR (Instruction) SwitchInsts
are required to have BBs for default destinations, even when it can be
deduced that such BBs are unreachable.
For most programs, this isn't an issue, just wasted instructions since the
unreachable has been statically proven.
The x86_64 Linux kernel when built with CONFIG_LTO_CLANG_THIN=y fails to
boot though once D106056 is re-applied. D106056 makes it more likely
that correlation-propagation (CVP) can deduce that the default case of
SwitchInsts are unreachable. The x86_64 kernel uses a binary post
processor called objtool, which emits this warning:
vmlinux.o: warning: objtool: cfg80211_edmg_chandef_valid()+0x169: can't
find jump dest instruction at .text.cfg80211_edmg_chandef_valid+0x17b
I haven't debugged precisely why this causes a failure at boot time, but
fixing this very obvious jump off the end of the function fixes the
warning and boot problem.
Link: https://bugs.llvm.org/show_bug.cgi?id=50080
Fixes: https://github.com/ClangBuiltLinux/linux/issues/679
Fixes: https://github.com/ClangBuiltLinux/linux/issues/1440
Reviewed By: hans
Differential Revision: https://reviews.llvm.org/D109103
Legalizing G_MUL for non-standard types (like i33) generated an error. Putting
minScalar and maxScalar instead of clampScalar. Also using new rule, instead
of widening to the next power of 2, widen to the next multiple of the passed
argument (32 in this case), so instead of widening i65 to i128, we widen it to
i96.
Patch by: Mateja Marjanovic
Differential Revision: https://reviews.llvm.org/D109228
Add implementation for the legalization of G_ROTL and G_ROTR machine
instructions. They are very similar to funnel shift instructions, the only
difference is funnel shifts have 3 operands, whereas rotate instructions have
two operands, the first being the register that is being rotated and the second
being the number of shifts. The legalization of G_ROTL/G_ROTR is just lowering
them into funnel shift instructions if they are legal.
Patch by: Mateja Marjanovic
Differential Revision: https://reviews.llvm.org/D105347
Legalize G_MEMCPY, G_MEMMOVE, G_MEMSET and G_MEMCPY_INLINE.
Corresponding intrinsics are replaced by a loop that uses loads/stores in
AMDGPULowerIntrinsics pass unless their length is a constant lower then
MemIntrinsicExpandSizeThresholdOpt (default 1024). Any G_MEM* instruction that
reaches legalizer should have a const length argument and should be expanded
into appropriate number of loads + stores.
Differential Revision: https://reviews.llvm.org/D108357
This allows constructing a MemDesc from a MachineMemoryOperand, a pattern that starts to show up more frequently.
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D109161
This adds the following combines:
```
x = ... 0 or 1
c = icmp eq x, 1
->
c = x
```
and
```
x = ... 0 or 1
c = icmp ne x, 0
->
c = x
```
When the target's true value for the relevant types is 1.
This showed up in the following situation:
https://godbolt.org/z/M5jKexWTW
SDAG currently supports the `ne` case, but not the `eq` case. This can probably
be further generalized, but I don't feel like thinking that hard right now.
This gives some minor code size improvements across the board on CTMark at
-Os for AArch64. (0.1% for 7zip and pairlocalalign in particular.)
Differential Revision: https://reviews.llvm.org/D109130
Please refer to
https://lists.llvm.org/pipermail/llvm-dev/2021-September/152440.html
(and that whole thread.)
TLDR: the original patch had no prior RFC, yet it had some changes that
really need a proper RFC discussion. It won't be productive to discuss
such an RFC, once it's actually posted, while said patch is already
committed, because that introduces bias towards already-committed stuff,
and the tree is potentially in broken state meanwhile.
While the end result of discussion may lead back to the current design,
it may also not lead to the current design.
Therefore i take it upon myself
to revert the tree back to last known good state.
This reverts commit 4c4093e6e3.
This reverts commit 0a2b1ba33a.
This reverts commit d9873711cb.
This reverts commit 791006fb8c.
This reverts commit c22b64ef66.
This reverts commit 72ebcd3198.
This reverts commit 5fa6039a5f.
This reverts commit 9efda541bf.
This reverts commit 94d3ff09cf.
As noted in the comments in D108227, using G_FPTOSI produces wrong results for
G_ISNAN. Drop the G_FPTOSI and perform the operation on integer types.
Elsewhere in LLVM, a bitcast would be the appropriate choice (as it is in SDAG).
GlobalISel does not distinguish between integer and FP types, so a bitcast would
be meaningless here.
Add support for the GNU C style __attribute__((error(""))) and
__attribute__((warning(""))). These attributes are meant to be put on
declarations of functions whom should not be called.
They are frequently used to provide compile time diagnostics similar to
_Static_assert, but which may rely on non-ICE conditions (ie. relying on
compiler optimizations). This is also similar to diagnose_if function
attribute, but can diagnose after optimizations have been run.
While users may instead simply call undefined functions in such cases to
get a linkage failure from the linker, these provide a much more
ergonomic and actionable diagnostic to users and do so at compile time
rather than at link time. Users instead may be able use inline asm .err
directives.
These are used throughout the Linux kernel in its implementation of
BUILD_BUG and BUILD_BUG_ON macros. These macros generally cannot be
converted to use _Static_assert because many of the parameters are not
ICEs. The Linux kernel still needs to be modified to make use of these
when building with Clang; I have a patch that does so I will send once
this feature is landed.
To do so, we create a new IR level Function attribute, "dontcall" (both
error and warning boil down to one IR Fn Attr). Then, similar to calls
to inline asm, we attach a !srcloc Metadata node to call sites of such
attributed callees.
The backend diagnoses these during instruction selection, while we still
know that a call is a call (vs say a JMP that's a tail call) in an arch
agnostic manner.
The frontend then reconstructs the SourceLocation from that Metadata,
and determines whether to emit an error or warning based on the callee's
attribute.
Link: https://bugs.llvm.org/show_bug.cgi?id=16428
Link: https://github.com/ClangBuiltLinux/linux/issues/1173
Reviewed By: aaron.ballman
Differential Revision: https://reviews.llvm.org/D106030
The sext_inreg_of_load combine did not have the isLegalOrBeforeLegalizer check,
leading to the generation of potentially illegal G_SEXTLOADs when run after legalization.
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D108626
Translate the `@llvm.lround.*` family to G_LROUND via
`IRTranslator::translateSimpleIntrinsic`.
Differential Revision: https://reviews.llvm.org/D108418
For some reductions like G_VECREDUCE_OR on AArch64, we need to scalarize
completely if the source is <= 64b. This change adds support for that in
the legalizer. If the source has a pow-2 num elements, then we can do
a tree reduction using the scalar operation in the individual elements.
Otherwise, we just create a sequential chain of operations.
For AArch64, we only need to scalarize if the input is <64b. If it's great than
64b then we can first do a fewElements step to 64b, taking advantage of vector
instructions until we reach the point of scalarization.
I also had to relax the verifier checks for reductions because the intrinsics
support <1 x EltTy> types, which we lower to scalars for GlobalISel.
Differential Revision: https://reviews.llvm.org/D108276
Translate the `@llvm.isnan` intrinsic to G_ISNAN when we see it.
This is pretty much the same as the associated SelectionDAGBuilder code. Main
difference is that we don't expand it here. It makes more sense to do that
during legalization in GlobalISel. GlobalISel will just legalize the generated
illegal types.
Differential Revision: https://reviews.llvm.org/D108226
Combine two G_PTR_ADDs, but keep the register bank of the constant.
That way, the combine can be used in post-regbank-select combines.
Introduce two helper methods in CombinerHelper, getRegBank and
setRegBank that get and set an optional register bank to a register.
That way, they can be used before and after register bank selection.
Differential Revision: https://reviews.llvm.org/D103326
This is a fairly common pattern:
```
%mask = G_CONSTANT iN <mask val>
%add = G_ADD %lhs, %rhs
%and = G_AND %add, %mask
```
We have combines to eliminate G_AND with a mask that does nothing.
If we combined the above to this:
```
%mask = G_CONSTANT iN <mask val>
%narrow_lhs = G_TRUNC %lhs
%narrow_rhs = G_TRUNC %rhs
%narrow_add = G_ADD %narrow_lhs, %narrow_rhs
%ext = G_ZEXT %narrow_add
%and = G_AND %ext, %mask
```
We'd be able to take advantage of those combines using the trunc + zext.
For this to work (or be beneficial in the best case)
- The operation we want to narrow then widen must only be used by the G_AND
- The G_TRUNC + G_ZEXT must be free
- Performing the operation at a narrower width must not produce a different
value than performing it at the original width *after masking.*
Example comparison between SDAG + GISel: https://godbolt.org/z/63jzb1Yvj
At -Os for AArch64, this is a 0.2% code size improvement on CTMark/pairlocalign.
Differential Revision: https://reviews.llvm.org/D107929
AttributeList::hasAttribute() is confusing, use clearer methods like
hasParamAttr()/hasRetAttr().
Add hasRetAttr() since it was missing from AttributeList.
If a G_SHL is fed by a G_CONSTANT, the lower and upper bits of the source can be
shifted individually by the constant shift amount.
However in case the shift amount came from a G_TRUNC(G_CONSTANT), the generic shift legalization
code was used, producing intermediate shifts that are potentially illegal on some targets.
This change teaches narrowScalarShift to look through G_TRUNCs and G_*EXTs.
Reviewed By: paquette
Differential Revision: https://reviews.llvm.org/D89100
We should use MachineInstr::eraseFromParentAndMarkDBGValuesForRemoval()
instead of eraseFromParent().
We should probably use that in other places too but fix this issue which
affects clang bootstrap builds for now.
Instructions that produceSameValue produce same values for operands with
same index. matchEqualDefs used to return true for any two values from
different instructions that produce same values. Fix this by checking if
values are defined by operands with the same index.
Differential Revision: https://reviews.llvm.org/D107362