This would create a new interval missing the subrange and hit this
verifier error:
*** Bad machine code: Live interval for subreg operand has no subranges ***
- function: test_remat_subreg_def
- basic block: %bb.0 (0xa568758) [0B;128B)
- instruction: 32B dead undef %4.sub0:vreg_64 = V_MOV_B32_e32 2, implicit $exec
We still haven't found a solution that correctly handles 'don't care' sub elements properly - given how close it is to the next release branch, I'm making this fail safe change and we can revisit this later if we can't find alternatives.
NOTE: This isn't a reversion of D128570 - it's the removal of undef handling across bitcasts entirely
Fixes#56520
llvm::sort is beneficial even when we use the iterator-based overload,
since it can optionally shuffle the elements (to detect
non-determinism). However llvm::sort is not usable everywhere, for
example, in compiler-rt.
Reviewed By: nhaehnle
Differential Revision: https://reviews.llvm.org/D130406
This will fix the SystemZ v3i31 memcpy regression in D77804 (with the help of D129765 as well....).
It should also allow us to /bend/ the oneuse limitation for cases where we can use demanded bits to safely peek though multiple uses of the AND ops.
As noticed on D127115, when splitting ADD/SUB nodes we often end up with cases where overflow from the lower bits is impossible - in such cases we're better off breaking the carry chain dependency as soon as possible.
This path is being exercised by llvm/test/CodeGen/ARM/dsp-mlal.ll, although I haven't been able to get any codegen diff without a topological worklist.
Concat KnownBits from ISD::SHL_PARTS / ISD::SRA_PARTS / ISD::SRL_PARTS lo/hi operands and perform the KnownBits calculation by the shift amount on the extended type, before splitting the KnownBits based on the requested lo/hi result.
This change adds a nop instruction if section starts with landing pad. This change is like [D73739](https://reviews.llvm.org/D73739) which avoids zero offset landing pad in basic block sections.
Detailed description:
The current machine functions splitter can create ˜sections which start with a landing pad themselves. This places landing pad at offset zero from LPStart.
```
.section .text.split.foo10,"ax",@progbits
foo10.cold: # %lpad
.cfi_startproc
.cfi_personality 3, __gxx_personality_v0
.cfi_lsda 3, .Lexception5
.cfi_def_cfa %rsp, 16
.Ltmp11: <--- This is a Landing pad and also LP Start as it is start of this section
movq %rax, %rdi <--- first instruction is at offest 0 from LPStart
callq _Unwind_Resume@PLT
```
This will cause landing pad entries to become zero (.Ltmp11-foo10.cold)
```
.Lcst_begin4:
.uleb128 .Ltmp9-.Lfunc_begin2 # >> Call Site 1 <<
.uleb128 .Ltmp10-.Ltmp9 # Call between .Ltmp9 and .Ltmp10
.uleb128 .Ltmp11-foo10.cold <---This is zero # jumps to .Ltmp11
.byte 3 # On action: 2
.uleb128 .Ltmp10-.Lfunc_begin2 # >> Call Site 2 <<
.uleb128 .Lfunc_end9-.Ltmp10 # Call between .Ltmp10 and .Lfunc_end9
.byte 0 # has no landing pad
.byte 0 # On action: cleanup
.p2align 2
```
The C++ ABI somehow assumes that no landing pads point directly to LPStart (which works in the normal case since the function begin is never a landing pad), and uses LP.offset = 0 to specify no landing pad. This change adds a nop instruction at start of such sections so that such a case could be avoided. Output:
```
.section .text.split.foo10,"ax",@progbits
foo10.cold: # %lpad
.cfi_startproc
.cfi_personality 3, __gxx_personality_v0
.cfi_lsda 3, .Lexception5
.cfi_def_cfa %rsp, 16
nop <--- new instruction that is added
.Ltmp11:
movq %rax, %rdi
callq _Unwind_Resume@PLT
```
Reviewed By: modimo, snehasish, rahmanl
Differential Revision: https://reviews.llvm.org/D130133
We were looking for loads or any_extend+load. reduceLoadWidth
hasn't known how to look through such an any_extend to find the
load since D40667 almost 5 years ago.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D130333
Move this out of the switch, so that different branches can
indicate an error by breaking out of the switch. This becomes
important if there are more than the two current error cases.
Vector fptosi_sat and fptoui_sat were being expanded by unrolling the
vector operation. This doesn't work for scalable vector, so this patch
adds a call to TLI.expandFP_TO_INT_SAT if the vector is scalable.
Scalable tests are added for AArch64 and RISCV. Some of the AArch64
fptoi_sat operations should be legal, but that will be handled in
another patch.
Differential Revision: https://reviews.llvm.org/D130028
Similar to what we already do in getNode for basic ADD/SUB nodes, return the X operand directly, but here we know that there will be no/zero overflow as well.
As noted on D127115 - this path is being exercised by llvm/test/CodeGen/ARM/dsp-mlal.ll, although I haven't been able to get any codegen without a topological worklist.
PromoteIntRes_BUILD_VECTOR currently always ANY_EXTENDs build vector operands, but if this is a constant boolean vector we're losing the useful ability to keep the vector matching the BooleanContents mode used by the target.
This patch extends constant boolean vectors according to target BooleanContents, allowing a number of additional all-bits folds (notable XOR -> NOT conversions) to occur.
Differential Revision: https://reviews.llvm.org/D129641
Add promotion and expansion of integer operands for
experimental_vp_strided SelectionDAG nodes; the expansion is actually
just a truncation of the stride operand.
Reviewed By: simoll
Differential Revision: https://reviews.llvm.org/D123112
When determining if an `and` should be merged into an extending load
the constant argument to the `and` is currently not checked if the
argument requires truncation. This prevents the combine happening when
the vector width is half the normal available vector width for SVE VLA
vectors.
Reviewed By: c-rhodes
Differential Revision: https://reviews.llvm.org/D129281
Unlike the name suggests this can reuse any store as a base for a
memory-based vector extract. If that store is underaligned the loads
created to extract will have an invalid alignment. Since most CPUs are
forgiving wrt alignment this is almost never an issue, on x86 this is
only reproducible by extracting a 128 bit vector out of a wider vector.
I tried making a test case in the context of
https://reviews.llvm.org/D127982 but it's really really fragile, as the
output pretty much looks like a missed optimization.
The "xor (X >> ShiftC), XorC --> (not X) >> ShiftC" fold is currently limited to the XOR mask being a shifted all-bits mask, but we can relax this to only need to match under the demanded bits.
This helps expose more bit extraction/clearing patterns and fixes the PowerPC testCompares*.ll regressions from D127115
Alive2: https://alive2.llvm.org/ce/z/fl7T7K
Differential Revision: https://reviews.llvm.org/D129933
This revision supports to scalarize a binary operation of two scalable splat vectors.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D122791
This was stored in LiveIntervals, but not actually used for anything
related to LiveIntervals. It was only used in one check for if a load
instruction is rematerializable. I also don't think this was entirely
correct, since it was implicitly assuming constant loads are also
dereferenceable.
Remove this and rely only on the invariant+dereferenceable flags in
the memory operand. Set the flag based on the AA query upfront. This
should have the same net benefit, but has the possible disadvantage of
making this AA query nonlazy.
Preserve the behavior of assuming pointsToConstantMemory implying
dereferenceable for now, but maybe this should be changed.
r175673 changed repairIntervalsInRange to find anchoring end points for
ranges automatically, but the calculation of Begin included the first
instruction found that already had an index. This patch changes it to
exclude that instruction:
1. For symmetry, so that the half open range [Begin,End) only includes
instructions that do not already have indexes.
2. As a possible performance improvement, since repairOldRegInRange
will scan fewer instructions.
3. Because repairOldRegInRange hits assertion failures in some cases
when it sees a def that already has a live interval.
(3) fixes about ten tests in the CodeGen lit test suite when
-early-live-intervals is forced on.
Differential Revision: https://reviews.llvm.org/D110182
The DAG Combiner unnecessarily restricts commutative CSE
to nodes with a single result value. This commit removes
that restriction.
Signed-off-by: Itay Bookstein <ibookstein@gmail.com>
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D129666
Undef tokens may appear in unreached code as result of RAUW of some optimization,
and it should not be considered as bad IR.
Patch by Dmitry Bakunevich!
Differential Revision: https://reviews.llvm.org/D128904
Reviewed By: mkazantsev
Added function to the ExpandVectorPredication pass to handle VP loads
and stores.
Reviewed By: simoll
Differential Revision: https://reviews.llvm.org/D109584
trunc (sign_ext_inreg X, iM) to iN --> sign_ext_inreg (trunc X to iN), iM
There are improvements on existing tests from this, and there are a pair
of large regressions in D127115 for Thumb2 caused by not folding this
pattern.
Differential Revision: https://reviews.llvm.org/D129890
D127595 added the ability to recurse up a (one-use) INSERT_VECTOR_ELT chain to create a BUILD_VECTOR before other combines manage to break the chain, something that is particularly bad in D127115.
The patch generalises this so it doesn't have to build the chain starting from the last element insertion, instead it can now start from any insertion and will recurse up the chain until it finds all elements or finds a UNDEF/BUILD_VECTOR/SCALAR_TO_VECTOR which represents that start of the chain.
Fixes several regressions in D127115
In https://reviews.llvm.org/D30114, support for mismatching address
spaces was introduced to CodeGenPrepare's optimizeMemoryInst, using
addrspacecast as it was argued that only no-op addrspacecasts would be
considered when constructing the address mode. However, by doing
inttoptr/ptrtoint, it's possible to get CGP to emit an addrspace
that's not actually no-op, introducing a miscompilation:
define void @kernel(i8* %julia_ptr) {
%intptr = ptrtoint i8* %julia_ptr to i64
%ptr = inttoptr i64 %intptr to i32 addrspace(3)*
br label %end
end:
store atomic i32 1, i32 addrspace(3)* %ptr unordered, align 4
ret void
}
Gets compiled to:
define void @kernel(i8* %julia_ptr) {
end:
%0 = addrspacecast i8* %julia_ptr to i32 addrspace(3)*
store atomic i32 1, i32 addrspace(3)* %0 unordered, align 4
ret void
}
In the case of NVPTX, this introduces a cvta.to.shared, whereas
leaving out the %end block and branch doesn't trigger this
optimization. This results in illegal memory accesses as seen in
https://github.com/JuliaGPU/CUDA.jl/issues/558
In this change, I introduced a check before doing the pointer cast
that verifies address spaces are the same. If not, it emits a
ptrtoint/inttoptr combination to get a no-op cast between address
spaces. I decided against disallowing ptrtoint/inttoptr with
non-default AS in matchOperationAddr, because now its still possible
to look through multiple sequences of them that ultimately do not
result in a address space mismatch (i.e. the second lit test).
As mentioned on D127115, this patch that attempts to recognise shuffle masks that could be simplified to a AND mask - we already have a similar transform that will fold AND -> 'clear mask' shuffle, but this patch handles cases where the referenced elements are not from the same lane indices but are known to be zero.
Differential Revision: https://reviews.llvm.org/D129150
combineShiftAnd1ToBitTest already matches "and (not (srl X, C)), 1 --> (and X, 1<<C) == 0" patterns, but we can end up with situations where the not is before the shift.
Part of some yak shaving for D127115 to generalise the "xor (X >> ShiftC), XorC --> (not X) >> ShiftC" fold.
SimplifyDemandedBits is called slightly later which allows the not(sext(x)) -> sext(not(x)) fold to occur via foldLogicOfShifts
As mentioned on D127115, we should be able to further generalise this based off the demanded bits.
This is similar to D125680, but for llvm.experimental.patchpoint
(instead of llvm.experimental.stackmap).
Differential review: https://reviews.llvm.org/D129268
Following some recent discussions, this changes the representation
of callbrs in IR. The current blockaddress arguments are replaced
with `!` label constraints that refer directly to callbr indirect
destinations:
; Before:
%res = callbr i8* asm "", "=r,r,i"(i8* %x, i8* blockaddress(@test8, %foo))
to label %asm.fallthrough [label %foo]
; After:
%res = callbr i8* asm "", "=r,r,!i"(i8* %x)
to label %asm.fallthrough [label %foo]
The benefit of this is that we can easily update the successors of
a callbr, without having to worry about also updating blockaddress
references. This should allow us to remove some limitations:
* Allow unrolling/peeling/rotation of callbr, or any other
clone-based optimizations
(https://github.com/llvm/llvm-project/issues/41834)
* Allow duplicate successors
(https://github.com/llvm/llvm-project/issues/45248)
This is just the IR representation change though, I will follow up
with patches to remove limtations in various transformation passes
that are no longer needed.
Differential Revision: https://reviews.llvm.org/D129288
If we have a variable shift amount and the demanded mask has leading
zeros, we can propagate those leading zeros to not demand those bits
from operand 0. This can allow zero_extend/sign_extend to become
any_extend. This pattern can occur due to C integer promotion rules.
This transform is already done by InstCombineSimplifyDemanded.cpp where
sign_extend can be turned into zero_extend for example.
Reviewed By: spatel, foad
Differential Revision: https://reviews.llvm.org/D121833
Widening a G_FCONSTANT by extending and then generating G_FPTRUNC doesn't produce
the same result all the time. Instead, we can just transform it to a G_CONSTANT
of the same bit pattern and truncate using a plain G_TRUNC instead.
Fixes https://github.com/llvm/llvm-project/issues/56454
Differential Revision: https://reviews.llvm.org/D129743
If an MI will not generate a target instruction, we should not compute its
latency. Then we can compute more precise instruction sequence cost, and get
better result.
Differential Revision: https://reviews.llvm.org/D129615
isSafeToExpand() for addrecs depends on whether the SCEVExpander
will be used in CanonicalMode. At least one caller currently gets
this wrong, resulting in PR50506.
Fix this by a) making the CanonicalMode argument on the freestanding
functions required and b) adding member functions on SCEVExpander
that automatically take the SCEVExpander mode into account. We can
use the latter variant nearly everywhere, and thus make sure that
there is no chance of CanonicalMode mismatch.
Fixes https://github.com/llvm/llvm-project/issues/50506.
Differential Revision: https://reviews.llvm.org/D129630
The SI machine scheduler inherits from ScheduleDAGMI.
This patch adds support for a few features that are implemented
in ScheduleDAGMI (or its base classes) that were missing so far
because their support is implemented in overridden functions.
* Support cl::opt -view-misched-dags
This option allows to open a graphical window of the scheduling DAG.
* Support cl::opt -misched-print-dags
This option allows to print the scheduling DAG in text form.
* After constructing the scheduling DAG, call postprocessDAG()
to apply any registered DAG mutations.
Note that currently there are no mutations defined in AMDGPUTargetMachine.cpp
in case SIScheduler is used.
Still add this to avoid surprises in the future in case mutations are added.
Differential Revision: https://reviews.llvm.org/D128808
We shouldn't use getOpcodeDef() if we need to guarantee the def has only one
user since under the hood it may look through copies and optimization hints,
which themselves may have multiple users.
When doing scalable vectorization, the loop vectorizer uses a urem in the computation of the vector trip count. The RHS of that urem is a (possibly shifted) call to @llvm.vscale.
vscale is effectively the number of "blocks" in the vector register. (That is, types such as <vscale x 8 x i8> and <vscale x 1 x i8> both fill one 64 bit block, and vscale is essentially how many of those blocks there are in a single vector register at runtime.)
We know from the RISCV V extension specification that VLEN must be a power of two between ELEN and 2^16. Since our block size is 64 bits, the must be a power of two numbers of blocks. (For everything other than VLEN<=32, but that's already broken.)
It is worth noting that AArch64 SVE specification explicitly allows non-power-of-two sizes for the vector registers and thus can't claim that vscale is a power of two by this logic.
Differential Revision: https://reviews.llvm.org/D129609
We have the same fold in InstCombine - though implemented via OrZero flag on isKnownToBePowerOfTwo. The reasoning here is that either a) the result of the lshr is a power-of-two, or b) we have a div-by-zero triggering UB which we can ignore.
Differential Revision: https://reviews.llvm.org/D129606
Some rework of getStackGuard() based on comments in
https://reviews.llvm.org/D129505.
- getStackGuard() now creates and returns the destination
register, simplifying calls
- the pointer type is passed to getStackGuard() to avoid
recomputation
- removed PtrMemTy in emitSPDescriptorParent(), because
this type is only used here when loading the value but
not when storing the value
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D129576
To convert CTLZ to popcount we do
x = x | (x >> 1);
x = x | (x >> 2);
...
x = x | (x >>16);
x = x | (x >>32); // for 64-bit input
return popcount(~x);
This smears the most significant set bit across all of the bits
below it then inverts the remaining 0s and does a population count.
To support non-power of 2 types, the last shift amount must be
more than half of the size of the type. For i15, the last shift
was previously a shift by 4, with this patch we add another shift
of 8.
Fixes PR56457.
Differential Revision: https://reviews.llvm.org/D129431
When lowering llvm::stackprotect intrinsic, the SDAG implementation
checks useLoadStackGuardNode() to either create a LOAD_STACK_GUARD or use
the first argument of the intrinsic. This check is not present in the
IRTranslator, which results in always generating a LOAD_STACK_GUARD even
if the target does not support it.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D129505
Noticed while investigating the SystemZ regressions in D77804, prefer handling the knownbits analysis/simplification in the bitop nodes directly before falling back to SimplifyMultipleUseDemandedBits
Verify the LiveStacks analysis after a pass that claims to preserve it,
even if there are no further passes (apart from the verifier itself)
that would use the analysis.
Differential Revision: https://reviews.llvm.org/D129200
visitInlineAsm() in SDAGBuilder was duplicating a lot of the code
in ParseConstraints(), in particular all the logic to determine the
operand value and constraint VT.
Rely on the data computed by ParseConstraints() instead, and update
its ConstraintVT implementation to match getCallOperandValEVT()
more precisely.
As far as I can tell what was happening in the original code is
that the getNode call receives the same operands as the original
node with different SDNodeFlags. The logic inside getNode detects
that the node already exists and intersects the flags into the
existing node and returns it. This results in Op and NewOp for the
TLO.CombineTo call always being the same node.
We may have already called CombineTo as part of the recursive handling.
A second call to CombineTo as we unwind the recursion overwrites
the previous CombineTo. I think this means any time we updated the
poison flags that was the only change that ends up getting made
and we relied on DAGCombiner to revisit and call SimplifyDemandedBits
again. The second time the poison flags wouldn't need to be dropped
and we would keep the CombineTo call from further down the recursion.
We can instead call setFlags to drop the poison flags and remove the
call to TLO.CombineTo. This way we keep the CombineTo from deeper in
the recursion which should be more efficient.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D129511
As suggested in the post-commit feedback for D128123,
we can ease the mask constraint to ignore the MSB
(and make the code easier to read by adjusting the check).
https://alive2.llvm.org/ce/z/bbvqWv
Pointed out in Issue #56432: the current reference models may not be
quite friendly to open source projects. Their purpose is only
illustrative - the expectation is that projects would train their own.
To avoid unintentionally pulling such a model, made the URL cmake
setting require explicit user setting.
Differential Revision: https://reviews.llvm.org/D129342
Currently, an error exists when InstrRefBasedLDV observes transfers of
variables across copies, which causes it to lose track of variables
under certain circumstances, resulting in shorter lifetimes for those
variables as LDV gives up searching for live locations for them. This
patch fixes this issue by storing the currently tracked values in
the destination first, then updating them manually later without
clobbering or assigning them the wrong value.
Differential Revision: https://reviews.llvm.org/D128101
This patch restores calls to has_value to make it clear that we are
checking the presence of an optional value, not the underlying value.
This patch partially reverts d08f34b592.
Differential Revision: https://reviews.llvm.org/D129454
D89489 added some logic to the interleaved access pass to attempt to
undo the folding of shuffles into binops, that instcombine performs. If
early-cse is run too, the binops may be commoned into a single operation
with multiple shuffle uses. It is still profitable reverse the transform
though, so long as all the uses are shuffles.
Differential Revision: https://reviews.llvm.org/D129419
(Reapply after revert in e9ce1a5880 due to
Fuchsia test failures. Removed changes in lib/ExecutionEngine/ other
than error categories, to be checked in more detail and reapplied
separately.)
Bulk remove many of the more trivial uses of ManagedStatic in the llvm
directory, either by defining a new getter function or, in many cases,
moving the static variable directly into the only function that uses it.
Differential Revision: https://reviews.llvm.org/D129120
Bulk remove many of the more trivial uses of ManagedStatic in the llvm
directory, either by defining a new getter function or, in many cases,
moving the static variable directly into the only function that uses it.
Differential Revision: https://reviews.llvm.org/D129120
We already handled this case for add with a constant RHS. A
similar pattern can occur for sub with a constant left hand side.
Test cases use add and a mul representing (neg (shl X, C)) because
that's what I saw in the wild. The mul will be decomposed and then
the new transform can kick in.
Tests have not been committed, but this patch shows the changes.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D128769
Originally encountered with RUST, but also there are cases with distributed LTO
where debug info dwo units contain structurally the same debug information, with
difference in DW_AT_linkage_name. This causes collision on DWO ID.
Differential Revision: https://reviews.llvm.org/D129317
SelectionDAG has a target hook, getExtendForAtomicOps, which it uses
in the computeKnownBits implementation for ATOMIC_LOAD. This is pretty
ugly (as is having a separate load opcode for atomics), so instead
allow making use of atomic zextload. Enable this for AArch64 since the
DAG path defaults in to the zext behavior.
The tablegen changes are pretty ugly, but partially helps migrate
SelectionDAG from using ISD::ATOMIC_LOAD to regular ISD::LOAD with
atomic memory operands. For now the DAG emitter will emit matchers for
patterns which the DAG will not produce.
I'm still a bit confused by the intent of the isLoad/isStore/isAtomic
bits. The DAG implementation rejects trying to use any of these in
combination. For now I've opted to make the isLoad checks also check
isAtomic, although I think having isLoad and isAtomic set on these
makes most sense.
If all the demanded bits of the AND mask covering the inserted subvector 'X' are known to be one, then the mask isn't affecting the subvector at all.
In which case, if the base vector 'C' is undef/constant, then move the AND mask up to just (constant) fold it directly.
Addresses some of the regressions from D129150, particularly the cases where we're attempting to zero the upper elements of a widened vector.
Differential Revision: https://reviews.llvm.org/D129290
After D82916 `updateAllRanges()` started to fix holes in main range with
subranges but it fails on instructions with two subregs def which are parts of
one reg. The main range constructed with //all// subranges of subregs just after
processing the first operand. So the main range gets intervals from subranges
those are not updated yet.
The patch takes into account lane mask to update the main range.
Reviewed By: rampitec, arsenm
Differential Revision: https://reviews.llvm.org/D128553
This is almost the same as the abandoned D48529, but it
allows splat vector constants too.
This replaces the x86-specific code that was added with
the alternate patch D48557 with the original generic
combine.
This transform is a less restricted form of an existing
InstCombine and the proposed SDAG equivalent for that
in D128080:
https://alive2.llvm.org/ce/z/OUm6N_
Differential Revision: https://reviews.llvm.org/D128123
This patchs adds a new metadata kind `exclude` which implies that the
global variable should be given the necessary flags during code
generation to not be included in the final executable. This is done
using the ``SHF_EXCLUDE`` flag on ELF for example. This should make it
easier to specify this flag on a variable without needing to explicitly
check the section name in the target backend.
Depends on D129053 D129052
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D129151
Currently we use the `.llvm.offloading` section to store device-side
objects inside the host, creating a fat binary. The contents of these
sections is currently determined by the name of the section while it
should ideally be determined by its type. This patch adds the new
`SHT_LLVM_OFFLOADING` section type to the ELF section types. Which
should make it easier to identify this specific data format.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D129052
This is done during type legalization since the target representation of
these nodes may not be valid until after type legalization, and after
type legalization the fact that these are dealing with i1 types may be
lost.
Differential Revision: https://reviews.llvm.org/D128996
Truncates and compares require some changes to generic legalisation functions
to use ElementCount instead of getNumElements.
Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D129082
Prior to this change, live variable operands passed to
`llvm.experimental.stackmap` would be emitted directly to target nodes,
meaning that they don't get legalised. The upshot of this is that LLVM
may crash when encountering illegally typed target nodes.
e.g. https://github.com/llvm/llvm-project/issues/21657
This change introduces a platform independent stackmap DAG node whose
operands are legalised as per usual, thus avoiding aforementioned
crashes.
Note that some kinds of argument are still not handled properly, namely
vectors, structs, and large integers, like i128s. These will need to be
addressed in follow-up changes.
Note also that this does not change the behaviour of
`llvm.experimental.patchpoint`. A follow up change will do the same for
this intrinsic.
Differential review:
https://reviews.llvm.org/D125680
This patch adds the support for `fmax` and `fmin` operations in `atomicrmw`
instruction. For now (at least in this patch), the instruction will be expanded
to CAS loop. There are already a couple of targets supporting the feature. I'll
create another patch(es) to enable them accordingly.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D127041
As constant expressions can no longer trap, it only makes sense to
call isSafeToSpeculativelyExecute on Instructions, so limit the
API to accept only them, rather than general Operators or Values.
As integer div/rem constant expressions are no longer supported,
constants can no longer trap and are always safe to speculate.
Remove the Constant::canTrap() method and its usages.
This removes the insertvalue constant expression, as part of
https://discourse.llvm.org/t/rfc-remove-most-constant-expressions/63179.
This is very similar to the extractvalue removal from D125795.
insertvalue is also not supported in bitcode, so no auto-ugprade
is necessary.
ConstantExpr::getInsertValue() can be replaced with
IRBuilder::CreateInsertValue() or ConstantFoldInsertValueInstruction(),
depending on whether a constant result is required (with the latter
being fallible).
The ConstantExpr::hasIndices() and ConstantExpr::getIndices()
methods also go away here, because there are no longer any constant
expressions with indices.
Differential Revision: https://reviews.llvm.org/D128719
variable with its multiple aliases.
This patch handles the case where a variable has
multiple aliases.
AIX's assembly directive .set is not usable for the
aliasing purpose, and using different labels allows
AIX to emulate symbol aliases. If a value is emitted
between any two labels, meaning they are not aligned,
XCOFF will automatically calculate the offset for them.
This patch implements:
1) Emits the label of the alias just before emitting
the value of the sub-element that the alias referred to.
2) A set of aliases that refers to the same offset
should be aligned.
3) We didn't emit aliasing labels for common and
zero-initialized local symbols in
PPCAIXAsmPrinter::emitGlobalVariableHelper, but
emitted linkage for them in
AsmPrinter::emitGlobalAlias, which caused a FAILURE.
This patch fixes the bug by blocking emitting linkage
for the alias without a label.
Reviewed By: shchenz
Differential Revision: https://reviews.llvm.org/D124654
Before merging two instructions together, GISel does some sanity checks
that the folding is legal. However that check was missing that the
source of the pattern may be convergent. When the destination location
is in a different basic block, the folding is invalid.
Differential Revision: https://reviews.llvm.org/D128539
One motivation to add support for these types are the LD1Q/ST1Q
instructions in SME, for which we have defined a number of load/store
intrinsics which at the moment still take a `<vscale x 16 x i1>` predicate
regardless of their element type.
This patch adds basic support for the nxv1i1 type such that it can be passed/returned
from functions, as well as some basic support to support some existing tests that
result in a nxv1i1 type. It also adds support for splats.
Other operations (e.g. insert/extract subvector, logical ops, etc) will be
supported in follow-up patches.
Reviewed By: paulwalker-arm, efriedma
Differential Revision: https://reviews.llvm.org/D128665
In X86 we split greddy register allocation into 2 passes. The 1st pass
is to allocate tile register, and the 2nd pass is to allocate the rest
of virtual register. In most cases there is no tile register, so the 1st
pass is unnecessary. To improve the compiling time, we check if there is
any register need to be allocated by invoking callback
`ShouldAllocateClass`. If there is no register to be allocated, just
return false in the pass. This would improve the 1st greed RA pass for
normal cases.
Differential Revision: https://reviews.llvm.org/D128804
The filter clause in the landingpad may not have a GlobalVariable operand.
It may instead have a ConstantArray of operands and each operand within this
ConstantArray should also be checked to see if it is a GlobalVariable.
This patch add the check for the ConstantArray as well as a debug message that
outputs the contents of MustKeepGlobalVariables.
Reviewed By: lei, amyk, scui
Differential Revision: https://reviews.llvm.org/D128287
lowerConstant() currently accepts a number of constant expressions
which have corresponding MC expressions, but which cannot be
evaluated as a relocatable expression (unless the operands are
constant, in which case we'll just fold the expression to a constant).
The motivation here is to clarify which constant expressions are
really needed for https://discourse.llvm.org/t/rfc-remove-most-constant-expressions/63179,
and in particular clarify that we do not need to support any
division expressions, which are particularly problematic.
Differential Revision: https://reviews.llvm.org/D127972
When we fill the shape to tile configure memory, the shape is gotten
from AMX pseudo instruction. However the register for the shape may be
split or spilled by greedy RA. That cause we fill the shape to config
memory after ldtilecfg is executed, so that the shape configuration
would be wrong.
This patch is to split the tile register allocation from greedy register
allocation, so that after tile registers are allocated the shape
registers are still virtual register. The shape register only may be
redefined or multi-defined by phi elimination pass, two address pass.
That doesn't affect tile register configuration.
Differential Revision: https://reviews.llvm.org/D128584
Add a new pattern A - (B + C) ==> (A - B) - C to give machine combiner a chance
to evaluate which instruction sequence has lower latency.
Differential Revision: https://reviews.llvm.org/D124564
This is a resurrection of D106421 with the change that it keeps backward-compatibility. This means decoding the previous version of `LLVM_BB_ADDR_MAP` will work. This is required as the profile mapping tool is not released with LLVM (AutoFDO). As suggested by @jhenderson we rename the original section type value to `SHT_LLVM_BB_ADDR_MAP_V0` and assign a new value to the `SHT_LLVM_BB_ADDR_MAP` section type. The new encoding adds a version byte to each function entry to specify the encoding version for that function. This patch also adds a feature byte to be used with more flexibility in the future. An use-case example for the feature field is encoding multi-section functions more concisely using a different format.
Conceptually, the new encoding emits basic block offsets and sizes as label differences between each two consecutive basic block begin and end label. When decoding, offsets must be aggregated along with basic block sizes to calculate the final offsets of basic blocks relative to the function address.
This encoding uses smaller values compared to the existing one (offsets relative to function symbol).
Smaller values tend to occupy fewer bytes in ULEB128 encoding. As a result, we get about 17% total reduction in the size of the bb-address-map section (from about 11MB to 9MB for the clang PGO binary).
The extra two bytes (version and feature fields) incur a small 3% size overhead to the `LLVM_BB_ADDR_MAP` section size.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D121346
`commonAlignment` is a shortcut to pick the smallest of two `Align`
objects. As-is it doesn't bring much value compared to `std::min`.
Differential Revision: https://reviews.llvm.org/D128345
Information in the function `Prologue Data` is intentionally opaque.
When a function with `Prologue Data` is duplicated. The self (global
value) references inside `Prologue Data` is still pointing to the
original function. This may cause errors like `fatal error: error in backend: Cannot represent a difference across sections`.
This patch detaches the information from function `Prologue Data`
and attaches it to a function metadata node.
This and D116130 fix https://github.com/llvm/llvm-project/issues/49689.
Reviewed By: pcc
Differential Revision: https://reviews.llvm.org/D115844
This commit modifies the AsmPrinter to avoid emitting any zero-sized symbols to
the .debug_aranges table, by rounding their size up to 1. Entries with zero
length violate the DWARF 5 spec, which states:
> Each descriptor is a triple consisting of a segment selector, the beginning
> address within that segment of a range of text or data covered by some entry
> owned by the corresponding compilation unit, followed by the non-zero length
> of that range.
In practice, these zero-sized entries produce annoying warnings in lld and
cause GNU binutils to truncate the table when parsing it.
Other parts of LLVM, such as DWARFDebugARanges in the DebugInfo module
(specifically the appendRange method), already avoid emitting zero-sized
symbols to .debug_aranges, but not comprehensively in the AsmPrinter. In fact,
the AsmPrinter does try to avoid emitting such zero-sized symbols when labels
aren't involved, but doesn't when the symbol to emitted is a difference of two
labels; this patch extends that logic to handle the case in which the symbol is
defined via labels.
Furthermore, this patch fixes a bug in which `available_externally` symbols
would cause unpredictable values to be emitted into the `.debug_aranges` table
under certain circumstances. In practice I don't believe that this caused
issues up until now, but the root cause of this bug--an invalid DenseMap
lookup--triggered failures in Chromium when combined with an earlier version of
this patch. Therefore, this patch fixes that bug too.
This is a revised version of diff D126257, which was reverted due to breaking
tests. The now-reverted version of this patch didn't distinguish between
symbols that didn't have their size reported to the DwarfDebug handler and
those that had their size reported to be zero. This new version of the patch
instead restricts the special handling only to the symbols whose size is
definitively known to be zero.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D126835
These intrinsics are now fundemental for SVE code generation and have been
present for a year and a half, hence move them out of the experimental
namespace.
Differential Revision: https://reviews.llvm.org/D127976
Patch was reverted in 4c5f10a due to buildbot failures, now being
reapplied with updated AArch64 and RISCV tests.
This patch adds handling for the llvm.powi.* intrinsics in
BasicTTIImplBase::getIntrinsicInstrCost() and improves vectorization.
Closes#53887.
Differential Revision: https://reviews.llvm.org/D128172
If an instruction is sunk into a loop then any kill flags on
operands declared outside the loop must be cleared as these
will be live for all loop iterations.
Fixes#46827
Reviewed By: MatzeB
Differential Revision: https://reviews.llvm.org/D126754
According to the vector spec, mf8 is not supported for i8 if ELEN
is 32. Similarily mf4 is not suported for i16/f16 or mf2 for i32/f32.
Since RVVBitsPerBlock is 64 and LMUL is calculated as
((MinNumElements * ElementSize) / RVVBitsPerBlock) this means we
need to disable any type with MinNumElements==1.
For generic IR, these types will now be widened in type legalization.
For RVV intrinsics, we'll probably hit a fatal error somewhere. I plan
to work on disabling the intrinsics in the riscv_vector.h header.
Reviewed By: arcbbb
Differential Revision: https://reviews.llvm.org/D128286
waitcnt vmcnt instructions are currently generated in loop bodies before using
values loaded outside of the loop. In some cases, it is better to flush the
vmcnt counter in a loop preheader before entering the loop body. This patch
detects these cases and generates waitcnt instructions to flush the counter.
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D115747
This reverts commit 719658d078.
Breaks a few things, see comments on https://reviews.llvm.org/D128437
There's disagreement about the best fix.
So let's keep HEAD green while discussions are happening.
The base RA support infrastructure that only allow a specific register
class be allocated in RA pss. Since greedy RA, basic RA derived from
base RA, they all allow allocating specific register class. Fast RA
doesn't support allocating register for specific register class. This
patch is to enable ShouldAllocateClass in fast RA, so that it can
support allocating register for specific register class.
Differential Revision: https://reviews.llvm.org/D126771
We're slowly removing SelectionDAG::GetDemandedBits and replacing it with SimplifyMultipleUseDemandedBits, we no longer have any uses for the vector demanded elt variant.
This patch adds handling for the llvm.powi.* intrinsics in
BasicTTIImplBase::getIntrinsicInstrCost() and improves vectorization.
Closes#53887.
Differential Revision: https://reviews.llvm.org/D128172
For below case, virtual register is defined twice in the self loop. We
don't need to spill %0 after the third instruction `%0 = def (tied %0)`,
because it is defined in the second instruction `%0 = def`.
1 bb.1
2 %0 = def
3 %0 = def (tied %0)
4 ...
5 jmp bb.1
Reviewed By: MatzeB
Differential Revision: https://reviews.llvm.org/D125079
An AArch64ISD::DUP is just a splat, where the known bits for each lane
are the same as the input. This teaches that to computeKnownBitsForTargetNode.
Problems arise for constants though, as a constant BUILD_VECTOR can be
lowered to an AArch64ISD::DUP, which SimplifyDemandedBits would then
turn back into a constant BUILD_VECTOR leading to an infinite cycle.
This has been prevented by adding a isTargetCanonicalConstantNode node
to prevent the conversion back into a BUILD_VECTOR.
Differential Revision: https://reviews.llvm.org/D128144
Similar to the existing (shl (srl x, c1), c2) fold
Part of the work to fix the regressions in D77804
Differential Revision: https://reviews.llvm.org/D125836
The VT we want to shrink to may not be legal especially after type
legalization.
Fixes PR56110.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D128135
The included test hits a verifier problems as one of the instructions:
```
%113:tgpreven, %114:tgprodd = MVE_VMLSLDAVas16 %12:tgpreven(tied-def 0), %11:tgprodd(tied-def 1), %7:mqpr, %8:mqpr, 0, $noreg, $noreg
```
Has two inputs that come from different PHIs with the same base reg, but
conflicting regclasses:
```
%11:tgprodd = PHI %103:gpr, %bb.1, %16:gpr, %bb.2
%12:tgpreven = PHI %103:gpr, %bb.1, %17:gpr, %bb.2
```
The MachinePipeliner would attempt to use %103 for both the %11 and %12
operands in the prolog, constraining the register class to the common
subset of both. Unfortunately there are no registers that are both odd
and even, so the second constrainRegClass fails. Fix this situation by
inserting a COPY for the second if the call to constrainRegClass fails.
The register allocation can then fold that extra copy away. The register
allocation of Q regs changed with this test, but the R regs were the
same and no new instructions are needed in the final assembly.
Differential Revision: https://reviews.llvm.org/D127971
D125335 makes regsOverlap skip following control flow, which is not entended
in the original code.
Differential Revision: https://reviews.llvm.org/D128039
WidenVecOp_INSERT_SUBVECTOR only supported cases where widening
effectively converts the insert into a copy. However, when the
widened subvector is no bigger than the vector being inserted into
and we can be sure there's no loss of data, we can simply emit
another INSERT_SUBVECTOR.
Fixes: #54982
Differential Revision: https://reviews.llvm.org/D127508
MinRCSize is 4 and prevents constrainRegClass from changing the
register class if the new class has size less than 4.
IMPLICIT_DEF gets a unique vreg for each use and will be removed
by the ProcessImplicitDef pass before register allocation. I don't
think there is any reason to prevent constraining the virtual register
to whatever register class the use needs.
The attached test case was previously creating a copy of IMPLICIT_DEF
because vrm8nov0 has 3 registers in it.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D128005
This helps handling a case where the BUILD_VECTOR has i16 element type
and i32 constant operands
t2: v8i16 = setcc t8, t17, setult:ch
t3: v8i16 = BUILD_VECTOR Constant:i32<1>, ...
t4: v8i16 = and t2, t3
t5: v8i16 = add t8, t4
This can be turned into t5: v8i16 = sub t8, t2, and allows us to remove
t3 and t4 from the DAG.
Differential Revision: https://reviews.llvm.org/D127354
This reverts commit 7207373e1e.
We found another RISC-V bug when landing D126048, and it has been fixed
by D127642 now.
Differential Revision: https://reviews.llvm.org/D126048
This is needed by our downstream and makes bf16 and f16 have the
same set of scalable vector types.
Reviewed By: rui.zhang
Differential Revision: https://reviews.llvm.org/D127877
The use operand may be undefined. In that case we can just continue to
check the next operand since it won't increase register pressure.
Differential Revision: https://reviews.llvm.org/D127848
This is modeled after the half-precision fp support. Two new nodes are
introduced for casting from and to bf16. Since casting from bf16 is a
simple operation I opted to always directly lower it to integer
arithmetic. The other way round is more complicated if you want to
preserve IEEE semantics, so it's handled by a new __truncsfbf2
compiler-rt builtin.
This is of course very bare bones, but sufficient to get a semi-softened
fadd on x86.
Possible future improvements:
- Targets with bf16 conversion instructions can now make fp_to_bf16 legal
- The software conversion to bf16 can be replaced by a trivial
implementation under fast math.
Differential Revision: https://reviews.llvm.org/D126953
When compiling for the RWPI relocation model [1], the debug information
is wrong for readonly global variables.
Writable global variables are accessed by the static base register (R9
on ARM) in the RWPI relocation model. This is being correctly generated
Readonly global variables are not accessed by the static base register
in the RWPI relocation model. This case is incorrectly generating the
same debugging information as for writable global variables.
References:
[1] ARM Read-Write Position Independence: https://github.com/ARM-software/abi-aa/blob/main/aapcs32/aapcs32.rst#read-write-position-independence-rwpi
Differential Revision: https://reviews.llvm.org/D126361
GetValueInMiddleOfBlock uses result of GetValueAtEndOfBlockInternal if there is no value
defined for current basic block.
If there is already a value it tries (in this order):
to find single register coming from all predecessors
find existing phi node which matches our incoming registers
build new phi.
The compile time improvement is to use current available value if
it is defined out of current BB or it is a PHI register.
This is due to it can be used in the middle basic block.
Reviewed By: sameerds
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D126523
When spilling CSRs, FixupStatepoint pass does simple copy propagation,
trying to find COPY instruction which defines register being spilled
and spill COPY source instead. I.e., if we have CSR $x and found
$x = COPY $y
we will spill $y instead.
But we may be unable to delete COPY instruction for some reason.
Then, spill will be inserted after it, adding another use of $y.
If COPY instruction was last use of $y (killed it), after insertion of
the spill it is not, so `isKill` flag must be cleared. We failed to do
so and this patch fixes this issue.
Reviewed By: skatkov
Differential Revision: https://reviews.llvm.org/D127308
This is a fix for https://github.com/llvm/llvm-project/issues/55827.
When register we are trying to re-color is split the original register (we tried to recover)
has no uses after the split. However in rollback actions we assign back physical register to it.
Later it causes different assertions. One of them is in attached test.
This CL fixes this by avoiding assigning physical register back to register which has no usage
or its live interval now is empty.
Reviewed By: arsenm, qcolombet
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D127281
The last of getEvictor use was removed on Jun 5, 2022 in commit
5c06f7168f, which was itself a patch to
remove unused code.
Once we remove getEvictor, EvictionTrack becomes a write-only data
structure. The data in it won't affect compilation, so the entire
class is essentially dead.
Another issue unearthed by D127115
We take a long time to canonicalize an insert_vector_elt chain before being able to convert it into a build_vector - even if they are already in ascending insertion order, we fold the nodes one at a time into the build_vector 'seed', leaving plenty of time for other folds to alter it (in particular recognising when they come from extract_vector_elt resulting in a shuffle_vector that is much harder to fold with).
D127115 makes this particularly difficult as we're almost guaranteed to have the lost the sequence before all possible insertions have been folded.
This patch proposes to begin at the last insertion and attempt to collect all the (oneuse) insertions right away and create the build_vector before its too late.
Differential Revision: https://reviews.llvm.org/D127595
This patch allows SimplifyDemandedBits to call SimplifyMultipleUseDemandedBits in cases where the source operand has other uses, enabling us to peek through the shifted value if we don't demand all the bits/elts.
This helps with several of the regressions from D125836
Previously, omitting unnecessary DWARF unwinds was only done in two
cases:
* For Darwin + aarch64, if no DWARF unwind info is needed for all the
functions in a TU, then the `__eh_frame` section would be omitted
entirely. If any one function needed DWARF unwind, then MC would emit
DWARF unwind entries for all the functions in the TU.
* For watchOS, MC would omit DWARF unwind on a per-function basis, as
long as compact unwind was available for that function.
This diff makes it so that we omit DWARF unwind on a per-function basis
for Darwin + aarch64 as well. In addition, we introduce the flag
`--emit-dwarf-unwind=` which can toggle between `always`,
`no-compact-unwind` (only emit DWARF when CU cannot be emitted for a
given function), and the target platform `default`. `no-compact-unwind`
is particularly useful for newer x86_64 platforms: we don't want to omit
DWARF unwind for x86_64 in general due to possible backwards compat
issues, but we should make it possible for people to opt into this
behavior if they are only targeting newer platforms.
**Motivation:** I'm working on adding support for `__eh_frame` to LLD,
but I'm concerned that we would suffer a perf hit. Processing compact
unwind is already expensive, and that's a simpler format than EH frames.
Given that MC currently produces one EH frame entry for every compact
unwind entry, I don't think processing them will be cheap. I tried to do
something clever on LLD's end to drop the unnecessary EH frames at parse
time, but this made the code significantly more complex. So I'm looking
at fixing this at the MC level instead.
**Addendum:** It turns out that there was a latent bug in the X86
backend when `OmitDwarfIfHaveCompactUnwind` is naively enabled, which is
not too surprising given that this combination has not been heretofore
used.
For functions that have unwind info that cannot be encoded with CU, MC
would end up dropping both the compact unwind entry (OK; existing
behavior) as well as the DWARF entries (not OK). This diff fixes things
so that we emit the DWARF entry, as well as a CU entry with encoding
`UNWIND_X86_MODE_DWARF` -- this basically tells the unwinder to look for
the DWARF entry. I'm not 100% sure the `UNWIND_X86_MODE_DWARF` CU entry
is necessary, this was the simplest fix. ld64 seems to be able to handle
both the absence and presence of this CU entry. Ultimately ld64 (and
LLD) will synthesize `UNWIND_X86_MODE_DWARF` if it is absent, so there
is no impact to the final binary size.
Reviewed By: davide, lhames
Differential Revision: https://reviews.llvm.org/D122258
1. When checking if a candidate contains a CFI instruction, actually
iterate over all of the instructions, instead of stopping halfway
through.
2. Make sure copied CFI directives refer to the correct instruction.
Fixes https://github.com/llvm/llvm-project/issues/55842
Differential Revision: https://reviews.llvm.org/D126930
In the same spirit as D73543 and in reply to https://reviews.llvm.org/D126768#3549920 this patch is adding support for `__builtin_memset_inline`.
The idea is to get support from the compiler to easily write efficient memory function implementations.
This patch could be split in two:
- one for the LLVM part adding the `llvm.memset.inline.*` intrinsics.
- and another one for the Clang part providing the instrinsic as a builtin.
Differential Revision: https://reviews.llvm.org/D126903
D125887 changed the ctlz/cttz despeculation transform to insert
a freeze for the introduced branch on zero. While this does fix
the "branch on poison" issue, we may still get in trouble if we
pick a different value for the branch and for the ctz argument
(i.e. non-zero for the branch, but zero for the ctz). To avoid
this, we should use the same frozen value in both positions.
This does cause a regression in RISCV codegen by introducing an
additional sext. The DAG looks like this:
t0: ch = EntryToken
t2: i64,ch = CopyFromReg t0, Register:i64 %3
t4: i64 = AssertSext t2, ValueType:ch:i32
t23: i64 = freeze t4
t9: ch = CopyToReg t0, Register:i64 %0, t23
t16: ch = CopyToReg t0, Register:i64 %4, Constant:i64<32>
t18: ch = TokenFactor t9, t16
t25: i64 = sign_extend_inreg t23, ValueType:ch:i32
t24: i64 = setcc t25, Constant:i64<0>, seteq:ch
t28: i64 = and t24, Constant:i64<1>
t19: ch = brcond t18, t28, BasicBlock:ch<cond.end 0x8311f68>
t21: ch = br t19, BasicBlock:ch<cond.false 0x8311e80>
I don't see a really obvious way to improve this, as we can't push
the freeze past the AssertSext (which may produce poison).
Differential Revision: https://reviews.llvm.org/D126638
Clang-format InstructionSimplify and convert all "FunctionName"s to
"functionName". This patch does touch a lot of files but gets done with
the cleanup of InstructionSimplify in one commit.
This is the alternative to the less invasive clang-format only patch: D126783
Reviewed By: spatel, rengolin
Differential Revision: https://reviews.llvm.org/D126889
This should fix a number of shuffle regressions in D127115 where the re-ordered combines mean we fail to fold a EXTRACT_VECTOR_ELT/INSERT_VECTOR_ELT sequence into a BUILD_VECTOR if we extract from more than one vector source.
This matches what we do in IR. For the RISC-V test case, this allows
us to use -8 for the AND mask instead of materializing a constant in a register.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D127335
During lowering of memcmp/bcmp, the check for a size of 0 is done
in 2 different ways. In rare cases this can lead to a crash in
SystemZSelectionDAGInfo::EmitTargetCodeForMemcmp(). The root cause
is that SelectionDAGBuilder::visitMemCmpBCmpCall() checks for a
constant int value which is not yet evaluated. When the value is
turned into a SDValue, then the evaluation is done and results in
a ConstantSDNode. But EmitTargetCodeForMemcmp() expects the special
case of 0 length to be handled, which results in an assertion.
The fix is to turn the value into a SDValue, so that both functions
use the same check.
Reviewed By: uweigand
Differential Revision: https://reviews.llvm.org/D126900
Summary:
We use the special section name `.llvm.offloading` to store device
imagees in the host object file. We want these to be stripped by the
linker as they are not used after linking so we use the `SHF_EXCLUDE`
flag to instruct the linker to drop them. We used to do this for all
sections that started with `.llvm.offloading` when we encoded metadata
in the section name itself. Now we embed a special binary containing the
metadata, we should only add the flag on this name specifically.
Extend the TypeWidenVector case of PromoteIntRes_BITCAST to work
with TypeSize directly rather than silently casting to unsigned.
To accomplish this I've extended TypeSize with an interface that
essentially allows TypeSize division when both operands have the
same number of dimensions.
There still exists combinations of scalable vector bitcasts that
cause compiler crashes. I call these out by adding "is missing"
entries to sve-bitcast.
Depends on D126957.
Fixes: #55114
Differential Revision: https://reviews.llvm.org/D127126
Bitcasting between unpacked scalable vector types of different
element counts is not a NOP because the live elements are laid out
differently.
01234567
e.g. nxv2i32 = XX??XX??
nxv4f16 = X?X?X?X?
Differential Revision: https://reviews.llvm.org/D126957
Spliter will try to extend a live range into `r` slot for a use operand,
that's works on most situaion, however that not work correctly when the operand
has tied to def, and the def operand is early clobber.
Give an example to demo what's wrong:
0 %0 = ...
16 early-clobber %0 = Op %0 (tied-def 0), ...
32 ... = Op %0
Before extend:
%0 = [0r, 0d) [16e, 32d)
The point we want to extend is 0d to 16e not 16r in this case, but if
we use 16r here we will extend nothing because that already contained
in [16e, 32d).
This patch add check for detect such case and adjust the extend point.
Detailed explanation for testcase: https://reviews.llvm.org/D126047
Reviewed By: MatzeB
Differential Revision: https://reviews.llvm.org/D126048
As noticed on D127115 - we were missing this fold, instead just having the shuffle(shuffle(x,undef,splatmask),undef) fold. We should be able to merge these into one using SelectionDAG::isSplatValue, but we'll need to match the shuffle's undef handling first.
This also exposed an issue in SelectionDAG::isSplatValue which was incorrectly propagating the undef mask across a bitcast (it was trying to just bail with a APInt::isSubsetOf if it found any undefs but that was actually the wrong way around so didn't fire for partial undef cases).
Use the query that doesn't assert if TracksLiveness isn't set, which
needs to always be available. We also need to start printing liveins
regardless of TracksLiveness.
I can't remove the function just yet as it is used in the generated .inc files.
I would also like to provide a way to compare alignment with TypeSize since it came up a few times.
Differential Revision: https://reviews.llvm.org/D126910
These assert that there are no "useless" assertzext/assertsext nodes
(that assert a wider width than a following trunc), but I don't think
there is anything preventing such nodes from reaching this code.
I don't think the assertion is relevant for correctness of this
transform either -- if such an assert is present, then the other
one will always be to a smaller width, and we'll pick that one.
The assertion dates back to D37017.
Fixes https://github.com/llvm/llvm-project/issues/55846.
Differential Revision: https://reviews.llvm.org/D126952
Fixes a bug of us not correctly updating the terminator of the loop's
preheader, if multiple terminating branch instructions are present.
This is tested through existing tests. The bug itself is hard or not
possible to get exposed with the upstream Hexagon backend, because
the machine pipeliner checks for an existing preheader, which is
defined as a block with only 1 edge into the header.
The condition of this bug is a block into the loop with more than 1
edge, and not every downstream target checks for an existing preheader.
Differential Revision: https://reviews.llvm.org/D126386
Patch adds new GICombineRules for G_ADD:
G_ADD(x, G_SUB(y, x)) -> y
G_ADD(G_SUB(y, x), x) -> y
Patch additionally adds new combine tests for AArch64 target for
these new rules.
Reviewed by: paquette
Differential Revision: https://reviews.llvm.org/D87936
Move the code that was added for D126896 after the normal recursive calls
to computeKnownBits. This allows us to calculate trailing zeros.
Previously we would break out of the switch before the recursive calls.
Some cl::ZeroOrMore were added to avoid the `may only occur zero or one times!`
error. More were added due to cargo cult. Since the error has been removed,
cl::ZeroOrMore is unneeded.
Also remove cl::init(false) while touching the lines.
When promoting a shift, make sure we only fetch the second operand
after promoting the first. Load promotion may replace users of the
old load, and we don't want to be left with a dangling reference to
the old load instruction.
The crashing test case is from https://reviews.llvm.org/D126689#3553212.
Differential Revision: https://reviews.llvm.org/D126886
If C is non-negative, the result of the smax must also be
non-negative, so all sign bits of the result are 0.
This allows DAGCombiner to remove a zext_inreg in the modified test.
This zext_inreg started as a sext that became zext before type
legalization then was promoted to a zext_inreg.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D126896
Adds MVT::v128i2, MVT::v64i4, and implied MVT::i2, MVT::i4.
Keeps MVT::i2, MVT::i4 lowering actions as expand, which should be
removed once targets set this explicitly.
Adjusts 11 lit tests to reflect slightly different behavior during
DAG combine.
Differential Revision: https://reviews.llvm.org/D125247
Even if CSR list is same between functions, we could have had a different
allocation order if ignoreCSRForAllocationOrder is evaluated differently.
Hence invalidate cached register class information if
ignoreCSRForAllocationOrder changes.
Patch by Srividya Karumuri <srividya_karumuri@apple.com>
Differential Revision: https://reviews.llvm.org/D126565
Adds MVT::v128i2, MVT::v64i4, and implied MVT::i2, MVT::i4.
Keeps MVT::i2, MVT::i4 lowering actions as `expand`, which should be
removed once targets set this explicitly.
Adjusts 11 lit tests to reflect slightly different behavior during
DAG combine.
Differential Revision: https://reviews.llvm.org/D125247
D124631 added special processing for STATEPOINT instructions.
It appears that assertion added there is too strong. We can get two
tied operands with the same register tied to different defs. If we
hit such case, do not process it in statepoint-specific code and
delegate it to common case.
Avoid the dependency on TargetInstrInfo, which depends on the subtarget
and therefore the individual function.
Currently AMDGPU is constructing PseudoSourceValue instances in MachineFunctionInfo.
In order to facilitate copying MachineFunctionInfo, we need to stop allocating these
there. Alternatively we could allow targets to subclass PseudoSourceValueManager,
and allocate them similarly to MachineFunctionInfo.
This includes .seh_* directives for generating it from assembly.
It is designed fairly similarly to the ARM64 handling.
For .seh_handler directives, such as
".seh_handler __C_specific_handler, @except" (which is supported
on x86_64 and aarch64 so far), the "@except" bit doesn't work in
ARM assembly, as '@' is used as a comment character (on all current
platforms).
Allow using '%' instead of '@' for this purpose. This convention
is used by GAS in similar contexts already,
e.g. [1]:
Note on targets where the @ character is the start of a comment
(eg ARM) then another character is used instead. For example the
ARM port uses the % character.
In practice, this unfortunately means that all such .seh_handler
directives will need ifdefs for ARM.
Contrary to ARM64, on ARM, it's quite common that we can't evaluate
e.g. the function length at this point, due to instructions whose
length is finalized later. (Also, inline jump tables end with
a ".p2align 1".)
If unable to to evaluate the function length immediately, emit
it as an MCExpr instead. If we'd implement splitting the unwind
info for a function (which isn't implemented for ARM64 yet either),
we wouldn't know whether we need to split it though.
Avoid calling getFrameIndexOffset() on an unset
FuncInfo.UnwindHelpFrameIdx, to avoid triggering asserts in the
preexisting testcase CodeGen/ARM/Windows/wineh-basic.ll. (Once
MSVC exception handling is fully implemented, those changes
can be reverted.)
[1] https://sourceware.org/binutils/docs/as/Section.html#Section
Differential Revision: https://reviews.llvm.org/D125645
This reverts commit 256a52d9aa (and
also the follow-up commit 38eb4fe74b that moved a test
case to a different directory).
As discussed in https://reviews.llvm.org/D126257 there is a suspicion
that something was wrong with this commit as text section range was
shortened to 1 byte rather than rounded up as shown in the
llvm/test/DebugInfo/X86/dwarf-aranges.ll test case.
STATEPOINT is a special pseudo instruction which represent Moving GC semantic to LLVM.
Every tied def/use VReg pair in STATEPOINT represent same physical register which can
'magically' change during call wrapped by statepoint.
(By construction, tied use operand is not live across STATEPOINT).
This means that when converting into two-address form, there is not need to insert COPY
instruction before stateppoint, what TwoAddressInstruction pass does for 'regular'
instructions.
Reviewed By: MatzeB
Differential Revision: https://reviews.llvm.org/D124631
VP intrinsics show UB if the %evl parameter is out of bounds - they must
not carry the speculatable attribute. The out-of-bounds UB disappears
when the %evl parameter is expanded into the mask or expansion replaces
the entire VP intrinsic with non-VP code.
This patch
- Removes the speculatable attribute on all VP intrinsics.
- Generalizes the isSafeToSpeculativelyExecute function to let VP
expansion know whether the VP intrinsic replacement will be
speculatable. VP expansion may only discard %evl where this is the
case.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D125296
This relands commit 4d8d2580c5.
The major change here is using 'addUsedIfAvailable<BasicBlockSectionsProfileReader>()` to make sure we don't change the pipeline tests.
Differential Revision: https://reviews.llvm.org/D126518
Today, text section prefixes (none, .unlikely, .hot, and .unkown) are determined based on PGO profile. However, Propeller may deem a function hot when PGO doesn't. Besides, when `-Wl,-keep-text-section-prefix=true` Propeller cannot enforce a global section ordering as the linker can only reorder sections within each output section (.text, .text.hot, .text.unlikely).
This patch promotes all functions with Propeller profiles (functions listed in the basic-block-sections profile) to .text.hot. The feature is hidden behind the flag `--bbsections-guided-section-prefix` which defaults to `true`.
The new implementation refactors the parsing of basic block sections profile into a new `BasicBlockSectionsProfileReader` analysis pass. This allows us to use the information earlier in `CodeGenPrepare` in order to set the functions text prefix. `BasicBlockSectionsProfileReader` will be used both by `BasicBlockSections` pass and `CodeGenPrepare`.
Differential Revision: https://reviews.llvm.org/D122930
With a fix for an expensive checks build failure exposed by new RISC-V tests.
Something about expanding two rotates in type legalization caused a change
in the remapping tables that the expensive checks verifying wasn't expecting.
See comment in the code for how it was fixed.
Tests came from this commit that exposed the bug
[RISCV] Add test cases showing failure to remove mask on rotate amounts.
If the masking AND has multiple users we fail to remove it.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D126036
treated as Copy instruction in MCP.
This is then used in AArch64 to remove copy instructions after taildup
ran in machine block placement
Differential Revision: https://reviews.llvm.org/D125335
reapply 62a9b36fcf and fix module build
failue:
1: remove MachineCycleInfoWrapperPass in MachinePassRegistry.def
MachineCycleInfoWrapperPass is a anylysis pass, should not be there.
2: move the definition for MachineCycleInfoPrinterPass to cpp file.
Otherwise, there are module conflicit for MachineCycleInfoWrapperPass
in MachinePassRegistry.def and MachineCycleAnalysis.h after
62a9b36fcf.
MachineCycle can handle irreducible loop. Natural loop
analysis (MachineLoop) can not return correct loop depth if
the loop is irreducible loop. And MachineSink is sensitive
to the loop depth, see MachineSinking::isProfitableToSinkTo().
This patch tries to use MachineCycle so that we can handle
irreducible loop better.
Reviewed By: sameerds, MatzeB
Differential Revision: https://reviews.llvm.org/D123995
This commit modifies the AsmPrinter to avoid emitting any zero-sized symbols to
the .debug_aranges table, by rounding their size up to 1. Entries with zero
length violate the DWARF 5 spec, which states:
> Each descriptor is a triple consisting of a segment selector, the beginning
> address within that segment of a range of text or data covered by some entry
> owned by the corresponding compilation unit, followed by the non-zero length
> of that range.
In practice, these zero-sized entries produce annoying warnings in lld and
cause GNU binutils to truncate the table when parsing it.
Other parts of LLVM, such as DWARFDebugARanges in the DebugInfo module
(specifically the appendRange method), already avoid emitting zero-sized
symbols to .debug_aranges, but not comprehensively in the AsmPrinter. In fact,
the AsmPrinter does try to avoid emitting such zero-sized symbols when labels
aren't involved, but doesn't when the symbol to emitted is a difference of two
labels; this patch extends that logic to handle the case in which the symbol is
defined via labels.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D126257
This adds support for pointer types for `atomic xchg` and let us write
instructions such as `atomicrmw xchg i64** %0, i64* %1 seq_cst`. This
is similar to the patch for allowing atomicrmw xchg on floating point
types: https://reviews.llvm.org/D52416.
Differential Revision: https://reviews.llvm.org/D124728
When expanding VP reductions to non VP-code, the reduction pass was
ignoring the mask before. Fix this by keeping the mask and selecting
neutral elements where the mask is zero.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D126362
Use container::size_type directly to avoid type mismatch causing build failures in Windows.
Original commit message:
This patch optimizes the transformation of selects to a branch when the heuristics deemed it profitable.
It aggressively sinks eligible instructions to the newly created true/false blocks to prevent their
execution on the common path and interleaves dependence slices to maximize ILP.
Depends on D120232
Reviewed By: davidxl
Differential Revision: https://reviews.llvm.org/D120233
The second argument to is_fp_class specifies the set of floating-point
class to test against. It can be zero, in this case the intrinsic is
expected to return zero value.
Differential Revision: https://reviews.llvm.org/D112025
Followup to D125988 - FPOW is similar to FREM and will most likely scalarize to libcalls, so unroll before widening to prevent use making additional libcalls with UNDEF args.
Any zext 'sink' should already have an operand that is in the legal
value, so avoid using a trunc and just use the trunc operand instead.
Differential Revision: https://reviews.llvm.org/D118905
Currently, two element vectors produced as the result of a binary op are
widened to four element vectors on x86 by
DAGTypeLegalizer::WidenVecRes_BinaryCanTrap. If the op still isn't legal
after widening it is unrolled into 4 scalar ops in SelectionDAG before
being converted into a libcall. This way we end up with 4 libcalls (two of
them on known undef elements) instead of the original two libcalls.
This patch modifies DAGTypeLegalizer::WidenVectorResult to ensure that if
it is known that a binary op will be tunred into a libcall, it is unrolled
instead of being widened. This prevents the creation of the extra scalar
instructions on known undef elements and (eventually) libacalls with known
undef parameters which would otherwise be created when the op gets expanded
post widening.
Differential Revision: https://reviews.llvm.org/D125988
MachineCycle can handle irreducible loop. Natural loop
analysis (MachineLoop) can not return correct loop depth if
the loop is irreducible loop. And MachineSink is sensitive
to the loop depth, see MachineSinking::isProfitableToSinkTo().
This patch tries to use MachineCycle so that we can handle
irreducible loop better.
Reviewed By: sameerds, MatzeB
Differential Revision: https://reviews.llvm.org/D123995
This patch optimizes the transformation of selects to a branch when the heuristics deemed it profitable.
It aggressively sinks eligible instructions to the newly created true/false blocks to prevent their
execution on the common path and interleaves dependence slices to maximize ILP.
Depends on D120232
Reviewed By: davidxl
Differential Revision: https://reviews.llvm.org/D120233
This patch adds the loop-level heuristics for determining whether branches are more profitable than conditional moves.
These heuristics apply to only inner-most loops.
Depends on D120231
Reviewed By: davidxl
Differential Revision: https://reviews.llvm.org/D120232
This patch adds the base heuristics for determining whether branches are more profitable than conditional moves.
Base heuristics apply to all code apart from inner-most loops.
Depends on D122259
Reviewed By: davidxl
Differential Revision: https://reviews.llvm.org/D120231
This patch implements the actual transformation of selects to branches.
It includes only the base transformation without any sinking.
Depends on D120230
Reviewed By: davidxl
Differential Revision: https://reviews.llvm.org/D122259
RAGreedy has two fields of RegisterClassInfo, one called RCI and another RegClassInfo from its base class.
RCI is initialized without freezeReservedRegs first, while RegClassInfo does. Therefore, if reserved registers
information is changed between last time freezeReservedRegs is called and RAGreedy, it's not picked up by RCI.
Instead of having both fields in RAGreedy, remove RCI and use RegClassInfo instead. Also removed is the TRI field
which is present in its base class.
Reviewed By: MatzeB
Differential Revision: https://reviews.llvm.org/D125926
If the VT is i2, then 2 is really -2.
Test has not been commited yet, but diff shows the change.
Fixes PR55644.
Differential Revision: https://reviews.llvm.org/D126213
Freeze the condition of the newly introduced conditional branch,
to avoid immediate undefined behavior if the input to ctlz/cttz
was originally poison.
Differential Revision: https://reviews.llvm.org/D125887
I had initially assumed this was the problem with
https://github.com/llvm/llvm-project/issues/55271#issuecomment-1133426243
But it turns out that was a simpler issue. This patch is still
more correct than what we were doing before so figured I'd submit
it anyway.
No test case because I'm not sure how to get an undef around
until expansion.
Looking at the test deltas I wonder if it be valid to combine
(sext_inreg (freeze (aextload X))) -> (freeze (sextload X)).
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D126175
abs should only produce a positive value or the signed minimum
value. This means we can't fold abs(undef) to undef as that would
allow more values. Fold to 0 instead to match InstSimplify.
Fixes test mentioned in comment on pr55271.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D126174
Reviewing the code again, I believe the sext is needed on the LHS
or RHS for ICmp and only on the RHS for Add.
Add an opcode check before checking the operand number.
Fixes PR55627.
Differential Revision: https://reviews.llvm.org/D125654
Currently for atomic load, store, and rmw instructions, as long as the
operand is floating-point value, they are casted to integer. Nowadays many
targets can actually support part of atomic operations with floating-point
operands. For example, NVPTX supports atomic load and store of floating-point
values. This patch adds a series interface functions `shouldCastAtomicXXXInIR`,
and the default implementations are same as what we currently do. Later for
targets can have their specialization.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D125652
If the SafeWrap operation is a subtract, we negated the constant
to treat the subtract as an addition. The sext was based on the
operation being addition. So we really need to do (neg (sext (neg C)))
when promoting the constant. This is equivalent to (sext C) for
every value of C except the min signed value. For min signed value
we need to do (zext C) instead.
Fixes PR55490.
Differential Revision: https://reviews.llvm.org/D125653
This is the first commit for the cmov-vs-branch optimization pass.
The goal is to develop a new profile-guided and target-independent cost/benefit analysis
for selecting conditional moves over branches when optimizing for performance.
Initially, this new pass is expected to be enabled only for instrumentation-based PGO.
RFC: https://discourse.llvm.org/t/rfc-cmov-vs-branch-optimization/6040
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D120230
Most clients only used these methods because they wanted to be able to
extend or truncate to the same bit width (which is a no-op). Now that
the standard zext, sext and trunc allow this, there is no reason to use
the OrSelf versions.
The OrSelf versions additionally have the strange behaviour of allowing
extending to a *smaller* width, or truncating to a *larger* width, which
are also treated as no-ops. A small amount of client code relied on this
(ConstantRange::castOp and MicrosoftCXXNameMangler::mangleNumber) and
needed rewriting.
Differential Revision: https://reviews.llvm.org/D125557
An upcoming patch will extend llvm-symbolizer to provide the source line
information for global variables. The goal is to move AddressSanitizer
off of internal debug info for symbolization onto the DWARF standard
(and doing a clean-up in the process). Currently, ASan reports the line
information for constant strings if a memory safety bug happens around
them. We want to keep this behaviour, so we need to emit debuginfo for
these variables as well.
Reviewed By: dblaikie, rnk, aprantl
Differential Revision: https://reviews.llvm.org/D123534
I noticed https://reviews.llvm.org/D87415 added SDAG combines to fold
FMIN/MAX instrs with NaNs.
The patch implements the same NaN combines for GISel GMIR FMIN/MAX opcodes:
G_FMINNUM(X, NaN) -> X
G_FMAXNUM(X, NaN) -> X
G_FMINIMUM(X, NaN) -> NaN
G_FMAXIMUM(X, NaN) -> NaN
The patch adds AArch64 tests for these combines as well.
Reviewed by: arsenm
Differential revision: https://reviews.llvm.org/D125819
This function tries to match (a >> 8) | (a << 8) as (bswap a) >> 16.
If the SRL isn't masked and the high bits aren't demanded, we still
need to ensure that bits 23:16 are zero. After the right shift they
will be in bits 15:8 which is where the important bits from the SHL
end up. It's only a bswap if the OR on bits 15:8 only takes the bits
from the SHL.
Fixes PR55484.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D125641
The patch does not pass math flags to float VPCmpIntrinsics because LLParser
could not identify float VPCmpIntrinsics as FPMathOperators.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D125600
If we're using shift pairs to mask, then relax the one use limit if the shift amounts are equal - we'll only be generating a single AND node.
AArch64 has a couple of regressions due to this, so I've enforced the existing one use limit inside a AArch64TargetLowering::shouldFoldConstantShiftPairToMask callback.
Part of the work to fix the regressions in D77804
Differential Revision: https://reviews.llvm.org/D125607
Add a new TargetRegisterInfo hook to allow targets to tweak the
priority of live ranges, so that AllocationPriority of the register
class will be treated as more important than whether the range is local
to a basic block or global. This is determined per-MachineFunction.
Differential Revision: https://reviews.llvm.org/D125102
This patch uses VP_REDUCE_AND and VP_REDUCE_OR to replace VP_REDUCE_SMAX,VP_REDUCE_SMIN,VP_REDUCE_UMAX and VP_REDUCE_UMIN for mask vector type.
Differential Revision: https://reviews.llvm.org/D125002
The documentation for this specifically mentions that this should not
happen. We could think about adding target hooks to permit it (and how
to merge IDs) in the future if that is desirable.
This specific test case was merging a scalable-vector slot into a
non-scalable one and dropping the notion of scalability, meaning we
failed to allocate enough stack space for the object.
Reviewed By: arsenm, MaskRay, sdesmalen
Differential Revision: https://reviews.llvm.org/D125699
An upcoming patch will extend llvm-symbolizer to provide the source line
information for global variables. The goal is to move AddressSanitizer
off of internal debug info for symbolization onto the DWARF standard
(and doing a clean-up in the process). Currently, ASan reports the line
information for constant strings if a memory safety bug happens around
them. We want to keep this behaviour, so we need to emit debuginfo for
these variables as well.
Reviewed By: dblaikie, rnk, aprantl
Differential Revision: https://reviews.llvm.org/D123534
The existing redundant copy elimination required a virtual register source, but the same logic works for any physreg where we don't have to worry about clobbers. On RISCV, this helps eliminate redundant CSR reads from VLENB.
Differential Revision: https://reviews.llvm.org/D125564
During early gather/scatter enablement two different approaches
were taken to represent scaled indices:
* A Scale operand whereby byte_offsets = Index * Scale
* An IndexType whereby byte_offsets = Index * sizeof(MemVT.ElementType)
Having multiple representations is bad as shown by this patch which
fixes instances where the two are out of sync. The dedicated scale
operand is more flexible and pervasive so this patch removes the
UNSCALED values from IndexType. This means all indices are scaled
but the scale can be one, hence unscaled. SDNodes now use the scale
operand to answer the "isScaledIndex" question.
I toyed with the idea of keeping the UNSCALED enums and helper
functions but because they will have no uses and force SDNodes to
validate the set of supported values I figured it's best to remove
them. We can re-add them if there's a real need. For similar
reasons I've kept the IndexType enum when a bool could be used as I
think being explicitly looks better.
Depends On D123347
Differential Revision: https://reviews.llvm.org/D123381
If we use multiply it would be with 0x0101 which is 1 more than a power
of 2. On some targets we would expand this to shl+add. By avoiding the
multiply earlier, we can generate better code.
Note, PowerPC doesn't do the shl+add expansion of multiply so one of
the tests increased in instruction count.
Limiting to scalars because it almost always increased the number of
instructions in vector tests.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D125638
This change adds the constant splat versions of m_ICst() (by using
getBuildVectorConstantSplat()) and uses it in
matchOrShiftToFunnelShift(). The getBuildVectorConstantSplat() name is
shortened to getIConstantSplatVal() so that the *SExtVal() version would
have a more compact name.
Differential Revision: https://reviews.llvm.org/D125516
The patch make users not need to know getNode with SDNodeFlags argument may not
pass its flags.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D125659
FunctionLoweringInfo::StatepointRelocationMaps map is used to pass GC pointer
lowering information from statepoint to gc.relocate which may appear ini
different block.
D124444 introduced different lowering for local and non-local relocates.
Local relocates use SDValue and non-local relocates use value exported to VReg.
But I overlooked the fact that StatepointRelocationMap is indexed not by
GCRelocate instruction, but by derived pointer. This works incorrectly when
we have two relocates (one local and another non-local) of the same value,
because they need different relocation records.
This patch fixes the problem by recording relocation information per relocate
instruction, not per derived pointer. This way, each gc.relocate can be lowered
differently.
Reviewed By: skatkov
Differential Revision: https://reviews.llvm.org/D125538
FastISel tries to fold loads into the single using instruction.
However, if the register has fixups, then there may be additional
uses through an alias of the register.
In particular, this fixes the problem reported at
https://reviews.llvm.org/D119432#3507087. The load register is
(at the time of load folding) only used in a single call instruction.
However, selection of the bitcast has added a fixup between the
load register and the cross-BB register of the bitcast result.
After fixups are applied, there would now be two uses of the load
register, so load folding is not legal.
Differential Revision: https://reviews.llvm.org/D125459
SelectionDAG::FoldConstantArithmetic determines if operands are foldable constants, so we don't need to bother with isConstantOrConstantVector / Opaque tests before calling it directly.
SelectionDAG::FoldConstantArithmetic determines if operands are foldable constants, so we don't need to bother with isConstantOrConstantVector / Opaque tests before calling it directly.
Pulled out of D77804 as its going to be easier to address the regressions individually.
This patch allows SimplifyDemandedBits to call SimplifyMultipleUseDemandedBits in cases where the source operand has other uses, enabling us to peek through the shifted value if we don't demand all the bits/elts.
The lost RISCV gorc2 fold shouldn't be a problem - instcombine would have already destroyed that pattern - see https://github.com/llvm/llvm-project/issues/50553
Differential Revision: https://reviews.llvm.org/D124839
When GlobalISel fails, we need to report the error, and we need to set
the FailedISel property. We skipped those steps if stack protector
insertion failed, which led to a very strange miscompile.
Differential Revision: https://reviews.llvm.org/D125584
We commonly want to create either an inbounds or non-inbounds GEP
based on a boolean value, e.g. when preserving inbounds from
existing GEPs. Directly accept such a boolean in the API, rather
than requiring a ternary between CreateGEP and CreateInBoundsGEP.
This change is not entirely NFC, because we now preserve an
inbounds flag in a constant expression edge-case in InstCombine.
Previously it built MIR for the results and returned a Register.
This avoids building constants for earlier elements of the vector if
later elements will fail to fold, and allows CSEMIRBuilder::buildInstr
to avoid unconditionally building a copy from the result.
Use a new helper function MachineIRBuilder::buildBuildVectorConstant
to build a G_BUILD_VECTOR of G_CONSTANTs.
Differential Revision: https://reviews.llvm.org/D117758
If we're promoting an undef I think that means that we expect the
upper bits are zero. undef doesn't guarantee that.
This patch replaces undef with 0 to ensure this. This matches how
a zext or sext of undef would be folded by InstCombine/InstSimplify.
I haven't found a failure from this was just thinking through the code.
Differential Revision: https://reviews.llvm.org/D123174
This is a re-apply of D123599, which was reverted in 4fe2ab5279, now
with a more appropriate assertion. Original commit message follow:
InstrRefBasedLDV can track and describe variable values that are spilt to
the stack -- however it does not current describe the size of the value on
the stack. This can cause uninitialized bytes to be read from the stack if
a small register is spilt for a larger variable, or theoretically on
big-endian machines if a large value on the stack is used for a small
variable.
Fix this by using DW_OP_deref_size to specify the amount of data to load
from the stack, if there's any possibility for ambiguity. There are a few
scenarios where this can be omitted (such as when using DW_OP_piece and a
non-DW_OP_stack_value location), see deref-spills-with-size.mir for an
explicit table of inputs flavours and output expressions.
Differential Revision: https://reviews.llvm.org/D123599
As pointed out in #55342, given non-canonical IR with multiple
constants, we check the second operand in isSafeWrap, but can promote
both with sext. Fix that as suggested by @craig.topper by ensuring we
only extend the second constant if multiple are present.
Fixes#55342
Differential Revision: https://reviews.llvm.org/D125294
This clang-formats the TypePromotion code, with the only meaningful
change being the removal of a verifyFunction call inside a LLVM_DEBUG,
and the printing of the entire function which can be better handled
via -print-after-all.
We often see code like the following after running SCCP:
switch (x) { case 42: phi(42, ...); }
This tends to produce bad code as we currently materialize the constant
phi-argument in the switch-block. This increases register pressure and
if the pattern repeats for `n` case statements, we end up generating `n`
constant values.
This changes CodeGenPrepare to catch this pattern and revert it back to:
switch (x) { case 42: phi(x, ...); }
Differential Revision: https://reviews.llvm.org/D124552
This adds a `TargetLoweringBase::getSwitchConditionType` callback to
give targets a chance to control the type used in
`CodeGenPrepare::optimizeSwitchInst`.
Implement callback for X86 to avoid i8 and i16 types where possible as
they often incur extra zero-extensions.
This is NFC for non-X86 targets.
Differential Revision: https://reviews.llvm.org/D124894
This allows the compiler to support more features than those supported by a
model. The only requirement (development mode only) is that the new
features must be appended at the end of the list of features requested
from the model. The support is transparent to compiler code: for
unsupported features, we provide a valid buffer to copy their values;
it's just that this buffer is disconnected from the model, so insofar
as the model is concerned (AOT or development mode), these features don't
exist. The buffers are allocated at setup - meaning, at steady state,
there is no extra allocation (maintaining the current invariant). These
buffers has 2 roles: one, keep the compiler code simple. Second, allow
logging their values in development mode. The latter allows retraining
a model supporting the larger feature set starting from traces produced
with the old model.
For release mode (AOT-ed models), this decouples compiler evolution from
model evolution, which we want in scenarios where the toolchain is
frequently rebuilt and redeployed: we can first deploy the new features,
and continue working with the older model, until a new model is made
available, which can then be picked up the next time the compiler is built.
Differential Revision: https://reviews.llvm.org/D124565
As suggested from 02f8519502, this uses the
isAnyConstantBuildVector method in lieu of separate
isBuildVectorOfConstantSDNodes calls. It should
otherwise be an NFC.
This prevents an infinite loop from D123801, where code trying to reduce
the total number of bitcasts, but also handling constants, could create
the opposite transform. Prevent the transform in these case to let the
bitcast of a constant transform naturally.
Fixes#55345
Like other shifts, the type isn't required to match. We shouldn't
assume we can call ZExtPromotedInteger.
I tested the PromoteIntOp_FunnelShift locally by removing the promotion
of the shift amount from PromoteIntRes_FunnelShift. But with the final
version of this patch it is never executed on any tests.
Differential Revision: https://reviews.llvm.org/D125106
This is part of an ongoing effort toward making DAGCombine process the nodes in topological order.
This is able to discover a couple of new optimizations, but also causes a couple of regression. I nevertheless chose to submit this patch for review as to start the discussion with people working on the backend so we can find a good way forward.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D124743
Add helper functions to query the signed and scaled properties
of ISD::IndexType along with functions to change them.
Remove setIndexType from MaskedGatherSDNode because it only has
one usage and typically should only be changed alongside its
index operand.
Minimise the direct use of the enum values to lay the groundwork
for more refactoring.
Differential Revision: https://reviews.llvm.org/D123347
Something is going wrong with the BigEndian PowerPC bot. It is hard to
tell what is wrong from here, but attempt to fix it by disabling the
combineShuffleOfBitcast combine for bigendian.
Otherwise we have garbage in the upper bits that can affect the
results of the UREM.
Fixes PR55296.
Differential Revision: https://reviews.llvm.org/D125076
If the mask is made up of elements that form a mask in the higher type
we can convert shuffle(bitcast into the bitcast type, simplifying the
instruction sequence. A v4i32 2,3,0,1 for example can be treated as a
1,0 v2i64 shuffle. This helps clean up some of the AArch64 concat load
combines, along with helping simplify a number of other tests.
The PowerPC combine for v16i8 splat vector loads needed some fixes to
keep it working for v16i8 vectors. This improves the handling of v2i64
shuffles to match too, hopefully improving them in general.
Differential Revision: https://reviews.llvm.org/D123801
The result of sign_extend_inreg needs to have as many sign bits
as requested by the VT argument. The easiest way to guarantee this
is to fold it to 0.
SystemZ test was modified to avoid using undef.
Fixes https://github.com/llvm/llvm-project/issues/55178
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D124696
There are many more instances of this pattern, but I chose to limit this change to .rst files (docs), anything in libcxx/include, and string literals. These have the highest chance of being seen by end users.
Reviewed By: #libc, Mordante, martong, ldionne
Differential Revision: https://reviews.llvm.org/D124708
Prior to ordering instructions to be scheduled, the machine pipeliner
update recurrence node sets in groupRemainingNodes() by adding in a
given node set any node on the dependency path from a node set with
higher priority to the given node set. The function computePath() that
determine what constitutes a path follows artificial dependencies.
However, when ordering the nodes in the resulting node sets,
computeNodeOrder() calls ignoreDependence when looking at dependencies
which ignores artificial dependencies. This can cause a node not to be
scheduled which then causes wrong code generation and in the case of a
debug build will lead to an assert failure in generatePhis() in
ModuloScheduler.cpp.
This commit adds calls to ignoreDependence() in computePath() to not add
any node in groupRemainingNodes() that would not be ordered by
computeNodeOrder().
Reviewed By: sgundapa
Differential Revision: https://reviews.llvm.org/D124267
Summary:
When -ffunction-sections is on, this patch makes the compiler to generate unique LSDA and EH info sections for functions on AIX by appending the function name to the section name as a suffix. This will allow the AIX linker to garbage-collect unused function.
Reviewed by: MaskRay, hubert.reinterpretcast
Differential Revision: https://reviews.llvm.org/D124855
This extends the (X & ~Y) | Y to X | Y fold to also work if ~Y is
a truncated not (when taking into account the mask X). This is
done by exporting the infrastructure added in D124856 and reusing
it here.
I've retained the old value of AllowUndefs=false, though probably
this can be switched to true with extra test coverage.
Differential Revision: https://reviews.llvm.org/D124930
Demanded bits analysis may replace a full-width not with a
any_extend (not (truncate X)) pattern. This patch looks through
this kind of pattern in haveNoCommonBitsSet(). Of course, we can
only do this if we only need negated bits in the non-extended part,
as the other bits may now be arbitrary. For example, if we have
haveNoCommonBitsSet(~X & Y, X) then ~X only needs to actually
negate bits set in Y.
This is only a partial solution to the problem in that it allows
add -> or conversion, but the resulting or doesn't get folded yet.
(I guess that will involve exposing getBitwiseNotOperand() as a
more general helper and using that in the relevant transform.)
Differential Revision: https://reviews.llvm.org/D124856
If the tied use is undef value, fastregalloc should free the def
register. There is no reload needed for the undef value.
Reviewed By: MatzeB
Differential Revision: https://reviews.llvm.org/D124834
Don't assume the rotation amounts have been correctly normalized - do it as part of the constant folding.
Also, the normalization should be performed with UREM not SREM.
This is the DAG variant of D124763. The code already handles the
general pattern, but not this degenerate case.
This allows folding A + (B&~A) to A | (B&~A) which further holds
to A | B.
Handling on the SDAG level is needed because in the motivating
case the add is actually a getelementptr, which only gets converted
into an add on the SDAG level. However, this patch is not quite
sufficient to handle the getelementptr case yet, because of an
interfering demanded bits simplification.
Differential Revision: https://reviews.llvm.org/D124772
In SelectionDAG, DBG_PHI instructions are created to "read" physreg values
and give them an instruction number, when they can't be traced back to a
defining instruction. The most common scenario if arguments to a function.
Unfortunately, if you have 100 inlined methods, each of which has the same
"this" pointer, then the 100 dbg.value instructions become 100
DBG_INSTR_REFs plus 100 DBG_PHIs, where only one DBG_PHI would suffice.
This patch adds a vreg cache for MachienFunction::salvageCopySSA, if we've
already traced a value back to the start of a block and created a DBG_PHI
then it allows us to re-use the DBG_PHI, as well as reducing work.
Differential Revision: https://reviews.llvm.org/D124517
This adds fptosi_sat and fptoui_sat to the list of trivially
vectorizable functions, mainly so that the loop vectorizer can vectorize
the instruction. Marking them as trivially vectorizable also allows them
to be SLP vectorized, and Scalarized.
The signature of a fptosi_sat requires two type overrides
(@llvm.fptosi.sat.v2i32.v2f32), unlike other intrinsics that often only
take a single. This patch alters hasVectorInstrinsicOverloadedScalarOpd
to isVectorIntrinsicWithOverloadTypeAtArg, so that it can mark the first
operand of the intrinsic as a overloaded (but not scalar) operand.
Differential Revision: https://reviews.llvm.org/D124358
When looking for memory uses,
reassociationCanBreakAddressingModePattern should check uses of
the outer ADD rather than the inner ADD. We want to know if the
two ops we're reassociating are used by a load/store.
In practice, the existing check usually works because CodeGenPrepare
will make one of the load/stores have an offset of 0 relative to
split GEP. That will make the inner add have a memory use.
To test this, I've manually split the GEPs so there is no 0 offset
store.
This issue was recently discussed in the original review D60294.
Reviewed By: luismarques
Differential Revision: https://reviews.llvm.org/D124644
SIGN_EXTEND_INREG expansion can trigger a TypeSize error because
"VT.getSizeInBits() == 1" is used to detect for a boolean without
first verifying VT is a scalar.
We try to match as a disguised rotate by constant of these forms
(shl (X | Y), C1) | (srl X, C2) --> (rotl X, C1) | (shl Y, C1)
(shl X, C1) | (srl (X | Y), C2) --> (rotl X, C1) | (srl Y, C2)
We may have also looked through an AND to find the shift. If we
did, we need to apply a mask to the result.
I'll add an AArch64 test and pre-commit it and the RISC-V test
tomorrow.
Fixes PR55201.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D124711
Fixed "private field is not used" warning when compiled
with clang.
original commit: 28d09bbbc3
reverted in: fa49021c68
------
This patch permits Swing Modulo Scheduling for ARM targets
turns it on by default for the Cortex-M7. The t2Bcc
instruction is recognized as a loop-ending branch.
MachinePipeliner is extended by adding support for
"unpipelineable" instructions. These instructions are
those which contribute to the loop exit test; in the SMS
papers they are removed before creating the dependence graph
and then inserted into the final schedule of the kernel and
prologues. Support for these instructions was not previously
necessary because current targets supporting SMS have only
supported it for hardware loop branches, which have no
loop-exit-contributing instructions in the loop body.
The current structure of the MachinePipeliner makes it difficult
to remove/exclude these instructions from the dependence graph.
Therefore, this patch leaves them in the graph, but adds a
"normalization" method which moves them in the schedule to
stage 0, which causes them to appear properly in kernel and
prologues.
It was also necessary to be more careful about boundary nodes
when iterating across successors in the dependence graph because
the loop exit branch is now a non-artificial successor to
instructions in the graph. In additional, schedules with physical
use/def pairs in the same cycle should be treated as creating an
invalid schedule because the scheduling logic doesn't respect
physical register dependence once scheduled to the same cycle.
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D122672
When looking through extends of gather/scatter indices it's safe
to convert a known positive signed index to unsigned, but unsigned
indices must remain unsigned.
Depends On D123318
Differential Revision: https://reviews.llvm.org/D123326
This is an alternative to D124530. In getUniformBase() only create
scales that match the gather/scatter element size. If targets also
support other scales, then they can produce those scales in target
DAG combines. This is what X86 already does (as long as the
resulting scale would be 1, 2, 4 or 8).
This essentially restores the pre-opaque-pointer state of things.
Fixes https://github.com/llvm/llvm-project/issues/55021.
Differential Revision: https://reviews.llvm.org/D124605
refineUniformBase and selectGatherScatterAddrMode both attempt the
transformation:
base(0) + index(A+splat(B)) => base(B) + index(A)
However, this is only safe when index is not implicitly scaled.
Differential Revision: https://reviews.llvm.org/D123222
PowerPC supports `ppc_fp128`, which is not an IEEE floating point
type. The generic lowering of llvm.is_fpclass could not handle it
properly. This change extends the generic lowering code to
support `ppc_fp128`.
The change was tested on emulator using runtime tests from
https://reviews.llvm.org/D112933 and the patch for clang
https://reviews.llvm.org/D112932.
Differential Revision: https://reviews.llvm.org/D113908
This reverts commit a15b66e76d.
This causes linker to crash at assertion: `Assertion failed: !Expr->isComplex(), file C:\b\s\w\ir\cache\builder\src\third_party\llvm\llvm\lib\CodeGen\LiveDebugValues\InstrRefBasedImpl.cpp, line 907`.
This patch permits Swing Modulo Scheduling for ARM targets
turns it on by default for the Cortex-M7. The t2Bcc
instruction is recognized as a loop-ending branch.
MachinePipeliner is extended by adding support for
"unpipelineable" instructions. These instructions are
those which contribute to the loop exit test; in the SMS
papers they are removed before creating the dependence graph
and then inserted into the final schedule of the kernel and
prologues. Support for these instructions was not previously
necessary because current targets supporting SMS have only
supported it for hardware loop branches, which have no
loop-exit-contributing instructions in the loop body.
The current structure of the MachinePipeliner makes it difficult
to remove/exclude these instructions from the dependence graph.
Therefore, this patch leaves them in the graph, but adds a
"normalization" method which moves them in the schedule to
stage 0, which causes them to appear properly in kernel and
prologues.
It was also necessary to be more careful about boundary nodes
when iterating across successors in the dependence graph because
the loop exit branch is now a non-artificial successor to
instructions in the graph. In additional, schedules with physical
use/def pairs in the same cycle should be treated as creating an
invalid schedule because the scheduling logic doesn't respect
physical register dependence once scheduled to the same cycle.
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D122672