When using a ptrtoint to a size larger than the pointer width in a
global initializer, we currently create a ptr & low_bit_mask style
MCExpr, which will later result in a relocation error during object
file emission.
This patch rejects the constant expression already during
lowerConstant(), which results in a much clearer error message
that references the constant expression at fault.
This fixes https://github.com/llvm/llvm-project/issues/56400,
for certain definitions of "fix".
Differential Revision: https://reviews.llvm.org/D130366
This would create a new interval missing the subrange and hit this
verifier error:
```
*** Bad machine code: Live interval for subreg operand has no subranges ***
- function: test_remat_subreg_def
- basic block: %bb.0 (0xa568758) [0B;128B)
- instruction: 32B dead undef %4.sub0:vreg_64 = V_MOV_B32_e32 2, implicit $exec
```
We still haven't found a solution that correctly handles 'don't care' sub elements - given how close we are to the next release branch, I'm making this fail-safe change and we can revisit it later if we can't find alternatives.
NOTE: This isn't a reversion of D128570 - it's the removal of undef handling across bitcasts entirely.
Fixes #56520
llvm::sort is beneficial even when we use the iterator-based overload,
since it can optionally shuffle the elements (to detect
non-determinism). However, llvm::sort is not usable everywhere, for
example, in compiler-rt.
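Not part of the original commit, but as a minimal sketch of the two overloads in question (assuming only the usual ADT headers; with expensive checks enabled the range may be shuffled before sorting):
```
#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/SmallVector.h"

// Illustrative only: the two llvm::sort overloads. Both can shuffle the
// range before sorting when expensive checks are enabled, exposing
// comparators that depend on the incoming element order.
void sortExamples(llvm::SmallVector<int, 8> &Values) {
  // Range-based overload.
  llvm::sort(Values);

  // Iterator-based overload, e.g. to sort only a sub-range.
  llvm::sort(Values.begin(), Values.begin() + Values.size() / 2);
}
```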
Reviewed By: nhaehnle
Differential Revision: https://reviews.llvm.org/D130406
This will fix the SystemZ v3i31 memcpy regression in D77804 (with the help of D129765 as well...).
It should also allow us to /bend/ the oneuse limitation for cases where we can use demanded bits to safely peek though multiple uses of the AND ops.
As noticed on D127115, when splitting ADD/SUB nodes we often end up with cases where overflow from the lower bits is impossible - in such cases we're better off breaking the carry chain dependency as soon as possible.
This path is being exercised by llvm/test/CodeGen/ARM/dsp-mlal.ll, although I haven't been able to get any codegen diff without a topological worklist.
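As a hypothetical standalone illustration of why breaking the carry chain is safe (this is not the DAG combine itself): if the low halves of a split 64-bit add are known not to produce a carry, the high half no longer depends on the low-half computation:
```
#include <cassert>
#include <cstdint>

// Hypothetical example: bit 31 of both low halves is known clear, so the
// low-half add cannot overflow and the carry into the high half is zero.
int main() {
  uint64_t A = 0x1234567800000123ULL; // low half < 2^31
  uint64_t B = 0x0fedcba900000456ULL; // low half < 2^31

  uint32_t LoSum = uint32_t(A) + uint32_t(B);             // no overflow possible
  uint32_t HiSum = uint32_t(A >> 32) + uint32_t(B >> 32); // carry chain broken

  uint64_t Recombined = (uint64_t(HiSum) << 32) | LoSum;
  assert(Recombined == A + B);
  return 0;
}
```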
Concat KnownBits from ISD::SHL_PARTS / ISD::SRA_PARTS / ISD::SRL_PARTS lo/hi operands and perform the KnownBits calculation by the shift amount on the extended type, before splitting the KnownBits based on the requested lo/hi result.
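A rough standalone model of that order of operations (plain masks rather than the real KnownBits class; all names here are illustrative): concatenate the lo/hi known bits, apply the shift on the 64-bit value, then split back into halves:
```
#include <cstdint>

// Toy known-bits model: Zero/One are masks of bits known to be 0/1.
struct Known {
  uint64_t Zero, One;
};

// Concatenate the known bits of the 32-bit hi/lo halves into a 64-bit value.
Known concat(Known Hi, Known Lo) {
  return {(Hi.Zero << 32) | (Lo.Zero & 0xffffffff),
          (Hi.One << 32) | (Lo.One & 0xffffffff)};
}

// Known bits of a left shift by a constant amount (Amt < 64): everything
// moves up and the vacated low bits become known zero.
Known shl(Known K, unsigned Amt) {
  return {(K.Zero << Amt) | ((uint64_t(1) << Amt) - 1), K.One << Amt};
}

// Split the 64-bit known bits back into the requested lo/hi results.
Known lowHalf(Known K) { return {K.Zero & 0xffffffff, K.One & 0xffffffff}; }
Known highHalf(Known K) { return {K.Zero >> 32, K.One >> 32}; }

int main() {
  Known Lo{0x0000000f, 0}; // low 4 bits of the lo half known zero
  Known Hi{0, 0x80000000}; // top bit of the hi half known one
  Known Wide = shl(concat(Hi, Lo), 8);
  Known NewLo = lowHalf(Wide), NewHi = highHalf(Wide);
  (void)NewLo;
  (void)NewHi;
  return 0;
}
```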
This change adds a nop instruction if a section starts with a landing pad. This change is like [D73739](https://reviews.llvm.org/D73739) which avoids zero-offset landing pads in basic block sections.
Detailed description:
The current machine function splitter can create sections which themselves start with a landing pad. This places the landing pad at offset zero from LPStart.
```
.section .text.split.foo10,"ax",@progbits
foo10.cold: # %lpad
.cfi_startproc
.cfi_personality 3, __gxx_personality_v0
.cfi_lsda 3, .Lexception5
.cfi_def_cfa %rsp, 16
.Ltmp11: <--- This is a landing pad and also LPStart as it is the start of this section
movq %rax, %rdi <--- first instruction is at offset 0 from LPStart
callq _Unwind_Resume@PLT
```
This will cause landing pad entries to become zero (.Ltmp11-foo10.cold)
```
.Lcst_begin4:
.uleb128 .Ltmp9-.Lfunc_begin2 # >> Call Site 1 <<
.uleb128 .Ltmp10-.Ltmp9 # Call between .Ltmp9 and .Ltmp10
.uleb128 .Ltmp11-foo10.cold <--- This is zero # jumps to .Ltmp11
.byte 3 # On action: 2
.uleb128 .Ltmp10-.Lfunc_begin2 # >> Call Site 2 <<
.uleb128 .Lfunc_end9-.Ltmp10 # Call between .Ltmp10 and .Lfunc_end9
.byte 0 # has no landing pad
.byte 0 # On action: cleanup
.p2align 2
```
The C++ ABI assumes that no landing pad points directly to LPStart (which works in the normal case since the start of a function is never a landing pad), and uses LP.offset = 0 to specify that there is no landing pad. This change adds a nop instruction at the start of such sections so that this case is avoided. Output:
```
.section .text.split.foo10,"ax",@progbits
foo10.cold: # %lpad
.cfi_startproc
.cfi_personality 3, __gxx_personality_v0
.cfi_lsda 3, .Lexception5
.cfi_def_cfa %rsp, 16
nop <--- new instruction that is added
.Ltmp11:
movq %rax, %rdi
callq _Unwind_Resume@PLT
```
Reviewed By: modimo, snehasish, rahmanl
Differential Revision: https://reviews.llvm.org/D130133
We were looking for loads or any_extend+load. reduceLoadWidth
hasn't known how to look through such an any_extend to find the
load since D40667, almost 5 years ago.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D130333
Move this out of the switch, so that different branches can
indicate an error by breaking out of the switch. This becomes
important if there are more than the two current error cases.
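A minimal standalone sketch of that control-flow shape (the names are hypothetical, not the actual code): the shared error handling sits after the switch, so any case can reach it with a plain break:
```
#include <cstdio>
#include <optional>

// Hypothetical lowering helper: every unsupported case breaks out of the
// switch and falls into the single error path below it.
std::optional<int> lowerOp(int Opcode, int Value) {
  switch (Opcode) {
  case 0: // always supported
    return Value + 1;
  case 1: // supported only for small values
    if (Value < 16)
      return Value * 2;
    break; // unsupported form -> shared error path
  default:
    break; // unknown opcode -> same error path
  }
  std::fprintf(stderr, "unsupported operation %d\n", Opcode);
  return std::nullopt;
}

int main() {
  lowerOp(1, 100); // reports the error through the shared path
  return 0;
}
```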
Vector fptosi_sat and fptoui_sat were being expanded by unrolling the
vector operation. This doesn't work for scalable vectors, so this patch
adds a call to TLI.expandFP_TO_INT_SAT if the vector is scalable.
Scalable tests are added for AArch64 and RISCV. Some of the AArch64
fptoi_sat operations should be legal, but that will be handled in
another patch.
Differential Revision: https://reviews.llvm.org/D130028
Similar to what we already do in getNode for basic ADD/SUB nodes, return the X operand directly, but here we know that there will be no/zero overflow as well.
As noted on D127115 - this path is being exercised by llvm/test/CodeGen/ARM/dsp-mlal.ll, although I haven't been able to get any codegen diff without a topological worklist.
PromoteIntRes_BUILD_VECTOR currently always ANY_EXTENDs build vector operands, but if this is a constant boolean vector we're losing the useful ability to keep the vector matching the BooleanContents mode used by the target.
This patch extends constant boolean vectors according to target BooleanContents, allowing a number of additional all-bits folds (notably XOR -> NOT conversions) to occur.
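As a standalone illustration of the kind of fold this preserves (not the legalizer code itself): promoting a true boolean lane as all-ones keeps xor-with-true equal to a bitwise NOT on the promoted type, while an any-extended 1 loses that:
```
#include <cassert>
#include <cstdint>

int main() {
  uint32_t Lane = 0x12345678;

  uint32_t TrueAllOnes = 0xffffffffu; // sign-extended i1 true
  uint32_t TrueAnyExt = 0x00000001u;  // any-/zero-extended i1 true

  assert((Lane ^ TrueAllOnes) == ~Lane); // XOR -> NOT still applies
  assert((Lane ^ TrueAnyExt) != ~Lane);  // fold lost after any-extension
  return 0;
}
```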
Differential Revision: https://reviews.llvm.org/D129641
Add promotion and expansion of integer operands for
experimental_vp_strided SelectionDAG nodes; the expansion is actually
just a truncation of the stride operand.
Reviewed By: simoll
Differential Revision: https://reviews.llvm.org/D123112
When determining if an `and` should be merged into an extending load,
the constant argument to the `and` is currently not checked when it
requires truncation. This prevents the combine from happening when
the vector width is half the normal available vector width for SVE VLA
vectors.
Reviewed By: c-rhodes
Differential Revision: https://reviews.llvm.org/D129281
Unlike what the name suggests, this can reuse any store as a base for a
memory-based vector extract. If that store is underaligned, the loads
created to extract will have an invalid alignment. Since most CPUs are
forgiving wrt alignment this is almost never an issue; on x86 it is
only reproducible by extracting a 128-bit vector out of a wider vector.
I tried making a test case in the context of
https://reviews.llvm.org/D127982 but it's really really fragile, as the
output pretty much looks like a missed optimization.
The "xor (X >> ShiftC), XorC --> (not X) >> ShiftC" fold is currently limited to the XOR mask being a shifted all-bits mask, but we can relax this to only need to match under the demanded bits.
This helps expose more bit extraction/clearing patterns and fixes the PowerPC testCompares*.ll regressions from D127115
Alive2: https://alive2.llvm.org/ce/z/fl7T7K
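A quick standalone check of the base fold for one concrete case (the patch itself only relaxes how the mask is matched, via demanded bits):
```
#include <cassert>
#include <cstdint>

int main() {
  uint32_t X = 0x9abcdef0;
  unsigned ShiftC = 4;
  uint32_t XorC = 0xffffffffu >> ShiftC; // shifted all-bits mask

  // xor (X >> ShiftC), XorC --> (not X) >> ShiftC
  assert(((X >> ShiftC) ^ XorC) == (~X >> ShiftC));
  return 0;
}
```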
Differential Revision: https://reviews.llvm.org/D129933
This revision adds support for scalarizing a binary operation of two scalable splat vectors.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D122791
This was stored in LiveIntervals, but not actually used for anything
related to LiveIntervals. It was only used in one check for whether a
load instruction is rematerializable. I also don't think this was entirely
correct, since it was implicitly assuming constant loads are also
dereferenceable.
Remove this and rely only on the invariant+dereferenceable flags in
the memory operand. Set the flag based on the AA query upfront. This
should have the same net benefit, but has the possible disadvantage of
making this AA query nonlazy.
Preserve the behavior of assuming pointsToConstantMemory implying
dereferenceable for now, but maybe this should be changed.
r175673 changed repairIntervalsInRange to find anchoring end points for
ranges automatically, but the calculation of Begin included the first
instruction found that already had an index. This patch changes it to
exclude that instruction:
1. For symmetry, so that the half open range [Begin,End) only includes
instructions that do not already have indexes.
2. As a possible performance improvement, since repairOldRegInRange
will scan fewer instructions.
3. Because repairOldRegInRange hits assertion failures in some cases
when it sees a def that already has a live interval.
(3) fixes about ten tests in the CodeGen lit test suite when
-early-live-intervals is forced on.
Differential Revision: https://reviews.llvm.org/D110182
The DAG Combiner unnecessarily restricts commutative CSE
to nodes with a single result value. This commit removes
that restriction.
Signed-off-by: Itay Bookstein <ibookstein@gmail.com>
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D129666
Undef tokens may appear in unreachable code as a result of RAUW in some
optimization, and should not be considered bad IR.
Patch by Dmitry Bakunevich!
Differential Revision: https://reviews.llvm.org/D128904
Reviewed By: mkazantsev