WidenVecOp_INSERT_SUBVECTOR only supported cases where widening
effectively converts the insert into a copy. However, when the
widened subvector is no bigger than the vector being inserted into
and we can be sure there's no loss of data, we can simply emit
another INSERT_SUBVECTOR.
Fixes: #54982
Differential Revision: https://reviews.llvm.org/D127508
MinRCSize is 4 and prevents constrainRegClass from changing the
register class if the new class has size less than 4.
IMPLICIT_DEF gets a unique vreg for each use and will be removed
by the ProcessImplicitDef pass before register allocation. I don't
think there is any reason to prevent constraining the virtual register
to whatever register class the use needs.
The attached test case was previously creating a copy of IMPLICIT_DEF
because vrm8nov0 has 3 registers in it.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D128005
This helps handling a case where the BUILD_VECTOR has i16 element type
and i32 constant operands
t2: v8i16 = setcc t8, t17, setult:ch
t3: v8i16 = BUILD_VECTOR Constant:i32<1>, ...
t4: v8i16 = and t2, t3
t5: v8i16 = add t8, t4
This can be turned into t5: v8i16 = sub t8, t2, and allows us to remove
t3 and t4 from the DAG.
Differential Revision: https://reviews.llvm.org/D127354
This reverts commit 7207373e1e.
We found another RISC-V bug when landing D126048, and it has been fixed
by D127642 now.
Differential Revision: https://reviews.llvm.org/D126048
This is needed by our downstream and makes bf16 and f16 have the
same set of scalable vector types.
Reviewed By: rui.zhang
Differential Revision: https://reviews.llvm.org/D127877
The use operand may be undefined. In that case we can just continue to
check the next operand since it won't increase register pressure.
Differential Revision: https://reviews.llvm.org/D127848
This is modeled after the half-precision fp support. Two new nodes are
introduced for casting from and to bf16. Since casting from bf16 is a
simple operation I opted to always directly lower it to integer
arithmetic. The other way round is more complicated if you want to
preserve IEEE semantics, so it's handled by a new __truncsfbf2
compiler-rt builtin.
This is of course very bare bones, but sufficient to get a semi-softened
fadd on x86.
Possible future improvements:
- Targets with bf16 conversion instructions can now make fp_to_bf16 legal
- The software conversion to bf16 can be replaced by a trivial
implementation under fast math.
Differential Revision: https://reviews.llvm.org/D126953
When compiling for the RWPI relocation model [1], the debug information
is wrong for readonly global variables.
Writable global variables are accessed by the static base register (R9
on ARM) in the RWPI relocation model. This is being correctly generated
Readonly global variables are not accessed by the static base register
in the RWPI relocation model. This case is incorrectly generating the
same debugging information as for writable global variables.
References:
[1] ARM Read-Write Position Independence: https://github.com/ARM-software/abi-aa/blob/main/aapcs32/aapcs32.rst#read-write-position-independence-rwpi
Differential Revision: https://reviews.llvm.org/D126361
GetValueInMiddleOfBlock uses result of GetValueAtEndOfBlockInternal if there is no value
defined for current basic block.
If there is already a value it tries (in this order):
to find single register coming from all predecessors
find existing phi node which matches our incoming registers
build new phi.
The compile time improvement is to use current available value if
it is defined out of current BB or it is a PHI register.
This is due to it can be used in the middle basic block.
Reviewed By: sameerds
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D126523
When spilling CSRs, FixupStatepoint pass does simple copy propagation,
trying to find COPY instruction which defines register being spilled
and spill COPY source instead. I.e., if we have CSR $x and found
$x = COPY $y
we will spill $y instead.
But we may be unable to delete COPY instruction for some reason.
Then, spill will be inserted after it, adding another use of $y.
If COPY instruction was last use of $y (killed it), after insertion of
the spill it is not, so `isKill` flag must be cleared. We failed to do
so and this patch fixes this issue.
Reviewed By: skatkov
Differential Revision: https://reviews.llvm.org/D127308
This is a fix for https://github.com/llvm/llvm-project/issues/55827.
When register we are trying to re-color is split the original register (we tried to recover)
has no uses after the split. However in rollback actions we assign back physical register to it.
Later it causes different assertions. One of them is in attached test.
This CL fixes this by avoiding assigning physical register back to register which has no usage
or its live interval now is empty.
Reviewed By: arsenm, qcolombet
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D127281
The last of getEvictor use was removed on Jun 5, 2022 in commit
5c06f7168f, which was itself a patch to
remove unused code.
Once we remove getEvictor, EvictionTrack becomes a write-only data
structure. The data in it won't affect compilation, so the entire
class is essentially dead.
Another issue unearthed by D127115
We take a long time to canonicalize an insert_vector_elt chain before being able to convert it into a build_vector - even if they are already in ascending insertion order, we fold the nodes one at a time into the build_vector 'seed', leaving plenty of time for other folds to alter it (in particular recognising when they come from extract_vector_elt resulting in a shuffle_vector that is much harder to fold with).
D127115 makes this particularly difficult as we're almost guaranteed to have the lost the sequence before all possible insertions have been folded.
This patch proposes to begin at the last insertion and attempt to collect all the (oneuse) insertions right away and create the build_vector before its too late.
Differential Revision: https://reviews.llvm.org/D127595
This patch allows SimplifyDemandedBits to call SimplifyMultipleUseDemandedBits in cases where the source operand has other uses, enabling us to peek through the shifted value if we don't demand all the bits/elts.
This helps with several of the regressions from D125836
Previously, omitting unnecessary DWARF unwinds was only done in two
cases:
* For Darwin + aarch64, if no DWARF unwind info is needed for all the
functions in a TU, then the `__eh_frame` section would be omitted
entirely. If any one function needed DWARF unwind, then MC would emit
DWARF unwind entries for all the functions in the TU.
* For watchOS, MC would omit DWARF unwind on a per-function basis, as
long as compact unwind was available for that function.
This diff makes it so that we omit DWARF unwind on a per-function basis
for Darwin + aarch64 as well. In addition, we introduce the flag
`--emit-dwarf-unwind=` which can toggle between `always`,
`no-compact-unwind` (only emit DWARF when CU cannot be emitted for a
given function), and the target platform `default`. `no-compact-unwind`
is particularly useful for newer x86_64 platforms: we don't want to omit
DWARF unwind for x86_64 in general due to possible backwards compat
issues, but we should make it possible for people to opt into this
behavior if they are only targeting newer platforms.
**Motivation:** I'm working on adding support for `__eh_frame` to LLD,
but I'm concerned that we would suffer a perf hit. Processing compact
unwind is already expensive, and that's a simpler format than EH frames.
Given that MC currently produces one EH frame entry for every compact
unwind entry, I don't think processing them will be cheap. I tried to do
something clever on LLD's end to drop the unnecessary EH frames at parse
time, but this made the code significantly more complex. So I'm looking
at fixing this at the MC level instead.
**Addendum:** It turns out that there was a latent bug in the X86
backend when `OmitDwarfIfHaveCompactUnwind` is naively enabled, which is
not too surprising given that this combination has not been heretofore
used.
For functions that have unwind info that cannot be encoded with CU, MC
would end up dropping both the compact unwind entry (OK; existing
behavior) as well as the DWARF entries (not OK). This diff fixes things
so that we emit the DWARF entry, as well as a CU entry with encoding
`UNWIND_X86_MODE_DWARF` -- this basically tells the unwinder to look for
the DWARF entry. I'm not 100% sure the `UNWIND_X86_MODE_DWARF` CU entry
is necessary, this was the simplest fix. ld64 seems to be able to handle
both the absence and presence of this CU entry. Ultimately ld64 (and
LLD) will synthesize `UNWIND_X86_MODE_DWARF` if it is absent, so there
is no impact to the final binary size.
Reviewed By: davide, lhames
Differential Revision: https://reviews.llvm.org/D122258
1. When checking if a candidate contains a CFI instruction, actually
iterate over all of the instructions, instead of stopping halfway
through.
2. Make sure copied CFI directives refer to the correct instruction.
Fixes https://github.com/llvm/llvm-project/issues/55842
Differential Revision: https://reviews.llvm.org/D126930