We already have reassociation code for Adds and Ors separately in DAG
combiner, this adds it for the combination of the two where Ors act like
Adds. It reassociates (add (or (x, c), y) -> (add (add (x, y), c)) where
we know that the Ors operands have no common bits set, and the Or has
one use.
Differential Revision: https://reviews.llvm.org/D104765
This applies to memory accesses to (compile-time) constant addresses
(such as memory-mapped registers). Currently when a misaligned access
to such an address is detected, a fatal error is reported. This change
will emit a remark, and the compilation will continue with a trap,
and "undef" (for loads) emitted.
This fixes https://llvm.org/PR50838.
Differential Revision: https://reviews.llvm.org/D50524
This will currently accept the old number of bytes syntax, and convert
it to a scalar. This should be removed in the near future (I think I
converted all of the tests already, but likely missed a few).
Not sure what the exact syntax and policy should be. We can continue
printing the number of bytes for non-generic instructions to avoid
test churn and only allow non-scalar types for generic instructions.
This will currently print the LLT in parentheses, but accept parsing
the existing integers and implicitly converting to scalar. The
parentheses are a bit ugly, but the parser logic seems unable to deal
without either parentheses or some keyword to indicate the start of a
type.
Based ontop of D104598, which is a NFCI-ish refactoring.
Here, a restriction, that only empty blocks can be merged, is lifted.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D104597
These all (and some others) are being affected by D104597,
but they are manually-written, which rather complicates
checking the effect that change has on them.
This patch extends the SelectionDAG's ability to constant-fold vector
arithmetic to include support for SPLAT_VECTOR. This is not only for
scalable-vector types but also for fixed-length vector types, which
helps Hexagon in a couple of cases.
The original RISC-V test case was in fact an infinite DAGCombine loop.
The pattern `and (truncate v1), (truncate v2)` can be combined to
`truncate (and v1, v2)` but the truncate can similarly be combined back
to `truncate (and v1, v2)` (but, crucially, only when one of `v1` or
`v2` is a constant vector).
It wasn't exposed in on fixed-length types because a TRUNCATE of a
constant BUILD_VECTOR was folded into the BUILD_VECTOR itself, whereas
this did not happen for the equivalent (scalable-vector) SPLAT_VECTOR.
Reviewed By: RKSimon, craig.topper
Differential Revision: https://reviews.llvm.org/D103246
'not' expands to checking for an xor with a -1 constant. Since
this looks for a ConstantSDNode it will never match for a vector.
Co-authored-by: Craig Topper <craig.topper@sifive.com>
Differential Revision: https://reviews.llvm.org/D100687
Intrinsics for the following instructions are added. The intrinsic
name is "int_hexagon_<inst>[_128B]", e.g.
int_hexagon_V6_vL32b_pred_ai for 64-byte version
int_hexagon_V6_vL32b_pred_ai_128B for 128-byte version
V6_vL32b_pred_ai if (Pv4) Vd32 = vmem(Rt32+#s4)
V6_vL32b_pred_pi if (Pv4) Vd32 = vmem(Rx32++#s3)
V6_vL32b_pred_ppu if (Pv4) Vd32 = vmem(Rx32++Mu2)
V6_vL32b_npred_ai if (!Pv4) Vd32 = vmem(Rt32+#s4)
V6_vL32b_npred_pi if (!Pv4) Vd32 = vmem(Rx32++#s3)
V6_vL32b_npred_ppu if (!Pv4) Vd32 = vmem(Rx32++Mu2)
V6_vL32b_nt_pred_ai if (Pv4) Vd32 = vmem(Rt32+#s4):nt
V6_vL32b_nt_pred_pi if (Pv4) Vd32 = vmem(Rx32++#s3):nt
V6_vL32b_nt_pred_ppu if (Pv4) Vd32 = vmem(Rx32++Mu2):nt
V6_vL32b_nt_npred_ai if (!Pv4) Vd32 = vmem(Rt32+#s4):nt
V6_vL32b_nt_npred_pi if (!Pv4) Vd32 = vmem(Rx32++#s3):nt
V6_vL32b_nt_npred_ppu if (!Pv4) Vd32 = vmem(Rx32++Mu2):nt
V6_vS32b_pred_ai if (Pv4) vmem(Rt32+#s4) = Vs32
V6_vS32b_pred_pi if (Pv4) vmem(Rx32++#s3) = Vs32
V6_vS32b_pred_ppu if (Pv4) vmem(Rx32++Mu2) = Vs32
V6_vS32b_npred_ai if (!Pv4) vmem(Rt32+#s4) = Vs32
V6_vS32b_npred_pi if (!Pv4) vmem(Rx32++#s3) = Vs32
V6_vS32b_npred_ppu if (!Pv4) vmem(Rx32++Mu2) = Vs32
V6_vS32Ub_pred_ai if (Pv4) vmemu(Rt32+#s4) = Vs32
V6_vS32Ub_pred_pi if (Pv4) vmemu(Rx32++#s3) = Vs32
V6_vS32Ub_pred_ppu if (Pv4) vmemu(Rx32++Mu2) = Vs32
V6_vS32Ub_npred_ai if (!Pv4) vmemu(Rt32+#s4) = Vs32
V6_vS32Ub_npred_pi if (!Pv4) vmemu(Rx32++#s3) = Vs32
V6_vS32Ub_npred_ppu if (!Pv4) vmemu(Rx32++Mu2) = Vs32
V6_vS32b_nt_pred_ai if (Pv4) vmem(Rt32+#s4):nt = Vs32
V6_vS32b_nt_pred_pi if (Pv4) vmem(Rx32++#s3):nt = Vs32
V6_vS32b_nt_pred_ppu if (Pv4) vmem(Rx32++Mu2):nt = Vs32
V6_vS32b_nt_npred_ai if (!Pv4) vmem(Rt32+#s4):nt = Vs32
V6_vS32b_nt_npred_pi if (!Pv4) vmem(Rx32++#s3):nt = Vs32
V6_vS32b_nt_npred_ppu if (!Pv4) vmem(Rx32++Mu2):nt = Vs32
LLVM test CodeGen/Hexagon/hwloop3.ll tries to check for the absence of a
sequence of consecutive instructions with several CHECK-NOT with one of
those directives using a variable defined in another. However CHECK-NOT
are checked independently so that is using a variable defined in a
pattern that should not occur in the input.
This commit merges the two CHECK-NOT into a single CHECK-NOT that
matches the content of two successive non-blank lines, thereby allowing
to preserve the intent of the test.
Reviewed By: bcahoon
Differential Revision: https://reviews.llvm.org/D99778
The offset in HVX loads/stores is only 4 bits long, so often an
extra register is needed to hold the address. Minimize the number
of such registers by "standardizing" the base addresses and reusing
preexisting base registers when replacing frame indices.
We tend to assume that the AA pipeline is by default the default AA
pipeline and it's confusing when it's empty instead.
PR48779
Initially reverted due to BasicAA running analyses in an unspecified
order (multiple function calls as parameters), fixed by fetching
analyses before the call to construct BasicAA.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D95117
We tend to assume that the AA pipeline is by default the default AA
pipeline and it's confusing when it's empty instead.
PR48779
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D95117
The Hexagon Vector Combine pass genertes stores for a complete
aligned vector. The start of each section is a multiple of the
vector size, so that value is passed to normalize to compute
the offset of the stores in the section. The first store may
not occur at offset 0 when there is a gap between sections.
The result cannot be widened, unfortunately, because widening vNi1
would depend on the context in which it appears (i.e. the type alone
is not sufficient to tell if it needs to be widened).
Once the default for SimplifyCFG flips, we can no longer pass nullptr
instead of DomTree to SimplifyCFG, so we need to propagate it here.
We don't strictly need to actually preserve DomTree in DwarfEHPrepare,
but we might as well do it, since it's trivial.
AlignVectors treats all loaded/stored values as vectors of bytes,
and masks as corresponding vectors of booleans, so make getMask
produce a 1-element vector for scalars from the start.
Followup to D92112 now that I've learnt about HVX type splitting.
This is some necessary cleanup work for min/max ops to eventually help us move the add/sub sat patterns into DAGCombine - D91876.
Differential Revision: https://reviews.llvm.org/D92169
This should handle the basic integer min/max handling - the HVX ops are still TODO.
This is some necessary cleanup work for min/max ops to eventually help us move the add/sub sat patterns into DAGCombine - D91876.
Differential Revision: https://reviews.llvm.org/D92112