This patch adds 3 methods, one of which handles power-of-2 vectors by building
tree reductions from vector ops before a final reduction op. For non-power-of-2
types, it generates multiple narrow reductions and combines the values with
scalar ops.
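A minimal sketch of the two strategies, written in Python rather than the
legalizer's C++. The names `tree_reduce` and `split_reduce`, and the greedy
split into power-of-2 pieces, are illustrative assumptions and not the patch's
actual code. The reordering is only valid for associative, commutative ops
such as add, min and max.

```python
from functools import reduce


def tree_reduce(vec, op):
    """Power-of-2 case: halve the vector with lane-wise ops, then finish
    with one reduction over the remaining narrow vector."""
    n = len(vec)
    assert n > 0 and n & (n - 1) == 0, "expects a power-of-2 element count"
    while len(vec) > 2:
        half = len(vec) // 2
        # One "vector op": combine the low half with the high half per lane.
        vec = [op(lo, hi) for lo, hi in zip(vec[:half], vec[half:])]
    return reduce(op, vec)  # final reduction op


def split_reduce(vec, op):
    """Non-power-of-2 case: reduce power-of-2 pieces separately, then
    combine the per-piece results with scalar ops."""
    partials = []
    i = 0
    while i < len(vec):
        remaining = len(vec) - i
        width = 1 << (remaining.bit_length() - 1)  # largest power of 2 that fits
        partials.append(tree_reduce(vec[i:i + width], op))
        i += width
    return reduce(op, partials)  # scalar combine
```

For example, a 3-element vector is handled here as one 2-element reduction
plus one 1-element piece, joined by a single scalar op.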
Differential Revision: https://reviews.llvm.org/D97163
For imported pattern purposes, we have a custom rule that promotes the rotate
amount to 64b as well.
Differential Revision: https://reviews.llvm.org/D99463
Darwin platforms for both AArch64 and X86 can provide optimized `bzero()`
routines. In this case, it may be preferable to use `bzero` in place of a
memset of 0.
This adds a G_BZERO generic opcode, similar to G_MEMSET et al. This opcode can
be generated by platforms which may want to use bzero.
To emit the G_BZERO, this adds a pre-legalize combine for AArch64. The
conditions for this are largely a port of the bzero case in
`AArch64SelectionDAGInfo::EmitTargetCodeForMemset`.
The only difference in comparison to the SelectionDAG code is that, when
compiling for minsize, this will fire for all memsets of 0. The original code
notes that it's not beneficial to do this for small memsets; however, using
bzero here will save a mov from wzr. For minsize, I think that it's preferable
to prioritise omitting the mov.
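The conditions described above can be sketched roughly as follows. This is a
Python model, not the combine itself: `should_use_bzero` is a hypothetical
name, and the `small_threshold` default of 256 bytes is an assumed cutoff for
what counts as a "small" memset.

```python
def should_use_bzero(value, size, minsize, small_threshold=256):
    """Return True if memset(ptr, value, size) should become a bzero call.

    `size` is an int for a known constant length, or None when unknown.
    `small_threshold` is an assumed cutoff for "small" memsets.
    """
    if value != 0:
        return False  # bzero can only write zeroes
    if minsize:
        return True   # always fire: saves the mov from wzr
    if size is None:
        return True   # unknown length: not provably small
    return size > small_threshold
```

The only branch that differs from the SelectionDAG behaviour described above
is the `minsize` one, which ignores the size entirely.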
This also fixes a bug in the libcall legalization code which would delete
instructions which could not be legalized. It also adds a check to make sure
that we actually get a libcall name.
Code size improvements (Darwin):
- CTMark -Os: -0.0% geomean (-0.1% on pairlocalalign)
- CTMark -Oz: -0.2% geomean (-0.5% on bullet)
Differential Revision: https://reviews.llvm.org/D99358
This adds some missing legalizer tests, which uncovered a v2s64 selection
test that wasn't working since there's no legalization or instruction for that.
This reverts commit 962b73dd0f.
This commit was reverted because of some internal SPEC test failures.
It turns out that this wasn't actually relevant to anything in open source, so
it's safe to recommit this.
Remove a rule which allows larger scalar types than the destination vector
element type.
This appears to be irrelevant now that we have G_BUILD_VECTOR_TRUNC. Plus,
making a G_BUILD_VECTOR which satisfies this introduces a verifier failure
anyway.
Differential Revision: https://reviews.llvm.org/D97727
This pretty much just ports `performGlobalAddressCombine` from
AArch64ISelLowering. (AArch64 doesn't use the generic DAG combine for this.)
This adds a pre-legalize combine which looks for this pattern:
```
%g = G_GLOBAL_VALUE @x
%ptr1 = G_PTR_ADD %g, cst1
%ptr2 = G_PTR_ADD %g, cst2
...
%ptrN = G_PTR_ADD %g, cstN
```
And then, if possible, transforms it like so:
```
%offset_g = G_GLOBAL_VALUE @x + min(cst)
%g = G_PTR_ADD %offset_g, -min(cst)
%ptr1 = G_PTR_ADD %g, cst1
%ptr2 = G_PTR_ADD %g, cst2
...
%ptrN = G_PTR_ADD %g, cstN
```
Where min(cst) is the smallest out of the G_PTR_ADD constants.
This means we should save at least one G_PTR_ADD.
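A small Python model of the arithmetic behind this combine, assuming the
smallest constant is folded into the global's own offset and compensated for
by a single G_PTR_ADD of -min(cst). The function name and the use of plain
integers for addresses are illustrative only.

```python
def fold_global_offset(base, offsets):
    """Model G_GLOBAL_VALUE @x as the integer address `base`.

    Returns (offset_g, g, ptrs): the global with min(cst) folded in, the
    original address rebuilt with one add of -min(cst), and the N pointers.
    """
    min_cst = min(offsets)
    offset_g = base + min_cst            # global value with min(cst) folded in
    g = offset_g - min_cst               # G_PTR_ADD %offset_g, -min(cst)
    ptrs = [g + cst for cst in offsets]  # the original G_PTR_ADDs, unchanged
    return offset_g, g, ptrs
```

In this model every pointer keeps its original address, and the pointer that
used min(cst) is equal to `offset_g` itself, which is why its G_PTR_ADD can be
folded away.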
This also updates code in the legalizer + selector which assumes that
G_GLOBAL_VALUE will never have an offset and adds/updates relevant tests.
Differential Revision: https://reviews.llvm.org/D96624
AArch64 can store 128-bit-wide values using the q registers. GlobalISel was
instead clamping the number of elements in vector stores so that each store
was only 64 bits wide.
This results in some poor codegen like below:
https://godbolt.org/z/E56dq8
```
; SDAG uses a stp + q registers in both cases here.
define void @float(<16 x float> %val, <16 x float>* %ptr) {
store <16 x float> %val, <16 x float>* %ptr
ret void
}
define void @double(<8 x double> %val, <8 x double>* %ptr) {
store <8 x double> %val, <8 x double>* %ptr
ret void
}
```
This adds similar legalization for vector stores with s8 and s16 elements.
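To illustrate the difference the clamp makes, here is a rough Python sketch of
splitting a wide vector store into register-sized pieces. `split_store_types`
and its `max_bits` parameter are hypothetical, and the sketch only handles
vectors that split evenly.

```python
def split_store_types(num_elts, elt_bits, max_bits=128):
    """Return the (num_elts, elt_bits) pieces a wide vector store splits into."""
    total = num_elts * elt_bits
    if total <= max_bits:
        return [(num_elts, elt_bits)]  # already fits in one register
    per_piece = max_bits // elt_bits
    assert num_elts % per_piece == 0, "sketch only handles even splits"
    return [(per_piece, elt_bits)] * (num_elts // per_piece)
```

With a 128-bit limit, `<16 x float>` becomes four `<4 x float>` stores; with
the old 64-bit clamp (`max_bits=64`) it would be eight `<2 x float>` stores.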
Differential Revision: https://reviews.llvm.org/D95107
This makes G_SADDE and G_SSUBE legal in preparation for further work
legalizing overflowing operations. It's fine that they don't have an
instruction selector implementation yet, because G_UADDE and G_USUBE are
already legal on AArch64 without an instruction selector implementation. This
completes the set of G_[SU]{ADD,SUB}[EO] operations on AArch64.
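For reference, the G_[SU]{ADD,SUB}[EO] shorthand expands to eight opcode
names. A throwaway Python helper (the function name is made up for this note)
that spells them out:

```python
from itertools import product


def overflow_opcodes():
    """Expand the G_[SU]{ADD,SUB}[EO] shorthand into opcode names."""
    return ["G_" + sign + op + suffix
            for sign, op, suffix in product("SU", ("ADD", "SUB"), "EO")]
```

The result includes G_SADDE and G_SSUBE, the two opcodes this change makes
legal, alongside G_UADDE and G_USUBE.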
Reviewed By: paquette
Differential Revision: https://reviews.llvm.org/D95325
Add support for G_FCONSTANT of FP128 (Quadruple precision) type.
It replaces the constant by emitting a load with a constant pool entry.
Reviewed By: aemerson
Differential Revision: https://reviews.llvm.org/D94437
Returning int64_t was arbitrarily limiting for wide integer types, and
the functions should handle the full generality of the IR.
This also changes the full form, which returns the originally defined
vreg. Add another wrapper for the common case of immediately
converting to int64_t (arguably this would be useful for the full
return value case as well).
One possible issue with this change is some of the existing uses did
break without conversion to getConstantVRegSExtVal, and it's possible
some without adequate test coverage are now broken.
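A short Python illustration of why an int64_t return value is limiting for
wide integers. Both helpers are hypothetical stand-ins, and returning None for
constants wider than 64 bits is an assumption about how such a wrapper could
behave, not a statement about the real function.

```python
def sext_to_int64(value, bit_width):
    """Sign-extend a bit_width-wide constant, or return None when the
    constant is wider than 64 bits (assumed behaviour)."""
    if bit_width > 64:
        return None
    value &= (1 << bit_width) - 1
    sign_bit = 1 << (bit_width - 1)
    return (value ^ sign_bit) - sign_bit


def truncate_to_int64(value):
    """What forcing any constant into an int64_t does: keep the low 64 bits."""
    return sext_to_int64(value, 64)
```

A 128-bit constant such as `1 << 100` truncates to 0 when forced into 64 bits,
which is the kind of silent loss an arbitrary-width return type avoids.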
The lowering of vector selects needs to first splat the scalar mask into a
vector.
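A rough Python sketch of that lowering, assuming the select is then expanded
into lane-wise bitwise ops. The function name and the `elt_bits` parameter are
illustrative, and lanes are modelled as unsigned integers.

```python
def lower_vector_select(cond, true_vec, false_vec, elt_bits=32):
    """Select between two integer vectors using a single scalar condition."""
    assert len(true_vec) == len(false_vec)
    all_ones = (1 << elt_bits) - 1
    # Step 1: splat the scalar condition into a full-width per-lane mask.
    mask = [all_ones if cond else 0] * len(true_vec)
    # Step 2: bitwise select, lane by lane: (t & m) | (f & ~m).
    return [(t & m) | (f & ~m & all_ones)
            for t, f, m in zip(true_vec, false_vec, mask)]
```

Without step 1 the mask would have a different shape from the two operands,
so the lane-wise ops in step 2 would not be well formed.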
This was causing a crash when building oggenc in the test suite.
Differential Revision: https://reviews.llvm.org/D91655
When there is full fp16 support, there is no reason to widen 16-bit
G_FCONSTANTs to 32 bits. Mark them as legal in this case.
Also, we currently import a pattern for materializing a 16-bit 0.0.
Add a testcase showing we select it.
(All other 16-bit G_FCONSTANTs are not yet selected.)
Differential Revision: https://reviews.llvm.org/D89164
NEON is pretty limited in its reduction support. As a first step, add some
basic rules for the legal types we can select.
Differential Revision: https://reviews.llvm.org/D89070
Truncating to v8i8 is a case where we want to split the source but also generate
intermediate truncates to reduce the size of the source vector before truncating
down to v8i8. This implements the same strategy as SelectionDAG, but
I'm not certain where, if anywhere, in generic code it should live.
Use it for legalization of v8s8 = G_ICMP v8s32.
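One way to picture the strategy, as a Python sketch over lists of lane values.
The exact sequence of intermediate types (v4s32 to v4s16, then v8s16 to v8s8)
is an assumption made for illustration; the helper names are made up.

```python
def trunc_lanes(vec, bits):
    """Truncate every lane to its low `bits` bits."""
    return [x & ((1 << bits) - 1) for x in vec]


def trunc_v8s32_to_v8s8(vec):
    """Truncate eight 32-bit lanes to eight 8-bit lanes in two stages."""
    assert len(vec) == 8
    lo, hi = vec[:4], vec[4:]       # split the source: 2 x v4s32
    lo16 = trunc_lanes(lo, 16)      # intermediate truncate: v4s32 -> v4s16
    hi16 = trunc_lanes(hi, 16)
    v8s16 = lo16 + hi16             # concatenate into a single v8s16
    return trunc_lanes(v8s16, 8)    # final truncate: v8s16 -> v8s8
```

The staged version gives the same lanes as a direct truncate; the point of the
intermediate step is that each stage operates on vectors of a legal width.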
Differential Revision: https://reviews.llvm.org/D88191
Also use this opportunity to start cleaning up the mess of vector type lists we
have in the LegalizerInfo. Unfortunately, since the legalizer rule builders require
std::initializer_list objects as parameters, we can't programmatically generate the
type lists.
This was supposed to be done in the first place, as is currently the case for
G_ASHR and G_LSHR, but was forgotten when the original shift legalization
overhaul was done last year.
This was exposed because we started falling back on s32 = s32, s64 SHLs
due to a recent combiner change.
Gives a very minor (0.1%) -O0 code size improvement on consumer-typeset.