There was no pattern to fold into these instructions. This patch adds
the pattern obtained from the following ACLE intrinsics so that they
generate sqdmlal/sqdmlsl instructions instead of separate sqdmull and
sqadd/sqsub instructions:
- vqdmlalh_s16, vqdmlslh_s16
- vqdmlalh_lane_s16, vqdmlalh_laneq_s16, vqdmlslh_lane_s16,
vqdmlslh_laneq_s16 (when the lane index is 0)
It also modifies the result of the existing pattern for the latter, when
the lane index is not 0, to use the v1i32_indexed instructions instead
of the v4i16_indexed ones.
Fixes#49997.
Differential Revision: https://reviews.llvm.org/D131700
There are two different senses in which a block can be "address-taken".
There can be a BlockAddress involved, which means we need to map the
IR-level value to some specific block of machine code. Or there can be
constructs inside a function which involve using the address of a basic
block to implement certain kinds of control flow.
Mixing these together causes a problem: if target-specific passes are
marking random blocks "address-taken", if we have a BlockAddress, we
can't actually tell which MachineBasicBlock corresponds to the
BlockAddress.
So split this into two separate bits: one for BlockAddress, and one for
the machine-specific bits.
Discovered while trying to sort out related stuff on D102817.
Differential Revision: https://reviews.llvm.org/D124697
Currenlty all temporal loads are mapped to `LDP` or `LDR`. This patch will map all the non temporal 256-bit loads into `LDNP`. Future patches should address other non-temporal loads.
Reviewed By: fhahn, dmgreen
Differential Revision: https://reviews.llvm.org/D131773
The outer signext_inreg is redundant in the following:
Fold (signext_inreg (extract_subvector (zext|anyext|sext iN_value to _) _) from iN)
-> (extract_subvector (signext iN_value to iM))
Tests are precommitted and clone those by analogy from the AND case in
the same file. Add a negative test to check extension width is handled
correctly.
This patch supersedes D130700.
Differential Revision: https://reviews.llvm.org/D131503
Add some test cases for D131773 where LDNP could be used as well as
negative tests.
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D131767
This reverts commit 38c2366b3f.
This patch seems to break boostraping LLVM with `-fglobal-isel -O3`
on AArch64 hardware. Without the revert, there are 500+ test
failures for the `check-llvm-codegen-x86` target.
A bfloat select operation will currently crash, but is allowed from C.
This adds handling for the operation, turning it into a FCSELHrrr if
fullfp16 is present, or converting it to a FCSELSrrr if not. The
FCSELSrrr is created via using INSERT_SUBREG/EXTRACT_SUBREG to convert
the bf16 to a f32 and using the f32 pattern for FCSELSrrr. (I originally
attempted to do this via a tablegen pattern, but it appears that the
nzcv glue is places onto the wrong node, causing it to be forgotten and
incorrect scheduling to be emitted).
The FCSELSrrr can also be used for fp16 selects when +fullfp16 is not
present, which helps avoid an unnecessary promotion to f32.
Differential Revision: https://reviews.llvm.org/D131253
Expand TypePromotion pass to try to promote PHI-nodes in loops that are the
operand of a ZExt, using the ZExt's result type to determine the Promote Width.
Differential Revision: https://reviews.llvm.org/D111237
This intrinsic used a typed pointer for a call target operand. This
change updates the operand to be an opaque pointer and updates all
pointers in all test files that use the intrinsic.
Differential revision: https://reviews.llvm.org/D131261
Currently fcopysign for VLS vectors lowers through NEON even when the
vector width is wider than a NEON vector, causing bad codegen as the
vectors are split. This patch causes SVE to be used for these vectors
instead, giving much better codegen on wide VLS vectors.
Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D128642
The register operand of DBG_VALUE is not selected to a proper register
bank in both AArch64 and X86. This would cause getRegClass crash after
global ISel. After discussion, we think the MIR should assume all
vritual register should be set proper register class after global ISel,
so this patch is to fix the gap of DBG_VALUE for AArch64 and X86.
Differential Revision: https://reviews.llvm.org/D129037
Add patterns to select predicated instructions when lowering:
fadd(a, select(mask, b, splat(0)))
fsub(a, select(mask, b, splat(0)))
'fadd' is unsafe unless no-signed zeros fast-math flag is set, since
-0.0 + 0.0 = 0.0
changes the sign. Alive2: https://alive2.llvm.org/ce/z/wbhJh_
Also adds FMA patterns for:
fadd(a, select(mask, mul(b, c), splat(0))) -> fmla(a, mask, b, c)
fsub(a, select(mask, mul(b, c), splat(0))) -> fmla(a, mask, b, c)
These patterns require the 'contract' fast-math flag to be set, and the
fadd 'nsz' as above.
Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D130564
NOTE: i8 vector splats are ignored because the immediate range of
DUP already has full coverage.
Differential Revision: https://reviews.llvm.org/D131078
This is a simple addition to emitConditionalComparison, to match CCMP
with immediates using getIConstantVRegValWithLookThrough, letting it
select the CCMPri variants of the instructions.
Differential Revision: https://reviews.llvm.org/D131073
The function `handleDebugValue` has custom logic to handle certain kinds
constants, namely integers, floats and null pointers. However, it does
not handle constant pointers created from IntToPtr ConstantExpressions.
This patch addresses the issue by replacing the Constant with its
integer operand.
A similar bug was addressed for GlobalISel in D130642.
Reviewed By: aprantl, #debug-info
Differential Revision: https://reviews.llvm.org/D130908
This patch ensures consistency in the construction of FP_ROUND nodes
such that they always use ISD::TargetConstant instead of ISD::Constant.
This additionally fixes a bug in the AArch64 SVE backend where patterns
were matching against TargetConstant nodes and sometimes failing when
passed a Constant node.
Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D130370
rGcf97e0ec42b8 makes $x18 to be treated as callee-saved in functions with
Windows calling convention on non-Windows OSes.
Here we mark $x18 as callee-saved for functions with Windows calling
convention on Darwin, as well as on other non-Windows platforms, in
order to prevent some miscompilations (like miscompilation of
win64cc-darwin-backup-x18.ll).
Since getCalleeSavedRegs doesn't return x18 in list of callee-saved
registers, assignCalleeSavedSpillSlots and determineCalleeSaves
consider different sets of registers as callee-saved. It causes an
error:
```
Assertion failed: ((!HasCalleeSavedStackSize || getCalleeSavedStackSize() == Size) && "Invalid size calculated for callee saves"), function getCalleeSavedStackSize, file
AArch64MachineFunctionInfo.h, line 292.
```
Differential Revision: https://reviews.llvm.org/D130676
This folds a v4i32 Mul(And(Srl(X, 15), 0x10001), 0xffff) into a v8i16
CMLTz instruction. The Srl and And extract the top bit (whether the
input is negative) and the Mul sets all values in the i16 half to all
1/0 depending on if that top bit was set. This is equivalent to a v8i16
CMLTz instruction. The same applies to other sizes with equivalent
constants.
Differential Revision: https://reviews.llvm.org/D130874
Eliminate an AND by redefining an anyext|sext|zext.
(and (extract_subvector (anyext|sext|zext v) _) iN_mask)
=> (extract_subvector (zeroext_iN v))
Differential Revision: https://reviews.llvm.org/D130782
Salvage debug info of instruction that is about to be deleted as dead in
Combiner pass. Currently supported instructions are COPY and G_TRUNC.
It allows to salvage debug info of some dead arguments of functions, by putting
DWARF expression corresponding to the instruction being deleted into related
DBG_VALUE instruction.
Here is an example of missing variables location https://godbolt.org/z/K48osb9dK.
We see that arguments x, y of function foo are not available in debugger, and
corresponding DBG_VALUE instructions have undefined register operand instead of
variables locaton after Aarch64PreLegalizerCombiner pass. The reason is that
registers where variables are located are removed as dead (with instruction
G_TRUNC). We can use salvageDebugInfo analogue for gMIR to preserve debug
locations of dead variables.
Statistics of llvm object files built with vs without this commit on -O2
optimization level (CMAKE_BUILD_TYPE=RelWithDebInfo, -fglobal-isel) on Aarch64 (macOS):
Number of variables with 100% of parent scope covered by DW_AT_location has been increased by 7,9%.
Number of variables with 0% coverage of parent scope has been decreased by 1,2%.
Number of variables processed by location statistics has been increased by 2,9%.
Average PC ranges coverage has been increased by 1,8 percentage points.
Coverage can be improved by supporting more instructions, or by calling
salvageDebugInfo for instructions that are deleted during Combiner rules exection.
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D129909
A build vector of two extracted elements is equivalent to an extract
subvector where the inner vector is any-extended to the
extract_vector_elt VT, because extract_vector_elt has the effect of an
any-extend.
(build_vector (extract_elt_i16_to_i32 vec Idx+0) (extract_elt_i16_to_i32 vec Idx+1))
=> (extract_subvector (anyext_i16_to_i32 vec) Idx)
Depends on D130697
Differential Revision: https://reviews.llvm.org/D130698
Currently, the LLVM IR -> MIR translator fails to translate dbg.values
whose first argument is a null pointer. However, in other portions of
the code, such pointers are always lowered to the constant zero, for
example see IRTranslator::Translate(Constant, Register).
This patch addresses the limitation by following the same approach of
lowering null pointers to zero.
A prior test was checking that null pointers were always lowered to
$noreg; this test is changed to check for zero, and the previous
behavior is now checked by introducing a dbg.value whose first argument
is the address of a global variable.
Differential Revision: https://reviews.llvm.org/D130721
This patch allows SimplifyDemandedBits to call SimplifyMultipleUseDemandedBits in cases where the ISD::SRL source operand has other uses, enabling us to peek through the shifted value if we don't demand all the bits/elts.
This is another step towards removing SelectionDAG::GetDemandedBits and just using TargetLowering::SimplifyMultipleUseDemandedBits.
There a few cases where we end up with extra register moves which I think we can accept in exchange for the increased ILP.
Differential Revision: https://reviews.llvm.org/D77804
Currently, the IR to MIR translator can only handle two kinds of constant
inputs to dbg.values intrinsics: constant integers and constant floats. In
particular, it cannot handle pointers created from IntToPtr ConstantExpression
objects.
This patch addresses the limitation above by replacing the IntToPtr with
its input integer prior to converting the dbg.value input.
Patch by Felipe Piovezan!
Differential Revision: https://reviews.llvm.org/D130642
Without this, the intrinsic will be expanded to an integer; thereby an
explicit copy (from GPR to SIMD register) will be codegen'd. This matches the
general convention of using "v1" types to represent scalar integer operations in
vector registers.
The similar approach is observed in D56616, and the pattern likely applies on
other intrinsic that accepts integer scalars (e.g.,
int_aarch64_neon_sqdmulls_scalar)
Differential Revision: https://reviews.llvm.org/D130548
This adds similar heuristics to G_GLOBAL_VALUE, querying the cost of
materializing a specific constant in code size. Doing so prevents us from
sinking constants which require multiple instructions to generate into
use blocks.
Code size savings on CTMark -Os:
Program size.__text
before after diff
ClamAV/clamscan 381940.00 382052.00 0.0%
lencod/lencod 428408.00 428428.00 0.0%
SPASS/SPASS 411868.00 411876.00 0.0%
kimwitu++/kc 449944.00 449944.00 0.0%
Bullet/bullet 463588.00 463556.00 -0.0%
sqlite3/sqlite3 284696.00 284668.00 -0.0%
consumer-typeset/consumer-typeset 414492.00 414424.00 -0.0%
7zip/7zip-benchmark 595244.00 594972.00 -0.0%
mafft/pairlocalalign 247512.00 247368.00 -0.1%
tramp3d-v4/tramp3d-v4 372884.00 372044.00 -0.2%
Geomean difference -0.0%
Differential Revision: https://reviews.llvm.org/D130554
SimplifyDemandedBits currently early-outs for multi-use values beyond the root node (just returning the knownbits), which is missing a number of optimizations as there are plenty of cases where we can still simplify when initially demanding all elements/bits.
@lenary has confirmed that the test cases in aea-erratum-fix.ll need refactoring and the current increase codegen is not a major concern.
Differential Revision: https://reviews.llvm.org/D129765
This patch starts small, only detecting sequences of the form
<a, a+n, a+2n, a+3n, ...> where a and n are ConstantSDNodes.
Differential Revision: https://reviews.llvm.org/D125194
This helps fold away the ptest instructions, which needs the knowledge on whether
the general predicate is known to zero the inactive lanes.
This fixes some PTEST regressions introduced by D129282.
Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D129852
This also adds new sve-ptest tests for FP compares that will retain
the ptest.
This also includes a few other NFC changes:
* Added type mangling to ptest.any intrinsic.
* Regenerated asm using update_llc_tests script.
Further test-failure fallout from D130358. There were a handful of
uses of llvm-objdump in the CodeGen tests as well, which have taken me
longer to get to because more things had to be built.
Optimizing (a * 0 + b) to (b) requires assuming that a is finite and not
NaN. DAGCombiner will do this optimization when the reassoc fast math
flag is set, which is not correct. Change DAGCombiner to only consider
UnsafeMath for this optimization.
Differential Revision: https://reviews.llvm.org/D130232
Co-authored-by: Andrea Faulds <andrea.faulds@arm.com>