llvm-project

Commit Graph

Author	SHA1	Message	Date
OverMighty	232953f996	[AArch64] Add pattern for SQDML*Lv1i32_indexed There was no pattern to fold into these instructions. This patch adds the pattern obtained from the following ACLE intrinsics so that they generate sqdmlal/sqdmlsl instructions instead of separate sqdmull and sqadd/sqsub instructions: - vqdmlalh_s16, vqdmlslh_s16 - vqdmlalh_lane_s16, vqdmlalh_laneq_s16, vqdmlslh_lane_s16, vqdmlslh_laneq_s16 (when the lane index is 0) It also modifies the result of the existing pattern for the latter, when the lane index is not 0, to use the v1i32_indexed instructions instead of the v4i16_indexed ones. Fixes #49997. Differential Revision: https://reviews.llvm.org/D131700	2022-08-17 12:00:47 +01:00
Vitaly Buka	16fecdfa70	Revert "[AArch64] Add `foldCSELOfCSEl` DAG combine" Breaks ubsan on buildbot, details in D125504 This reverts commit `6f9423ef06`.	2022-08-16 20:29:37 -07:00
Eli Friedman	cfd2c5ce58	Untangle the mess which is MachineBasicBlock::hasAddressTaken(). There are two different senses in which a block can be "address-taken". There can be a BlockAddress involved, which means we need to map the IR-level value to some specific block of machine code. Or there can be constructs inside a function which involve using the address of a basic block to implement certain kinds of control flow. Mixing these together causes a problem: if target-specific passes are marking random blocks "address-taken", if we have a BlockAddress, we can't actually tell which MachineBasicBlock corresponds to the BlockAddress. So split this into two separate bits: one for BlockAddress, and one for the machine-specific bits. Discovered while trying to sort out related stuff on D102817. Differential Revision: https://reviews.llvm.org/D124697	2022-08-16 16:15:44 -07:00
Karl Meakin	6f9423ef06	[AArch64] Add `foldCSELOfCSEl` DAG combine Differential Revision: https://reviews.llvm.org/D125504	2022-08-16 12:49:11 +01:00
Zain Jaffal	7155ed4289	[AArch64] Add support for 256-bit non temporal loads Currently all temporal loads are mapped to `LDP` or `LDR`. This patch will map all the non temporal 256-bit loads into `LDNP`. Future patches should address other non-temporal loads. Reviewed By: fhahn, dmgreen Differential Revision: https://reviews.llvm.org/D131773	2022-08-16 12:19:36 +01:00
Vitaly Buka	e0e960923f	[AArch64] Fix signed integer overflow in CSINC case Follow-up to D131815, which overflows on different values.	2022-08-15 15:04:20 -07:00
Peter Waller	6e85db7293	[DAGCombine] Combine signext_inreg of extract-extend The outer signext_inreg is redundant in the following: Fold (signext_inreg (extract_subvector (zext|anyext|sext iN_value to _) _) from iN) -> (extract_subvector (signext iN_value to iM)) Tests are precommitted and clone those by analogy from the AND case in the same file. Add a negative test to check extension width is handled correctly. This patch supersedes D130700. Differential Revision: https://reviews.llvm.org/D131503	2022-08-15 10:58:07 +00:00
Zain Jaffal	df4878d28d	[AArch64] Tests for non-temporal loads. Add some test cases for D131773 where LDNP could be used as well as negative tests. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D131767	2022-08-15 09:16:02 +01:00
Vitaly Buka	f1596952f9	[AArch64] Fix signed integer overflow in CSINC case https://lab.llvm.org/staging/#/builders/224/builds/2/steps/16/logs/stdio Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D131815	2022-08-13 13:12:09 -07:00
Florian Hahn	c2af37dcdb	Revert "[AArch64][GlobalISel] Recognise some CCMPri" This reverts commit `38c2366b3f`. This patch seems to break bootstrapping LLVM with `-fglobal-isel -O3` on AArch64 hardware. Without the revert, there are 500+ test failures for the `check-llvm-codegen-x86` target.	2022-08-13 17:44:41 +01:00
David Green	a9e9dd9a3a	[AArch64] Add bf16 select handling A bfloat select operation will currently crash, but is allowed from C. This adds handling for the operation, turning it into a FCSELHrrr if fullfp16 is present, or converting it to a FCSELSrrr if not. The FCSELSrrr is created by using INSERT_SUBREG/EXTRACT_SUBREG to convert the bf16 to a f32 and using the f32 pattern for FCSELSrrr. (I originally attempted to do this via a tablegen pattern, but it appears that the nzcv glue is placed onto the wrong node, causing it to be forgotten and incorrect scheduling to be emitted). The FCSELSrrr can also be used for fp16 selects when +fullfp16 is not present, which helps avoid an unnecessary promotion to f32. Differential Revision: https://reviews.llvm.org/D131253	2022-08-11 14:20:36 +01:00
Andre Vieira	1640679187	[TypePromotion] Search from ZExt + PHI Expand TypePromotion pass to try to promote PHI-nodes in loops that are the operand of a ZExt, using the ZExt's result type to determine the Promote Width. Differential Revision: https://reviews.llvm.org/D111237	2022-08-11 09:50:10 +01:00
Edd Barrett	fa250250b2	Migrate llvm.experimental.patchpoint() to ptr. This intrinsic used a typed pointer for a call target operand. This change updates the operand to be an opaque pointer and updates all pointers in all test files that use the intrinsic. Differential revision: https://reviews.llvm.org/D131261	2022-08-10 13:18:02 +01:00
David Truby	b1b9c39629	[AArch64][SVE] Use SVE for VLS fcopysign for wide vectors Currently fcopysign for VLS vectors lowers through NEON even when the vector width is wider than a NEON vector, causing bad codegen as the vectors are split. This patch causes SVE to be used for these vectors instead, giving much better codegen on wide VLS vectors. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D128642	2022-08-10 10:17:19 +00:00
David Green	20e6239a44	[AArch64] Regenerate arm64-fmax.ll test. NFC	2022-08-09 16:59:00 +01:00
Peter Waller	310962f25e	[DAGCombine][NFC] Precommit extract-subvec-combine sext tests	2022-08-09 15:44:15 +00:00
Luo, Yuanke	aaf6c7b05c	[globalisel] Select register bank for DBG_VALUE The register operand of DBG_VALUE is not assigned a proper register bank in either AArch64 or X86. This causes getRegClass to crash after GlobalISel. After discussion, we think MIR should assume that every virtual register has been assigned a proper register class after GlobalISel, so this patch fixes the gap for DBG_VALUE in AArch64 and X86. Differential Revision: https://reviews.llvm.org/D129037	2022-08-09 13:11:51 +08:00
Cullen Rhodes	a6dec9f5b2	[AArch64][SVE] Add patterns to select masked FP arith Add patterns to select predicated instructions when lowering: fadd(a, select(mask, b, splat(0))) fsub(a, select(mask, b, splat(0))) 'fadd' is unsafe unless the no-signed-zeros fast-math flag is set, since -0.0 + 0.0 = 0.0 changes the sign. Alive2: https://alive2.llvm.org/ce/z/wbhJh_ Also adds FMA patterns for: fadd(a, select(mask, mul(b, c), splat(0))) -> fmla(a, mask, b, c) fsub(a, select(mask, mul(b, c), splat(0))) -> fmls(a, mask, b, c) These patterns require the 'contract' fast-math flag to be set, and the fadd pattern also requires 'nsz', as above. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D130564	2022-08-08 08:44:13 +00:00
Cullen Rhodes	17ac26a78e	[AArch64][SVE] NFC: Add tests for masked FP arith patterns (D130564)	2022-08-08 08:44:13 +00:00
Paul Walker	0533c39a76	[SVE] Expand DUPM patterns to handle all integer vector types. NOTE: i8 vector splats are ignored because the immediate range of DUP already has full coverage. Differential Revision: https://reviews.llvm.org/D131078	2022-08-05 16:00:08 +00:00
David Green	38c2366b3f	[AArch64][GlobalISel] Recognise some CCMPri This is a simple addition to emitConditionalComparison, to match CCMP with immediates using getIConstantVRegValWithLookThrough, letting it select the CCMPri variants of the instructions. Differential Revision: https://reviews.llvm.org/D131073	2022-08-05 07:48:42 +01:00
David Green	6ff873ac86	[AArch64] Add some extra GlobalISel CCMP tests coverage. NFC	2022-08-04 20:52:26 +01:00
Felipe de Azevedo Piovezan	a5a8a05c78	[SelectionDAG] Handle IntToPtr constants in dbg.value The function `handleDebugValue` has custom logic to handle certain kinds of constants, namely integers, floats and null pointers. However, it does not handle constant pointers created from IntToPtr ConstantExpressions. This patch addresses the issue by replacing the Constant with its integer operand. A similar bug was addressed for GlobalISel in D130642. Reviewed By: aprantl, #debug-info Differential Revision: https://reviews.llvm.org/D130908	2022-08-03 14:10:05 -04:00
David Truby	9a976f3661	[llvm] Always use TargetConstant for FP_ROUND ISD Nodes This patch ensures consistency in the construction of FP_ROUND nodes such that they always use ISD::TargetConstant instead of ISD::Constant. This additionally fixes a bug in the AArch64 SVE backend where patterns were matching against TargetConstant nodes and sometimes failing when passed a Constant node. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D130370	2022-08-03 14:02:11 +01:00
Vladislav Dzhidzhoev	71aecbb75c	[AArch64] Treat x18 as callee-saved in functions with Windows calling convention on Darwin rGcf97e0ec42b8 makes $x18 be treated as callee-saved in functions with Windows calling convention on non-Windows OSes. Here we mark $x18 as callee-saved for functions with Windows calling convention on Darwin, as well as on other non-Windows platforms, in order to prevent some miscompilations (like miscompilation of win64cc-darwin-backup-x18.ll). Since getCalleeSavedRegs doesn't return x18 in the list of callee-saved registers, assignCalleeSavedSpillSlots and determineCalleeSaves consider different sets of registers as callee-saved. It causes an error: ``` Assertion failed: ((!HasCalleeSavedStackSize || getCalleeSavedStackSize() == Size) && "Invalid size calculated for callee saves"), function getCalleeSavedStackSize, file AArch64MachineFunctionInfo.h, line 292. ``` Differential Revision: https://reviews.llvm.org/D130676	2022-08-02 20:33:42 +03:00
David Green	1206f72e31	[AArch64] Fold Mul(And(Srl(X, 15), 0x10001), 0xffff) to CMLTz This folds a v4i32 Mul(And(Srl(X, 15), 0x10001), 0xffff) into a v8i16 CMLTz instruction. The Srl and And extract the top bit (whether the input is negative) and the Mul sets all values in the i16 half to all 1/0 depending on if that top bit was set. This is equivalent to a v8i16 CMLTz instruction. The same applies to other sizes with equivalent constants. Differential Revision: https://reviews.llvm.org/D130874	2022-08-02 13:01:59 +01:00
David Green	29f97ec845	[AArch64] Mul fold tests for D130874. NFC	2022-08-02 12:29:40 +01:00
Tim Northover	b586dc21a7	Outliner: add "target-cpu" feature from source function to outlined The CPU is used to determine which inline asm instructions are allowed, so needs to be copied across in case the outlined function contains any.	2022-08-02 09:33:29 +01:00
David Sherwood	41119a0f52	[DAGCombiner] Extend visitAND to include EXTRACT_SUBVECTOR Eliminate an AND by redefining an anyext|sext|zext. (and (extract_subvector (anyext|sext|zext v) _) iN_mask) => (extract_subvector (zeroext_iN v)) Differential Revision: https://reviews.llvm.org/D130782	2022-08-01 10:32:32 +01:00
Vladislav Dzhidzhoev	facb3ac385	[GlobalISel][DebugInfo] salvageDebugInfo analogue for gMIR Salvage debug info of an instruction that is about to be deleted as dead in the Combiner pass. Currently supported instructions are COPY and G_TRUNC. This allows salvaging debug info of some dead function arguments, by putting the DWARF expression corresponding to the instruction being deleted into the related DBG_VALUE instruction. Here is an example of missing variable locations: https://godbolt.org/z/K48osb9dK. We see that arguments x, y of function foo are not available in the debugger, and the corresponding DBG_VALUE instructions have an undefined register operand instead of the variable location after the Aarch64PreLegalizerCombiner pass. The reason is that the registers where the variables are located are removed as dead (with instruction G_TRUNC). We can use a salvageDebugInfo analogue for gMIR to preserve debug locations of dead variables. Statistics of llvm object files built with vs without this commit on -O2 optimization level (CMAKE_BUILD_TYPE=RelWithDebInfo, -fglobal-isel) on Aarch64 (macOS): Number of variables with 100% of parent scope covered by DW_AT_location has been increased by 7.9%. Number of variables with 0% coverage of parent scope has been decreased by 1.2%. Number of variables processed by location statistics has been increased by 2.9%. Average PC ranges coverage has been increased by 1.8 percentage points. Coverage can be improved by supporting more instructions, or by calling salvageDebugInfo for instructions that are deleted during Combiner rule execution. Reviewed By: aprantl Differential Revision: https://reviews.llvm.org/D129909	2022-08-01 11:14:53 +02:00
David Sherwood	487fa6f8c3	[AArch64][DAGCombine] Add performBuildVectorCombine 'extract_elt ~> anyext' A build vector of two extracted elements is equivalent to an extract subvector where the inner vector is any-extended to the extract_vector_elt VT, because extract_vector_elt has the effect of an any-extend. (build_vector (extract_elt_i16_to_i32 vec Idx+0) (extract_elt_i16_to_i32 vec Idx+1)) => (extract_subvector (anyext_i16_to_i32 vec) Idx) Depends on D130697 Differential Revision: https://reviews.llvm.org/D130698	2022-07-29 09:51:09 +01:00
David Sherwood	6953e754c7	[NFC][AArch64] Precommit vector-fcvt tests Add tests which show code quality of uitofp and sitofp. Differential Revision: https://reviews.llvm.org/D130697	2022-07-29 09:29:15 +01:00
Felipe de Azevedo Piovezan	58526b2d2b	[GlobalISel] Handle nullptr constants in dbg.value Currently, the LLVM IR -> MIR translator fails to translate dbg.values whose first argument is a null pointer. However, in other portions of the code, such pointers are always lowered to the constant zero, for example see IRTranslator::Translate(Constant, Register). This patch addresses the limitation by following the same approach of lowering null pointers to zero. A prior test was checking that null pointers were always lowered to $noreg; this test is changed to check for zero, and the previous behavior is now checked by introducing a dbg.value whose first argument is the address of a global variable. Differential Revision: https://reviews.llvm.org/D130721	2022-07-28 14:58:14 -07:00
Simon Pilgrim	69d5a038b9	[DAG] Enable ISD::SRL SimplifyMultipleUseDemandedBits handling inside SimplifyDemandedBits This patch allows SimplifyDemandedBits to call SimplifyMultipleUseDemandedBits in cases where the ISD::SRL source operand has other uses, enabling us to peek through the shifted value if we don't demand all the bits/elts. This is another step towards removing SelectionDAG::GetDemandedBits and just using TargetLowering::SimplifyMultipleUseDemandedBits. There are a few cases where we end up with extra register moves which I think we can accept in exchange for the increased ILP. Differential Revision: https://reviews.llvm.org/D77804	2022-07-28 14:10:44 +01:00
Amara Emerson	93e3aeb9a8	[AArch64][GlobalISel] Fix custom legalization of rotates using sext for shift vs zext. Rotates are defined according to DAG documentation as having unsigned shifts, so we need to zero-extend instead of sign-extend here. Fixes issue 56664	2022-07-27 22:10:42 -07:00
Amara Emerson	c16fa781f4	GlobalISel: update legalize-rotr-rotl.mir checks before change.	2022-07-27 22:10:04 -07:00
Adrian Prantl	719ab04acf	[GlobalISel] Handle IntToPtr constants in dbg.value Currently, the IR to MIR translator can only handle two kinds of constant inputs to dbg.value intrinsics: constant integers and constant floats. In particular, it cannot handle pointers created from IntToPtr ConstantExpression objects. This patch addresses the limitation above by replacing the IntToPtr with its input integer prior to converting the dbg.value input. Patch by Felipe Piovezan! Differential Revision: https://reviews.llvm.org/D130642	2022-07-27 13:42:07 -07:00
Mingming Liu	34348814e1	[AArch64] Explicitly use v1i64 type for llvm.aarch64.neon.pmull64 Without this, the intrinsic will be expanded to an integer; thereby an explicit copy (from GPR to SIMD register) will be codegen'd. This matches the general convention of using "v1" types to represent scalar integer operations in vector registers. A similar approach is observed in D56616, and the pattern likely applies to other intrinsics that accept integer scalars (e.g., int_aarch64_neon_sqdmulls_scalar) Differential Revision: https://reviews.llvm.org/D130548	2022-07-27 11:11:16 -07:00
Amara Emerson	19cdd1908b	[AArch64][GlobalISel] Add heuristics for localizing G_CONSTANT. This adds similar heuristics to G_GLOBAL_VALUE, querying the cost of materializing a specific constant in code size. Doing so prevents us from sinking constants which require multiple instructions to generate into use blocks. Code size savings on CTMark -Os (size.__text, before -> after, diff): ClamAV/clamscan 381940 -> 382052 (0.0%); lencod/lencod 428408 -> 428428 (0.0%); SPASS/SPASS 411868 -> 411876 (0.0%); kimwitu++/kc 449944 -> 449944 (0.0%); Bullet/bullet 463588 -> 463556 (-0.0%); sqlite3/sqlite3 284696 -> 284668 (-0.0%); consumer-typeset/consumer-typeset 414492 -> 414424 (-0.0%); 7zip/7zip-benchmark 595244 -> 594972 (-0.0%); mafft/pairlocalalign 247512 -> 247368 (-0.1%); tramp3d-v4/tramp3d-v4 372884 -> 372044 (-0.2%); geomean difference -0.0%. Differential Revision: https://reviews.llvm.org/D130554	2022-07-27 10:51:16 -07:00
Matt Devereau	35e781fb05	[AArch64][SVE] Add Gather Index narrowing tests	2022-07-27 15:32:18 +00:00
Simon Pilgrim	529bd4f352	[DAG] SimplifyDemandedBits - don't early-out for multiple use values SimplifyDemandedBits currently early-outs for multi-use values beyond the root node (just returning the knownbits), which is missing a number of optimizations as there are plenty of cases where we can still simplify when initially demanding all elements/bits. @lenary has confirmed that the test cases in aea-erratum-fix.ll need refactoring and the current increase in codegen is not a major concern. Differential Revision: https://reviews.llvm.org/D129765	2022-07-27 10:54:06 +01:00
Amara Emerson	9cc1dd209d	[AArch64][GlobalISel] Lower vector G_CTTZ. Fixes issue 56398	2022-07-27 00:14:30 -07:00
Amara Emerson	aeeb174cec	Update checks in legalize-cttz.mir test before change.	2022-07-26 23:41:09 -07:00
Jessica Paquette	39d431d811	[GlobalISel] Import patterns for G_FMAXIMUM + G_FMINIMUM Allows us to select scalar instructions on AArch64. Differential Revision: https://reviews.llvm.org/D115381	2022-07-26 10:58:44 -07:00
Paul Walker	e5c892dd85	[SVE][SelectionDAG] Use INDEX to generate matching instances of BUILD_VECTOR. This patch starts small, only detecting sequences of the form <a, a+n, a+2n, a+3n, ...> where a and n are ConstantSDNodes. Differential Revision: https://reviews.llvm.org/D125194	2022-07-26 15:28:37 +00:00
Sander de Smalen	a41ddf178e	[AArch64][SVE] Sink ptrue into loop if it is used by PTEST. This helps fold away the ptest instructions, which requires knowing whether the general predicate is known to zero the inactive lanes. This fixes some PTEST regressions introduced by D129282. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D129852	2022-07-26 15:07:41 +01:00
Sander de Smalen	370ff43a15	[AArch64][SVE] Consider more intrinsics in 'isZeroingInactiveLanes'. This fixes some PTEST regressions introduced by D129282. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D129851	2022-07-26 15:07:41 +01:00
Sander de Smalen	5a594c2831	[AArch64][SVE] NFC: Add test-case to sve-ptest-removal-cmp* tests This also adds new sve-ptest tests for FP compares that will retain the ptest. This also includes a few other NFC changes: * Added type mangling to ptest.any intrinsic. * Regenerated asm using update_llc_tests script.	2022-07-26 15:07:41 +01:00
Simon Tatham	5c396be575	[llvm-objdump,ARM] Fix further test failures. Further test-failure fallout from D130358. There were a handful of uses of llvm-objdump in the CodeGen tests as well, which have taken me longer to get to because more things had to be built.	2022-07-26 11:35:16 +01:00
Sven van Haastregt	c8d91b07bb	Reassoc FMF should not optimize FMA(a, 0, b) to (b) Optimizing (a * 0 + b) to (b) requires assuming that a is finite and not NaN. DAGCombiner will do this optimization when the reassoc fast math flag is set, which is not correct. Change DAGCombiner to only consider UnsafeMath for this optimization. Differential Revision: https://reviews.llvm.org/D130232 Co-authored-by: Andrea Faulds <andrea.faulds@arm.com>	2022-07-26 09:39:12 +01:00
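The CMLTz fold in 1206f72e31 rests on a bit-level identity that can be checked on a single 32-bit lane. The sketch below is a hypothetical scalar Python model, not LLVM code: the names `mul_and_srl` and `cmltz_v2i16` are illustrative only. It treats one v4i32 lane as two i16 halves and compares Mul(And(Srl(X, 15), 0x10001), 0xffff) against a per-half "compare signed less than zero" that yields all ones or all zeros.

```python
MASK32 = 0xFFFF_FFFF


def mul_and_srl(x: int) -> int:
    """Model Mul(And(Srl(X, 15), 0x10001), 0xffff) on one 32-bit lane."""
    # Srl by 15 moves bit 15 (sign of the low i16) to bit 0 and
    # bit 31 (sign of the high i16) to bit 16; the And keeps only those two bits.
    sign_bits = ((x & MASK32) >> 15) & 0x10001
    # Multiplying by 0xffff smears each isolated bit across its own 16-bit half;
    # 0xffff < 2**16, so the low half never carries into the high half.
    return (sign_bits * 0xFFFF) & MASK32


def cmltz_v2i16(x: int) -> int:
    """Model CMLT #0 applied to the two signed i16 halves of a 32-bit lane."""
    result = 0
    for shift in (0, 16):
        half = (x >> shift) & 0xFFFF
        if half & 0x8000:  # negative when interpreted as a signed i16
            result |= 0xFFFF << shift
    return result


for x in (0x0000_0000, 0x0000_8000, 0x8000_0000, 0xFFFF_FFFF, 0x7FFF_7FFF, 0x1234_F00D):
    assert mul_and_srl(x) == cmltz_v2i16(x)
```

Checking the identity per lane is enough because every operation in the original sequence acts lane-wise; the same reasoning applies to the other element sizes the commit mentions, with the shift amount and constants scaled accordingly.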


5822 Commits