llvm-project

Commit Graph

Author	SHA1	Message	Date
Jon Roelofs	6611fbc62a	[AArch64] Dump a little more info about unimplemented reg-to-reg copies. NFC	2021-07-12 15:37:11 -07:00
Eli Friedman	6c04b7dd4f	[AArch64] Optimize overflow checks for [s|u]mul.with.overflow.i32. Saves one instruction for signed, uses a cheaper instruction for unsigned. Differential Revision: https://reviews.llvm.org/D105770	2021-07-12 15:30:42 -07:00
Benjamin Kramer	0da3573a9e	[AArch64] Silence unused variable warning. NFC. AArch64ISelLowering.cpp:15167:8: warning: unused variable 'OpCode' [-Wunused-variable] auto OpCode = N->getOpcode(); ^	2021-07-12 16:01:11 +02:00
Cullen Rhodes	9e42675103	[AArch64] Add target features for Armv9-A Scalable Matrix Extension (SME) First patch in a series adding MC layer support for the Arm Scalable Matrix Extension. This patch adds the following features: sme, sme-i64, sme-f64 The sme-i64 and sme-f64 flags are for the optional I16I64 and F64F64 features. If a target supports I16I64 then the following instructions are implemented: * 64-bit integer ADDHA and ADDVA variants (D105570). * SMOPA, SMOPS, SUMOPA, SUMOPS, UMOPA, UMOPS, USMOPA, and USMOPS instructions that accumulate 16-bit integer outer products into 64-bit integer tiles. If a target supports F64F64 then the FMOPA and FMOPS instructions that accumulate double-precision floating-point outer products into double-precision tiles are implemented. Outer products are implemented in D105571. The reference can be found here: https://developer.arm.com/documentation/ddi0602/2021-06 Reviewed By: CarolineConcatto Differential Revision: https://reviews.llvm.org/D105569	2021-07-12 13:28:10 +00:00
Michael Liao	8253fa2298	Fix warning '-Wparentheses'. NFC.	2021-07-12 09:25:30 -04:00
David Green	f73334c46d	[AArch64] Set the latency of Cortex-A55 stores to 1 This sets the latency of stores to 1 in the Cortex-A55 scheduling model, to better match the values given in the software optimization guide. The latency of a store in normal llvm scheduling does not appear to have a lot of uses. If the store has no outputs then the latency is somewhat meaningless (and pre/post increment update operands use the WriteAdr write for those operands instead). The one place it does alter things is the latency between a store and the end of the scheduling region, which can in turn have an effect on the critical path length. As a result a latency of 1 is more correct and offers ever-so-slightly better scheduling of instructions near the end of the block. They are marked as RetireOOO to keep the llvm-mca from introducing stalls where none would exist. Differential Revision: https://reviews.llvm.org/D105541	2021-07-12 13:39:35 +01:00
David Truby	c305557acd	[llvm][sve] Lowering for VLS truncating stores This adds custom lowering for truncating stores when operating on fixed length vectors in SVE. It also includes a DAG combine to fold extends followed by truncating stores into non-truncating stores in order to prevent this pattern appearing once truncating stores are supported. Currently truncating stores are not used in certain cases where the size of the vector is larger than the target vector width. Differential Revision: https://reviews.llvm.org/D104471	2021-07-12 11:14:17 +01:00
Fangrui Song	57503524b1	[AArch64] De-capitalize some Emit* functions AsmParser/AsmPrinter/Streamer are mostly consistent on emit* functions now.	2021-07-11 22:05:39 -07:00
Amara Emerson	97c426394a	[AArch64][GlobalISel] Implement moreElements legalization for G_SHUFFLE_VECTOR. Differential Revision: https://reviews.llvm.org/D103301	2021-07-10 00:25:26 -07:00
Amara Emerson	58a2cb5143	[GlobalISel] Add a new artifact combiner for unmerge which looks through general artifact expressions. The original motivation for this was to implement moreElementsVector of shuffles on AArch64, which resulted in complex sequences of artifacts like unmerge(unmerge(concat...)) which the combiner couldn't handle. It seemed here that the better option, instead of writing ever-more-complex combines, was to have a way to find the original "non-artifact" source registers for a given definition, walking through arbitrary expressions of unmerge/concat/insert. As long as the bits aren't extended or truncated, this is a pretty simple algorithm that avoids the need for lots of combines and instead jumps straight to the final result we want. I've only used this new technique in 2 places within tryCombineUnmerge, using it in more general situations resulted in infinite loops in AMDGPU. So for now it's used when we would otherwise fail to combine and that seems to work. In order to support looking through G_INSERTs, I also had to add it as an artifact in isArtifact(), which caused a whole lot of issues in tests. AMDGPU started infinite looping since full legalization of G_INSERT doesn't seem to be there. To work around this, I've temporarily added a CLI option to use the old behaviour so that the MIR tests will still run and terminate. Other minor changes include no longer making >128b G_MERGE/UNMERGE legal. We never had isel support for that anyway and it was a remnant of the legacy legalizer rules. However being legal prevented the combiner from checking if it was dead and deleting them. Differential Revision: https://reviews.llvm.org/D104355	2021-07-09 22:35:00 -07:00
David Green	38c9a4068d	[TTI] Remove IsPairwiseForm from getArithmeticReductionCost This patch removes the IsPairwiseForm flag from the Reduction Cost TTI hooks, along with some accompanying code for pattern matching reductions from trees starting at extract elements. IsPairWise is now assumed to be false, which was the predominant way that the value was used from both the Loop and SLP vectorizers. Since the adjustments such as D93860, the SLP vectorizer has not relied upon this distinction between pairwise and non-pairwise reductions. This also removes some code that was detecting reductions trees starting from extract elements inside the costmodel. This case was double-counting costs though, adding the individual costs on the individual instruction _and_ the total cost of the reduction. Removing it changes the costs in llvm/test/Analysis/CostModel/X86/reduction.ll to not double count. The cost of reduction intrinsics is still tested through the various tests in llvm/test/Analysis/CostModel/X86/reduce-xyz.ll. Differential Revision: https://reviews.llvm.org/D105484	2021-07-09 11:51:16 +01:00
Muhammad Omair Javaid	932e3d9960	Revert "GlobalISel/AArch64: don't optimize away redundant branches at -O0" This reverts commit `458c230b5e`. This broke LLDB buildbot testcase where breakpoint set at start of loop failed to hit. https://lab.llvm.org/buildbot/#/builders/96/builds/9404 https://github.com/llvm/llvm-project/blob/main/lldb/test/API/commands/process/attach/main.cpp#L15 Differential Revision: https://reviews.llvm.org/D105238	2021-07-09 08:23:36 +05:00
Matt Arsenault	9b057f647d	GlobalISel: Track original argument index in ArgInfo SelectionDAG's equivalents in ISD::InputArg/OutputArg track the original argument index. Mips relies on this, and it's currently reinventing its own parallel CallLowering infrastructure which tracks these indexes on the side. Add this to help move towards deleting the custom mips handling.	2021-07-08 13:39:02 -04:00
Bradley Smith	026bb84bcd	[AArch64][SVE] Add ISel patterns for floating point compare with zero instructions Additionally, lower the floating point compare SVE intrinsics to SETCC_MERGE_ZERO ISD nodes to avoid duplicating ISel patterns. Differential Revision: https://reviews.llvm.org/D105486	2021-07-08 10:46:12 +00:00
Adrian Prantl	458c230b5e	GlobalISel/AArch64: don't optimize away redundant branches at -O0 This patch prevents GlobalISel from optimizing out redundant branch instructions when compiling without optimizations. The motivating example is code like the following common pattern in Swift, where users expect to be able to set a breakpoint on the early exit: public func f(b: Bool) { guard b else { return // I would like to set a breakpoint here. } ... } The patch modifies two places in GlobalISEL: The first one is in IRTranslator.cpp where the removal of redundant branches is made conditional on the optimization level. The second one is in AArch64InstructionSelector.cpp where an -O0 only optimization is being removed. Disabling these optimizations increases code size at -O0 by ~8%. However, doing so improves debuggability, and debug builds are the primary reason why developers compile without optimizations. We thus concluded that this is the right trade-off. rdar://79515454 Differential Revision: https://reviews.llvm.org/D105238	2021-07-07 12:51:55 -07:00
Irina Dobrescu	5888a194c1	[AArch64][GlobalISel] Lower vector types for min/max Differential Revision: https://reviews.llvm.org/D105433	2021-07-07 15:34:03 +01:00
Eli Friedman	56b3e9edc4	[AArch64] Sync isDef32 to the current x86 version. We should probably come up with some better way to do this, but let's make sure to catch known issues for now.	2021-07-06 17:05:01 -07:00
Bradley Smith	5ab9000fbb	[AArch64][SVE] Fix selection failures for scalable MLOAD nodes with passthru Differential Revision: https://reviews.llvm.org/D105348	2021-07-06 14:17:23 +00:00
Kerry McLaughlin	a7512401e5	[LV] Prevent vectorization with unsupported element types. This patch adds a TTI function, isElementTypeLegalForScalableVector, to query whether it is possible to vectorize a given element type. This is called by isLegalToVectorizeInstTypesForScalable to reject scalable vectorization if any of the instruction types in the loop are unsupported, e.g: int foo(__int128_t* ptr, int N) #pragma clang loop vectorize_width(4, scalable) for (int i=0; i<N; ++i) ptr[i] = ptr[i] + 42; This example currently crashes if we attempt to vectorize since i128 is not a supported type for scalable vectorization. Reviewed By: sdesmalen, david-arm Differential Revision: https://reviews.llvm.org/D102253	2021-07-06 13:06:21 +01:00
Peter Waller	c5dfee44b9	[CodeGen][AArch64][SVE] Use ld1r[bhsd] for vector splat from memory This avoids the use of the vector unit for copying from scalar to vector. There is an extra ptrue instruction, but a predicate register with the ptrue pattern populated is likely to be free in the context of real code. Tests were generated from a template to cover the axes mentioned at the top of the test file. Co-authored-by: Francesco Petrogalli <francesco.petrogalli@arm.com> Differential Revision: https://reviews.llvm.org/D103170	2021-07-06 12:03:54 +00:00
Tiehu Zhang	d4ed965b2d	[AArch64ISelDAGToDAG] Fix ORRWrs/ORRXrs usefulbits calculation bug For the following case: t8: i32 = or t7, t4 t10: i32 = ORRWrs t8, t8, TargetConstant:i32<73> Current code wrongly returns (t8 >> shiftConstant) as the UsefulBits of t8, which in fact is (t8 | (t8 >> shiftConstant)). Reviewed by: sdesmalen, mdchen Differential Revision: https://reviews.llvm.org/D102759	2021-07-06 00:38:42 +08:00
Paul Walker	88522455c0	Fix typo in help text for -aarch64-enable-branch-targets.	2021-07-05 16:15:40 +01:00
Caroline Concatto	a2c5c56055	[AArch64][CostModel] Add cost model for experimental.vector.splice This patch adds a new ShuffleKind SK_Splice and then handle the cost in getShuffleCost, as in experimental.vector.reverse. Differential Revision: https://reviews.llvm.org/D104630	2021-07-05 14:30:24 +01:00
Bradley Smith	cc273983f7	[AArch64][SVE] Improve fixed length codegen for common vector shuffle case Improve codegen when lowering the common vector shuffle case from the vectorizer (op1[last]:op2[0:last-1]). This patch only handles this common case as it is difficult to handle this more generally when using fixed length vectors, due to being unable to use the SVE ext instruction. Differential Revision: https://reviews.llvm.org/D105289	2021-07-05 12:09:27 +01:00
Sjoerd Meijer	ee752134ac	[AArch64] Cost-model i8 vector loads/stores Loads of <4 x i8> vectors were modeled as extremely expensive. And while we don't have a load instruction that supports this, it isn't that expensive to create a vector of i8 elements. The codegen for this was fixed/optimised in D105110. This now tweaks the cost model and enables SLP vectorisation of my motivating case loadi8.ll. Differential Revision: https://reviews.llvm.org/D103629	2021-07-05 11:25:10 +01:00
Paul Walker	287d39dd5a	[NFC] Fix a few whitespace issues and typos.	2021-07-04 11:49:58 +01:00
David Green	fbc329efbd	[AArch64] Add S/UQXTRN tablegen patterns. This adds simple patterns for signed and unsigned saturating extract narrow instructions. They combine a min/max/truncate into a single instruction, providing that the immediates on the min/max are correct for the saturation type. This is just handled in tablegen with some extra patterns. v2i64->v2i32 is not handled here as the min/max nodes are not legal, making the lowering quite different. Differential Revision: https://reviews.llvm.org/D103263	2021-07-03 07:57:19 +01:00
Krzysztof Parzyszek	df88c26f0d	[OpaquePtr] Add type parameter to emitLoadLinked Differential Revision: https://reviews.llvm.org/D105353	2021-07-02 13:07:40 -05:00
Florian Hahn	1a248233a5	[AArch64] Use custom lowering for fp16 vector copysign. The custom copysign lowering already supports fp16. Use it. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D105277	2021-07-02 11:15:30 +01:00
Eli Friedman	0176ac9503	[AArch64] Optimize SVE bitcasts of unpacked types. Target-independent code only knows how to spill to the stack; instead, use AArch64ISD::REINTERPRET_CAST. Differential Revision: https://reviews.llvm.org/D104573	2021-07-01 15:35:48 -07:00
Matt Arsenault	99c7e918b5	GlobalISel: Use LLT in call lowering callbacks This preserves the memory type so the lowerings can rely on them.	2021-07-01 12:15:54 -04:00
Bradley Smith	2668727929	[SelectionDAG] Implement PromoteIntRes_INSERT_SUBVECTOR Inserting into a smaller-than-legal scalable vector would result in an internal compiler error. For example, inserting a <vscale x 4 x i8> into a <vscale x 8 x i8> (both illegal vector types for SVE) would cause a crash. This crash was happening because there was no code to promote (legalise) the result of an INSERT_SUBVECTOR node. This patch implements PromoteIntRes_INSERT_SUBVECTOR, which legalises the ISD node. This is currently done by going through memory. This is necessary because of the requirement that the SubVec parameter of the INSERT_SUBVECTOR node must be smaller than the Vec parameter, which means that INSERT_SUBVECTOR cannot always have a legal result/operand types. Co-Authored-by: Joe Ellis <joe.ellis@arm.com> Differential Revision: https://reviews.llvm.org/D102766	2021-07-01 17:05:53 +01:00
Irina Dobrescu	71d5b0a757	[AArch64][GlobalISel]Legalise some vector types for min/max Differential Revision: https://reviews.llvm.org/D105200	2021-07-01 16:29:38 +01:00
Bradley Smith	01b846674d	[AArch64][SVE] Add support for fixed length MSCATTER/MGATHER Since gather lowering can now lower to nodes that may need expansion via the vector legalizer, do MGATHER lowering via vector legalizer. Additionally, as part of adding passthru support for fixed typed gathers, fix passthru support for scalable types. Depends on D104910 Differential Revision: https://reviews.llvm.org/D104217	2021-07-01 12:13:59 +01:00
Jun Ma	ae5433945f	[AArch64][SVEIntrinsicOpts] Convert cntb/h/w/d to vscale intrinsic or constant. As is mentioned above Differential Revision: https://reviews.llvm.org/D104852	2021-07-01 10:09:47 +08:00
Fangrui Song	17858da022	[AArch64] Remove unneeded ExternalSymbolSDNode code for machine constraint "S". NFC ExternalSymbolSDNode is implicitly generated libcalls but with an address taking operation we cannot reference an ExternalSymbolSDNode.	2021-06-30 17:52:56 -07:00
Matt Arsenault	28f2f66200	GlobalISel: Use LLT in memory legality queries This enables proper lowering of non-byte sized loads. We still aren't faithfully preserving memory types everywhere, so the legality checks still only consider the size.	2021-06-30 17:44:13 -04:00
Jon Roelofs	a642872476	[GISel] Support llvm.memcpy.inline Differential revision: https://reviews.llvm.org/D105072	2021-06-30 12:39:05 -07:00
Florian Mayer	a24f104645	[MTE] Remove redundant helper function. Looking at PostDominatorTree::dominates, we can see that has the same logic (with the addition of handling Phi nodes - which are not used as inputs in this pass) as the helper function. Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D105141	2021-06-30 11:11:26 +01:00
Sjoerd Meijer	b062fff87a	Recommit "[AArch64] Custom lower <4 x i8> loads" This recommits D104782 including a fix for adding a wrong operand to the new load node. Differential Revision: https://reviews.llvm.org/D105110	2021-06-30 09:18:06 +01:00
Matt Arsenault	990278d026	CodeGen: Store LLT instead of uint64_t in MachineMemOperand GlobalISel is relying on regular MachineMemOperands to track all of the memory properties of accesses. Just the raw byte size is insufficient to disambiguate all situations. For example, if we need to split an unaligned extending load, we need to know the number of bits in the original source value and can't infer it from the result type. This is also a problem for extending vector loads. This does decrease the maximum representable size from the full uint64_t bytes to a maximum of 16-bits. No in tree testcases hit this, other than places using UINT64_MAX for unknown sizes. This may be an issue for G_MEMCPY and co., although they can just use unknown size for large static sizes. This also has potential for backend abuse by relying on the type when it really shouldn't be relevant after selection. This does not include the necessary MIR printer/parser changes to represent this.	2021-06-29 17:38:51 -04:00
Dylan Fleming	c3d3defd11	[SVE] Added CodeGen support for inserting an element into a predicate vector Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D104722	2021-06-29 14:55:40 +01:00
Sjoerd Meijer	3a7cea2858	Revert "[AArch64] Custom lower <4 x i8> loads" This reverts commit `51e434fc25` because of a build bot failure in test-suite::GCC-C-execute-pr60960.test that I need to investigate.	2021-06-28 17:44:46 +01:00
Bradley Smith	c089e29aa4	[AArch64][SVE] DAG combine SETCC_MERGE_ZERO of a SETCC_MERGE_ZERO This helps remove extra comparisons when generating masks for fixed length masked operations. Differential Revision: https://reviews.llvm.org/D104910	2021-06-28 15:06:06 +01:00
Brendon Cahoon	f9f5d41545	[AMDGPU][GlobalISel] Legalize and select G_SBFX and G_UBFX Adds legalizer, register bank select, and instruction select support for G_SBFX and G_UBFX. These opcodes generate scalar or vector ALU bitfield extract instructions for AMDGPU. The instructions allow both constant or register values for the offset and width operands. The 32-bit scalar version is expanded to a sequence that combines the offset and width into a single register. There are no 64-bit vgpr bitfield extract instructions, so the operations are expanded to a sequence of instructions that implement the operation. If the width is a constant, then the 32-bit bitfield extract instructions are used. Moved the AArch64 specific code for creating G_SBFX to CombinerHelper.cpp so that it can be used by other targets. Only bitfield extracts with constant offset and width values are handled currently. Differential Revision: https://reviews.llvm.org/D100149	2021-06-28 09:06:44 -04:00
Lucas Prates	88b1135e72	[Aarch64] Adding support for Armv9-A Realm Management Extension This adds support for Armv9-A's Realm Management Extension, including three new system registers - MFAR_EL3, GPCCR_EL3 and GPTBR_EL3 - and four new TLBI instructions. The reference for the Realm Management Extension can be found at: https://developer.arm.com/documentation/ddi0615/aa. Based on patches by Victor Campos. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D104773	2021-06-28 13:45:22 +01:00
David Green	2887f14639	[ISel] Port AArch64 SABD and UABD to DAGCombine This ports the AArch64 SABD and UABD over to DAG Combine, where they can be used by more backends (notably MVE in a follow-up patch). The matching code has changed very little, just to handle legal operations and types differently. It selects from (ABS (SUB (EXTEND a), (EXTEND b))), producing an abds/abdu which is zexted to the original type. Differential Revision: https://reviews.llvm.org/D91937	2021-06-26 19:34:16 +01:00
Sander de Smalen	b732e6c9a8	Revert "[GlobalISel] NFC: Have LLT::getSizeInBits/Bytes return a TypeSize." This patch seems to be causing build errors, reverting it for now. This reverts commit `aeab9d9570`.	2021-06-25 17:37:16 +01:00
Sander de Smalen	aeab9d9570	[GlobalISel] NFC: Have LLT::getSizeInBits/Bytes return a TypeSize. To reflect that the size may be scalable, a TypeSize is returned instead of an unsigned. In places where the result is used, it currently relies on an implicit cast of TypeSize -> uint64_t, which asserts that the type is not scalable. This patch is NFC for fixed-width vectors. Reviewed By: aemerson Differential Revision: https://reviews.llvm.org/D104454	2021-06-25 17:06:50 +01:00
Sander de Smalen	c9acd2f32e	[GlobalISel] NFC: Change LLT::changeNumElements to LLT::changeElementCount. Reviewed By: aemerson Differential Revision: https://reviews.llvm.org/D104453	2021-06-25 15:54:00 +01:00


5262 Commits