llvm-project

Author	SHA1	Message	Date
Arthur Eubanks	46cf82532c	[NFC] Replace Function handling of attributes with less confusing calls To avoid magic constants and confusing indexes.	2021-08-17 21:05:40 -07:00
Simon Pilgrim	1e770f0388	[ARM] ARMDAGToDAGISel::tryReadRegister/tryWriteRegister - don't dereference dyn_cast<> results. dyn_cast<> can return nullptr if the cast is illegal, use cast<> instead which will assert that the cast is correct. Fixes static analyser warnings.	2021-08-17 18:40:59 +01:00
David Green	52e0cf9d61	[ARM] Enable subreg liveness This enables subreg liveness in the arm backend when MVE is present, which allows the register allocator to detect when subregisters are alive/dead, compared to only acting on full registers. This can help produce better code on MVE with the way MQPR registers are made up of SPR registers, but is especially helpful for MQQPR and MQQQQPR registers, where there are very few "registers" available and being able to split them up into subregs can help produce much better code. Differential Revision: https://reviews.llvm.org/D107642	2021-08-17 14:10:33 +01:00
David Green	62e892fa2d	[ARM] Add MQQPR and MQQQQPR spill and reload pseudo instructions As a part of D107642, this adds pseudo instructions for MQQPR and MQQQQPR register classes, that can spill and reloads entire registers whilst keeping them combined, not splitting them into multiple D subregs that a VLDMIA/VSTMIA would use. This can help certain analyses, and helps to prevent verifier issues with subreg liveness.	2021-08-17 13:51:34 +01:00
David Green	9236dea255	[ARM] Create MQQPR and MQQQQPR register classes Similar to the MQPR register class as the MVE equivalent to QPR, this adds MQQPR and MQQQQPR register classes for the MVE equivalents of QQPR and QQQQPR registers. The MVE MQPR seems to have worked out quite well, and adding MQQPR and MQQQQPR allows us to specify the number of registers a little more accurately, calculating register pressure limits a little better. Differential Revision: https://reviews.llvm.org/D107463	2021-08-16 22:58:12 +01:00
Simon Pilgrim	d6fe8d37c6	[DAG] Fold concat_vectors(concat_vectors(x,y),concat_vectors(a,b)) -> concat_vectors(x,y,a,b) Follow-up to D107068, attempt to fold nested concat_vectors/undefs, as long as both the vector and inner subvector types are legal. This exposed the same issue in ARM's MVE LowerCONCAT_VECTORS_i1 (raised as PR51365) and AArch64's performConcatVectorsCombine which both assumed concat_vectors only took 2 subvector operands. Differential Revision: https://reviews.llvm.org/D107597	2021-08-16 16:06:54 +01:00
Arthur Eubanks	92ce6db9ee	[NFC] Rename AttributeList::hasFnAttribute() -> hasFnAttr() This is more consistent with similar methods.	2021-08-13 11:09:18 -07:00
David Green	ae9a346ef8	[ARM] Fix DAG combine loop in reduction distribution Given a constant operand, the MVE and DAGCombine combines could fight, each redistributing in the opposite order. Add a guard to the MVE vecreduce distribution to prevent that.	2021-08-12 16:37:39 +01:00
David Green	8c50b5fbfe	[ARM] Add extra debug messages for validating live outs. NFC We are running into more and more cases where the liveouts of low overhead loops do not validate. Add some extra debug messages to make it clearer why.	2021-08-11 10:35:53 +01:00
David Green	c140ff493e	[ARM] Change a couple of instances of LiveRegs.contains to !LiveRegs.available This changes a couple of calls to LiveRegs.contains to !LiveRegs.available, one in Thumb1FrameLoweringInfo (which modifies a test to look more correct to me, given r7 should be the frame pointer so is not available), and another in the ARMLoadStoreOptimizer, that I don't have a test for, it was just found by inspection. Differential Revision: https://reviews.llvm.org/D107454	2021-08-10 09:53:26 +01:00
Amara Emerson	2b067e3335	Change TargetLowering::canMergeStoresTo() to take a MF instead of DAG. DAG is unnecessary and we need this hook to implement store merging on GlobalISel too.	2021-08-06 12:57:53 -07:00
David Green	77e8f4eeee	[ARM] Define ComplexPatternFuncMutatesDAG Some of the Arm complex pattern functions call canExtractShiftFromMul, which can modify the DAG in-place. For this to be valid and handled successfully we need to define ComplexPatternFuncMutatesDAG. Differential Revision: https://reviews.llvm.org/D107476	2021-08-06 17:35:11 +01:00
Simon Pilgrim	dbce6a8d9d	[ARM] Fold insert_subvector to concat_vectors D107068 fixed the same problem on aarch64 but the arm variant wasn't exposed in existing test coverage. I've copied the arm64-neon-copy tests (and stripped the intrinsic test from it) for testing on arm neon builds as well.	2021-08-06 11:21:31 +01:00
Amara Emerson	4fee756c75	Delete copy-ctor of MachineFrameInfo. I just hit a nasty bug when writing a unit test after calling MF->getFrameInfo() without declaring the variable as a reference. Deleting the copy-constructor also showed a place in the ARM backend which was doing the same thing, albeit it didn't impact correctness there from the looks of it.	2021-08-05 23:24:37 -07:00
Igor Kudrin	2c14798ead	[ARM][llvm-objdump] Annotate PC-relative memory operands of VLDR instructions This extends D105979 and adds support for VLDR instructions. Differential Revision: https://reviews.llvm.org/D105980	2021-08-05 14:11:11 +07:00
Igor Kudrin	ddbe812bcc	[ARM][llvm-objdump] Annotate PC-relative memory operands This implements `MCInstrAnalysis::evaluateMemoryOperandAddress()` for Arm so that the disassembler can print the target address of memory operands that use PC+immediate addressing. Differential Revision: https://reviews.llvm.org/D105979	2021-08-05 14:11:11 +07:00
Tomas Matheson	40650f27b5	[ARM][atomicrmw] Fix CMP_SWAP_32 expand assert This assert is intended to ensure that the high registers are not selected when it is passed to one of the thumb UXT instructions. However it was triggering even for 32 bit where no UXT instruction is emitted. Fixes PR51313. Differential Revision: https://reviews.llvm.org/D107363	2021-08-04 15:02:02 +01:00
Arthur Eubanks	ad25344620	[MC][CodeGen] Emit constant pools earlier Previously we would emit constant pool entries for ldr inline asm at the very end of AsmPrinter::doFinalization(). However, if we're emitting dwarf aranges, that would end all sections with aranges. Then if we have constant pool entries to be emitted in those same sections, we'd hit an assert that the section has already been ended. We want to emit constant pool entries before emitting dwarf aranges. This patch splits out arm32/64's constant pool entry emission into its own MCTargetStreamer virtual method. Fixes PR51208 Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D107314	2021-08-03 20:55:31 -07:00
Roman Lebedev	6f6e9a867f	[BasicTTIImpl][LoopUnroll] getUnrollingPreferences(): emit ORE remark when advising against unrolling due to a call in a loop I'm not sure this is the best way to approach this, but the situation is rather not very detectable unless we explicitly call it out when refusing to advise to unroll. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D107271	2021-08-03 00:57:26 +03:00
David Green	c423a586a7	[ARM] Remove setPreservesCFG from ARMBlockPlacement As of `2829391840` it no longer preserves the CFG, needing to split blocks in order to add DLS instructions.	2021-08-02 14:15:45 +01:00
Simon Pilgrim	0579050116	Fix MSVC signed/unsigned comparison warning. NFCI.	2021-08-02 11:23:43 +01:00
David Green	2829391840	[ARM] Revert WLSTP to DLSTP if the target block is out of range If the block target for a WLSTP instruction is known to be out of range, and cannot be fixed by the ARMBlockPlacementPass, we can relax it to a DLSTP (and cmp/branch) to still allow the creation of tail predicated loops. That is what this patch does, adding extra revert code to the fallback path of ARMBlockPlacementPass. Due to the code produced when reverting, this creates a DLSTP between a Bcc and a Br. As a DLS isn't necessarily a terminator we need to split the block to move the DLS/Br into. Differential Revision: https://reviews.llvm.org/D104709	2021-08-02 10:59:52 +01:00
David Green	15a1d7e839	[ARM] Switch order of creating VADDV and VMLAV. It can be beneficial to attempt to try the larger VMLAV patterns before VADDV, in case both may match the same code.	2021-07-31 16:28:52 +01:00
David Green	69cdadddec	[ARM] Distribute reductions based on ascending load offset This distributes reductions based on the relative offset of loads, if one is found from their operands. Given chains of reductions this will then sort them in ascending load order, which in turn can help simple prefetches latch on to increasing strides more easily. Differential Revision: https://reviews.llvm.org/D106569	2021-07-30 19:50:07 +01:00
David Green	532d05b714	[ARM] Attempt to distribute reductions This adds a combine for adds of reductions, distributing them so that they occur sequentially to enable better use of accumulating VADDVA instructions. It combines: add(X, add(vecreduce(Y), vecreduce(Z))) -> add(add(X, vecreduce(Y)), vecreduce(Z)) and add(add(A, reduce(B)), add(C, reduce(D))) -> add(add(add(A, C), reduce(B)), reduce(D)) These together distribute the add's so that more reductions can be selected to VADDVA. Differential Revision: https://reviews.llvm.org/D106532	2021-07-30 14:48:31 +01:00
David Green	4b56306762	[ARM] Turn vecreduce_add(add(x, y)) into vecreduce(x) + vecreduce(y) Under MVE we can use VADDV/VADDVA's to perform integer add reductions, so it can be beneficial to use more reductions than summing subvectors and reducing once. Especially for VMLAV/VMLAVA the mul can be incorporated into the reduction, producing less instructions. Some of the test cases currently get larger due to extra integer adds, but will be improved in a followup patch. Differential Revision: https://reviews.llvm.org/D106531	2021-07-30 10:10:41 +01:00
David Green	d4a2daa919	[ARM] Define a couple more ssub indexes. NFC Same as `91bd3ad128`, this doesn't really change anything but gives the registers better names than the ones tablegen would define. And fills in the missing gaps.	2021-07-29 23:00:35 +01:00
David Green	41cedb1c9a	[LV][ARM] Tighten up MLA reduction costing This makes a couple of changes to the costing of MLA reduction patterns, to more accurately cost various patterns that can come up from vectorization. - The Arm implementation of getExtendedAddReductionCost is altered to only provide costs for legal or smaller types. Larger than legal types need to be split, which currently does not work very well, especially for predicated reductions where the predicate may be legal but needs to be split. Currently we limit it to legal or smaller input types. - The getReductionPatternCost has learnt that reduce(ext(mul(ext, ext)) is a pattern that can come up, and can be treated the same as reduce(mul(ext, ext)) providing the extension types match. - And it has been adjusted to not count the ext in reduce(mul(ext, ext)) as part of a reduce(mul) pattern. Together these changes help to more accurately cost the mla reductions in cases such as where the extend types don't match or the extend opcodes are different, picking better vector factors that don't result in expanded reductions. Differential Revision: https://reviews.llvm.org/D106166	2021-07-28 12:50:58 +01:00
Anirudh Prasad	a8cfa4b9bd	[SystemZ][z/OS] Initial code to generate assembly files on z/OS - This patch consists of the bare basic code needed in order to generate some assembly for the z/OS target. - Only the .text and the .bss sections are added for now. - The relevant MCSectionGOFF/Symbol interfaces have been added. This enables us to print out the GOFF machine code sections. - This patch enables us to add simple lit tests wherever possible, and contribute to the testing coverage for the z/OS target - Further improvements and additions will be made in future patches. Reviewed By: tmatheson Differential Revision: https://reviews.llvm.org/D106380	2021-07-27 11:29:15 -04:00
David Green	54c91c0c74	[ARM] Implement isLoad/StoreFromStackSlot for MVE stack stores accesses This implements the isLoadFromStackSlot and isStoreToStackSlot for MVE MVE_VSTRWU32 and MVE_VLDRWU32 functions. They behave the same as many other loads/stores, expecting a FI in Op1 and zero offset in Op2. At the same time this alters VLDR_P0_off and VSTR_P0_off to use the same code too, as they too should be returning VPR in Op0, take a FI in Op1 and zero offset in Op2. Differential Revision: https://reviews.llvm.org/D106797	2021-07-27 09:11:58 +01:00
David Green	010f8e3057	[ARM] Ensure correct regclass in distributing postinc The register class required for some MVE loads/stores is more constrained than the register we use when creating postinc. Make sure we constrain the register class to keep the code correct.	2021-07-26 14:26:38 +01:00
David Sherwood	0aff1798b5	[Analysis] Add simple cost model for strict (in-order) reductions I have added a new FastMathFlags parameter to getArithmeticReductionCost to indicate what type of reduction we are performing: 1. Tree-wise. This is the typical fast-math reduction that involves continually splitting a vector up into halves and adding each half together until we get a scalar result. This is the default behaviour for integers, whereas for floating point we only do this if reassociation is allowed. 2. Ordered. This now allows us to estimate the cost of performing a strict vector reduction by treating it as a series of scalar operations in lane order. This is the case when FP reassociation is not permitted. For scalable vectors this is more difficult because at compile time we do not know how many lanes there are, and so we use the worst case maximum vscale value. I have also fixed getTypeBasedIntrinsicInstrCost to pass in the FastMathFlags, which meant fixing up some X86 tests where we always assumed the vector.reduce.fadd/mul intrinsics were 'fast'. New tests have been added here: Analysis/CostModel/AArch64/reduce-fadd.ll Analysis/CostModel/AArch64/sve-intrinsics.ll Transforms/LoopVectorize/AArch64/strict-fadd-cost.ll Transforms/LoopVectorize/AArch64/sve-strict-fadd-cost.ll Differential Revision: https://reviews.llvm.org/D105432	2021-07-26 10:26:06 +01:00
David Green	ba42f6a4b5	[ARM] Pass SelectionDAG to methods that dont require DCI. NFC In these methods DCI is never used, only the DAG from it. Pass the DAG directly, cleaning up the code a little.	2021-07-21 22:11:09 +01:00
Tim Northover	19d2e42be2	ARM: don't return by popping PC if we have to adjust the stack afterwards. In mandatory tail calling conventions we might have to deallocate stack space used by our arguments before return. This happens after popping CSRs, so the pop cannot be turned into the return itself in this case. The else branch here was already a nop, so removing it as a tidy-up.	2021-07-21 09:35:14 +01:00
David Green	5561ad8b36	[ARM] Remove PromotedBitwiseVT for NEON types This removes the promotion of NEON AND, OR and XOR nodes to v2i32/v4i32, treating them the same as the AArch64 and MVE backends where we just add the relevant patterns for each legal type. This prevents a lot of bitcasts from being added to the DAG, which have the potential to make optimizations more difficult. It does mean adding extra patterns, and some codegen can change due to the types now being legal, not promoted. Differential Revision: https://reviews.llvm.org/D105588	2021-07-19 16:36:33 +01:00
David Green	eb1e95dbdf	[ARM] Extend more reductions during lowering This relaxes the VMLAV and VADDV reduction recognition code to handle smaller than legal types, extending them as needed. That was already handled for some reductions, this extends it to more types in a more generic way. If a smaller than legal value is found it is extended to the legal type as needed. Differential Revision: https://reviews.llvm.org/D106051	2021-07-19 08:58:03 +01:00
David Green	5acddf5b09	[ARM] Lower non-extended small gathers via truncated gathers. As a corollary to `1113e06821`, this allows us to match gathers that don't produce full vector width results. They use an extended gather which is truncated back to the original type.	2021-07-17 22:38:31 +01:00
David Green	ad8e75caa2	[ARM] Fix for matching reductions that are both sext and zext. Fix a silly mistake that was not making sure that _both_ operands were the correct extend code.	2021-07-16 23:11:42 +01:00
Mehdi Amini	76374573ce	Use ManagedStatic and lazy initialization of cl::opt in libSupport to make it free of global initializer We can build it with -Werror=global-constructors now. This helps in situation where libSupport is embedded as a shared library, potential with dlopen/dlclose scenario, and when command-line parsing or other facilities may not be involved. Avoiding the implicit construction of these cl::opt can avoid double-registration issues and other kind of behavior. Reviewed By: lattner, jpienaar Differential Revision: https://reviews.llvm.org/D105959	2021-07-16 07:38:16 +00:00
Mehdi Amini	8d051d8546	Revert "Use ManagedStatic and lazy initialization of cl::opt in libSupport to make it free of global initializer" This reverts commit `af9321739b`. Still some specific config broken in some way that requires more investigation.	2021-07-16 07:35:13 +00:00
Mehdi Amini	af9321739b	Use ManagedStatic and lazy initialization of cl::opt in libSupport to make it free of global initializer We can build it with -Werror=global-constructors now. This helps in situation where libSupport is embedded as a shared library, potential with dlopen/dlclose scenario, and when command-line parsing or other facilities may not be involved. Avoiding the implicit construction of these cl::opt can avoid double-registration issues and other kind of behavior. Reviewed By: lattner, jpienaar Differential Revision: https://reviews.llvm.org/D105959	2021-07-16 06:54:26 +00:00
Mehdi Amini	16b5e9d6a2	Revert "Use ManagedStatic and lazy initialization of cl::opt in libSupport to make it free of global initializer" This reverts commit `42f588f39c`. Broke some buildbots	2021-07-16 03:46:53 +00:00
Mehdi Amini	42f588f39c	Use ManagedStatic and lazy initialization of cl::opt in libSupport to make it free of global initializer We can build it with -Werror=global-constructors now. This helps in situation where libSupport is embedded as a shared library, potential with dlopen/dlclose scenario, and when command-line parsing or other facilities may not be involved. Avoiding the implicit construction of these cl::opt can avoid double-registration issues and other kind of behavior. Reviewed By: lattner, jpienaar Differential Revision: https://reviews.llvm.org/D105959	2021-07-16 03:33:20 +00:00
Sam Tebbs	ff0ef6a518	[ARM][LowOverheadLoops] Make some stack spills valid for tail predication This patch makes vector spills valid for tail predication when all loads from the same stack slot are within the loop Differential Revision: https://reviews.llvm.org/D105443	2021-07-15 19:23:52 +01:00
David Green	dad506bd4e	[ARM] Expand types handled in VQDMULH recognition We have a DAG combine for recognizing the sequence of nodes that make up an MVE VQDMULH, but only currently handles specifically legal types. This patch expands that to other power-2 vector types. For smaller than legal types this means any_extending the type and casting it to a legal type, using a VQDMULH where we only use some of the lanes. The result is sign extended back to the original type, to properly set the invalid lanes. Larger than legal types are split into chunks with extracts and concat back together. Differential Revision: https://reviews.llvm.org/D105814	2021-07-15 14:47:53 +01:00
David Green	31b8f40006	[ARM] Move add(VMLALVA(A, X, Y), B) to VMLALVA(add(A, B), X, Y) For i64 reductions we currently try and convert add(VMLALV(X, Y), B) to VMLALVA(B, X, Y), incorporating the addition into the VMLALVA. If we have an add of an existing VMLALVA, this patch pushes the add up above the VMLALVA so that it may potentially be simplified further, for example being folded into another VMLALV. Differential Revision: https://reviews.llvm.org/D105686	2021-07-14 20:06:49 +01:00
David Green	338314f9c2	[ARM] Lower v16i8 -> i64 VMLA reductions. MVE does not have a VMLALV instruction that can perform v16i8 -> i64 reductions, like it does for v8i16->i64 and v4i32->i64 reductions. That means that the pattern to create them will be split up by type legalization, creating a lot of instructions. This extends the patterns for matching i64 reductions a little to handle the v16i8->i64 case. We need to turn them into a pair of v8i16->i64 VMLALVs that each perform half of the reduction and are summed together (so the latter is a VMLALVA). The order of the lanes does not matter for the reduction so we generate a MVEEXT for the extension, that will either be folded into an extending load or can be optimized to a VREV/VMOVL. Some of the resulting codegen isn't optimal, but will be improved in a later patch. Differential Revision: https://reviews.llvm.org/D105680	2021-07-14 18:11:32 +01:00
Tim Northover	b18bda6791	ARM: reuse existing libcall global variable if possible. If we try to create a new GlobalVariable on each iteration, the Module will detect the name collision and "helpfully" rename later iterations by appending ".1" etc. But "___udivsi3.1" doesn't exist and we definitely don't want to try to call it. So instead check whether there's already a global with the right name in the module and use that if so.	2021-07-14 14:14:47 +01:00
Matt Arsenault	121541fdcd	Mips/GlobalISel: Use more standard call lowering infrastructure This also fixes some missing implicit uses on call instructions, adds missing G_ASSERT_SEXT/ZEXT annotations, and some missing outgoing sext/zexts. This also fixes not respecting tablegen requested type promotions. This starts treating f64 passed in i32 GPRs as a type of custom assignment, which restores some previously XFAILed tests. This is due to getNumRegistersForCallingConv returns a static value, but in this case it is context dependent on other arguments. Most of the ugliness is reproducing a hack CC_MipsO32 uses in SelectionDAG. CC_MipsO32 depends on a bunch of vectors populated from the original IR argument types in MipsCCState. The way this ends up working in GlobalISel is it only ends up inspecting the most recently added vector element. I'm pretty sure there are cleaner ways to do this, but this seemed easier than fixing up the current DAG handling. This is another case where it would be easier of the CCAssignFns were passed the original type instead of only the pre-legalized ones. There's still a lot of junk here that shouldn't be necessary. This also likely breaks big endian handling, but it wasn't complete/tested anyway since the IRTranslator gives up on big endian targets.	2021-07-13 11:04:10 -04:00
David Green	ca78151001	[ARM] Introduce MVEEXT ISel lowering Similar to D91921 (and D104515) this introduces two MVESEXT and MVEZEXT nodes that larger-than-legal sext and zext are lowered to. These either get optimized away or end up becoming a series of stack loads/store, in order to perform the extending whilst keeping the order of the lanes correct. They are generated from v8i16->v8i32, v16i8->v16i16 and v16i8->v16i32 extends, potentially with a intermediate extend for the larger v16i8->v16i32 extend. A number of combines have been added for obvious cases that come up in tests, notably MVEEXT of shuffles. More may be needed in the future, but this seems to cover most of the cases that come up in the tests. Differential Revision: https://reviews.llvm.org/D105090	2021-07-13 07:21:20 +01:00

11532 Commits