llvm-project

Commit Graph

Author	SHA1	Message	Date
David Green	39db5753f9	[LV][ARM] Inloop reduction cost modelling This adds cost modelling for the inloop vectorization added in `745bf6cf44`. Up until now they have been modelled as the original underlying instruction, usually an add. This happens to works OK for MVE with instructions that are reducing into the same type as they are working on. But MVE's instructions can perform the equivalent of an extended MLA as a single instruction: %sa = sext <16 x i8> A to <16 x i32> %sb = sext <16 x i8> B to <16 x i32> %m = mul <16 x i32> %sa, %sb %r = vecreduce.add(%m) -> R = VMLADAV A, B There are other instructions for performing add reductions of v4i32/v8i16/v16i8 into i32 (VADDV), for doing the same with v4i32->i64 (VADDLV) and for performing a v4i32/v8i16 MLA into an i64 (VMLALDAV). The i64 are particularly interesting as there are no native i64 add/mul instructions, leading to the i64 add and mul naturally getting very high costs. Also worth mentioning, under NEON there is the concept of a sdot/udot instruction which performs a partial reduction from a v16i8 to a v4i32. They extend and mul/sum the first four elements from the inputs into the first element of the output, repeating for each of the four output lanes. They could possibly be represented in the same way as above in llvm, so long as a vecreduce.add could perform a partial reduction. The vectorizer would then produce a combination of in and outer loop reductions to efficiently use the sdot and udot instructions. Although this patch does not do that yet, it does suggest that separating the input reduction type from the produced result type is a useful concept to model. It also shows that a MLA reduction as a single instruction is fairly common. This patch attempt to improve the costmodelling of in-loop reductions by: - Adding some pattern matching in the loop vectorizer cost model to match extended reduction patterns that are optionally extended and/or MLA patterns. This marks the cost of the reduction instruction correctly and the sext/zext/mul leading up to it as free, which is otherwise difficult to tell and may get a very high cost. (In the long run this can hopefully be replaced by vplan producing a single node and costing it correctly, but that is not yet something that vplan can do). - getExtendedAddReductionCost is added to query the cost of these extended reduction patterns. - Expanded the ARM costs to account for these expanded sizes, which is a fairly simple change in itself. - Some minor alterations to allow inloop reduction larger than the highest vector width and i64 MVE reductions. - An extra InLoopReductionImmediateChains map was added to the vectorizer for it to efficiently detect which instructions are reductions in the cost model. - The tests have some updates to show what I believe is optimal vectorization and where we are now. Put together this can greatly improve performance for reduction loop under MVE. Differential Revision: https://reviews.llvm.org/D93476	2021-01-21 21:03:41 +00:00
David Green	dfac521da1	[ARM] Fix vector saddsat costs. It turns out the vectorizer calls the getIntrinsicInstrCost functions with a scalar return type and vector VF. This updates the costmodel to handle that, still producing the correct vector costs. A vectorizer test is added to show it vectorizing at the correct factor again.	2021-01-21 15:30:39 +00:00
David Green	6a563eef13	[ARM] Expand vXi1 VSELECT's We have no lowering for VSELECT vXi1, vXi1, vXi1, so mark them as expanded to turn them into a series of logical operations. Differential Revision: https://reviews.llvm.org/D94946	2021-01-19 17:56:50 +00:00
David Green	f373b30923	[ARM] Add MVE add.sat costs This adds some basic MVE sadd_sat/ssub_sat/uadd_sat/usub_sat costs, based on when the instruction is legal. With smaller than legal types that are promoted we generate shr(qadd(shl, shl)), so the cost is 4 appropriately. Differential Revision: https://reviews.llvm.org/D94958	2021-01-19 15:38:46 +00:00
Yvan Roux	244ad228f3	[ARM][MachineOutliner] Add stack fixup feature This patch handles cases where we have to save/restore the link register into the stack and and load/store instruction which use the stack are part of the outlined region. It checks that there will be no overflow introduced by the new offset and fixup these instructions accordingly. Differential Revision: https://reviews.llvm.org/D92934	2021-01-19 10:59:09 +01:00
David Green	e7dc083a41	[ARM] Don't handle low overhead branches in AnalyzeBranch It turns our that the BranchFolder and IfCvt does not like unanalyzable branches that fall-through. This means that removing the unconditional branches from the end of tail predicated instruction can run into asserts and verifier issues. This effectively reverts `372eb2bbb6`, but adds handling to t2DoLoopEndDec which are not branches, so can be safely skipped.	2021-01-18 17:16:07 +00:00
David Green	1454724215	[ARM] Align blocks that are not fallthough targets If the previous block in a function does not fallthough, adding nop's to align it will never be executed. This means we can freely (except for codesize) align more branches. This happens in constantislandspass (as it cannot happen later) and only happens at aggressive optimization levels as it does increase codesize. Differential Revision: https://reviews.llvm.org/D94394	2021-01-16 22:19:35 +00:00
David Green	372eb2bbb6	[ARM] Add low overhead loops terminators to AnalyzeBranch This treats low overhead loop branches the same as jump tables and indirect branches in analyzeBranch - they cannot be analyzed but the direct branches on the end of the block may be removed. This helps remove the unnecessary branches earlier, which can help produce better codegen (and change block layout in a number of cases). Differential Revision: https://reviews.llvm.org/D94392	2021-01-16 18:30:21 +00:00
Kazu Hirata	2082b10d10	[llvm] Use *::empty (NFC)	2021-01-16 09:40:55 -08:00
David Green	f5abf0bd48	[ARM] Tail predication with constant loop bounds The TripCount for a predicated vector loop body will be ceil(ElementCount/Width). This alters the conversion of an active.lane.mask to a VCPT intrinsics to match. Differential Revision: https://reviews.llvm.org/D94608	2021-01-15 18:17:31 +00:00
Sam Tebbs	1a497ae9b8	[ARM][Block placement] Check the predecessor exists before processing it Not all machine loops will have a predecessor. so the pass needs to check it before continuing. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D94780	2021-01-15 15:45:13 +00:00
Sam Tebbs	5e4480b6c0	[ARM] Don't run the block placement pass at O0 The block placement pass shouldn't run unless optimisations are enabled. Differential Revision: https://reviews.llvm.org/D94691	2021-01-15 13:59:29 +00:00
Oliver Stannard	3676ef1053	[ARM][GISel] Treat calls as variadic even if only fixed arguments provided For the ARM hard-float calling convention, calls to variadic functions need to be treated diffrently, even if only the fixed arguments are provided. This fixes GCC-C-execute-pr68390 in the test-suite, which is failing on the ARM GlobaISel bot.	2021-01-15 09:37:01 +00:00
Kazu Hirata	7dc3575ef2	[llvm] Remove redundant return and continue statements (NFC) Identified with readability-redundant-control-flow.	2021-01-14 20:30:34 -08:00
Sam Tebbs	60fda8ebb6	[ARM] Add a pass that re-arranges blocks when there is a backwards WLS branch Blocks can be laid out such that a t2WhileLoopStart branches backwards. This is forbidden by the architecture and so it fails to be converted into a low-overhead loop. This new pass checks for these cases and moves the target block, fixing any fall-through that would then be broken. Differential Revision: https://reviews.llvm.org/D92385	2021-01-13 17:23:00 +00:00
David Green	c29ca8551a	[ARM] Update isVMOVNOriginalMask to handle single input shuffle vectors The isVMOVNOriginalMask was previously only checking for two input shuffles that could be better expanded as vmovn nodes. This expands that to single input shuffles that will later be legalized to multiple vectors. Differential Revision: https://reviews.llvm.org/D94189	2021-01-13 08:51:28 +00:00
Stephan Herhut	2e17d9c0ee	[ARM] Add uses for locals introduced for debug messages. NFC. This adds uses for locals introduced for new debug messages for the load store optimizer. Those locals are only used on debug statements and otherwise create unused variable warnings. Differential Revision: https://reviews.llvm.org/D94398	2021-01-11 14:27:28 +01:00
David Green	8165a03420	[ARM] Add debug messages for the load store optimizer. NFC	2021-01-11 09:24:28 +00:00
David Green	dcefcd51e0	[ARM] Update trunc costs We did not have specific costs for larger than legal truncates that were not otherwise cheap (where they were next to stores, for example). As MVE does not have a dedicated instruction for them (and we do not use loads/stores yet), they should be expensive as they get expanded to a series of lane moves. Differential Revision: https://reviews.llvm.org/D94260	2021-01-11 08:59:28 +00:00
David Green	024af42c60	[ARM] Custom lower i1 vector truncates The ISel patterns we have for truncating to i1's under MVE do not seem to be correct. Instead custom lower to icmp(ne, and(x, 1), 0). Differential Revision: https://reviews.llvm.org/D94226	2021-01-08 18:21:00 +00:00
Kazu Hirata	b934160aaa	[Target] Use llvm::find_if (NFC)	2021-01-07 20:29:36 -08:00
David Green	63dce70b79	[ARM] Handle any extend whilst lowering addw/addl/subw/subl Same as `a9b6440edd`, use zanyext to treat any_extends as zero extends during lowering to create addw/addl/subw/subl nodes. Differential Revision: https://reviews.llvm.org/D93835	2021-01-06 11:26:39 +00:00
David Green	ddb82fc76c	[ARM] Handle any extend whilst lowering mull Similar to `78d8a821e2` but for ARM, this handles any_extend whilst creating MULL nodes, treating them as zextends. Differential Revision: https://reviews.llvm.org/D93834	2021-01-06 10:51:12 +00:00
Christudasan Devadasan	d68458bd56	[GlobalISel] Base implementation for sret demotion. If the return values can't be lowered to registers SelectionDAG performs the sret demotion. This patch contains the basic implementation for the same in the GlobalISel pipeline. Furthermore, targets should bring relevant changes during lowerFormalArguments, lowerReturn and lowerCall to make use of this feature. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D92953	2021-01-06 10:30:50 +05:30
QingShan Zhang	2962f1149c	[NFC] Add the getSizeInBytes() interface for MachineConstantPoolValue Current implementation assumes that, each MachineConstantPoolValue takes up sizeof(MachineConstantPoolValue::Ty) bytes. For PowerPC, we want to lump all the constants with the same type as one MachineConstantPoolValue to save the cost that calculate the TOC entry for each const. So, we need to extend the MachineConstantPoolValue that break this assumption. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D89108	2021-01-05 03:22:45 +00:00
Kazu Hirata	eb198f4c3c	[llvm] Use llvm::any_of (NFC)	2021-01-04 11:42:47 -08:00
David Green	901cc9b6f3	[ARM] Extend lowering for i64 reductions The lowering of a <4 x i16> or <4 x i8> vecreduce.add into an i64 would previously be expanded, due to the i64 not being legal. This patch adjusts our reduction matchers, making it produce a VADDLV(sext A to v4i32) instead. Differential Revision: https://reviews.llvm.org/D93622	2021-01-04 12:44:43 +00:00
Kazu Hirata	0e219b6443	[Target] Construct SmallVector with iterator ranges (NFC)	2021-01-03 09:57:45 -08:00
Kazu Hirata	331c28f60d	[ARM] Declare Op within an if statement (NFC)	2020-12-30 17:45:36 -08:00
Mark Murray	5abfeccf10	[ARM][AArch64] Add Cortex-A78C Support for Clang and LLVM This patch upstreams support for the Armv8-a Cortex-A78C processor for AArch64 and ARM. In detail: Adding cortex-a78c as cpu option for aarch64 and arm targets in clang Adding Cortex-A78C CPU name and ProcessorModel in llvm Details of the CPU can be found here: https://www.arm.com/products/silicon-ip-cpu/cortex-a/cortex-a78c	2020-12-29 10:18:59 +00:00
David Penry	a9f14cdc62	[ARM] Add bank conflict hazarding Adds ARMBankConflictHazardRecognizer. This hazard recognizer looks for a few situations where the same base pointer is used and then checks whether the offsets lead to a bank conflict. Two parameters are also added to permit overriding of the target assumptions: arm-data-bank-mask=<int> - Mask of bits which are to be checked for conflicts. If all these bits are equal in the offsets, there is a conflict. arm-assume-itcm-bankconflict=<bool> - Assume that there will be bank conflicts on any loads to a constant pool. This hazard recognizer is enabled for Cortex-M7, where the Technical Reference Manual states that there are two DTCM banks banked using bit 2 and one ITCM bank. Differential Revision: https://reviews.llvm.org/D93054	2020-12-23 14:00:59 +00:00
Fangrui Song	d9a0c40bce	[MC] Split MCContext::createTempSymbol, default AlwaysAddSuffix to true, and add comments CanBeUnnamed is rarely false. Splitting to a createNamedTempSymbol makes the intention clearer and matches the direction of reverted r240130 (to drop the unneeded parameters). No behavior change.	2020-12-21 14:04:13 -08:00
Fangrui Song	d33abc337c	Migrate MCContext::createTempSymbol call sites to AlwaysAddSuffix=true Most call sites set AlwaysAddSuffix to true. The two use cases do not really need false and can be more consistent with other temporary symbol usage.	2020-12-21 14:04:13 -08:00
Fangrui Song	7b9890e17e	[MC][ELF] Remove unneeded MCSymbol::setExternal calls ELF code uses symbol bindings and does not call isExternal().	2020-12-20 21:26:36 -08:00
Kazu Hirata	966f1431de	[Target] Use llvm::erase_if (NFC)	2020-12-20 17:43:22 -08:00
Kristof Beyls	df8ed39283	[ARM] harden-sls-blr: avoid r12 and lr in indirect calls. As a linker is allowed to clobber r12 on function calls, the code transformation that hardens indirect calls is not correct in case a linker does so. Similarly, the transformation is not correct when register lr is used. This patch makes sure that r12 or lr are not used for indirect calls when harden-sls-blr is enabled. Differential Revision: https://reviews.llvm.org/D92469	2020-12-19 12:39:59 +00:00
Kristof Beyls	a4c1f5160e	[ARM] Harden indirect calls against SLS To make sure that no barrier gets placed on the architectural execution path, each indirect call calling the function in register rN, it gets transformed to a direct call to __llvm_slsblr_thunk_mode_rN. mode is either arm or thumb, depending on the mode of where the indirect call happens. The llvm_slsblr_thunk_mode_rN thunk contains: bx rN <speculation barrier> Therefore, the indirect call gets split into 2; one direct call and one indirect jump. This transformation results in not inserting a speculation barrier on the architectural execution path. The mitigation is off by default and can be enabled by the harden-sls-blr subtarget feature. As a linker is allowed to clobber r12 on function calls, the above code transformation is not correct in case a linker does so. Similarly, the transformation is not correct when register lr is used. Avoiding r12/lr being used is done in a follow-on patch to make reviewing this code easier. Differential Revision: https://reviews.llvm.org/D92468	2020-12-19 12:33:42 +00:00
Kristof Beyls	320fd3314e	[ARM] Implement harden-sls-retbr for Thumb mode The only non-trivial consideration in this patch is that the formation of TBB/TBH instructions, which is done in the constant island pass, does not understand the speculation barriers inserted by the SLSHardening pass. As such, when harden-sls-retbr is enabled for a function, the formation of TBB/TBH instructions in the constant island pass is disabled. Differential Revision: https://reviews.llvm.org/D92396	2020-12-19 12:32:47 +00:00
Kristof Beyls	195f44278c	[ARM] Implement harden-sls-retbr for ARM mode Some processors may speculatively execute the instructions immediately following indirect control flow, such as returns, indirect jumps and indirect function calls. To avoid a potential miss-speculatively executed gadget after these instructions leaking secrets through side channels, this pass places a speculation barrier immediately after every indirect control flow where control flow doesn't return to the next instruction, such as returns and indirect jumps, but not indirect function calls. Hardening of indirect function calls will be done in a later, independent patch. This patch is implementing the same functionality as the AArch64 counter part implemented in https://reviews.llvm.org/D81400. For AArch64, returns and indirect jumps only occur on RET and BR instructions and hence the function attribute to control the hardening is called "harden-sls-retbr" there. On AArch32, there is a much wider variety of instructions that can trigger an indirect unconditional control flow change. I've decided to stick with the name "harden-sls-retbr" as introduced for the corresponding AArch64 mitigation. This patch implements this for ARM mode. A future patch will extend this to also support Thumb mode. The inserted barriers are never on the correct, architectural execution path, and therefore performance overhead of this is expected to be low. To ensure these barriers are never on an architecturally executed path, when the harden-sls-retbr function attribute is present, indirect control flow is never conditionalized/predicated. On targets that implement that Armv8.0-SB Speculation Barrier extension, a single SB instruction is emitted that acts as a speculation barrier. On other targets, a DSB SYS followed by a ISB is emitted to act as a speculation barrier. These speculation barriers are implemented as pseudo instructions to avoid later passes to analyze them and potentially remove them. The mitigation is off by default and can be enabled by the harden-sls-retbr subtarget feature. Differential Revision: https://reviews.llvm.org/D92395	2020-12-19 11:42:39 +00:00
David Green	e1c1adf9dc	[ARM] Match dual lane vmovs from insert_vector_elt MVE has a dual lane vector move instruction, capable of moving two general purpose registers into lanes of a vector register. They look like one of: vmov q0[2], q0[0], r2, r0 vmov q0[3], q0[1], r3, r1 They only accept these lane indices though (and only insert into an i32), either moving lanes 1 and 3, or 0 and 2. This patch adds some tablegen patterns for them, selecting from vector inserts elements. Because the insert_elements are know to be canonicalized to ascending order there are several patterns that we need to select. These lane indices are: 3 2 1 0 -> vmovqrr 31; vmovqrr 20 3 2 1 -> vmovqrr 31; vmov 2 3 1 -> vmovqrr 31 2 1 0 -> vmovqrr 20; vmov 1 2 0 -> vmovqrr 20 With the top one being the most common. All other potential patterns of lane indices will be matched by a combination of these and the individual vmov pattern already present. This does mean that we are selecting several machine instructions at once due to the need to re-arrange the inserts, but in this case there is nothing else that will attempt to match an insert_vector_elt node. This is a recommit of `6cc3d80a84` after fixing the backward instruction definitions.	2020-12-18 16:13:08 +00:00
David Green	6e913e4451	Revert "[ARM] Match dual lane vmovs from insert_vector_elt" This one needed more testing.	2020-12-18 13:33:40 +00:00
Yvan Roux	923ca0b411	[ARM][MachineOutliner] Fix costs model. Fix candidates calls costs models allocation and prepare stack fixups handling. Differential Revision: https://reviews.llvm.org/D92933	2020-12-17 16:08:23 +01:00
Lucas Prates	c5046ebdf6	[ARM] Adding v8.7-A command-line support for the ARM target This extends the command-line support for the 'armv8.7-a' architecture name to the ARM target. Based on a patch written by Momchil Velikov. Reviewed By: ostannard Differential Revision: https://reviews.llvm.org/D93231	2020-12-17 13:48:54 +00:00
Lucas Prates	42b92b31b8	[ARM][AArch64] Adding basic support for the v8.7-A architecture This introduces support for the v8.7-A architecture through a new subtarget feature called "v8.7a". It adds two new "WFET" and "WFIT" instructions, the nXS limited-TLB-maintenance qualifier for DSB and TLBI instructions, a new CPU id register, ID_AA64ISAR2_EL1, and the new HCRX_EL2 system register. Based on patches written by Simon Tatham and Victor Campos. Reviewed By: ostannard Differential Revision: https://reviews.llvm.org/D91772	2020-12-17 13:45:08 +00:00
David Green	6cc3d80a84	[ARM] Match dual lane vmovs from insert_vector_elt MVE has a dual lane vector move instruction, capable of moving two general purpose registers into lanes of a vector register. They look like one of: vmov q0[2], q0[0], r2, r0 vmov q0[3], q0[1], r3, r1 They only accept these lane indices though (and only insert into an i32), either moving lanes 1 and 3, or 0 and 2. This patch adds some tablegen patterns for them, selecting from vector inserts elements. Because the insert_elements are know to be canonicalized to ascending order there are several patterns that we need to select. These lane indices are: 3 2 1 0 -> vmovqrr 31; vmovqrr 20 3 2 1 -> vmovqrr 31; vmov 2 3 1 -> vmovqrr 31 2 1 0 -> vmovqrr 20; vmov 1 2 0 -> vmovqrr 20 With the top one being the most common. All other potential patterns of lane indices will be matched by a combination of these and the individual vmov pattern already present. This does mean that we are selecting several machine instructions at once due to the need to re-arrange the inserts, but in this case there is nothing else that will attempt to match an insert_vector_elt node. Differential Revision: https://reviews.llvm.org/D92553	2020-12-15 15:58:52 +00:00
David Green	1de3e7fd62	[ARM] Improve handling of empty VPT blocks in tail predicated loops A vpt block that just contains either VPST;VCTP or VPT;VCTP, once the VCTP is removed will become invalid. This fixed the first by removing the now empty block and bails out for the second, as we have no simple way of converting a VPT to a VCMP. Differential Revision: https://reviews.llvm.org/D92369	2020-12-14 11:17:01 +00:00
David Green	a4823377fd	[ARM] Add basic masked load/store costs This adds some basic MVE masked load/store costs, notably changing the cost of legal loads/stores to the MVECostFactor and the cost of scalarized instructions to 8*NumElts. Differential Revision: https://reviews.llvm.org/D86538	2020-12-12 15:26:32 +00:00
David Green	3f571be1c0	[ARM] Make t2DoLoopStartTP a terminator Although this was something that I was hoping we would not have to do, this patch makes t2DoLoopStartTP a terminator in order to keep it at the end of it's block, so not allowing extra MVE instruction between it and the end. With t2DoLoopStartTP's also starting tail predication regions, it also marks them as having side effects. The t2DoLoopStart is still not a terminator, giving it the extra scheduling freedom that can be helpful, but now that we have a TP version they can be treated differently. Differential Revision: https://reviews.llvm.org/D91887	2020-12-11 09:23:57 +00:00
Haojian Wu	2fc4afda0f	Fix a -Wunused-variable warning in release build.	2020-12-10 14:52:45 +01:00
David Green	0447f3508f	[ARM][RegAlloc] Add t2LoopEndDec We currently have problems with the way that low overhead loops are specified, with LR being spilled between the t2LoopDec and the t2LoopEnd forcing the entire loop to be reverted late in the backend. As they will eventually become a single instruction, this patch introduces a t2LoopEndDec which is the combination of the two, combined before registry allocation to make sure this does not fail. Unfortunately this instruction is a terminator that produces a value (and also branches - it only produces the value around the branching edge). So this needs some adjustment to phi elimination and the register allocator to make sure that we do not spill this LR def around the loop (needing to put a spill after the terminator). We treat the loop very carefully, making sure that there is nothing else like calls that would break it's ability to use LR. For that, this adds a isUnspillableTerminator to opt in the new behaviour. There is a chance that this could cause problems, and so I have added an escape option incase. But I have not seen any problems in the testing that I've tried, and not reverting Low overhead loops is important for our performance. If this does work then we can hopefully do the same for t2WhileLoopStart and t2DoLoopStart instructions. This patch also contains the code needed to convert or revert the t2LoopEndDec in the backend (which just needs a subs; bne) and the code pre-ra to create them. Differential Revision: https://reviews.llvm.org/D91358	2020-12-10 12:14:23 +00:00
David Green	b0ce615b2d	[ARM] Remove copies from low overhead phi inductions. The phi created in a low overhead loop gets created with a default register class it seems. There are then copied inserted between the low overhead loop pseudo instructions (which produce/consume GPRlr instructions) and the phi holding the induction. This patch removes those as a step towards attempting to make t2LoopDec and t2LoopEnd a single instruction, and appears useful in it's own right as shown in the tests. Differential Revision: https://reviews.llvm.org/D91267	2020-12-10 10:30:31 +00:00
David Green	384383e15c	[ARM] Common inverse constant predicates to VPNOT This scans through blocks looking for constants used as predicates in MVE instructions. When two constants are found which are the inverse of one another, the second can be replaced by a VPNOT of the first, potentially allowing that not to be folded away into an else predicate of a vpt block. Differential Revision: https://reviews.llvm.org/D92470	2020-12-09 07:56:44 +00:00
David Green	03e675fd12	[ARM] Turn pred_cast(xor(x, -1)) into xor(pred_cast(x), -1) This folds a not (an xor -1) though a predicate_cast, so that it can be turned into a VPNOT and potentially be folded away as an else predicate inside a VPT block. Differential Revision: https://reviews.llvm.org/D92235	2020-12-08 15:22:46 +00:00
David Green	91fb9eac0b	[ARM] Remove dead instructions before creating VPT block bundles We remove VPNOT instructions in VPT blocks as we create them, turning them into else predicates. We don't remove the dead instructions until after the block has been created though. Because the VPNOT will have killed the vpr register it used, this makes finalizeBundle add internal flags to the vpr uses of any instructions after the VPNOT. These incorrect flags can then confuse what is alive and what is not, leading to machine verifier problems. This patch removes them earlier instead, before the bundle is finalized so that kill flags remain valid. Differential Revision: https://reviews.llvm.org/D92227	2020-12-08 14:05:07 +00:00
David Green	d9bf6245bf	[ARM] Revert low overhead loops with calls before registry allocation. This adds code to revert low overhead loops with calls in them before register allocation. Ideally we would not create low overhead loops with calls in them to begin with, but that can be difficult to always get correct. If we want to try and glue together t2LoopDec and t2LoopEnd into a single instruction, we need to ensure that no instructions use LR in the loop. (Technically the final code can be better too, as it doesn't need to use the same registers but that has not been optimized for here, as reverting loops with calls is expected to be very rare). It also adds a MVETailPredUtils.h header to share the revert code between different passes, and provides a place to expand upon, with RevertLoopWithCall becoming a place to perform other low overhead loop alterations like removing copies or combining LoopDec and End into a single instruction. Differential Revision: https://reviews.llvm.org/D91273	2020-12-07 15:44:40 +00:00
Evgeny Leviant	993eaf2d69	Recommit [TableGen][SchedModels] Fix read/write variant substitution Original commit rG112b3cb6ba49 introduced non-determinism in subtarget generator due to iteration over DenseMap. New patch fixes this changing ProcModelMapTy from DenseMap to std::map.	2020-12-04 21:50:34 +03:00
Fangrui Song	86fa896363	Revert D90844 "[TableGen][SchedModels] Fix read/write variant substitution" This reverts commit `112b3cb6ba`. D90844 made lib/Target/AArch64/AArch64GenSubtargetInfo.inc non-deterministic.	2020-12-03 14:24:29 -08:00
dfukalov	2ce38b3f03	[NFC] Reduce include files dependency. 1. Removed #include "...AliasAnalysis.h" in other headers and modules. 2. Cleaned up includes in AliasAnalysis.h. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D92489	2020-12-03 18:25:05 +03:00
Mircea Trofin	bab72dd5d5	[NFC][MC] TargetRegisterInfo::getSubReg is a MCRegister. Typing the API appropriately. Differential Revision: https://reviews.llvm.org/D92341	2020-12-02 15:46:38 -08:00
David Green	eedf0ed63e	[ARM] Mark select and selectcc of MVE vector operations as expand. We already expand select and select_cc in codegenprepare, but they can still be generated under some situations. Explicitly mark them as expand to ensure they are not produced, leading to a failure to select the nodes. Differential Revision: https://reviews.llvm.org/D92373	2020-12-01 15:05:55 +00:00
David Green	7923d71b4a	[ARM] PREDICATE_CAST demanded bits The PREDICATE_CAST node is used to model moves between MVE predicate registers and gpr's, and eventually become a VMSR p0, rn. When moving to a predicate only the bottom 16 bits of the sources register are demanded. This adds a simple fold for that, allowing it to potentially remove instructions like uxth. Differential Revision: https://reviews.llvm.org/D92213	2020-12-01 10:32:24 +00:00
Evgeny Leviant	112b3cb6ba	[TableGen][SchedModels] Fix read/write variant substitution Patch fixes multiple issues related to expansion of variant sched reads and writes. Differential revision: https://reviews.llvm.org/D90844	2020-11-30 11:55:55 +03:00
Nikita Popov	4df8efce80	[AA] Split up LocationSize::unknown() Currently, we have some confusion in the codebase regarding the meaning of LocationSize::unknown(): Some parts (including most of BasicAA) assume that LocationSize::unknown() only allows accesses after the base pointer. Some parts (various callers of AA) assume that LocationSize::unknown() allows accesses both before and after the base pointer (but within the underlying object). This patch splits up LocationSize::unknown() into LocationSize::afterPointer() and LocationSize::beforeOrAfterPointer() to make this completely unambiguous. I tried my best to determine which one is appropriate for all the existing uses. The test changes in cs-cs.ll in particular illustrate a previously clearly incorrect AA result: We were effectively assuming that argmemonly functions were only allowed to access their arguments after the passed pointer, but not before it. I'm pretty sure that this was not intentional, and it's certainly not specified by LangRef that way. Differential Revision: https://reviews.llvm.org/D91649	2020-11-26 18:39:55 +01:00
David Green	0e49a40d75	[ARM] Cleanup for the MVETailPrediction pass This strips out a lot of the code that should no longer be needed from the MVETailPredictionPass, leaving the important part - find active lane mask instructions and convert them to VCTP operations. Differential Revision: https://reviews.llvm.org/D91866	2020-11-26 15:10:44 +00:00
Mark Murray	2b6691894a	[ARM][AArch64] Adding Neoverse N2 CPU support Add support for the Neoverse N2 CPU to the ARM and AArch64 backends. Differential Revision: https://reviews.llvm.org/D91695	2020-11-25 11:42:54 +00:00
Evgeny Leviant	a6a6d11c7b	[MC][ARM] Fix number of operands of tMOVSr Differential revision: https://reviews.llvm.org/D92029	2020-11-24 18:13:10 +03:00
Evgeny Leviant	50bd686695	Add support for branch forms of ALU instructions to Cortex-A57 model Patch fixes scheduling of ALU instructions which modify pc register. Patch also fixes computation of mutually exclusive predicates for sequences of variants to be properly expanded Differential revision: https://reviews.llvm.org/D91266	2020-11-24 11:43:51 +03:00
Craig Topper	4252f7773a	[SelectionDAG][ARM][AArch64][Hexagon][RISCV][X86] Add SDNPCommutative to fma and fmad nodes in tablegen. Remove explicit commuted patterns from targets. X86 was already specially marking fma as commutable which allowed tablegen to autogenerate commuted patterns. This moves it to the target independent definition and fix up the targets to remove now unneeded patterns. Unfortunately, the tests change because the commuted version of the patterns are generating operands in a different than the explicit patterns. Differential Revision: https://reviews.llvm.org/D91842	2020-11-23 10:09:20 -08:00
David Green	c8c3a411c5	[ARM] Ensure MVE_TwoOpPattern is used inside Predicate's	2020-11-22 21:38:00 +00:00
David Green	f08c37da7b	[ARM] Disable WLSTP loops This checks to see if the loop will likely become a tail predicated loop and disables wls loop generation if so, as the likelihood for reverting is currently too high. These should be fairly rare situations anyway due to the way iterations and element counts are used during lowering. Just not trying can alter how SCEV's are materialized however, leading to different codegen. It also adds a option to disable all while low overhead loops, for debugging. Differential Revision: https://reviews.llvm.org/D91663	2020-11-20 13:30:44 +00:00
Sam Tebbs	8ecb015ed5	[ARM][LowOverheadLoops] Convert intermediate vpr use assertion to condition This converts the intermediate VPR use assertion to a condition in the if-statement to protect against assertion failures in case behaviuour is changed. This is a follow-up to https://reviews.llvm.org/D90935 and implements the post-approval comments. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D91790	2020-11-19 17:15:45 +00:00
David Green	006b3bdedd	[ARM] Deliberately prevent inline asm in low overhead loops. NFC This was already something that was handled by one of the "else" branches in maybeLoweredToCall, so this patch is an NFC but makes it explicit and adds a test. We may in the future want to support this under certain situations but for the moment just don't try and create low overhead loops with inline asm in them. Differential Revision: https://reviews.llvm.org/D91257	2020-11-19 13:28:21 +00:00
Duncan P. N. Exon Smith	5abf76fbe3	ADT: Add assertions to SmallVector::insert, etc., for reference invalidation `2c196bbc6b` asserted that `SmallVector::push_back` doesn't invalidate the parameter when it needs to grow. Do the same for `resize`, `append`, `assign`, `insert`, and `emplace_back`. Differential Revision: https://reviews.llvm.org/D91744	2020-11-18 17:36:28 -08:00
Mikhail Goncharov	f45c052c9e	Fix unused variables in release build Differential Revision: https://reviews.llvm.org/D91705	2020-11-18 15:18:31 +01:00
Sam Tebbs	da2e4728c7	[ARM][LowOverheadLoops] Merge VCMP and VPST across VPT blocks This patch adds support for combining a VPST with a dangling VCMP from a previous VPT block. Differential Revision: https://reviews.llvm.org/D90935	2020-11-18 12:54:16 +00:00
Florian Hahn	b2f4c5fddc	[AsmWriter] Factor out mnemonic generation to accessible getMnemonic. This patch factors out the part of printInstruction that gets the mnemonic string for a given MCInst. This is intended to be used subsequently for the instruction-mix remarks to display the final mnemonic (D90040). Unfortunately making `getMnemonic` available to the AsmPrinter seems to require making it virtual. Not sure if there's a way around that with the current layering of the AsmPrinters. Reviewed By: Paul-C-Anagnostopoulos Differential Revision: https://reviews.llvm.org/D90039	2020-11-17 09:47:38 +00:00
David Penry	48b43c9d4f	[ARM] Cortex-M7 schedule This patch adds the SchedMachineModel for Cortex-M7. It also adds test cases for the scheduling information. Details of the pipeline and descriptions are in comments in file ARMScheduleM7.td included in this patch. Differential Revision: https://reviews.llvm.org/D91355	2020-11-16 10:16:07 +00:00
serge-sans-paille	9218ff50f9	llvmbuildectomy - replace llvm-build by plain cmake No longer rely on an external tool to build the llvm component layout. Instead, leverage the existing `add_llvm_componentlibrary` cmake function and introduce `add_llvm_component_group` to accurately describe component behavior. These function store extra properties in the created targets. These properties are processed once all components are defined to resolve library dependencies and produce the header expected by llvm-config. Differential Revision: https://reviews.llvm.org/D90848	2020-11-13 10:35:24 +01:00
David Green	11dee2eae2	[ARM] Ensure CountReg definition dominates InsertPt when creating t2DoLoopStartTP Of course there was something missing, in this case a check that the def of the count register we are adding to a t2DoLoopStartTP would dominate the insertion point. In the future, when we remove some of these COPY's in between, the t2DoLoopStartTP will always become the last instruction in the block, preventing this from happening. In the meantime we need to check they are created in a sensible order. Differential Revision: https://reviews.llvm.org/D91287	2020-11-12 13:47:46 +00:00
Simon Pilgrim	1a62ca65c1	[KnownBits] Add KnownBits::commonBits helper. NFCI. We have a frequent pattern where we're merging two KnownBits to get the common/shared bits, and I just fell for the gotcha where I tried to use the & operator to merge them........	2020-11-11 12:15:54 +00:00
Sam Parker	898a81dfc5	[NFC][ARM] Replace lambda with any_of	2020-11-11 10:02:55 +00:00
Pirama Arumuga Nainar	8262e94a6d	[ARM] Fix PR 47980: Use constrainRegClass during foldImmediate opt. Previously we used setRegClass to rgpr, which may expand the register domain if the result was already in a constrained class (tcgpr in the above PR). Differential Revision: https://reviews.llvm.org/D91192	2020-11-10 13:38:11 -08:00
Benjamin Kramer	92c61a045f	[ARM] Silence unused variable warning in Release builds. NFC.	2020-11-10 20:35:28 +01:00
David Green	08d1c2d470	[ARM] Introduce t2DoLoopStartTP This introduces a new pseudo instruction, almost identical to a t2DoLoopStart but taking 2 parameters - the original loop iteration count needed for a low overhead loop, plus the VCTP element count needed for a DLSTP instruction setting up a tail predicated loop. The idea is that the instruction holds both values and the backend ARMLowOverheadLoops pass can pick between the two, depending on whether it creates a tail predicated loop or falls back to a low overhead loop. To do that there needs to be something that converts a t2DoLoopStart to a t2DoLoopStartTP, for which this patch repurposes the MVEVPTOptimisationsPass as a "tail predication and vpt optimisation" pass. The extra operand for the t2DoLoopStartTP is chosen based on the operands of VCTP's in the loop, and the instruction is moved as late in the block as possible to attempt to increase the likelihood of making tail predicated loops. Differential Revision: https://reviews.llvm.org/D90591	2020-11-10 18:08:12 +00:00
David Green	dbe1bf63aa	[ARM] Cleanup for ARMLowOverheadLoops. NFC	2020-11-10 17:28:07 +00:00
David Green	c7e275388e	[ARM] Don't aggressively unroll vector remainder loops We already do not unroll loops with vector instructions under MVE, but that does not include the remainder loops that the vectorizer produces. These remainder loops will be rarely executed and are not worth unrolling, as the trip count is likely to be low if they get executed at all. Luckily they get llvm.loop.isvectorized to make recognizing them simpler. We have wanted to do this for a while but hit issues with low overhead loops being reverted due to difficult registry allocation. With recent changes that seems to be less of an issue now. Differential Revision: https://reviews.llvm.org/D90055	2020-11-10 17:01:31 +00:00
David Green	73a6cd4b6b	[ARM] Add a RegAllocHint for hinting t2DoLoopStart towards LR This hints the operand of a t2DoLoopStart towards using LR, which can help make it more likely to become t2DLS lr, lr. This makes it easier to move if needed (as the input is the same as the output), or potentially remove entirely. The hint is added after others (from COPY's etc) which still take precedence. It needed to find a place to add the hint, which currently uses the post isel custom inserter. Differential Revision: https://reviews.llvm.org/D89883	2020-11-10 16:28:57 +00:00
David Green	b2ac9681a7	[ARM] Alter t2DoLoopStart to define lr This changes the definition of t2DoLoopStart from t2DoLoopStart rGPR to GPRlr = t2DoLoopStart rGPR This will hopefully mean that low overhead loops are more tied together, and we can more reliably generate loops without reverting or being at the whims of the register allocator. This is a fairly simple change in itself, but leads to a number of other required alterations. - The hardware loop pass, if UsePhi is set, now generates loops of the form: %start = llvm.start.loop.iterations(%N) loop: %p = phi [%start], [%dec] %dec = llvm.loop.decrement.reg(%p, 1) %c = icmp ne %dec, 0 br %c, loop, exit - For this a new llvm.start.loop.iterations intrinsic was added, identical to llvm.set.loop.iterations but produces a value as seen above, gluing the loop together more through def-use chains. - This new instrinsic conceptually produces the same output as input, which is taught to SCEV so that the checks in MVETailPredication are not affected. - Some minor changes are needed to the ARMLowOverheadLoop pass, but it has been left mostly as before. We should now more reliably be able to tell that the t2DoLoopStart is correct without having to prove it, but t2WhileLoopStart and tail-predicated loops will remain the same. - And all the tests have been updated. There are a lot of them! This patch on it's own might cause more trouble that it helps, with more tail-predicated loops being reverted, but some additional patches can hopefully improve upon that to get to something that is better overall. Differential Revision: https://reviews.llvm.org/D89881	2020-11-10 15:57:58 +00:00
David Green	c8cd7e2bbf	[ARM] Remove MI variable aliasing. NFC This was accidentally using the same name for two different variables in the same line. Whilst it seems to work for some compilers, others have trouble and it is probably not a fantastic idea.	2020-11-09 18:18:43 +00:00
Momchil Velikov	937ab6a785	[ARM][MachineOutliner] Emit more CFI instructions This patch make the outliner emit CFI instructions in a few more places: * after LR is restored, but before the return in an outlined function * around save/restore of LR to/from a register at calls to outlined functions * around save/restore of LR to/from the stack at calls to outlined functions The latter two only when the function does NOT spill LR. If the function spills LR, then outliner generated saves/restores around calls are not considered interesting for unwinding the frame. Differential Revision: https://reviews.llvm.org/D89483	2020-11-09 15:26:18 +00:00
Sam Tebbs	40a3f7e48d	[ARM][LowOverheadLoops] Merge a VCMP and the new VPST into a VPT There were cases where a VCMP and a VPST were merged even if the VCMP didn't have the same defs of its operands as the VPST. This is fixed by adding RDA checks for the defs. This however gave rise to cases where the new VPST created would precede the un-merged VCMP and so would fail a predicate mask assertion since the VCMP wasn't predicated. This was solved by converting the VCMP to a VPT instead of inserting the new VPST. Differential Revision: https://reviews.llvm.org/D90461	2020-11-09 15:03:48 +00:00
David Green	a0a9e1c798	[ARM] Remove kill flags between VCMP and insertion point When we fold a VCMP into a VPST instruction any kill flags between the old VCMP position and the new insertion point need to be removed, in order to keep the verifier happy. Differential Revision: https://reviews.llvm.org/D90964	2020-11-09 13:17:53 +00:00
Lucas Prates	c2c2cc1360	[ARM][AArch64] Adding Neoverse V1 CPU support Add support for the Neoverse V1 CPU to the ARM and AArch64 backends. This is based on patches from Mark Murray and Victor Campos. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D90765	2020-11-09 13:15:40 +00:00
Sanjay Patel	264a6df353	[ARM] remove cost-kind predicate for cmp/sel costs This is the cmp/sel sibling to D90692. Again, the reasoning is: the throughput cost is number of instructions/uops, so size/blended costs are identical except in special cases (for example, fdiv or other known-expensive machine instructions or things like MVE that may require cracking into >1 uops). We need to check for a valid (non-null) condition type parameter because SimplifyCFG may pass nullptr for that (and so we will crash multiple regression tests without that check). I'm not sure if passing nullptr makes sense, but other code in the cost model does appear to check if that param is set or not. Differential Revision: https://reviews.llvm.org/D90781	2020-11-05 14:52:25 -05:00
Sander de Smalen	d57bba7cf8	[SVE] Return StackOffset for TargetFrameLowering::getFrameIndexReference. To accommodate frame layouts that have both fixed and scalable objects on the stack, describing a stack location or offset using a pointer + uint64_t is not sufficient. For this reason, we've introduced the StackOffset class, which models both the fixed- and scalable sized offsets. The TargetFrameLowering::getFrameIndexReference is made to return a StackOffset, so that this can be used in other interfaces, such as to eliminate frame indices in PEI or to emit Debug locations for variables on the stack. This patch is purely mechanical and doesn't change the behaviour of how the result of this function is used for fixed-sized offsets. The patch adds various checks to assert that the offset has no scalable component, as frame offsets with a scalable component are not yet supported in various places. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D90018	2020-11-05 11:02:18 +00:00
Cameron McInally	c126eb7529	[SelectionDAG] Add legalizations for VECREDUCE_SEQ_FMUL Hook up legalizations for VECREDUCE_SEQ_FMUL. This is following up on the VECREDUCE_SEQ_FADD work from D90247. Differential Revision: https://reviews.llvm.org/D90644	2020-11-04 14:20:31 -06:00
David Green	eb611930b6	[ARM] Remove unused variable. NFC	2020-11-04 09:00:03 +00:00
Sanjay Patel	c40126e740	[ARM] remove cost-kind predicate for most math op costs This is based on the same idea that I am using for the basic model implementation and what I have partly already done for x86: throughput cost is number of instructions/uops, so size/blended costs are identical except in special cases (for example, fdiv or other known-expensive machine instructions or things like MVE that may require cracking into >1 uop)). Differential Revision: https://reviews.llvm.org/D90692	2020-11-03 17:23:46 -05:00
David Green	bd32386410	[ARM] Remove unused variable. NFC	2020-11-03 12:58:10 +00:00
David Green	e474499402	[ARM] Treat memcpy/memset/memmove as call instructions for low overhead loops If an instruction will be lowered to a call there is no advantage of using a low overhead loop as the LR register will need to be spilled and reloaded around the call, and the low overhead will end up being reverted. This teaches our hardware loop lowering that these memory intrinsics will be calls under certain situations. Differential Revision: https://reviews.llvm.org/D90439	2020-11-03 11:53:09 +00:00
Momchil Velikov	7360d6d921	[ARM][MachineOutliner] Do not overestimate LR liveness in return block The `LiveRegUnits` utility (as well as `LivePhysRegs`) considers callee-saved registers to be alive at the point after the return instruction in a block. In the ARM backend, the `LR` register is classified as callee-saved, which is not really correct (from an ARM eABI or just common sense point of view). These two conditions cause the `MachineOutliner` to overestimate the liveness of `LR`, which results in unnecessary saves/restores of `LR` around calls to outlined sequences. It also causes the `MachineVerifer` to crash in some cases, because the save instruction reads a dead `LR`, for example when the following program: int h(int, int); int f(int a, int b, int c, int d) { a = h(a + 1, b - 1); b = b + c; return 1 + (2 * a + b) * (c - d) / (a - b) * (c + d); } int g(int a, int b, int c, int d) { a = h(a - 1, b + 1); b = b + c; return 2 + (2 * a + b) * (c - d) / (a - b) * (c + d); } is compiled with `-target arm-eabi -march=armv7-m -Oz`. This patch computes the liveness of `LR` in return blocks only, while taking into account the few ARM instructions, which read `LR`, but nevertheless the register is not mentioned (explicitly or implicitly) in the instruction operands. Differential Revision: https://reviews.llvm.org/D89189	2020-11-02 16:47:22 +00:00
Florian Hahn	b3b993a7ad	Reland "[TTI] Add VecPred argument to getCmpSelInstrCost." This reverts the revert commit `408c4408fa`. This version of the patch includes a fix for a crash caused by treating ICmp/FCmp constant expressions as instructions. Original message: On some targets, like AArch64, vector selects can be efficiently lowered if the vector condition is a compare with a supported predicate. This patch adds a new argument to getCmpSelInstrCost, to indicate the predicate of the feeding select condition. Note that it is not sufficient to use the context instruction when querying the cost of a vector select starting from a scalar one, because the condition of the vector select could be composed of compares with different predicates. This change greatly improves modeling the costs of certain compare/select patterns on AArch64. I am also planning on putting up patches to make use of the new argument in SLPVectorizer & LV.	2020-11-02 15:39:29 +00:00
Evgeny Leviant	cc96a82291	[TableGen][SchedModels] Fix read/write variant substitution Patch fixes case when sched class has write and read variants belonging to different processor models. Differential revision: https://reviews.llvm.org/D89777	2020-11-02 17:39:04 +03:00
David Green	30ad742644	[ARM] Fix crash for gather of pointer costs. If the elt size is unknown due to it being a pointer, a comparison against 0 will cause an assert. Make sure the elt size is large enough before comparing and for the moment just return the scalar cost.	2020-10-31 13:10:14 +00:00
Florian Hahn	408c4408fa	Revert "[TTI] Add VecPred argument to getCmpSelInstrCost." This reverts commit `73f01e3df5`. This appears to break http://lab.llvm.org:8011/#/builders/85/builds/383.	2020-10-30 21:26:14 +00:00
Cameron McInally	dda1e74b58	[Legalize] Add legalizations for VECREDUCE_SEQ_FADD Add Legalization support for VECREDUCE_SEQ_FADD, so that we don't need to depend on ExpandReductionsPass. Differential Revision: https://reviews.llvm.org/D90247	2020-10-30 16:02:55 -05:00
Florian Hahn	73f01e3df5	[TTI] Add VecPred argument to getCmpSelInstrCost. On some targets, like AArch64, vector selects can be efficiently lowered if the vector condition is a compare with a supported predicate. This patch adds a new argument to getCmpSelInstrCost, to indicate the predicate of the feeding select condition. Note that it is not sufficient to use the context instruction when querying the cost of a vector select starting from a scalar one, because the condition of the vector select could be composed of compares with different predicates. This change greatly improves modeling the costs of certain compare/select patterns on AArch64. I am also planning on putting up patches to make use of the new argument in SLPVectorizer & LV. Reviewed By: dmgreen, RKSimon Differential Revision: https://reviews.llvm.org/D90070	2020-10-30 13:49:08 +00:00
David Green	d14db8c8dc	[ARM] Match MVE vqdmulh This adds ISel matching for a form of VQDMULH. There are several ir patterns that we could match to that instruction, this one is for: min(ashr(mul(sext(a), sext(b)), 7), 127) Which is what llvm will optimize to once it has removed the max that usually makes up the min/max saturate pattern, as in this case the compare will always be false. The additional complication to match i32 patterns (which extend into an i64) is that the min will be a vselect/setcc, as vmin is not supported for i64 vectors. Tablegen patterns have also been updated to attempt to reuse the MVE_TwoOpPattern patterns. Differential Revision: https://reviews.llvm.org/D90096	2020-10-30 13:34:27 +00:00
Nicholas Guy	eb9fe24eaf	[ARM] Fix IT block generation after Thumb2SizeReduce with -Oz Fixes a regression caused by D82439, in which IT blocks were no longer being generated when -Oz is present. Differential Revision: https://reviews.llvm.org/D88496	2020-10-29 15:17:31 +00:00
Evgeny Leviant	a28388f95b	[ARM][SchedModels] Move IsLDMBaseRegInListPred to ARMSchedule.td. NFC This predicate is not specific to cortex-a57 and can be used in other processor models as well.	2020-10-26 22:31:41 +03:00
Evgeny Leviant	e74f66125e	[ARM][SchedModels] Convert IsLdstsoScaledNotOptimalPred to MCSchedPredicate Differential revision: https://reviews.llvm.org/D90150	2020-10-26 20:22:41 +03:00
Evgeny Leviant	a877bda397	Fix issue in cortex-a57 sched model Differential revision: https://reviews.llvm.org/D90152	2020-10-26 20:16:40 +03:00
Evgeny Leviant	a95ce5f65f	[ARM][SchedModels] Rename and generalize predicate. NFC	2020-10-26 12:14:55 +03:00
Evgeny Leviant	99b2756517	[ARM][SchedModels] Get rid of IsLdrAm2ScaledPred Differential revision: https://reviews.llvm.org/D90024	2020-10-26 12:01:39 +03:00
Evgeny Leviant	a4fc18e641	[ARM][SchedModels] Convert IsLdstsoMinusRegPred to MCSchedPredicate Differential revision: https://reviews.llvm.org/D90029	2020-10-26 11:54:08 +03:00
Evgeny Leviant	d613e39d52	[ARM][SchedModels] Convert IsLdrAm3NegRegOffPred to MCSchedPredicate Differential revision: https://reviews.llvm.org/D90045	2020-10-26 11:43:02 +03:00
David Green	61bc18de0b	[Schedule] Add a MultiHazardRecognizer This adds a MultiHazardRecognizer and starts to make use of it in the ARM backend. The idea of the class is to allow multiple independent hazard recognizers to be added to a single base MultiHazardRecognizer, allowing them to all work in parallel without requiring them to be chained into subclasses. They can then be added or not based on cpu or subtarget features, which will become useful in the ARM backend once more hazard recognizers are being used for various things. This also renames ARMHazardRecognizer to ARMHazardRecognizerFPMLx in the process, to more clearly explain what that recognizer is designed for. Differential Revision: https://reviews.llvm.org/D72939	2020-10-26 08:06:17 +00:00
David Green	92205bf122	[ARM] Remove some dead code. NFC	2020-10-24 17:22:49 +01:00
Paul C. Anagnostopoulos	876af264c1	[TableGen] Change !getop and !setop to !getdagop and !setdagop. Differential Revision: https://reviews.llvm.org/D89814	2020-10-23 10:36:05 -04:00
Evgeny Leviant	cb86522c94	[ARM][SchedModels] Convert IsR1P0AndLaterPred to MCSchedPredicate. NFC Differential revision: https://reviews.llvm.org/D90017	2020-10-23 14:27:49 +03:00
Evgeny Leviant	7a78073be7	[ARM][SchedModels] Let ldm* instruction scheduling use MCSchedPredicate Differential revision: https://reviews.llvm.org/D89957	2020-10-23 10:33:20 +03:00
Mircea Trofin	e24537d48f	[NFC][MC] Use MCRegister for ReachingDefAnalysis APIs Also updated the users of the APIs; and a drive-by small change to RDFRegister.cpp Differential Revision: https://reviews.llvm.org/D89912	2020-10-22 08:47:35 -07:00
Evgeny Leviant	ed6a91f456	[ARM][SchedModels] Convert IsLdstsoScaledPred to MCSchedPredicate Differential revision: https://reviews.llvm.org/D89939	2020-10-22 18:03:01 +03:00
Evgeny Leviant	bf9edcb6fd	[ARM][SchedModels] Convert IsLdrAm3RegOffPred to MCSchedPredicate Differential revision: https://reviews.llvm.org/D89876	2020-10-21 20:49:10 +03:00
Paul C. Anagnostopoulos	dfd6b69e01	[ARM] [TableGen] Clean up !if(!eq(boolean, 1) and related booleans Differential Revision: https://reviews.llvm.org/D89822	2020-10-21 09:52:45 -04:00
Nicholas Guy	9a2d2bedb7	Add "SkipDead" parameter to TargetInstrInfo::DefinesPredicate Some instructions may be removable through processes such as IfConversion, however DefinesPredicate can not be made aware of when this should be considered. This parameter allows DefinesPredicate to distinguish these removable instructions on a per-call basis, allowing for more fine-grained control from processes like ifConversion. Renames DefinesPredicate to ClobbersPredicate, to better reflect it's purpose Differential Revision: https://reviews.llvm.org/D88494	2020-10-21 11:52:47 +01:00
Evgeny Leviant	991e86156c	[ARM][SchedModels] Convert IsCPSRDefinedPred to MCSchedPredicate Differential revision: https://reviews.llvm.org/D89460	2020-10-20 11:14:21 +03:00
David Green	6dcbc323fd	Revert "[ARM][LowOverheadLoops] Adjust Start insertion." This reverts commit `38f625d0d1`. This commit contains some holes in its logic and has been causing issues since it was commited. The idea sounds OK but some cases were not handled correctly. Instead of trying to fix that up later it is probably simpler to revert it and work to reimplement it in a more reliable way.	2020-10-20 08:55:21 +01:00
Luqman Aden	51892a42da	[COFF][ARM] Fix CodeView for Windows on 32bit ARM targets. Create the LLVM / CodeView register mappings for the 32-bit ARM Window targets. Reviewed By: compnerd Differential Revision: https://reviews.llvm.org/D89622	2020-10-19 22:16:16 -07:00
Evgeny Leviant	8a7ca143f8	[ARM][SchedModels] Convert IsPredicatedPred to MCSchedPredicate Differential revision: https://reviews.llvm.org/D89553	2020-10-19 11:37:54 +03:00
David Green	b93d74ac9c	[ARM] Basic getArithmeticReductionCost reduction costs This adds some basic costs for MVE reductions - currently just costing the simple legal add vectors as a single MVE instruction. More complex costing can be added in the future when the framework more readily allows it. Differential Revision: https://reviews.llvm.org/D88980	2020-10-17 10:29:00 +01:00
David Green	d79ee3a807	[ARM] Add a very basic active_lane_mask cost This adds a very basic cost for active_lane_mask under MVE - making the assumption that they will be free and then apologizing for that in a comment. In reality they may either be free (by being nicely folded into a tail predicated loop), cost the same as a VCTP or be expanded into vdup's, adds and cmp's. It is difficult to detect the difference from a single getIntrinsicInstrCost call, so makes the assumption that the vectorizer is adding them, and only added them where it makes sense. We may need to change this in the future to better model predicate costs in the vectorizer, especially at -Os or non-tail predicated loops. The vectorizer currently does not query the cost of these instructions but that will change in the future and a zero cost there probably makes the most sense at the moment. Differential Revision: https://reviews.llvm.org/D88989	2020-10-17 10:09:42 +01:00
David Sherwood	47f2dc7e5f	[SVE][NFC] Replace some TypeSize comparisons in non-AArch64 Targets In most of lib/Target we know that we are not dealing with scalable types so it's perfectly fine to replace TypeSize comparison operators with their fixed width equivalents, making use of getFixedSize() and so on. Differential Revision: https://reviews.llvm.org/D89101	2020-10-15 09:01:21 +01:00
Evgeny Leviant	2ad82b0ed1	[ARM.td] Make instruction definitions visible to sched models Differential revision: https://reviews.llvm.org/D89308	2020-10-14 09:58:45 +03:00
David Green	cb27006a94	[ARM] Attempt to make Tail predication / RDA more resilient to empty blocks There are a number of places in RDA where we assume the block will not be empty. This isn't necessarily true for tail predicated loops where we have removed instructions. This attempt to make the pass more resilient to empty blocks, not casting pointers to machine instructions where they would be invalid. The test contains a case that was previously failing, but recently been hidden on trunk. It contains an empty block to begin with to show a similar error. Differential Revision: https://reviews.llvm.org/D88926	2020-10-10 14:50:25 +01:00
Amara Emerson	322d0afd87	[llvm][mlir] Promote the experimental reduction intrinsics to be first class intrinsics. This change renames the intrinsics to not have "experimental" in the name. The autoupgrader will handle legacy intrinsics. Relevant ML thread: http://lists.llvm.org/pipermail/llvm-dev/2020-April/140729.html Differential Revision: https://reviews.llvm.org/D88787	2020-10-07 10:36:44 -07:00
Sam Tebbs	68e002e181	[ARM] Fold select_cc(vecreduce_[u\|s][min\|max], x) into VMINV or VMAXV This folds a select_cc or select(set_cc) of a max or min vector reduction with a scalar value into a VMAXV or VMINV. Differential Revision: https://reviews.llvm.org/D87836	2020-10-06 14:44:58 +01:00
Amara Emerson	c9f5cdd453	Revert "[ARM]Fold select_cc(vecreduce_[u\|s][min\|max], x) into VMINV or VMAXV" This reverts commit `2573cf3c3d`. These seem to break some lit tests.	2020-10-05 10:52:43 -07:00
Sam Tebbs	2573cf3c3d	[ARM]Fold select_cc(vecreduce_[u\|s][min\|max], x) into VMINV or VMAXV This folds a select_cc or select(set_cc) of a max or min vector reduction with a scalar value into a VMAXV or VMINV. Differential Revision: https://reviews.llvm.org/D87836	2020-10-05 15:51:28 +01:00
David Green	7feafa0286	[ARM] Fix pointer offset when splitting stores from VMOVDRR We were not accounting for the pointer offset when splitting a store from a VMOVDRR node, which could lead to incorrect aliasing info. In this case it is the fneg via integer arithmetic that gives us a store->load pair that we started getting wrong. Differential Revision: https://reviews.llvm.org/D88653	2020-10-03 16:47:50 +01:00
Meera Nakrani	f7c0e2b8f2	[ARM] Prevent constants from iCmp instruction from being hoisted if part of a min(max()) pattern Marks constants of an ICmp instruction as free if it's only user is a select instruction that is part of a min(max()) pattern. Ensures that in loops, in particular when loop unrolling is turned on, SSAT will still be correctly generated. Differential Revision: https://reviews.llvm.org/D88662	2020-10-02 09:28:35 +00:00
Meera Nakrani	48c9e8244b	[ARM] Removed hasSideEffects from signed/unsigned saturates Removed hasSideEffects from SSAT and USAT so that they are no longer marked as unpredictable. Differential Revision: https://reviews.llvm.org/D88545	2020-10-01 14:55:01 +00:00
Sam Parker	7e02bc81c6	[NFC][ARM] LowOverheadLoop DEBUG statements	2020-10-01 13:38:16 +01:00
Sam Parker	38f625d0d1	[ARM][LowOverheadLoops] Adjust Start insertion. Try to move the insertion point to become the terminator of the block, usually the preheader. Differential Revision: https://reviews.llvm.org/D88638	2020-10-01 10:49:19 +01:00
Sam Parker	6ec5f32497	[ARM][LowOverheadLoops] Iteration count liveness Before deciding to insert a [W\|D]LSTP, check that defining LR with the element count won't affect any other instructions that should be taking the iteration count. Differential Revision: https://reviews.llvm.org/D88549	2020-10-01 10:11:10 +01:00
Sam Parker	7b90516d47	[ARM][LowOverheadLoops] Start insertion point If possible, try not to move the start position earlier than it already is. Differential Revision: https://reviews.llvm.org/D88542	2020-10-01 10:05:25 +01:00
Sam Parker	dfa2c14b8f	[ARM][LowOverheadLoops] Use iterator for InsertPt. Use a MachineBasicBlock::iterator instead of a MachineInstr* for the position of our LoopStart instruction. NFCish, as it change debug info.	2020-10-01 08:32:35 +01:00
Rahman Lavaee	8955950c12	Exception support for basic block sections This is part of the Propeller framework to do post link code layout optimizations. Please see the RFC here: https://groups.google.com/forum/#!msg/llvm-dev/ef3mKzAdJ7U/1shV64BYBAAJ and the detailed RFC doc here: https://github.com/google/llvm-propeller/blob/plo-dev/Propeller_RFC.pdf This patch provides exception support for basic block sections by splitting the call-site table into call-site ranges corresponding to different basic block sections. Still all landing pads must reside in the same basic block section (which is guaranteed by the the core basic block section patch D73674 (ExceptionSection) ). Each call-site table will refer to the landing pad fragment by explicitly specifying @LPstart (which is omitted in the normal non-basic-block section case). All these call-site tables will share their action and type tables. The C++ ABI somehow assumes that no landing pads point directly to LPStart (which works in the normal case since the function begin is never a landing pad), and uses LP.offset = 0 to specify no landing pad. In the case of basic block section where one section contains all the landing pads, the landing pad offset relative to LPStart could actually be zero. Thus, we avoid zero-offset landing pads by inserting a nop operation as the first non-CFI instruction in the exception section. Background on Exception Handling in C++ ABI https://github.com/itanium-cxx-abi/cxx-abi/blob/master/exceptions.pdf Compiler emits an exception table for every function. When an exception is thrown, the stack unwinding library queries the unwind table (which includes the start and end of each function) to locate the exception table for that function. The exception table includes a call site table for the function, which is used to guide the exception handling runtime to take the appropriate action upon an exception. Each call site record in this table is structured as follows: \| CallSite \| --> Position of the call site (relative to the function entry) \| CallSite length \| --> Length of the call site. \| Landing Pad \| --> Position of the landing pad (relative to the landing pad fragment’s begin label) \| Action record offset \| --> Position of the first action record The call site records partition a function into different pieces and describe what action must be taken for each callsite. The callsite fields are relative to the start of the function (as captured in the unwind table). The landing pad entry is a reference into the function and corresponds roughly to the catch block of a try/catch statement. When execution resumes at a landing pad, it receives an exception structure and a selector value corresponding to the type of the exception thrown, and executes similar to a switch-case statement. The landing pad field is relative to the beginning of the procedure fragment which includes all the landing pads (@LPStart). The C++ ABI requires all landing pads to be in the same fragment. Nonetheless, without basic block sections, @LPStart is the same as the function @Start (found in the unwind table) and can be omitted. The action record offset is an index into the action table which includes information about which exception types are caught. C++ Exceptions with Basic Block Sections Basic block sections break the contiguity of a function fragment. Therefore, call sites must be specified relative to the beginning of the basic block section. Furthermore, the unwinding library should be able to find the corresponding callsites for each section. To do so, the .cfi_lsda directive for a section must point to the range of call-sites for that section. This patch introduces a new CallSiteRange structure which specifies the range of call-sites which correspond to every section: `struct CallSiteRange { // Symbol marking the beginning of the precedure fragment. MCSymbol FragmentBeginLabel = nullptr; // Symbol marking the end of the procedure fragment. MCSymbol FragmentEndLabel = nullptr; // LSDA symbol for this call-site range. MCSymbol *ExceptionLabel = nullptr; // Index of the first call-site entry in the call-site table which // belongs to this range. size_t CallSiteBeginIdx = 0; // Index just after the last call-site entry in the call-site table which // belongs to this range. size_t CallSiteEndIdx = 0; // Whether this is the call-site range containing all the landing pads. bool IsLPRange = false; };` With N basic-block-sections, the call-site table is partitioned into N call-site ranges. Conceptually, we emit the call-site ranges for sections sequentially in the exception table as if each section has its own exception table. In the example below, two sections result in the two call site ranges (denoted by LSDA1 and LSDA2) placed next to each other. However, their call-sites will refer to records in the shared Action Table. We also emit the header fields (@LPStart and CallSite Table Length) for each call site range in order to place the call site ranges in separate LSDAs. We note that with -basic-block-sections, The CallSiteTableLength will not actually represent the length of the call site table, but rather the reference to the action table. Since the only purpose of this field is to locate the action table, correctness is guaranteed. Finally, every call site range has one @LPStart pointer so the landing pads of each section must all reside in one section (not necessarily the same section). To make this easier, we decide to place all landing pads of the function in one section (hence the `IsLPRange` field in CallSiteRange). \| @LPStart \| ---> Landing pad fragment ( LSDA1 points here) \| CallSite Table Length \| ---> Used to find the action table. \| CallSites \| \| … \| \| … \| \| @LPStart \| ---> Landing pad fragment ( LSDA2 points here) \| CallSite Table Length \| \| CallSites \| \| … \| \| … \| … … \| Action Table \| \| Types Table \| Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D73739	2020-09-30 11:05:55 -07:00
Sam Parker	779a8a028f	[ARM][LowOverheadLoops] TryRemove helper. Make a helper function that wraps around RDA::isSafeToRemove and utilises the existing DCE IT block checks.	2020-09-30 09:37:24 +01:00
Sam Parker	195c22f273	[ARM] Change VPT state assertion Just because we haven't encountered an instruction setting the VPR, it doesn't mean we can't create a VPT block - the VPR maybe a live-in. Differential Revision: https://reviews.llvm.org/D88224	2020-09-30 08:01:10 +01:00
Sam Parker	4c19b89b25	[NFC][ARM] Comments and lambdas Add some comments in LowOverheadLoops and make some lambda variables explicit arguments instead of capturing.	2020-09-29 08:41:53 +01:00
Sam Parker	e82a0084d3	[ARM][LowOverheadLoops] Cleanup and re-arrange Rename and reorganise how we decide where to put the LoopStart instruction.	2020-09-28 16:06:30 +01:00
Tres Popp	509fba75df	[llvm] Fix unused variable in non-debug configurations	2020-09-28 17:04:08 +02:00
Meera Nakrani	675431b987	[ARM] Added more patterns to generate SSAT/USAT with shift Added patterns to generate an SSAT or USAT with shift for SSAT/USAT instructions that are matched from IR patterns. Differential Revision: https://reviews.llvm.org/D88145	2020-09-28 14:50:19 +00:00
Sam Parker	3d1d089155	[NFC][ARM] Factor out some logic for LoLoops. Create a DCE function that accepts an instruction.	2020-09-28 14:51:52 +01:00
Sjoerd Meijer	1696dd27fb	[ARM][MVE] Enable tail-predication by default We have been running tests/benchmarks downstream with tail-predication enabled for some time now and this behaves as expected: we are not aware of any correctness issues, and this performs better across the board than with tail-predication disabled. Time to flip the switch! Differential Revision: https://reviews.llvm.org/D88093	2020-09-28 14:01:23 +01:00
Sjoerd Meijer	f39f92c1f6	[ARM][MVE] tail-predication: overflow checks for elementcount, cont'd This is a reimplementation of the overflow checks for the elementcount, i.e. the 2nd argument of intrinsic get.active.lane.mask. The element count is lowered in each iteration of the tail-predicated loop, and we must prove that this expression doesn't overflow. Many thanks to Eli Friedman and Sam Parker for all their help with this work. Differential Revision: https://reviews.llvm.org/D88086	2020-09-28 09:20:51 +01:00
David Green	e4b9867cb6	[ARM] Expand cannotInsertWDLSTPBetween to the last instruction `9d9a11c7be` added this check for predicatable instructions between the D/WLSTP and the loop's start, but it was missing the last instruction in the block. Change it to use some iterators instead. Differential Revision: https://reviews.llvm.org/D88354	2020-09-28 09:14:40 +01:00
Sam Parker	a399d1880b	[ARM] Find VPT implicitly predicated by VCTP On failing to find a VCTP in the list of instructions that explicitly predicate the entry of a VPT block, inspect whether the block is controlled via VPT which is implicitly predicated due to it's predicated operand(s). Differential Revision: https://reviews.llvm.org/D87819	2020-09-25 08:50:53 +01:00
Sam Parker	00ee52ae04	[NFC][ARM] Remove dead loop. Remove a loop that just calculated a couple of values that were now longer needed.	2020-09-24 15:37:26 +01:00
Sjoerd Meijer	2fc690ac90	[ARM] LowoverheadLoops: add an option to disable tail-predication This might be useful for testing. We already have an option -tail-predication but that controls the MVETailPredication pass. This -arm-loloops-disable-tail-pred is just for disabling it in the LowoverheadLoops pass. Differential Revision: https://reviews.llvm.org/D88212	2020-09-24 13:30:48 +01:00
Sam Parker	9d9a11c7be	[ARM] Check for LSTP side-effects. If the LSTP instruction is inserted with an element count low enough to immediately predicate some lanes as false, this can have some unintended effects on any proceeding MVE instructions in the preheader. Differential Revision: https://reviews.llvm.org/D88209	2020-09-24 13:28:35 +01:00
Eli Friedman	3f739f736b	[SelectionDAG][GISel] Make LegalizeDAG lower FNEG using integer ops. Previously, if a floating-point type was legal, but FNEG wasn't legal, we would use FSUB. Instead, we should use integer ops, to preserve the semantics. (Alternatively, there's a compiler-rt call we could use, but there isn't much reason to use that.) It turns out we actually are still using this obscure codepath in a few cases: on some targets, we have "legal" floating-point types that don't actually support any floating-point operations. In particular, ARM and AArch64 are using this path. The implementation for SelectionDAG is pretty simple because we can reuse the infrastructure from FCOPYSIGN. See also `9a3dc3e`, the corresponding change to type legalization. Also includes a "bonus" change to STRICT_FSUB legalization, so we can lower a STRICT_FSUB to a float libcall. Includes the changes to both LegalizeDAG and GlobalISel so we don't have inconsistent results in the future. Fixes https://bugs.llvm.org/show_bug.cgi?id=46792 . Differential Revision: https://reviews.llvm.org/D84287	2020-09-23 14:10:33 -07:00
David Sherwood	e077367a28	[SVE] Make EVT::getScalarSizeInBits and others consistent with Type::getScalarSizeInBits An existing function Type::getScalarSizeInBits returns a uint64_t instead of a TypeSize class because the caller is requesting a scalar size, which cannot be scalable. This patch makes other similar functions requesting a scalar size consistent with that, thereby eliminating more than 1000 implicit TypeSize -> uint64_t casts. Differential revision: https://reviews.llvm.org/D87889	2020-09-23 09:20:08 +01:00
Stefanos Baziotis	a7873e5abc	Small fixes for "[LoopInfo] empty() -> isInnermost(), add isOutermost()"	2020-09-22 23:59:34 +03:00
Stefanos Baziotis	89c1e35f3c	[LoopInfo] empty() -> isInnermost(), add isOutermost() Differential Revision: https://reviews.llvm.org/D82895	2020-09-22 23:28:51 +03:00
Sam Parker	94c799fecf	[ARM] Trying to fix asan buildbot	2020-09-22 13:43:23 +01:00
Meera Nakrani	a3d0dce260	[ARM][TTI] Prevents constants in a min(max) or max(min) pattern from being hoisted when in a loop Changes TTI function getIntImmCostInst to take an additional Instruction parameter, which enables us to be able to check it is part of a min(max())/max(min()) pattern that will match SSAT. We can then mark the constant used as free to prevent it being hoisted so SSAT can still be generated. Required minor changes in some non-ARM backends to allow for the optional parameter to be included. Differential Revision: https://reviews.llvm.org/D87457	2020-09-22 11:54:10 +00:00
Sam Parker	b4fa884a73	[ARM] Improve VPT predicate tracking The VPTBlock has been modified to track the 'global' state of the VPR, as well as the state for each block. Each object now just holds a list of instructions that makeup the block, while static structures hold the predicate information. This enables global access for querying how both a VPT block and individual instructions are predicated. These changes now allow us, again, to handle more complicated cases where multiple instructions build a predicate and/or where the same predicate in used in multiple blocks. It doesn't, however, get us back to before the tracking was 'fixed' as some extra logic will be required to properly handle VPT instructions. Currently a VPT could be effectively predicated because of it's inputs, but the existing logic will not detect that and so will refuse to perform the transformation. This can be seen in remat-vctp.ll test where we still don't perform the transform. Differential Revision: https://reviews.llvm.org/D87681	2020-09-22 10:40:27 +01:00
Sam Parker	a0c1dcc318	[ARM] Remove MVEDomain from VLDR/STR of P0 Remove the domain from the instructions and create a shouldInspect helper for LowOverheadLoops which queries it or a vpr operand. Differential Revision: https://reviews.llvm.org/D87900	2020-09-22 09:05:50 +01:00
Sam Parker	e461921d6c	[ARM] VPT validForTailPredication Mark all VPT instructions as valid. Differential Revision: https://reviews.llvm.org/D87759	2020-09-22 08:58:37 +01:00
Momchil Velikov	742250bf62	[ARM][CMSE] Issue an error if passing arguments through memory across security boundary It was never supported and that part was accidentally omitted when upstreaming D76518. Differential Revision: https://reviews.llvm.org/D86478 Change-Id: If6ba9506eb0431c87a1d42a38aa60e47ce263039	2020-09-21 17:26:10 +01:00
David Green	f4c5cadbcb	[ARM] Select f32 constants with vmov.f16 This adds lowering for f32 values using the vmov.f16, which zeroes the top bits whilst setting the lower bits to a pattern. This range of values does not often come up, except where a f16 constant value has been converted to a f32. Differential Revision: https://reviews.llvm.org/D87790	2020-09-21 11:10:47 +01:00
David Green	29bd8ea110	[ARM] Constant fold VMOVrh This adds simple constant folding for VMOVrh, to constant fold fp16 constants to integer values. It can help especially with soft calling conventions, but some of the results are not optimal as we end up loading using a vldr. This will be improved in a follow up patch. Differential Revision: https://reviews.llvm.org/D87789	2020-09-20 21:32:51 +01:00
Gabriel Hjort Åkerlund	c10200536f	[TableGen][GlobalISel] Fix handling of zero_reg When generating matching tables for GlobalISel, TableGen would output "::zero_reg" whenever encountering the zero_reg, which in turn would result in compilation error. This patch fixes that by instead outputting NoRegister (== 0), which is the same result that TableGen produces when generating matching tables for ISelDAG. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D86215	2020-09-18 11:01:11 +02:00
David Green	7f7993e0da	[ARM] Expand distributing increments to also handle existing pre/post inc instructions. This extends the distributing postinc code in load/store optimizer to also handle the case where there is an existing pre/post inc instruction, where subsequent instructions can be modified to use the adjusted offset from the increment. This can save us having to keep the old register live past the increment instruction. Differential Revision: https://reviews.llvm.org/D83377	2020-09-17 16:58:35 +01:00
David Green	34b27b9441	[ARM] Sink splats to MVE intrinsics The predicated MVE intrinsics are generated as, for example, llvm.arm.mve.add.predicated(x, splat(y). p). We need to sink the splat value back into the loop, like we do for other instructions, so we can re-select qr variants. Differential Revision: https://reviews.llvm.org/D87693	2020-09-17 16:00:51 +01:00
Simon Pilgrim	f026812110	InstCombiner.h - remove unnecessary KnownBits.h include. NFCI. Move the include down to cpp files with an implicit dependency.	2020-09-17 14:28:42 +01:00
Sjoerd Meijer	b5c3efeb7b	[ARM][MVE] Tail-predication: predicate new elementcount checks on force-enabled Additional sanity checks were added to get.active.lane.mask's second argument, the loop tripcount/elementcount, in rG635b87511ec3. Like the other (overflow) checks, skip this if tail-predication is forced. Differential Revision: https://reviews.llvm.org/D87769	2020-09-16 17:05:14 +01:00
Sam Parker	3ce9ec0cfa	[ARM] Reorder some logic Re-order some checks in ValidateMVEInst.	2020-09-16 13:39:22 +01:00
Sam Parker	a63b2a4614	[ARM] Fix tail predication predicate tracking Clear the CurrentPredicate when we find an instruction which would completely overwrite the VPR. This fix essentially means we're back to not really being able to handle VPT instructions when tail predicating. Differential Revision: https://reviews.llvm.org/D87610	2020-09-16 11:59:29 +01:00
Sam Parker	86172ce378	[ARM] Add more validForTailPredication Modify the unit test to inspect all MVE instructions and mark the load/store/move of vpr/p0 as valid, as well as the remaining scalar shifts. Differential Revision: https://reviews.llvm.org/D87753	2020-09-16 11:51:50 +01:00
Sam Tebbs	ef0b9f3307	[ARM][LowOverheadLoops] Combine a VCMP and VPST into a VPT This patch combines a VCMP followed by a VPST into a VPT, which has the same semantics as the combination of the former two.	2020-09-16 09:27:10 +01:00
Yvan Roux	070b96962f	[ARM][MachineOutliner] Add calls handling. Handles calls inside outlined regions, by saving and restoring the link register. Differential Revision: https://reviews.llvm.org/D87136	2020-09-16 09:54:26 +02:00
Sjoerd Meijer	635b87511e	[ARM][MVE] Tail-predication: use unsigned SCEV ranges for tripcount Loop tripcount expressions have a positive range, so use unsigned SCEV ranges for them. Differential Revision: https://reviews.llvm.org/D87608	2020-09-15 13:23:02 +01:00
Meera Nakrani	1119bf95be	[ARM] Corrected condition in isSaturatingConditional Fixed a small error in an if condition to prevent usat/ssat being generated if (upper constant + 1) is not a power of 2.	2020-09-15 10:14:30 +00:00
Sjoerd Meijer	b4b1b84106	[MVE] fix typo in llvm debug message. NFC.	2020-09-15 10:13:54 +01:00
Craig Topper	c193a689b4	[SelectionDAG] Use Align/MaybeAlign in calls to getLoad/getStore/getExtLoad/getTruncStore. The versions that take 'unsigned' will be removed in the future. I tried to use getOriginalAlign instead of getAlign in some places. getAlign factors in the minimum alignment implied by the offset in the pointer info. Since we're also passing the pointer info we can use the original alignment. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D87592	2020-09-14 13:54:50 -07:00
Nikita Popov	53f36f06af	[Legalize][ARM][X86] Add float legalization for VECREDUCE This adds SoftenFloatRes, PromoteFloatRes and SoftPromoteHalfRes legalizations for VECREDUCE, to fill the remaining hole in the SDAG legalization. These legalizations simply expand the reduction and let it be recursively legalized. For the PromoteFloatRes case at least it is possible to do better than that, but it's pretty tricky (because we need to consider the interaction of three different vector legalizations and the type promotion) and probably not really worthwhile. I haven't added ExpandFloatRes support, as I am not familiar with ppc_fp128. Differential Revision: https://reviews.llvm.org/D87569	2020-09-14 20:42:09 +02:00
Simon Pilgrim	98eaacd73d	Assert we've found both vector types. NFCI. Fixes clang static analyzer warning about potential null dereferences.	2020-09-14 13:24:17 +01:00
Meera Nakrani	dd519bf0b0	[ARM] Selects SSAT/USAT from correct LLVM IR LLVM will canonicalize conditional selectors to a different pattern than the old code that was used. This is updating the function to match the new expected patterns and select SSAT or USAT when successful. Tests have also been updated to use the new patterns. Differential Review: https://reviews.llvm.org/D87379	2020-09-14 10:58:21 +00:00
Sjoerd Meijer	676febc044	[ARM][MVE] Tail-predication: check get.active.lane.mask's TC value This adds additional checks for the original scalar loop tripcount value, i.e. get.active.lane.mask second argument, and perform several sanity checks to see if it is of the form that we expect similarly like we already do for the IV which is the first argument of get.active.lane. Differential Revision: https://reviews.llvm.org/D86074	2020-09-14 11:32:15 +01:00
Simon Wallis	4946802c5f	[ARM] Fix so immediates and pc relative checks Treating an SoImm offset as a multiple of 4 between -1020 and 1020 mis-handles the second of a pair of 16-bit constants where the offset is a multiple of 2 but not a multiple of 4, leading to an LLVM ERROR: out of range pc-relative fixup value For 32-bit and larger (64-bit) constants, continue to treat an SoImm offset as a multiple of 4 between -1020 and 1020. For smaller (16-bit) constants, treat an SoImm offset as a multiple of 1 between -255 and 255. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D86949	2020-09-14 08:52:59 +01:00
David Green	74760bb00f	[LV][ARM] Add preferInloopReduction target hook. This allows the backend to tell the vectorizer to produce inloop reductions through a TTI hook. For the moment on ARM under MVE this means allowing integer add reductions of the correct size. In the future this can include integer min/max too, under -Os. Differential Revision: https://reviews.llvm.org/D75512	2020-09-12 17:47:04 +01:00
David Green	6cfd38d03d	[ARM] Fixup single source mla reductions. This fixes a complication on top of D87276. If we are sign extending around a mul with the two operands that are the same, instcombine will helpfully convert one of the sext to a zext. Reverse that so that we again generate a reduction. Differnetial Revision: https://reviews.llvm.org/D87287	2020-09-12 14:31:26 +01:00
Sanjay Patel	3a8ea8609b	[Intrinsics] define semantics for experimental fmax/fmin vector reductions As discussed on llvm-dev: http://lists.llvm.org/pipermail/llvm-dev/2020-April/140729.html This is hopefully the final remaining showstopper before we can remove the 'experimental' from the reduction intrinsics. No behavior was specified for the FP min/max reductions, so we have a mess of different interpretations. There are a few potential options for the semantics of these max/min ops. I think this is the simplest based on current behavior/implementation: make the reductions inherit from the existing llvm.maxnum/minnum intrinsics. These correspond to libm fmax/fmin, and those are similar to the (now deprecated?) IEEE-754 maxNum/minNum functions (NaNs are treated as missing data). So the default expansion creates calls to libm functions. Another option would be to inherit from llvm.maximum/minimum (NaNs propagate), but most targets just crash in codegen when given those nodes because no default expansion was ever implemented AFAICT. We could also just assume 'nnan' semantics by default (we are already assuming 'nsz' semantics in the maxnum/minnum intrinsics), but some targets (AArch64, PowerPC) support the more defined behavior, so it doesn't make much sense to not allow a tighter spec. Fast-math-flags (nnan) can be used to loosen the semantics. (Note that D67507 was proposed to update the LangRef to acknowledge the more recent IEEE-754 2019 standard, but that patch seems to have stalled. If we do update based on the new standard, the reduction instructions can seamlessly inherit from whatever updates are made to the max/min intrinsics.) x86 sees a regression here on 'nnan' tests because we have underlying, longstanding bugs in FMF creation/propagation. Those need to be fixed apart from this change (for example: https://llvm.org/PR35538). The expansion sequence before this patch may not have been correct. Differential Revision: https://reviews.llvm.org/D87391	2020-09-12 09:10:28 -04:00
David Green	c437446d90	[ARM] Recognize "double extend" reduction patterns We can sometimes get code that does: xe = zext i16 x to i32 ye = zext i16 y to i32 m = mul i32 xe, ye me = zext i32 m to i64 r = vecreduce.add(me) This "double extend" can trip up the reduction identification, but should give identical results. This extends the pattern matching to handle them. Differential Revision: https://reviews.llvm.org/D87276	2020-09-12 13:51:42 +01:00
Sam Tebbs	b81c57d646	[ARM][LowOverheadLoops] Allow tail predication on predicated instructions with unknown lane values The effects of unpredicated vector instruction with unknown lanes cannot be predicted and therefore cannot be tail predicated. This does not apply to predicated vector instructions and so this patch allows tail predication on them. Differential Revision: https://reviews.llvm.org/D87376	2020-09-10 10:34:32 +01:00
Sam Parker	1919b65052	[ARM] Tail predicate VQDMULH and VQRDMULH Mark the family of instructions as valid for tail predication. Differential Revision: https://reviews.llvm.org/D87348	2020-09-10 08:20:07 +01:00
Amara Emerson	e5784ef8f6	[GlobalISel] Enable usage of BranchProbabilityInfo in IRTranslator. We weren't using this before, so none of the MachineFunction CFG edges had the branch probability information added. As a result, block placement later in the pipeline was flying blind. This is enabled only with optimizations enabled like SelectionDAG. Differential Revision: https://reviews.llvm.org/D86824	2020-09-09 14:31:12 -07:00
Sam Parker	3ebc755227	[ARM] Try to rematerialize VCTP instructions We really want to try and avoid spilling P0, which can be difficult since there's only one register, so try to rematerialize any VCTP instructions. Differential Revision: https://reviews.llvm.org/D87280	2020-09-09 07:41:22 +01:00
Sam Tebbs	7aabb6ad77	[ARM][LowOverheadLoops] Remove modifications to the correct element count register After my patch at D86087, code that now uses the mov operand rather than the vctp operand will no longer remove modifications to the vctp operand as they should. This patch fixes that by explicitly removing modifications to the vctp operand rather than the register used as the element count.	2020-09-08 10:30:05 +01:00
Roman Lebedev	bb7d3af113	Reland [SimplifyCFG][LoopRotate] SimplifyCFG: disable common instruction hoisting by default, enable late in pipeline This was reverted in `503deec218` because it caused gigantic increase (3x) in branch mispredictions in certain benchmarks on certain CPU's, see https://reviews.llvm.org/D84108#2227365. It has since been investigated and here are the results: https://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20200907/827578.html > It's an amazingly severe regression, but it's also all due to branch > mispredicts (about 3x without this). The code layout looks ok so there's > probably something else to deal with. I'm not sure there's anything we can > reasonably do so we'll just have to take the hit for now and wait for > another code reorganization to make the branch predictor a bit more happy :) > > Thanks for giving us some time to investigate and feel free to recommit > whenever you'd like. > > -eric So let's just reland this. Original commit message: I've been looking at missed vectorizations in one codebase. One particular thing that stands out is that some of the loops reach vectorizer in a rather mangled form, with weird PHI's, and some of the loops aren't even in a rotated form. After taking a more detailed look, that happened because the loop's headers were too big by then. It is evident that SimplifyCFG's common code hoisting transform is at fault there, because the pattern it handles is precisely the unrotated loop basic block structure. Surprizingly, `SimplifyCFGOpt::HoistThenElseCodeToIf()` is enabled by default, and is always run, unlike it's friend, common code sinking transform, `SinkCommonCodeFromPredecessors()`, which is not enabled by default and is only run once very late in the pipeline. I'm proposing to harmonize this, and disable common code hoisting until //late// in pipeline. Definition of //late// may vary, here currently i've picked the same one as for code sinking, but i suppose we could enable it as soon as right after loop rotation happens. Experimentation shows that this does indeed unsurprizingly help, more loops got rotated, although other issues remain elsewhere. Now, this undoubtedly seriously shakes phase ordering. This will undoubtedly be a mixed bag in terms of both compile- and run- time performance, codesize. Since we no longer aggressively hoist+deduplicate common code, we don't pay the price of said hoisting (which wasn't big). That may allow more loops to be rotated, so we pay that price. That, in turn, that may enable all the transforms that require canonical (rotated) loop form, including but not limited to vectorization, so we pay that too. And in general, no deduplication means more [duplicate] instructions going through the optimizations. But there's still late hoisting, some of them will be caught late. As per benchmarks i've run {F12360204}, this is mostly within the noise, there are some small improvements, some small regressions. One big regression i saw i fixed in rG8d487668d09fb0e4e54f36207f07c1480ffabbfd, but i'm sure this will expose many more pre-existing missed optimizations, as usual :S llvm-compile-time-tracker.com thoughts on this: http://llvm-compile-time-tracker.com/compare.php?from=e40315d2b4ed1e38962a8f33ff151693ed4ada63&to=c8289c0ecbf235da9fb0e3bc052e3c0d6bff5cf9&stat=instructions * this does regress compile-time by +0.5% geomean (unsurprizingly) * size impact varies; for ThinLTO it's actually an improvement The largest fallout appears to be in GVN's load partial redundancy elimination, it spends much more time in `MemoryDependenceResults::getNonLocalPointerDependency()`. Non-local `MemoryDependenceResults` is widely-known to be, uh, costly. There does not appear to be a proper solution to this issue, other than silencing the compile-time performance regression by tuning cut-off thresholds in `MemoryDependenceResults`, at the cost of potentially regressing run-time performance. D84609 attempts to move in that direction, but the path is unclear and is going to take some time. If we look at stats before/after diffs, some excerpts: * RawSpeed (the target) {F12360200} * -14 (-73.68%) loops not rotated due to the header size (yay) * -272 (-0.67%) `"Number of live out of a loop variables"` - good for vectorizer * -3937 (-64.19%) common instructions hoisted * +561 (+0.06%) x86 asm instructions * -2 basic blocks * +2418 (+0.11%) IR instructions * vanilla test-suite + RawSpeed + darktable {F12360201} * -36396 (-65.29%) common instructions hoisted * +1676 (+0.02%) x86 asm instructions * +662 (+0.06%) basic blocks * +4395 (+0.04%) IR instructions It is likely to be sub-optimal for when optimizing for code size, so one might want to change tune pipeline by enabling sinking/hoisting when optimizing for size. Reviewed By: mkazantsev Differential Revision: https://reviews.llvm.org/D84108 This reverts commit `503deec218`.	2020-09-08 00:24:03 +03:00
Sam Parker	0af4147804	[ARM][CostModel] CodeSize costs for i1 arith ops When optimising for size, make the cost of i1 logical operations relatively expensive so that optimisations don't try to combine predicates. Differential Revision: https://reviews.llvm.org/D86525	2020-09-07 09:27:18 +01:00
David Green	294c0cc3eb	[ARM] Fold predicate_cast(load) into vldr p0 This adds a simple tablegen pattern for folding predicate_cast(load) into vldr p0, providing the alignment and offset are correct. Differential Revision: https://reviews.llvm.org/D86702	2020-09-04 11:29:59 +01:00
David Green	ffd0b31c7c	Revert "[ARM] Register pressure with -mthumb forces register reload before each call" Expensive checks are failing, complaining about additional MMO operands added to the branch.	2020-09-01 07:39:54 +01:00
Prathamesh Kulkarni	85b4d286d7	[ARM] Register pressure with -mthumb forces register reload before each call This patch implements the foldMemoryOperand hook in Thumb1InstrInfo, allowing tBLXr and a spilled function address to be combined back into a tBL. This can help with codesize at Oz, especailly in the tinycrypt library. Differential Revision: https://reviews.llvm.org/D79785	2020-08-31 20:00:30 +01:00
Craig Topper	aab90384a3	[Attributes] Add a method to check if an Attribute has AttrKind None. Use instead of hasAttribute(Attribute::None) There's a special case in hasAttribute for None when pImpl is null. If pImpl is not null we dispatch to pImpl->hasAttribute which will always return false for Attribute::None. So if we just want to check for None its sufficient to just check that pImpl is null. Which can even be done inline. This patch adds a helper for that case which I hope will speed up our getSubtargetImpl implementations. Differential Revision: https://reviews.llvm.org/D86744	2020-08-28 13:23:45 -07:00
Sjoerd Meijer	5f1cad4d29	[ARM] Skip combining base updates for vld1x NEON intrinsics Skip this for now, to avoid a backend crash in: UNREACHABLE executed at llvm/lib/Target/ARM/ARMISelLowering.cpp:13412 This should fix PR45824. Differential Revision: https://reviews.llvm.org/D86784	2020-08-28 20:29:15 +01:00
Anna Welker	064981f0ce	[ARM][MVE] Enable MVE gathers and scatters by default Enable MVE gather/scatters by default, which requires some minor adaptations in some tests. Differential revision: https://reviews.llvm.org/D86776	2020-08-28 19:05:29 +01:00
David Green	4ca60915bc	[ARM] Correct predicate operand for offset gather/scatter These arm_mve_vldr_gather_offset_predicated and arm_mve_vstr_scatter_offset_predicated have some extra parameters meaning the predicate is at a later operand. If a loop contains _only_ those masked instructions, we would miss transforming the active lane mask. Differential Revision: https://reviews.llvm.org/D86791	2020-08-28 17:48:15 +01:00
Sam Parker	b30adfb529	[ARM][LowOverheadLoops] Liveouts and reductions Remove the code that tried to look for reduction patterns, since the vectorizer and isel can now produce predicated arithmetic instructios within the loop body. This has required some reorganisation and fixes around live-out and predication checks, as well as looking for cases where an input/output is initialised to zero. Differential Revision: https://reviews.llvm.org/D86613	2020-08-28 13:56:16 +01:00
Mikhail Maltsev	ae1396c7d4	[ARM][BFloat16] Change types of some Arm and AArch64 bf16 intrinsics This patch adjusts the following ARM/AArch64 LLVM IR intrinsics: - neon_bfmmla - neon_bfmlalb - neon_bfmlalt so that they take and return bf16 and float types. Previously these intrinsics used <8 x i8> and <4 x i8> vectors (a rudiment from implementation lacking bf16 IR type). The neon_vbfdot[q] intrinsics are adjusted similarly. This change required some additional selection patterns for vbfdot itself and also for vector shuffles (in a previous patch) because of SelectionDAG transformations kicking in and mangling the original code. This patch makes the generated IR cleaner (less useless bitcasts are produced), but it does not affect the final assembly. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D86146	2020-08-27 18:43:16 +01:00
Sam Parker	03141aa04a	[ARM] Enable outliner at -Oz for M-class Enable default outlining when the function has the minsize attribute and we're targeting an m-class core. Differential Revision: https://reviews.llvm.org/D82951	2020-08-27 08:02:56 +01:00
Sam Parker	a3e41d4581	[ARM] Make MachineVerifier more strict about terminators Fix the ARM backend's analyzeBranch so it doesn't ignore predicated return instructions, and make the MachineVerifier rule more strict. Differential Revision: https://reviews.llvm.org/D40061	2020-08-27 07:10:20 +01:00
David Green	677c1590c0	[ARM] Increase MVE gather/scatter cost by MVECostFactor. MVE Gather scatter codegeneration is looking a lot better than it used to, but still has some issues. The instructions we currently model as 1 cycle per element, which is a bit low for some cases. Increasing the cost by the MVECostFactor brings them in-line with our other instruction costs. This will have the effect of only generating then when the extra benefit is more likely to overcome some of the issues. Notably in running out of registers and vectorizing loops that could otherwise be SLP vectorized. In the short-term whilst we look at other ways of dealing with those more directly, we can increase the costs of gathers to make them more likely to be beneficial when created. Differential Revision: https://reviews.llvm.org/D86444	2020-08-26 13:03:46 +01:00
Sjoerd Meijer	c352e7fbda	[ARM][MVE] Tail-predication: remove the BTC + 1 overflow checks This adapts tail-predication to the new semantics of get.active.lane.mask as defined in D86147. This means that: - we can remove the BTC + 1 overflow checks because now the loop tripcount is passed in to the intrinsic, - we can immediately use that value to setup a counter for the number of elements processed by the loop and don't need to materialize BTC + 1. Differential Revision: https://reviews.llvm.org/D86303	2020-08-25 14:38:03 +01:00
Anna Welker	8048068c3e	[ARM][MVE] Allow tail predication for strides !=1 with gather/scatters If gather/scatters are enabled, ARMTargetTransformInfo now allows tail predication for loops with a much wider range of strides, up to anything that is loop invariant. Differential Revision: https://reviews.llvm.org/D85410	2020-08-24 13:54:47 +01:00
Roman Lebedev	503deec218	Temporairly revert "[SimplifyCFG][LoopRotate] SimplifyCFG: disable common instruction hoisting by default, enable late in pipeline" As disscussed in post-commit review starting with https://reviews.llvm.org/D84108#2227365 while this appears to be mostly a win overall, especially code-size-wise, this appears to shake //certain// code pattens in a way that is extremely unfavorable for performance (+30% runtime regression) on certain CPU's (i personally can't reproduce). So until the behaviour is better understood, and a path forward is mapped, let's back this out for now. This reverts commit `1d51dc38d8`.	2020-08-22 00:33:22 +03:00
Sam Parker	acf0bb41e4	[ARM][CostModel] Select instruction costs. Modify the ARM getCmpSelInstrCost implementation for the code size costs of selects. Now consider the legalization cost and increase the cost of i1 because those values wouldn't live in a general purpose register. We also make selects +1 more expensive to account for the IT instruction. Differential Revision: https://reviews.llvm.org/D82091	2020-08-21 08:49:56 +01:00
David Green	2b69efded0	[ARM][LV] Add a preferPredicatedReductionSelect target hook As part of D84741, this adds a target hook for the preferPredicatedReductionSelect option and makes use of it under MVE, allowing us to tail predicate most reduction loops. Differential Revision: https://reviews.llvm.org/D85980	2020-08-21 08:48:12 +01:00
Yvan Roux	0459f29e8b	[ARM][MachineOutliner] Add default mode. Use the stack to save and restore the link register when there is no available register to do it. Differential Revision: https://reviews.llvm.org/D76069	2020-08-20 09:25:33 +02:00
Meera Nakrani	545de56f87	[ARM] Enabled VMLAV and Add instructions to use VMLAVA Used InstCombine to enable VMLAV and Add instructions to generate VMLAVA instead with tests.	2020-08-19 08:36:49 +00:00
Fangrui Song	c466c5fa7e	[ARM] Fix build after D86087	2020-08-18 09:20:32 -07:00
David Green	3471520b1f	[ARM] Allow tail predication of VLDn VLD2/4 instructions cannot be predicated, so we cannot tail predicate them from autovec. From intrinsics though, they should be valid as they will just end up loading extra values into off vector lanes, not effecting the on lanes. The same is true for loads in general where so long as we are not using the other vector lanes, an unpredicated load can be converted to a predicated one. This marks VLD2 and VLD4 instructions as validForTailPredication and allows any unpredicated load in tail predication loop, which seems to be valid given the other checks we have. Differential Revision: https://reviews.llvm.org/D86022	2020-08-18 17:15:45 +01:00
Sam Tebbs	31f02ac60a	[ARM] Use mov operand if the mov cannot be moved while tail predicating There are some cases where the instruction that sets up the iteration count for a tail predicated loop cannot be moved before the dlstp, stopping tail predication entirely. This patch checks if the mov operand can be used and if so, uses that instead. Differential Revision: https://reviews.llvm.org/D86087	2020-08-18 17:10:29 +01:00
Craig Topper	c7a0b2684f	[X86][MC][Target] Initial backend support a tune CPU to support -mtune This patch implements initial backend support for a -mtune CPU controlled by a "tune-cpu" function attribute. If the attribute is not present X86 will use the resolved CPU from target-cpu attribute or command line. This patch adds MC layer support a tune CPU. Each CPU now has two sets of features stored in their GenSubtargetInfo.inc tables . These features lists are passed separately to the Processor and ProcessorModel classes in tablegen. The tune list defaults to an empty list to avoid changes to non-X86. This annoyingly increases the size of static tables on all target as we now store 24 more bytes per CPU. I haven't quantified the overall impact, but I can if we're concerned. One new test is added to X86 to show a few tuning features with mismatched tune-cpu and target-cpu/target-feature attributes to demonstrate independent control. Another new test is added to demonstrate that the scheduler model follows the tune CPU. I have not added a -mtune to llc/opt or MC layer command line yet. With no attributes we'll just use the -mcpu for both. MC layer tools will always follow the normal CPU for tuning. Differential Revision: https://reviews.llvm.org/D85165	2020-08-14 15:31:50 -07:00
Sam Parker	eb82d58f83	[NFC][ARM] Port MaybeCall into ARMTTImpl method Renamed to maybeLoweredToCall.	2020-08-14 10:23:20 +01:00
David Green	0c390c22a5	Revert "[ARM] Fix IT block generation after Thumb2SizeReduce with -Oz" This reverts commit `18279a54b5` as it is causing some chromium android test problems.	2020-08-13 22:40:36 +01:00
David Green	2632c625ed	[ARM] Mark VMINNMA/VMAXNMA as commutative These operations take Qda and Rn register operands, which are commutative so long as the instruction is not predicated. Differential Revision: https://reviews.llvm.org/D85813	2020-08-13 18:01:11 +01:00
Anna Welker	9eb9ba076a	[ARM][MVE] Fix for tail predication for loops containing MVE gather/scatters Fix to include non-predicated version of write-back gather in special case treatment for deducting the instruction type. (This is fixing https://reviews.llvm.org/D85138 for corner cases) Differential Revision: https://reviews.llvm.org/D85889	2020-08-13 12:24:19 +01:00
David Green	1bb3488685	[ARM] Predicated VFMA patterns Similar to the Two op + select patterns that were added recently, this adds some patterns for select + fma to turn them into predicated operations. Differential Revision: https://reviews.llvm.org/D85824	2020-08-12 18:35:01 +01:00
Anna Welker	4fe5615eab	[ARM][MVE] Enable tail predication for loops containing MVE gather/scatters Widen the scope of memory operations that are allowed to be tail predicated to include gathers and scatters, such that loops that are auto-vectorized with the option -enable-arm-maskedgatscat (and actually end up containing an MVE gather or scatter) can be tail predicated. Differential Revision: https://reviews.llvm.org/D85138	2020-08-12 15:32:37 +01:00
Sam Parker	ea8448e361	[LoopUnroll] Adjust CostKind query When TTI was updated to use an explicit cost, TCK_CodeSize was used although the default implicit cost would have been the hand-wavey cost of size and latency. So, revert back to this behaviour. This is not expected to have (much) impact on targets since most (all?) of them return the same value for SizeAndLatency and CodeSize. When optimising for size, the logic has been changed to query CodeSize costs instead of SizeAndLatency. This patch also adds a testing option in the unroller so that OptSize thresholds can be specified. Differential Revision: https://reviews.llvm.org/D85723	2020-08-12 12:56:09 +01:00
Sjoerd Meijer	6716e7868e	[ARM][MVE] tail-predication: overflow checks for backedge taken count. This pick ups the work on the overflow checks for get.active.lane.mask, which ensure that it is safe to insert the VCTP intrinisc that enables tail-predication. For a 2d auto-correlation kernel and its inner loop j: M = Size - i; for (j = 0; j < M; j++) Sum += Input[j] * Input[j+i]; For this inner loop, the SCEV backedge taken count (BTC) expression is: (-1 + (sext i16 %Size to i32)),+,-1}<nw><%for.body> and LoopUtil cannotBeMaxInLoop couldn't calculate a bound on this, thus "BTC cannot be max" could not be determined. So overflow behaviour had to be assumed in the loop tripcount expression that uses the BTC. As a result tail-predication had to be forced (with an option) for this case. This change solves that by using ScalarEvolution's helper getConstantMaxBackedgeTakenCount which is able to determine the range of BTC, thus can determine it is safe, so that we no longer need to force tail-predication as reflected in the changed test cases. Differential Revision: https://reviews.llvm.org/D85737	2020-08-12 09:32:26 +01:00
Kerry McLaughlin	85c7e89f3b	[CodeGen] Refactor getMemBasePlusOffset & getObjectPtrOffset to accept a TypeSize Changes the Offset arguments to both functions from int64_t to TypeSize & updates all uses of the functions to create the offset using TypeSize::Fixed() Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D85220	2020-08-11 12:17:10 +01:00
Sam Parker	4f9f4b21e0	[ARM] Unrestrict Armv8-a IT when at minsize IT blocks with more than one instruction were performance deprecated in Armv8 but that doesn't mean we should follow that advise when optimising for size. Differential Revision: https://reviews.llvm.org/D85638	2020-08-10 14:59:53 +01:00
David Green	186a7f81e8	[ARM] Add VADDV and VMLAV patterns for v16i16 This adds patterns for v16i16's vecreduce, using all the existing code to go via an i32 VADDV/VMLAV and truncating the result. Differential Revision: https://reviews.llvm.org/D85452	2020-08-09 11:09:49 +01:00
David Green	8590e5abad	[ARM] Allow vecreduce_add in tail predicated loops This allows vecreduce_add in loops so that we can tailpredicate them. Differential Revision: https://reviews.llvm.org/D85454	2020-08-09 10:57:17 +01:00
David Green	296faa91ed	[ARM] Some formatting and predicate VRHADD patterns. NFC This formats some of the MVE patterns, and adds a missing Predicates = [HasMVEInt] to some VRHADD patterns I noticed as going through. Although I don't believe NEON would ever use the patterns (as it would use ADDL and VSHRN instead) they should ideally be predicated on having MVE instructions.	2020-08-09 10:07:52 +01:00
Martin Storsjö	5eedc01a82	[ARM, AArch64] Fix a comment typo. NFC.	2020-08-06 09:23:45 +03:00
Sam Parker	f2675ab45f	[ARM][CostModel] Implement getCFInstrCost As with other targets, set the throughput cost of control-flow instructions to free so that we don't miss out of vectorization opportunities. Differential Revision: https://reviews.llvm.org/D85283	2020-08-05 12:44:51 +01:00
Meera Nakrani	20283ff491	[ARM] Generated SSAT and USAT instructions with shift Added patterns so that both SSAT and USAT instructions are generated with shifts. Added corresponding regression tests. Differential Review: https://reviews.llvm.org/D85120	2020-08-04 09:38:17 +00:00
Christopher Tetreault	b5059b7140	[SVE] Remove bad call to VectorType::getNumElements() from ARM Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D85152	2020-08-03 15:41:14 -07:00
Jordan Rupprecht	af3ec731d5	[NFC][ARM] Silence unused variable in release builds	2020-08-03 15:21:44 -07:00
David Green	22916481c1	[ARM] Convert VPSEL to VMOV in tail predicated loops VPSEL has slightly different semantics under tail predication (it can end up selecting from Qn, Qm and Qd). We do not model that at the moment so they block tail predicated loops from being formed. This just converts them into a predicated VMOV instead (via a VORR), allowing tail predication to happen whilst still modelling the original behaviour of the input. Differential Revision: https://reviews.llvm.org/D85110	2020-08-03 22:03:14 +01:00
Nicholas Guy	18279a54b5	[ARM] Fix IT block generation after Thumb2SizeReduce with -Oz Fixes a regression caused by D82439, in which IT blocks were no longer being generated when -Oz is present. This was due to the CPSR register being marked as dead, while this case was not accounted for. Differential Revision: https://reviews.llvm.org/D83667	2020-08-03 13:20:32 +01:00
David Green	fd69df62ed	[ARM] Distribute post-inc for Thumb2 sign/zero extending loads/stores This adds sign/zero extending scalar loads/stores to the MVE instructions added in D77813, allowing us to create up more post-inc instructions. These are comparatively simple, compared to LDR/STR (which may be better turned into an LDRD/LDM), but still require some additions over MVE instructions. Because there are i12 and i8 variants of the offset loads/stores dealing with different signs, we may need to convert an i12 address to a i8 negative instruction. t2LDRBi12 can also be shrunk to a tLDRi under the right conditions, so we need to be careful with codesize too. Differential Revision: https://reviews.llvm.org/D78625	2020-08-01 14:01:18 +01:00
Benjamin Kramer	c6f08b14d4	Hide some internal symbols. NFC.	2020-07-31 17:28:02 +02:00
Matt Arsenault	57bd64ff84	Support addrspacecast initializers with isNoopAddrSpaceCast Moves isNoopAddrSpaceCast to the TargetMachine. It logically belongs with the DataLayout.	2020-07-31 10:42:43 -04:00
Roman Lebedev	1d51dc38d8	[SimplifyCFG][LoopRotate] SimplifyCFG: disable common instruction hoisting by default, enable late in pipeline I've been looking at missed vectorizations in one codebase. One particular thing that stands out is that some of the loops reach vectorizer in a rather mangled form, with weird PHI's, and some of the loops aren't even in a rotated form. After taking a more detailed look, that happened because the loop's headers were too big by then. It is evident that SimplifyCFG's common code hoisting transform is at fault there, because the pattern it handles is precisely the unrotated loop basic block structure. Surprizingly, `SimplifyCFGOpt::HoistThenElseCodeToIf()` is enabled by default, and is always run, unlike it's friend, common code sinking transform, `SinkCommonCodeFromPredecessors()`, which is not enabled by default and is only run once very late in the pipeline. I'm proposing to harmonize this, and disable common code hoisting until //late// in pipeline. Definition of //late// may vary, here currently i've picked the same one as for code sinking, but i suppose we could enable it as soon as right after loop rotation happens. Experimentation shows that this does indeed unsurprizingly help, more loops got rotated, although other issues remain elsewhere. Now, this undoubtedly seriously shakes phase ordering. This will undoubtedly be a mixed bag in terms of both compile- and run- time performance, codesize. Since we no longer aggressively hoist+deduplicate common code, we don't pay the price of said hoisting (which wasn't big). That may allow more loops to be rotated, so we pay that price. That, in turn, that may enable all the transforms that require canonical (rotated) loop form, including but not limited to vectorization, so we pay that too. And in general, no deduplication means more [duplicate] instructions going through the optimizations. But there's still late hoisting, some of them will be caught late. As per benchmarks i've run {F12360204}, this is mostly within the noise, there are some small improvements, some small regressions. One big regression i saw i fixed in rG8d487668d09fb0e4e54f36207f07c1480ffabbfd, but i'm sure this will expose many more pre-existing missed optimizations, as usual :S llvm-compile-time-tracker.com thoughts on this: http://llvm-compile-time-tracker.com/compare.php?from=e40315d2b4ed1e38962a8f33ff151693ed4ada63&to=c8289c0ecbf235da9fb0e3bc052e3c0d6bff5cf9&stat=instructions * this does regress compile-time by +0.5% geomean (unsurprizingly) * size impact varies; for ThinLTO it's actually an improvement The largest fallout appears to be in GVN's load partial redundancy elimination, it spends much more time in `MemoryDependenceResults::getNonLocalPointerDependency()`. Non-local `MemoryDependenceResults` is widely-known to be, uh, costly. There does not appear to be a proper solution to this issue, other than silencing the compile-time performance regression by tuning cut-off thresholds in `MemoryDependenceResults`, at the cost of potentially regressing run-time performance. D84609 attempts to move in that direction, but the path is unclear and is going to take some time. If we look at stats before/after diffs, some excerpts: * RawSpeed (the target) {F12360200} * -14 (-73.68%) loops not rotated due to the header size (yay) * -272 (-0.67%) `"Number of live out of a loop variables"` - good for vectorizer * -3937 (-64.19%) common instructions hoisted * +561 (+0.06%) x86 asm instructions * -2 basic blocks * +2418 (+0.11%) IR instructions * vanilla test-suite + RawSpeed + darktable {F12360201} * -36396 (-65.29%) common instructions hoisted * +1676 (+0.02%) x86 asm instructions * +662 (+0.06%) basic blocks * +4395 (+0.04%) IR instructions It is likely to be sub-optimal for when optimizing for code size, so one might want to change tune pipeline by enabling sinking/hoisting when optimizing for size. Reviewed By: mkazantsev Differential Revision: https://reviews.llvm.org/D84108	2020-07-29 20:05:30 +03:00
David Green	9ddb28964c	[ARM] Tune getCastInstrCost for extending masked loads and truncating masked stores This patch uses the feature added in D79162 to fix the cost of a sext/zext of a masked load, or a trunc for a masked store. Previously, those were considered cheap or even free, but it's not the case as we cannot split the load in the same way we would for normal loads. This updates the costs to better reflect reality, and adds a test for it in test/Analysis/CostModel/ARM/cast.ll. It also adds a vectorizer test that showcases the improvement: in some cases, the vectorizer will now choose a smaller VF when tail-predication is enabled, which results in better codegen. (Because if it were to use a higher VF in those cases, the code we see above would be generated, and the vmovs would block tail-predication later in the process, resulting in very poor codegen overall) Original Patch by Pierre van Houtryve Differential Revision: https://reviews.llvm.org/D79163	2020-07-29 13:41:34 +01:00
David Green	60280e9818	[Analysis] TTI: Add CastContextHint for getCastInstrCost Currently, getCastInstrCost has limited information about the cast it's rating, often just the opcode and types. Sometimes there is a context instruction as well, but it isn't trustworthy: for instance, when the vectorizer is rating a plan, it calls getCastInstrCost with the old instructions when, in fact, it's trying to evaluate the cost of the instruction post-vectorization. Thus, the current system can get the cost of certain casts incorrect as the correct cost can vary greatly based on the context in which it's used. For example, if the vectorizer queries getCastInstrCost to evaluate the cost of a sext(load) with tail predication enabled, getCastInstrCost will think it's free most of the time, but it's not always free. On ARM MVE, a VLD2 group cannot be extended like a normal VLDR can. Similar situations can come up with how masked loads can be extended when being split. To fix that, this path adds a new parameter to getCastInstrCost to give it a hint about the context of the cast. It adds a CastContextHint enum which contains the type of the load/store being created by the vectorizer - one for each of the types it can produce. Original patch by Pierre van Houtryve Differential Revision: https://reviews.llvm.org/D79162	2020-07-29 13:32:53 +01:00
Sjoerd Meijer	85342c27a3	[ARM] Optimize immediate selection Optimize some specific immediates selection by materializing them with sub/mvn instructions as opposed to loading them from the constant pool. Patch by Ben Shi, powerman1st@163.com. Differential Revision: https://reviews.llvm.org/D83745	2020-07-29 13:29:17 +01:00
Anna Welker	0c64233bb7	[ARM][MVE] Teach MVEGatherScatterLowering to merge successive getelementpointers A patch following up on the introduction of pointer induction variables, adding a preprocessing step to the address optimisation in the MVEGatherScatterLowering pass. If the getelementpointer that is the address is itself using a getelementpointer as base, they will be merged into one by summing up the offsets, after checking that this will not cause an overflow (this can be repeated recursively). Differential Revision: https://reviews.llvm.org/D84027	2020-07-28 17:31:20 +01:00
Tim Northover	39108f4c7a	ARM: make Thumb1 instructions non-flag-setting in IT block. Many Thumb1 instructions are defined to set CPSR if executed outside an IT block, but leave it alone from inside one. In MachineIR this is represented by whether an optional register is CPSR or NoReg (0), and affects how the instructions are printed. This sets the instruction to the appropriate form during if-conversion.	2020-07-28 13:31:17 +01:00
Arthur Eubanks	beb7e3bb70	Rename t2-reduce-size -> thumb2-reduce-size For readability and consistency with other thumb2 passes like "thumb2-it". Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D84696	2020-07-27 13:42:13 -07:00
Arthur Eubanks	2a672767cc	Prefix some AArch64/ARM passes with "aarch64-"/"arm-" For consistency with other target specific passes. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D84560	2020-07-27 11:00:39 -07:00
Meera Nakrani	db37937a47	[ARM] Added additional patterns to VABD instruction Added extra patterns to VABD instruction so it is selected in place of VSUB and VABS. Added corresponding regression test too. Differential Revision: https://reviews.llvm.org/D84500	2020-07-24 17:46:25 +00:00
Meera Nakrani	805e6bcf22	Test Commit Test commit - added whitespace in ARMInstrMVE.td	2020-07-24 17:22:56 +00:00
David Green	b37e92201c	[ARM] Add predicated mla reduction patterns Similar to `8fa824d7a3` but this time for MLA patterns, this selects predicated vmlav/vmlava/vmlalv/vmlava instructions from vecreduce.add(select(p, mul(x, y), 0)) nodes. Differential Revision: https://reviews.llvm.org/D84102	2020-07-23 21:47:59 +01:00
David Green	411eb87c79	[ARM] Fix missing MVE_VMUL_qr predicate This was missed out of `1030e82598`, but hopefully fixes the issues reported with NEON accidentally generating MVE instructions.	2020-07-22 20:43:02 +01:00
Matt Arsenault	0c92bfa4b8	GlobalISel: Don't use virtual for distinguishing arg handlers There's no reason to involve the hassle of a virtual method targets have to override for a simple boolean. Not sure exactly what's going on with Mips, but it seems to define its own totally separate handler classes.	2020-07-22 14:14:43 -04:00
David Green	8fa824d7a3	[ARM] Add predicated add reduction patterns Given a vecreduce.add(select(p, x, 0)), we can convert that to a predicated vaddv, as the else value for the select is the identity value, a zero. That is what this patch does for the vaddv, vaddva, vaddlv and vaddlva instructions, copying the existing patterns to also handle predication through a select. Differential Revision: https://reviews.llvm.org/D84101	2020-07-22 17:30:02 +01:00
Sebastian Neubauer	2a6c871596	[InstCombine] Move target-specific inst combining For a long time, the InstCombine pass handled target specific intrinsics. Having target specific code in general passes was noted as an area for improvement for a long time. D81728 moves most target specific code out of the InstCombine pass. Applying the target specific combinations in an extra pass would probably result in inferior optimizations compared to the current fixed-point iteration, therefore the InstCombine pass resorts to newly introduced functions in the TargetTransformInfo when it encounters unknown intrinsics. The patch should not have any effect on generated code (under the assumption that code never uses intrinsics from a foreign target). This introduces three new functions: TargetTransformInfo::instCombineIntrinsic TargetTransformInfo::simplifyDemandedUseBitsIntrinsic TargetTransformInfo::simplifyDemandedVectorEltsIntrinsic A few target specific parts are left in the InstCombine folder, where it makes sense to share code. The largest left-over part in InstCombineCalls.cpp is the code shared between arm and aarch64. This allows to move about 3000 lines out from InstCombine to the targets. Differential Revision: https://reviews.llvm.org/D81728	2020-07-22 15:59:49 +02:00
David Green	f8abecf337	[ARM] Extra MVE select(binop) patterns This is very similar to 243970d03cace2, but handling a slightly different form of predicated operations. When starting with a pattern of the form select(p, BinOp(x, y), x), Instcombine will often transform this to BinOp(x, select(p, y, 0)), where 0 is the identity value of the binop (0 for adds/subs, 1 for muls, -1 for ands etc). This adds the patterns that transforms those back into predicated binary operations. There is also a very minor adjustment to tablegen null_frag in here, to allow it to also be recognized as a PatLeaf node, so that it can be used in MVE_TwoOpPattern to easily exclude the cases where we do not need the alternate transform. Differential Revision: https://reviews.llvm.org/D84091	2020-07-22 14:08:29 +01:00
David Green	3533e0a08d	[ARM] Add patterns for select(p, BinOp(x, y), z) -> BinOpT(x, y,p z) Most MVE instructions can be predicated to fold a select into the instruction, using the predicate and the selects else as a passthough. This adds tablegen patterns for most two operand instructions using the newly added TwoOpPattern from `1030e82598`. Differential Revision: https://reviews.llvm.org/D83222	2020-07-22 13:24:01 +01:00
Simon Wallis	94e4e37d55	[Thumb] set code alignment for 16-bit load from constant pool Summary: [Thumb] set code alignment for 16-bit load from constant pool LLVM miscompiles this code when compiling for a target with v8.2-A FP16 and the Thumb ISA at -O0: extern void bar(__fp16 P5); int main() { __fp16 P5 = 1.96875; bar(P5); } The code section containing main has 2 byte alignment. It needs to have 4 byte alignment, because the load literal instruction has an offset from the load address with the low 2 bits zeroed. I do not include a test case in this check-in. llc and llvm-mc do not exhibit this bug. They do not set code section alignment in the same manner as clang. Reviewers: dnsampaio Reviewed By: dnsampaio Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D84169	2020-07-22 10:12:41 +01:00
David Spickett	3a34194606	[ARM] Fix Asm/Disasm of TBB/TBH instructions Summary: This fixes Bugzilla #46616 in which it was reported that "tbb [pc, r0]" was marked as SoftFail (aka unpredictable) incorrectly. Expected behaviour is: * ARMv8 is required to use sp as rn or rm (tbb/tbh only have a Thumb encoding so using Arm mode is not an option) * If rm is the pc then the instruction is always unpredictable Some of this was implemented already and this fixes the rest. Added tests cover the new and pre-existing handling. Reviewers: ostannard Reviewed By: ostannard Subscribers: kristof.beyls, hiraditya, danielkiss, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D84227	2020-07-22 09:31:56 +01:00
David Green	1030e82598	[ARM] Add MVE_TwoOpPattern. NFC This commons out a chunk of the different two operand MVE patterns into a single helper multidef. Or technically two multidef patterns so that the Dup qr patterns can also get the same treatment. This is most of the two address instructions that we have some codegen pattern for (not ones that we select purely from intrinsics). It does not include shifts, which are more spread out and will need some extra work to be given the same treatment. Differential Revision: https://reviews.llvm.org/D83219	2020-07-21 19:51:37 +01:00
David Green	30371df85f	[ARM] More unpredictable VCVT instructions. These extra vcvt instructions were missed from `74ca67c109` because they live in a different Domain, but should be treated in the same way. Differential Revision: https://reviews.llvm.org/D83204	2020-07-21 07:24:37 +01:00
David Green	3504acc33e	[ARM] Don't mark vctp as having sideeffects As far as I can tell, it should not be necessary for VCTP to be unpredictable in tail predicated loops. Either it has a a valid loop counter as a operand which will naturally keep it in the right loop, or it doesn't and it won't be converted to a tail predicated loop. Not marking it as having side effects allows it to be scheduled more cleanly for cases where it is not expected to become a tail predicate loop. Differential Revision: https://reviews.llvm.org/D83907	2020-07-19 09:28:09 +01:00
Simon Wallis	3e0ccf9a90	[ARM] halfword store hits llvm_unreachable with big-endian Summary: [ARM] halfword store hits llvm_unreachable with big-endian Provide missing case in getFixupKindContainerSizeBytes(). This stops execution reaching llvm_unreachable("Unknown fixup kind!") D83947 Reviewers: olista01, ostannard Reviewed By: ostannard Subscribers: ostannard, kristof.beyls, hiraditya, danielkiss, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D83947 Change-Id: I598aa1fb51fd1c6f424c557c85d6df6d1958bc62	2020-07-17 08:56:44 +01:00
Matt Arsenault	023883a834	IR: Rename Argument::hasPassPointeeByValueAttr to prepare for byref When the byref attribute is added, there will need to be two similar functions for the existing cases which have an associate value copy, and byref which does not. Most, but not all of the existing uses will use the existing version. The associated size function added by D82679 also needs to contextually differ, and will help eliminate a few places still relying on pointee element types.	2020-07-16 13:50:49 -04:00
David Green	7bbde17e62	[ARM] Add a PreferNoCSEL option. NFC This disables CSEL, falling back to the old predicated move behaviour for cases where that is useful for debugging.	2020-07-16 12:42:07 +01:00
Roman Lebedev	fb432a51f4	Reland "[NFCI] createCFGSimplificationPass(): migrate to also take SimplifyCFGOptions" This reverts commit `1067d3e176`, which reverted commit `b2018198c3`, because it introduced a Dependency Cycle between Transforms/Scalar and Transforms/Utils. So let's just move SimplifyCFGOptions.h into Utils/, thus avoiding the cycle.	2020-07-16 13:40:01 +03:00
Pavel Iliin	b9a6fb6428	[ARM] VBIT/VBIF support added. Vector bitwise selects are matched by pseudo VBSP instruction and expanded to VBSL/VBIT/VBIF after register allocation depend on operands registers to minimize extra copies.	2020-07-16 11:25:53 +01:00
David Green	146d35b6ee	[ARM] CSEL generation This adds a peephole optimisation to turn a t2MOVccr that could not be folded into any other instruction into a CSEL on 8.1-m. The t2MOVccr would usually be expanded into a conditional mov, that becomes an IT; MOV pair. We can instead generate a CSEL instruction, which can potentially be smaller and allows better register allocation freedom, which can help reduce codesize. Performance is more variable and may depend on the micrarchitecture details, but initial results look good. If we need to control this per-cpu, we can add a subtarget feature as we need it. Original patch by David Penry. Differential Revision: https://reviews.llvm.org/D83566	2020-07-16 11:10:53 +01:00
Adrian Kuegel	1067d3e176	Revert "[NFCI] createCFGSimplificationPass(): migrate to also take SimplifyCFGOptions" This reverts commit `b2018198c3`. This commit introduced a Dependency Cycle between Transforms/Scalar and Transforms/Utils. Transforms/Scalar already depends on Transforms/Utils, so if SimplifyCFGOptions.h is moved to Scalar, and Utils/Local.h still depends on it, we have a cycle.	2020-07-16 10:54:10 +02:00
Roman Lebedev	b2018198c3	[NFCI] createCFGSimplificationPass(): migrate to also take SimplifyCFGOptions Taking so many parameters is simply unmaintainable. We don't want to include the entire llvm/Transforms/Utils/Local.h into llvm/Transforms/Scalar.h so i've split SimplifyCFGOptions into it's own header.	2020-07-16 01:27:54 +03:00
Sjoerd Meijer	959eaa50d6	[ARM][MVE] Only tail-fold integer add reductions If a vector body has live-out values, it is probably a reduction, which needs a final reduction step after the loop. MVE has a VADDV instruction to reduce integer vectors, but doesn't have an equivalent one for float vectors. A live-out value that is not recognised as reduction later in the optimisation pipeline will result in the tail-predicated loop to be reverted to a non-predicated loop and this is very expensive, i.e. it has a significant performance impact, which is what we hope to avoid with fine tuning the ARM TTI hook preferPredicateOverEpilogue implementation. Differential Revision: https://reviews.llvm.org/D82953	2020-07-14 10:15:07 +01:00
Sjoerd Meijer	595270ae39	[ARM][MVE] Refactor option -disable-mve-tail-predication This refactors option -disable-mve-tail-predication to take different arguments so that we have 1 option to control tail-predication rather than several different ones. This is also a prep step for D82953, in which we want to reject reductions unless that is requested with this option. Differential Revision: https://reviews.llvm.org/D83133	2020-07-13 13:40:33 +01:00
Sidharth Baveja	e541e1b757	[NFC] Separate Peeling Properties into its own struct (re-land after minor fix) Summary: This patch separates the peeling specific parameters from the UnrollingPreferences, and creates a new struct called PeelingPreferences. Functions which used the UnrollingPreferences struct for peeling have been updated to use the PeelingPreferences struct. Author: sidbav (Sidharth Baveja) Reviewers: Whitney (Whitney Tsang), Meinersbur (Michael Kruse), skatkov (Serguei Katkov), ashlykov (Arkady Shlykov), bogner (Justin Bogner), hfinkel (Hal Finkel), anhtuyen (Anh Tuyen Tran), nikic (Nikita Popov) Reviewed By: Meinersbur (Michael Kruse) Subscribers: fhahn (Florian Hahn), hiraditya (Aditya Kumar), llvm-commits, LLVM Tag: LLVM Differential Revision: https://reviews.llvm.org/D80580	2020-07-10 18:39:30 +00:00
Luke Geeson	954db63cd1	[ARM] Add Cortex-A78 and Cortex-X1 Support for Clang and LLVM This patch upstreams support for the Arm-v8 Cortex-A78 and Cortex-X1 processors for AArch64 and ARM. In detail: - Adding cortex-a78 and cortex-x1 as cpu options for aarch64 and arm targets in clang - Adding Cortex-A78 and Cortex-X1 CPU names and ProcessorModels in llvm details of the CPU can be found here: https://www.arm.com/products/cortex-x https://www.arm.com/products/silicon-ip-cpu/cortex-a/cortex-a78 The following people contributed to this patch: - Luke Geeson - Mikhail Maltsev Reviewers: t.p.northover, dmgreen Reviewed By: dmgreen Subscribers: dmgreen, kristof.beyls, hiraditya, danielkiss, cfe-commits, llvm-commits, miyuki Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D83206	2020-07-10 18:24:11 +01:00
serge-sans-paille	e4ec6d0afe	Correctly update return status for MVEGatherScatterLowering `Changed` should reflect all possible changes. Differential Revision: https://reviews.llvm.org/D83459	2020-07-09 11:18:54 +02:00
Nikita Popov	0b39d2d752	Revert "[NFC] Separate Peeling Properties into its own struct" This reverts commit `0369dc98f9`. Many failing tests.	2020-07-08 21:43:32 +02:00
Sidharth Baveja	0369dc98f9	[NFC] Separate Peeling Properties into its own struct Summary: This patch makes the peeling properties of the loop accessible by other loop transformations. Author: sidbav (Sidharth Baveja) Reviewers: Whitney (Whitney Tsang), Meinersbur (Michael Kruse), skatkov (Serguei Katkov), ashlykov (Arkady Shlykov), bogner (Justin Bogner), hfinkel (Hal Finkel) Reviewed By: Meinersbur (Michael Kruse) Subscribers: fhahn (Florian Hahn), hiraditya (Aditya Kumar), llvm-commits, LLVM Tag: LLVM Differential Revision: https://reviews.llvm.org/D80580	2020-07-08 18:59:59 +00:00
Anh Tuyen Tran	6965af43e6	Revert "[NFC] Separate Peeling Properties into its own struct" This reverts commit `fead250b43`.	2020-07-08 18:58:05 +00:00
Anh Tuyen Tran	fead250b43	[NFC] Separate Peeling Properties into its own struct Summary: This patch makes the peeling properties of the loop accessible by other loop transformations. Author: sidbav (Sidharth Baveja) Reviewers: Whitney (Whitney Tsang), Meinersbur (Michael Kruse), skatkov (Serguei Katkov), ashlykov (Arkady Shlykov), bogner (Justin Bogner), hfinkel (Hal Finkel) Reviewed By: Meinersbur (Michael Kruse) Subscribers: fhahn (Florian Hahn), hiraditya (Aditya Kumar), llvm-commits, LLVM Tag: LLVM Differential Revision: https://reviews.llvm.org/D80580	2020-07-08 18:56:03 +00:00
David Green	146dad0077	[ARM] MVE FP16 cost adjustments This adjusts the MVE fp16 cost model, similar to how we already do for integer casts. It uses the base cost of 1 per cvt for most fp extend / truncates, but adjusts it for loads and stores where we know that a extending load has been used to get the load into the correct lane, and only an MVE VCVTB is then needed. Differential Revision: https://reviews.llvm.org/D81813	2020-07-06 15:57:51 +01:00
David Green	afdb2ef2ed	[ARM] Adjust default fp extend and trunc costs This adds some default costs for fp extends and truncates, generally costing them as 1 per lane. If the type is not legal then the cost will include a call to an __aeabi_ function. Some NEON code is also adjusted to make sure it applies to the expected types, now that fp16 is a more common thing. Differential Revision: https://reviews.llvm.org/D82458	2020-07-06 14:23:17 +01:00
David Green	60b8b2beea	[ARM] Add extra extend and trunc costs for cast instructions This expands the existing extend costs with a few extras for larger types than legal, which will usually be split under MVE. It also adds trunk support for the same thing. These should not have a large effect on many things, but makes the costs explicit and keeps a certain balance between the trunks and extends. Differential Revision: https://reviews.llvm.org/D82457	2020-07-06 11:33:05 +01:00
David Green	55227f85d0	[ARM] Use BaseT::getMemoryOpCost for getMemoryOpCost This alters getMemoryOpCost to use the Base TargetTransformInfo version that includes some additional checks for whether extending loads are legal. This will generally have the effect of making <2 x ..> and some <4 x ..> loads/stores more expensive, which in turn should help favour larger vector factors. Notably it alters the cost of a <4 x half>, which with the current codegen will be expensive if it is not extended. Differential Revision: https://reviews.llvm.org/D82456	2020-07-06 10:58:40 +01:00
David Green	74ca67c109	[ARM] Remove hasSideEffects from FP converts Whether an instruction is deemed to have side effects in determined by whether it has a tblgen pattern that emits a single instruction. Because of the way a lot of the the vcvt instructions are specified either in dagtodag code or with patterns that emit multiple instructions, they don't get marked as not having side effects. This just marks them as not having side effects manually. It can help especially with instruction scheduling, to not create artificial barriers, but one of these tests also managed to produce fewer instructions. Differential Revision: https://reviews.llvm.org/D81639	2020-07-05 16:23:24 +01:00
Luke Geeson	8bf99f1e6f	[ARM] Add Cortex-A77 Support for Clang and LLVM This patch upstreams support for the Arm-v8 Cortex-A77 processor for AArch64 and ARM. In detail: - Adding cortex-a77 as a cpu option for aarch64 and arm targets in clang - Cortex-A77 CPU name and ProcessorModel in llvm details of the CPU can be found here: https://www.arm.com/products/silicon-ip-cpu/cortex-a/cortex-a77 and a similar submission to GCC can be found here: `e0664b7a63` The following people contributed to this patch: - Luke Geeson - Mikhail Maltsev Reviewers: t.p.northover, dmgreen, ostannard, SjoerdMeijer Reviewed By: dmgreen Subscribers: dmgreen, kristof.beyls, hiraditya, danielkiss, cfe-commits, llvm-commits, miyuki Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D82887	2020-07-03 13:00:54 +01:00
Guillaume Chatelet	87e2751cf0	[Alignment][NFC] Use proper getter to retrieve alignment from ConstantInt and ConstantSDNode This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Differential Revision: https://reviews.llvm.org/D83082	2020-07-03 08:06:43 +00:00
Nicholas Guy	dc8e4d8566	[ARM] Rearrange SizeReduction when using -Oz Move the Thumb2SizeReduce pass to before IfConversion when optimising for minimal code size. Running the Thumb2SizeReduction pass before IfConversionallows T1 instructions to propagate to the final output, rather than the ifConverter modifying T2 instructions and preventing them from being reduced later. This change does introduce a regression regarding execution time, so it's only applied when optimising for size. Running the LLVM Test Suite with this change produces a geomean difference of -0.1% for the size..text metric. Differential Revision: https://reviews.llvm.org/D82439	2020-07-02 09:19:38 +01:00
James Y Knight	4b0aa5724f	Change the INLINEASM_BR MachineInstr to be a non-terminating instruction. Before this instruction supported output values, it fit fairly naturally as a terminator. However, being a terminator while also supporting outputs causes some trouble, as the physreg->vreg COPY operations cannot be in the same block. Modeling it as a non-terminator allows it to be handled the same way as invoke is handled already. Most of the changes here were created by auditing all the existing users of MachineBasicBlock::isEHPad() and MachineBasicBlock::hasEHPadSuccessor(), and adding calls to isInlineAsmBrIndirectTarget or mayHaveInlineAsmBr, as appropriate. Reviewed By: nickdesaulniers, void Differential Revision: https://reviews.llvm.org/D79794	2020-07-01 12:51:50 -04:00
Guillaume Chatelet	d3085c2501	[Alignment][NFC] Transition and simplify calls to DL::getABITypeAlignment This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Differential Revision: https://reviews.llvm.org/D82956	2020-07-01 14:31:56 +00:00
Sam Parker	3ee580d017	[ARM][LowOverheadLoops] Handle reductions While validating live-out values, record instructions that look like a reduction. This will comprise of a vector op (for now only vadd), a vorr (vmov) which store the previous value of vadd and then a vpsel in the exit block which is predicated upon a vctp. This vctp will combine the last two iterations using the vmov and vadd into a vector which can then be consumed by a vaddv. Once we have determined that it's safe to perform tail-predication, we need to change this sequence of instructions so that the predication doesn't produce incorrect code. This involves changing the register allocation of the vadd so it updates itself and the predication on the final iteration will not update the falsely predicated lanes. This mimics what the vmov, vctp and vpsel do and so we then don't need any of those instructions. Differential Revision: https://reviews.llvm.org/D75533	2020-07-01 08:31:49 +01:00
Guillaume Chatelet	28de229bc6	[Alignment][NFC] Migrate MachineFrameInfo::CreateStackObject to Align This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Differential Revision: https://reviews.llvm.org/D82894	2020-07-01 07:28:11 +00:00
Samuel Tebbs	3324e3a6ee	[ARM] Allow the fabs intrinsic to be tail predicated This patch stops the fabs intrinsic from blocking tail predication. Differential Revision: https://reviews.llvm.org/D82570	2020-06-30 17:27:28 +01:00
Samuel Tebbs	66fa313999	[ARM] Allow the usub_sat and ssub_sat intrinsics to be tail predicated This patch stops the usub_sat and ssub_sat intrinsics from blocking tail predication. Differential Revision: https://reviews.llvm.org/D82571	2020-06-30 17:16:58 +01:00
Sjoerd Meijer	af45907653	[ARM][MVE] Tail-predication: clean-up of unused code After the rewrite of this pass (D79175) I missed one thing: the inserted VCTP intrinsic can be cloned to exit blocks if there are instructions present in it that perform the same operation, but this wasn't triggering anymore. However, it turns out that for handling reductions, see D75533, it's actually easier not not to have the VCTP in exit blocks, so this removes that code. This was possible because it turned out that some other code that depended on this, rematerialization of the trip count enabling more dead code removal later, wasn't doing much anymore due to more aggressive dead code removal that was added to the low-overhead loops pass. Differential Revision: https://reviews.llvm.org/D82773	2020-06-30 17:09:36 +01:00
Samuel Tebbs	d9cb811cbf	[ARM] Allow rounding intrinsics to be tail predicated This patch stops the trunc, rint, round, floor and ceil intrinsics from blocking tail predication. Differential Revision: https://reviews.llvm.org/D82553	2020-06-30 16:52:25 +01:00
Guillaume Chatelet	c1cd61e02a	[Alignment][NFC] Migrate SelectionDAGTargetInfo::EmitTargetCodeForMemcpy to Align This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Differential Revision: https://reviews.llvm.org/D82849	2020-06-30 13:12:31 +00:00
Guillaume Chatelet	306d7c6929	[Alignment][NFC] Migrate SelectionDAGTargetInfo::EmitTargetCodeForMemmove to Align This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Differential Revision: https://reviews.llvm.org/D82850	2020-06-30 12:46:59 +00:00
Guillaume Chatelet	6a6af30d43	[Alignment][NFC] Migrate SelectionDAGTargetInfo::EmitTargetCodeForMemset to Align This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Differential Revision: https://reviews.llvm.org/D82851	2020-06-30 12:46:26 +00:00
Guillaume Chatelet	4f5133a4dc	[Alignment][NFC] Migrate AArch64, ARM, Hexagon, MSP and NVPTX backends to Align This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Differential Revision: https://reviews.llvm.org/D82749	2020-06-30 07:56:17 +00:00
David Green	deb72ce298	[ARM] Better reductions MVE has native reductions for integer add and min/max. The others need to be expanded to a series of extract's and scalar operators to reduce the vector into a single scalar. The default codegen for that expands the reduction into a series of in-order operations. This modifies that to something more suitable for MVE. The basic idea is to use vector operations until there are 4 remaining items then switch to pairwise operations. For example a v8f16 fadd reduction would become: Y = VREV X Z = ADD(X, Y) z0 = Z[0] + Z[1] z1 = Z[2] + Z[3] return z0 + z1 The awkwardness (there is always some) comes in from something like a v4f16, which is first legalized by adding identity values to the extra lanes of the reduction, and which can then not be optimized away through the vrev; fadd combo, the inserts remain. I've made sure they custom lower so that we can produce the pairwise additions before the extra values are added. Differential Revision: https://reviews.llvm.org/D81397	2020-06-29 16:04:13 +01:00
Guillaume Chatelet	368a5e3a66	[Alignment][NFC] migrate DataLayout::getPreferredAlignment This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Differential Revision: https://reviews.llvm.org/D82752	2020-06-29 11:24:36 +00:00
Simon Pilgrim	973685fc78	[TargetLowering] Add DemandedElts arg to ShrinkDemandedConstant Pre-commit for D82257, this adds a DemandedElts arg to ShrinkDemandedConstant/targetShrinkDemandedConstant which will allow future patches to (optionally) add vector support.	2020-06-29 11:46:58 +01:00
Guillaume Chatelet	b66e33a689	[Alignment][NFC] Migrate TTI::getGatherScatterOpCost to Align This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Differential Revision: https://reviews.llvm.org/D82577	2020-06-26 11:08:27 +00:00
Guillaume Chatelet	fdc7c7fb87	[Alignment][NFC] Migrate TTI::getInterleavedMemoryOpCost to Align This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Differential Revision: https://reviews.llvm.org/D82573	2020-06-26 11:00:53 +00:00
David Green	d428f88152	[ARM] VCVTT fpround instruction selection Similar to the recent patch for fpext, this adds vcvtb and vcvtt with insert into vector instruction selection patterns for fptruncs. This helps clear up a lot of register shuffling that we would otherwise do. Differential Revision: https://reviews.llvm.org/D81637	2020-06-26 10:24:06 +01:00
David Green	76e0e1a55d	[ARM] VCVTT instruction selection We current extract and convert from a top lane of a f16 vector using a VMOVX;VCVTB pair. We can simplify that to use a single VCVTT. The pattern is mostly copied from a vector extract pattern, but produces a VCVTTHS f32 directly. This had to move some code around so that ARMInstrVFP had access to the required pattern frags that were previously part of ARMInstrNEON. Differential Revision: https://reviews.llvm.org/D81556	2020-06-26 08:58:55 +01:00
Sjoerd Meijer	1319d9bb84	[ARM] Don't revert get.active.lane.mask in ARM Tail-Predication pass Don't revert intrinsic get.active.lane.mask here, this is moved to isel legalization in D82292. Differential Revision: https://reviews.llvm.org/D82105	2020-06-26 07:42:39 +01:00
David Green	d79b57b8bb	[ARM] Split FPExt loads This extends PerformSplittingToWideningLoad to also handle FP_Ext, as well as sign and zero extends. It uses an integer extending load followed by a VCVTL on the bottom lanes to efficiently perform an fpext on a smaller than legal type. The existing code had to be rewritten a little to not just split the node in two and let legalization handle it from there, but to actually split into legal chunks. Differential Revision: https://reviews.llvm.org/D81340	2020-06-25 21:55:13 +01:00
David Green	8532b2ee89	[ARM] MVE VCVT lowering for f16->f32 extends This adds code to lower f16 to f32 fp_exts's using an MVE VCVT instructions, similar to a recent similar patch for fp_trunc. Again it goes through the lowering of a BUILD_VECTOR, but is slightly simpler only having to deal with interleaved indices. It adds a VCVTL node to lower to, similar to VCVTN. Differential Revision: https://reviews.llvm.org/D81339	2020-06-25 20:54:26 +01:00
David Green	0bfb4c2506	[ARM] Add FP_ROUND handling to splitting MVE stores This splits MVE vector stores of a fp_trunc in the same way that we do for standard trunc's. It extends PerformSplittingToNarrowingStores to handle fp_round, splitting the store into pieces and adding a VCVTNb to perform the actual fp_round. The actual store is then converted to an integer store so that it can truncate bottom lanes of the result. Differential Revision: https://reviews.llvm.org/D81141	2020-06-25 19:37:15 +01:00
David Green	b044a82270	[ARM] Fixup for signed comparison warning. NFC	2020-06-25 16:29:44 +01:00
David Green	3cb2190b0b	[ARM] MVE VCVT lowering for f32->f16 truncs This adds code to lower f32 to f16 fp_trunc's using a pair of MVE VCVT instructions. Due to v4f16 not being legal, fp_round are often split up fairly early. So this reconstructs the vcvt's from a buildvector of fp_rounds from two vector inputs. Something like: BUILDVECTOR(FP_ROUND(EXTRACT_ELT(X, 0), FP_ROUND(EXTRACT_ELT(Y, 0), FP_ROUND(EXTRACT_ELT(X, 1), FP_ROUND(EXTRACT_ELT(Y, 1), ...) It adds a VCVTN node to handle this, which like VMOVN or VQMOVN lowers into the top/bottom lanes of an MVE instruction. Differential Revision: https://reviews.llvm.org/D81139	2020-06-25 15:59:36 +01:00
Guillaume Chatelet	324cda2073	[Alignment][NFC] Conform X86, ARM and AArch64 TargetTransformInfo backends to the public API The main interface has been migrated to Align already but a few backends where broadening the type from Align to MaybeAlign. This patch makes sure all implementations conform to the public API. Differential Revision: https://reviews.llvm.org/D82465	2020-06-25 13:23:13 +00:00
Guillaume Chatelet	2e7bba693e	[Alignment][NFC] Use Align for TargetCallingConv::OrigAlign This patch replaces D69249. This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Differential Revision: https://reviews.llvm.org/D82307	2020-06-25 13:21:22 +00:00
Sam Tebbs	187f627a50	[ARM] Allow tail predication on sadd_sat and uadd_sat intrinsics This patch stops the sadd_sat and uadd_sat intrinsics from blocking tail predication. Differential revision: https://reviews.llvm.org/D82377	2020-06-25 11:54:29 +01:00
Stefan Agner	b7d41a11cd	[ARM] Make cp10 and cp11 usage a warning The ARM ARM considers p10/p11 valid arguments for MCR/MRC instructions. MRC instructions with p10 arguments are also used in kernel code which is shared for different architectures. Turn usage of p10/p11 to warnings for ARMv7/ARMv8-M. Reviewers: rengolin, olista01, t.p.northover, efriedma, psmith, simon_tatham Reviewed By: simon_tatham Subscribers: hiraditya, danielkiss, jcai19, tpimh, nickdesaulniers, peter.smith, javed.absar, kristof.beyls, jdoerfert, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59733	2020-06-24 23:37:54 +02:00
dfukalov	7ddee0922f	[NFCI][CostModel] Add const to Value*. Summary: Get back `const` partially lost in one of recent changes. Additionally specify explicit qualifiers in few places. Reviewers: samparker Reviewed By: samparker Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D82383	2020-06-24 23:16:08 +03:00
Simon Pilgrim	c18b753686	LoopUtils.h - reduce AliasAnalysis.h include to forward declarations. NFC. Fix implicit include dependencies in source files and replace legacy AliasAnalysis typedef with AAResults where necessary.	2020-06-24 17:58:38 +01:00
Simon Tatham	b769eb02b5	[ARM][BFloat] Legalize bf16 type even without fullfp16. Summary: This change permits scalar bfloats to be loaded, stored, moved and used as function call arguments and return values, whenever the bf16 feature is supported by the subtarget. Previously that was only supported in the presence of the fullfp16 feature, because the code generation strategy depended on instructions from that extension. This change adds alternative code generation strategies so that those operations can be done even without fullfp16. The strategy for loads and stores is to replace VLDRH/VSTRH with integer LDRH/STRH plus a move between register classes. I've written isel patterns for those, conditional on //not// having the fullfp16 feature (so that in the fullfp16 case, the existing patterns will still be used). For function arguments and returns, instead of writing isel patterns to match `VMOVhr` and `VMOVrh`, I've avoided generating those SDNodes in the first place, by factoring out the code that constructs them into helper functions `MoveToHPR` and `MoveFromHPR` which have a fallback for non-fullfp16 subtargets. The current output code is not especially pretty: in the new test file you can see unnecessary store/load pairs implementing no-op bitcasts, and lots of pointless moves back and forth between FP registers and GPRs. But it at least works, which is an improvement on the previous situation. Reviewers: dmgreen, SjoerdMeijer, stuij, chill, miyuki, labrinea Reviewed By: dmgreen, labrinea Subscribers: labrinea, kristof.beyls, hiraditya, danielkiss, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D82372	2020-06-24 09:36:26 +01:00
Eli Friedman	e9d4e34ab8	[AArch64][SVE] Add legalization support for i32/i64 vector srem/urem Implement them on top of sdiv/udiv, similar to what we do for integer types. Potential future work: implementing i8/i16 srem/urem, optimizations for constant divisors, optimizing the mul+sub to mls. Differential Revision: https://reviews.llvm.org/D81511	2020-06-23 16:27:52 -07:00
David Green	d604cc6e9a	[ARM] Mark more integer instructions as not having side effects. LDRD and STRD along with UBFX and SBFX are selected from DAGToDAG transforms, so do not have tblgen patterns. They don't get marked as having side effects so cannot be scheduled as efficiently as you would like. This specifically marks then as not having side effects. Differential Revision: https://reviews.llvm.org/D82358	2020-06-23 22:45:51 +01:00
Momchil Velikov	adf7973fd3	[ARM] Describe defs/uses of VLLDM and VLSTM The VLLDM and VLSTM instructions are incompletely specified. They (potentially) write (or read, respectively) registers Q0-Q7, VPR, and FPSCR, but the compiler is unaware of it. In the new test case `cmse-vlldm-no-reorder.ll` case the compiler missed an anti-dependency and reordered a `VLLDM` ahead of the instruction, which stashed the return value from the non-secure call, effectively clobbering said value. This test case does not fail with upstream LLVM, because of scheduling differences and I couldn't find a test case for the VLSTM either. Differential Revision: https://reviews.llvm.org/D81586	2020-06-23 16:04:23 +01:00
Mikhail Maltsev	3f353a2e5a	[BFloat] Add convert/copy instrinsic support This patch is part of a series implementing the Bfloat16 extension of the Armv8.6-a architecture, as detailed here: https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/arm-architecture-developments-armv8-6-a Specifically it adds intrinsic support in clang and llvm for Arm and AArch64. The bfloat type, and its properties are specified in the Arm Architecture Reference Manual: https://developer.arm.com/docs/ddi0487/latest/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile The following people contributed to this patch: - Alexandros Lamprineas - Luke Cheeseman - Mikhail Maltsev - Momchil Velikov - Luke Geeson Differential Revision: https://reviews.llvm.org/D80928	2020-06-23 14:27:05 +00:00
Mikhail Maltsev	9c579540ff	[ARM] BFloat MatMul Intrinsics&CodeGen Summary: This patch adds support for BFloat Matrix Multiplication Intrinsics and Code Generation from __bf16 to AArch32. This includes IR intrinsics. Tests are provided as needed. This patch is part of a series implementing the Bfloat16 extension of the Armv8.6-a architecture, as detailed here: https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/arm-architecture-developments-armv8-6-a The bfloat type and its properties are specified in the Arm Architecture Reference Manual: https://developer.arm.com/docs/ddi0487/latest/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile The following people contributed to this patch: - Luke Geeson - Momchil Velikov - Mikhail Maltsev - Luke Cheeseman - Simon Tatham Reviewers: stuij, t.p.northover, SjoerdMeijer, sdesmalen, fpetrogalli, LukeGeeson, simon_tatham, dmgreen, MarkMurrayARM Reviewed By: MarkMurrayARM Subscribers: MarkMurrayARM, danielkiss, kristof.beyls, hiraditya, cfe-commits, llvm-commits, chill, miyuki Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D81740	2020-06-23 12:06:37 +00:00
Christopher Tetreault	cd6848f6e1	[SVE] Remove calls to VectorType::getNumElements from ARM Reviewers: efriedma, greened, c-rhodes, david-arm, dmgreen Reviewed By: dmgreen Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, dmgreen, danielkiss, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D82216	2020-06-22 15:18:58 -07:00
Eric Christopher	8116d01905	Typos around a -> an.	2020-06-20 14:04:48 -07:00
Eric Christopher	cf23852587	[Target] As part of using inclusive language within the llvm project, migrate away from the use of blacklist and whitelist. This change affects an internal llvm command line option.	2020-06-20 00:06:39 -07:00
Sjoerd Meijer	4aa893b8f2	[ARM][MVE] tail-predication: renamed internal option. Renamed -force-tail-predication to -force-mve-tail-predication because that's more descriptive and consistent.	2020-06-19 15:07:06 +01:00
Mikhail Maltsev	490f78c038	[ARM][BFloat] Implement lowering of bf16 load/store intrinsics Reviewers: labrinea, dmgreen, pratlucas, LukeGeeson Reviewed By: dmgreen Subscribers: kristof.beyls, hiraditya, danielkiss, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D81486	2020-06-19 14:02:35 +00:00
Mikhail Maltsev	7526881246	[ARM][BFloat] Lowering of create/get/set/dup intrinsics This patch adds codegen for the following BFloat operations to the ARM backend: * concatenation of bf16 vectors * bf16 vector element extraction * bf16 vector element insertion * duplication of a bf16 value into each lane of a vector * duplication of a bf16 vector lane into each lane Differential Revision: https://reviews.llvm.org/D81411	2020-06-19 12:52:40 +00:00
Matt Arsenault	7f8b2e1b91	GlobalISel: Pass LegalizerHelper to custom legalize callbacks This was passing in all the parameters needed to construct a LegalizerHelper in the custom legalization, when it's simpler to just pass in the existing helper. This is slightly more annoying to use in the common case where you don't need the legalizer helper, but we could add back the common parameters back in addition to the helper. I didn't propagate this to all the internal target changes that this logically implies, but did update a sample one for legalizeMinNumMaxNum. This is in preparation for moving AMDGPU load/store legalization entirely into custom lowering. The current set of legalization actions is really constraining and not really capable of expressing all the actions needed to legalize loads/stores. In particular there's no way to express when the memory access itself needs to change size vs. the result type. There's also a lot of redundancy since the same split/widen actions need to be applied in both vector and scalar cases. All of the sub-cases logically belong as steps in the legalizer helper, but it will be easier to consider everything at once in custom lowering.	2020-06-18 17:17:38 -04:00
Alexandros Lamprineas	ecdf48f15b	[ARM] Basic bfloat support This patch adds basic support for BFloat in the Arm backend. For now the code generation relies on fullfp16 being present. Briefly: * adds the bfloat scalar and vector types in the necessary register classes, * adjusts the calling convention to cope with bfloat argument passing and return, * adds codegen patterns for moves, loads and stores. It's tested mostly by the intrinsic patches that depend on it (load/store, convert/copy). The following people contributed to this patch: * Alexandros Lamprineas * Ties Stuij Differential Revision: https://reviews.llvm.org/D81373	2020-06-18 17:26:24 +01:00
Lucas Prates	92ad6d57c2	[ARM] Moving CMSE handling of half arguments and return to the backend Summary: As half-precision floating point arguments and returns were previously coerced to either float or int32 by clang's codegen, the CMSE handling of those was also performed in clang's side by zeroing the unused MSBs of the coercer values. This patch moves this handling to the backend's calling convention lowering, making sure the high bits of the registers used by half-precision arguments and returns are zeroed. Reviewers: chill, rjmccall, ostannard Reviewed By: ostannard Subscribers: kristof.beyls, hiraditya, danielkiss, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D81428	2020-06-18 13:16:29 +01:00
Lucas Prates	a255931c40	[ARM] Supporting lowering of half-precision FP arguments and returns in AArch32's backend Summary: Half-precision floating point arguments and returns are currently promoted to either float or int32 in clang's CodeGen and there's no existing support for the lowering of `half` arguments and returns from IR in AArch32's backend. Such frontend coercions, implemented as coercion through memory in clang, can cause a series of issues in argument lowering, as causing arguments to be stored on the wrong bits on big-endian architectures and incurring in missing overflow detections in the return of certain functions. This patch introduces the handling of half-precision arguments and returns in the backend using the actual "half" type on the IR. Using the "half" type the backend is able to properly enforce the AAPCS' directions for those arguments, making sure they are stored on the proper bits of the registers and performing the necessary floating point convertions. Reviewers: rjmccall, olista01, asl, efriedma, ostannard, SjoerdMeijer Reviewed By: ostannard Subscribers: stuij, hiraditya, dmgreen, llvm-commits, chill, dnsampaio, danielkiss, kristof.beyls, cfe-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D75169	2020-06-18 13:15:13 +01:00
David Green	158e734af1	[ARM] Adjust AND/OR combines to not call isConstantSplat on i1 vectors. NFC. The rearranges PerformANDCombine and PerformORCombine to try and make sure we don't call isConstantSplat on any i1 vectors. As pointed out in D81860 it may not be very well defined in those cases.	2020-06-18 08:25:44 +01:00
Sjoerd Meijer	d1522513d4	[ARM] Reimplement MVE Tail-Predication pass using @llvm.get.active.lane.mask To set up a tail-predicated loop, we need to to calculate the number of elements processed by the loop. We can now use intrinsic @llvm.get.active.lane.mask() to do this, which is emitted by the vectoriser in D79100. This intrinsic generates a predicate for the masked loads/stores, and consumes the Backedge Taken Count (BTC) as its second argument. We can now use that to reconstruct the loop tripcount, instead of the IR pattern match approach we were using before. Many thanks to Eli Friedman and Sam Parker for all their help with this work. This also adds overflow checks for the different, new expressions that we create: the loop tripcount, and the sub expression that calculates the remaining elements to be processed. For the latter, SCEV is not able to calculate precise enough bounds, so we work around that at the moment, but is not entirely correct yet, it's conservative. The overflow checks can be overruled with a force flag, which is thus potentially unsafe (but not really because the vectoriser is the only place where this intrinsic is emitted at the moment). It's also good to mention that the tail-predication pass is not yet enabled by default. We will follow up to see if we can implement these overflow checks better, either by a change in SCEV or we may want revise the definition of llvm.get.active.lane.mask. Differential Revision: https://reviews.llvm.org/D79175	2020-06-17 15:17:42 +01:00
Sjoerd Meijer	e345d547a0	Recommit "[LV] Emit @llvm.get.active.lane.mask for tail-folded loops" Fixed ARM regression test. Please see the original commit message rG47650451738c for details.	2020-06-17 13:12:15 +01:00
Sjoerd Meijer	d4e183f686	Revert "[LV] Emit @llvm.get.active.mask for tail-folded loops" This reverts commit `4765045173` while I investigate the build bot failures.	2020-06-17 10:09:54 +01:00
Sjoerd Meijer	4765045173	[LV] Emit @llvm.get.active.mask for tail-folded loops This emits new IR intrinsic @llvm.get.active.mask for tail-folded vectorised loops if the intrinsic is supported by the backend, which is checked by querying TargetTransform hook emitGetActiveLaneMask. This intrinsic creates a mask representing active and inactive vector lanes, which is used by the masked load/store instructions that are created for tail-folded loops. The semantics of @llvm.get.active.mask are described here in LangRef: https://llvm.org/docs/LangRef.html#llvm-get-active-lane-mask-intrinsics This intrinsic is also used to provide a hint to the backend. That is, the second argument of the intrinsic represents the back-edge taken count of the loop. For MVE, for example, we use that to set up tail-predication, which is a new form of predication in MVE for vector loops that implicitely predicates the last vector loop iteration by implicitely setting active/inactive lanes, i.e. the tail loop is predicated. In order to set up a tail-predicated vector loop, we need to know the number of data elements processed by the vector loop, which corresponds the the tripcount of the scalar loop, which we can now reconstruct using @llvm.get.active.mask. Differential Revision: https://reviews.llvm.org/D79100	2020-06-17 09:53:58 +01:00
Sjoerd Meijer	20835cff27	[TTI] Refactor emitGetActiveLaneMask Refactor TTI hook emitGetActiveLaneMask and remove the unused arguments as suggested in D79100.	2020-06-17 09:53:58 +01:00

... 5 6 7 8 9 ...

11532 Commits