llvm-project

Commit Graph

Author	SHA1	Message	Date
Craig Topper	fdfdee98ac	[DAGCombiner] Teach SimplifySetCC SETUGE X, SINTMIN -> SETLT X, 0 and SETULE X, SINTMAX -> SETGT X, -1. These aren't the canonical forms we'd get from InstCombine, but we do have X86 tests for them. Recognizing them is pretty cheap. While there, make use of APInt::isSignedMinValue/isSignedMaxValue instead of creating a new APInt to compare with. Also use SelectionDAG::getAllOnesConstant helper to hide the all ones APInt creation.	2020-08-08 22:27:16 -07:00
Sanjay Patel	f22ac1d15b	[DAGCombiner] reassociate reciprocal sqrt expression to eliminate FP division, part 2 Follow-up to D82716 / rGea71ba11ab11 We do not have the fabs removal fold in IR yet for the case where the sqrt operand is repeated, so that's another potential improvement.	2020-08-08 10:38:06 -04:00
Bevin Hansson	5de6c56f7e	[Intrinsic] Add sshl.sat/ushl.sat, saturated shift intrinsics. Summary: This patch adds two intrinsics, llvm.sshl.sat and llvm.ushl.sat, which perform signed and unsigned saturating left shift, respectively. These are useful for implementing the Embedded-C fixed point support in Clang, originally discussed in http://lists.llvm.org/pipermail/llvm-dev/2018-August/125433.html and http://lists.llvm.org/pipermail/cfe-dev/2018-May/058019.html Reviewers: leonardchan, craig.topper, bjope, jdoerfert Subscribers: hiraditya, jdoerfert, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D83216	2020-08-07 15:09:24 +02:00
Simon Pilgrim	66a163f328	[DAG] GetDemandedBits - remove custom AND handling. As mentioned on D85463, we should be using SimplifyMultipleUseDemandedBits (which is the default fallback). The minor regression in illegal-bitfield-loadstore.ll will be addressed properly by D77804.	2020-08-07 12:55:47 +01:00
Simon Pilgrim	fcefb53222	Remove unreachable break. NFC	2020-08-07 12:37:49 +01:00
Craig Topper	ffc248f3b8	[LegalTypes] Move VSELECT node creation out of WidenVSELECTAndMask and push to 2 of the 3 callers. One of the callers only wants the condition, but the vselect can be simplified by getNode making it hard or impossible to retrieve the condition. Instead, return the condition and make the other 2 callers responsible for creating the vselect node using the condition. Rename the function to WidenVSELECTMask accordingly. Differential Revision: https://reviews.llvm.org/D85468	2020-08-06 13:18:16 -07:00
Paul Walker	0d33a8ef5b	[SVE] Lower scalable vector mul operations. This allows us to remove extra patterns from AArch64SVEInstrInfo.td because we can reuse those required for fixed length vectors. Differential Revision: https://reviews.llvm.org/D85328	2020-08-06 11:15:35 +01:00
Simon Pilgrim	4aaf301fb8	[DAG] Fold vector (aext (load x)) -> (zext (truncate (zextload x))) We currently don't do anything to fold any_extend vector loads as no target has such an instruction. Instead I've added support for folding to a zextload, SimplifyDemandedBits does a good job of adjusting the zext(truncate(()) stages as required later on. We still need the custom scalar extload handling instead of using the tryToFoldExtOfLoad helper as it has different legality tests - we can probably tweak that to reduce most of the code duplication. Fixes the regression I mentioned in rG99a971cadff7 Differential Revision: https://reviews.llvm.org/D85129	2020-08-05 11:22:23 +01:00
Eli Friedman	4a47f1c4ce	[SelectionDAG][SVE] Support scalable vectors in getConstantFP() Differential Revision: https://reviews.llvm.org/D85249	2020-08-04 15:32:43 -07:00
Cameron McInally	0f2b47b6da	[FastISel] Don't transform FSUB(-0, X) -> FNEG(X) in FastISel This corresponds with the SelectionDAGISel change in D84056. Also, rename some poorly named tests in CodeGen/X86/fast-isel-fneg.ll with NFC. Differential Revision: https://reviews.llvm.org/D85149	2020-08-04 14:42:53 -05:00
Jay Foad	28e322ea93	[PowerPC] Custom lowering for funnel shifts The custom lowering saves an instruction over the generic expansion, by taking advantage of the fact that PowerPC shift instructions are well defined in the shift-by-bitwidth case. Differential Revision: https://reviews.llvm.org/D83948	2020-08-04 16:30:49 +01:00
Cameron McInally	31c7a2fd5c	[FPEnv] Don't transform FSUB(-0,X)->FNEG(X) in SelectionDAGBuilder. This patch stops unconditionally transforming FSUB(-0,X) into an FNEG(X) while building the DAG. There is also one small change to handle the new FSUB(-0,X) similarly to FNEG(X) in the AMDGPU backend. Differential Revision: https://reviews.llvm.org/D84056	2020-08-03 10:22:25 -05:00
Simon Pilgrim	b8ffbf0e02	[DAG] TargetLowering::expandMUL_LOHI - pass SDLoc as const& Try to be more consistent with the SDLoc param in the TargetLowering methods. This also exposes an issue where we were passing a SDNode as a SDLoc, relying on the implicit SDLoc(SDNode) constructor.	2020-08-02 15:31:36 +01:00
Simon Pilgrim	d14a22da5e	[DAG] TargetLowering::LowerAsmOutputForConstraint - pass SDLoc as const& Try to be more consistent with the SDLoc param in the TargetLowering methods.	2020-08-02 15:12:02 +01:00
Matt Arsenault	57bd64ff84	Support addrspacecast initializers with isNoopAddrSpaceCast Moves isNoopAddrSpaceCast to the TargetMachine. It logically belongs with the DataLayout.	2020-07-31 10:42:43 -04:00
Vitaly Buka	b0eb40ca39	[NFC] Remove unused GetUnderlyingObject parameter Depends on D84617. Differential Revision: https://reviews.llvm.org/D84621	2020-07-31 02:10:03 -07:00
Vitaly Buka	89051ebace	[NFC] GetUnderlyingObject -> getUnderlyingObject I am going to touch them in the next patch anyway	2020-07-30 21:08:24 -07:00
Eli Friedman	7e88efa7c5	[LegalizeTypes][SVE] Support widen/split legalization for SPLAT_VECTOR Just the obvious implementation that rewrites the result type. Also fix warning from EXTRACT_SUBVECTOR legalization that triggers on the test. Differential Revision: https://reviews.llvm.org/D84706	2020-07-30 16:17:45 -07:00
Jon Roelofs	afae6d97fa	[SelectionDAG] Fix lowering of vector geps This fixes an assertion failure that was being triggered in SelectionDAG::getZeroExtendInReg(), where it was trying to extend the <2xi32> to i64 (which should have been <2xi64>). Fixes: rdar://66016901 Differential Revision: https://reviews.llvm.org/D84884	2020-07-30 14:56:53 -06:00
Sam Tebbs	276ed5f7e4	[DAGCombiner] Fold sext_inreg of a masked load into a sign extended masked load This patch adds a DAG combine fold for a sext(masked_load) into a sign extended masked load. Differential Revision: https://reviews.llvm.org/D84332	2020-07-30 10:34:02 +01:00
Philip Reames	755f91f12c	[Statepoint] Enable cross block relocates w/vreg lowering This change is mechanical, it just removes the restriction and updates tests. The key building blocks were submitted in `31342eb` and `8fe2abc`. Note that this (and preceding changes) entirely subsumes D83965. I did include a couple of its tests. From the codegen changes, an interesting observation: this doesn't actually reduce spilling, it just lets the register allocator do its job. That results in a slightly different overall result which has both pros and cons over the eager spill lowering. (i.e. We'll have some perf tuning to do once this is stable.)	2020-07-29 13:32:51 -07:00
Philip Reames	8fe2abc190	[Statepoint] Consolidate relocation type tracking [NFC] Change the way we track how a particular pointer was relocated at a statepoint in selection dag. Previously, we used an optional<location> for the spill lowering, and a block local Register for the newly introduced vreg lowering. Combine all three lowerings (norelocate, spill, and vreg) into a single helper class, and keep a single copy of the information. This is submitted separately as it really does make the code more readable on its own, but the indirect motivation is to move vreg tracking from StatepointLowering to FunctionLoweringInfo. This is the last piece needed to support cross block relocations with vregs; that will follow in a separate (non-NFC) patch.	2020-07-29 11:45:31 -07:00
Simon Pilgrim	fdc902774e	[DAG][AMDGPU][X86] Add SimplifyMultipleUseDemandedBits handling for SIGN/ZERO_EXTEND + SIGN/ZERO_EXTEND_VECTOR_INREG Peek through multiple use ops like we already do for ANY_EXTEND/ANY_EXTEND_VECTOR_INREG Differential Revision: https://reviews.llvm.org/D84863	2020-07-29 18:10:59 +01:00
Philip Reames	31342eb63e	[Statepoint] When using the tied def lowering, unconditionally use vregs [almost NFC] This builds on `3da1a96` on the path towards supporting invokes and cross block relocations. The actual change attempts to be NFC, but does fail in one corner-case explained below. The change itself is fairly mechanical. Rather than remember SDValues - which are inherently block local - immediately produce a virtual register copy and remember that. Once this lands, we'll update the FunctionLoweringInfo::StatepointSpillMap map to allow register based lowerings, delete VirtRegs from StatepointLowering, and drop the restriction against cross block relocations. I deliberately separate the semantic part into its own change for ease of understanding and fault isolation. The corner-case which isn't quite NFC is that the old implementation implicitly CSEd gc.relocates of the same SDValue regardless of type. The new implementation still only relocates once, but it produces distinct vregs for the bitcast and its source, whereas SelectionDAG's generic CSE was able to remove the bitcast in the old implementation. Note that the final assembly doesn't change (at least in the test), as our MI level optimizations catch the duplication. I assert that this is an uninteresting corner-case. It's functionally correct, and if we find a case where this influences performance, we should really be canonicalizing types to i8* at the IR level. Differential Revision: https://reviews.llvm.org/D84692	2020-07-29 09:23:52 -07:00
David Sherwood	2078771759	[SVE][CodeGen] Add simple integer add tests for SVE tuple types I have added tests to: CodeGen/AArch64/sve-intrinsics-int-arith.ll for doing simple integer add operations on tuple types. Since these tests introduced new warnings due to incorrect use of getVectorNumElements() I have also fixed up these warnings in the same patch. These fixes are: 1. In narrowExtractedVectorBinOp I have changed the code to bail out early for scalable vector types, since we've not yet hit a case that proves the optimisations are profitable for scalable vectors. 2. In DAGTypeLegalizer::WidenVecRes_CONCAT_VECTORS I have replaced calls to getVectorNumElements with getVectorMinNumElements in cases that work with scalable vectors. For the other cases I have added asserts that the vector is not scalable because we should not be using shuffle vectors and build vectors in such cases. Differential revision: https://reviews.llvm.org/D84016	2020-07-29 13:32:10 +01:00
David Sherwood	5d84eafc6b	[CodeGen] Remove calls to getVectorNumElements in DAGTypeLegalizer::SplitVecOp_EXTRACT_SUBVECTOR In DAGTypeLegalizer::SplitVecOp_EXTRACT_SUBVECTOR I have replaced calls to getVectorNumElements with getVectorMinNumElements, since this code path works for both fixed and scalable vector types. For scalable vectors the index will be multiplied by VSCALE. Fixes warnings in this test: sve-sext-zext.ll Differential revision: https://reviews.llvm.org/D83198	2020-07-29 13:05:39 +01:00
Simon Pilgrim	b4b6e77454	[DAG] isSplatValue - add support for TRUNCATE/SIGN_EXTEND/ZERO_EXTEND These are just pass-throughs to the source operand - we can't assume that ANY_EXTEND(splat) will still be a splat though.	2020-07-28 19:56:11 +01:00
Changpeng Fang	9162b70e51	DAGCombiner: Don't simplify the token factor if the node's number of operands already exceeds TokenFactorInlineLimit Summary: In parallelizeChainedStores, a TokenFactor was created with the size greater than 3000. We found that DAGCombiner::visitTokenFactor will consume a huge amount of time on such nodes. Since the number of operands already exceeds TokenFactorInlineLimit, we propose to give up simplification with the consideration of compile time. Reviewers: @spatel, @arsenm Differential Revision: https://reviews.llvm.org/D84204	2020-07-25 21:20:59 -07:00
Eric Christopher	18975762c1	Fold StatepointBB into checks as it's only used from an NDEBUG or ASSERT context fixing an unused variable warning.	2020-07-25 18:36:53 -07:00
Philip Reames	55dae9c20c	[Statepoints] Style cleanup after `3da1a963` [NFC] Just fixing a few minor stylistic issues.	2020-07-25 16:40:39 -07:00
Philip Reames	3da1a9634e	[Statepoints] Support lowering gc relocations to virtual registers (Disabled under flag for the moment) This is part of a larger project wherein we are finally integrating lowering of gc live operands with the register allocator. Today, we force spill all operands in SelectionDAG. The code to do so is distinctly non-optimal. The approach this patch is working towards is to instead lower the relocations directly into the MI form, and let the register allocator pick which ones get spilled and which stack slots they get spilled to. In terms of performance, the latter part is actually more important as it avoids redundant shuffling of values between stack slots. This particular change adds ISEL support to produce the variadic def STATEPOINT form required by the above. In particular, the first N are lowered to variadic tied def/use pairs. So the new statepoint looks like this: reloc1,reloc2,... = STATEPOINT ..., base1, derived1<tied-def0>, base2, derived2<tied-def1>, ... N is limited by the maximal number of tied registers a machine instruction can have (15 at the moment). The current patch is restricted to handling relocations within a single basic block. Cross block relocations (e.g. invokes) are handled via the legacy mechanism. This restriction will be relaxed in future patches. Patch By: dantrushin Differential Revision: https://reviews.llvm.org/D81648	2020-07-25 14:26:05 -07:00
Craig Topper	8131e19064	[LegalizeTypes] Teach DAGTypeLegalizer::GenWidenVectorLoads to pad with undef if needed when concatenating smaller loads to match a larger load In the included test case the align 16 allowed the v23f32 load to be handled as load v16f32, load v4f32, and load v4f32 (one element not used). These loads all need to be concatenated together into a final vector. In this case we tried to concatenate the two v4f32 loads to match the type of the v16f32 load so we could do a second concat_vectors, but those loads alone only add up to v8f32. So we need two v4f32 undefs to pad it. It appears we've tried to hack around a similar issue in this code before by adding undef padding to loads in one of the earlier loops in this function. Originally in r147964 by padding all loads narrower than previous loads to the same size. Later modified to only the last load in r293088. This patch removes that earlier code and just handles it on demand where we know we need it. Fixes PR46820 Differential Revision: https://reviews.llvm.org/D84463	2020-07-23 19:02:03 -07:00
Nikita Popov	deb4bb2b3a	[IR] Add min/max/abs intrinsics This adds the llvm.abs(), llvm.umin(), llvm.umax(), llvm.smin(), and llvm.smax() intrinsics specified in D81829. For SelectionDAG, the ISD opcodes and all the legalization and lowering already exist, so this just wires them up to the intrinsic in the SDAG builder and adds rudimentary tests. For GlobalISel only the min/max intrinsics are wired up, as llvm.abs() will require the addition of a G_ABS op, and corresponding legalization support. Differential Revision: https://reviews.llvm.org/D84125	2020-07-23 20:56:19 +02:00
Florian Hahn	6c9da995fc	[ScheduleDAGRRList] Pacify overload mismatch in std::min. On systems where size() doesn't return unsigned long, this leads to an overloading mismatch. Convert the constant to whatever type is used for Q.size() on the system.	2020-07-23 11:56:50 +01:00
Florian Hahn	2f8e6b5f3c	[ScheduleDAGRRList] Limit number of candidates to explore. Currently popFromQueueImpl iterates over all candidates to find the best one. While the candidate queue is small, this is not a problem. But it becomes a problem once the queue gets larger. For example, the snippet below takes 330s to compile with llc -O0, but completes in 3s with this patch. define void @test(i4000000* %ptr) { entry: store i4000000 0, i4000000* %ptr, align 4 ret void } This patch limits the number of candidates to check to 1000. This limit ensures that it never triggers for test-suite/SPEC2000/SPEC2006 on X86 and AArch64 with -O3, while still drastically limiting the compile-time in case of very large queues. It would be even better to use a binary heap to manage the queue (D83335), but some heuristics change the score of a node in the queue after another node has been scheduled. I plan to address this for backends that use the MachineScheduler in the future, but that requires a more careful evaluation. In the meantime, the limit should help users impacted by this issue. The patch includes a slightly smaller version of the motivating example as test case, to guard against the issue. Reviewers: efriedma, paquette, niravd Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D84328	2020-07-23 11:35:33 +01:00
Simon Pilgrim	fa95688237	SelectionDAGBuilder.cpp - remove duplicate includes that already exist in SelectionDAGBuilder.h. NFC.	2020-07-22 14:19:41 +01:00
Matt Arsenault	f659c44016	CodeGen: Add support for lowering byref attribute	2020-07-21 17:38:15 -04:00
Matt Arsenault	2fe0ea8261	DAG: Handle expanding strict_fsub into fneg and strict_fadd The AMDGPU handling of f16 vectors is terrible still since it gets scalarized even when the vector operation is legal. The code is essentially duplicated between the non-strict and strict case. Apparently no other expansions are currently trying to do this. This is mostly because I found the behavior of getStrictFPOperationAction to be confusing. In the ARM case, it would expand strict_fsub even though it shouldn't due to the later check. At that point, the logic required to check for legality was more complex than just duplicating the 2 instruction expansion.	2020-07-21 16:17:10 -04:00
Eli Friedman	b8f765a1e1	[AArch64][SVE] Add support for trunc to <vscale x N x i1>. This isn't a natively supported operation, so convert it to a mask+compare. In addition to the operation itself, fix up some surrounding stuff to make the testcase work: we need concat_vectors on i1 vectors, we need legalization of i1 vector truncates, and we need to fix up all the relevant uses of getVectorNumElements(). Differential Revision: https://reviews.llvm.org/D83811	2020-07-20 13:11:02 -07:00
Florian Hahn	e297006d6f	[ScheduleDAG] Move DBG_VALUEs after first term forward. MBBs are not allowed to have non-terminator instructions after the first terminator. Currently in some cases (see the modified test), EmitSchedule can add DBG_VALUEs after the last terminator, for example when referring to a debug value that gets folded into a TCRETURN instruction on ARM. This patch updates EmitSchedule to move inserted DBG_VALUEs just before the first terminator. I am not sure if there are terminators that produce values that can in turn be used by a DBG_VALUE. In that case, moving the DBG_VALUE might result in referencing an undefined register. But in any case, it seems like currently there is no way to insert proper DBG_VALUEs for such registers anyway. Alternatively it might make sense to just remove those extra DBG_VALUEs. I am not too familiar with the details of debug info in the backend and would appreciate any suggestions on how to address the issue in the best possible way. Reviewers: vsk, aprantl, jpaquette, efriedma, paquette Reviewed By: aprantl Differential Revision: https://reviews.llvm.org/D83561	2020-07-17 10:27:43 +01:00
Matt Arsenault	9d3e56e2ee	DAG: Try scalarizing when expanding saturating add/sub In an upcoming AMDGPU patch, the scalar cases will be legal and vector ops should be scalarized, rather than producing a long sequence of vector ops which will also need to be scalarized. Use a lazy heuristic that seems to work and improves the thumb2 MVE test.	2020-07-16 14:05:16 -04:00
Matt Arsenault	023883a834	IR: Rename Argument::hasPassPointeeByValueAttr to prepare for byref When the byref attribute is added, there will need to be two similar functions for the existing cases which have an associated value copy, and byref which does not. Most, but not all of the existing uses will use the existing version. The associated size function added by D82679 also needs to contextually differ, and will help eliminate a few places still relying on pointee element types.	2020-07-16 13:50:49 -04:00
Kerry McLaughlin	2762da0a16	[SVE][CodeGen] Legalisation of masked loads and stores Summary: This patch modifies IncrementMemoryAddress to use a vscale when calculating the new address if the data type is scalable. Also adds tablegen patterns which match an extract_subvector of a legal predicate type with zip1/zip2 instructions Reviewers: sdesmalen, efriedma, david-arm Reviewed By: efriedma, david-arm Subscribers: tschuett, hiraditya, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D83137	2020-07-16 10:55:45 +01:00
Hiroshi Yamauchi	f233b92f92	[PGO][PGSO] Add profile guided size optimization to LegalizeDAG. Reviewers: davidxl Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D83333	2020-07-15 10:03:38 -07:00
Cameron McInally	ae51a70030	[Legalize] Hoist invariant condition in ExpandVectorBuildThroughStack(...) The operands of a BUILD_VECTOR must all have the same type, so we can hoist this invariant condition out of the loop. Differential Revision: https://reviews.llvm.org/D83882	2020-07-15 11:05:20 -05:00
Tim Northover	5165b2b5fd	AArch64+ARM: make LLVM consider system registers volatile. Some of the system registers readable on AArch64 and ARM platforms return different values with each read (for example a timer counter), these shouldn't be hoisted outside loops or otherwise interfered with, but the normal @llvm.read_register intrinsic is only considered to read memory. This introduces a separate @llvm.read_volatile_register intrinsic and maps all system-registers on ARM platforms to use it for the __builtin_arm_rsr calls. Registers declared with asm("r9") or similar are unaffected.	2020-07-15 09:47:36 +01:00
Roger Ferrer Ibanez	14bc5e149d	[DAGCombiner] Rebuild (setcc x, y, ==) from (xor (xor x, y), 1) The existing code already considered this case. Unfortunately a typo in the condition prevents it from triggering. Also the existing code, had it run, forgot to do the folding. This fixes PR42876. Differential Revision: https://reviews.llvm.org/D65802	2020-07-15 07:34:22 +00:00
Paul Walker	6e198aae1d	[SelectionDAG] Prevent warnings when extracting fixed length vector from scalable. ComputeNumSignBits and computeKnownBits both trigger "Scalable flag may be dropped" warnings when a fixed length vector is extracted from a scalable vector. This patch assumes nothing about the demanded elements thus matching the behaviour when extracting a scalable vector from a scalable vector. Differential Revision: https://reviews.llvm.org/D83642	2020-07-14 11:12:56 +00:00
David Sherwood	3b8eaf26db	[SVE][CodeGen] Fix implicit TypeSize->uint64_t conversion in TransformFPLoadStorePair In DAGCombiner::TransformFPLoadStorePair we were dropping the scalable property of TypeSize when trying to create an integer type of equivalent size. In fact, this optimisation makes no sense for scalable types since we don't know the size at compile time. I have changed the code to bail out when encountering scalable type sizes. I've added a test to llvm/test/CodeGen/AArch64/sve-fp.ll that exercises this code path. The test already emits an error if it encounters warnings due to implicit TypeSize->uint64_t conversions. Differential Revision: https://reviews.llvm.org/D83572	2020-07-14 08:07:30 +01:00
Sanjay Patel	8779b11410	[DAGCombiner] rot i16 X, 8 --> bswap X We have this generic transform in IR (instcombine), but as shown in PR41098: http://bugs.llvm.org/PR41098 ...the pattern may emerge in codegen too. x86 has a potential refinement/reversal opportunity here, but that should come later or needs a target hook to avoid the transform. Converting to bswap is the more specific form, so we should use it if it is available.	2020-07-13 12:01:53 -04:00
Sanjay Patel	2df46a5743	[DAGCombiner] allow load/store merging if pairs can be rotated into place This carves out an exception for a pair of consecutive loads that are reversed from the consecutive order of a pair of stores. All of the existing profitability/legality checks for the memops remain between the 2 altered hunks of code. This should give us the same x86 base-case asm that gcc gets in PR41098 and PR44895: http://bugs.llvm.org/PR41098 http://bugs.llvm.org/PR44895 I think we are missing a potential subsequent conversion to use "movbe" if the target supports that. That might be similar to what AArch64 would use to get "rev16". Differential Revision: https://reviews.llvm.org/D83567	2020-07-13 08:57:00 -04:00
Sanjay Patel	f1bbf3acb4	Revert "[DAGCombiner] allow load/store merging if pairs can be rotated into place" This reverts commit `591a3af5c7`. The commit message was cut off and failed to include the review citation.	2020-07-13 08:55:29 -04:00
Sanjay Patel	591a3af5c7	[DAGCombiner] allow load/store merging if pairs can be rotated into place This carves out an exception for a pair of consecutive loads that are reversed from the consecutive order of a pair of stores. All of the existing profitability/legality checks for the memops remain between the 2 altered hunks of code. This should give us the same x86 base-case asm that gcc gets in PR41098 and PR44895: http://bugs.llvm.org/PR41098 http://bugs.llvm.org/PR44895 I think we are missing a potential subsequent conversion to use "movbe" if the target supports that. That might be similar to what AArch64 would use to get "rev16". Differential Revision:	2020-07-13 08:53:06 -04:00
Kerry McLaughlin	afcc9a81d2	[SVE][Codegen] Add a helper function for pointer increment logic Summary: Helper used when splitting load & store operations to calculate the pointer + offset for the high half of the split Reviewers: efriedma, sdesmalen, david-arm Reviewed By: efriedma Subscribers: tschuett, hiraditya, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D83577	2020-07-13 10:53:40 +01:00
Sanjay Patel	39009a8245	[DAGCombiner] tighten fast-math constraints for fma fold fadd (fma A, B, (fmul C, D)), E --> fma A, B, (fma C, D, E) This is only allowed when "reassoc" is present on the fadd. As discussed in D80801, this transform goes beyond what is allowed by "contract" FMF (-ffp-contract=fast). That is because we are fusing the trailing add of 'E' with a multiply, but without "reassoc", the code mandates that the products AB and CD are added together before adding in 'E'. I've added this example to the LangRef to try to clarify the meaning of "contract". If that seems reasonable, we should probably do something similar for the clang docs because there does not appear to be any formal spec for the behavior of -ffp-contract=fast. Differential Revision: https://reviews.llvm.org/D82499	2020-07-12 08:51:49 -04:00
Sanjay Patel	02fec9d2a5	[DAGCombiner] move/rename variables for readability; NFC	2020-07-10 11:28:51 -04:00
David Sherwood	da731894a2	[CodeGen] Replace calls to getVectorNumElements() in DAGTypeLegalizer::SetSplitVector In DAGTypeLegalizer::SetSplitVector I have changed calls in the assert from getVectorNumElements() to getVectorElementCount(), since this code path works for both fixed and scalable vectors. This fixes up one warning in the test: sve-sext-zext.ll Differential Revision: https://reviews.llvm.org/D83196	2020-07-10 08:29:17 +01:00
David Sherwood	229dfb4728	[CodeGen] Replace calls to getVectorNumElements() in SelectionDAG::SplitVector This patch replaces some invalid calls to getVectorNumElements() with calls to getVectorMinNumElements() instead, since the code paths changed in this patch work for both fixed and scalable vector types. Fixes warnings in this test: sve-sext-zext.ll Differential Revision: https://reviews.llvm.org/D83203	2020-07-10 08:11:30 +01:00
Sanjay Patel	a46cf40240	[DAGCombiner] convert if-chain in store merging to switch; NFC	2020-07-09 17:20:04 -04:00
Sanjay Patel	b476e6a642	[DAGCombiner] add helper function for store merging of loaded values; NFC	2020-07-09 17:20:04 -04:00
Sanjay Patel	f98a602c2e	[DAGCombiner] add helper function for store merging of extracts; NFC	2020-07-09 17:20:03 -04:00
Sanjay Patel	8d74cb01b7	[DAGCombiner] add helper function for store merging of constants; NFC	2020-07-09 17:20:03 -04:00
Sanjay Patel	6890e2a17b	[DAGCombiner] add helper function to manage list of consecutive stores; NFC	2020-07-09 17:20:03 -04:00
Christopher Tetreault	ff5b9a7b3b	[SVE] Remove calls to VectorType::getNumElements from CodeGen Reviewers: efriedma, fpetrogalli, sdesmalen, RKSimon, arsenm Reviewed By: RKSimon Subscribers: wdng, tschuett, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D82210	2020-07-09 12:43:36 -07:00
Lucas Prates	fc39a9ca0e	[CodeGen] Matching promoted type for 16-bit integer bitcasts from fp16 operand Summary: When legalizing a bitcast operation from an fp16 operand to an i16 on a target that requires both input and output types to be promoted to 32-bits, an assertion can fail when building the new node due to a mismatch between the operation's result size and the type specified to the node. This patch fixes the issue by making sure the bit width of the types match for the FP_TO_FP16 node, covering the difference with an extra ANYEXTEND operation. Reviewers: ostannard, efriedma, pirama, jmolloy, plotfi Reviewed By: efriedma Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D82552	2020-07-09 09:46:17 +01:00
Qiu Chaofan	4254ed5c32	[Legalizer] Fix wrong operand in split vector helper This should be a typo introduced in D69275, which may cause an unknown segmentation fault in getNode. Reviewed By: uweigand Differential Revision: https://reviews.llvm.org/D83376	2020-07-09 09:57:29 +08:00
Matt Arsenault	18bd821f02	DAG: Remove redundant finalizeLowering call 9cac4e6d1403554b06ec2fc9d834087b1234b695/D32628 intended to eliminate this, and move all isel pseudo expansion to FinalizeISel. This was a bad rebase or something, and failed to actually delete this call. GlobalISel also has a redundant call of finalizeLowering. However, it requires more work to remove it since it currently triggers a lot of verifier errors in tests.	2020-07-08 18:48:20 -04:00
Matt Arsenault	2ec5fc0c61	DAG: Remove redundant handling of reg fixups It looks like `9cac4e6d14` accidentally added a second copy of this from a bad rebase or something. This second copy was added, and the finalizeLowering call was not deleted as intended.	2020-07-08 18:32:43 -04:00
Sanjay Patel	1265eb2d5f	[DAGCombiner] clean up in mergeConsecutiveStores(); NFC	2020-07-08 14:48:05 -04:00
Sanjay Patel	12c2271e53	[DAGCombiner] fix code comment and improve readability; NFC	2020-07-08 14:48:05 -04:00
Sanjay Patel	683a7f7025	[DAGCombiner] fix function-name formatting; NFC	2020-07-08 12:49:59 -04:00
Sanjay Patel	39329d5724	[DAGCombiner] add enum for store source value; NFC This removes existing code duplication and allows us to assert that we are handling the expected cases. We have a list of outstanding bugs that could benefit by handling truncated source values, so that's a possible addition going forward.	2020-07-08 12:49:59 -04:00
Paul Walker	bb35f0fd89	[SelectionDAG] Fix incorrect offset when expanding CONCAT_VECTORS. ExpandVectorBuildThroughStack is also used for CONCAT_VECTORS. However, when calculating the offsets for each of the operands we incorrectly use the element size rather than actual size and thus the stores overlap. Differential Revision: https://reviews.llvm.org/D83303	2020-07-08 15:39:25 +00:00
Ties Stuij	26a22478cd	[CodeGen] Don't combine extract + concat vectors with non-legal types Summary: The following combine currently breaks in the DAGCombiner: ``` extract_vector_elt (concat_vectors v4i16:a, v4i16:b), x -> extract_vector_elt a, x ``` This happens because after we have combined these nodes we have inserted nodes that use individual instances of the vector element type. In the above example i16. However this isn't a legal type on all backends, and when the combining pass calls the legalizer it breaks as it expects types to already be legal. The type legalizer has already been run, and running it again would make a mess of the nodes. In the example code at least, the generated code is still efficient after the change. Reviewers: miyuki, arsenm, dmgreen, lebedev.ri Reviewed By: miyuki, lebedev.ri Subscribers: lebedev.ri, wdng, hiraditya, steven.zhang, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D83231	2020-07-08 15:29:57 +01:00
David Sherwood	9e66e9c30a	[CodeGen] Fix wrong use of getVectorNumElements() in DAGTypeLegalizer::SplitVecRes_ExtendOp In DAGTypeLegalizer::SplitVecRes_ExtendOp I have replaced an invalid call to getVectorNumElements() with a call to getVectorMinNumElements(), since the code path works for both fixed and scalable vectors. This fixes up a warning in the following test: sve-sext-zext.ll Differential Revision: https://reviews.llvm.org/D83197	2020-07-08 09:53:20 +01:00
David Sherwood	5b14f5051f	[CodeGen] Fix wrong use of getVectorNumElements in PromoteIntRes_EXTRACT_SUBVECTOR Calling getVectorNumElements() is not safe for scalable vectors and we should normally use getVectorElementCount() instead. However, for the code changed in this patch I decided to simply move the instantiation of the variable 'OutNumElems' lower down to the place where only fixed-width vectors are used, and hence it is safe to call getVectorNumElements(). Fixes up one warning in this test: sve-sext-zext.ll Differential Revision: https://reviews.llvm.org/D83195	2020-07-08 09:36:34 +01:00
Heejin Ahn	7e6793aa33	[WebAssembly] Generate unreachable after __stack_chk_fail `__stack_chk_fail` does not return, but `unreachable` was not generated following `call __stack_chk_fail`. This had a possibility to generate an invalid binary for functions with a return type, because `__stack_chk_fail`'s return type is void and `call __stack_chk_fail` can be the last instruction in the function whose return type is non-void. Generating `unreachable` after it makes sure CFGStackify's `fixEndsAtEndOfFunction` handles it correctly. Reviewed By: tlively Differential Revision: https://reviews.llvm.org/D83277	2020-07-08 01:02:05 -07:00
Philip Reames	22596e7b2f	[Statepoint] Use early return to reduce nesting and clarify comments [NFC]	2020-07-07 16:19:05 -07:00
Philip Reames	9955876d74	[Statepoint] Reduce indentation and change a variable name [NFC]	2020-07-07 16:19:05 -07:00
Philip Reames	b172cd7812	[Statepoint] Factor out logic for non-stack non-vreg lowering [almost NFC] This is inspired by D81648. The basic idea is to have the set of SDValues which are lowered as either constants or direct frame references explicit in one place, and to separate them clearly from the spilling logic. This is not NFC in that the handling of constants larger than 64 bits has changed. The old lowering would crash on values which could not be encoded as a sign extended 64 bit value. The new lowering just spills all constants > 64 bits. We could be consistent about doing the sext(Con64) optimization, but I happen to know that this code path is utterly unexercised in practice, so simple is better for now.	2020-07-07 13:34:28 -07:00
Kerry McLaughlin	cdf2eef613	[SVE][CodeGen] Legalisation of unpredicated store instructions Summary: When splitting a store of a scalable type, the new address is calculated in SplitVecOp_STORE using a vscale and an add instruction. Reviewers: sdesmalen, efriedma, david-arm Reviewed By: david-arm Subscribers: tschuett, hiraditya, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D83041	2020-07-07 11:47:10 +01:00
Kerry McLaughlin	5e8084beba	[SVE][CodeGen] Legalisation of unpredicated load instructions Summary: When splitting a load of a scalable type, the new address is calculated in SplitVecRes_LOAD using a vscale and an add instruction. This patch also adds a DAG combiner fold to visitADD for vscale: - Fold (add (vscale(C0)), (vscale(C1))) to (add (vscale(C0 + C1))) Reviewers: sdesmalen, efriedma, david-arm Reviewed By: david-arm Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D82792	2020-07-07 11:05:03 +01:00
David Sherwood	c061e56e88	[CodeGen] Fix warnings in sve-vector-splat.ll and sve-trunc.ll This patch fixes all remaining warnings in: llvm/test/CodeGen/AArch64/sve-trunc.ll llvm/test/CodeGen/AArch64/sve-vector-splat.ll I hit some warnings related to getCopyPartsToVector. I fixed two issues: 1. In widenVectorToPartType() we assumed that we'd always be using BUILD_VECTOR nodes to expand from one vector type to another, which is incorrect for scalable vector types. I've fixed this for now by simply bailing out immediately for scalable vectors. 2. In getCopyToPartsVector() I've changed the code to compare the element counts of different types. Differential Revision: https://reviews.llvm.org/D83028	2020-07-07 09:21:47 +01:00
Sanjay Patel	ea71ba11ab	[DAGCombiner] reassociate reciprocal sqrt expression to eliminate FP division X / (fabs(A) * sqrt(Z)) --> X / sqrt(AAZ) --> X * rsqrt(AAZ) In the motivating case from PR46406: https://bugs.llvm.org/show_bug.cgi?id=46406 ...this is restoring the sequence that was originally in the source code. We extracted a term from within the sqrt because we do not know in instcombine whether a target will expand a sqrt call. Note: we could say that the transform in IR should be restricted, but that would not solve the problem if the source was originally in the pattern shown here. This is a gray area for fast-math-flag requirements. I think we should at least check fast-math-flags on the fdiv and fmul because I view this transform as 2 pieces: reassociate the fmul operands and form reciprocal from the fdiv (as with the existing transform). We could argue that the sqrt also needs FMF, but that was not required before, so we should change that in a follow-up patch if that seems better. We don't currently have a way to check that the target will produce a sqrt or recip estimate without actually creating nodes (the APIs are SDValue getSqrtEstimate() and SDValue getRecipEstimate()), so we clean up speculatively created nodes if we are not able to create an estimate. The x86 test with doubles verifies that we are not changing a test with no estimate sequence. Differential Revision: https://reviews.llvm.org/D82716	2020-07-06 19:12:21 -04:00
Jay Foad	babbeafa00	[TargetLowering] Improve expansion of FSHL/FSHR by non-zero amount Use a simpler code sequence when the shift amount is known not to be zero modulo the bit width. Nothing much uses this until D77152 changes the translation of fshl and fshr intrinsics. Differential Revision: https://reviews.llvm.org/D82540	2020-07-06 12:07:14 +01:00
Jay Foad	e7a4a24dc5	[TargetLowering] Improve expansion of ROTL/ROTR Using a negation instead of a subtraction from a constant can save an instruction on some targets. Nothing much uses this until D77152 changes the translation of fshl and fshr intrinsics. Differential Revision: https://reviews.llvm.org/D82539	2020-07-06 12:07:14 +01:00
Craig Topper	76123d338d	[DAGCombiner] visitSIGN_EXTEND_INREG should fold sext_vector_inreg(undef) to 0 not undef. We need to ensure that the sign bits of the result all match so we can't fold to undef. Similar to PR46585. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D83163	2020-07-04 14:35:49 -07:00
Craig Topper	120c5f1057	[DAGCombiner] Don't fold zext_vector_inreg/sext_vector_inreg(undef) to undef. Fold to 0. zext_vector_inreg needs to produce 0s in the extended bits and sext_vector_inreg needs to produce upper bits that are all the same. So we should fold them to a 0 vector instead of undef. Fixes PR46585.	2020-07-04 11:42:53 -07:00
Simon Pilgrim	56a8a5c9fe	[DAG] matchBinOpReduction - match subvector reduction patterns beyond a matched shufflevector reduction Currently matchBinOpReduction only handles shufflevector reduction patterns, but in many cases these only occur in the final stages of a reduction, once we're down to legal vector widths. Before this it's likely that we are performing reductions using subvector extractions to repeatedly split the source vector in half and perform the binop on the halves. Assuming we've found a non-partial reduction, this patch continues looking for subvector reductions as far as it can beyond the last shufflevector. Fixes PR37890	2020-07-04 15:28:15 +01:00
Guillaume Chatelet	87e2751cf0	[Alignment][NFC] Use proper getter to retrieve alignment from ConstantInt and ConstantSDNode This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Differential Revision: https://reviews.llvm.org/D83082	2020-07-03 08:06:43 +00:00
Sanjay Patel	bc110de78a	[SelectionDAG] don't split branch on logic-of-vector-compares SelectionDAGBuilder converts logic-of-compares into multiple branches based on a boolean TLI setting in isJumpExpensive(). But that probably never considered the pattern of extracted bools from a vector compare - it seems unlikely that we would want to turn vector logic into control-flow. The motivating x86 reduction case is shown in PR44565: https://bugs.llvm.org/show_bug.cgi?id=44565 ...and that test shows the expected improvement from using pmovmsk codegen. For AArch64, I modified the test to include an extra op because the simpler test gets transformed by a codegen invocation of SimplifyCFG. Differential Revision: https://reviews.llvm.org/D82602	2020-07-02 17:05:24 -04:00
Sander de Smalen	143e324e75	[CodeGen][SVE] Don't drop scalable flag in DAGCombiner::visitEXTRACT_SUBVECTOR There was a rogue 'assert' in AArch64ISelLowering for the tuple.get intrinsics, that shouldn't really have been there (I suspect this was a remnant from when we expected the wider vector always to have come from a vector CONCAT). When I tried to create a more minimal reproducer, I found a bug in DAGCombiner where it drops the scalable flag when trying to fold: extract_subv (bitcast X), Index --> bitcast (extract_subv X, Index') This patch fixes both issues. Reviewers: david-arm, efriedma, spatel Reviewed By: efriedma Tags: #llvm Differential Revision: https://reviews.llvm.org/D82910	2020-07-02 10:16:43 +01:00
David Sherwood	c7df35d2b2	[CodeGen] Fix warnings in getCopyToPartsVector Whilst trying to assemble the following test: clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_set2.c I discovered we were hitting some warnings about possible invalid calls to getVectorNumElements() in getCopyToPartsVector(). I've tried to fix these by using ElementCount types where possible and I've made the assumption that we don't support using a fixed width vector to copy parts of a scalable vector, and vice versa. Looking at how the copy is implemented I think that's the right thing for now. Differential Revision: https://reviews.llvm.org/D82744	2020-07-02 09:08:20 +01:00
Craig Topper	51e92b223b	[X86] Speculatively apply the same fix from `361853c96f` to PromoteIntOp_MGATHER. The UpdateNodeOperands here is also subject to CSE.	2020-07-01 11:57:59 -07:00
Craig Topper	361853c96f	[LegalizeTypes] Properly handle the case when UpdateNodeOperands in PromoteIntOp_MLOAD triggers CSE instead of updating the node in place. The caller can't handle the node having multiple results like a masked load does. So we need to detect the case and do our own result replacement. Fixes PR46532.	2020-07-01 11:48:50 -07:00
David Sherwood	f11305780f	[CodeGen] Fix warnings in DAGCombiner::visitSCALAR_TO_VECTOR In visitSCALAR_TO_VECTOR we try to optimise cases such as: scalar_to_vector (extract_vector_elt %x) into vector shuffles of %x. However, it led to numerous warnings when %x is a scalable vector type, so for now I've changed the code to only perform the combination on fixed length vectors. Although we probably could change the code to work with scalable vectors in certain cases, without a proper profit analysis it doesn't seem worth it at the moment. This change fixes up one of the warnings in: llvm/test/CodeGen/AArch64/sve-merging-stores.ll I've also added a simplified version of the same test to: llvm/test/CodeGen/AArch64/sve-fp.ll which already has checks for no warnings. Differential Revision: https://reviews.llvm.org/D82872	2020-07-01 18:47:13 +01:00
James Y Knight	4b0aa5724f	Change the INLINEASM_BR MachineInstr to be a non-terminating instruction. Before this instruction supported output values, it fit fairly naturally as a terminator. However, being a terminator while also supporting outputs causes some trouble, as the physreg->vreg COPY operations cannot be in the same block. Modeling it as a non-terminator allows it to be handled the same way as invoke is handled already. Most of the changes here were created by auditing all the existing users of MachineBasicBlock::isEHPad() and MachineBasicBlock::hasEHPadSuccessor(), and adding calls to isInlineAsmBrIndirectTarget or mayHaveInlineAsmBr, as appropriate. Reviewed By: nickdesaulniers, void Differential Revision: https://reviews.llvm.org/D79794	2020-07-01 12:51:50 -04:00
Guillaume Chatelet	ef36f5143d	[Alignment] TargetLowering::hasPairedLoad must use Align for RequiredAlignment As per documentation of `hasPairLoad`: "`RequiredAlignment` gives the minimal alignment constraints that must be met to be able to select this paired load." In this sense, `0` is strictly equivalent to `1`. We make this obvious by using `Align` instead of unsigned. There is only one implementor of this interface. Differential Revision: https://reviews.llvm.org/D82958	2020-07-01 14:32:30 +00:00
Guillaume Chatelet	d3085c2501	[Alignment][NFC] Transition and simplify calls to DL::getABITypeAlignment This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Differential Revision: https://reviews.llvm.org/D82956	2020-07-01 14:31:56 +00:00
Guillaume Chatelet	27bbc8ede1	[Alignment][NFC] Migrate TargetTransformInfo::CreateVariableSizedObject to Align This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Differential Revision: https://reviews.llvm.org/D82939	2020-07-01 14:31:21 +00:00
David Sherwood	97a7a9abb2	[CodeGen] Fix up warnings in visitEXTRACT_SUBVECTOR It's perfectly valid to do certain DAG combines where we extract subvectors from a concat vector when we have scalable vector types. However, we can do this in a way that avoids generating compiler warnings by replacing calls to getVectorNumElements() with getVectorMinNumElements(). Due to the way subvector extracts are designed to work with scalable vector types this is ok. This eliminates some warnings from existing tests in this file: llvm/test/CodeGen/AArch64/sve-intrinsics-loads.ll Differential Revision: https://reviews.llvm.org/D82655	2020-07-01 15:10:53 +01:00
Guillaume Chatelet	28de229bc6	[Alignment][NFC] Migrate MachineFrameInfo::CreateStackObject to Align This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Differential Revision: https://reviews.llvm.org/D82894	2020-07-01 07:28:11 +00:00
Guillaume Chatelet	c1cd61e02a	[Alignment][NFC] Migrate SelectionDAGTargetInfo::EmitTargetCodeForMemcpy to Align This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Differential Revision: https://reviews.llvm.org/D82849	2020-06-30 13:12:31 +00:00
Guillaume Chatelet	306d7c6929	[Alignment][NFC] Migrate SelectionDAGTargetInfo::EmitTargetCodeForMemmove to Align This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Differential Revision: https://reviews.llvm.org/D82850	2020-06-30 12:46:59 +00:00
Guillaume Chatelet	6a6af30d43	[Alignment][NFC] Migrate SelectionDAGTargetInfo::EmitTargetCodeForMemset to Align This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Differential Revision: https://reviews.llvm.org/D82851	2020-06-30 12:46:26 +00:00
Guillaume Chatelet	5f8bdb3e6a	[Alignment][NFC] TargetLowering::allowsMemoryAccess Second patch of a series to adapt TargetLowering::allowsXXX functions This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Differential Revision: https://reviews.llvm.org/D82785	2020-06-30 08:17:00 +00:00
David Sherwood	c02332a693	[CodeGen] Fix warning in getNode for EXTRACT_SUBVECTOR Fix a warning in getNode() when extracting a subvector from a concat vector. We can simply replace the call to getVectorNumElements with getVectorMinNumElements as this follows the defined behaviour for EXTRACT_SUBVECTOR. Differential Revision: https://reviews.llvm.org/D82746	2020-06-30 08:11:41 +01:00
David Sherwood	46a7f4d6f4	[SVE][CodeGen] Fix bug in DAGCombiner::reduceBuildVecToShuffle When trying to reduce a BUILD_VECTOR to a SHUFFLE_VECTOR it's important that we carefully check the vector types that led to that BUILD_VECTOR. In the test I have attached to this commit there is a case where the results of two SVE faddv instructions are being stored to consecutive memory locations. With my fix, as part of merging those stores we discover that each BUILD_VECTOR element came from an extract of a SVE vector element and therefore bail out. Differential Revision: https://reviews.llvm.org/D82564	2020-06-30 07:28:15 +01:00
Simon Pilgrim	3521ecf1f8	[X86] Add vector support to targetShrinkDemandedConstant for OR/XOR opcodes If a constant is only allsignbits in the demanded/active bits, then sign extend it to an allsignbits bool pattern for OR/XOR ops. This also requires SimplifyDemandedBits XOR handling to be modified to call ShrinkDemandedConstant on any (non-NOT) XOR pattern to account for non-splat cases. Next step towards fixing PR45808 - with this patch we now get a <-1,-1,0,0> v4i64 constant instead of <1,1,0,0>. Differential Revision: https://reviews.llvm.org/D82257	2020-06-29 12:19:05 +01:00
Simon Pilgrim	973685fc78	[TargetLowering] Add DemandedElts arg to ShrinkDemandedConstant Pre-commit for D82257, this adds a DemandedElts arg to ShrinkDemandedConstant/targetShrinkDemandedConstant which will allow future patches to (optionally) add vector support.	2020-06-29 11:46:58 +01:00
Guillaume Chatelet	3500d9ec95	Fix invalid alignment in DAGCombiner::isLegalNarrowLdSt `ShAmt / 8` can be a non power of two, this can lead to an invalid alignment. context: https://reviews.llvm.org/D41350#inline-749165 Differential Revision: https://reviews.llvm.org/D82565	2020-06-29 09:22:15 +00:00
Simon Pilgrim	6bdb3ce452	[DAG] reduceBuildVecExtToExtBuildVec - don't combine if it would break a splat. reduceBuildVecExtToExtBuildVec was breaking a splat(zext(x)) pattern into buildvector(x, 0, x, 0, ..) resulting in much more complex insert+shuffle codegen. We already go to some lengths to avoid this in SimplifyDemandedVectorElts etc. when we encounter splat buildvectors. It should be OK to fold all splat(aext(x)) patterns - we might need to tighten this if we find a case where we mustn't introduce a buildvector(x, undef, x, undef, ..) but I can't find one. Fixes PR46461.	2020-06-27 11:03:57 +01:00
Sanjay Patel	e7f7715eb9	[DAGCombiner] rename variables for readability; NFC PR46406 shows a pattern where we can do better, so try to clean this up before adding more code.	2020-06-26 14:22:11 -04:00
Sjoerd Meijer	243a5329d4	[SelectionDAG] Lower @llvm.get.active.lane.mask to setcc This lowers intrinsic @llvm.get.active.lane.mask to a setcc node, i.e. an icmp ule, and creates vectors for its 2 arguments on which the comparison is performed. Differential Revision: https://reviews.llvm.org/D82292	2020-06-26 07:46:38 +01:00
Eli Friedman	e9d4e34ab8	[AArch64][SVE] Add legalization support for i32/i64 vector srem/urem Implement them on top of sdiv/udiv, similar to what we do for integer types. Potential future work: implementing i8/i16 srem/urem, optimizations for constant divisors, optimizing the mul+sub to mls. Differential Revision: https://reviews.llvm.org/D81511	2020-06-23 16:27:52 -07:00
Kerry McLaughlin	5080503174	[SVE][CodeGen] Legalisation of vsetcc with scalable types Summary: Changes SplitVecOp_VSETCC to use getVectorElementCount() Reviewers: sdesmalen, efriedma, dancgr Reviewed By: efriedma Subscribers: david-arm, tschuett, hiraditya, rkruppe, psnobl, huihuiz, cfe-commits, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D79167	2020-06-23 11:56:29 +01:00
Simon Pilgrim	bcc0dc3832	[DAG] visitSIGN_EXTEND_INREG - rename EVT variable. NFCI. We had an EVT type variable called EVT, which isn't a good idea....	2020-06-23 10:45:27 +01:00
Paul Walker	499c63288f	[SVE] Code generation for fixed length vector loads & stores. Summary: This patch adds base support for code generating fixed length vector operations targeting a known SVE vector length. To achieve this we lower fixed length vector operations to equivalent scalable vector operations, whereby SVE predication is used to limit the elements processed to those present within the fixed length vector. Specifically this patch implements load and store operations, which get lowered to their masked counterparts thusly: V = load(Addr) => V = extract_fixed_vector(masked_load(make_pred(V.NumElts), Addr)) store(V, (Addr)) => masked_store(insert_fixed_vector(V), make_pred(V.NumElts), Addr)) Reviewers: rengolin, efriedma Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D80385	2020-06-23 09:39:03 +00:00
Simon Pilgrim	0acd22b8fb	StatepointLowering.cpp - fix implicit CommandLine.h dependency. NFC. StatepointLowering defines a cl::opt but doesn't include CommandLine.h.	2020-06-23 09:43:39 +01:00
Michael Liao	b1360caa82	[SDAG] Add new AssertAlign ISD node. Summary: - AssertAlign node records the guaranteed alignment on its source node, where these alignments are retrieved from alignment attributes in LLVM IR. These tracked alignments could help DAG combining and lowering generating efficient code. - In this patch, the basic support of AssertAlign node is added. So far, we only generate AssertAlign nodes on return values from intrinsic calls. - Addressing selection in AMDGPU is revised accordingly to capture the new (base + offset) patterns. Reviewers: arsenm, bogner Subscribers: jvesely, wdng, nhaehnle, tpr, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D81711	2020-06-23 00:51:11 -04:00
Simon Pilgrim	48d1a2d6d0	[DAG] Add SimplifyMultipleUseDemandedVectorElts helper for SimplifyMultipleUseDemandedBits. NFCI. We have many cases where we call SimplifyMultipleUseDemandedBits and demand specific vector elements, but all the bits from them - this adds a helper wrapper to handle this.	2020-06-22 14:24:39 +01:00
Simon Pilgrim	ecc5d7ee0d	[DAG] SimplifyMultipleUseDemandedBits - drop unnecessary *_EXTEND_VECTOR_INREG cases For little endian targets, if we only need the lowest element and none of the extended bits then we can just use the (bitcasted) source vector directly. We already do this in SimplifyDemandedBits, this adds the SimplifyMultipleUseDemandedBits equivalent.	2020-06-22 12:35:32 +01:00
David Sherwood	7edc7f6edb	[CodeGen] Fix SimplifyDemandedBits for scalable vectors For now I have changed SimplifyDemandedBits and its various callers to assume we know nothing for scalable vectors and to ignore the demanded bits completely. I have also done something similar for SimplifyDemandedVectorElts. These changes fix up lots of warnings due to calls to EVT::getVectorNumElements() for types with scalable vectors. These functions are all used for optimisations, rather than functional requirements. In future we can revisit this code if there is a need to improve code quality for SVE. Differential Revision: https://reviews.llvm.org/D80537	2020-06-19 07:59:35 +01:00
David Sherwood	9e811b0d93	[CodeGen] Fix ComputeNumSignBits for scalable vectors When trying to calculate the number of sign bits for scalable vectors we should just bail out for now and pretend we know nothing. Differential Revision: https://reviews.llvm.org/D81093	2020-06-19 07:58:42 +01:00
Simon Pilgrim	2474421398	[TargetLowering] SimplifyMultipleUseDemandedBits - drop already extended ISD::SIGN_EXTEND_INREG nodes. If the source of the SIGN_EXTEND_INREG node is already sign extended, use the source directly.	2020-06-18 16:41:08 +01:00
Lucas Prates	a255931c40	[ARM] Supporting lowering of half-precision FP arguments and returns in AArch32's backend Summary: Half-precision floating point arguments and returns are currently promoted to either float or int32 in clang's CodeGen and there's no existing support for the lowering of `half` arguments and returns from IR in AArch32's backend. Such frontend coercions, implemented as coercion through memory in clang, can cause a series of issues in argument lowering, such as causing arguments to be stored on the wrong bits on big-endian architectures and missing overflow detection in the return of certain functions. This patch introduces the handling of half-precision arguments and returns in the backend using the actual "half" type on the IR. Using the "half" type the backend is able to properly enforce the AAPCS' directions for those arguments, making sure they are stored on the proper bits of the registers and performing the necessary floating point conversions. Reviewers: rjmccall, olista01, asl, efriedma, ostannard, SjoerdMeijer Reviewed By: ostannard Subscribers: stuij, hiraditya, dmgreen, llvm-commits, chill, dnsampaio, danielkiss, kristof.beyls, cfe-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D75169	2020-06-18 13:15:13 +01:00
David Sherwood	65912a9768	[CodeGen] Fix warnings in foldCONCAT_VECTORS Instead of asserting the number of elements is the same, we should be comparing the element counts instead. In addition, when looking at concats of extract_subvectors it's fine to use getVectorMinNumElements() for scalable vectors. I discovered these warnings when compiling the structured loads tests in this file: test/CodeGen/AArch64/sve-intrinsics-loads.ll Differential Revision: https://reviews.llvm.org/D81936	2020-06-18 09:29:37 +01:00
Aaron Smith	7e01675ea5	[SelectionDAG] Add MVT::bf16 to getConstantFP() Summary: This was probably overlooked in recent bfloat patches. Needed to handle bf16 constants in SelectionDAG. ConstantFP:bf16<APFloat(0)> Reviewers: stuij Reviewed By: stuij Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D81779	2020-06-16 15:10:05 -07:00
Qiu Chaofan	f8ef7c99a0	[DAGCombiner] Require ninf for division estimation Current implementation of division estimation isn't correct for some cases like 1.0/0.0 (result is nan, not expected inf). And this change exposes a potential infinite loop: we use isConstOrConstSplatFP in combineRepeatedFPDivisors to look up if the divisor is some constant. But it doesn't work after legalized on some platforms. This patch restricts the method to act before LegalDAG. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D80542	2020-06-14 22:58:22 +08:00
Amanieu d'Antras	6973125cb7	Fix FastISel dropping srcloc metadata from InlineAsm Summary: Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=46060 I've also added the Extra_IsConvergent flag which was missing from FastISel. Reviewers: echristo Reviewed By: echristo Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D80759	2020-06-13 16:52:37 +01:00
Michael Liao	e7b920e6fe	[DAGCombine] Generalize the case (add (or x, c1), c2) -> (add x, (c1 + c2)) Reviewers: arsenm Subscribers: sdardis, wdng, hiraditya, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, ecnelises, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D81708	2020-06-12 13:53:08 -04:00
Simon Pilgrim	5509e2cc2e	[DAG] foldAddSubOfSignBit - add support for non-uniform vector constants	2020-06-12 14:58:15 +01:00
David Sherwood	bd97342a0c	[CodeGen] Let computeKnownBits do something sensible for scalable vectors Until we have a real need for computing known bits for scalable vectors I have simply changed the code to bail out for now and pretend we know nothing. I've also fixed up some simple callers of computeKnownBits too. Differential Revision: https://reviews.llvm.org/D80437	2020-06-11 08:17:11 +01:00
Sanjay Patel	702cf93356	[DAGCombiner] allow more folding of fadd + fmul into fma If fmul and fadd are separated by an fma, we can fold them together to save an instruction: fadd (fma A, B, (fmul C, D)), N1 --> fma(A, B, fma(C, D, N1)) The fold implemented here is actually a specialization - we should be able to peek through >1 fma to find this pattern. That's another patch if we want to try that enhancement though. This transform was guarded by the TLI hook enableAggressiveFMAFusion(), so it was done for some in-tree targets like PowerPC, but not AArch64 or x86. The hook is protecting against forming a potentially more expensive computation when fma takes longer to execute than a single fadd. That hook may be needed for other transforms, but in this case, we are replacing fmul+fadd with fma, and the fma should never take longer than the 2 individual instructions. 'contract' FMF is all we need to allow this transform. That flag corresponds to -ffp-contract=fast in Clang, so we are allowed to form fma ops freely across expressions. Differential Revision: https://reviews.llvm.org/D80801	2020-06-09 10:41:27 -04:00
Guillaume Chatelet	800e100588	Revert "[Alignment][NFC] Migrate TargetLowering::allowsMemoryAccess" This reverts commit `f21c52667e`.	2020-06-09 10:43:59 +00:00
Simon Wallis	4dba59689d	[ARM] prologue instructions emitted for naked function with >64 byte argument Summary: The naked function attribute is meant to suppress all function prologue/epilogue instructions. On ARM, some are still emitted if an argument greater than 64 bytes in size (the threshold for using the byval attribute in IR) is passed partially in registers. Perform the check for Attribute::Naked and early exit in SelectionDAGISel::LowerArguments(). Checking in ARMFrameLowering::determineCalleeSaves() is too late. A test case is included. Reviewers: llvm-commits, olista01, danielkiss Reviewed By: danielkiss Subscribers: kristof.beyls, hiraditya, danielkiss Tags: #llvm Differential Revision: https://reviews.llvm.org/D80715 Change-Id: Icedecf2a4ad31bc3c35ab0df7489a9d346e1f7cc	2020-06-09 11:33:03 +01:00
Guillaume Chatelet	f21c52667e	[Alignment][NFC] Migrate TargetLowering::allowsMemoryAccess Summary: Note to downstream target maintainers: this might silently change the semantics of your code if you override `TargetLowering::allowsMemoryAccess` without marking it override. This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D81379	2020-06-09 10:11:07 +00:00
Guillaume Chatelet	e26ed6bdae	Fix unused variable warning	2020-06-09 08:56:05 +00:00
David Sherwood	cc8872400c	[CodeGen] Ensure callers of CreateStackTemporary use sensible alignments In two instances of CreateStackTemporary we are sometimes promoting alignments beyond the stack alignment. I have introduced a new function called getReducedAlign that will return the alignment for the broken down parts of illegal vector types. For example, on NEON a <32 x i8> type is made up of two <16 x i8> types - in this case the sensible alignment is 16 bytes, not 32. In the legalization code wherever we create stack temporaries I have started using the reduced alignments instead for illegal vector types. I added a test to CodeGen/AArch64/build-one-lane.ll that tries to insert an element into an illegal fixed vector type that involves creating a temporary stack object. Differential Revision: https://reviews.llvm.org/D80370	2020-06-09 08:10:17 +01:00
Christopher Tetreault	caa2fddce7	[SVE] Eliminate calls to default-false VectorType::get() from CodeGen Reviewers: efriedma, c-rhodes, david-arm, spatel, craig.topper, aqjune, paquette, arsenm, gchatelet Reviewed By: spatel, gchatelet Subscribers: wdng, tschuett, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D80313	2020-06-08 10:26:10 -07:00
Sanjay Patel	302cc8a121	[DAGCombiner] clean-up FMA+FMUL folds; NFC D80801 suggests some readability improvements before moving this block.	2020-06-06 10:32:54 -04:00
Matt Arsenault	45e1a22a92	GlobalISel: Make known bits/alignment API more consistent Just computing the alignment makes sense without caring about the general known bits, such as for non-integral pointers. Separate the two and start calling into the TargetLowering hooks for frame indexes. Start calling the TargetLowering implementation for FrameIndexes, which improves the AMDGPU matching for stack addressing modes. Also introduce a new hook for returning known alignment of target instructions. For AMDGPU, it would be useful to report the known alignment implied by certain intrinsic calls. Also stop using MaybeAlign.	2020-06-05 14:57:22 -04:00
Sander de Smalen	937cb7a8c7	Reland D80640: [CodeGen][SVE] Calculate correct type legalization for scalable vectors. This reverts commit `9bcef270d7`.	2020-06-05 18:09:31 +01:00
Sander de Smalen	9bcef270d7	Revert "[CodeGen][SVE] Calculate correct type legalization for scalable vectors." Seems to break some buildbots, reverting the patch for now. This reverts commit `164f4b9d26`.	2020-06-05 16:03:52 +01:00
Sander de Smalen	164f4b9d26	[CodeGen][SVE] Calculate correct type legalization for scalable vectors. This patch updates TargetLoweringBase::computeRegisterProperties and TargetLoweringBase::getTypeConversion to support scalable vectors, and make the right calls on how to legalise them. These changes are required to legalise both MVTs and EVTs. Reviewers: efriedma, david-arm, ctetreau Reviewed By: efriedma Tags: #llvm Differential Revision: https://reviews.llvm.org/D80640	2020-06-05 15:20:34 +01:00
Kerry McLaughlin	89fc0166f5	[CodeGen][SVE] Legalisation of extends with scalable types Summary: This patch adds legalisation of extensions where the operand of the extend is a legal scalable type but the result is not. EXTRACT_SUBVECTOR is used to split the result, before being replaced by target-specific [S|U]UNPK[HI|LO] operations. For example: ``` zext <vscale x 16 x i8> %a to <vscale x 16 x i16> ``` should emit: ``` uunpklo z2.h, z0.b uunpkhi z1.h, z0.b ``` Reviewers: sdesmalen, efriedma, david-arm Reviewed By: efriedma Subscribers: tschuett, hiraditya, rkruppe, psnobl, huihuiz, cfe-commits, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D79587	2020-06-05 12:08:42 +01:00
Philip Reames	4c735439fd	[Statepoint] Migrate a few tests to gc-live bundle format and fix assert The assert was missed in `0e7c7705`, migrating the test revealed the problem.	2020-06-04 18:15:58 -07:00
Matt Arsenault	af867b7850	DAG: Change computeKnownBitsForFrameIndex to be usable by GISel This wasn't getting much value from the DAG or depth arguments, since it's only called on the frame index root nodes. FrameIndexes can also only return a scalar value, so it also didn't need DemandedElts.	2020-06-04 10:50:26 -04:00
Sanjay Patel	652b3757c8	[x86] add test/code comment for chain value use (PR46195); NFC	2020-06-04 09:15:17 -04:00
Simon Pilgrim	adf10dcf2e	[DAG] scalarizeBinOpOfSplats - extract from the source of splat vector (PR46189) D79003/rG9fa58d1bf2f8 exposed an issue with scalarizeBinOpOfSplats that we were extracting from the splatted vector result instead of the source, the splat index is only valid for the source vector not the result, which may contain undefs, including at the splat index.	2020-06-04 11:58:59 +01:00

11007 Commits