The "isa" checks were less constrained because they allow
target constants, but the later matching code would bail
out on those anyway, so this should be slightly more
efficient.
This patch changes SplitVecOp_EXTRACT_VECTOR_ELT to work correctly
for scalable vectors and also fixes a bug in DAGCombiner where
the scalable property is dropped in visitTRUNCATE when attempting
to fold an extract + a truncate.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D85754
In DAGTypeLegalizer::GenWidenVectorStores the algorithm assumes it only
ever deals with fixed width types, hence the offsets for each individual
store never take 'vscale' into account. I've changed the main loop in
that function to use TypeSize instead of unsigned for tracking the
remaining store amount and offset increment. In addition, I've changed
the loop to use the new IncrementPointer helper function for updating
the addresses in each iteration, since this handles scalable vector
types.
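A rough sketch of the new loop shape (names simplified; this is
illustrative rather than the exact code):
```
// Track the remaining width as a TypeSize so scalable quantities are
// preserved instead of being converted to a fixed number of bits.
TypeSize StWidth = StVT.getSizeInBits();   // e.g. vscale x 128 bits
while (StWidth.isNonZero()) {
  StChain.push_back(DAG.getStore(Chain, DL, VecOp, BasePtr, MPI));
  StWidth -= NewVT.getSizeInBits();
  // IncrementPointer emits a vscale-scaled add for scalable types and
  // now also sets the no-unsigned-wrap flag, as the fixed-width path does.
  IncrementPointer(cast<StoreSDNode>(N), NewVT, MPI, BasePtr);
}
```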
Whilst fixing this function I also fixed a minor issue in
IncrementPointer whereby we were not adding the no-unsigned-wrap flag
to the add instruction in the same way as the fixed-width case does.
Also, I've added a report_fatal_error in GenWidenVectorTruncStores,
since this code currently uses a sequence of element-by-element scalar
stores.
I've added new tests in
CodeGen/AArch64/sve-intrinsics-stores.ll
CodeGen/AArch64/sve-st1-addressing-mode-reg-imm.ll
for the changes in GenWidenVectorStores.
Differential Revision: https://reviews.llvm.org/D84937
In narrowExtractedVectorLoad there is an optimisation that tries to
combine extract_subvector with a narrowing vector load. At the moment
this produces warnings due to the incorrect calls to
getVectorNumElements() for scalable vector types. I've got this
working for scalable vectors too when the extract subvector index
is a multiple of the minimum number of elements. I have added a
new variant of the function:
MachineFunction::getMachineMemOperand
that copies an existing MachineMemOperand, but replaces the pointer
info with a null version since we cannot currently represent scaled
offsets.
I've added a new test for this particular case in:
CodeGen/AArch64/sve-extract-subvector.ll
Differential Revision: https://reviews.llvm.org/D83950
In this patch I have fixed two issues:
1. Our SVE tuple get/set intrinsics were using the wrong constant type
for the index passed to EXTRACT_SUBVECTOR. I have fixed this by using the
function SelectionDAG::getVectorIdxConstant to create the value (see the
sketch after this list). Also, I have updated the documentation for
EXTRACT_SUBVECTOR describing what type the constant index should be, and
we now enforce this when creating the node.
2. The AArch64 backend was missing the appropriate patterns for
extracting certain subvectors (nxv4f16 and nxv2f32) from legal SVE types.
I have added them as part of this patch.
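A minimal sketch of the fix for the first issue (variable names are
illustrative):
```
// getVectorIdxConstant creates the index with the target's preferred
// vector-index type instead of an arbitrary integer constant type.
SDValue Idx = DAG.getVectorIdxConstant(ElemIdx * MinNumElts, DL);
SDValue Sub = DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, SubVT, TupleVec, Idx);
```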
The only way that I could find to test the new patterns was to use the
SVE tuple get intrinsics, although I realise it looks a bit unusual.
Tests added here:
test/CodeGen/AArch64/sve-extract-subvector.ll
Differential Revision: https://reviews.llvm.org/D85516
When the result type of insertelement needs to be split,
SplitVecRes_INSERT_VECTOR_ELT will try to store the vector to a
stack temporary, store the element at the location of the stack
temporary plus the index, and reload the Lo/Hi parts.
This patch does the following to ensure this works for scalable vectors:
- Sets the StackID with getStackIDForScalableVectors() in CreateStackTemporary
- Adds an IsScalable flag to getMemBasePlusOffset() and scales the
offset by VScale when this is true
- Ensures the immediate is clamped correctly by clampDynamicVectorIndex
so that we don't try to use an out of range index
Reviewed By: david-arm
Differential Revision: https://reviews.llvm.org/D84874
Changes the Offset arguments to both functions from int64_t to TypeSize
& updates all uses of the functions to create the offset using TypeSize::Fixed()
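For example, a caller that previously passed a plain byte offset now
wraps it explicitly (a sketch with an illustrative offset):
```
// A fixed 16-byte offset; scalable offsets take a different path.
SDValue HiPtr = DAG.getMemBasePlusOffset(BasePtr, TypeSize::Fixed(16), DL);
```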
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D85220
These aren't the canonical forms we'd get from InstCombine, but
we do have X86 tests for them. Recognizing them is pretty cheap.
While there, make use of APInt::isSignedMinValue/isSignedMaxValue
instead of creating a new APInt to compare with. Also use the
SelectionDAG::getAllOnesConstant helper to hide the all-ones
APInt creation.
Follow-up to D82716 / rGea71ba11ab11
We do not have the fabs removal fold in IR yet for the case
where the sqrt operand is repeated, so that's another potential
improvement.
As mentioned on D85463, we should be using SimplifyMultipleUseDemandedBits (which is the default fallback).
The minor regression in illegal-bitfield-loadstore.ll will be addressed properly by D77804.
One of the callers only wants the condition, but the vselect can
be simplified by getNode making it hard or impossible to retrieve
the condition.
Instead, return the condition and make the other 2 callers
responsible for creating the vselect node using the condition.
Rename the function to WidenVSELECTMask accordingly.
Differential Revision: https://reviews.llvm.org/D85468
This allows us to remove extra patterns from AArch64SVEInstrInfo.td
because we can reuse those required for fixed length vectors.
Differential Revision: https://reviews.llvm.org/D85328
We currently don't do anything to fold any_extend vector loads as no target has such an instruction.
Instead I've added support for folding to a zextload; SimplifyDemandedBits does a good job of adjusting the zext(truncate()) stages as required later on.
We still need the custom scalar extload handling instead of using the tryToFoldExtOfLoad helper as it has different legality tests - we can probably tweak that to reduce most of the code duplication.
Fixes the regression I mentioned in rG99a971cadff7
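A sketch of the shape of the added fold (simplified; the real code also
checks profitability and more legality conditions):
```
// Fold (any_extend (load x)) -> (zextload x) when the extending load
// is legal; the top bits are then known zero.
if (ISD::isNON_EXTLoad(N0.getNode()) && N0.hasOneUse() &&
    TLI.isLoadExtLegal(ISD::ZEXTLOAD, VT, N0.getValueType())) {
  LoadSDNode *LN0 = cast<LoadSDNode>(N0);
  SDValue ExtLoad =
      DAG.getExtLoad(ISD::ZEXTLOAD, SDLoc(N), VT, LN0->getChain(),
                     LN0->getBasePtr(), N0.getValueType(),
                     LN0->getMemOperand());
  DAG.ReplaceAllUsesOfValueWith(SDValue(LN0, 1), ExtLoad.getValue(1));
  return ExtLoad;
}
```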
Differential Revision: https://reviews.llvm.org/D85129
This corresponds with the SelectionDAGISel change in D84056.
Also, rename some poorly named tests in CodeGen/X86/fast-isel-fneg.ll (NFC).
Differential Revision: https://reviews.llvm.org/D85149
The custom lowering saves an instruction over the generic expansion, by
taking advantage of the fact that PowerPC shift instructions are well
defined in the shift-by-bitwidth case.
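As a standalone illustration (plain C++ modelling the hardware, not the
PPC lowering itself): when shifting by the bit width is defined to give
zero, the naive three-instruction form is already correct for a zero
shift amount, so no compare/select fixup is needed.
```
#include <cstdint>

// Models a PPC-style 32-bit right shift: any amount >= 32 yields zero,
// so shifting by exactly 32 is well defined (unlike in C++).
static uint32_t srw(uint32_t x, uint32_t amt) {
  return amt >= 32 ? 0 : x >> amt;
}

// fshl(hi, lo, c): when c % 32 == 0 the result must be hi, and indeed
// srw(lo, 32) == 0 makes that case fall out of the naive form.
uint32_t fshl32(uint32_t hi, uint32_t lo, uint32_t c) {
  c &= 31;
  return (hi << c) | srw(lo, 32 - c);
}
```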
Differential Revision: https://reviews.llvm.org/D83948
This patch stops unconditionally transforming FSUB(-0,X) into an FNEG(X) while building the DAG. There is also one small change to handle the new FSUB(-0,X) similarly to FNEG(X) in the AMDGPU backend.
Differential Revision: https://reviews.llvm.org/D84056
Try to be more consistent with the SDLoc param in the TargetLowering methods.
This also exposes an issue where we were passing a SDNode as a SDLoc, relying on the implicit SDLoc(SDNode) constructor.
Just the obvious implementation that rewrites the result type. Also fix a
warning from EXTRACT_SUBVECTOR legalization that triggers on the test.
Differential Revision: https://reviews.llvm.org/D84706
This fixes an assertion failure that was being triggered in
SelectionDAG::getZeroExtendInReg(), where it was trying to extend the <2xi32>
to i64 (which should have been <2xi64>).
Fixes: rdar://66016901
Differential Revision: https://reviews.llvm.org/D84884
This change is mechanical: it just removes the restriction and updates tests. The key building blocks were submitted in 31342eb and 8fe2abc.
Note that this (and preceding changes) entirely subsumes D83965. I did include a couple of its tests.
From the codegen changes, an interesting observation: this doesn't actually reduce spilling, it just lets the register allocator do its job. That results in a slightly different overall result, which has both pros and cons compared to the eager spill lowering. (i.e., we'll have some perf tuning to do once this is stable.)
Change the way we track how a particular pointer was relocated at a statepoint in selection dag. Previously, we used an optional<location> for the spill lowering, and a block local Register for the newly introduced vreg lowering. Combine all three lowerings (norelocate, spill, and vreg) into a single helper class, and keep a single copy of the information.
This is submitted separately as it really does make the code more readable on its own, but the indirect motivation is to move vreg tracking from StatepointLowering to FunctionLoweringInfo. This is the last piece needed to support cross block relocations with vregs; that will follow in a separate (non-NFC) patch.
This builds on 3da1a96 on the path towards supporting invokes and cross block relocations. The actual change attempts to be NFC, but does fail in one corner-case explained below.
The change itself is fairly mechanical. Rather than remember SDValues - which are inherently block local - immediately produce a virtual register copy and remember that.
Once this lands, we'll update the FunctionLoweringInfo::StatepointSpillMap map to allow register based lowerings, delete VirtRegs from StatepointLowering, and drop the restriction against cross block relocations. I deliberately separate the semantic part into its own change for ease of understanding and fault isolation.
The corner-case which isn't quite NFC is that the old implementation implicitly CSEd gc.relocates of the same SDValue regardless of type. The new implementation still only relocates once, but it produces distinct vregs for the bitcast and its source, whereas SelectionDAG's generic CSE was able to remove the bitcast in the old implementation. Note that the final assembly doesn't change (at least in the test), as our MI level optimizations catch the duplication.
I assert that this is an uninteresting corner-case. It's functionally correct, and if we find a case where this influences performance, we should really be canonicalizing types to i8* at the IR level.
Differential Revision: https://reviews.llvm.org/D84692
I have added tests to:
CodeGen/AArch64/sve-intrinsics-int-arith.ll
for doing simple integer add operations on tuple types. Since these
tests introduced new warnings due to incorrect use of
getVectorNumElements() I have also fixed up these warnings in the
same patch. These fixes are:
1. In narrowExtractedVectorBinOp I have changed the code to bail out
early for scalable vector types, since we've not yet hit a case that
proves the optimisations are profitable for scalable vectors.
2. In DAGTypeLegalizer::WidenVecRes_CONCAT_VECTORS I have replaced
calls to getVectorNumElements with getVectorMinNumElements in cases
that work with scalable vectors. For the other cases I have added
asserts that the vector is not scalable because we should not be
using shuffle vectors and build vectors in such cases.
Differential revision: https://reviews.llvm.org/D84016
In DAGTypeLegalizer::SplitVecOp_EXTRACT_SUBVECTOR I have replaced
calls to getVectorNumElements with getVectorMinNumElements, since
this code path works for both fixed and scalable vector types. For
scalable vectors the index will be multiplied by VSCALE.
Fixes warnings in this test:
sve-sext-zext.ll
Differential revision: https://reviews.llvm.org/D83198
Summary:
In parallelizeChainedStores, a TokenFactor was created with more than 3000 operands.
We found that DAGCombiner::visitTokenFactor consumes a huge amount of time on
such nodes. Since the number of operands already exceeds TokenFactorInlineLimit,
we give up on simplification to keep compile time reasonable.
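A sketch of the bail-out (TokenFactorInlineLimit is the existing
DAGCombiner option; simplified):
```
// Give up on very wide TokenFactors: scanning thousands of operands
// in visitTokenFactor dominates compile time.
if (N->getNumOperands() > TokenFactorInlineLimit)
  return SDValue();
```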
Reviewers: spatel, arsenm
Differential Revision: https://reviews.llvm.org/D84204
(Disabled under flag for the moment)
This is part of a larger project wherein we are finally integrating lowering of gc live operands with the register allocator. Today, we force spill all operands in SelectionDAG. The code to do so is distinctly non-optimal. The approach this patch is working towards is to instead lower the relocations directly into the MI form, and let the register allocator pick which ones get spilled and which stack slots they get spilled to. In terms of performance, the latter part is actually more important as it avoids redundant shuffling of values between stack slots.
This particular change adds ISEL support to produce the variadic def STATEPOINT form required by the above. In particular, the first N are lowered to variadic tied def/use pairs. So the new statepoint looks like this:
reloc1,reloc2,... = STATEPOINT ..., base1, derived1<tied-def0>, base2, derived2<tied-def1>, ...
N is limited by the maximal number of tied registers a machine instruction can have (15 at the moment).
The current patch is restricted to handling relocations within a single basic block. Cross block relocations (e.g. invokes) are handled via the legacy mechanism. This restriction will be relaxed in future patches.
Patch By: dantrushin
Differential Revision: https://reviews.llvm.org/D81648
In the included test case the align 16 allowed the v23f32 load to be handled as load v16f32, load v4f32, and load v4f32 (one element not used). These loads all need to be concatenated together into a final vector. In this case we tried to concatenate the two v4f32 loads to match the type of the v16f32 load so we could do a second concat_vectors, but those loads alone only add up to v8f32. So we need two v4f32 undefs to pad it.
It appears we've tried to hack around a similar issue in this code before by adding undef padding to loads in one of the earlier loops in this function: originally in r147964, by padding all loads narrower than previous loads to the same size; later modified in r293088 to only pad the last load. This patch removes that earlier code and just handles it on demand where we know we need it.
Fixes PR46820
Differential Revision: https://reviews.llvm.org/D84463
This adds the llvm.abs(), llvm.umin(), llvm.umax(), llvm.smin(),
and llvm.smax() intrinsics specified in D81829. For SelectionDAG,
the ISD opcodes and all the legalization and lowering already exist,
so this just wires them up to the intrinsic in the SDAG builder and
adds rudimentary tests. For GlobalISel only the min/max intrinsics
are wired up, as llvm.abs() will require the addition of a G_ABS op,
and corresponding legalization support.
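The SDAG-side wiring is essentially a one-line mapping per intrinsic; a
sketch of the shape (simplified from the builder's intrinsic switch):
```
case Intrinsic::umin: {
  SDValue Op1 = getValue(I.getArgOperand(0));
  SDValue Op2 = getValue(I.getArgOperand(1));
  setValue(&I, DAG.getNode(ISD::UMIN, getCurSDLoc(),
                           Op1.getValueType(), Op1, Op2));
  return;
}
```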
Differential Revision: https://reviews.llvm.org/D84125
On systems where size() doesn't return unsigned long, this leads to an
overloading mismatch. Convert the constant to whatever type is used for
Q.size() on the system.
Currently popFromQueueImpl iterates over all candidates to find the best
one. While the candidate queue is small, this is not a problem. But it
becomes a problem once the queue gets larger. For example, the snippet
below takes 330s to compile with llc -O0, but completes in 3s with this
patch.
define void @test(i4000000* %ptr) {
entry:
  store i4000000 0, i4000000* %ptr, align 4
  ret void
}
This patch limits the number of candidates to check to 1000. This limit
ensures that it never triggers for test-suite/SPEC2000/SPEC2006 on X86
and AArch64 with -O3, while still drastically limiting the compile-time
in case of very large queues.
It would be even better to use a binary heap to manage the queue
(D83335), but some heuristics change the score of a node in the queue
after another node has been scheduled. I plan to address this for
backends that use the MachineScheduler in the future, but that requires
a more careful evaluation. In the meantime, the limit should help users
impacted by this issue.
The patch includes a slightly smaller version of the motivating example
as a test case, to guard against the issue.
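A sketch of the bound (names and structure are illustrative):
```
// Stop the linear scan over the ready queue after a fixed number of
// candidates; with very large queues the full scan is quadratic overall.
static constexpr unsigned MaxCandidates = 1000;
unsigned NumChecked = 0;
for (SUnit *Candidate : Queue) {
  if (++NumChecked > MaxCandidates)
    break;
  // ... compare Candidate against the best one found so far ...
}
```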
Reviewers: efriedma, paquette, niravd
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D84328
The AMDGPU handling of f16 vectors is still terrible, since they get
scalarized even when the vector operation is legal.
The code is essentially duplicated between the non-strict and
strict case. Apparently no other expansions are currently trying to do
this. This is mostly because I found the behavior of
getStrictFPOperationAction to be confusing. In the ARM case, it would
expand strict_fsub even though it shouldn't due to the later check. At
that point, the logic required to check for legality was more complex
than just duplicating the 2 instruction expansion.
This isn't a natively supported operation, so convert it to a
mask+compare.
In addition to the operation itself, fix up some surrounding stuff to
make the testcase work: we need concat_vectors on i1 vectors, we need
legalization of i1 vector truncates, and we need to fix up all the
relevant uses of getVectorNumElements().
Differential Revision: https://reviews.llvm.org/D83811
MBBs are not allowed to have non-terminator instructions after the first
terminator. Currently in some cases (see the modified test),
EmitSchedule can add DBG_VALUEs after the last terminator, for example
when referring a debug value that gets folded into a TCRETURN
instruction on ARM.
This patch updates EmitSchedule to move inserted DBG_VALUEs just before
the first terminator. I am not sure if there are terminators that produce
values which can in turn be used by a DBG_VALUE. In that case, moving the
DBG_VALUE might result in referencing an undefined register. But in any
case, it seems like there is currently no way to insert a proper DBG_VALUE
for such registers anyway.
Alternatively it might make sense to just remove those extra DBG_VALUEs.
I am not too familiar with the details of debug info in the backend and
would appreciate any suggestions on how to address the issue in the best
possible way.
Reviewers: vsk, aprantl, jpaquette, efriedma, paquette
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D83561
In an upcoming AMDGPU patch, the scalar cases will be legal and vector
ops should be scalarized, rather than producing a long sequence of
vector ops which will also need to be scalarized.
Use a lazy heuristic that seems to work and improves the thumb2 MVE
test.
When the byref attribute is added, there will need to be two similar
functions for the existing cases which have an associate value copy,
and byref which does not. Most, but not all of the existing uses will
use the existing version.
The associated size function added by D82679 also needs to
contextually differ, and will help eliminate a few places still
relying on pointee element types.
Summary:
This patch modifies IncrementMemoryAddress to use a vscale
when calculating the new address if the data type is scalable.
Also adds tablegen patterns which match an extract_subvector
of a legal predicate type with zip1/zip2 instructions
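A sketch of the address computation (simplified from
TargetLowering::IncrementMemoryAddress):
```
EVT AddrVT = Addr.getValueType();
SDValue Increment;
if (DataVT.isScalableVector()) {
  // Bytes to advance = vscale * known-minimum store size.
  Increment = DAG.getVScale(DL, AddrVT,
                            APInt(AddrVT.getSizeInBits(),
                                  DataVT.getStoreSize().getKnownMinSize()));
} else {
  Increment = DAG.getConstant(DataVT.getStoreSize(), DL, AddrVT);
}
Addr = DAG.getNode(ISD::ADD, DL, AddrVT, Addr, Increment);
```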
Reviewers: sdesmalen, efriedma, david-arm
Reviewed By: efriedma, david-arm
Subscribers: tschuett, hiraditya, psnobl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D83137
The operands of a BUILD_VECTOR must all have the same type, so we can hoist this invariant condition out of the loop.
Differential Revision: https://reviews.llvm.org/D83882
Some of the system registers readable on AArch64 and ARM platforms
return different values with each read (for example a timer counter),
these shouldn't be hoisted outside loops or otherwise interfered with,
but the normal @llvm.read_register intrinsic is only considered to read
memory.
This introduces a separate @llvm.read_volatile_register intrinsic and
maps all system-registers on ARM platforms to use it for the
__builtin_arm_rsr calls. Registers declared with asm("r9") or similar
are unaffected.
The existing code already considered this case. Unfortunately, a typo in
the condition prevented it from triggering. Also, even had it run, the
existing code forgot to do the folding.
This fixes PR42876.
Differential Revision: https://reviews.llvm.org/D65802
ComputeNumSignBits and computeKnownBits both trigger "Scalable flag
may be dropped" warnings when a fixed length vector is extracted
from a scalable vector. This patch assumes nothing about the
demanded elements thus matching the behaviour when extracting a
scalable vector from a scalable vector.
Differential Revision: https://reviews.llvm.org/D83642
In DAGCombiner::TransformFPLoadStorePair we were dropping the scalable
property of TypeSize when trying to create an integer type of equivalent
size. In fact, this optimisation makes no sense for scalable types
since we don't know the size at compile time. I have changed the code
to bail out when encountering scalable type sizes.
I've added a test to
llvm/test/CodeGen/AArch64/sve-fp.ll
that exercises this code path. The test already emits an error if it
encounters warnings due to implicit TypeSize->uint64_t conversions.
Differential Revision: https://reviews.llvm.org/D83572
We have this generic transform in IR (instcombine),
but as shown in PR41098:
http://bugs.llvm.org/PR41098
...the pattern may emerge in codegen too.
x86 has a potential refinement/reversal opportunity here,
but that should come later or needs a target hook to
avoid the transform. Converting to bswap is the more
specific form, so we should use it if it is available.
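For reference, the equivalence being matched, checked exhaustively in
standalone C++ (host code, not the combiner itself): rotating an i16 by
half its width is exactly a byte swap.
```
#include <cassert>
#include <cstdint>

int main() {
  for (uint32_t i = 0; i <= 0xFFFF; ++i) {
    uint16_t x = static_cast<uint16_t>(i);
    uint16_t rot = static_cast<uint16_t>((x << 8) | (x >> 8));
    assert(rot == __builtin_bswap16(x)); // GCC/Clang builtin
  }
  return 0;
}
```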
This carves out an exception for a pair of consecutive loads that are
reversed from the consecutive order of a pair of stores. All of the
existing profitability/legality checks for the memops remain between
the 2 altered hunks of code.
This should give us the same x86 base-case asm that gcc gets in
PR41098 and PR44895:
http://bugs.llvm.org/PR41098
http://bugs.llvm.org/PR44895
I think we are missing a potential subsequent conversion to use "movbe"
if the target supports that. That might be similar to what AArch64
would use to get "rev16".
Differential Revision: https://reviews.llvm.org/D83567
Summary:
Helper used when splitting load & store operations to calculate
the pointer + offset for the high half of the split
Reviewers: efriedma, sdesmalen, david-arm
Reviewed By: efriedma
Subscribers: tschuett, hiraditya, psnobl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D83577
fadd (fma A, B, (fmul C, D)), E --> fma A, B, (fma C, D, E)
This is only allowed when "reassoc" is present on the fadd.
As discussed in D80801, this transform goes beyond
what is allowed by "contract" FMF (-ffp-contract=fast).
That is because we are fusing the trailing add of 'E' with a
multiply, but without "reassoc", the code mandates that the
products A*B and C*D are added together before adding in 'E'.
I've added this example to the LangRef to try to clarify the
meaning of "contract". If that seems reasonable, we should
probably do something similar for the clang docs because
there does not appear to be any formal spec for the behavior
of -ffp-contract=fast.
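A standalone C++ illustration of why "contract" alone is not enough
(host math, not LLVM code; values chosen so the two evaluation orders
round differently):
```
#include <cmath>
#include <cstdio>

int main() {
  double A = 1.0, B = 1e16, C = 1.0, D = -1e16, E = 1.0;
  // The order the unfused code mandates: (A*B + C*D) + E == 1.0 exactly.
  double strict = (A * B + C * D) + E;
  // The order after the transform: A*B + fma(C, D, E). The inner fma
  // rounds -1e16 + 1 once, which is not exactly representable, so the
  // final result differs (0.0 here under round-to-nearest-even).
  double fused = std::fma(A, B, std::fma(C, D, E));
  std::printf("strict=%g fused=%g\n", strict, fused);
  return 0;
}
```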
Differential Revision: https://reviews.llvm.org/D82499
In DAGTypeLegalizer::SetSplitVector I have changed calls in the assert
from getVectorNumElements() to getVectorElementCount(), since this
code path works for both fixed and scalable vectors.
This fixes up one warning in the test:
sve-sext-zext.ll
Differential Revision: https://reviews.llvm.org/D83196
This patch replaces some invalid calls to getVectorNumElements() with calls
to getVectorMinNumElements(), since the code paths changed in this
patch work for both fixed and scalable vector types.
Fixes warnings in this test:
sve-sext-zext.ll
Differential Revision: https://reviews.llvm.org/D83203
Summary:
When legalizing a bitcast operation from an fp16 operand to an i16 on a
target that requires both input and output types to be promoted to
32 bits, an assertion can fail when building the new node due to a
mismatch between the operation's result size and the type specified for
the node.
This patch fixes the issue by making sure the bit widths of the types
match for the FP_TO_FP16 node, covering the difference with an extra
ANY_EXTEND operation.
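A sketch of the shape of the fix (illustrative; the precise types follow
the commit description, shown here for the common case of promotion to
i32):
```
// Build FP_TO_FP16 with a result type whose width matches the node's
// natural result, then cover the promotion to the wider type explicitly.
SDValue Res = DAG.getNode(ISD::FP_TO_FP16, DL, MVT::i16, Op);
Res = DAG.getNode(ISD::ANY_EXTEND, DL, MVT::i32, Res);
```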
Reviewers: ostannard, efriedma, pirama, jmolloy, plotfi
Reviewed By: efriedma
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D82552
This fixes what appears to be a typo introduced in D69275, which could
cause a segmentation fault in getNode.
Reviewed By: uweigand
Differential Revision: https://reviews.llvm.org/D83376
9cac4e6d1403554b06ec2fc9d834087b1234b695/D32628 intended to eliminate
this and move all isel pseudo expansion to FinalizeISel. That change was
a bad rebase or something and failed to actually delete this call.
GlobalISel also has a redundant call of finalizeLowering. However, it
requires more work to remove it since it currently triggers a lot of
verifier errors in tests.
It looks like 9cac4e6d14 accidentally added a second copy of this,
probably from a bad rebase: the second copy was added, but the original
finalizeLowering call was not deleted as intended.
This removes existing code duplication and allows us to
assert that we are handling the expected cases.
We have a list of outstanding bugs that could benefit by
handling truncated source values, so that's a possible
addition going forward.
ExpandVectorBuildThroughStack is also used for CONCAT_VECTORS.
However, when calculating the offsets for each of the operands we
incorrectly use the element size rather than the actual size and thus
the stores overlap.
Differential Revision: https://reviews.llvm.org/D83303
Summary:
The following combine currently breaks in the DAGCombiner:
```
extract_vector_elt (concat_vectors v4i16:a, v4i16:b), x
-> extract_vector_elt a, x
```
This happens because after we have combined these nodes we have inserted nodes
that use individual instances of the vector element type. In the above example
i16. However this isn't a legal type on all backends, and when the combining pass calls
the legalizer it breaks as it expects types to already be legal. The type legalizer has
already been run, and running it again would make a mess of the nodes.
In the example code at least, the generated code is still efficient after the change.
Reviewers: miyuki, arsenm, dmgreen, lebedev.ri
Reviewed By: miyuki, lebedev.ri
Subscribers: lebedev.ri, wdng, hiraditya, steven.zhang, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D83231
In DAGTypeLegalizer::SplitVecRes_ExtendOp I have replaced an invalid
call to getVectorNumElements() with a call to getVectorMinNumElements(),
since the code path works for both fixed and scalable vectors.
This fixes up a warning in the following test:
sve-sext-zext.ll
Differential Revision: https://reviews.llvm.org/D83197
Calling getVectorNumElements() is not safe for scalable vectors and we
should normally use getVectorElementCount() instead. However, for the
code changed in this patch I decided to simply move the instantiation of
the variable 'OutNumElems' lower down to the place where only fixed-width
vectors are used, and hence it is safe to call getVectorNumElements().
Fixes up one warning in this test:
sve-sext-zext.ll
Differential Revision: https://reviews.llvm.org/D83195
`__stack_chk_fail` does not return, but `unreachable` was not generated
following `call __stack_chk_fail`. This could generate an invalid binary
for functions with a return type, because `__stack_chk_fail`'s return
type is void and `call __stack_chk_fail` can be the last instruction in a
function whose return type is non-void. Generating `unreachable` after it
makes sure CFGStackify's `fixEndsAtEndOfFunction` handles it correctly.
Reviewed By: tlively
Differential Revision: https://reviews.llvm.org/D83277
This is inspired by D81648. The basic idea is to have the set of SDValues which are lowered as either constants or direct frame references explicit in one place, and to separate them clearly from the spilling logic.
This is not NFC in that the handling of constants larger than 64 bits has changed. The old lowering would crash on values which could not be encoded as a sign extended 64 bit value. The new lowering just spills all constants wider than 64 bits. We could be consistent about doing the sext(Con64) optimization, but I happen to know that this code path is utterly unexercised in practice, so simple is better for now.
Summary:
When splitting a store of a scalable type, the new address is
calculated in SplitVecOp_STORE using a vscale and an add instruction.
Reviewers: sdesmalen, efriedma, david-arm
Reviewed By: david-arm
Subscribers: tschuett, hiraditya, psnobl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D83041
Summary:
When splitting a load of a scalable type, the new address is
calculated in SplitVecRes_LOAD using a vscale and an add instruction.
This patch also adds a DAG combiner fold to visitADD for vscale:
- Fold (add (vscale(C0)), (vscale(C1))) to (add (vscale(C0 + C1)))
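A sketch of the new fold (simplified):
```
// fold (add (vscale * C0), (vscale * C1)) -> (vscale * (C0 + C1))
if (N0.getOpcode() == ISD::VSCALE && N1.getOpcode() == ISD::VSCALE) {
  const APInt &C0 = N0->getConstantOperandAPInt(0);
  const APInt &C1 = N1->getConstantOperandAPInt(0);
  return DAG.getVScale(SDLoc(N), VT, C0 + C1);
}
```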
Reviewers: sdesmalen, efriedma, david-arm
Reviewed By: david-arm
Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D82792
This patch fixes all remaining warnings in:
llvm/test/CodeGen/AArch64/sve-trunc.ll
llvm/test/CodeGen/AArch64/sve-vector-splat.ll
I hit some warnings related to getCopyToPartsVector. I fixed two
issues:
1. In widenVectorToPartType() we assumed that we'd always be
using BUILD_VECTOR nodes to expand from one vector type to another,
which is incorrect for scalable vector types. I've fixed this for now
by simply bailing out immediately for scalable vectors.
2. In getCopyToPartsVector() I've changed the code to compare
the element counts of different types.
Differential Revision: https://reviews.llvm.org/D83028
X / (fabs(A) * sqrt(Z)) --> X / sqrt(A*A*Z) --> X * rsqrt(A*A*Z)
In the motivating case from PR46406:
https://bugs.llvm.org/show_bug.cgi?id=46406
...this is restoring the sequence that was originally in the source code.
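The algebra behind the fold, using fabs(A) = sqrt(A*A) and the fact that
both factors under the radical are nonnegative:
\[
  \frac{X}{|A|\sqrt{Z}} \;=\; \frac{X}{\sqrt{A^2}\,\sqrt{Z}}
  \;=\; \frac{X}{\sqrt{A^2 Z}} \;=\; X \cdot \mathrm{rsqrt}(A^2 Z)
\]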
We extracted a term from within the sqrt because we do not know in
instcombine whether a target will expand a sqrt call.
Note: we could say that the transform in IR should be restricted, but
that would not solve the problem if the source was originally in the
pattern shown here.
This is a gray area for fast-math-flag requirements. I think we should at
least check fast-math-flags on the fdiv and fmul because I view this
transform as 2 pieces: reassociate the fmul operands and form reciprocal
from the fdiv (as with the existing transform). We could argue that the
sqrt also needs FMF, but that was not required before, so we should change
that in a follow-up patch if that seems better.
We don't currently have a way to check that the target will produce a sqrt
or recip estimate without actually creating nodes (the APIs are SDValue
getSqrtEstimate() and SDValue getRecipEstimate()), so we clean up
speculatively created nodes if we are not able to create an estimate.
The x86 test with doubles verifies that we are not changing a test with
no estimate sequence.
Differential Revision: https://reviews.llvm.org/D82716
Use a simpler code sequence when the shift amount is known not to be
zero modulo the bit width.
Nothing much uses this until D77152 changes the translation of fshl and
fshr intrinsics.
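A standalone C++ model of the difference (host code; the target lowering
is analogous):
```
#include <cstdint>

// General expansion: must special-case c % 32 == 0, because lo >> 32 is
// undefined in C++ (and unspecified on many targets).
uint32_t fshl_general(uint32_t hi, uint32_t lo, uint32_t c) {
  c &= 31;
  return c == 0 ? hi : (hi << c) | (lo >> (32 - c));
}

// When c is known to be non-zero modulo 32, the compare/select vanishes.
uint32_t fshl_nonzero(uint32_t hi, uint32_t lo, uint32_t c) {
  c &= 31; // caller guarantees the masked amount is non-zero
  return (hi << c) | (lo >> (32 - c));
}
```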
Differential Revision: https://reviews.llvm.org/D82540
Using a negation instead of a subtraction from a constant can save an
instruction on some targets.
Nothing much uses this until D77152 changes the translation of fshl and
fshr intrinsics.
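A standalone C++ sketch of the idea for a rotate: the amounts
(32 - c) & 31 and -c & 31 coincide for every c, and the negated form
avoids materializing the bit-width constant on targets where negation is
cheaper.
```
#include <cstdint>

uint32_t rotl_sub(uint32_t x, uint32_t c) {
  return (x << (c & 31)) | (x >> ((32 - c) & 31)); // subtract from 32
}

uint32_t rotl_neg(uint32_t x, uint32_t c) {
  return (x << (c & 31)) | (x >> (-c & 31));       // negate instead
}
```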
Differential Revision: https://reviews.llvm.org/D82539