llvm-project

Commit Graph

Author	SHA1	Message	Date
Stephen Tozer	e10493eb50	[DebugInfo] Correctly track SDNode dependencies for list debug values During SelectionDAG, we must track the SDNodes that each SDDbgValue depends on to compute its value. These are ultimately derived from the location operands to the SDDbgValue, but were stored in a separate vector prior to this patch. This resulted in cases where one of the lists was updated incorrectly, resulting in crashes during compilation. This patch fixes the issue by directly recomputing the dependency list from the SDDbgOperands in getDependencies(). Differential Revision: https://reviews.llvm.org/D99423	2021-04-08 17:01:45 +01:00
Craig Topper	67953311e2	[SelectionDAG] Teach SelectionDAG::FoldConstantArithmetic to handle SPLAT_VECTOR This allows FoldConstantArithmetic to handle SPLAT_VECTOR in addition to BUILD_VECTOR. This allows it to support scalable vectors. I'm also allowing fixed length SPLAT_VECTOR which is used by some targets, but I'm not familiar enough to write tests for those targets. I had to block this function from running on CONCAT_VECTORS to avoid calling getNode for a CONCAT_VECTORS of 2 scalars. This can happen because the 2 operand getNode calls this function for any opcode. Previously we were protected because CONCAT_VECTORs of BUILD_VECTOR is folded to a larger BUILD_VECTOR before that call. But it's not always possible to fold a CONCAT_VECTORS of SPLAT_VECTORs, and we don't even try. This fixes PR49781 where DAG combine thought constant folding should be possible, but FoldConstantArithmetic couldn't do it. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D99682	2021-04-07 10:03:33 -07:00
Yevgeny Rouban	3e738afae4	[Statepoint Lowering] Allow other than N byte sized types in deopt bundle I do not see any bit-width restriction from the point of the LLVM Lang Ref - Operand Bundles on the types of the deopt bundle operands. Statepoint Lowering seems to be able to work with any types. This patch relaxes the two related assertions and adds a new test for this change. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D100006	2021-04-07 17:48:31 +07:00
Philip Reames	fb41cae039	More precisely type code used for gc.relocate assertions [nfc]	2021-04-06 11:27:36 -07:00
Simon Pilgrim	ddbb58736a	[KnownBits] Rename KnownBits::computeForMul to KnownBits::mul. NFCI. As promised in D98866	2021-04-06 10:11:41 +01:00
Nikita Popov	665065821e	[FastISel] Remove kill tracking This is a followup to D98145: As far as I know, tracking of kill flags in FastISel is just a compile-time optimization. However, I'm not actually seeing any compile-time regression when removing the tracking. This probably used to be more important in the past, before FastRA was switched to allocate instructions in reverse order, which means that it discovers kills as a matter of course. As such, the kill tracking doesn't really seem to serve a purpose anymore, and just adds additional complexity and potential for errors. This patch removes it entirely. The primary changes are dropping the hasTrivialKill() method and removing the kill arguments from the emitFast methods. The rest is mechanical fixup. Differential Revision: https://reviews.llvm.org/D98294	2021-04-03 15:50:13 +02:00
Simon Pilgrim	4ea5475a3f	[KnownBits] Add KnownBits::haveNoCommonBitsSet helper. NFCI. Include exhaustive test coverage.	2021-04-02 21:44:33 +01:00
Jun Ma	274ac9d40e	[AArch64][SVE] Lowering sve.dot to DOT node Differential Revision: https://reviews.llvm.org/D99699	2021-04-02 20:05:17 +08:00
Sander de Smalen	0f7bbbc481	Always emit error for wrong interfaces to scalable vectors, unless cmdline flag is passed. In order to bring up scalable vector support in LLVM incrementally, we introduced behaviour to emit a warning, instead of an error, when asking the wrong question of a scalable vector, like asking for the fixed number of elements. This patch puts that behaviour under a flag. The default behaviour is that the compiler will always error, which means that all LLVM unit tests and regression tests will now fail when a code-path is taken that still uses the wrong interface. The behaviour to demote an error to a warning can be individually enabled for tools that want to support experimental use of scalable vectors. This patch enables that behaviour when driving compilation from Clang. This means that for users who want to try out scalable-vector support, fixed-width codegen support, or build user-code with scalable vector intrinsics, Clang will not crash and burn when the compiler encounters such a case. This allows us to do away with the following pattern in many of the SVE tests: RUN: .... 2>%t RUN: cat %t | FileCheck --check-prefix=WARN WARN-NOT: warning: ... The behaviour to emit warnings is only temporary and we expect this flag to be removed in the future when scalable vector support is more stable. This patch also fixes the following tests: unittests: ScalableVectorMVTsTest.SizeQueries SelectionDAGAddressAnalysisTest.unknownSizeFrameObjects AArch64SelectionDAGTest.computeKnownBitsSVE_ZERO_EXTEND_VECTOR_INREG regression tests: Transforms/InstCombine/vscale_gep.ll Reviewed By: paulwalker-arm, ctetreau Differential Revision: https://reviews.llvm.org/D98856	2021-04-02 10:55:22 +01:00
Simon Pilgrim	77d625f8d8	[DAG] MergeInnerShuffle with BinOps - sometimes accept undef mask elements If the inner shuffle already contains undef elements, then accept them in the merged shuffle as well. This helps some X86 HADD/SUB patterns where slow targets were ending up with HADD/SUB because the (un)merged shuffles were stuck either side of the ADD/SUB - meaning we ended up with a total cost much higher than the "2*shuffle+add" that a slow target usually expands a HADD/SUB to.	2021-04-01 14:33:00 +01:00
Simonas Kazlauskas	777a58e05b	Support {S,U}REMEqFold before legalization This allows these optimisations to apply to e.g. `urem i16` directly before `urem` is promoted to i32 on architectures where i16 operations are not intrinsically legal (such as on Aarch64). The legalization then later can happen more directly and generated code gets a chance to avoid wasting time on computing results in types wider than necessary, in the end. Seems like mostly an improvement in terms of results at least as far as x86_64 and aarch64 are concerned, with a few regressions here and there. It also helps in preventing regressions in changes like {D87976}. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D88785	2021-04-01 01:35:41 +03:00
Craig Topper	9e00b6660d	[SelectionDAG] Remove unneeded vector resize from the end of FoldConstantArithmetic. NFC There's an assert right before that makes sure the size already matches. Earlier in this function's life, scalars and vectors shared more code.	2021-03-31 12:33:10 -07:00
Tomas Matheson	a9968c0a33	[NFC][CodeGen] Tidy up TargetRegisterInfo stack realignment functions Currently needsStackRealignment returns false if canRealignStack returns false. This means that the behavior of needsStackRealignment does not correspond to its name and description; a function might need stack realignment, but if it is not possible then this function returns false. Furthermore, needsStackRealignment is not virtual and therefore some backends have made use of canRealignStack to indicate whether a function needs stack realignment. This patch attempts to clarify the situation by separating them and introducing new names: - shouldRealignStack - true if there is any reason the stack should be realigned - canRealignStack - true if we are still able to realign the stack (e.g. we can still reserve/have reserved a frame pointer) - hasStackRealignment = shouldRealignStack && canRealignStack (not target customisable) Targets can now override shouldRealignStack to indicate that stack realignment is required. This change will make it easier in a future change to handle the case where we need to realign the stack but can't do so (for example when the register allocator creates an aligned spill after the frame pointer has been eliminated). Differential Revision: https://reviews.llvm.org/D98716 Change-Id: Ib9a4d21728bf9d08a545b4365418d3ffe1af4d87	2021-03-30 17:31:39 +01:00
Bradley Smith	9745dce8c3	[SelectionDAG][AArch64][SVE] Perform SETCC condition legalization in LegalizeVectorOps This is currently performed in SelectionDAGLegalize, here we make it also happen in LegalizeVectorOps, allowing a target to lower the SETCC condition codes first in LegalizeVectorOps and then lower to a custom node afterwards, without having to duplicate all of the SETCC condition legalization in the target specific lowering. As a result of this, fixed length floating point SETCC nodes can now be properly lowered for SVE. Differential Revision: https://reviews.llvm.org/D98939	2021-03-29 15:32:25 +01:00
Florian Hahn	eb3d9f2eb6	[SelDag] Add isIntOrFPConstant helper function. This patch adds a new isIntOrFPConstant helper function to check if a SDValue is an integer or FP constant. This pattern is used in various places. There also are places that incorrectly just check for integer constants, e.g. D99384, so hopefully this helper will help people avoid that issue. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D99428	2021-03-28 12:48:58 +01:00
David Sherwood	748ae5281d	[IR][SVE] Add new llvm.experimental.stepvector intrinsic This patch adds a new llvm.experimental.stepvector intrinsic, which takes no arguments and returns a linear integer sequence of values of the form <0, 1, ...>. It is primarily intended for scalable vectors, although it will work for fixed width vectors too. It is intended that later patches will make use of this new intrinsic when vectorising induction variables, currently only supported for fixed width. I've added a new CreateStepVector method to the IRBuilder, which will generate a call to this intrinsic for scalable vectors and fall back on creating a ConstantVector for fixed width. For scalable vectors this intrinsic is lowered to a new ISD node called STEP_VECTOR, which takes a single constant integer argument as the step. During lowering this argument is set to a value of 1. The reason for this additional argument at the codegen level is because in future patches we will introduce various generic DAG combines such as mul step_vector(1), 2 -> step_vector(2) add step_vector(1), step_vector(1) -> step_vector(2) shl step_vector(1), 1 -> step_vector(2) etc. that encourage a canonical format for all targets. This hopefully means all other targets supporting scalable vectors can benefit from this too. I've added cost model tests for both fixed width and scalable vectors: llvm/test/Analysis/CostModel/AArch64/neon-stepvector.ll llvm/test/Analysis/CostModel/AArch64/sve-stepvector.ll as well as codegen lowering tests for fixed width and scalable vectors: llvm/test/CodeGen/AArch64/neon-stepvector.ll llvm/test/CodeGen/AArch64/sve-stepvector.ll See this thread for discussion of the intrinsic: https://lists.llvm.org/pipermail/llvm-dev/2021-January/147943.html	2021-03-23 10:43:35 +00:00
Craig Topper	2f13e63f9e	[LegalizeDAG] Add asserts to verify the types of custom legalized operation matches the original node. We've messed this up a few times recently on RISCV. Experiments with these asserts found a couple issues on other targets as well. They've all been cleaned up now so we can put in these asserts to catch future issues. I had to waive Glue because ADDC/ADDE/etc legalization replaces Glue with i32 on at least AArch64. X86 used to do the same before we switched to ADDCARRY. So I guess that's just how that works. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D98979	2021-03-22 10:28:51 -07:00
Craig Topper	30080b003e	[DAGCombiner] Minor compile time improvement to (sext_in_reg (sign_extend_vector_inreg x)) optimization. Don't bother calling ComputeNumSignBits if N00Bits < ExtVTBits. No matter what answer we get back this will be true: (N00Bits - DAG.ComputeNumSignBits(N00, DemandedSrcElts)) < ExtVTBits) So we might as well save the computation. This makes the code more consistent with the similar (sext_in_reg (sext x)) handling above.	2021-03-21 11:16:41 -07:00
Simon Pilgrim	64c2641c89	[DAG] Limit (sext_in_reg (zero_extend_vector_inreg x)) to exact sign extension As commented by @craig.topper on rG1ba5c550d418, we can't guarantee that we'll be extending zero bits, just sign bit. So, revert to the old code for zero_extend_vector_inreg cases.	2021-03-21 14:01:37 +00:00
Simon Pilgrim	9d2df96407	[DAG] computeKnownBits - add ISD::MULHS/MULHU/SMUL_LOHI/UMUL_LOHI handling Reuse the existing KnownBits multiplication code to handle the 'extend + multiply + extract high bits' pattern for multiply-high ops. Noticed while looking at the codegen for D88785 / D98587 - the patch helps division-by-constant expansion code in particular, which suggests that we might have some further KnownBits div/rem cases we could handle - but this was far easier to implement. Differential Revision: https://reviews.llvm.org/D98857	2021-03-19 16:02:31 +00:00
Simon Pilgrim	ffb2887103	[DAG] Fold shuffle(bop(shuffle(x,y),shuffle(z,w)),undef) -> bop(shuffle'(x,y),shuffle'(z,w)) Followup to D96345, handle unary shuffles of binops (as well as binary shuffles) if we can merge the shuffle with inner operand shuffles. Differential Revision: https://reviews.llvm.org/D98646	2021-03-19 14:14:56 +00:00
Craig Topper	182b831aeb	[DAGCombiner][RISCV] Teach visitMGATHER/MSCATTER to remove gather/scatters with all zeros masks that use SPLAT_VECTOR. Previously only all zeros BUILD_VECTOR was recognized.	2021-03-18 15:34:14 -07:00
Simon Pilgrim	1ba5c550d4	[DAG] Improve folding (sext_in_reg (*_extend_vector_inreg x)) -> (sext_vector_inreg x) Extend this to support ComputeNumSignBits of the (used) source vector elements so that we can handle more than just the case where we're sext_in_reg from the source element signbit. Noticed while investigating the poor codegen in D98587.	2021-03-18 15:34:53 +00:00
Simon Pilgrim	b1afa187c8	[DAG] SelectionDAG::isSplatValue - add ISD::ABS handling Add ISD::ABS to the existing unary instructions handling for splat detection This is similar to D83605, but doesn't appear to need to touch any of the wasm refactoring. Differential Revision: https://reviews.llvm.org/D98778	2021-03-18 10:28:29 +00:00
Stephen Tozer	3bfddc2593	Reapply "[DebugInfo] Handle multiple variable location operands in IR" Fixed section of code that iterated through a SmallDenseMap and added instructions in each iteration, causing non-deterministic code; replaced SmallDenseMap with MapVector to prevent non-determinism. This reverts commit `01ac6d1587`.	2021-03-17 16:45:25 +00:00
Hans Wennborg	01ac6d1587	Revert "[DebugInfo] Handle multiple variable location operands in IR" This caused non-deterministic compiler output; see comment on the code review. > This patch updates the various IR passes to correctly handle dbg.values with a > DIArgList location. This patch does not actually allow DIArgLists to be produced > by salvageDebugInfo, and it does not affect any pass after codegen-prepare. > Other than that, it should cover every IR pass. > > Most of the changes simply extend code that operated on a single debug value to > operate on the list of debug values in the style of any_of, all_of, for_each, > etc. Instances of setOperand(0, ...) have been replaced with > replaceVariableLocationOp, which takes the value that is being replaced as an > additional argument. In places where this value isn't readily available, we have > to track the old value through to the point where it gets replaced. > > Differential Revision: https://reviews.llvm.org/D88232 This reverts commit `df69c69427`.	2021-03-17 13:36:48 +01:00
serge-sans-paille	6e040a19db	[NFC] Wisely nest dyn_cast in FunctionLoweringInfo Take advantage of the inheritance tree to avoid a few comparisons.	2021-03-16 10:22:44 +01:00
Fraser Cormack	0035decae7	[CodeGen] Fix issues with scalable-vector INSERT/EXTRACT_SUBVECTORs This patch addresses a few issues when dealing with scalable-vector INSERT_SUBVECTOR and EXTRACT_SUBVECTOR nodes. When legalizing in DAGTypeLegalizer::SplitVecRes_INSERT_SUBVECTOR, we store the low and high halves to the stack separately. The offset for the high half was calculated incorrectly. Additionally, we can optimize this process when we can detect that the subvector is contained entirely within the low/high split vector type. While this optimization is valid on scalable vectors, when performing the 'high' optimization, the subvector must also be a scalable vector. Note that the 'low' optimization is still conservative: it may be possible to insert v2i32 into the low half of a split nxv1i32/nxv1i32, but we can't guarantee it. It is always possible to insert v2i32 into nxv2i32 or v2i32 into nxv4i32+2 as we know vscale is at least 1. Lastly, in SelectionDAG::isSplatValue, we early-exit on the extracted subvector value type being a scalable vector, forgetting that we can also extract a fixed-length vector from a scalable one. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D98495	2021-03-15 17:04:21 +00:00
Craig Topper	5b825433d7	[DAGCombiner] Optimize 1-bit smulo to AND+SETNE. A 1-bit smulo overflows if both inputs are -1 since the result should be +1 which can't be represented in a signed 1 bit value. We can detect this with an AND and a setcc. The multiply result can also use the same AND. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D97634	2021-03-13 09:39:36 -08:00
Craig Topper	2ea7014089	[DAGCombiner] Use isConstantSplatVectorAllZeros/Ones instead of isBuildVectorAllZeros/Ones in visitMSTORE and visitMLOAD. This allows us to optimize when the mask is a splat_vector in addition to build_vector.	2021-03-12 12:14:56 -08:00
LemonBoy	cfe69c8efd	[SelectionDAG] Improve scalarization of irregular vector types Use a more general strategy when splitting a vector into scalar parts (and vice-versa) to correctly handle vector types whose element size is not a power of 2 (and a multiple of 8). Reviewed By: atanasyan Differential Revision: https://reviews.llvm.org/D98273	2021-03-11 19:57:13 +01:00
Stephen Tozer	f40976bd01	Revert "[DebugInfo] Use variadic debug values to salvage BinOps and GEP instrs with non-const operands" This reverts commit `c0f3dfb9f1`. Reverted due to an error on the clang-x64-windows-msvc buildbot.	2021-03-11 14:48:01 +00:00
gbtozers	c0f3dfb9f1	[DebugInfo] Use variadic debug values to salvage BinOps and GEP instrs with non-const operands This patch improves salvageDebugInfoImpl by allowing it to salvage arithmetic operations with two or more non-const operands; this includes the GetElementPtr instruction, and most Binary Operator instructions. These salvages produce DIArgList locations and are only valid for dbg.values, as currently variadic DIExpressions must use DW_OP_stack_value. This functionality is also only added for salvageDebugInfoForDbgValues; other functions that directly call salvageDebugInfoImpl (such as in ISel or Coroutine frame building) can be updated in a later patch. Differential Revision: https://reviews.llvm.org/D91722	2021-03-11 13:33:49 +00:00
Serguei Katkov	0480927712	[Statepoint Lowering] Handle the case with several gc.result Recently gc.result has been marked with readnone instead of readonly and this opens a door for different optimization to duplicate gc.result. Statepoint lowering is not ready to see several gc.results. The problem appears when there are gc.results with one located in the same basic block and another located in other basic block. In this case we need both export VR and fill local setValue. Note that this case is not sufficient optimization done before CodeGen. It is evident that local gc.result dominates all other gc.results and it is handled by GVN and EarlyCSE. But anyway, even if IR is not optimal Backend should not crash on a valid IR. Reviewers: reames, dantrushin Reviewed By: dantrushin Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D98393	2021-03-11 18:44:44 +07:00
Quentin Colombet	66dab2fa84	[NFC] Fix compiler warnings Fix warnings caused by -Wrange-loop-analysis. Patch by Xiaoqing Wu <xiaoqing_wu@apple.com> Differential Revision: https://reviews.llvm.org/D98298	2021-03-10 11:03:50 -08:00
Craig Topper	9106d04554	[RISCV][SelectionDAG] Introduce an ISD::SPLAT_VECTOR_PARTS node that can represent a splat of 2 i32 values into a nxvXi64 vector for riscv32. On riscv32, i64 isn't a legal scalar type but we would like to support scalable vectors of i64. This patch introduces a new node that can represent a splat made of multiple scalar values. I've used this new node to solve the current crashes we experience when getConstant is used after type legalization. For RISCV, we are now default expanding SPLAT_VECTOR to SPLAT_VECTOR_PARTS when needed and then handling the SPLAT_VECTOR_PARTS later during LegalizeOps. I've removed the special case I previously put in for ABS for D97991 as the default expansion is now able to successfully use getConstant. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D98004	2021-03-10 09:46:18 -08:00
Jinzheng Tu	481079e284	[NFC] Unify FIME with FIXME in comments There are 5 occurrences of FIME and 15333 of FIXME. All of them should be FIXME. Reviewed By: alexfh Differential Revision: https://reviews.llvm.org/D98321	2021-03-10 14:00:51 +01:00
Serguei Katkov	2fccd1b00a	[Statepoint Lowering] Fix the crash with gc.relocate in a separate block If it was decided to relocate a derived pointer using the spill, its value is not exported in the general case. When gc.relocate is located in a different block than the statepoint, we cannot get the SD for the derived value, but for the spill case it is not required at all. However, the implementation of gc.relocate lowering unconditionally requests the SD value, which triggers the assert. The CL fixes this by handling the spill case before the SD is actually required. Reviewers: reames, dantrushin Reviewed By: dantrushin Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D98324	2021-03-10 19:51:04 +07:00
Nikita Popov	55ae279ba7	[FastISel] Don't trivially kill extractvalues (PR49467) All extractvalues of the same value at the same index will map to the same register, so even if one specific extractvalue only has one use, we should not mark it as a trivial kill, as there may be more extractvalues later. Fixes https://bugs.llvm.org/show_bug.cgi?id=49467. Differential Revision: https://reviews.llvm.org/D98145	2021-03-09 18:46:38 +01:00
gbtozers	df69c69427	[DebugInfo] Handle multiple variable location operands in IR This patch updates the various IR passes to correctly handle dbg.values with a DIArgList location. This patch does not actually allow DIArgLists to be produced by salvageDebugInfo, and it does not affect any pass after codegen-prepare. Other than that, it should cover every IR pass. Most of the changes simply extend code that operated on a single debug value to operate on the list of debug values in the style of any_of, all_of, for_each, etc. Instances of setOperand(0, ...) have been replaced with replaceVariableLocationOp, which takes the value that is being replaced as an additional argument. In places where this value isn't readily available, we have to track the old value through to the point where it gets replaced. Differential Revision: https://reviews.llvm.org/D88232	2021-03-09 16:44:38 +00:00
gbtozers	5491a86f59	[DebugInfo] Emit DBG_VALUE_LIST from ISel This patch completes ISel support for DIArgList dbg.values by allowing SDDbgValues with multiple location operands to be emitted as DBG_VALUE_LIST instructions. The primary change of this patch is refactoring EmitDbgValue by pulling location operand emission out to the new function AddDbgValueLocationOps, which is used for both DIArgList and single value dbg.values. Outside of that, the only behaviour change is that the scheduler has a lambda added, HasUnknownVReg, to prevent us from attempting to emit a DBG_VALUE_LIST before all of its used VRegs have become available. Differential Revision: https://reviews.llvm.org/D88592	2021-03-09 12:17:39 +00:00
Cullen Rhodes	2750f3ed31	[IR] Introduce llvm.experimental.vector.splice intrinsic This patch introduces a new intrinsic @llvm.experimental.vector.splice that constructs a vector of the same type as the two input vectors, based on an immediate where the sign of the immediate distinguishes two variants. A positive immediate specifies an index into the first vector and a negative immediate specifies the number of trailing elements to extract from the first vector. For example: @llvm.experimental.vector.splice(<A,B,C,D>, <E,F,G,H>, 1) ==> <B, C, D, E> ; index @llvm.experimental.vector.splice(<A,B,C,D>, <E,F,G,H>, -3) ==> <B, C, D, E> ; trailing element count These intrinsics support both fixed and scalable vectors, where the former is lowered to a shufflevector to maintain existing behaviour, although while marked as experimental the recommended way to express this operation for fixed-width vectors is to use shufflevector. For scalable vectors where it is not possible to express a shufflevector mask for this operation, a new ISD node has been implemented. This is one of the named shufflevector intrinsics proposed on the mailing-list in the RFC at [1]. Patch by Paul Walker and Cullen Rhodes. [1] https://lists.llvm.org/pipermail/llvm-dev/2020-November/146864.html Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D94708	2021-03-09 10:44:22 +00:00
gbtozers	93b170ea24	[DebugInfo] Handle dbg.values with multiple variable location operands in ISel This patch adds partial support in Instruction Selection for dbg.values that use a DIArgList. This patch does not add support for producing DBG_VALUE_LIST, but adds the logic for processing DIArgLists within the ISel pass. This change is largely focused on handleDebugValue and some of the functions that it calls. Outside of this, salvageDebugInfo and transferDbgValues have been modified to replace individual operands instead of the entire value; dangling debug info for variadic debug values is not currently supported (but may be added later). Differential Revision: https://reviews.llvm.org/D88589	2021-03-09 09:48:03 +00:00
Jessica Paquette	f7d73a6b9e	[SelectionDAG] Don't scalarize vector fpround sources that don't need it. Similar to the workaround code in ScalarizeVecRes_UnaryOp, ScalarizeVecRes_SETCC , ScalarizeVecRes_VSELECT, etc. If we have a case like this: ``` define <1 x half> @func(<1 x float> %x) { %tmp = fptrunc <1 x float> %x to <1 x half> ret <1 x half> %tmp } ``` On AArch64, the <1 x float> is legal. So, this will crash if we call GetScalarizedVector on it. Differential Revision: https://reviews.llvm.org/D98208	2021-03-08 14:37:33 -08:00
Stephen Tozer	c0450af559	Fix: [DebugInfo] Support representation of multiple location operands in SDDbgValue Removes a "default" label from a fully covered switch, causing errors on -Wcovered-switch-default builds.	2021-03-08 19:14:12 +00:00
gbtozers	9525af7b91	[DebugInfo] Support representation of multiple location operands in SDDbgValue This patch modifies the class that represents debug values during ISel, SDDbgValue, to support multiple location operands (to represent a dbg.value that uses a DIArgList). Part of this class's functionality has been split off into a new class, SDDbgOperand. The new class SDDbgOperand represents a single value, corresponding to an SSA value or MachineOperand in the IR and MIR respectively. Members of SDDbgValue that were previously related to that specific value (as opposed to the variable or DIExpression), such as the Kind enum, have been moved to SDDbgOperand. SDDbgValue now contains an array of SDDbgOperand instead, allowing it to hold more than one of these values. All changes outside SDDbgValue are simply updates to use the new interface. Differential Revision: https://reviews.llvm.org/D88585	2021-03-08 18:45:17 +00:00
gbtozers	e5d958c456	[DebugInfo] Support DIArgList in DbgVariableIntrinsic This patch updates DbgVariableIntrinsics to support use of a DIArgList for the location operand, resulting in a significant change to its interface. This patch does not update all IR passes to support multiple location operands in a dbg.value; the only change is to update the DbgVariableIntrinsic interface and its uses. All code outside of the intrinsic classes assumes that an intrinsic will always have exactly one location operand; they will still support DIArgLists, but only if they contain exactly one Value. Among other changes, the setOperand and setArgOperand functions in DbgVariableIntrinsic have been made private. This is to prevent code from setting the operands of these intrinsics directly, which could easily result in incorrect/invalid operands being set. This does not prevent these functions from being called on a debug intrinsic at all, as they can still be called on any CallInst pointer; it is assumed that any code directly setting the operands on a generic call instruction is doing so safely. The intention for making these functions private is to prevent DIArgLists from being overwritten by code that's naively trying to replace one of the Values it points to, and also to fail fast if a DbgVariableIntrinsic is updated to use a DIArgList without a valid corresponding DIExpression.	2021-03-08 14:36:13 +00:00
Craig Topper	0eb405c3b8	[SelectionDAG] Add computeKnownBits support for ISD::USUBSAT. The result of ISD::USUBSAT will never be larger than the LHS. We can use this to put a bound on the number of leading zeros. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D98133	2021-03-07 09:48:42 -08:00
LemonBoy	2ec43e4167	[LegalizeDAG] Implement promotion rules for SELECT_CC Implement the promotion rule for SELECT_CC nodes by upcasting all the parameters and downcasting the result. The AArch64 target makes use of this rule and, since it was not implemented, in some cases the instruction selector would hit an assertion upon encountering the illegal node. This patch requires D97840, the included test cases hit both problems. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D97859	2021-03-05 18:22:55 +01:00
Craig Topper	ad532be012	[SelectionDAG] Assert that operands to SelectionDAG::getNode are not DELETED_NODE to catch issues like PR49393 earlier. I'm not sure this would catch all such issues, but it would catch some. The problem for PR49393 was that we were holding a reference to a node that wasn't connected to the DAG across a function that could delete unused nodes. In this particular case we managed to try to use the deleted node while it was in the deleted state before its memory got recycled. It could also happen that we delete the node, something allocates a new node which recycles the memory. Then we try to use the reference we were holding and it is now a completely different node with a different valid opcode. This patch would not catch that. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D97969	2021-03-04 23:05:32 -08:00
Craig Topper	74e6030bcb	[TargetLowering] Use HandleSDNodes to prevent nodes from being deleted by recursive calls in getNegatedExpression. For binary or ternary ops we call getNegatedExpression multiple times and then compare costs. While we're doing this we need to hold a node from the first call across the second call, but it's not yet attached to the DAG. It's possible the second call creates an identical node and then decides it didn't need it so will try to delete it if it has no uses. This can cause a reference to the node we're holding further up the call stack to become invalidated. To prevent this, we can use a HandleSDNode to artificially give the node a use without connecting it to the DAG. I've used a std::list of HandleSDNodes so we can create handles only when we have a node to hold. HandleSDNode does not have a default constructor and cannot be copied or moved. Fixes PR49393. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D97914	2021-03-04 22:48:25 -08:00
Akira Hatanaka	1900503595	[ObjC][ARC] Use operand bundle 'clang.arc.attachedcall' instead of explicitly emitting retainRV or claimRV calls in the IR This reapplies `ed4718eccb`, which was reverted because it was causing a miscompile. The bug that was causing the miscompile has been fixed in `75805dce5f`. Original commit message: Background: This fixes a longstanding problem where llvm breaks ARC's autorelease optimization (see the link below) by separating calls from the marker instructions or retainRV/claimRV calls. The backend changes are in https://reviews.llvm.org/D92569. https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue What this patch does to fix the problem: - The front-end adds operand bundle "clang.arc.attachedcall" to calls, which indicates the call is implicitly followed by a marker instruction and an implicit retainRV/claimRV call that consumes the call result. In addition, it emits a call to @llvm.objc.clang.arc.noop.use, which consumes the call result, to prevent the middle-end passes from changing the return type of the called function. This is currently done only when the target is arm64 and the optimization level is higher than -O0. - ARC optimizer temporarily emits retainRV/claimRV calls after the calls with the operand bundle in the IR and removes the inserted calls after processing the function. - ARC contract pass emits retainRV/claimRV calls after the call with the operand bundle. It doesn't remove the operand bundle on the call since the backend needs it to emit the marker instruction. The retainRV and claimRV calls are emitted late in the pipeline to prevent optimization passes from transforming the IR in a way that makes it harder for the ARC middle-end passes to figure out the def-use relationship between the call and the retainRV/claimRV calls (which is the cause of PR31925). 
- The function inliner removes an autoreleaseRV call in the callee if nothing in the callee prevents it from being paired up with the retainRV/claimRV call in the caller. It then inserts a release call if claimRV is attached to the call since autoreleaseRV+claimRV is equivalent to a release. If it cannot find an autoreleaseRV call, it tries to transfer the operand bundle to a function call in the callee. This is important since the ARC optimizer can remove the autoreleaseRV returning the callee result, which makes it impossible to pair it up with the retainRV/claimRV call in the caller. If that fails, it simply emits a retain call in the IR if retainRV is attached to the call and does nothing if claimRV is attached to it. - SCCP refrains from replacing the return value of a call with a constant value if the call has the operand bundle. This ensures the call always has at least one user (the call to @llvm.objc.clang.arc.noop.use). - This patch also fixes a bug in replaceUsesOfNonProtoConstant where multiple operand bundles of the same kind were being added to a call. Future work: - Use the operand bundle on x86-64. - Fix the auto upgrader to convert call+retainRV/claimRV pairs into calls with the operand bundles. rdar://71443534 Differential Revision: https://reviews.llvm.org/D92808	2021-03-04 11:22:30 -08:00
Simon Pilgrim	7d3d9fe8cd	[DAG] TargetLowering::BuildUDIV - use APInt as const ref. NFCI. Fixes clang-tidy warning.	2021-03-04 12:15:08 +00:00
Craig Topper	90b7825598	[LegalizeVectorTypes] Remove a tautological compare.	2021-03-03 23:26:00 -08:00
Hans Wennborg	0a5dd06718	Revert "[ObjC][ARC] Use operand bundle 'clang.arc.attachedcall' instead of explicitly emitting retainRV or claimRV calls in the IR" This caused miscompiles of Chromium tests for iOS due to clobbering of live registers. See discussion on the code review for details. > Background: > > This fixes a longstanding problem where llvm breaks ARC's autorelease > optimization (see the link below) by separating calls from the marker > instructions or retainRV/claimRV calls. The backend changes are in > https://reviews.llvm.org/D92569. > > https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue > > What this patch does to fix the problem: > > - The front-end adds operand bundle "clang.arc.attachedcall" to calls, > which indicates the call is implicitly followed by a marker > instruction and an implicit retainRV/claimRV call that consumes the > call result. In addition, it emits a call to > @llvm.objc.clang.arc.noop.use, which consumes the call result, to > prevent the middle-end passes from changing the return type of the > called function. This is currently done only when the target is arm64 > and the optimization level is higher than -O0. > > - ARC optimizer temporarily emits retainRV/claimRV calls after the calls > with the operand bundle in the IR and removes the inserted calls after > processing the function. > > - ARC contract pass emits retainRV/claimRV calls after the call with the > operand bundle. It doesn't remove the operand bundle on the call since > the backend needs it to emit the marker instruction. The retainRV and > claimRV calls are emitted late in the pipeline to prevent optimization > passes from transforming the IR in a way that makes it harder for the > ARC middle-end passes to figure out the def-use relationship between > the call and the retainRV/claimRV calls (which is the cause of > PR31925). 
> > - The function inliner removes an autoreleaseRV call in the callee if > nothing in the callee prevents it from being paired up with the > retainRV/claimRV call in the caller. It then inserts a release call if > claimRV is attached to the call since autoreleaseRV+claimRV is > equivalent to a release. If it cannot find an autoreleaseRV call, it > tries to transfer the operand bundle to a function call in the callee. > This is important since the ARC optimizer can remove the autoreleaseRV > returning the callee result, which makes it impossible to pair it up > with the retainRV/claimRV call in the caller. If that fails, it simply > emits a retain call in the IR if retainRV is attached to the call and > does nothing if claimRV is attached to it. > > - SCCP refrains from replacing the return value of a call with a > constant value if the call has the operand bundle. This ensures the > call always has at least one user (the call to > @llvm.objc.clang.arc.noop.use). > > - This patch also fixes a bug in replaceUsesOfNonProtoConstant where > multiple operand bundles of the same kind were being added to a call. > > Future work: > > - Use the operand bundle on x86-64. > > - Fix the auto upgrader to convert call+retainRV/claimRV pairs into > calls with the operand bundles. > > rdar://71443534 > > Differential Revision: https://reviews.llvm.org/D92808 This reverts commit `ed4718eccb`.	2021-03-03 15:51:40 +01:00
Craig Topper	543b901e58	[LegalizeVectorTypes] Improve SplitVecRes_INSERT_SUBVECTOR to handle subvector being in the high half of the split or not at element 0 of the low half. This function isn't exercised in lit tests today according to the code coverage report. But will be after the tests in D97543 and D97559. Posting this patch to help with a crash that Fraser hit. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D97582	2021-03-02 21:14:13 -08:00
Sanjay Patel	415c67ba4c	[SDAG] allow partial undef vector constants with select->logic folds This is an enhancement suggested in the original review/commit: D97730 / `7fce3322a2`	2021-03-02 14:29:15 -05:00
Sanjay Patel	7fce3322a2	[SDAG] allow vector types for select->logic folds This prepares codegen for a change that will remove the identical folds from IR because they are not poison-safe. See D93065 / D97360 for details. We already generically support scalar types, and there are various target-specific transforms that overlap the vector folds. For example, x86 recognizes the and patterns, but not or. We can end up with 1 extra instruction there, but I think that is still preferred over the blendv alternative that loads a constant vector. If this is not optimal, then it should be fixed with a later transform (this change is not expected to result in any regressions because InstCombine currently does the same thing). Removing custom code and supporting undefs in constant-pattern-matching can be follow-up changes. Differential Revision: https://reviews.llvm.org/D97730	2021-03-02 09:25:10 -05:00
Simon Pilgrim	c0d4b44e6a	[DAG] DAGCombiner::tryStoreMergeOfLoads - remove unused StartAddress variable. NFCI. Noticed in "initialization is never read" clang-tidy warning - the only StartAddress set/used is inside the load combine loop.	2021-03-02 13:29:31 +00:00
Sanjay Patel	154c47dc06	[SDAG] add helper for select->logic folds; NFC This set of transforms should be extended to handle vector types.	2021-03-01 16:24:15 -05:00
Craig Topper	e745f7c563	[LegalizeTypes] Improve ExpandIntRes_XMULO codegen. The code previously used two BUILD_PAIRs to concatenate the two UMULO results with 0s in the lower bits to match original VT. Then it created an ADD and a UADDO with the original bit width. Each of those operations needs to be expanded since they have illegal types. Since we put 0s in the lower bits before the ADD, the lower half of the ADD result will be 0. So the lower half of the UADDO result is solely determined by the other operand. Since the UADDO needs to be split in half, we don't really need an operation for the lower bits. Unfortunately, we don't see that in type legalization and end up creating something more complicated and DAG combine or lowering aren't always able to recover it. This patch directly generates the narrower ADD and UADDO to avoid needing to legalize them. Now only the MUL is done on the original type. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D97440	2021-03-01 09:54:32 -08:00
Simon Pilgrim	9dd83f5ee8	[DAG] visitVECTOR_SHUFFLE - attempt to match commuted shuffles with MergeInnerShuffle. Try to match "shuffle(C, shuffle(A, B, M0), M1) -> shuffle(A, B, M2)" etc. by using MergeInnerShuffle's commuted inner shuffle mode.	2021-03-01 10:42:11 +00:00
Fraser Cormack	6718fda6ad	[CodeGen] Fix issues with subvector intrinsic index types This patch addresses issues arising from the fact that the index type used for subvector insertion/extraction is inconsistent between the intrinsics and SDNodes. The intrinsic forms require i64 whereas the SDNodes use the type returned by SelectionDAG::getVectorIdxTy. Rather than update the intrinsic definitions to use an overloaded index type, this patch fixes the issue by transforming the index to the correct type as required. Any loss of index bits going from i64 to a smaller type is unexpected, and will be caught by an assertion in SelectionDAG::getVectorIdxConstant. The patch also updates the documentation for INSERT_SUBVECTOR and adds an assertion to its creation to bring it in line with EXTRACT_SUBVECTOR. This necessitated changes to AArch64 which was using i64 for EXTRACT_SUBVECTOR but i32 for INSERT_SUBVECTOR. Only one test changed its codegen after updating the backend accordingly. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D97459	2021-03-01 10:28:21 +00:00
Serguei Katkov	65fb706231	[Statepoint Lowering] Consider dead deopt gc values together with other gc values Currently dead gc values mentioned in the deopt section are not listed in the gc section and so are processed separately. With this CL all deopt gc values are considered as base pointers and processed in the same way as other gc values. The fact that a deopt gc pointer is a base pointer was used all the time but it is explicitly documented here by putting the value in SI.Base. The idea of the patch comes from Philip Reames. Reviewers: reames, dantrushin Reviewed By: reames Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D97554	2021-03-01 17:23:02 +07:00
Simon Pilgrim	64c41301ce	[DAG] visitVECTOR_SHUFFLE - move shuffle canonicalization/merges all under the same legality test. NFCI. Minor cleanup to move related combines closer together to make it more coherent, without changing the ordering.	2021-03-01 09:42:00 +00:00
Serguei Katkov	06c5119c76	[Statepoint lowering] Require spill of deopt value in case its type is not legal If the type of the deopt operand has an illegal type and we want to use register for it then it needs to be legalized. This is not supported currently by legalizer and it is not actually clear how to legalize this type of values. Instead we just spill such values and use spill slot location in statepoint. Originally tests were created by Philip Reames. Reviewers: reames, dantrushin Reviewed By: reames Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D97541	2021-03-01 10:23:53 +07:00
Craig Topper	5de09ef02e	[DAGCombiner][X86] Don't peek through ANDs on the shift amount in matchRotateSub when called from MatchFunnelPosNeg. Peeking through AND is only valid if the input to both shifts is the same. If the inputs are different, then the original pattern ORs the two values when the masked shift amount is 0. This is ok if the values are the same since the OR would be a NOP, which is why it's ok for rotate. Fixes PR49365 and reverts PR34641. Differential Revision: https://reviews.llvm.org/D97637	2021-02-28 12:58:00 -08:00
Craig Topper	ca5247bb17	[DAGCombiner] Don't skip no overflow check on UMULO if the first computeKnownBits call doesn't return any 0 bits. Even if the first computeKnownBits call doesn't have any zero bits it is possible the other operand has bitwidth-1 leading zero. In that case overflow is still impossible. So always call computeKnownBits for both operands.	2021-02-28 08:26:22 -08:00
Heejin Ahn	aa097ef8d4	[WebAssembly] Fix reverse mapping in WasmEHFuncInfo D97247 added the reverse mapping from unwind destination to their source, but it had a critical bug; sources can be multiple, because multiple BBs can have a single BB as their unwind destination. This changes `WasmEHFuncInfo::getUnwindSrc` to `getUnwindSrcs` and makes it return a vector rather than a single BB. It does not return the const reference to the existing vector but creates a new vector because `WasmEHFuncInfo` stores not `BasicBlock` or `MachineBasicBlock` but `PointerUnion` of them. Also I hoped to unify those methods for `BasicBlock` and `MachineBasicBlock` into one using templates to reduce duplication, but failed because various usages require `BasicBlock*` to be `const` but it's hard to make it `const` for `MachineBasicBlock` usages. Fixes https://github.com/emscripten-core/emscripten/issues/13514. (More precisely, fixes https://github.com/emscripten-core/emscripten/issues/13514#issuecomment-784708744) Reviewed By: dschuff, tlively Differential Revision: https://reviews.llvm.org/D97583	2021-02-26 17:12:10 -08:00
Craig Topper	eea53b142d	[DAGCombiner] Optimize SMULO/UMULO if we can prove that overflow is impossible. Using ComputeNumSignBits or computeKnownBits we might be able to determine that overflow is impossible. This especially helps after type legalization if the type was promoted from a type with half the bits or more. Type legalization conservatively creates a promoted smulo/umulo and an overflow check for the promoted bits. The overflow from the promoted smulo/umulo is ORed with the result of the promoted bits overflow check. Proving that the promoted smulo/umulo can never overflow will leave us with just the promoted bits overflow check. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D97160	2021-02-26 14:50:03 -08:00
Simon Pilgrim	aefe8f2f6c	[DAG] Fold vXi1 multiplies -> and This allows us to remove X86 custom lowering of vXi1 MUL, which helps simplify a load of mask math. Mentioned in D97478 post review.	2021-02-26 11:46:12 +00:00
Simon Pilgrim	73adc26ac0	[DAG] expandAddSubSat - break if-else chain. NFCI. Fix styleguide issue - each if() block always returns so we don't need to make them an if-else chain.	2021-02-26 11:02:08 +00:00
Simon Pilgrim	9490b9f14b	[DAG] Move simplification of SADDSAT/SSUBSAT/UADDSAT/USUBSAT of vXi1 to getNode() As discussed on D97276 we should be able to always do this in node creation, we don't need a combine.	2021-02-25 17:49:26 +00:00
David Sherwood	87dbcd8865	[CodeGen] Canonicalise adds/subs of i1 vectors using XOR When calling SelectionDAG::getNode() to create an ADD or SUB of two vectors with i1 element types we can canonicalise this to use XOR instead, where 1+1 is treated as wrapping around to 0 and 0-1 wraps to 1. I've added the following tests for SVE targets: CodeGen/AArch64/sve-pred-arith.ll and modified some X86 tests to reflect the much simpler codegen required. Differential Revision: https://reviews.llvm.org/D97276	2021-02-25 10:31:26 +00:00
Craig Topper	fe50be12c8	[LegalizeIntegerTypes] Further improve ExpandIntRes_SADDSUBO for targets where SADDO/SSUBO aren't supported. Rather than converting 3 signbits to bools and comparing them, we can do bitwise logic on the whole vector and convert the resulting sign bit to a bool at the end. This is still a different algorithm than what we do in LegalizeDAG through expandSADDOSSUBO. That algorithm needs to know that the RHS of SSUBO is > 0, but that's costly when the type is split. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D97325	2021-02-24 10:05:38 -08:00
Simon Pilgrim	8082bfe7e5	[DAG] Add basic mul-with-overflow constant folding support As noticed on D97160	2021-02-24 11:09:02 +00:00
Craig Topper	cb6fc4b0a3	[LegalizeIntegerTypes] Use GetExpandedInteger instead of SplitInteger in ExpandIntRes_XMULO. We know the input is going to be expanded as well, so we should just ask for the already expanded operands. Otherwise we create nodes that are just going to need to be legalized.	2021-02-23 23:53:45 -08:00
Heejin Ahn	ea8c6375e3	[WebAssembly] Fix incorrect grouping and sorting of exceptions This CL is not big but contains changes that span multiple analyses and passes. This description is very long because it tries to explain basics on what each pass/analysis does and why we need this change on top of that. Please feel free to skip parts that are not necessary for your understanding. --- `WasmEHFuncInfo` contains the mapping of <EH pad, the EH pad's next unwind destination>. The value (unwind dest) here is where an exception should end up when it is not caught by the key (EH pad). We record this info in WasmEHPrepare to fix catch mismatches, because the CFG itself does not have this info. A CFG only contains BBs and predecessor-successor relationship between them, but in `WasmEHFuncInfo` the unwind destination BB is not necessarily a successor of the key EH pad BB. Their relationship can be intuitively explained by this C++ code snippet: ``` try { try { foo(); } catch (int) { // EH pad ... } } catch (...) { // unwind destination } ``` So when `foo()` throws, it goes to `catch (int)` first. But if it is not caught by it, it ends up in the next unwind destination `catch (...)`. This unwind destination is what you see in `catchswitch`'s `unwind label %bb` part. --- `WebAssemblyExceptionInfo` groups exceptions so that they can be sorted continuously together in CFGSort, as we do for loops. What this analysis does is very simple: it creates a single `WebAssemblyException` per EH pad, and all BBs that are dominated by that EH pad are included in this exception. We also identify subexception relationship in this way: if EHPad A dominates EHPad B, EHPad B's exception is a subexception of EHPad A's exception. This simple rule turns out to be incorrect in some cases. In `WasmEHFuncInfo`, if EHPad A's unwind destination is EHPad B, it means semantically EHPad B should not be included in EHPad A's exception, because it does not make sense to rethrow/delegate to an inner scope. 
This is what happened in CFGStackify as a result of this: ``` try try catch ... <- %dest_bb is among here! end delegate %dest_bb ``` So this patch adds a phase in `WebAssemblyExceptionInfo::recalculate` to make sure exceptions' unwind destinations are not subexceptions of their unwind sources in `WasmEHFuncInfo`. But this alone does not prevent `dest_bb` in the example above from being sorted within the inner `catch`'s exception, even if its exception is not a subexception of that `catch`'s exception anymore, because of how CFGSort works, which will be explained below. --- CFGSort places BBs within the same `SortRegion` (loop or exception) continuously together so they can be demarcated with `loop`-`end_loop` or `catch`-`end_try` in CFGStackify. `SortRegion` is a wrapper for one of `MachineLoop` or `WebAssemblyException`. `SortRegionInfo` already does some complicated things because there are discrepancies between those two data structures. `WebAssemblyException` is what we control, and it is defined as an EH pad as its header and BBs dominated by the header as its BBs (with a newly added exception of unwind destinations explained in the previous paragraph). But `MachineLoop` is an LLVM data structure and uses the standard loop detection algorithm. So by the algorithm, a loop contains the BBs that are 1. dominated by the loop header and 2. have a path back to its header. Because of the second condition, many BBs that are dominated by the loop header are not included in the loop. So BBs that contain `return` or branches to outside of the loop are not technically included in `MachineLoop`, but they can be sorted together with the loop with no problem. Maybe to relax the condition, in CFGSort, when we are in a `SortRegion` we allow sorting of not only BBs that belong to the current innermost region but also BBs that are dominated by the current region header. (This was written this way from the first version written by Dan, when only loops existed.) 
But now, we have cases in exceptions when EHPad B is the unwind destination for EHPad A, even if EHPad B is dominated by EHPad A it should not be included in EHPad A's exception, and should not be sorted within EHPad A. One way to make things work, at least correctly, is to change the `dominates` condition to a `contains` condition for `SortRegion` when sorting BBs, but this will change compilation results for existing non-EH code and I can't be sure it will not degrade performance or code size. I think it will degrade performance because it will force many BBs dominated by a loop, which don't have the path back to the header, to be placed after the loop and it will likely create more branches and blocks. So this does a little hacky check when adding BBs to `Preferred` list: (`Preferred` list is a ready list. CFGSort maintains ready list in two priority queues: `Preferred` and `Ready`. I'm not very sure why, but it was written that way from the beginning. BBs are first added to `Preferred` list and then some of them are pushed to `Ready` list, so here we only need to guard condition for `Preferred` list.) When adding a BB to `Preferred` list, we check if that BB is an unwind destination of another BB. To do this, this adds the reverse mapping, `UnwindDestToSrc`, and getter methods to `WasmEHFuncInfo`. And if the BB is an unwind destination, it checks if the current stack of regions (`Entries`) contains its source BB by traversing the stack backwards. If we find its unwind source in there, we add the BB to its `Deferred` list, to make sure that unwind destination BB is added to `Preferred` list only after that region with the unwind source BB is sorted and popped from the stack. --- This does not contain a new test that crashes because of this bug, but this fix changes the result for one of the existing test cases. This test case didn't crash because it fortunately didn't contain `delegate` to the incorrectly placed unwind destination BB. 
Fixes https://github.com/emscripten-core/emscripten/issues/13514. Reviewed By: dschuff, tlively Differential Revision: https://reviews.llvm.org/D97247	2021-02-23 14:54:55 -08:00
Craig Topper	eb165090bb	[LegalizeIntegerTypes] Improve ExpandIntRes_SADDSUBO codegen on targets without SADDO/SSUBO. This code creates 3 setccs that need to be expanded. It was creating a sign bit test as setge X, 0 which is non-canonical. Canonical would be setgt X, -1. This misses the special case in IntegerExpandSetCCOperands for sign bit tests that assumes canonical form. If we don't hit this special case we end up with a multipart setcc instead of just checking the sign of the high part. To fix this I've reversed the polarity of all of the setccs to setlt X, 0 which is canonical. The rest of the logic should still work. This seems to produce better code on RISCV which lacks a setgt instruction. This probably still isn't the best code sequence we could use here. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D97181	2021-02-23 09:40:32 -08:00
Heejin Ahn	a08e609d2e	[WebAssembly] Rename methods in WasmEHFuncInfo (NFC) This renames variable and method names in `WasmEHFuncInfo` class to be simpler and clearer. For example, unwind destinations are EH pads by definition so it doesn't necessarily need to be included in every method name. Also I am planning to add the reverse mapping in a later CL, something like `UnwindDestToSrc`, so this renaming will make meanings clearer. Reviewed By: dschuff Differential Revision: https://reviews.llvm.org/D97173	2021-02-22 12:16:11 -08:00
Kazu Hirata	ffba9e596d	[CodeGen] Use range-based for loops (NFC)	2021-02-21 19:58:07 -08:00
Craig Topper	1a6c1ac686	[SelectionDAG][RISCV] Teach ComputeNumSignBits to handle SREM. This also removes a pattern from RISCV that is no longer needed since the sexti32 on the LHS of the srem in the pattern implies the result is sign extended so the sign_extend_inreg should be removed in DAG combine now. Reviewed By: luismarques, RKSimon Differential Revision: https://reviews.llvm.org/D97133	2021-02-21 11:13:36 -08:00
Simon Pilgrim	38ab47c813	[DAG] Match USUBSAT patterns through zext/trunc This patch handles usubsat patterns hidden through zext/trunc and uses the getTruncatedUSUBSAT helper to determine if the USUBSAT can be correctly performed in the truncated form: zext(x) >= y ? x - trunc(y) : 0 --> usubsat(x,trunc(umin(y,SatLimit))) zext(x) > y ? x - trunc(y) : 0 --> usubsat(x,trunc(umin(y,SatLimit))) Based on original examples: void foo(unsigned short *p, int max, int n) { int i; unsigned m; for (i = 0; i < n; i++) { m = *--p; *p = (unsigned short)(m >= max ? m-max : 0); } } Differential Revision: https://reviews.llvm.org/D25987	2021-02-21 15:26:54 +00:00
Kazu Hirata	0b417ba20f	[CodeGen] Use range-based for loops (NFC)	2021-02-20 21:46:02 -08:00
Simon Pilgrim	761bbed264	[DAG] foldSubToUSubSat - fold sub(a,trunc(umin(zext(a),b))) -> usubsat(a,trunc(umin(b,SatLimit))) This moves the last custom x86 USUBSAT fold to generic DAGCombine. Completes PR40111 Differential Revision: https://reviews.llvm.org/D96703	2021-02-20 12:02:07 +00:00
Simon Pilgrim	5d3930bb8f	[DAG] visitTRUNCATE - attempt to truncate USUBSAT Fold trunc(usubsat(zext(x),y)) -> usubsat(x,trunc(umin(y,satlimit)))	2021-02-19 14:26:05 +00:00
Simon Pilgrim	53e83afcaf	[DAG] getTruncatedUSUBSAT - always truncate operands. NFCI. As noticed on D96703, we're always truncating the operands so should use getNode(ISD::TRUNCATE) instead of getZExtOrTrunc.	2021-02-18 21:28:55 +00:00
Guozhi Wei	66f2d09ebf	[DAGCombiner] Transform (zext (select c, load1, load2)) -> (select c, zextload1, zextload2) If extload is legal, following transform (zext (select c, load1, load2)) -> (select c, zextload1, zextload2) can save one ext instruction. Differential Revision: https://reviews.llvm.org/D95086	2021-02-18 13:15:20 -08:00
Bradley Smith	8bad8a43c3	[AArch64][SVE] Add patterns to generate FMLA/FMLS/FNMLA/FNMLS/FMAD Adjust generateFMAsInMachineCombiner to return false if SVE is present in order to combine fmul+fadd into fma. Also add new pseudo instructions so as to select the most appropriate of FMLA/FMAD depending on register allocation. Depends on D96599 Differential Revision: https://reviews.llvm.org/D96424	2021-02-18 16:55:16 +00:00
Craig Topper	61d4d9a5d3	[TableGen][SelectionDAG] Improve efficiency of encoding negative immediates for isel's CheckInteger opcode. CheckInteger uses an int64_t encoded using a variable width encoding that is optimized for encoding a number with a lot of leading zeros. Negative numbers have no leading zeros so use the largest encoding requiring 9 bytes. I believe it's most likely we want to check for positive and negative numbers near 0. -1 is quite common due to its use in the 'not' idiom. To optimize for this, we can borrow an idea from the bitcode format and move the sign bit to bit 0 with the magnitude stored in the upper bits. This will drastically increase the number of leading zeros for small magnitudes. Then we can run this value through VBR encoding. This gives a small reduction in the table size on all in tree targets except VE where size increased by about 300 bytes due to intrinsic ids now requiring 3 bytes instead of 2. Since the intrinsic enum space is shared by all targets this is an unfortunate consequence of where VE is currently located in the range. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D96317	2021-02-18 08:53:17 -08:00
Simon Pilgrim	87fbc06d06	[DAG] Pull out getTruncatedUSUBSAT helper from foldSubToUSubSat. NFCI. This will simplify an incoming generic implementation of D25987. I'll rebase D96703 shortly to support this.	2021-02-17 12:17:08 +00:00
Simon Pilgrim	05c64ea672	[DAG] Fold shuffle(bop(shuffle(x,y),shuffle(z,w)),bop(shuffle(a,b),shuffle(c,d))) (REAPPLIED) Fold shuffle(bop(shuffle(x,y),shuffle(z,w)),bop(shuffle(a,b),shuffle(c,d))) -> bop(shuffle(x,y),shuffle(z,w)),bop(shuffle(a,b),shuffle(c,d)) Attempt to fold from a shuffle of a pair of binops to a binop of shuffles, as long as one/both of the binop sources are also shuffles that can be merged with the outer shuffle. This should guarantee that we remove one binop without introducing any additional shuffles. Technically there's potential for a merged shuffle's lowering to be poorer than the original shuffle, but it could also be better, and I'm not seeing any regressions as long as we keep the 'don't merge splats' rule already present in MergeInnerShuffle. This expands and generalizes an existing X86 combine and attempts to merge either of each binop's sources (with an on-the-fly commutation of the shuffle mask) - we couldn't do that in the x86 version as it had to stay in a form that DAGCombine's MergeInnerShuffle would still recognise. Fixes issue raised by @saugustine in rG5aa8f4c0843a where we were failing to replace null shuffle operands from MergeInnerShuffle to UNDEFs. Differential Revision: https://reviews.llvm.org/D96345	2021-02-17 11:42:43 +00:00
Sterling Augustine	5aa8f4c084	Revert "[DAG] Fold shuffle(bop(shuffle(x,y),shuffle(z,w)),bop(shuffle(a,b),shuffle(c,d)))" This reverts commit `5dfba562dd`. That commit causes an assertion failure with the following repro: typedef long b __attribute__((__vector_size__(16))); b d; b e; b __attribute__((__always_inline__)) c(b h, b i) { return (__attribute__((__vector_size__(8 sizeof(short)))) short)h + i; } j() { b k, l, m, n, o[6], p, q; m = d[5]; b r = m; b s = f(r, 8); q = s; l = d[1]; p = l; t(q); n = c(m, l); o[1] = c(s, f(p, 8)); k = __builtin_shufflevector(n, o[1], 0, 2); e = __builtin_ia32_psrlwi128(k, j); } ./bin/clang -cc1 -triple x86_64-grtev4-linux-gnu -emit-obj -O1 -std=c99 test.c	2021-02-16 12:48:15 -08:00
Simon Pilgrim	df45c18135	[DAG] PromoteIntRes_ADDSUBSHLSAT - promote ISD::UADDSAT as clamped add Similar to D96622, we're better off just promoting uaddsat(x,y) -> umin(add(x,y),c) instead of trying to perform a shifted uaddsat. I initially tried to just use shifted promotion in cases where we didn't have a legal/custom umin - but we don't appear to have any targets that have uaddsat but not umin, so imo we're better off always using the umin and avoid an untested shifted uaddsat code path. Differential Revision: https://reviews.llvm.org/D96767	2021-02-16 17:37:44 +00:00
Craig Topper	064ada4ec6	[SelectionDAG][AArch64] Restrict matchUnaryPredicate to only handle SPLAT_VECTOR for scalable vectors. `fde2466171` added support for scalable vectors to matchUnaryPredicate by handling SPLAT_VECTOR in addition to BUILD_VECTOR. This was used to enable UDIV/SDIV/UREM/SREM by constant expansion in BuildUDIV/BuildSDIV in TargetLowering.cpp. The caller there expects to call getBuildVector from the match factors. This leads to a crash right now if there is a SPLAT_VECTOR of fixed vectors since the number of vectors won't match the number of elements. To fix this, this patch updates the callers to check the opcode instead of whether the type is fixed or scalable. This assumes that only 3 opcodes are handled by matchUnaryPredicate so I've added an assertion to the final else to check that opcode. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D96174	2021-02-16 09:22:46 -08:00
Simon Pilgrim	5dfba562dd	[DAG] Fold shuffle(bop(shuffle(x,y),shuffle(z,w)),bop(shuffle(a,b),shuffle(c,d))) Fold shuffle(bop(shuffle(x,y),shuffle(z,w)),bop(shuffle(a,b),shuffle(c,d))) -> bop(shuffle(x,y),shuffle(z,w)),bop(shuffle(a,b),shuffle(c,d)) Attempt to fold from a shuffle of a pair of binops to a binop of shuffles, as long as one/both of the binop sources are also shuffles that can be merged with the outer shuffle. This should guarantee that we remove one binop without introducing any additional shuffles. Technically there's potential for a merged shuffle's lowering to be poorer than the original shuffle, but it could also be better, and I'm not seeing any regressions as long as we keep the 'don't merge splats' rule already present in MergeInnerShuffle. This expands and generalizes an existing X86 combine and attempts to merge either of each binop's sources (with an on-the-fly commutation of the shuffle mask) - we couldn't do that in the x86 version as it had to stay in a form that DAGCombine's MergeInnerShuffle would still recognise. Differential Revision: https://reviews.llvm.org/D96345	2021-02-16 15:46:34 +00:00
Simon Pilgrim	420420de57	[DAG] Avoid APInt copies by directly using the APInt reference from getAPIntValue. NFCI.	2021-02-16 13:50:34 +00:00
Simon Pilgrim	dd879f7dc9	[DAG] Use APInt::extractBits instead of lshr().trunc(). NFCI. Avoids so many APInt instances by directly using the APInt reference from getAPIntValue.	2021-02-16 13:50:33 +00:00
Craig Topper	eb75f250fe	[RISCV][LegalizeTypes] Try to expand BITREVERSE before promoting if the promoted BITREVERSE would expand anyway. If we're going to end up expanding anyway, we should do it early so we don't create extra operations to handle the bytes added by promotion. Something similar was done for BSWAP previously. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D96681	2021-02-15 12:33:16 -08:00
Simon Pilgrim	e47f21da61	[DAG] visitVSELECT - move OpLHS == LHS into inner if() in USUBSAT matching. NFCI. This will be necessary for the update of D25987 where we'll need to match OpLHS against other ops.	2021-02-15 18:27:00 +00:00
Caroline Concatto	2d728bbff5	[CodeGen][SelectionDAG]Add new intrinsic experimental.vector.reverse This patch adds a new intrinsic experimental.vector.reverse that takes a single vector and returns a vector of matching type but with the original lane order reversed. For example: ``` vector.reverse(<A,B,C,D>) ==> <D,C,B,A> ``` The new intrinsic supports fixed and scalable vector types. The fixed-width vector relies on shufflevector to maintain existing behaviour. Scalable vector uses the new ISD node - VECTOR_REVERSE. This new intrinsic is one of the named shufflevector intrinsics proposed on the mailing-list in the RFC at [1]. Patch by Paul Walker (@paulwalker-arm). [1] https://lists.llvm.org/pipermail/llvm-dev/2020-November/146864.html Differential Revision: https://reviews.llvm.org/D94883	2021-02-15 13:39:43 +00:00
Arlo Siemsen	080866470d	Add ehcont section support In the future Windows will enable Control-flow Enforcement Technology (CET aka shadow stacks). To protect the path where the context is updated during exception handling, the binary is required to enumerate valid unwind entrypoints in a dedicated section which is validated when the context is being set during exception handling. This change allows llvm to generate the section that contains the appropriate symbol references in the form expected by the msvc linker. This feature is enabled through a new module flag, ehcontguard, which was modelled on the cfguard flag. The change includes a test that when the module flag is enabled the section is correctly generated. The set of exception continuation information includes returns from exceptional control flow (catchret in llvm). In order to collect catchret we: 1) Include an additional flag on machine basic blocks to indicate that the given block is the target of a catchret operation, 2) Introduce a new machine function pass to insert and collect symbols at the start of each block, and 3) Combine these targets with the other EHCont targets that were already being collected. Change originally authored by Daniel Frampton <dframpto@microsoft.com> For more details, see MSVC documentation for `/guard:ehcont` https://docs.microsoft.com/en-us/cpp/build/reference/guard-enable-eh-continuation-metadata Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D94835	2021-02-15 14:27:12 +08:00
Simon Pilgrim	6f5a805bbb	[DAG] Fold i1/vXi1 saddsat/uaddsat(x,y) -> or(x,y) Alive2: https://alive2.llvm.org/ce/z/FzcrpH	2021-02-13 15:02:01 +00:00
Simon Pilgrim	0df15e5eff	[DAG] Fold i1/vXi1 ssubsat/usubsat(x,y) -> and(x,~y) Alive2: https://alive2.llvm.org/ce/z/4nkNGh	2021-02-13 13:21:15 +00:00
Simon Pilgrim	60ba5397df	[DAG] PromoteIntRes_ADDSUBSHLSAT - use promoted ISD::USUBSAT directly As discussed on D96413, as long as the promoted bits of the args are zero we can use the basic ISD::USUBSAT pattern directly, without the shifting like we do for other ops. I think something similar should be possible for ISD::UADDSAT as well, which I'll look at later. Also, create a ISD::USUBSAT node directly - this will be expanded back by the legalizer later on if necessary. Differential Revision: https://reviews.llvm.org/D96622	2021-02-13 12:35:10 +00:00
Simon Pilgrim	7ad0c573bd	[DAG] Fix shift amount limit in SimplifyDemandedBits trunc(shift(x,c)) to truncated bitwidth We lost this in D56387/rG69bc0990a9181e6eb86228276d2f59435a7fae67 - where I got the src/dst bitwidths mixed up and assumed getValidShiftAmountConstant would catch it. Patch by @craig.topper - confirmed by @Carrot that it fixes PR49162	2021-02-13 12:00:08 +00:00
Simon Pilgrim	4841a225b7	[DAG] Move basic USUBSAT pattern matches from X86 to DAGCombine Begin transitioning the X86 vector code to recognise sub(umax(a,b) ,b) or sub(a,umin(a,b)) USUBSAT patterns to make it more generic and available to all targets. This initial patch just moves the basic umin/umax patterns to DAG, removing some vector-only checks on the way - these are some of the patterns that the legalizer will try to expand back to so we can be reasonably relaxed about matching these pre-legalization. We can handle the trunc(sub(..))) variants as well, which helps with patterns where we were promoting to a wider type to detect overflow/saturation. The remaining x86 code requires some cleanup first - some of it isn't actually tested etc. I also need to resurrect D25987. Differential Revision: https://reviews.llvm.org/D96413	2021-02-12 18:22:57 +00:00
Akira Hatanaka	ed4718eccb	[ObjC][ARC] Use operand bundle 'clang.arc.attachedcall' instead of explicitly emitting retainRV or claimRV calls in the IR Background: This fixes a longstanding problem where llvm breaks ARC's autorelease optimization (see the link below) by separating calls from the marker instructions or retainRV/claimRV calls. The backend changes are in https://reviews.llvm.org/D92569. https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue What this patch does to fix the problem: - The front-end adds operand bundle "clang.arc.attachedcall" to calls, which indicates the call is implicitly followed by a marker instruction and an implicit retainRV/claimRV call that consumes the call result. In addition, it emits a call to @llvm.objc.clang.arc.noop.use, which consumes the call result, to prevent the middle-end passes from changing the return type of the called function. This is currently done only when the target is arm64 and the optimization level is higher than -O0. - ARC optimizer temporarily emits retainRV/claimRV calls after the calls with the operand bundle in the IR and removes the inserted calls after processing the function. - ARC contract pass emits retainRV/claimRV calls after the call with the operand bundle. It doesn't remove the operand bundle on the call since the backend needs it to emit the marker instruction. The retainRV and claimRV calls are emitted late in the pipeline to prevent optimization passes from transforming the IR in a way that makes it harder for the ARC middle-end passes to figure out the def-use relationship between the call and the retainRV/claimRV calls (which is the cause of PR31925). - The function inliner removes an autoreleaseRV call in the callee if nothing in the callee prevents it from being paired up with the retainRV/claimRV call in the caller. It then inserts a release call if claimRV is attached to the call since autoreleaseRV+claimRV is equivalent to a release. 
If it cannot find an autoreleaseRV call, it tries to transfer the operand bundle to a function call in the callee. This is important since the ARC optimizer can remove the autoreleaseRV returning the callee result, which makes it impossible to pair it up with the retainRV/claimRV call in the caller. If that fails, it simply emits a retain call in the IR if retainRV is attached to the call and does nothing if claimRV is attached to it. - SCCP refrains from replacing the return value of a call with a constant value if the call has the operand bundle. This ensures the call always has at least one user (the call to @llvm.objc.clang.arc.noop.use). - This patch also fixes a bug in replaceUsesOfNonProtoConstant where multiple operand bundles of the same kind were being added to a call. Future work: - Use the operand bundle on x86-64. - Fix the auto upgrader to convert call+retainRV/claimRV pairs into calls with the operand bundles. rdar://71443534 Differential Revision: https://reviews.llvm.org/D92808	2021-02-12 09:51:57 -08:00
Simon Pilgrim	2465541dc0	[DAG] DAGTypeLegalizer::PromoteIntRes_ADDSUBSHLSAT - break if-else chain. NFCI. Style fixup - the if() block always returns so we can pull out the contents of the else() block.	2021-02-12 10:33:12 +00:00
Craig Topper	5744502a13	[TargetLowering][RISCV][AArch64][PowerPC] Enable BuildUDIV/BuildSDIV on illegal types before type legalization if we can find a larger legal type that supports MUL. If we wait until the type is legalized, we'll lose information about the original type and need to use larger magic constants. This gets especially bad on RISCV64 where i64 is the only legal type. I've limited this to simple scalar types so it only works for i8/i16/i32 which are most likely to occur. For more odd types we might want to do a small promotion to a type where MULH is legal instead. Unfortunately, this does prevent some urem/srem+seteq matching since that still requires legal types. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D96210	2021-02-11 09:43:13 -08:00
Simon Pilgrim	5beebf9c58	[DAG] foldLogicOfSetCCs - Generalize and/or (setcc X, CMax, ne), (setcc X, CMin, ne/eq) fold. NFCI. Prep work to add support for non-uniform vectors - replace APInt values with using the SDValue ops directly.	2021-02-11 17:09:01 +00:00
Thomas Preud'homme	bad0290ce3	Improve STRICT_FSETCC codegen in absence of no NaN As for SETCC, use a less expensive condition code when generating STRICT_FSETCC if the node is known not to have NaN. Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D91972	2021-02-11 14:19:43 +00:00
Joe Ellis	67464dfe36	[DebugInfo] Only perform TypeSize -> unsigned cast when necessary This commit moves a line in SelectionDAGBuilder::handleDebugValue to avoid implicitly casting a TypeSize object to an unsigned earlier than necessary. It was possible that we bail out of the loop before the value is ever used, which means we could create a superfluous TypeSize warning. Reviewed By: DavidTruby Differential Revision: https://reviews.llvm.org/D96423	2021-02-11 13:54:09 +00:00
Hongtao Yu	1cb47a063e	[CSSPGO] Unblock optimizations with pseudo probe instrumentation. The IR/MIR pseudo probe intrinsics don't get materialized into real machine instructions and therefore they don't incur runtime cost directly. However, they come with indirect cost by blocking certain optimizations. Some of the blocking are intentional (such as blocking code merge) for better counts quality while the others are accidental. This change unblocks perf-critical optimizations that do not affect counts quality. They include: 1. IR InstCombine, sinking load operation to shorten lifetimes. 2. MIR LiveRangeShrink, similar to #1 3. MIR TwoAddressInstructionPass, i.e, opeq transform 4. MIR function argument copy elision 5. IR stack protection. (though not perf-critical but nice to have). Reviewed By: wmi Differential Revision: https://reviews.llvm.org/D95982	2021-02-10 12:43:17 -08:00
Luís Marques	acac29ca42	[DAGCombiner] Don't fold FCOPYSIGN vector sign operand casts Avoid doing the following combine for vector types: ``` copysign(x, fp_extend(y)) -> copysign(x, y) copysign(x, fp_round(y)) -> copysign(x, y) ``` That combine seemed to impede the selection of vector instruction and cause a mess in some circumstances. Differential Revision: https://reviews.llvm.org/D96037	2021-02-10 14:25:24 +00:00
Kazu Hirata	7e75f6fc1d	[SelectionDAG] Use range-based for loops (NFC)	2021-02-09 22:14:30 -08:00
Nico Weber	de1966e542	Revert "[ObjC][ARC] Use operand bundle 'clang.arc.rv' instead of explicitly" This reverts commit `4a64d8fe39`. Makes clang crash when building trivial iOS programs, see comment after https://reviews.llvm.org/D92808#2551401	2021-02-09 11:06:32 -05:00
Nemanja Ivanovic	a5222aa085	[DAGCombine] Do not remove masking argument to FP16_TO_FP for some targets As of commit `284f2bffc9`, the DAG Combiner gets rid of the masking of the input to this node if the mask only keeps the bottom 16 bits. This is because the underlying library function does not use the high order bits. However, on PowerPC's ELFv2 ABI, it is the caller that is responsible for clearing the bits from the register. Therefore, the library implementation of __gnu_h2f_ieee will return an incorrect result if the bits aren't cleared. This combine is desired for ARM (and possibly other targets) so this patch adds a query to Target Lowering to check if this zeroing needs to be kept. Fixes: https://bugs.llvm.org/show_bug.cgi?id=49092 Differential revision: https://reviews.llvm.org/D96283	2021-02-09 06:33:48 -06:00
Thomas Preud'homme	a50ab8672d	Revert STRICT_FCMP nonan optimisation Summary: This reverts commit `b7b61a7b5b` which fails on some of the builders: http://lab.llvm.org:8011/#/builders/14/builds/5806 Reviewers: Subscribers:	2021-02-09 11:27:35 +00:00
Thomas Preud'homme	b7b61a7b5b	Improve STRICT_FSETCC codegen in absence of no NaN As for SETCC, use a less expensive condition code when generating STRICT_FSETCC if the node is known not to have NaN. Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D91972	2021-02-09 11:18:16 +00:00
Simon Pilgrim	c5c690a835	[DAG] visitVECTOR_SHUFFLE - move shuffle legality check into MergeInnerShuffle lambda. NFCI. This is going to be necessary for a future reuse of MergeInnerShuffle	2021-02-08 14:25:16 +00:00
Kazu Hirata	7b9f6c2d42	[SelectionDAG] Drop unnecessary const from a return type (NFC) Identified with const-return-type.	2021-02-07 09:49:33 -08:00
Simon Pilgrim	86dabf4226	[DAG] SelectionDAG::isSplatValue - handle OR/XOR cases Add OR/XOR to the basic binops that we support when checking for a splat vector value	2021-02-07 13:27:57 +00:00
Huihui Zhang	1b81117f88	[DAGCombiner][SVE] Fix invalid use of getVectorNumElements() in visitSRA. Make sure scalable property is preserved by using getVectorElementCount(). Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D95967	2021-02-05 09:56:49 -08:00
Akira Hatanaka	4a64d8fe39	[ObjC][ARC] Use operand bundle 'clang.arc.rv' instead of explicitly emitting retainRV or claimRV calls in the IR This reapplies `3fe3946d9a` without the changes made to lib/IR/AutoUpgrade.cpp, which was violating layering. Original commit message: Background: This patch makes changes to the front-end and middle-end that are needed to fix a longstanding problem where llvm breaks ARC's autorelease optimization (see the link below) by separating calls from the marker instructions or retainRV/claimRV calls. The backend changes are in https://reviews.llvm.org/D92569. https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue What this patch does to fix the problem: - The front-end adds operand bundle "clang.arc.rv" to calls, which indicates the call is implicitly followed by a marker instruction and an implicit retainRV/claimRV call that consumes the call result. In addition, it emits a call to @llvm.objc.clang.arc.noop.use, which consumes the call result, to prevent the middle-end passes from changing the return type of the called function. This is currently done only when the target is arm64 and the optimization level is higher than -O0. - ARC optimizer temporarily emits retainRV/claimRV calls after the calls with the operand bundle in the IR and removes the inserted calls after processing the function. - ARC contract pass emits retainRV/claimRV calls after the call with the operand bundle. It doesn't remove the operand bundle on the call since the backend needs it to emit the marker instruction. The retainRV and claimRV calls are emitted late in the pipeline to prevent optimization passes from transforming the IR in a way that makes it harder for the ARC middle-end passes to figure out the def-use relationship between the call and the retainRV/claimRV calls (which is the cause of PR31925). 
- The function inliner removes an autoreleaseRV call in the callee if nothing in the callee prevents it from being paired up with the retainRV/claimRV call in the caller. It then inserts a release call if the call is annotated with claimRV since autoreleaseRV+claimRV is equivalent to a release. If it cannot find an autoreleaseRV call, it tries to transfer the operand bundle to a function call in the callee. This is important since ARC optimizer can remove the autoreleaseRV returning the callee result, which makes it impossible to pair it up with the retainRV/claimRV call in the caller. If that fails, it simply emits a retain call in the IR if the implicit call is a call to retainRV and does nothing if it's a call to claimRV. Future work: - Use the operand bundle on x86-64. - Fix the auto upgrader to convert call+retainRV/claimRV pairs into calls annotated with the operand bundles. rdar://71443534 Differential Revision: https://reviews.llvm.org/D92808	2021-02-05 06:09:42 -08:00
Akira Hatanaka	2fbbb18c1d	Revert "[ObjC][ARC] Use operand bundle 'clang.arc.rv' instead of explicitly" This reverts commit `3fe3946d9a`. The commit violates layering by including a header from Analysis in lib/IR/AutoUpgrade.cpp.	2021-02-05 06:00:05 -08:00
Akira Hatanaka	3fe3946d9a	[ObjC][ARC] Use operand bundle 'clang.arc.rv' instead of explicitly emitting retainRV or claimRV calls in the IR Background: This patch makes changes to the front-end and middle-end that are needed to fix a longstanding problem where llvm breaks ARC's autorelease optimization (see the link below) by separating calls from the marker instructions or retainRV/claimRV calls. The backend changes are in https://reviews.llvm.org/D92569. https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue What this patch does to fix the problem: - The front-end adds operand bundle "clang.arc.rv" to calls, which indicates the call is implicitly followed by a marker instruction and an implicit retainRV/claimRV call that consumes the call result. In addition, it emits a call to @llvm.objc.clang.arc.noop.use, which consumes the call result, to prevent the middle-end passes from changing the return type of the called function. This is currently done only when the target is arm64 and the optimization level is higher than -O0. - ARC optimizer temporarily emits retainRV/claimRV calls after the calls with the operand bundle in the IR and removes the inserted calls after processing the function. - ARC contract pass emits retainRV/claimRV calls after the call with the operand bundle. It doesn't remove the operand bundle on the call since the backend needs it to emit the marker instruction. The retainRV and claimRV calls are emitted late in the pipeline to prevent optimization passes from transforming the IR in a way that makes it harder for the ARC middle-end passes to figure out the def-use relationship between the call and the retainRV/claimRV calls (which is the cause of PR31925). - The function inliner removes an autoreleaseRV call in the callee if nothing in the callee prevents it from being paired up with the retainRV/claimRV call in the caller. 
It then inserts a release call if the call is annotated with claimRV since autoreleaseRV+claimRV is equivalent to a release. If it cannot find an autoreleaseRV call, it tries to transfer the operand bundle to a function call in the callee. This is important since ARC optimizer can remove the autoreleaseRV returning the callee result, which makes it impossible to pair it up with the retainRV/claimRV call in the caller. If that fails, it simply emits a retain call in the IR if the implicit call is a call to retainRV and does nothing if it's a call to claimRV. Future work: - Use the operand bundle on x86-64. - Fix the auto upgrader to convert call+retainRV/claimRV pairs into calls annotated with the operand bundles. rdar://71443534 Differential Revision: https://reviews.llvm.org/D92808	2021-02-05 05:55:18 -08:00
Craig Topper	11ef356d9e	[TargetLowering] Use Align in allowsMisalignedMemoryAccesses. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D96097	2021-02-04 19:22:06 -08:00
Craig Topper	8cc9c42a0c	[TargetLowering] Use LegalOnly operand to isOperationLegalOrCustom to simplify some code. NFC	2021-02-04 12:30:37 -08:00
Craig Topper	34da12dd1f	[DAGCombiner] Remove (sra (shl X, C), C) if X has more than C sign bits. If sext_inreg is supported, we will turn this into sext_inreg. That will then remove it if there are enough sign bits. But if sext_inreg isn't supported, we can still remove the shift pair based on sign bits. Split from D95890.	2021-02-03 10:18:40 -08:00
Craig Topper	4553821815	[SelectionDAG] Prevent scalable vector warning from ComputeNumSignBits on extract_vector_elt on a scalable vector.	2021-02-01 23:42:03 -08:00
Kerry McLaughlin	9b4fcfaa9e	[SVE][CodeGen] Remove performMaskedGatherScatterCombine The AArch64 DAG combine added by D90945 & D91433 extends the index of a scalable masked gather or scatter to i32 if necessary. This patch removes the combine and instead adds shouldExtendGSIndex, which is used by visitMaskedGather/Scatter in SelectionDAGBuilder to query whether the index should be extended before calling getMaskedGather/Scatter. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D94525	2021-02-01 14:10:00 +00:00
xgupta	94fac81fcc	[Branch-Rename] Fix some links According to the [[ https://foundation.llvm.org/docs/branch-rename/ \| status of branch rename ]], the master branch of the LLVM repository is removed on 28 Jan 2021. Reviewed By: mehdi_amini Differential Revision: https://reviews.llvm.org/D95766	2021-02-01 16:43:21 +05:30
Serge Pavlov	bf416d166b	[FPEnv] Intrinsic for setting rounding mode To set non-default rounding mode user usually calls function 'fesetround' from standard C library. This way has some disadvantages. * It creates unnecessary dependency on libc. On the other hand, setting rounding mode requires few instructions and could be made by compiler. Sometimes standard C library is not even available, like in the case of GPU or AI cores that execute small kernels. * Compiler could generate more effective code if it knows that a particular call just sets rounding mode. This change introduces new IR intrinsic, namely 'llvm.set.rounding', which sets current rounding mode, similar to 'fesetround'. It however differs from the latter, because it is a lower level facility: * 'llvm.set.rounding' does not return any value, whereas 'fesetround' returns non-zero value in the case of failure. In glibc 'fesetround' reports failure if its argument is invalid or unsupported or if floating point operations are unavailable on the hardware. Compiler usually knows what core it generates code for and it can validate arguments in many cases. * Rounding mode is specified in 'fesetround' using constants like 'FE_TONEAREST', which are target dependent. It is inconvenient to work with such constants at IR level. C standard provides a target-independent way to specify rounding mode, it is used in FLT_ROUNDS, however it does not define standard way to set rounding mode using this encoding. This change implements only IR intrinsic. Lowering it to machine code is target-specific and will be implemented later. Mapping of 'fesetround' to 'llvm.set.rounding' is also not implemented here. Differential Revision: https://reviews.llvm.org/D74729	2021-02-01 11:28:14 +07:00
Craig Topper	70289ea6f5	[RISCV][LegalizeTypes] Try to expand BSWAP before promoting if the promoted BSWAP would expand anyway. If we're going to end up expanding anyway, we should do it early so we don't create extra operations to handle the bytes added by promotion. This is helpful on RISCV where we might have to promote i16 all the way to i64. Differential Revision: https://reviews.llvm.org/D95756	2021-01-31 14:33:29 -08:00
Craig Topper	ea87cf2acd	[TargetLowering][RISCV] Don't transform (seteq/ne (sext_inreg X, VT), C1) -> (seteq/ne (zext_inreg X, VT), C1) if the sext_inreg is cheaper RISCV has to use 2 shifts for (i64 (zext_inreg X, i32)), but we can use addiw rd, rs1, x0 for sext_inreg. We already understood this when type legalizing i32 seteq/ne on rv64. But this transform in SimplifySetCC would sometimes undo it. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D95289	2021-01-25 16:37:21 -08:00
Fraser Cormack	fde2466171	[SelectionDAG] Support scalable-vector splats in more cases This patch adds support for scalable-vector splats in DAGCombiner's `isConstantOrConstantVector` and `ISD::matchUnaryPredicate` functions, which enable the SelectionDAG div/rem-by-constant optimizations for scalable vector types. It also fixes up one case where the UDIV optimization was generating a SETCC without first consulting the target for its preferred SETCC result type. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D94501	2021-01-25 10:58:15 +00:00
Fangrui Song	d5bbaaaf95	[XRay] Make __xray_customevent support non-Linux	2021-01-25 00:48:21 -08:00
QingShan Zhang	ffc3e800c6	[NFC] [DAGCombine] Correct the result for sqrt even if the iteration is zero For now, we correct the result for sqrt only if iteration > 0. This doesn't make sense as the two are not strictly related. Reviewed By: dmgreen, spatel, RKSimon Differential Revision: https://reviews.llvm.org/D94480	2021-01-25 04:02:44 +00:00
Kazu Hirata	16baad8f4e	[llvm] Use pop_back_val (NFC)	2021-01-24 12:18:57 -08:00
Kazu Hirata	d44ca0cf2f	[CodeGen] Forward-declare TargetMachine (NFC) InstrEmitter.h needs TargetMachine but relies on a forward declaration of TargetMachine in MachineOperand.h. This patch adds a forward declaration right in InstrEmitter.h. While we are at it, this patch removes the one in MachineOperand.h, where it is unnecessary.	2021-01-24 12:18:54 -08:00
Craig Topper	147c0c263d	[TargetLowering] Use isOneConstant to simplify some code. NFC	2021-01-22 19:32:19 -08:00
Simon Pilgrim	5dbe5d2c91	[DAG] Commute shuffle(splat(A,u), shuffle(C,D)) -> shuffle'(shuffle(C,D), splat(A,u)) We only merge shuffles if the inner (LHS) shuffle is a non-splat, so commute these shuffles to improve merging of multiple shuffles.	2021-01-22 11:43:18 +00:00
Craig Topper	c953a83347	[TargetLowering] Use getBoolConstant instead of assuming zero or one for boolean contents. Noticed while I was touching other nearby code. I don't have a test where this matters because the targets I work on use zero or one boolean contents. And the tests cases I've seen this fire on happen before type legalization where the result type is MVT::i1 so the distinction doesn't matter.	2021-01-22 00:26:14 -08:00
Craig Topper	5660dc5968	[TargetLowering] Simplify some code in SimplifySetCC that tries to handle SIGN_EXTEND_INREG operand types that should never happen. NFCI There was code to handle the first operand being different than the result type. And code to handle first operand having the same type as the type to extend from. This should never happen for a correctly formed SIGN_EXTEND_INREG. I've replaced the code with asserts. I also noticed we created the same APInt twice so I've reused it.	2021-01-21 23:56:37 -08:00
Simon Pilgrim	69bc0990a9	[DAGCombiner] Enable SimplifyDemandedBits vector support for TRUNCATE (REAPPLIED). Add DemandedElts support inside the TRUNCATE analysis. REAPPLIED - this was reverted by @hans at rGa51226057fc3 due to an issue with vector shift amount types, which was fixed in rG935bacd3a724 and an additional test case added at rG0ca81b90d19d Differential Revision: https://reviews.llvm.org/D56387	2021-01-21 13:01:34 +00:00
Simon Pilgrim	935bacd3a7	[DAG] SimplifyDemandedBits - correctly adjust truncated shift amount type As noticed on D56387, for vectors we must always correctly adjust the shift amount type during truncation (not just after legalization). We were getting away with it as we currently only accepted scalars via the dyn_cast<ConstantSDNode>.	2021-01-21 12:38:36 +00:00
Simon Pilgrim	bc9ab9a5cd	[DAG] CombineToPreIndexedLoadStore - use const APInt& for getAPIntValue(). NFCI. Cleanup some code to use auto* properly from cast, and use const APInt& for getAPIntValue() to avoid an unnecessary copy.	2021-01-21 11:04:09 +00:00
Hans Wennborg	a51226057f	Revert "[DAGCombiner] Enable SimplifyDemandedBits vector support for TRUNCATE" It caused "Vector shift amounts must be in the same as their first arg" asserts in Chromium builds. See the code review for repro instructions. > Add DemandedElts support inside the TRUNCATE analysis. > > Differential Revision: https://reviews.llvm.org/D56387 This reverts commit `cad4275d69`.	2021-01-20 20:06:55 +01:00
Simon Pilgrim	cad4275d69	[DAGCombiner] Enable SimplifyDemandedBits vector support for TRUNCATE Add DemandedElts support inside the TRUNCATE analysis. Differential Revision: https://reviews.llvm.org/D56387	2021-01-20 15:39:58 +00:00
Kazu Hirata	b023cdeacc	[llvm] Use llvm::all_of (NFC)	2021-01-19 20:19:17 -08:00
Kazu Hirata	8857202489	[llvm] Use llvm::find (NFC)	2021-01-19 20:19:14 -08:00
Craig Topper	79e798aca3	Recommit "[RISCV] Add a test of vector sadd.overflow to demonstrate intrinsics with multiple scalable vector results." This recommits `2c51bef76c`. I've fixed the broken check line from when I renamed the test function. Original commit message: This builds on D94142 where scalable vectors are allowed in structs. I did have to fix one scalable vector issue in the vector type creation for these intrinsics where we used getVectorNumElements instead of ElementCount.	2021-01-18 11:08:28 -08:00
Craig Topper	5d431c3d32	Revert "[RISCV] Add a test of vector sadd.overflow to demonstrate intrinsics with multiple scalable vector results." This reverts commit `2c51bef76c`. I seem to have messed up the check lines in the test.	2021-01-18 11:00:20 -08:00
Craig Topper	2c51bef76c	[RISCV] Add a test of vector sadd.overflow to demonstrate intrinsics with multiple scalable vector results. This builds on D94142 where scalable vectors are allowed in structs. I did have to fix one scalable vector issue in the vector type creation for these intrinsics where we used getVectorNumElements instead of ElementCount. Differential Revision: https://reviews.llvm.org/D94149	2021-01-18 10:41:36 -08:00
Kazu Hirata	23b0ab2acb	[llvm] Use the default value of drop_begin (NFC)	2021-01-18 10:16:36 -08:00
Simon Pilgrim	207f32948b	[DAG] SimplifyDemandedBits - use KnownBits comparisons to remove ISD::UMIN/UMAX ops Use the KnownBits icmp comparisons to determine when a ISD::UMIN/UMAX op is unnecessary should either op be known to be ULT/ULE or UGT/UGE than the other. Differential Revision: https://reviews.llvm.org/D94532	2021-01-18 10:29:23 +00:00
Qiu Chaofan	f776d8b12f	[Legalizer] Promote result type in expanding FP_TO_XINT This patch promotes result integer type of FP_TO_XINT in expanding. So crash in conversion from ppc_fp128 to i1 will be fixed. Reviewed By: steven.zhang Differential Revision: https://reviews.llvm.org/D92473	2021-01-18 11:56:11 +08:00
Kazu Hirata	19aacdb715	[llvm] Construct SmallVector with iterator ranges (NFC)	2021-01-16 09:40:53 -08:00
Bjorn Pettersson	4f15556731	[LegalizeDAG] Handle NeedInvert when expanding BR_CC This is a follow-up fix to commit `03c8d6a0c4`. Seems like we now end up with NeedInvert being set in the result from LegalizeSetCCCondCode more often than in the past, so we need to handle NeedInvert when expanding BR_CC. Not sure how to deal with the "Tmp4.getNode()" case properly, but current assumption is that that code path isn't impacted by the changes in `03c8d6a0c4` so we can simply move the old assert into the if-branch and only handle NeedInvert in the else-branch. I think that the test case added here, for PowerPC, might have failed also before commit `03c8d6a0c4`. But we started to hit the assert more often downstream when having merged that commit. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D94762	2021-01-16 14:33:19 +01:00
Jeroen Dobbelaere	668827b648	Introduce llvm.noalias.decl intrinsic The ``llvm.experimental.noalias.scope.decl`` intrinsic identifies where a noalias scope is declared. When the intrinsic is duplicated, a decision must also be made about the scope: depending on the reason of the duplication, the scope might need to be duplicated as well. Reviewed By: nikic, jdoerfert Differential Revision: https://reviews.llvm.org/D93039	2021-01-16 09:20:45 +01:00
Craig Topper	a9e939760c	[CodeGen] Removes unwanted optimisation for TargetConstantFP This 'FIXME' popped up in the development of an out-of-tree backend. Quick fix, but first llvm upstream patch, therefore I do not have commit rights, so if approved please commit? - Test is not included as this came up in an out-of-tree backend (if required, please hint on how to test this). Patch by simveg (Simon) Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D93219	2021-01-15 11:52:53 -08:00
Craig Topper	4c5066b078	[TargetLowering] Don't speculatively call ComputeNumSignBits. NFC These methods are recursive so a little costly. We only look at the result in one place in this function and it's conditional. We also only need the second call if the first returned enough sign bits.	2021-01-15 09:09:35 -08:00
Simon Pilgrim	46aa3c6c33	[DAG] visitVECTOR_SHUFFLE - MergeInnerShuffle - improve shuffle(shuffle(x,y),shuffle(x,y)) merging MergeInnerShuffle currently attempts to merge shuffle(shuffle(x,y),z) patterns into a single shuffle, using 1 or 2 of the x,y,z ops. However if we already match 2 ops we might be able to handle the third op if it's also a shuffle that references one of the previous ops, allowing us to handle some cases like: shuffle(shuffle(x,y),shuffle(x,y)) shuffle(shuffle(shuffle(x,z),y),z) shuffle(shuffle(x,shuffle(x,y)),z) etc. This isn't an exhaustive match and is dependent on the order the candidate ops are encountered - if one of the matched ops was a shuffle that was peek-able we don't go back and try to split that; I haven't found much need for that amount of analysis yet. This is a preliminary patch that will allow us to later improve x86 HADD/HSUB matching - but needs to be reviewed separately as it's in generic code and affects existing Thumb2 tests. Differential Revision: https://reviews.llvm.org/D94671	2021-01-15 15:08:31 +00:00
Jay Foad	868da2ea93	[SelectionDAG] Remove an early-out from computeKnownBits for smin/smax Even if we know nothing about LHS, it can still be useful to know that smax(LHS, RHS) >= RHS and smin(LHS, RHS) <= RHS. Differential Revision: https://reviews.llvm.org/D87145	2021-01-14 18:15:17 +00:00
Jay Foad	517196e569	[Analysis,CodeGen] Make use of KnownBits::makeConstant. NFC. Differential Revision: https://reviews.llvm.org/D94588	2021-01-14 14:02:43 +00:00
Jay Foad	a1cba5b7a1	[SelectionDAG] Make use of KnownBits::commonBits. NFC. Differential Revision: https://reviews.llvm.org/D94587	2021-01-14 14:02:43 +00:00
Simon Pilgrim	7c30c05ff7	[DAG] visitVECTOR_SHUFFLE - MergeInnerShuffle - reset shuffle ops and reorder early-out and second op matching. NFCI. I'm hoping to reuse MergeInnerShuffle in some other folds - so ensure the candidate ops/mask are reset at the start of each run. Also, move the second op matching before bailing to make it simpler to try to match other things afterward.	2021-01-14 11:55:20 +00:00
Simon Pilgrim	af8d27a7a8	[DAG] visitVECTOR_SHUFFLE - pull out shuffle merging code into lambda helper. NFCI. Make it easier to reuse in a future patch.	2021-01-14 11:05:19 +00:00
Kazu Hirata	5c1c39e8d8	[llvm] Use *Set::contains (NFC)	2021-01-13 19:14:41 -08:00
Simon Pilgrim	993c488ed2	[DAG] visitVECTOR_SHUFFLE - use all_of to check for all-undef shuffle mask. NFCI.	2021-01-13 17:19:41 +00:00
Kerry McLaughlin	2170e0ee60	[SVE][CodeGen] CTLZ, CTTZ & CTPOP operations (predicates) Canonicalise the following operations in getNode() for predicate types: - CTLZ(Pred) -> bitwise_NOT(Pred) - CTTZ(Pred) -> bitwise_NOT(Pred) - CTPOP(Pred) -> Pred Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D94428	2021-01-13 12:24:54 +00:00
Serguei Katkov	157efd84ab	[Statepoint Lowering] Add an option to allow use gc values in regs for landing pad Default value is not changed, so it is NFC actually. The option allows to use gc values on registers in landing pads. Reviewers: reames, dantrushin Reviewed By: reames, dantrushin Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D94469	2021-01-13 11:39:34 +07:00
Juneyoung Lee	25eb7b08ba	[DAGCombiner] Fold BRCOND(FREEZE(COND)) to BRCOND(COND) This patch resolves the suboptimal codegen described in http://llvm.org/pr47873 . When CodeGenPrepare lowers select into a conditional branch, a freeze instruction is inserted. It is then translated to `BRCOND(FREEZE(SETCC))` in SelDag. The `FREEZE` in the middle of `SETCC` and `BRCOND` was causing a suboptimal code generation however. This patch adds `BRCOND(FREEZE(cond))` -> `BRCOND(cond)` fold to DAGCombiner to remove the `FREEZE`. To make this optimization sound, `BRCOND(UNDEF)` simply should nondeterministically jump to the branch or not, rather than raising UB. It wasn't clear what happens when the condition was undef according to the comments in ISDOpcodes.h, however. I updated the comments of `BRCOND` to make it explicit (as well as `BR_CC`, which is also a conditional branch instruction). Note that it diverges from the semantics of `br` instruction in IR, which is explicitly UB. Since the UB semantics was necessary to explain optimizations that use branching conditions, and SelDag doesn't seem to have such optimization, I think this divergence is okay. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D92015	2021-01-13 09:36:52 +09:00
Craig Topper	03c8d6a0c4	[LegalizeDAG][RISCV][PowerPC][AMDGPU][WebAssembly] Improve expansion of SETONE/SETUEQ on targets without SETO/SETUO. If SETO/SETUO aren't legal, they'll be expanded and we'll end up with 3 comparisons. SETONE is equivalent to (SETOGT || SETOLT) so if one of those operations is supported use that expansion. We don't need both since we can commute the operands to make the other. SETUEQ can be implemented with !(SETOGT || SETOLT) or (SETULE && SETUGE). I've only implemented the first because it didn't look like most of the affected targets had legal SETULE/SETUGE. Reviewed By: frasercrmck, tlively, nemanjai Differential Revision: https://reviews.llvm.org/D94450	2021-01-12 10:45:03 -08:00
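The equivalences stated in the SETONE/SETUEQ commit above can be checked at the source level. The following is only a sketch: the function names are hypothetical and not LLVM API, and C++'s built-in `>` and `<` on `double` stand in for SETOGT and SETOLT because they return false whenever an operand is NaN.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Reference semantics (hypothetical helper names, not LLVM API).
// SETONE: ordered and not equal. SETUEQ: unordered or equal.
bool setONE(double a, double b) {
  return !std::isnan(a) && !std::isnan(b) && a != b;
}
bool setUEQ(double a, double b) {
  return std::isnan(a) || std::isnan(b) || a == b;
}

// Expansions that use only ordered greater-than and less-than.
bool expandONE(double a, double b) { return (a > b) || (a < b); }
bool expandUEQ(double a, double b) { return !((a > b) || (a < b)); }

// Asserts that each expansion matches its reference on a small set of
// interesting values, including NaN, infinities and signed zeros.
void checkExpansions() {
  const double inf = std::numeric_limits<double>::infinity();
  const double nan = std::numeric_limits<double>::quiet_NaN();
  const double vals[] = {0.0, -0.0, 1.0, -1.0, inf, -inf, nan};
  for (double a : vals) {
    for (double b : vals) {
      assert(setONE(a, b) == expandONE(a, b));
      assert(setUEQ(a, b) == expandUEQ(a, b));
    }
  }
}
```

Calling `checkExpansions()` exercises every pair of the sample values; it says nothing about which comparisons a given target actually has legal.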
Craig Topper	df74c001fa	[DAGCombiner] Replace static helper function isConstantFPBuildVectorOrConstantFP with the identical version in SelectionDAG. NFC	2021-01-11 23:41:40 -08:00
Craig Topper	f9ef3a6003	[SelectionDAG] Make isConstantIntBuildVectorOrConstantInt and isConstantFPBuildVectorOrConstantFP methods const.	2021-01-11 23:26:53 -08:00
Paul Robinson	1f9c29228c	[FastISel] NFC: Clean up unnecessary bookkeeping Now that we flush the local value map for every instruction, we don't need any extra flushes for specific cases. Also, LastFlushPoint is not used for anything. Follow-ups to #c161665 (D91734). This reapplies #3fd39d3. Differential Revision: https://reviews.llvm.org/D92338	2021-01-11 09:40:39 -08:00
Paul Robinson	be179b9946	[FastISel] NFC: Remove obsolete -fast-isel-sink-local-values option This option is not used for anything after #c161665 (D91737). This commit reapplies #a474657.	2021-01-11 09:32:49 -08:00
Paul Robinson	c161775dec	[FastISel] Flush local value map on every instruction Local values are constants or addresses that can't be folded into the instruction that uses them. FastISel materializes these in a "local value" area that always dominates the current insertion point, to try to avoid materializing these values more than once (per block). https://reviews.llvm.org/D43093 added code to sink these local value instructions to their first use, which has two beneficial effects. One, it is likely to avoid some unnecessary spills and reloads; two, it allows us to attach the debug location of the user to the local value instruction. The latter effect can improve the debugging experience for debuggers with a "set next statement" feature, such as the Visual Studio debugger and PS4 debugger, because instructions to set up constants for a given statement will be associated with the appropriate source line. There are also some constants (primarily addresses) that could be produced by no-op casts or GEP instructions; the main difference from "local value" instructions is that these are values from separate IR instructions, and therefore could have multiple users across multiple basic blocks. D43093 avoided sinking these, even though they were emitted to the same "local value" area as the other instructions. The patch comment for D43093 states: Local values may also be used by no-op casts, which adds the register to the RegFixups table. Without reversing the RegFixups map direction, we don't have enough information to sink these instructions. This patch undoes most of D43093, and instead flushes the local value map after(*) every IR instruction, using that instruction's debug location. This avoids sometimes incorrect locations used previously, and emits instructions in a more natural order.
In addition, constants materialized due to PHI instructions are not assigned a debug location immediately; instead, when the local value map is flushed, if the first local value instruction has no debug location, it is given the same location as the first non-local-value-map instruction. This prevents PHIs from introducing unattributed instructions, which would either be implicitly attributed to the location for the preceding IR instruction, or given line 0 if they are at the beginning of a machine basic block. Neither of those consequences is good for debugging. This does mean materialized values are not re-used across IR instruction boundaries; however, only about 5% of those values were reused in an experimental self-build of clang. (*) Actually, just prior to the next instruction. It seems like it would be cleaner the other way, but I was having trouble getting that to work. This reapplies commits `cf1c774d` and `dc35368c`, and adds the modification to PHI handling, which should avoid problems with debugging under gdb. Differential Revision: https://reviews.llvm.org/D91734	2021-01-11 08:32:36 -08:00
Joe Ellis	007358239d	[DAGCombiner] Use getVectorElementCount inside visitINSERT_SUBVECTOR This avoids TypeSize-/ElementCount-related warnings. Differential Revision: https://reviews.llvm.org/D92747	2021-01-11 14:15:11 +00:00
QingShan Zhang	7539c75bb4	[DAGCombine] Remove the check for unsafe-fp-math when we are checking the AFN We are checking the unsafe-fp-math for sqrt but not for fpow, which is inconsistent. As the direction is to remove this global option, we need to remove the unsafe-fp-math check for sqrt and update the test with afn fast-math flags. Reviewed By: Spatel Differential Revision: https://reviews.llvm.org/D93891	2021-01-11 02:25:53 +00:00
Fraser Cormack	41d06095b0	[SelectionDAG] Teach isConstOrConstSplat about ISD::SPLAT_VECTOR This improves llvm::isConstOrConstSplat by allowing it to analyze ISD::SPLAT_VECTOR nodes, in order to allow more constant-folding of operations using scalable vector types. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D94168	2021-01-09 20:54:34 +00:00
Fraser Cormack	de373ef779	[SelectionDAG] Extend immAll(Ones|Zeros)V to handle ISD::SPLAT_VECTOR The TableGen immAllOnesV and immAllZerosV helpers implicitly wrapped the ISD::isBuildVectorAll(Ones|Zeros) helper functions. This was inhibiting their use for targets such as RISC-V which use ISD::SPLAT_VECTOR. In particular, RISC-V had to define its own 'vnot' fragment. In order to extend the scope of these nodes to include support for ISD::SPLAT_VECTOR, two new ISD predicate functions have been introduced: ISD::isConstantSplatVectorAll(Ones|Zeros). These effectively supersede the older "isBuildVector" predicates, which are now simple wrappers for the new functions. They pass a defaulted boolean toggle which preserves the old behaviour. It is hoped that in time all call-sites can be ported to the "isConstantSplatVector" functions. While the use of ISD::isBuildVectorAll(Ones|Zeros) has not changed, the behaviour of the TableGen immAll(Ones|Zeros)V has. To test the new functionality, the custom RISC-V TableGen fragment has been removed and replaced with the built-in 'vnot'. To test their use as pattern-roots, two splat patterns have been updated accordingly. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D94223	2021-01-09 17:05:31 +00:00
Heejin Ahn	9e4eadeb13	[WebAssembly] Update basic EH instructions for the new spec This implements basic instructions for the new spec. - Adds new versions of instructions: `catch`, `catch_all`, and `rethrow` - Adds support for instruction selection for the new instructions - `catch` needs a custom routine for the same reason `throw` needs one, to encode `__cpp_exception` tag symbol. - Updates `WebAssembly::isCatch` utility function to include `catch_all` and changes code that compares an instruction's opcode with `catch` to use that function. - LateEHPrepare - Previously in LateEHPrepare we added `catch` instruction to both `catchpad`s (for user catches) and `cleanuppad`s (for destructors). In the new version `catch` is generated from `llvm.catch` intrinsic in instruction selection phase, so we only need to add `catch_all` to the beginning of cleanup pads. - `catch` is generated from instruction selection, but we need to hoist the `catch` instruction to the beginning of every EH pad, because `catch` can be in the middle of the EH pad or even in a split BB from it after various code transformations. - Removes `addExceptionExtraction` function, which was used to generate `br_on_exn` before. - CFGStackify: Deletes `fixUnwindMismatches` function. Running this function on the new instruction causes crashes, and the new version will be added in a later CL, whose contents will be completely different. So deleting the whole function will make the diff easier to read. - Reenables all disabled tests in exception.ll and eh-lsda.ll and a single basic test in cfg-stackify-eh.ll. - Updates existing tests to use the new assembly format. And deletes `br_on_exn` instructions from the tests and FileCheck lines. Reviewed By: dschuff, tlively Differential Revision: https://reviews.llvm.org/D94040	2021-01-09 01:48:06 -08:00
Heejin Ahn	7be271537e	[WebAssembly] Rename wasm_rethrow_in_catch intrinsic/builtin `wasm_rethrow_in_catch` intrinsic and builtin are used in order to rethrow an exception when the exception is caught but there is no matching clause within the current `catch`. For example, ``` try { foo(); } catch (int n) { ... } ``` If the caught exception does not correspond to C++ `int` type, it should be rethrown. These intrinsic/builtin were named `rethrow_in_catch` because at the time I thought there would be another intrinsic for C++'s `throw` keyword, which rethrows an exception. It turned out that `throw` keyword doesn't require wasm's `rethrow` instruction, so we rename `rethrow_in_catch` to just `rethrow` here. Reviewed By: dschuff, tlively Differential Revision: https://reviews.llvm.org/D94038	2021-01-08 06:55:04 -08:00
Simon Moll	611d3c63f3	[VP] ISD helper functions [VE] isel for vp_add, vp_and This implements vp_add, vp_and for the VE target by lowering them to the VVP_* layer. We also add helper functions for VP SDNodes (isVPSDNode, getVPMaskIdx, getVPExplicitVectorLengthIdx). Reviewed By: kaz7 Differential Revision: https://reviews.llvm.org/D93766	2021-01-08 14:29:45 +01:00
Simon Pilgrim	350ab7aa1c	[DAG] Simplify OR(X,SHL(Y,BW/2)) eq/ne 0/-1 'all/any-of' style patterns Attempt to simplify all/any-of style patterns that concatenate 2 smaller integers together into an and(x,y)/or(x,y) + icmp 0/-1 instead. This is mainly to help some bool predicate reduction patterns where we end up concatenating bool vectors that have been bitcasted to integers. Differential Revision: https://reviews.llvm.org/D93599	2021-01-07 12:03:19 +00:00
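The all/any-of simplification described in the commit above can be illustrated with 8-bit halves concatenated into a 16-bit value. This is a sketch under the assumption that X is the zero-extended low half and Y the high half; the helper names are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Concatenates two 8-bit halves: OR(zext(x), SHL(zext(y), BW/2)) with BW = 16.
uint16_t concatHalves(uint8_t x, uint8_t y) {
  return static_cast<uint16_t>(x | (static_cast<uint16_t>(y) << 8));
}

// Wide forms: compare the concatenated value with 0 or -1 (all ones).
bool wideIsZero(uint8_t x, uint8_t y) { return concatHalves(x, y) == 0; }
bool wideIsAllOnes(uint8_t x, uint8_t y) { return concatHalves(x, y) == 0xFFFF; }

// Narrow forms: or(x,y) == 0 and and(x,y) == -1 on the half-width type.
bool narrowIsZero(uint8_t x, uint8_t y) {
  return static_cast<uint8_t>(x | y) == 0;
}
bool narrowIsAllOnes(uint8_t x, uint8_t y) {
  return static_cast<uint8_t>(x & y) == 0xFF;
}

// Exhaustively asserts that the wide and narrow forms agree.
void checkAllPairs() {
  for (int x = 0; x < 256; ++x) {
    for (int y = 0; y < 256; ++y) {
      const uint8_t a = static_cast<uint8_t>(x);
      const uint8_t b = static_cast<uint8_t>(y);
      assert(wideIsZero(a, b) == narrowIsZero(a, b));
      assert(wideIsAllOnes(a, b) == narrowIsAllOnes(a, b));
    }
  }
}
```

The `ne` variants follow by negating both sides of each comparison.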
Simon Pilgrim	1307e3f6c4	[TargetLowering] Add icmp ne/eq (srl (ctlz x), log2(bw)) vector support.	2021-01-06 16:13:51 +00:00
Craig Topper	4ef91f5871	[DAGCombiner] Don't speculatively create an all ones constant in visitREM that might not be used. This looks to have been done to save some duplicated code under two different if statements, but it ends up being harmful to D94073. This speculative constant can be called on a scalable vector type with i64 element size when i64 scalars aren't legal. The code tries and fails to find a vector type with i32 elements that it can use. So only create the node when we know it will be used.	2021-01-05 12:45:57 -08:00
Fraser Cormack	9a1ac97d3a	[CodeGen] Format SelectionDAG::getConstant methods (NFC)	2021-01-05 12:59:46 +00:00
Cameron McInally	92be640bd7	[FPEnv][AMDGPU] Disable FSUB(-0,X)->FNEG(X) DAGCombine when subnormals are flushed This patch disables the FSUB(-0,X)->FNEG(X) DAG combine when we're flushing subnormals. It requires updating the existing AMDGPU tests to use the fneg IR instruction, in place of the old fsub(-0,X) canonical form, since AMDGPU is the only backend currently checking the DenormalMode flags. Note that this will require follow-up optimizations to make sure the FSUB(-0,X) form is handled appropriately Differential Revision: https://reviews.llvm.org/D93243	2021-01-04 14:44:10 -06:00
Juneyoung Lee	5cdf6ed744	[CodeGen] recognize select form of and/ors when splitting branch conditions Recently a few patches have been made to move towards using select i1 instead of and/or i1 to represent "a && b"/"a || b" in C/C++. "a && b" in C/C++ does not evaluate b if a is false whereas 'and a, b' in IR evaluates b and uses its result regardless of the result of a. This is problematic because it can cause miscompilation if b was an erroneous operation (https://llvm.org/pr48353). In C/C++, the result is simply false because b is not evaluated, but in IR the result is poison. The discussion at D93065 has more context about this. This patch makes two branch-splitting optimizations (one in SelectionDAGBuilder, one in CodeGenPrepare) recognize select form of and/or as well using m_LogicalAnd/Or. Since it is CodeGen, I think this is semantically ok (at least as safe as what codegen already did). Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D93853	2021-01-01 04:46:10 +09:00
Kazu Hirata	7bc76fd0ec	[CodeGen] Construct SmallVector with iterator ranges (NFC)	2020-12-31 09:39:11 -08:00
Kazu Hirata	1e3ed09165	[CodeGen] Use llvm::append_range (NFC)	2020-12-28 19:55:16 -08:00
Layton Kifer	d29f93bda5	[DAGCombiner] Don't create sexts of deleted xors when they were in-visit replaced Fixes a bug introduced by D91589. When folding `(sext (not i1 x)) -> (add (zext i1 x), -1)`, we try to replace the not first when possible. If we replace the not in-visit, then the now invalidated node will be returned, and subsequently we will return an invalid sext. In cases where the not is replaced in-visit we can simply return SDValue, as the not in the current sext should have already been replaced. Thanks @jgorbe, for finding the below reproducer. The following reduced test case crashes clang when built with `clang -O1 -frounding-math`: ``` template <class> class a { int b() { return c == 0.0 ? 0 : -1; } int c; }; template class a<long>; ``` A debug build of clang produces this "assertion failed" error: ``` clang: /home/jgorbe/code/llvm/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:264: void {anonymous}::DAGCombiner::AddToWorklist(llvm:: SDNode*): Assertion `N->getOpcode() != ISD::DELETED_NODE && "Deleted Node added to Worklist"' failed. ``` Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D93274	2020-12-23 16:16:26 -08:00
Bing1 Yu	e8ade4569b	[LegalizeType] When LegalizeType procedure widens a masked_gather, set MemoryType's EltNum equal to Result's EltNum When LegalizeType procedure widens a masked_gather, set MemoryType's EltNum equal to Result's EltNum. As I mentioned in https://reviews.llvm.org/D91092, in previous code, if we have a v17i32's masked_gather in avx512, we widen it to a v32i32's masked_gather with a v17i32's MemoryType. When the SplitVecRes_MGATHER processes this v32i32's masked_gather, GetSplitDestVTs will assert fail since what you are going to split is v17i32. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D93610	2020-12-22 13:27:38 +08:00
Denis Antrushin	6f45049fb6	[Statepoints] Disable VReg lowering for values used on exception path of invoke. Currently we lower invokes the same way as usual calls, e.g.: V1 = STATEPOINT ... V (tied-def 0) But this is incorrect if V1 is used on exceptional path. By LLVM rules V1 neither dominates its uses in landing pad, nor its live range is live on entry to landing pad. So compiler is allowed to do various weird transformations like splitting live range after statepoint and use split LR in catch block. Until (and if) we find better solution to this problem, let's use old lowering (spilling) for those values which are used on exceptional path and allow VReg lowering for values used only on normal path. Differential Revision: https://reviews.llvm.org/D93449	2020-12-21 20:27:05 +07:00
Bjorn Pettersson	a89d751fb4	Add intrinsics for saturating float to int casts This patch adds support for the fptoui.sat and fptosi.sat intrinsics, which provide basically the same functionality as the existing fptoui and fptosi instructions, but will saturate (or return 0 for NaN) on values unrepresentable in the target type, instead of returning poison. Related mailing list discussion can be found at: https://groups.google.com/d/msg/llvm-dev/cgDFaBmCnDQ/CZAIMj4IBAAJ The intrinsics have overloaded source and result type and support vector operands: i32 @llvm.fptoui.sat.i32.f32(float %f) i100 @llvm.fptoui.sat.i100.f64(double %f) <4 x i32> @llvm.fptoui.sat.v4i32.v4f16(<4 x half> %f) // etc On the SelectionDAG layer two new ISD opcodes are added, FP_TO_UINT_SAT and FP_TO_SINT_SAT. These opcodes have two operands and one result. The second operand is an integer constant specifying the scalar saturation width. The idea here is that initially the second operand and the scalar width of the result type are the same, but they may change during type legalization. For example: i19 @llvm.fptosi.sat.i19.f32(float %f) // builds i19 fp_to_sint_sat f, 19 // type legalizes (through integer result promotion) i32 fp_to_sint_sat f, 19 I went for this approach, because saturated conversion does not compose well. There is no good way of "adjusting" a saturating conversion to i32 into one to i19 short of saturating twice. Specifying the saturation width separately allows directly saturating to the correct width. There are two baseline expansions for the fp_to_xint_sat opcodes.
If the integer bounds can be exactly represented in the float type and fminnum/fmaxnum are legal, we can expand to something like: f = fmaxnum f, FP(MIN) f = fminnum f, FP(MAX) i = fptoxi f i = select f uo f, 0, i # unnecessary if unsigned as 0 = MIN If the bounds cannot be exactly represented, we expand to something like this instead: i = fptoxi f i = select f ult FP(MIN), MIN, i i = select f ogt FP(MAX), MAX, i i = select f uo f, 0, i # unnecessary if unsigned as 0 = MIN It should be noted that this expansion assumes a non-trapping fptoxi. Initial tests are for AArch64, x86_64 and ARM. This exercises all of the scalar and vector legalization. ARM is included to test float softening. Original patch by @nikic and @ebevhan (based on D54696). Differential Revision: https://reviews.llvm.org/D54749	2020-12-18 11:09:41 +01:00
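The role of the separate saturation-width operand in the commit above can be sketched at the source level. This models only the described semantics (saturate to the given width, return 0 for NaN), not the DAG expansions themselves; `fpToSIntSat` is a hypothetical name, and the sketch assumes a width between 1 and 32 so that both bounds are exactly representable in `double`.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Saturating signed conversion to a `width`-bit integer, returned in a wider
// container type. This mirrors the idea of "i32 fp_to_sint_sat f, 19": the
// result type is wider than the width the value saturates to.
// Assumption: 1 <= width <= 32.
int64_t fpToSIntSat(double f, unsigned width) {
  assert(width >= 1 && width <= 32);
  const int64_t maxVal = (int64_t{1} << (width - 1)) - 1;
  const int64_t minVal = -(int64_t{1} << (width - 1));
  if (std::isnan(f)) return 0;                           // NaN maps to 0
  if (f < static_cast<double>(minVal)) return minVal;    // clamp low
  if (f > static_cast<double>(maxVal)) return maxVal;    // clamp high
  return static_cast<int64_t>(f);                        // truncate toward zero
}
```

Saturating directly to 19 bits gives a different answer from saturating to 32 bits and reinterpreting the result, which is why the commit message says the width is carried as its own operand.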
Layton Kifer	385e9a2a04	[DAGCombiner] Improve shift by select of constant Clean up a TODO, to support folding a shift of a constant by a select of constants, on targets with different shift operand sizes. Reviewed By: RKSimon, lebedev.ri Differential Revision: https://reviews.llvm.org/D90349	2020-12-18 02:21:42 +00:00
Krasimir Georgiev	e71a4cc207	fix a -Wunused-variable warning in release build	2020-12-17 11:52:00 +01:00
Simon Pilgrim	cdb692ee0c	[X86] Add X86ISD::SUBV_BROADCAST_LOAD and begin removing X86ISD::SUBV_BROADCAST (PR38969) Subvector broadcasts are only load instructions, yet X86ISD::SUBV_BROADCAST treats them more generally, requiring a lot of fallback tablegen patterns. This initial patch replaces constant vector lowering inside lowerBuildVectorAsBroadcast with direct X86ISD::SUBV_BROADCAST_LOAD loads which helps us merge a number of equivalent loads/broadcasts. As well as general plumbing/analysis additions for SUBV_BROADCAST_LOAD, I needed to wrap SelectionDAG::makeEquivalentMemoryOrdering so it can handle result chains from non generic LoadSDNode nodes. Later patches will continue to replace X86ISD::SUBV_BROADCAST usage. Differential Revision: https://reviews.llvm.org/D92645	2020-12-17 10:25:25 +00:00
QingShan Zhang	ebdd20f430	Expand the fp_to_int/int_to_fp/fp_round/fp_extend as libcall for fp128 X86 and AArch64 expand it as libcall inside the target. And PowerPC also wants to expand them as libcall for P8. So, propose an implementation in the legalizer to share the logic and remove the code for X86/AArch64 to avoid the duplicate code. Reviewed By: Craig Topper Differential Revision: https://reviews.llvm.org/D91331	2020-12-17 07:59:30 +00:00
Kazu Hirata	5501b92957	[IR, CodeGen] Use llvm::is_contained (NFC)	2020-12-16 21:30:44 -08:00
Qiu Chaofan	38b4442198	[NFC] [Legalizer] Use common method for expanding fp-to-int operands Reviewed By: RKSimon, steven.zhang Differential Revision: https://reviews.llvm.org/D92481	2020-12-15 10:45:40 +08:00
Matt Arsenault	2e0e03c6a0	OpaquePtr: Require byval on x86_intrcc parameter 0 Currently the backend special cases x86_intrcc and treats the first parameter as byval. Make the IR require byval for this parameter to remove this special case, and avoid the dependence on the pointee element type. Fixes bug 46672. I'm not sure the IR is enforcing all the calling convention constraints. clang seems to ignore the attribute for empty parameter lists, but the IR tolerates it.	2020-12-14 16:34:37 -05:00
Kerry McLaughlin	c5ced82c8e	[SVE][CodeGen] Lower scalable floating-point vector reductions Changes in this patch: - Minor changes to the LowerVECREDUCE_SEQ_FADD function added by @cameron.mcinally to also work for scalable types - Added TableGen patterns for FP reductions with unpacked types (nxv2f16, nxv4f16 & nxv2f32) - Asserts added to expandFMINNUM_FMAXNUM & expandVecReduceSeq for scalable types Reviewed By: cameron.mcinally Differential Revision: https://reviews.llvm.org/D93050	2020-12-14 11:45:42 +00:00
Kazu Hirata	ee5b5b7a35	[CodeGen] Use llvm::erase_value (NFC)	2020-12-13 20:05:48 -08:00
Joe Ellis	d863a0ddeb	[SelectionDAG] Implement SplitVecOp_INSERT_SUBVECTOR This function is needed for when it is necessary to split the subvector operand of an llvm.experimental.vector.insert call. Splitting the subvector operand means performing two insertions: one inserting the lower part of the split subvector into the destination vector, and another for inserting the upper part. Through experimenting, it seems quite rare to need to split the subvector operand, but this is necessary to avoid assertion errors. Differential Revision: https://reviews.llvm.org/D92760	2020-12-11 11:07:59 +00:00
Craig Topper	a1ae3c6ac9	[RISCV][LegalizeDAG] Expand SETO and SETUO comparisons. Teach LegalizeDAG to expand SETUO expansion when UNE isn't legal. If SETUNE isn't legal, UO can use the NOT of the SETO expansion. Removes some complex isel patterns. Most of the test changes are from using XORI instead of SEQZ. Differential Revision: https://reviews.llvm.org/D92008	2020-12-10 09:15:52 -08:00
Justin Bogner	e6a1187dd8	Limit the recursion depth of SelectionDAG::isSplatValue() This method previously always recursively checked both the left-hand side and right-hand side of binary operations for splatted (broadcast) vector values to determine if the parent DAG node is a splat. Like several other SelectionDAG methods, limit the recursion depth to MaxRecursionDepth (6). This prevents stack overflow. See also https://issuetracker.google.com/173785481 Patch by Nicolas Capens. Thanks! Differential Revision: https://reviews.llvm.org/D92421	2020-12-09 10:35:07 -08:00
Kerry McLaughlin	05edfc5475	[SVE][CodeGen] Add DAG combines for s/zext_masked_gather This patch adds the following DAGCombines, which apply if isVectorLoadExtDesirable() returns true: - fold (and (masked_gather x)) -> (zext_masked_gather x) - fold (sext_inreg (masked_gather x)) -> (sext_masked_gather x) LowerMGATHER has also been updated to fetch the LoadExtType associated with the gather and also use this value to determine the correct masked gather opcode to use. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D92230	2020-12-09 11:53:19 +00:00
Kerry McLaughlin	4519ff4b6f	[SVE][CodeGen] Add the ExtensionType flag to MGATHER Adds the ExtensionType flag, which reflects the LoadExtType of a MaskedGatherSDNode. Also updated SelectionDAGDumper::print_details so that details of the gather load (is signed, is scaled & extension type) are printed. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D91084	2020-12-09 11:19:08 +00:00
Joe Ellis	80c33de2d3	[SelectionDAG] Add llvm.vector.{extract,insert} intrinsics This commit adds two new intrinsics. - llvm.experimental.vector.insert: used to insert a vector into another vector starting at a given index. - llvm.experimental.vector.extract: used to extract a subvector from a larger vector starting from a given index. The codegen work for these intrinsics has already been completed; this commit is simply exposing the existing ISD nodes to LLVM IR. Reviewed By: cameron.mcinally Differential Revision: https://reviews.llvm.org/D91362	2020-12-09 11:08:41 +00:00
Simon Moll	3ffbc79357	[VP] Build VP SDNodes Translate VP intrinsics to VP_* SDNodes. The tests check whether a matching vp_* SDNode is emitted. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D91441	2020-12-09 11:36:51 +01:00
Huihui Zhang	8e6fc1f97e	[AArch64][SVE] Add lowering for llvm.maxnum\|minnum for scalable type. The LLVM intrinsic llvm.maxnum\|minnum is an overloaded intrinsic that can be used on any floating-point or vector of floating-point type. This patch extends the current infrastructure to support scalable vector types. This patch also fixes a warning about incorrect use of EVT::getVectorNumElements() for scalable types, raised when DAGCombiner tries to split a scalable vector. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D92607	2020-12-08 09:35:53 -08:00
David Sherwood	e22259fafe	[SVE] Remove duplicate assert in DAGTypeLegalizer::SplitVecOp_EXTRACT_SUBVECTOR	2020-12-08 14:41:14 +00:00
Tim Northover	c5978f42ec	UBSAN: emit distinctive traps Sometimes people get minimal crash reports after a UBSAN incident. This change tags each trap with an integer representing the kind of failure encountered, which can aid in tracking down the root cause of the problem.	2020-12-08 10:28:26 +00:00
Kai Luo	44bd8ea167	[DAGCombine][PowerPC] Simplify nabs by using legal `smin` operation Convert `0 - abs(x)` to `smin (x, -x)` if `smin` is a legal operation. Verification: https://alive2.llvm.org/ce/z/vpquFR Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D92637	2020-12-08 03:24:07 +00:00
Simon Pilgrim	b6e847c396	[DAG] Cleanup by folding some single use VT.getScalarSizeInBits() calls into its comparison. NFCI.	2020-12-07 18:23:54 +00:00
Kerry McLaughlin	111f559bbd	[SVE][CodeGen] Call refineIndexType & refineUniformBase from visitMGATHER The refineIndexType & refineUniformBase functions added by D90942 can also be used to improve CodeGen of masked gathers. These changes were split out from D91092 Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D92319	2020-12-07 13:20:19 +00:00
Kerry McLaughlin	f6dd32fd35	[SVE][CodeGen] Lower scalable masked gathers Lowers the llvm.masked.gather intrinsics (scalar plus vector addressing mode only) Changes in this patch: - Add custom lowering for MGATHER, using getGatherVecOpcode() to choose the appropriate gather load opcode to use. - Improve codegen with refineIndexType/refineUniformBase, added in D90942 - Tests added for gather loads with 32 & 64-bit scaled & unscaled offsets. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D91092	2020-12-07 12:20:41 +00:00
Bing1 Yu	eee30a6dce	[CodeGen] Modify the refineIndexType(...)'s code to fix a bug in D90942. In the previous code, when refineIndexType(...) was called and Index was undef, Index.getOperand(0) raised an assertion failure. Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D92548	2020-12-07 08:49:07 +08:00
Layton Kifer	ac522f8700	[DAGCombiner] Fold (sext (not i1 x)) -> (add (zext i1 x), -1) Move fold of (sext (not i1 x)) -> (add (zext i1 x), -1) from X86 to DAGCombiner to improve codegen on other targets. Differential Revision: https://reviews.llvm.org/D91589	2020-12-06 11:52:10 -05:00
Kazu Hirata	a553ac9791	[CodeGen] llvm::erase_if (NFC)	2020-12-05 15:44:40 -08:00
Simon Pilgrim	9cf4f493a7	[DAG] Move SelectionDAG implementation to KnownBits::setInReg(). NFCI.	2020-12-04 18:09:08 +00:00
Simon Pilgrim	6f4ee6f870	[DAGCombiner] Use const APInt& for getConstantOperandAPInt results. NFCI. Avoid unnecessary instantiation. Noticed while removing unnecessary autos	2020-12-04 09:44:58 +00:00
dfukalov	2ce38b3f03	[NFC] Reduce include files dependency. 1. Removed #include "...AliasAnalysis.h" in other headers and modules. 2. Cleaned up includes in AliasAnalysis.h. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D92489	2020-12-03 18:25:05 +03:00
Joe Ellis	78c0ea54a2	[DAGCombine] Fix TypeSize warning in DAGCombine::visitLIFETIME_END Bail out early if we encounter a scalable store. Reviewed By: peterwaller-arm Differential Revision: https://reviews.llvm.org/D92392	2020-12-03 12:12:41 +00:00
Kazu Hirata	7a4af2a8e7	[SelectionDAG] Use is_contained (NFC)	2020-12-02 19:09:45 -08:00
Hongtao Yu	24d4291ca7	[CSSPGO] Pseudo probes for function calls. An indirect call site needs to be probed for its potential call targets. With CSSPGO a direct call also needs a probe so that a calling context can be represented by a stack of callsite probes. Unlike pseudo probes for basic blocks that are in form of standalone intrinsic call instructions, pseudo probes for callsites have to be attached to the call instruction, thus a separate instruction would not work. One possible way of attaching a probe to a call instruction is to use a special metadata that carries information about the probe. The special metadata will have to make its way through the optimization pipeline down to object emission. This requires additional efforts to maintain the metadata in various places. Given that the `!dbg` metadata is a first-class metadata and has all essential support in place , leveraging the `!dbg` metadata as a channel to encode pseudo probe information is probably the easiest solution. With the requirement of not inflating `!dbg` metadata that is allocated for almost every instruction, we found that the 32-bit DWARF discriminator field which mainly serves AutoFDO can be reused for pseudo probes. DWARF discriminators distinguish identical source locations between instructions and with pseudo probes such support is not required. In this change we are using the discriminator field to encode the ID and type of a callsite probe and the encoded value will be unpacked and consumed right before object emission. When a callsite is inlined, the callsite discriminator field will go with the inlined instructions. The `!dbg` metadata of an inlined instruction is in form of a scope stack. The top of the stack is the instruction's original `!dbg` metadata and the bottom of the stack is for the original callsite of the top-level inliner. 
Except for the top of the stack, all other elements of the stack actually refer to the nested inlined callsites whose discriminator field (which actually represents a callsite probe) can be used together to represent the inline context of an inlined PseudoProbeInst or CallInst. To avoid collision with the baseline AutoFDO in various places that handle DWARF discriminators where a check against the `-pseudo-probe-for-profiling` switch is not available, a special encoding scheme is used to tell apart a pseudo probe discriminator from a regular discriminator. For the regular discriminator, if all lowest 3 bits are non-zero, it means the discriminator is basically empty and all higher 29 bits can be reserved for pseudo probe use. Callsite pseudo probes are inserted in `SampleProfileProbePass` and a target-independent MIR pass `PseudoProbeInserter` is added to unpack the probe ID/type from `!dbg`. Note that with this work the switch -debug-info-for-profiling will not work with -pseudo-probe-for-profiling anymore. They cannot be used at the same time. Reviewed By: wmi Differential Revision: https://reviews.llvm.org/D91756	2020-12-02 13:45:20 -08:00
James Park	78b0ec3d1c	Avoid redundant inline with LLVM_ATTRIBUTE_ALWAYS_INLINE Fix MSVC warning when __forceinline is paired with inline. Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D85264	2020-12-01 14:43:16 -08:00
David Blaikie	615f63e149	Revert "[FastISel] Flush local value map on every instruction" and dependent patches This reverts commit `cf1c774d6a`. This change caused several regressions in the gdb test suite - at least a sample of which was due to line zero instructions making breakpoints un-lined. I think they're worth investigating/understanding more (& possibly addressing) before moving forward with this change. Revert "[FastISel] NFC: Clean up unnecessary bookkeeping" This reverts commit `3fd39d3694`. Revert "[FastISel] NFC: Remove obsolete -fast-isel-sink-local-values option" This reverts commit `a474657e30`. Revert "Remove static function unused after cf1c774." This reverts commit `dc35368ccf`. Revert "[lldb] Fix TestThreadStepOut.py after "Flush local value map on every instruction"" This reverts commit `53a14a47ee`.	2020-12-01 14:26:23 -08:00
Layton Kifer	d7fec38f05	[DAGCombiner][NFC] Replace duplicate implementation flipBoolean with DAG.getLogicalNOT Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D92246	2020-12-01 22:23:04 +03:00
Benjamin Kramer	107e92dff8	[DAG] Remove unused variable. NFC.	2020-12-01 16:29:02 +01:00
Simon Pilgrim	1b209ff9e3	[DAG] Move vselect(icmp_ult, 0, sub(x,y)) -> usubsat(x,y) to DAGCombine (PR40111) Move the X86 VSELECT->USUBSAT fold to DAGCombiner - there's nothing target specific about these folds.	2020-12-01 14:25:29 +00:00
Simon Pilgrim	6dbd0d36a1	[DAG] Move vselect(icmp_ult, -1, add(x,y)) -> uaddsat(x,y) to DAGCombine (PR40111) Move the X86 VSELECT->UADDSAT fold to DAGCombiner - there's nothing target specific about these folds. The SSE42 test diffs are relatively benign - it's avoiding an extra constant load in exchange for an extra xor operation - there are extra register moves, which is annoying as all those operations should commute them away. Differential Revision: https://reviews.llvm.org/D91876	2020-12-01 11:56:26 +00:00
Paul Robinson	3fd39d3694	[FastISel] NFC: Clean up unnecessary bookkeeping Now that we flush the local value map for every instruction, we don't need any extra flushes for specific cases. Also, LastFlushPoint is not used for anything. Follow-ups to #dc35368 (D91734). Differential Revision: https://reviews.llvm.org/D92338	2020-11-30 12:27:50 -08:00
Paul Robinson	a474657e30	[FastISel] NFC: Remove obsolete -fast-isel-sink-local-values option This option is not used for anything after #dc35368 (D91734).	2020-11-30 10:55:49 -08:00
Francesco Petrogalli	f6150aa41a	[SelectionDAGBuilder] Update signature of `getRegsAndSizes()`. The mapping between registers and relative size has been updated to use TypeSize to account for the size of scalable EVTs. The patch is a NFCI, if not for the fact that with this change the function `getUnderlyingArgRegs` does not raise a warning for implicit conversion of `TypeSize` to `unsigned` when generating machine code from the test added to the patch. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D92096	2020-11-30 17:38:51 +00:00
Craig Topper	fa0f01a3c0	[RISCV][LegalizeTypes] Teach type legalizer that it can promote UMIN/UMAX using SExtPromotedInteger if that's better for the target. If Sext is cheaper than Zext for a target, we can use that to promote the operands of UMIN/UMAX. Using sext just makes numbers with the sign bit set even larger when treated as unsigned numbers, and it has no effect on numbers without the sign bit set. So the relative order doesn't change. This is similar to what we already do for promoting SETCC. This is helpful on RISCV where i32 arguments are sign extended on RV64 and many instructions are able to produce results with 33 sign bits. Differential Revision: https://reviews.llvm.org/D92128	2020-11-27 11:37:25 -08:00
Simon Pilgrim	969918e177	[DAG] Legalize umin(x,y) -> sub(x,usubsat(x,y)) and umax(x,y) -> add(x,usubsat(y,x)) iff usubsat is legal If usubsat() is legal, this is likely to result in smaller codegen expansion than the default cmp+select codegen expansion. Allows us to move the x86-specific lowering to the generic expansion code. Differential Revision: https://reviews.llvm.org/D92183	2020-11-27 11:18:58 +00:00
QingShan Zhang	4d83aba422	[DAGCombine] Adding a hook to improve the precision of fsqrt if the input is denormal For now, we hardcode the result as 0.0 if the input is denormal or 0, which hurts precision. As the added fsqrt belongs to the cold path of the cmp+branch, it won't impact performance for normal inputs on PowerPC, but it improves precision if the input is denormal. Reviewed By: Spatel Differential Revision: https://reviews.llvm.org/D80974	2020-11-27 02:10:55 +00:00
Nikita Popov	4df8efce80	[AA] Split up LocationSize::unknown() Currently, we have some confusion in the codebase regarding the meaning of LocationSize::unknown(): Some parts (including most of BasicAA) assume that LocationSize::unknown() only allows accesses after the base pointer. Some parts (various callers of AA) assume that LocationSize::unknown() allows accesses both before and after the base pointer (but within the underlying object). This patch splits up LocationSize::unknown() into LocationSize::afterPointer() and LocationSize::beforeOrAfterPointer() to make this completely unambiguous. I tried my best to determine which one is appropriate for all the existing uses. The test changes in cs-cs.ll in particular illustrate a previously clearly incorrect AA result: We were effectively assuming that argmemonly functions were only allowed to access their arguments after the passed pointer, but not before it. I'm pretty sure that this was not intentional, and it's certainly not specified by LangRef that way. Differential Revision: https://reviews.llvm.org/D91649	2020-11-26 18:39:55 +01:00
Simon Pilgrim	8057ebf4a0	Revert rG12d59b696b330 "[DAG] Legalize umin(x,y) -> sub(x,usubsat(x,y)) and umax(x,y) -> add(x,usubsat(y,x)) iff usubsat is legal" This reverts commit `12d59b696b`. Prematurely pushed this to trunk	2020-11-26 15:07:45 +00:00
Simon Pilgrim	12d59b696b	[DAG] Legalize umin(x,y) -> sub(x,usubsat(x,y)) and umax(x,y) -> add(x,usubsat(y,x)) iff usubsat is legal If usubsat() is legal, this is likely to result in smaller codegen expansion than the default cmp+select codegen expansion. Allows us to move the x86-specific lowering to the generic expansion code.	2020-11-26 14:47:28 +00:00
Kerry McLaughlin	4bee3197f6	[SVE][CodeGen] Extend isConstantSplatValue to support ISD::SPLAT_VECTOR Updated the affected scalable_of_scalable tests in sve-gep.ll, as isConstantSplatValue now returns true in DAGCombiner::visitMUL and folds `(mul x, 1) -> x` Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D91363	2020-11-26 11:19:40 +00:00
Craig Topper	aea130f736	[LegalizerTypes] Add support for scalarizing the operand of an FP_EXTEND when the result type is legal.	2020-11-25 20:30:21 -08:00
Craig Topper	2d6042937b	[SelectionDAGBuilder] Add SPF_NABS support to visitSelect We currently don't match this which limits the effectiveness of D91120 until InstCombine starts canonicalizing to llvm.abs. This should be easy to remove if/when we remove the SPF_ABS handling. Differential Revision: https://reviews.llvm.org/D92118	2020-11-25 14:54:26 -08:00
Paul Robinson	dc35368ccf	Remove static function unused after `cf1c774`. Caused some -Werror bot failures.	2020-11-25 13:43:06 -05:00
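
Several of the commits listed above describe algebraic folds: umin/umax via usubsat (969918e177), UMIN/UMAX promotion via sign extension (fa0f01a3c0), nabs via smin (44bd8ea167), and sext(not i1 x) -> add(zext i1 x, -1) (ac522f8700). The following is a minimal sketch that checks these identities exhaustively on 8-bit values in plain Python. It is not LLVM code; the helper names (to_signed, usubsat, check_folds) and the 8-bit/16-bit widths are assumptions chosen for illustration, and abs is modelled with two's-complement wraparound.

```python
BITS = 8                      # illustrative element width (assumption)
MASK = (1 << BITS) - 1
WIDE_MASK = (1 << (2 * BITS)) - 1


def to_signed(v):
    """Interpret the low BITS bits of v as a two's-complement integer."""
    v &= MASK
    return v - (1 << BITS) if v >> (BITS - 1) else v


def usubsat(x, y):
    """Unsigned saturating subtract: clamps at 0 instead of wrapping."""
    return x - y if x > y else 0


def check_folds():
    """Exhaustively verify the folds on all 8-bit inputs."""
    for x in range(1 << BITS):
        for y in range(1 << BITS):
            # umin(x,y) -> sub(x, usubsat(x,y)); umax(x,y) -> add(x, usubsat(y,x))
            assert (x - usubsat(x, y)) & MASK == min(x, y)
            assert (x + usubsat(y, x)) & MASK == max(x, y)
            # Promoting with sext keeps the unsigned order, so the
            # truncated wide umin/umax matches the narrow result.
            sx = to_signed(x) & WIDE_MASK
            sy = to_signed(y) & WIDE_MASK
            assert min(sx, sy) & MASK == min(x, y)
            assert max(sx, sy) & MASK == max(x, y)
    for s in range(-(1 << (BITS - 1)), 1 << (BITS - 1)):
        # 0 - abs(x) -> smin(x, -x), with wrapping negation
        reference = to_signed(-to_signed(abs(s)))
        assert min(s, to_signed(-s)) == reference
    for b in (0, 1):
        # sext(not i1 b) -> add(zext i1 b, -1), where -1 is MASK at this width
        assert (-(b ^ 1)) & MASK == (b + MASK) & MASK
    return True


if __name__ == "__main__":
    print(check_folds())
```

The exhaustive loop is feasible only because the width is small; for proofs at real widths the commits themselves point to tools such as Alive2.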

11556 Commits