llvm-project

Commit Graph

Author	SHA1	Message	Date
Craig Topper	56b33391d3	[SelectionDAG] Move ISD::PARITY formation from DAGCombine to SimplifyDemandedBits. Previously, we formed ISD::PARITY by looking for (and (ctpop X), 1) but the AND might be separated from the ctpop. For example, if the parity result is multiplied by 2, we'll pull the AND through the shift. To handle more cases, move to SimplifyDemandedBits, where we can catch more of the patterns in which only the LSB of the CTPOP is used.	2020-09-13 21:04:13 -07:00
Sanjay Patel	3a8ea8609b	[Intrinsics] define semantics for experimental fmax/fmin vector reductions As discussed on llvm-dev: http://lists.llvm.org/pipermail/llvm-dev/2020-April/140729.html This is hopefully the final remaining showstopper before we can remove the 'experimental' from the reduction intrinsics. No behavior was specified for the FP min/max reductions, so we have a mess of different interpretations. There are a few potential options for the semantics of these max/min ops. I think this is the simplest based on current behavior/implementation: make the reductions inherit from the existing llvm.maxnum/minnum intrinsics. These correspond to libm fmax/fmin, and those are similar to the (now deprecated?) IEEE-754 maxNum/minNum functions (NaNs are treated as missing data). So the default expansion creates calls to libm functions. Another option would be to inherit from llvm.maximum/minimum (NaNs propagate), but most targets just crash in codegen when given those nodes because no default expansion was ever implemented AFAICT. We could also just assume 'nnan' semantics by default (we are already assuming 'nsz' semantics in the maxnum/minnum intrinsics), but some targets (AArch64, PowerPC) support the more defined behavior, so it doesn't make much sense to not allow a tighter spec. Fast-math-flags (nnan) can be used to loosen the semantics. (Note that D67507 was proposed to update the LangRef to acknowledge the more recent IEEE-754 2019 standard, but that patch seems to have stalled. If we do update based on the new standard, the reduction instructions can seamlessly inherit from whatever updates are made to the max/min intrinsics.) x86 sees a regression here on 'nnan' tests because we have underlying, longstanding bugs in FMF creation/propagation. Those need to be fixed apart from this change (for example: https://llvm.org/PR35538). The expansion sequence before this patch may not have been correct. 
Differential Revision: https://reviews.llvm.org/D87391	2020-09-12 09:10:28 -04:00
Jay Foad	517202c720	[TargetLowering] Fix comments describing XOR -> OR/AND transformations	2020-09-10 13:56:34 +01:00
Craig Topper	b1e68f885b	[SelectionDAGBuilder] Pass fast math flags to getNode calls rather than trying to set them after the fact. This removes the after-the-fact FMF handling from D46854 in favor of passing fast math flags to getNode. This should be a superset of D87130. This required adding an SDNodeFlags parameter to SelectionDAG::getSetCC. Now we manage to constant fold some undefs during the initial getNode that we don't fold in later DAG combines. Differential Revision: https://reviews.llvm.org/D87200	2020-09-08 15:27:21 -07:00
David Sherwood	73a3d350a4	[SVE][CodeGen] Fix up warnings in sve-split-insert/extract tests I have fixed up some more ElementCount/TypeSize related warnings in the following tests: CodeGen/AArch64/sve-split-extract-elt.ll CodeGen/AArch64/sve-split-insert-elt.ll In SelectionDAG::CreateStackTemporary we were relying upon the implicit cast from TypeSize -> uint64_t when calling MachineFrameInfo::CreateStackObject. I've fixed this by passing in the known minimum size instead, which I believe is fine because the associated stack id indicates whether this is a scalable object or not. I've also fixed up a case in TargetLowering::SimplifyDemandedBits when extracting a vector element from a scalable vector. The result is a scalar, hence it wasn't caught at the start of the function. If the vector is scalable we just bail out for now. Differential Revision: https://reviews.llvm.org/D86431	2020-09-04 09:51:31 +01:00
Jay Foad	099c089d4b	[APInt] New member function setBitVal Differential Revision: https://reviews.llvm.org/D87033	2020-09-02 21:40:31 +01:00
David Sherwood	9fbb113247	[SVE][CodeGen] Fix TypeSize/ElementCount related warnings in sve-split-load.ll I have fixed up a number of warnings resulting from TypeSize -> uint64_t casts and calling getVectorNumElements() on scalable vector types. I think most of the changes are fairly trivial except for those in DAGTypeLegalizer::SplitVecRes_MLOAD, where I've tried to ensure we create the MachineMemoryOperands in a sensible way for scalable vectors. I have added a CHECK line to the following test: CodeGen/AArch64/sve-split-load.ll that ensures no new warnings are added. Differential Revision: https://reviews.llvm.org/D86697	2020-09-01 07:47:59 +01:00
Nikita Popov	51d34c0c53	[TargetLowering] Strip trailing whitespace (NFC)	2020-08-29 18:09:08 +02:00
QingShan Zhang	deb4b25807	[DAGCombine] Don't delete the node if it has uses immediately This is the follow-up patch for https://reviews.llvm.org/D86183: we must not delete the node if NegX == NegY, because it still has a use after we create the new node. ``` if (NegX && (CostX <= CostY)) { Cost = std::min(CostX, CostZ); RemoveDeadNode(NegY); return DAG.getNode(Opcode, DL, VT, NegX, Y, NegZ, Flags); #<-- NegY is used here if NegY == NegX. } ``` Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D86689	2020-08-28 16:13:43 +00:00
Jay Foad	b7e3599a22	[SelectionDAG] Handle non-power-of-2 bitwidths in expandROT Differential Revision: https://reviews.llvm.org/D86449	2020-08-26 09:20:46 +01:00
Jay Foad	a522067692	[SDAG] Convert FSHL <--> FSHR if the target only supports one of them D77152 tried to do this but got it wrong in the shift-by-zero case. D86430 reverted the wrong code. Reimplement the optimization with different code depending on whether the shift amount is known to be non-zero (modulo bitwidth). This improves code quality for fshl tests on AMDGPU, which only has an fshr instruction. Differential Revision: https://reviews.llvm.org/D86438	2020-08-24 17:47:10 +01:00
Bjorn Pettersson	7a4e26adc8	[SelectionDAG] Fix miscompile bug in expandFunnelShift This is a fixup of commit `0819a6416f` (D77152) which could result in miscompiles. The miscompile could only happen for targets where isOperationLegalOrCustom could return different values for FSHL and FSHR. The commit mentioned above added logic in expandFunnelShift to convert between FSHL and FSHR by swapping the direction of the funnel shift. However, that transform is only legal if we know that the shift count (modulo bitwidth) isn't zero. Basically, since fshr(-1,0,0)==0 and fshl(-1,0,0)==-1 then doing a rewrite such as fshr(X,Y,Z) => fshl(X,Y,0-Z) would be incorrect if Z modulo bitwidth could be zero. ``` $ ./alive-tv /tmp/test.ll ---------------------------------------- define i32 @src(i32 %x, i32 %y, i32 %z) { %0: %t0 = fshl i32 %x, i32 %y, i32 %z ret i32 %t0 } => define i32 @tgt(i32 %x, i32 %y, i32 %z) { %0: %t0 = sub i32 32, %z %t1 = fshr i32 %x, i32 %y, i32 %t0 ret i32 %t1 } Transformation doesn't verify! ERROR: Value mismatch Example: i32 %x = #x00000000 (0) i32 %y = #x00000400 (1024) i32 %z = #x00000000 (0) Source: i32 %t0 = #x00000000 (0) Target: i32 %t0 = #x00000020 (32) i32 %t1 = #x00000400 (1024) Source value: #x00000000 (0) Target value: #x00000400 (1024) ``` It could be possible to add back the transform, given that logic is added to check that (Z % BW) can't be zero. Since there were no test cases proving that such a transform actually would be useful, I decided to simply remove the faulty code in this patch. Reviewed By: foad, lebedev.ri Differential Revision: https://reviews.llvm.org/D86430	2020-08-24 09:52:11 +02:00
QingShan Zhang	960cbc53ca	[DAGCombine] Remove dead node when it is created by getNegatedExpression We hit the compile-time problem reported by https://bugs.llvm.org/show_bug.cgi?id=46877, and the reason is the same as D77319. So we need to remove the dead node we created to avoid increasing the problem size of DAGCombiner. Reviewed By: Spatel Differential Revision: https://reviews.llvm.org/D86183	2020-08-24 02:50:58 +00:00
Jay Foad	0819a6416f	[SelectionDAG] Better legalization for FSHL and FSHR In SelectionDAGBuilder always translate the fshl and fshr intrinsics to FSHL and FSHR (or ROTL and ROTR) instead of lowering them to shifts and ORs. Improve the legalization of FSHL and FSHR to avoid code quality regressions. Differential Revision: https://reviews.llvm.org/D77152	2020-08-21 10:32:49 +01:00
David Sherwood	3f36561f69	[SVE][CodeGen] Fix scalable vector issues in DAGTypeLegalizer::GenWidenVectorLoads In DAGTypeLegalizer::GenWidenVectorLoads the algorithm assumes it only ever deals with fixed width types, hence the offsets for each individual store never take 'vscale' into account. I've changed the code in that function to use TypeSize instead of unsigned for tracking the remaining load amount. In addition, I've changed the load loop to use the new IncrementPointer helper function for updating the addresses in each iteration, since this handles scalable vector types. Also, I've added report_fatal_errors in GenWidenVectorExtLoads, TargetLowering::scalarizeVectorLoad and TargetLowering::scalarizeVectorStores, since these functions currently use a sequence of element-by-element scalar loads/stores. In a similar vein, I've also added a fatal error report in FindMemType for the case when we decide to return the element type for a scalable vector type. I've added new tests in CodeGen/AArch64/sve-split-load.ll CodeGen/AArch64/sve-ld-addressing-mode-reg-imm.ll for the changes in GenWidenVectorLoads. Differential Revision: https://reviews.llvm.org/D85909	2020-08-19 07:54:32 +01:00
Matt Arsenault	5b53b17cd3	DAG: Add missing comment for transform	2020-08-17 10:01:12 -04:00
Matt Arsenault	c7191e3185	DAG: Don't pass 0 alignment value to allowsMisalignedMemoryAccesses I think not unconditionally passing getDstAlign is broken, but leave that for another change.	2020-08-13 09:33:17 -04:00
Kerry McLaughlin	455ed56d48	[SVE][CodeGen] Legalisation of INSERT_VECTOR_ELT for scalable vectors When the result type of insertelement needs to be split, SplitVecRes_INSERT_VECTOR_ELT will try to store the vector to a stack temporary, store the element at the location of the stack temporary plus the index, and reload the Lo/Hi parts. This patch does the following to ensure this works for scalable vectors: - Sets the StackID with getStackIDForScalableVectors() in CreateStackTemporary - Adds an IsScalable flag to getMemBasePlusOffset() and scales the offset by VScale when this is true - Ensures the immediate is clamped correctly by clampDynamicVectorIndex so that we don't try to use an out of range index Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D84874	2020-08-11 12:57:28 +01:00
Kerry McLaughlin	85c7e89f3b	[CodeGen] Refactor getMemBasePlusOffset & getObjectPtrOffset to accept a TypeSize Changes the Offset arguments to both functions from int64_t to TypeSize & updates all uses of the functions to create the offset using TypeSize::Fixed() Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D85220	2020-08-11 12:17:10 +01:00
Craig Topper	fdfdee98ac	[DAGCombiner] Teach SimplifySetCC SETUGE X, SINTMIN -> SETLT X, 0 and SETULE X, SINTMAX -> SETGT X, -1. These aren't the canonical forms we'd get from InstCombine, but we do have X86 tests for them. Recognizing them is pretty cheap. While there, make use of APInt::isMinSignedValue/isMaxSignedValue instead of creating a new APInt to compare with. Also use the SelectionDAG::getAllOnesConstant helper to hide the all-ones APInt creation.	2020-08-08 22:27:16 -07:00
Bevin Hansson	5de6c56f7e	[Intrinsic] Add sshl.sat/ushl.sat, saturated shift intrinsics. Summary: This patch adds two intrinsics, llvm.sshl.sat and llvm.ushl.sat, which perform signed and unsigned saturating left shift, respectively. These are useful for implementing the Embedded-C fixed point support in Clang, originally discussed in http://lists.llvm.org/pipermail/llvm-dev/2018-August/125433.html and http://lists.llvm.org/pipermail/cfe-dev/2018-May/058019.html Reviewers: leonardchan, craig.topper, bjope, jdoerfert Subscribers: hiraditya, jdoerfert, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D83216	2020-08-07 15:09:24 +02:00
Simon Pilgrim	b8ffbf0e02	[DAG] TargetLowering::expandMUL_LOHI - pass SDLoc as const& Try to be more consistent with the SDLoc param in the TargetLowering methods. This also exposes an issue where we were passing a SDNode as a SDLoc, relying on the implicit SDLoc(SDNode) constructor.	2020-08-02 15:31:36 +01:00
Simon Pilgrim	d14a22da5e	[DAG] TargetLowering::LowerAsmOutputForConstraint - pass SDLoc as const& Try to be more consistent with the SDLoc param in the TargetLowering methods.	2020-08-02 15:12:02 +01:00
Simon Pilgrim	fdc902774e	[DAG][AMDGPU][X86] Add SimplifyMultipleUseDemandedBits handling for SIGN/ZERO_EXTEND + SIGN/ZERO_EXTEND_VECTOR_INREG Peek through multiple use ops like we already do for ANY_EXTEND/ANY_EXTEND_VECTOR_INREG Differential Revision: https://reviews.llvm.org/D84863	2020-07-29 18:10:59 +01:00
Matt Arsenault	9d3e56e2ee	DAG: Try scalarizing when expanding saturating add/sub In an upcoming AMDGPU patch, the scalar cases will be legal and vector ops should be scalarized, rather than producing a long sequence of vector ops which will also need to be scalarized. Use a lazy heuristic that seems to work and improves the thumb2 MVE test.	2020-07-16 14:05:16 -04:00
Kerry McLaughlin	2762da0a16	[SVE][CodeGen] Legalisation of masked loads and stores Summary: This patch modifies IncrementMemoryAddress to use a vscale when calculating the new address if the data type is scalable. Also adds tablegen patterns which match an extract_subvector of a legal predicate type with zip1/zip2 instructions Reviewers: sdesmalen, efriedma, david-arm Reviewed By: efriedma, david-arm Subscribers: tschuett, hiraditya, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D83137	2020-07-16 10:55:45 +01:00
Jay Foad	babbeafa00	[TargetLowering] Improve expansion of FSHL/FSHR by non-zero amount Use a simpler code sequence when the shift amount is known not to be zero modulo the bit width. Nothing much uses this until D77152 changes the translation of fshl and fshr intrinsics. Differential Revision: https://reviews.llvm.org/D82540	2020-07-06 12:07:14 +01:00
Jay Foad	e7a4a24dc5	[TargetLowering] Improve expansion of ROTL/ROTR Using a negation instead of a subtraction from a constant can save an instruction on some targets. Nothing much uses this until D77152 changes the translation of fshl and fshr intrinsics. Differential Revision: https://reviews.llvm.org/D82539	2020-07-06 12:07:14 +01:00
Simon Pilgrim	3521ecf1f8	[X86] Add vector support to targetShrinkDemandedConstant for OR/XOR opcodes If a constant is only allsignbits in the demanded/active bits, then sign extend it to an allsignbits bool pattern for OR/XOR ops. This also requires SimplifyDemandedBits XOR handling to be modified to call ShrinkDemandedConstant on any (non-NOT) XOR pattern to account for non-splat cases. Next step towards fixing PR45808 - with this patch we now get a <-1,-1,0,0> v4i64 constant instead of <1,1,0,0>. Differential Revision: https://reviews.llvm.org/D82257	2020-06-29 12:19:05 +01:00
Simon Pilgrim	973685fc78	[TargetLowering] Add DemandedElts arg to ShrinkDemandedConstant Pre-commit for D82257, this adds a DemandedElts arg to ShrinkDemandedConstant/targetShrinkDemandedConstant which will allow future patches to (optionally) add vector support.	2020-06-29 11:46:58 +01:00
Eli Friedman	e9d4e34ab8	[AArch64][SVE] Add legalization support for i32/i64 vector srem/urem Implement them on top of sdiv/udiv, similar to what we do for integer types. Potential future work: implementing i8/i16 srem/urem, optimizations for constant divisors, optimizing the mul+sub to mls. Differential Revision: https://reviews.llvm.org/D81511	2020-06-23 16:27:52 -07:00
Paul Walker	499c63288f	[SVE] Code generation for fixed length vector loads & stores. Summary: This patch adds base support for code generating fixed length vector operations targeting a known SVE vector length. To achieve this we lower fixed length vector operations to equivalent scalable vector operations, whereby SVE predication is used to limit the elements processed to those present within the fixed length vector. Specifically this patch implements load and store operations, which get lowered to their masked counterparts thusly: V = load(Addr) => V = extract_fixed_vector(masked_load(make_pred(V.NumElts), Addr)) store(V, (Addr)) => masked_store(insert_fixed_vector(V), make_pred(V.NumElts), Addr) Reviewers: rengolin, efriedma Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D80385	2020-06-23 09:39:03 +00:00
Simon Pilgrim	48d1a2d6d0	[DAG] Add SimplifyMultipleUseDemandedVectorElts helper for SimplifyMultipleUseDemandedBits. NFCI. We have many cases where we call SimplifyMultipleUseDemandedBits and demand specific vector elements, but all the bits from them - this adds a helper wrapper to handle this.	2020-06-22 14:24:39 +01:00
Simon Pilgrim	ecc5d7ee0d	[DAG] SimplifyMultipleUseDemandedBits - drop unnecessary *_EXTEND_VECTOR_INREG cases For little endian targets, if we only need the lowest element and none of the extended bits then we can just use the (bitcasted) source vector directly. We already do this in SimplifyDemandedBits, this adds the SimplifyMultipleUseDemandedBits equivalent.	2020-06-22 12:35:32 +01:00
David Sherwood	7edc7f6edb	[CodeGen] Fix SimplifyDemandedBits for scalable vectors For now I have changed SimplifyDemandedBits and its various callers to assume we know nothing for scalable vectors and to ignore the demanded bits completely. I have also done something similar for SimplifyDemandedVectorElts. These changes fix up lots of warnings due to calls to EVT::getVectorNumElements() for types with scalable vectors. These functions are all used for optimisations, rather than functional requirements. In future we can revisit this code if there is a need to improve code quality for SVE. Differential Revision: https://reviews.llvm.org/D80537	2020-06-19 07:59:35 +01:00
Simon Pilgrim	2474421398	[TargetLowering] SimplifyMultipleUseDemandedBits - drop already extended ISD::SIGN_EXTEND_INREG nodes. If the source of the SIGN_EXTEND_INREG node is already sign extended, use the source directly.	2020-06-18 16:41:08 +01:00
Matt Arsenault	45e1a22a92	GlobalISel: Make known bits/alignment API more consistent Just computing the alignment makes sense without caring about the general known bits, such as for non-integral pointers. Separate the two and start calling into the TargetLowering hooks for frame indexes, which improves the AMDGPU matching for stack addressing modes. Also introduce a new hook for returning known alignment of target instructions. For AMDGPU, it would be useful to report the known alignment implied by certain intrinsic calls. Also stop using MaybeAlign.	2020-06-05 14:57:22 -04:00
Matt Arsenault	af867b7850	DAG: Change computeKnownBitsForFrameIndex to be usable by GISel This wasn't getting much value from the DAG or depth arguments, since it's only called on the frame index root nodes. FrameIndexes can also only return a scalar value, so it also didn't need DemandedElts.	2020-06-04 10:50:26 -04:00
Simon Pilgrim	ea80b40669	[DAG] SimplifyDemandedBits - peek through SHL if we only demand sign bits. If we're only demanding the (shifted) sign bits of the shift source value, then we can use the value directly. This handles SimplifyDemandedBits/SimplifyMultipleUseDemandedBits for both ISD::SHL and X86ISD::VSHLI. Differential Revision: https://reviews.llvm.org/D80869	2020-06-03 16:11:54 +01:00
Simon Pilgrim	63824ad947	[TargetLowering] SimplifyDemandedBits - remove shift amount clamps from getValidShiftAmountConstant calls. NFC. getValidShiftAmountConstant only returns a value if the shift amount is in range, so we don't need to check it again.	2020-05-30 14:04:55 +01:00
Simon Pilgrim	9fa58d1bf2	[DAG] Add SimplifyDemandedVectorElts binop SimplifyMultipleUseDemandedBits handling For the supported binops (basic arithmetic, logicals + shifts), if we fail to simplify the demanded vector elts, then call SimplifyMultipleUseDemandedBits and try to peek through ops to remove unnecessary dependencies. This helps with PR40502. Differential Revision: https://reviews.llvm.org/D79003	2020-05-25 12:41:22 +01:00
Simon Pilgrim	1603106725	[TargetLowering] Improve expandFunnelShift shift amount masking For the 'inverse shift', we currently always perform a subtraction of the original (masked) shift amount. But for the case where we are handling power-of-2 type widths, we can replace: (sub bw-1, (and amt, bw-1) ) -> (and (xor amt, bw-1), bw-1) -> (and ~amt, bw-1) This allows x86 shifts to fold away the and-mask. Followup to D77301 + D80466. http://volta.cs.utah.edu:8080/z/Nod0Gr Differential Revision: https://reviews.llvm.org/D80489	2020-05-24 11:25:09 +01:00
Arthur Eubanks	8a88755610	Reland [X86] Codegen for preallocated See https://reviews.llvm.org/D74651 for the preallocated IR constructs and LangRef changes. In X86TargetLowering::LowerCall(), if a call is preallocated, record each argument's offset from the stack pointer and the total stack adjustment. Associate the call Value with an integer index. Store the info in X86MachineFunctionInfo with the integer index as the key. This adds two new target independent ISDOpcodes and two new target dependent Opcodes corresponding to @llvm.call.preallocated.{setup,arg}. The setup ISelDAG node takes in a chain and outputs a chain and a SrcValue of the preallocated call Value. It is lowered to a target dependent node with the SrcValue replaced with the integer index key by looking in X86MachineFunctionInfo. In X86TargetLowering::EmitInstrWithCustomInserter() this is lowered to an %esp adjustment, the exact amount determined by looking in X86MachineFunctionInfo with the integer index key. The arg ISelDAG node takes in a chain, a SrcValue of the preallocated call Value, and the arg index int constant. It produces a chain and the pointer to the arg. It is lowered to a target dependent node with the SrcValue replaced with the integer index key by looking in X86MachineFunctionInfo. In X86TargetLowering::EmitInstrWithCustomInserter() this is lowered to a lea of the stack pointer plus an offset determined by looking in X86MachineFunctionInfo with the integer index key. Force any function containing a preallocated call to use the frame pointer. Does not yet handle a setup without a call, or a conditional call. Does not yet handle musttail. That requires a LangRef change first. Tried to look at all references to inalloca and see if they apply to preallocated. I've made preallocated versions of tests testing inalloca whenever possible and when they make sense (e.g. not alloca related, inalloca edge cases). 
Aside from the tests added here, I checked that this codegen produces correct code for something like ``` struct A { A(); A(A&&); ~A(); }; void bar() { foo(foo(foo(foo(foo(A(), 4), 5), 6), 7), 8); } ``` by replacing the inalloca version of the .ll file with the appropriate preallocated code. Running the executable produces the same results as using the current inalloca implementation. Reverted due to unexpectedly passing tests, added REQUIRES: asserts for reland. Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77689	2020-05-20 11:25:44 -07:00
Arthur Eubanks	b8cbff51d3	Revert "[X86] Codegen for preallocated" This reverts commit `810567dc69`. Some tests are unexpectedly passing	2020-05-20 10:04:55 -07:00
Arthur Eubanks	810567dc69	[X86] Codegen for preallocated See https://reviews.llvm.org/D74651 for the preallocated IR constructs and LangRef changes. In X86TargetLowering::LowerCall(), if a call is preallocated, record each argument's offset from the stack pointer and the total stack adjustment. Associate the call Value with an integer index. Store the info in X86MachineFunctionInfo with the integer index as the key. This adds two new target independent ISDOpcodes and two new target dependent Opcodes corresponding to @llvm.call.preallocated.{setup,arg}. The setup ISelDAG node takes in a chain and outputs a chain and a SrcValue of the preallocated call Value. It is lowered to a target dependent node with the SrcValue replaced with the integer index key by looking in X86MachineFunctionInfo. In X86TargetLowering::EmitInstrWithCustomInserter() this is lowered to an %esp adjustment, the exact amount determined by looking in X86MachineFunctionInfo with the integer index key. The arg ISelDAG node takes in a chain, a SrcValue of the preallocated call Value, and the arg index int constant. It produces a chain and the pointer to the arg. It is lowered to a target dependent node with the SrcValue replaced with the integer index key by looking in X86MachineFunctionInfo. In X86TargetLowering::EmitInstrWithCustomInserter() this is lowered to a lea of the stack pointer plus an offset determined by looking in X86MachineFunctionInfo with the integer index key. Force any function containing a preallocated call to use the frame pointer. Does not yet handle a setup without a call, or a conditional call. Does not yet handle musttail. That requires a LangRef change first. Tried to look at all references to inalloca and see if they apply to preallocated. I've made preallocated versions of tests testing inalloca whenever possible and when they make sense (e.g. not alloca related, inalloca edge cases). 
Aside from the tests added here, I checked that this codegen produces correct code for something like ``` struct A { A(); A(A&&); ~A(); }; void bar() { foo(foo(foo(foo(foo(A(), 4), 5), 6), 7), 8); } ``` by replacing the inalloca version of the .ll file with the appropriate preallocated code. Running the executable produces the same results as using the current inalloca implementation. Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77689	2020-05-20 09:20:38 -07:00
QingShan Zhang	2b59e9f1bd	[DAGCombine] Remove the getNegatibleCost to avoid the out of sync with getNegatedExpression We have getNegatibleCost/getNegatedExpression to evaluate the cost and negate the expression. However, while negating the expression, the cost might change because we are changing the DAG, and we then hit the assertion if we negated the wrong expression, as the cost is no longer trustworthy. This patch aims to remove getNegatibleCost to avoid it getting out of sync with getNegatedExpression, and checks the cost while negating the expression. It also reduces the duplicated code between getNegatibleCost and getNegatedExpression, and fixes the crash for the test in D76638. Reviewed By: RKSimon, spatel Differential Revision: https://reviews.llvm.org/D77319	2020-05-20 02:12:16 +00:00
Simon Pilgrim	acb6f1ae09	TargetLowering.cpp - remove non-constant EXTRACT_SUBVECTOR/INSERT_SUBVECTOR handling. NFC. Now that D79814 has landed, we can assume that subvector ops use constant, in-range indices.	2020-05-14 18:13:58 +01:00
Jay Foad	17941437a2	[TargetLowering] Improve expansion of FSHL/FSHR Use an extra shift-by-1 instead of a compare and select to handle the shift-by-zero case. This sometimes saves one instruction (if the compare couldn't be combined with a previous instruction). It also works better on targets that don't have good select instructions. Note that currently this change doesn't affect most targets because expandFunnelShift is not used because funnel shift intrinsics are lowered early in SelectionDAGBuilder. But there is work afoot to change that; see D77152. Differential Revision: https://reviews.llvm.org/D77301	2020-05-14 16:36:22 +01:00
Sam McCall	728cf6d86b	Revert "[DAGCombine] Remove the getNegatibleCost to avoid the out of sync with getNegatedExpression" This reverts commit `3c44c441db`. Causes infloops on some inputs, see https://reviews.llvm.org/D77319 for repro	2020-05-11 16:44:01 +02:00
QingShan Zhang	3c44c441db	[DAGCombine] Remove the getNegatibleCost to avoid the out of sync with getNegatedExpression We have getNegatibleCost/getNegatedExpression to evaluate the cost and negate the expression. However, while negating the expression, the cost might change because we are changing the DAG, and we then hit the assertion if we negated the wrong expression, as the cost is no longer trustworthy. This patch aims to remove getNegatibleCost to avoid it getting out of sync with getNegatedExpression, and checks the cost while negating the expression. It also reduces the duplicated code between getNegatibleCost and getNegatedExpression, and fixes the crash for the test in D76638. Reviewed By: RKSimon, spatel Differential Revision: https://reviews.llvm.org/D77319	2020-05-11 02:41:10 +00:00
Simon Pilgrim	70293ba26f	[DAG] SimplifyMultipleUseDemandedBits - remove superfluous bitcasts If the SimplifyMultipleUseDemandedBits calls BITCASTs that peek through back to the original type then we can remove the BITCASTs entirely. Differential Revision: https://reviews.llvm.org/D79572	2020-05-08 19:04:49 +01:00
LemonBoy	7fa5abd343	[SelectionDAG] Fix assertion failure with big shift amounts Calling getShiftAmountTy with LegalTypes set may return a type that's too narrow to hold the shift amount for the integer type it's applied to. Fixes the regression introduced by D79096 Differential Revision: https://reviews.llvm.org/D79405	2020-05-06 11:58:37 -07:00
LemonBoy	6d103ca855	[SelectionDAG] Unify scalarizeVectorLoad and VectorLegalizer::ExpandLoad The two code paths have the same goal, legalizing a load of a non-byte-sized vector by loading the "flattened" representation in memory, slicing off each single element and then building a vector out of those pieces. The technique employed by `ExpandLoad` is slightly more convoluted and produces slightly better codegen on ARM, AMDGPU and x86 but suffers from some bugs (D78480) and is wrong for BE machines. Differential Revision: https://reviews.llvm.org/D79096	2020-05-02 15:18:10 -07:00
Simon Pilgrim	7cb5a51f38	[DAG] SimplifyDemandedVectorElts - add INSERT_SUBVECTOR SimplifyMultipleUseDemandedBits handling	2020-05-01 16:20:51 +01:00
Simon Pilgrim	65d32a9892	[DAG] SimplifyDemandedVectorElts - remove INSERT_SUBVECTOR if we don't demand the subvector	2020-05-01 16:20:51 +01:00
Simon Pilgrim	e3c0be596c	[DAG] SimplifyDemandedVectorElts - add EXTRACT_SUBVECTOR SimplifyMultipleUseDemandedBits handling	2020-05-01 13:48:07 +01:00
Craig Topper	a58b62b4a2	[IR] Replace all uses of CallBase::getCalledValue() with getCalledOperand(). This method has been commented as deprecated for a while. Remove it and replace all uses with the equivalent getCalledOperand(). I also made a few cleanups in here. For example, it removes a use of getElementType on a pointer when we could just use getFunctionType from the call. Differential Revision: https://reviews.llvm.org/D78882	2020-04-27 22:17:03 -07:00
QingShan Zhang	2957fa0cd1	[NFC][DAGCombine] Adding three helper functions and change the getNegatedExpression to negateExpression This is an NFC patch for D77319. The idea is to hide getNegatibleCost inside getNegatedExpression() so that it returns null if the cost is expensive, and to add some helper functions for ease of use. It also renames the old getNegatedExpression to negateExpression to avoid the semantic conflict. Reviewed By: RKSimon Differential revision: https://reviews.llvm.org/D78291	2020-04-27 04:11:42 +00:00
QingShan Zhang	c9f9c79c5a	[NFC][DAGCombine] Change the value of NegatibleCost to make it align with the semantics This is a minor NFC change to make the code more clear. NegatibleCost has the values cheaper, neutral, and expensive. Typically, a smaller value means a lower cost, but the current implementation uses the inverse ordering, which makes code like the following hard to read: If (CostX > CostY) negate(X) Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D77993	2020-04-15 02:20:58 +00:00
Craig Topper	95192f548d	[CallSite removal][TargetLowering] Use CallBase instead of CallSite in TargetLowering::ParseConstraints interface. Differential Revision: https://reviews.llvm.org/D77929	2020-04-12 11:26:25 -07:00
Jay Foad	c63aed890e	[KnownBits] Move AND, OR and XOR logic into KnownBits Summary: There are at least three clients for KnownBits calculations: ValueTracking, SelectionDAG and GlobalISel. To reduce duplication the common logic should be moved out of these clients and into KnownBits itself. This patch does this for AND, OR and XOR calculations by implementing and using appropriate operator overloads KnownBits::operator& etc. Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74060	2020-04-09 10:10:37 +01:00
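The AND/OR/XOR rules that this commit moves into KnownBits can be sketched with a pair of masks, in the same Zero/One style KnownBits uses. The struct below is a simplified, hypothetical stand-in that uses 32-bit masks instead of `APInt`. It is not LLVM's class.

```cpp
#include <cstdint>

// Simplified stand-in for known-bits tracking: a bit set in Zero is known
// to be 0, a bit set in One is known to be 1, and a bit in neither is unknown.
struct Known {
  uint32_t Zero;
  uint32_t One;
};

// x & y: known 1 only if both inputs are known 1; known 0 if either is known 0.
Known operator&(Known L, Known R) { return {L.Zero | R.Zero, L.One & R.One}; }

// x | y: known 1 if either input is known 1; known 0 only if both are known 0.
Known operator|(Known L, Known R) { return {L.Zero & R.Zero, L.One | R.One}; }

// x ^ y: a bit is known only where both input bits are known; it is 0 when
// they agree and 1 when they differ.
Known operator^(Known L, Known R) {
  uint32_t KnownZero = (L.Zero & R.Zero) | (L.One & R.One);
  uint32_t KnownOne = (L.Zero & R.One) | (L.One & R.Zero);
  return {KnownZero, KnownOne};
}
```

Keeping these rules in one place means ValueTracking, SelectionDAG and GlobalISel cannot drift apart on them, which is the duplication the commit removes.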
Matt Arsenault	aa26dd9858	CodeGen: Use Register in more places	2020-04-07 15:59:40 -04:00
Craig Topper	c41685b16f	[SelectionDAG] Make getZeroExtendInReg take a vector VT if the operand VT is a vector. This removes a call to getScalarType from a bunch of call sites. It also makes the behavior consistent with SIGN_EXTEND_INREG. Differential Revision: https://reviews.llvm.org/D77631	2020-04-07 11:34:08 -07:00
Guillaume Chatelet	9068bccbae	[Alignment][NFC] Deprecate InstrTypes getRetAlignment/getParamAlignment Summary: This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77312	2020-04-03 13:21:58 +00:00
Guillaume Chatelet	3a78f44daf	[Alignment][NFC] Convert SelectionDAG::InferPtrAlignment to MaybeAlign Summary: This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77212	2020-04-01 13:22:11 +00:00
Matt Arsenault	aa63eb6a46	GlobalISel: Add computeKnownBitsForTargetInstr I think we can save the MRI argument from these since it's in GISelKnownBits already, but currently not accessible. Implementation deferred to avoid dependency on other patches.	2020-03-23 15:02:30 -04:00
Sanjay Patel	56da41393d	[SDAG] reduce code duplication in getNegatedExpression(); NFCI	2020-03-19 13:55:15 -04:00
Simon Pilgrim	68224c1952	[TargetLowering] Only demand a rotation's modulo amount bits ISD::ROTL/ROTR rotation values are guaranteed to act as a modulo amount, so for power-of-2 bitwidths we only need the lowest bits. Differential Revision: https://reviews.llvm.org/D76201	2020-03-17 21:23:46 +00:00
Simon Pilgrim	2b3b453a82	[TargetLowering] Only demand a funnelshift's modulo amount bits ISD::FSHL/FSHR shift amount values are guaranteed to act as a modulo amount, so for power-of-2 bitwidths we only need the lowest bits.	2020-03-16 13:52:17 +00:00
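The modulo behavior behind these rotate and funnel-shift commits can be sketched on 32-bit values. The functions below are hypothetical models of ISD::ROTL and ISD::FSHL semantics, not LLVM code. Because the amount is reduced modulo 32, only its low 5 bits can change the result, so the upper bits of the amount are never demanded.

```cpp
#include <cstdint>

// Model of ROTL on i32: the amount is taken modulo the bit width (32),
// so only the low log2(32) = 5 bits of Amt matter.
uint32_t rotl32(uint32_t X, uint32_t Amt) {
  uint32_t S = Amt & 31;                        // Amt % 32
  return S == 0 ? X : (X << S) | (X >> (32 - S)); // avoid shifting by 32
}

// Model of FSHL on i32: concatenate Hi:Lo, shift left by Amt % 32, and keep
// the high 32 bits.
uint32_t fshl32(uint32_t Hi, uint32_t Lo, uint32_t Amt) {
  uint32_t S = Amt & 31;
  return S == 0 ? Hi : (Hi << S) | (Lo >> (32 - S));
}
```

For example, `rotl32(X, 36)` always equals `rotl32(X, 4)`, because bit 5 of the amount is ignored.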
Simon Pilgrim	e71fb46a8f	[TargetLowering] SimplifyDemandedVectorElts - add DemandedElts mask to ISD::BITCAST SimplifyDemandedBits call. This fixes most of the regressions introduced in the rG4bc6f6332028 bugfix. The vector-trunc.ll issue should be fixed by D66004.	2020-03-10 13:39:10 +00:00
QingShan Zhang	3906ae387f	[DAGCombine] Check the uses of negated floating constant and remove the hack PowerPC hits an assertion for much the same reason as https://reviews.llvm.org/D70975. Although there is already a hack for this, it still fails in some cases: when operand 0 is NOT a constant FP but another FMA with a constant FP operand, and that constant FP is negated, it ends up with multiple uses. A better fix is to check the uses of the negated constant FP. If its negated value is already used, negating it is beneficial because no extra node is added. Differential revision: https://reviews.llvm.org/D75501	2020-03-05 03:42:50 +00:00
Jordan Rupprecht	d7803c3832	Add default case to fix -Wswitch errors	2020-03-02 14:23:46 -08:00
Craig Topper	adc69729ec	[TargetLowering] Fix what look like copy/paste mistakes in compare with infinity handling SimplifySetCC. I expect that the isCondCodeLegal checks should match the CC of the node that we're going to create. Rewriting to a switch to minimize repeated mentions of the same constants.	2020-03-02 14:12:16 -08:00
Simon Pilgrim	d20fb7ea13	Fix shadow variable warning. NFC.	2020-03-02 11:41:20 +00:00
Simon Pilgrim	4bc6f63320	[TargetLowering] SimplifyDemandedBits - fix SCALAR_TO_VECTOR knownbits bug We can only report the knownbits for a SCALAR_TO_VECTOR node if we only demand the 0'th element - the upper elements are undefined and shouldn't be trusted. This is causing a number of regressions that need addressing but we need to get the bugfix in first.	2020-02-28 15:23:37 +00:00
Craig Topper	a5fa778882	[LegalizeTypes] Scalarize non-byte sized loads in WidenRecRes_Load and SplitVecResLoad Should fix PR42803 and PR44902 Differential Revision: https://reviews.llvm.org/D74590	2020-02-24 15:14:33 -08:00
Bevin Hansson	6e561d1c94	[Intrinsic] Add fixed point saturating division intrinsics. Summary: This patch adds intrinsics and ISelDAG nodes for signed and unsigned fixed-point division: `llvm.sdiv.fix.sat.*` and `llvm.udiv.fix.sat.*`. These intrinsics perform scaled, saturating division on two integers or vectors of integers. They are required for the implementation of the Embedded-C fixed-point arithmetic in Clang. Reviewers: bjope, leonardchan, craig.topper Subscribers: hiraditya, jdoerfert, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71550	2020-02-24 10:50:52 +01:00
Simon Pilgrim	42ec6fdce9	[TargetLowering] Apply basic shift combines before recursive SimplifyDemandedBits calls. Minor refactor/cleanup before we begin adding non-uniform support.	2020-02-21 16:31:20 +00:00
Simon Pilgrim	86c52af05a	[TargetLowering] SimplifyDemandedBits - use getValidShiftAmountConstant helper. Use the SelectionDAG::getValidShiftAmountConstant helper to get const/constsplat shift amounts, which allows us to drop the out of range shift amount early-out. First step towards better non-uniform shift amount support in SimplifyDemandedBits.	2020-02-21 14:23:53 +00:00
Simon Pilgrim	d6eef0614f	[TargetLowering] Add SimplifyMultipleUseDemandedBits 'all elements' helper wrapper. NFC.	2020-02-18 19:53:50 +00:00
Jay Foad	32aac25637	[KnownBits] Introduce anyext instead of passing a flag into zext Summary: This was a very odd API, where you had to pass a flag into a zext function to say whether the extended bits really were zero or not. All callers passed in a literal true or false. I think it's much clearer to make the function name reflect the operation being performed on the value we're tracking (rather than on the KnownBits Zero and One fields), so zext means the value is being zero extended and new function anyext means the value is being extended with unknown bits. NFC. Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74482	2020-02-12 19:06:53 +00:00
Simon Pilgrim	9eb426c88c	[TargetLowering] Add NegatibleCost enum for isNegatibleForFree return codes The isNegatibleForFree/getNegatedExpression methods currently rely on a raw char value to indicate whether a negation is beneficial or not. This patch replaces the char return value with a NegatibleCost enum to more clearly demonstrate what is implied. It also renames isNegatibleForFree to getNegatibleCost to more accurately reflect what's going on. Differential Revision: https://reviews.llvm.org/D74221	2020-02-12 11:51:42 +00:00
Guillaume Chatelet	f85d3408e6	[NFC] Introduce an API for MemOp Summary: This patch introduces an API for MemOp in order to simplify and tighten the client code. Reviewers: courbet Subscribers: arsenm, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jsji, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73964	2020-02-07 11:32:27 +01:00
Guillaume Chatelet	b8144c0536	[NFC] Encapsulate MemOp logic Summary: This patch simply introduces functions instead of directly accessing the fields. This makes it easier to introduce additional checking logic. A second patch will add simplifying functions. Reviewers: courbet Subscribers: arsenm, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jsji, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73945	2020-02-04 10:36:26 +01:00
Simon Pilgrim	61621f826a	[TargetLowering] SimplifyDemandedBits - add basic KnownBits ZEXTLoad handling We have to be careful in SimplifyDemandedBits with loads in case we attempt to combine back to a constant (which then gets turned into a constant pool load again), but we can at least set the upper KnownBits for a ZEXTLoad to zero.	2020-02-03 16:50:04 +00:00
Simon Pilgrim	8fbc7fd567	[DAG] SimplifyMultipleUseDemandedBits - peek through unused ISD::INSERT_SUBVECTOR subvectors If we don't demand any elements of the inserted subvector then just skip it.	2020-01-31 18:57:22 +00:00
Simon Pilgrim	5702dadf6f	[DAG] Enable ISD::INSERT_SUBVECTOR SimplifyMultipleUseDemandedBits handling This allows SimplifyDemandedBits to call SimplifyMultipleUseDemandedBits to create a simpler ISD::INSERT_SUBVECTOR, which is particularly useful for cases where we're splitting into subvectors anyhow.	2020-01-31 18:02:34 +00:00
Guillaume Chatelet	3c89b75f23	[NFC] Introduce a type to model memory operation Summary: This is a first step before changing the types to llvm::Align and introducing functions to ease client code. Reviewers: courbet Subscribers: arsenm, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, jsji, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73785	2020-01-31 17:29:01 +01:00
Simon Pilgrim	e7e043724e	[DAG] Enable ISD::EXTRACT_SUBVECTOR SimplifyMultipleUseDemandedBits handling This allows SimplifyDemandedBits to call SimplifyMultipleUseDemandedBits to create a simpler ISD::EXTRACT_SUBVECTOR, which is particularly useful for cases where we're splitting into subvectors anyhow. Differential Revision:	2020-01-27 21:17:47 +00:00
Simon Pilgrim	4a5f9d9faf	[TargetLowering] Respect recursive depth in SimplifyDemandedBits call to ComputeNumSignBits	2020-01-26 10:01:56 +00:00
Simon Pilgrim	c8de7c8f50	[TargetLowering] SimplifyDemandedBits - Remove ashr if all our demandedbits already match the sign bit Differential Revision: https://reviews.llvm.org/D73412	2020-01-25 17:36:46 +00:00
Simon Pilgrim	0b45c2264a	[SelectionDAG] rot(x, y) --> x iff ComputeNumSignBits(x) == BitWidth(x) Rotating a 0/-1 value by any amount will always result in the same 0/-1 value.	2020-01-24 10:35:57 +00:00
Simon Pilgrim	f04284cf1d	[TargetLowering] SimplifyDemandedBits ISD::SRA multi-use handling Call SimplifyMultipleUseDemandedBits to peek through extended source args with multiple uses	2020-01-21 15:12:07 +00:00
Simon Pilgrim	651fa669a2	[TargetLowering] SimplifyDemandedBits ANY_EXTEND/ANY_EXTEND_VECTOR_INREG multi-use handling Call SimplifyMultipleUseDemandedBits to peek through extended source args with multiple uses	2020-01-21 14:07:19 +00:00
Simon Pilgrim	8d2e6bdbe1	[TargetLowering] SimplifyDemandedBits - Pull out InDemandedMask variable to ISD::SHL. NFCI. Matches ISD::SRA + ISD::SRL variants.	2020-01-21 10:40:18 +00:00
Michael Liao	6d0d86a64d	[DAG] Add helper for creating constant vector index with correct type. NFC.	2020-01-18 01:23:36 -05:00
Craig Topper	5cf1b01a01	[LegalizeDAG][TargetLowering] Move vXi64/i64->vXf32/f32 uint_to_fp legalizing code from TargetLowering::expandUINT_TO_FP back to LegalizeDAG. This was moved in October 2018, but we don't appear to be using this for vectors on any in tree target. Moving it back simplifies D72794 so we can share the code for i32->f32.	2020-01-15 22:04:50 -08:00
Craig Topper	ed679804d5	[TargetLowering][X86] Connect the chain from STRICT_FSETCC in TargetLowering::expandFP_TO_UINT and X86TargetLowering::FP_TO_INTHelper.	2020-01-11 17:50:20 -08:00
Craig Topper	bb2553175a	[TargetLowering][ARM][Mips][WebAssembly] Remove the ordered FP compare from RuntimeLibcalls.def and all associated usages Summary: This always just used the same libcall as unordered, but the comparison predicate was different. This change appears to have been made when targets were given the ability to override the predicates. Before that they were hardcoded into the type legalizer. At that time we never inverted predicates and we handled ugt/ult/uge/ule compares by emitting an unordered check ORed with an ogt/olt/oge/ole check. So only ordered needed an inverted predicate. Later ugt/ult/uge/ule were optimized to only call a single libcall and invert the compare. This patch removes the ordered entries and just uses the inverting logic that is now present. This removes some odd things in both the Mips and WebAssembly code. Reviewers: efriedma, ABataev, uweigand, cameron.mcinally, kpn Reviewed By: efriedma Subscribers: dschuff, sdardis, sbc100, arichardson, jgravelle-google, kristof.beyls, hiraditya, aheejin, sunfish, atanasyan, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D72536	2020-01-10 19:30:08 -08:00
Craig Topper	71cee21861	[TargetLowering] Use SelectionDAG::getSetCC and remove a repeated call to getSetCCResultType in softenSetCCOperands. NFCI	2020-01-10 13:24:00 -08:00
Craig Topper	b590e0fd81	[TargetLowering][ARM][X86] Change softenSetCCOperands handling of ONE to avoid spurious exceptions for QNANs with strict FP quiet compares ONE is currently softened to OGT | OLT. But the OGT and OLT libcalls will trigger an exception for QNAN, at least for X86 with libgcc. UEQ, on the other hand, uses UO | OEQ. The UO and OEQ libcalls will not trigger an exception for QNAN. This patch changes ONE to use the inverse of the UEQ lowering, so we now produce O & UNE. Technically the existing behavior was correct for a signalling ONE, but since I don't know how to generate one of those from clang, that seemed like something we can deal with later, as we would need to fix other predicates as well. Also, removing spurious exceptions seemed better than missing an exception. There are also problems with quiet OGT/OLT/OLE/OGE, but those are harder to fix. Differential Revision: https://reviews.llvm.org/D72477	2020-01-10 11:00:17 -08:00
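The predicate identities in this commit can be checked with a small truth-table sketch. The function names are illustrative only, and the sketch models the predicate results, not the libcalls' exception behavior. In C and C++, `std::isunordered` and the `==`/`!=` operators are quiet comparisons. The relational operators `<` and `>` may raise FE_INVALID for a quiet NaN, which mirrors why the OGT | OLT form was problematic.

```cpp
#include <cmath>

// UEQ = UO | OEQ (quiet operations only).
bool ueq(double A, double B) { return std::isunordered(A, B) || A == B; }

// ONE as the inverse of UEQ: O & UNE (quiet operations only).
bool oneAsOAndUne(double A, double B) {
  return !std::isunordered(A, B) && A != B;
}

// The previous softening, OGT | OLT, which uses relational comparisons.
bool oneAsOgtOrOlt(double A, double B) { return A > B || A < B; }
```

Both ONE formulations give the same boolean for every input, including NaNs and signed zeros. The new one is simply built from quiet comparisons.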
Craig Topper	b705fe5686	[TargetLowering][X86] Teach SimplifyDemandedBits to handle cases where only the sign bit is demanded from a SETCC and can be passed through If we're doing a compare that only tests the sign bit and only the sign bit is demanded, we can just bypass the node. This removes one of the blend dependencies in our v2i64->v2f32 uint_to_fp codegen on pre-sse4.2 targets. Differential Revision: https://reviews.llvm.org/D72356	2020-01-09 10:21:25 -08:00
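The reason the SETCC can be bypassed can be shown with a scalar sketch. It assumes 0/-1 booleans, as is typical for vector compares. With that encoding, `(X < 0) ? -1 : 0` has exactly the same sign bit as `X`, so a user that demands only the sign bit can read `X` directly. `setltZero` and `signBit` are hypothetical helpers.

```cpp
#include <cstdint>

// Model of SETCC(X, 0, SETLT) producing a 0/-1 boolean.
int32_t setltZero(int32_t X) { return X < 0 ? -1 : 0; }

// Extract the sign bit of a 32-bit value.
bool signBit(int32_t V) { return (static_cast<uint32_t>(V) >> 31) != 0; }
```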
Bevin Hansson	8e2b44f7e0	[Intrinsic] Add fixed point division intrinsics. Summary: This patch adds intrinsics and ISelDAG nodes for signed and unsigned fixed-point division: llvm.sdiv.fix.* llvm.udiv.fix.* These intrinsics perform scaled division on two integers or vectors of integers. They are required for the implementation of the Embedded-C fixed-point arithmetic in Clang. Patch by: ebevhan Reviewers: bjope, leonardchan, efriedma, craig.topper Reviewed By: craig.topper Subscribers: Ka-Ka, ilya, hiraditya, jdoerfert, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70007	2020-01-08 15:17:46 +01:00
Wang, Pengfei	9a621de1ec	[X86] Adding fp128 support for strict fcmp Summary: Adding fp128 support for strict fcmp Reviewers: craig.topper, LiuChen3, andrew.w.kaylor, RKSimon, uweigand Subscribers: hiraditya, llvm-commits, LuoYuanke Tags: #llvm Differential Revision: https://reviews.llvm.org/D71897	2020-01-08 12:59:31 +08:00
Craig Topper	19ace449a3	[TargetLowering] Use SETCC input type to call getBooleanContents instead of the setcc result type. This isn't a functional change since we also check that the bit width is the same and that the input type is integer. This guarantees the input and output types are the same. But passing the input type makes the code more readable.	2020-01-05 23:15:49 -08:00
Craig Topper	16a67d252c	[TargetLowering] In expandFP_TO_UINT, add proper extend or truncate for the condition to feed the DstVT select. Previously, for vectors we created a vselect with a condition that didn't match what the target wanted according to getSetCCResultType. To make up for this, X86 had a special DAG combine to detect if the condition was all sign bits and then insert its own truncate or extend. By adding the extend/truncate here explicitly we can avoid that.	2020-01-04 18:15:20 -08:00
Simon Pilgrim	eb0e1978df	[TargetLowering] SimplifyDemandedBits - call SimplifyMultipleUseDemandedBits for ISD::EXTRACT_VECTOR_ELT (REAPPLIED) This patch attempts to peek through vectors based on the demanded bits/elt of a particular ISD::EXTRACT_VECTOR_ELT node, allowing us to avoid dependencies on ops that have no impact on the extract. In particular this helps remove some unnecessary scalar->vector->scalar patterns. The wasm shift patterns are annoying - @tlively has indicated that the wasm vector shift codegen is to be refactored in the near term, so this isn't considered a major issue. Reapplied after reversion at rL368660 due to PR42982 which was fixed at rGca7fdd41bda0. Differential Revision: https://reviews.llvm.org/D65887	2020-01-04 13:15:50 +00:00
Reid Kleckner	9c2b72821b	Move tail call disabling code to target independent code When the "disable-tail-calls" attribute was added, checks were added for it in various backends. Now this code has proliferated, and it is something the target is responsible for checking. Move that responsibility back to the ISels (fast, global, and SD). There's no major functionality change, except for targets that never implemented this check. This LLVM attribute was originally added in `d9699bc7bd` (2015). Reviewers: echristo, MaskRay Differential Revision: https://reviews.llvm.org/D72118	2020-01-03 11:27:41 -08:00
Ulrich Weigand	63336795f0	[FPEnv] Default NoFPExcept SDNodeFlag to false The NoFPExcept bit in SDNodeFlags currently defaults to true, unlike all other such flags. This is a problem, because it implies that all code that transforms SDNodes without copying flags can introduce a correctness bug, not just a missed optimization. This patch changes the default to false. This makes it necessary to move setting the (No)FPExcept flag for constrained intrinsics from the visitConstrainedIntrinsic routine to the generic visit routine at the place where the other flags are set, or else the intersectFlagsWith call would erase the NoFPExcept flag again. In order to avoid making non-strict FP code worse, whenever SelectionDAGISel::SelectCodeCommon matches on a set of original nodes none of which can raise FP exceptions, it will preserve this property on all result nodes generated, by setting the NoFPExcept flag on those result nodes that would otherwise be considered as raising an FP exception. To check whether or not an SD node should be considered as raising an FP exception, the following logic applies: - For machine nodes, check the mayRaiseFPException property of the underlying MI instruction - For regular nodes, check isStrictFPOpcode - For target nodes, check a newly introduced isTargetStrictFPOpcode The latter is implemented by reserving a range of target opcodes, similarly to how memory opcodes are identified. (Note that there is a bit of a quirk in identifying target nodes that are both memory nodes and strict FP nodes. To simplify the logic, right now all target memory nodes are automatically also considered strict FP nodes -- this could be fixed by adding one more range.) Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D71841	2020-01-02 16:59:45 +01:00
Craig Topper	787e078f3e	[TargetLowering][AMDGPU] Make scalarizeVectorLoad return a pair of SDValues instead of creating a MERGE_VALUES node. NFCI This allows us to clean up some places that were peeking through the MERGE_VALUES node after the call. By returning the SDValues directly, we can clean that up. Unfortunately, there are several call sites in AMDGPU that wanted the MERGE_VALUES and now need to create their own.	2019-12-30 19:36:04 -08:00
Fangrui Song	5edb40c022	[SelectionDAG] Disallow indirect "i" constraint This allows us to delete InlineAsm::Constraint_i workarounds in SelectionDAGISel::SelectInlineAsmMemoryOperand overrides and TargetLowering::getInlineAsmMemConstraint overrides. They were introduced to X86 in r237517 to prevent crashes for constraints like "=*imr". They were later copied to other targets.	2019-12-29 16:50:42 -08:00
Simon Pilgrim	34769e0783	SimplifyDemandedBits - Remove duplicate getOperand() call. NFC. Pulled out from D56387 - cleanup variable names, move shift amount legalization inside if() of its only user and remove duplicate getOperand() call.	2019-12-28 16:42:50 +00:00
Craig Topper	a3f8964813	[TargetLowering] Update comment to reference the correct compiler-rt function the code is based on. NFC	2019-12-27 22:49:04 -08:00
Ulrich Weigand	0d3f782e41	[FPEnv][X86] More strict int <-> FP conversion fixes Fix several additional problems with the int <-> FP conversion logic both in common code and in the X86 target. In particular: - The STRICT_FP_TO_UINT expansion emits a floating-point compare. This compare can raise exceptions and therefore needs to be a strict compare. I've made it signaling (even though quiet would also be correct) as signaling is the more usual default for an LT. This code exists both in common code and in the X86 target. - The STRICT_UINT_TO_FP expansion algorithm was incorrect for strict mode: it emitted two STRICT_SINT_TO_FP nodes and then used a select to choose one of the results. This can cause spurious exceptions from the STRICT_SINT_TO_FP that ends up not being chosen. I've fixed the algorithm to use only a single STRICT_SINT_TO_FP instead. - The !isStrictFPEnabled logic in DoInstructionSelection would sometimes do the wrong thing because it calls getOperationAction using the result VT. But for some opcodes, including [SU]INT_TO_FP, getOperationAction needs to be called using the operand VT. - Remove some (obsolete) code in X86DAGToDAGISel::Select that would mutate STRICT_FP_TO_[SU]INT to non-strict versions unnecessarily. Reviewed by: craig.topper Differential Revision: https://reviews.llvm.org/D71840	2019-12-23 21:11:45 +01:00
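The "single STRICT_SINT_TO_FP" idea for u64 -> f64 can be sketched as follows. This is one common formulation, not necessarily the exact DAG sequence in the patch, and `uint64ToDouble` is a hypothetical name. The input is selected before the conversion, so exactly one signed conversion runs. When the top bit is set, the value is halved with the dropped low bit OR-ed back in as a sticky bit, so rounding stays correct. The converted result is then doubled.

```cpp
#include <cstdint>

// u64 -> f64 using exactly one signed conversion.
double uint64ToDouble(uint64_t X) {
  bool Big = (X >> 63) != 0;                           // out of signed range?
  uint64_t Halved = (X >> 1) | (X & 1);                // halve, keep sticky bit
  int64_t Src = static_cast<int64_t>(Big ? Halved : X); // always < 2^63
  double D = static_cast<double>(Src);                 // the single sint_to_fp
  return Big ? D + D : D;                              // undo the halving
}
```

Because the select happens on the integer input rather than on two conversion results, an unused conversion can no longer raise a spurious exception.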
Sanjay Patel	6a77e36975	[SDAG] adjust isNegatibleForFree calculation to avoid crashing This is an alternate fix for the bug discussed in D70595. This also includes minimal tests for other in-tree targets to show the problem more generally. We check the number of uses as a predicate for whether some value is free to negate, but that use count can change as we rewrite the expression in getNegatedExpression(). So something that was marked free to negate during the cost evaluation phase becomes not free to negate during the rewrite phase (or the inverse - something that was not free becomes free). This can lead to a crash/assert because we expect everything in a negatible expression to be handled by the corresponding code within getNegatedExpression(). This patch adds a hack to work around the case where we probably no longer detect that either multiply operand of an FMA isNegatibleForFree, which was assumed to be true when we started rewriting the expression. Differential Revision: https://reviews.llvm.org/D70975	2019-12-17 13:49:15 -05:00
Sanjay Patel	5b0251da1c	Revert "[SDAG] remove use restriction in isNegatibleForFree() when called from getNegatedExpression()" This reverts commit `36b1232ec5`. Need to adjust commit message - that was a leftover from the earlier version.	2019-12-17 13:47:59 -05:00
Sanjay Patel	36b1232ec5	[SDAG] remove use restriction in isNegatibleForFree() when called from getNegatedExpression() This is an alternate fix for the bug discussed in D70595. This also includes minimal tests for other in-tree targets to show the problem more generally. We check the number of uses as a predicate for whether some value is free to negate, but that use count can change as we rewrite the expression in getNegatedExpression(). So something that was marked free to negate during the cost evaluation phase becomes not free to negate during the rewrite phase (or the inverse - something that was not free becomes free). This can lead to a crash/assert because we expect everything in a negatible expression to be handled by the corresponding code within getNegatedExpression(). This patch adds a hack to work around the case where we probably no longer detect that either multiply operand of an FMA isNegatibleForFree, which was assumed to be true when we started rewriting the expression. Differential Revision: https://reviews.llvm.org/D70975	2019-12-17 13:46:06 -05:00
Kevin P. Neal	b1d8576b0a	This adds constrained intrinsics for the signed and unsigned conversions of integers to floating point. This includes some of Craig Topper's changes for promotion support from D71130. Differential Revision: https://reviews.llvm.org/D69275	2019-12-17 10:06:51 -05:00
Sanjay Patel	2afe864118	[DAG] Add SimplifyDemandedBits support for BSWAP This exposes a shortcoming for AArch64, and that is tracked by PR40881: https://bugs.llvm.org/show_bug.cgi?id=40881 Patch by: @RKSimon (Simon Pilgrim) Differential Revision: https://reviews.llvm.org/D58017	2019-12-15 08:52:34 -05:00
Alex Richardson	11448eeb72	[NFC] Use SelectionDAG::getMemBasePlusOffset() instead of getNode(ISD::ADD) Summary: To find potential opportunities to use getMemBasePlusOffset() I looked at all ISD::ADD uses found with the regex getNode\(ISD::ADD,.+,.+Ptr in lib/CodeGen/SelectionDAG. If this patch is accepted I will convert the files in the individual backends too. The motivation for this change is our out-of-tree CHERI backend (https://github.com/CTSRD-CHERI/llvm-project). We use a separate register type to store pointers (128-bit capabilities, which are effectively unforgeable and monotonic fat pointers). These capabilities permit a reduced set of operations and therefore use a separate ValueType (iFATPTR) to represent pointers implemented as capabilities. Therefore, we need to avoid using ISD::ADD for our patterns that operate on pointers and need to use a function that chooses ISD::ADD or a new ISD::PTRADD opcode depending on the value type. We originally added a new DAG.getPointerAdd() function, but after this patch series we can modify the implementation of getMemBasePlusOffset() instead. Avoiding direct uses of ISD::ADD for pointer types will significantly reduce the number of assertion/instruction selection failures for us in future upstream merges. Reviewers: spatel Reviewed By: spatel Subscribers: merge_guards_bot, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71207	2019-12-13 21:40:03 +00:00
Alex Richardson	be15dfa88f	[NFC] Use EVT instead of bool for getSetCCInverse() Summary: The use of a boolean isInteger flag (generally initialized using VT.isInteger()) caused errors in our out-of-tree CHERI backend (https://github.com/CTSRD-CHERI/llvm-project). In our backend, pointers use a separate ValueType (iFATPTR) and therefore .isInteger() returns false. This meant that getSetCCInverse() was using the floating-point variant and generated incorrect code for us: `(void *)0x12033091e < (void *)0xffffffffffffffff` would return false. Committing this change will significantly reduce our merge conflicts for each upstream merge. Reviewers: spatel, bogner Reviewed By: bogner Subscribers: wuzish, arsenm, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70917	2019-12-13 12:22:03 +00:00
Sanjay Patel	cdf5cfea8e	Revert "[SDAG] remove use restriction in isNegatibleForFree() when called from getNegatedExpression()" This reverts commit `d1f0bdf2d2`. The patch can cause infinite loops in DAGCombiner.	2019-12-11 16:56:58 -05:00
Sanjay Patel	d1f0bdf2d2	[SDAG] remove use restriction in isNegatibleForFree() when called from getNegatedExpression() This is an alternate fix for the bug discussed in D70595. This also includes minimal tests for other in-tree targets to show the problem more generally. We check the number of uses as a predicate for whether some value is free to negate, but that use count can change as we rewrite the expression in getNegatedExpression(). So something that was marked free to negate during the cost evaluation phase becomes not free to negate during the rewrite phase (or the inverse - something that was not free becomes free). This can lead to a crash/assert because we expect everything in a negatible expression to be handled by the corresponding code within getNegatedExpression(). This patch skips the use check during the rewrite phase. So we determine that some expression isNegatibleForFree (identically to without this patch), but during the rewrite, don't rely on use counts to decide how to create the optimal expression. Differential Revision: https://reviews.llvm.org/D70975	2019-12-11 13:30:39 -05:00
Craig Topper	28b573d249	[TargetLowering] Fix another potential FPE in expandFP_TO_UINT D53794 introduced code to perform the FP_TO_UINT expansion via FP_TO_SINT in a way that would never expose floating-point exceptions in the intermediate steps. Unfortunately, I just noticed there is still a way this can happen. As discussed in D53794, the compiler now generates this sequence: // Sel = Src < 0x8000000000000000 // Val = select Sel, Src, Src - 0x8000000000000000 // Ofs = select Sel, 0, 0x8000000000000000 // Result = fp_to_sint(Val) ^ Ofs The problem is with the Src - 0x8000000000000000 expression. As I mentioned in the original review, that expression can never overflow or underflow if the original value is in range for FP_TO_UINT. But I missed that we can get an Inexact exception in the case where Src is a very small positive value. (In this case the result of the sub is ignored, but that doesn't help.) Instead, I'd suggest using the following sequence: // Sel = Src < 0x8000000000000000 // FltOfs = select Sel, 0, 0x8000000000000000 // IntOfs = select Sel, 0, 0x8000000000000000 // Result = fp_to_sint(Val - FltOfs) ^ IntOfs In the case where the value is already in range of FP_TO_SINT, we now simply compute Val - 0, which now definitely cannot trap (unless Val is a NaN, in which case we'd want to trap anyway). In the case where the value is not in range of FP_TO_SINT, but still in range of FP_TO_UINT, the sub can never be inexact, as Val is between 2^(n-1) and (2^n)-1, i.e. always has the 2^(n-1) bit set, and the sub is always simply clearing that bit. There is a slight complication in the case where Val is a constant, so we know at compile time whether Sel is true or false. In that scenario, the old code would automatically optimize the sub away, while this no longer happens with the new code. Instead, I've added extra code to check for this case and then just fall back to FP_TO_SINT directly. (This seems to catch even slightly more cases.) 
Original version of the patch by Ulrich Weigand. X86 changes added by Craig Topper Differential Revision: https://reviews.llvm.org/D67105	2019-12-06 14:11:04 -08:00
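The revised Sel/FltOfs/IntOfs sequence can be sketched for f64 -> u64 in plain C++. The sketch uses a hardware signed conversion in place of `fp_to_sint`. It assumes the input is in range for FP_TO_UINT, that is [0, 2^64). `fpToUintViaSint` is a hypothetical name, and the variable names follow the pseudo-code in the commit message.

```cpp
#include <cstdint>

// f64 -> u64 built only from a signed conversion, following the
// Sel/FltOfs/IntOfs sequence. Valid for Src in [0, 2^64).
uint64_t fpToUintViaSint(double Src) {
  const double TwoTo63 = 9223372036854775808.0;       // 2^63 as a double
  bool Sel = Src < TwoTo63;                           // in range of fp_to_sint?
  double FltOfs = Sel ? 0.0 : TwoTo63;                // FP offset to subtract
  uint64_t IntOfs = Sel ? 0 : 0x8000000000000000ULL;  // integer bit to restore
  // Sel: Src - 0.0 is exact. !Sel: Src has the 2^63 bit set and the
  // subtraction just clears it, so it is exact as well.
  int64_t Signed = static_cast<int64_t>(Src - FltOfs);
  return static_cast<uint64_t>(Signed) ^ IntOfs;
}
```

A tiny positive input such as `1e-300` now computes `1e-300 - 0.0`, which is exact. The old sequence always evaluated `Src - 2^63`, and for such an input that subtraction raised Inexact.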
Ulrich Weigand	c3d05c1b52	[SelectionDAG] Expand nnan FMINNUM/FMAXNUM to select sequence InstCombine may synthesize FMINNUM/FMAXNUM nodes from fcmp+select sequences (where the fcmp is marked nnan). Currently, if the target does not otherwise handle these nodes, they'll get expanded to libcalls to fmin/fmax. However, these functions may reside in libm, which may introduce a library dependency that was not originally present in the source code, potentially resulting in link failures. To fix this problem, add code to TargetLowering::expandFMINNUM_FMAXNUM to expand FMINNUM/FMAXNUM to a compare+select sequence instead of the libcall. This is done only if the node is marked as "nnan"; in this case, the expansion to compare+select is always correct. This also suffices to catch all cases where FMINNUM/FMAXNUM was synthesized as above. Differential Revision: https://reviews.llvm.org/D70965	2019-12-04 10:32:35 +01:00
Craig Topper	d6ec6e4bf6	[TargetLowering] Merge ExpandChainLibCall with makeLibCall I need to be able to drop an operand for STRICT_FP_ROUND handling on X86. Merging these functions gives me the ArrayRef interface that passes the return type, operands, and debugloc instead of the Node. Differential Revision: https://reviews.llvm.org/D70503	2019-11-25 10:52:49 -08:00
Roman Lebedev	96cf5c8d47	[Codegen] TargetLowering::prepareUREMEqFold(): `x u% C1 ==/!= C2` (PR35479) Summary: The current lowering is: ``` Name: (X % C1) == C2 -> X * C3 <= C4 || false Pre: (C2 == 0 || C1 u<= C2) && (C1 u>> countTrailingZeros(C1)) * C3 == 1 %zz = and i8 C3, 0 ; trick alive into making C3 available in precondition %o0 = urem i8 %x, C1 %r = icmp eq i8 %o0, C2 => %zz = and i8 C3, 0 ; and silence it from complaining about said reg %C4 = -1 /u C1 %n0 = mul i8 %x, C3 %n1 = lshr i8 %n0, countTrailingZeros(C1) ; rotate right %n2 = shl i8 %n0, ((8-countTrailingZeros(C1)) %u 8) ; rotate right %n3 = or i8 %n1, %n2 ; rotate right %is_tautologically_false = icmp ule i8 C1, C2 %C4_fixed = select i1 %is_tautologically_false, i8 -1, i8 %C4 %res = icmp ule i8 %n3, %C4_fixed %r = xor i1 %res, %is_tautologically_false ``` https://rise4fun.com/Alive/2xC https://rise4fun.com/Alive/jpb5 However, we can support non-tautological cases `C1 u> C2` too. Said handling consists of two parts: * `C2 u<= (-1 %u C1)`. It just works. We only have to change `(X % C1) == C2` into `((X - C2) % C1) == 0` ``` Name: (X % C1) == C2 -> (X - C2) * C3 <= C4 iff C2 u<= (-1 %u C1) Pre: (C1 u>> countTrailingZeros(C1)) * C3 == 1 && C2 u<= (-1 %u C1) %zz = and i8 C3, 0 ; trick alive into making C3 available in precondition %o0 = urem i8 %x, C1 %r = icmp eq i8 %o0, C2 => %zz = and i8 C3, 0 ; and silence it from complaining about said reg %C4 = (-1 /u C1) %n0 = sub i8 %x, C2 %n1 = mul i8 %n0, C3 %n2 = lshr i8 %n1, countTrailingZeros(C1) ; rotate right %n3 = shl i8 %n1, ((8-countTrailingZeros(C1)) %u 8) ; rotate right %n4 = or i8 %n2, %n3 ; rotate right %is_tautologically_false = icmp ule i8 C1, C2 %C4_fixed = select i1 %is_tautologically_false, i8 -1, i8 %C4 %res = icmp ule i8 %n4, %C4_fixed %r = xor i1 %res, %is_tautologically_false ``` https://rise4fun.com/Alive/m4P https://rise4fun.com/Alive/SKrx * `C2 u> (-1 %u C1)`. 
We also have to change `(X % C1) == C2` into `((X - C2) % C1) == 0`, and we have to decrement C4: ``` Name: (X % C1) == C2 -> (X - C2) * C3 <= C4 iff C2 u> (-1 %u C1) Pre: (C1 u>> countTrailingZeros(C1)) * C3 == 1 && C2 u> (-1 %u C1) %zz = and i8 C3, 0 ; trick alive into making C3 available in precondition %o0 = urem i8 %x, C1 %r = icmp eq i8 %o0, C2 => %zz = and i8 C3, 0 ; and silence it from complaining about said reg %C4 = (-1 /u C1)-1 %n0 = sub i8 %x, C2 %n1 = mul i8 %n0, C3 %n2 = lshr i8 %n1, countTrailingZeros(C1) ; rotate right %n3 = shl i8 %n1, ((8-countTrailingZeros(C1)) %u 8) ; rotate right %n4 = or i8 %n2, %n3 ; rotate right %is_tautologically_false = icmp ule i8 C1, C2 %C4_fixed = select i1 %is_tautologically_false, i8 -1, i8 %C4 %res = icmp ule i8 %n4, %C4_fixed %r = xor i1 %res, %is_tautologically_false ``` https://rise4fun.com/Alive/d40 https://rise4fun.com/Alive/8cF I believe this concludes `x u% C1 ==/!= C2` lowering. In fact, clang may now be better in this regard than gcc: as can be seen from the `@t32_6_4` test, we do lower `x % 6 == 4` via this pattern, while gcc does not: https://godbolt.org/z/XNU2z9 And all the general alive proofs say this is legal. And manual checking agrees: https://rise4fun.com/Alive/WA2 Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=35479 | PR35479 ]]. Reviewers: RKSimon, craig.topper, spatel Reviewed By: RKSimon Subscribers: nick, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70053	2019-11-22 15:22:42 +03:00
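To sanity-check the recipe outside Alive, the Python sketch below evaluates `(X u% C1) == C2` on i8. It uses `(X - C2) * C3`, a rotate right by `countTrailingZeros(C1)`, and the bound C4, decremented when needed. The result is compared against a plain remainder. The helper names are hypothetical. Lanes with `C1 u<= C2` short-circuit to false, which gives the same answer as the select/xor fix-up.

```python
W = 8                      # bit width for the exhaustive check (i8, as in the Alive proofs)
MASK = (1 << W) - 1


def ctz(v: int) -> int:
    """countTrailingZeros for v > 0."""
    return (v & -v).bit_length() - 1


def rotr(v: int, k: int) -> int:
    """Rotate a W-bit value right by k."""
    k %= W
    return ((v >> k) | (v << (W - k))) & MASK


def urem_eq_fold(x: int, c1: int, c2: int) -> bool:
    """Evaluate (x u% c1) == c2 without a division, following the Alive recipe."""
    if c1 <= c2:                      # tautologically false lane
        return False
    k = ctz(c1)
    c3 = pow(c1 >> k, -1, 1 << W)     # inverse of the odd part of C1 mod 2^W
    c4 = MASK // c1                   # -1 /u C1
    if c2 > MASK % c1:                # C2 u> (-1 %u C1): decrement C4
        c4 -= 1
    n = ((x - c2) * c3) & MASK        # (X - C2) * C3, wrapping
    return rotr(n, k) <= c4
```

The C4 decrement is what handles `X u< C2`. In that case `X - C2` wraps, and if `C2 u> (-1 %u C1)` it can land exactly on the largest multiple of C1. Without the decrement, that value would be wrongly accepted.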
Roman Lebedev	3f46022e33	[Codegen] TargetLowering::prepareUREMEqFold(): `x u% C1 ==/!= C2` with tautological C1 u<= C2 (PR35479) Summary: This is a preparatory cleanup before I add more of this fold to deal with comparisons with non-zero. In essence, the current lowering is: ``` Name: (X % C1) == 0 -> X * C3 <= C4 Pre: (C1 u>> countTrailingZeros(C1)) * C3 == 1 %zz = and i8 C3, 0 ; trick alive into making C3 available in precondition %o0 = urem i8 %x, C1 %r = icmp eq i8 %o0, 0 => %zz = and i8 C3, 0 ; and silence it from complaining about said reg %C4 = -1 /u C1 %n0 = mul i8 %x, C3 %n1 = lshr i8 %n0, countTrailingZeros(C1) ; rotate right %n2 = shl i8 %n0, ((8-countTrailingZeros(C1)) %u 8) ; rotate right %n3 = or i8 %n1, %n2 ; rotate right %r = icmp ule i8 %n3, %C4 ``` https://rise4fun.com/Alive/oqd It kinda just works, really no weird edge-cases. But it isn't all that great for when comparing with non-zero. In particular, given `(X % C1) == C2`, there will be problems in the always-false tautological case where `C2 u>= C1`: https://rise4fun.com/Alive/pH3 That case is tautological, always-false: ``` Name: (X % Y) u>= Y %o0 = urem i8 %x, %y %r = icmp uge i8 %o0, %y => %r = false ``` https://rise4fun.com/Alive/ofu While we can't/shouldn't get such a tautological case normally, we do deal with non-splat vectors, so unless we want to give up in this case, we need to fixup/short-circuit such lanes. There are three lowering variants: 1. 
We can blend between whatever computed result and the correct tautological result ``` Name: (X % C1) == C2 -> X * C3 <= C4 || false Pre: (C2 == 0 || C1 u<= C2) && (C1 u>> countTrailingZeros(C1)) * C3 == 1 %zz = and i8 C3, 0 ; trick alive into making C3 available in precondition %o0 = urem i8 %x, C1 %r = icmp eq i8 %o0, C2 => %zz = and i8 C3, 0 ; and silence it from complaining about said reg %C4 = -1 /u C1 %n0 = mul i8 %x, C3 %n1 = lshr i8 %n0, countTrailingZeros(C1) ; rotate right %n2 = shl i8 %n0, ((8-countTrailingZeros(C1)) %u 8) ; rotate right %n3 = or i8 %n1, %n2 ; rotate right %is_tautologically_false = icmp ule i8 C1, C2 %res = icmp ule i8 %n3, %C4 %r = select i1 %is_tautologically_false, i1 0, i1 %res ``` https://rise4fun.com/Alive/PjT5 https://rise4fun.com/Alive/1KV 2. We can invert the comparison result ``` Name: (X % C1) == C2 -> X * C3 <= C4 || false Pre: (C2 == 0 || C1 u<= C2) && (C1 u>> countTrailingZeros(C1)) * C3 == 1 %zz = and i8 C3, 0 ; trick alive into making C3 available in precondition %o0 = urem i8 %x, C1 %r = icmp eq i8 %o0, C2 => %zz = and i8 C3, 0 ; and silence it from complaining about said reg %C4 = -1 /u C1 %n0 = mul i8 %x, C3 %n1 = lshr i8 %n0, countTrailingZeros(C1) ; rotate right %n2 = shl i8 %n0, ((8-countTrailingZeros(C1)) %u 8) ; rotate right %n3 = or i8 %n1, %n2 ; rotate right %is_tautologically_false = icmp ule i8 C1, C2 %C4_fixed = select i1 %is_tautologically_false, i8 -1, i8 %C4 %res = icmp ule i8 %n3, %C4_fixed %r = xor i1 %res, %is_tautologically_false ``` https://rise4fun.com/Alive/2xC https://rise4fun.com/Alive/jpb5 3. We can expand into `and`/`or`: https://rise4fun.com/Alive/WGn https://rise4fun.com/Alive/lcb5 The blend variant is likely better, since it avoids having to load the replacement from the constant pool. `xor` is second best since it's still pretty general. I'm not adding `and`/`or` variants. 
Reviewers: RKSimon, craig.topper, spatel Reviewed By: RKSimon Subscribers: nick, hiraditya, xbolva00, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70051	2019-11-22 15:16:03 +03:00
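The two adopted variants (blend via select, and invert via xor with a forced-true C4) can be compared in a Python sketch over i8. The helper names are hypothetical. Only the cases the precondition allows, `C2 == 0` or `C1 u<= C2`, are exercised.

```python
W = 8
MASK = (1 << W) - 1


def ctz(v: int) -> int:
    return (v & -v).bit_length() - 1


def rotr(v: int, k: int) -> int:
    k %= W
    return ((v >> k) | (v << (W - k))) & MASK


def core(x: int, c1: int):
    """Shared part: %n3 = rotr(X * C3, ctz(C1)) and %C4 = -1 /u C1."""
    k = ctz(c1)
    c3 = pow(c1 >> k, -1, 1 << W)
    return rotr((x * c3) & MASK, k), MASK // c1


def blend_variant(x: int, c1: int, c2: int) -> bool:
    """Variant 1: select the constant 'false' for tautological lanes."""
    n3, c4 = core(x, c1)
    taut_false = c1 <= c2
    return False if taut_false else n3 <= c4


def xor_variant(x: int, c1: int, c2: int) -> bool:
    """Variant 2: force C4 to -1 (compare always true), then flip with xor."""
    n3, c4 = core(x, c1)
    taut_false = c1 <= c2
    c4_fixed = MASK if taut_false else c4
    return (n3 <= c4_fixed) ^ taut_false
```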
Craig Topper	dc02eb1909	[SelectionDAG] Merge the two identical ExpandChainLibCall methods from LegalizeTypes and LegalizeDAG into one version in TargetLowering. Reviewers: RKSimon, efriedma, spatel Reviewed By: efriedma Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70354	2019-11-18 20:22:33 -08:00
joanlluch	d384ad6b63	[TargetLowering][DAGCombine][MSP430] Shift Amount Threshold in DAGCombine (4) Summary: Replaces ``` unsigned getShiftAmountThreshold(EVT VT) ``` by ``` bool shouldAvoidTransformToShift(EVT VT, unsigned amount) ``` thus giving more flexibility for targets to decide whether particular shift amounts must be considered expensive or not. Updates the MSP430 target with a custom implementation. This continues D69116, D69120, D69326 and updates them, so all of them must be committed before this. Existing tests apply, a few more have been added. Reviewers: asl, spatel Reviewed By: spatel Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70042	2019-11-13 09:23:08 +01:00
joanlluch	e0012c5d6a	[TargetLowering][DAGCombine][MSP430] Shift Amount Threshold in DAGCombine (3) Summary: Additional filtering of undesired shifts for targets that do not support them efficiently. Related to D69116 and D69120. Applies the TLI.getShiftAmountThreshold hook to prevent undesired generation of shifts for the following IR code: ``` define i16 @testShiftBits(i16 %a) { entry: %and = and i16 %a, -64 %cmp = icmp eq i16 %and, 64 %conv = zext i1 %cmp to i16 ret i16 %conv } define i16 @testShiftBits_11(i16 %a) { entry: %cmp = icmp ugt i16 %a, 63 %conv = zext i1 %cmp to i16 ret i16 %conv } define i16 @testShiftBits_12(i16 %a) { entry: %cmp = icmp ult i16 %a, 64 %conv = zext i1 %cmp to i16 ret i16 %conv } ``` The attached diff file shows the piece of code in TargetLowering that is responsible for the generation of shifts in relation to the IR above. Before this patch, shifts would be generated to replace non-legal icmp immediates. However, shifts may be undesired if they are even more expensive for the target. For all my previous patches in this series (cited above) I added test cases for the MSP430 target. However, in this case, the target is not suitable for showing improvements related to this patch, because the MSP430 does not implement "isLegalICmpImmediate". The default implementation always returns true, therefore the patched code in TargetLowering is never reached for that target. Targets implementing both "isLegalICmpImmediate" and "getShiftAmountThreshold" will benefit from this. The differential effect of this patch can only be shown for the MSP430 by temporarily implementing "isLegalICmpImmediate" to return false for large immediates. 
This is simulated with the implementation of a command line flag that was incorporated in D69975. This patch belongs to an initiative to "relax" the generation of shifts by LLVM for targets requiring it. Reviewers: spatel, lebedev.ri, asl Reviewed By: spatel Subscribers: lenary, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69326	2019-11-11 10:18:25 +01:00
Sanjay Patel	777d1d1d98	[SDAG] reduce code duplication; NFC	2019-11-07 10:28:45 -05:00
Sanjay Patel	2fdd58c506	[SDAG] reduce code duplication; NFC	2019-11-07 10:15:17 -05:00
Craig Topper	96bb076621	[TargetLowering] Move the setBooleanContents check on (xor (setcc), (setcc)) == / != 1 -> (setcc) != / == (setcc) to the right place We need to be checking the value types for the inner setccs not the outer setcc. We need to ensure those setccs produce a 0/1 value or that the xor is on the i1 type. I think at the time this code was originally written, getBooleanContents didn't take any arguments so this was probably correct. But now we can have a different boolean contents for integer and floating point. Not sure why the other combines below the xor were also checking the boolean contents. None of them involve any setccs other than the outer one and they only produce a new setcc. Differential Revision: https://reviews.llvm.org/D69480	2019-11-01 14:43:17 -07:00
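A Python sketch shows why the inner setccs' boolean contents matter. With 0/1 booleans, `(xor A, B) == 1` is the same as `A != B`. With 0/-1 booleans, the xor of a true and a false value is -1, so comparing it against 1 is never true and the fold would change the result. `setcc` and `xor_eq_one` are hypothetical helpers. The content names mirror the `ZeroOrOneBooleanContent` and `ZeroOrNegativeOneBooleanContent` kinds.

```python
def setcc(cond: bool, contents: str) -> int:
    """Materialize a comparison result under a given boolean contents kind."""
    if contents == "ZeroOrOneBooleanContent":
        return 1 if cond else 0
    if contents == "ZeroOrNegativeOneBooleanContent":
        return -1 if cond else 0
    raise ValueError(contents)


def xor_eq_one(a: bool, b: bool, contents: str) -> bool:
    """Original pattern: (xor (setcc A), (setcc B)) == 1."""
    return (setcc(a, contents) ^ setcc(b, contents)) == 1
```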
Craig Topper	73f255b83a	[TargetLowering] Add getBooleanContents contents check to "SETCC (SETCC), [0|1], [EQ|NE] -> SETCC" combine. This combine is only valid if the inner setcc produces a 0/1 result or the inner type is MVT::i1. I haven't seen this cause any issues, just happened to notice it while reviewing combines in this function. While there also fix another call to use the value type from the SDValue for the operand instead of calling SDNode::getValueType(0). Though it's likely the use is result 0, it's not guaranteed.	2019-10-27 10:07:15 -07:00
Hans Wennborg	684ebc605e	Revert `4334892e7b` "[DAGCombine][ARM] x ==/!= c -> (x - c) ==/!= 0 iff '-c' can be folded into the x node." This broke various Windows builds, see comments on the Phabricator review. This also reverts the follow-up `20bf0cf`. > Summary: > This fold, helps recover from the rest of the D62266 ARM regressions. > https://rise4fun.com/Alive/TvpC > > Note that while the fold is quite flexible, i've restricted it > to the single interesting pattern at the moment. > > Reviewers: efriedma, craig.topper, spatel, RKSimon, deadalnix > > Reviewed By: deadalnix > > Subscribers: javed.absar, kristof.beyls, llvm-commits > > Tags: #llvm > > Differential Revision: https://reviews.llvm.org/D62450	2019-10-23 19:52:02 +02:00
Roman Lebedev	20bf0cf2f0	[TargetLowering] optimizeSetCCToComparisonWithZero(): add extra sanity checks (PR43769) We should do the fold only if both constants are plain, non-opaque constants, at least that is the DAG.FoldConstantArithmetic() requirement. And if the constant we are comparing with is zero - we shouldn't be trying to do this fold in the first place. Fixes https://bugs.llvm.org/show_bug.cgi?id=43769	2019-10-23 12:01:40 +03:00
Roman Lebedev	4334892e7b	[DAGCombine][ARM] x ==/!= c -> (x - c) ==/!= 0 iff '-c' can be folded into the x node. Summary: This fold helps recover from the rest of the D62266 ARM regressions. https://rise4fun.com/Alive/TvpC Note that while the fold is quite flexible, I've restricted it to the single interesting pattern at the moment. Reviewers: efriedma, craig.topper, spatel, RKSimon, deadalnix Reviewed By: deadalnix Subscribers: javed.absar, kristof.beyls, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62450	2019-10-22 22:56:35 +03:00
Sanjay Patel	a298964d22	[TargetLowering][DAGCombine][MSP430] add/use hook for Shift Amount Threshold (1/2) Provides a TLI hook to allow targets to relax the emission of shifts, thus enabling codegen improvements on targets with no multiple shift instructions and cheap selects or branches. Contributes to a Fix for PR43559: https://bugs.llvm.org/show_bug.cgi?id=43559 Patch by: @joanlluch (Joan LLuch) Differential Revision: https://reviews.llvm.org/D69116 llvm-svn: 375347	2019-10-19 16:57:02 +00:00
Simon Pilgrim	3c912c4abe	[DAG][X86] Convert isNegatibleForFree/GetNegatedExpression to a target hook (PR42863) This patch converts the DAGCombine isNegatibleForFree/GetNegatedExpression into overridable TLI hooks. The intention is to let us extend existing FNEG combines to work more generally with negatible float ops, allowing it to work with target specific combines and opcodes (e.g. X86's FMA variants). Unlike the SimplifyDemandedBits, we can't just handle target nodes through a Target callback, we need to do this as an override to allow targets to handle generic opcodes as well. This does mean that the target implementations have to duplicate some checks (recursion depth etc.). Partial reversion of rL372756 - I've identified the infinite loop issue inside the X86 override but haven't fixed it yet so I've only (re)committed the common TargetLowering refactoring part of the patch. Differential Revision: https://reviews.llvm.org/D67557 llvm-svn: 373343	2019-10-01 15:32:04 +00:00
Daniel Sanders	cbe13a1461	[globalisel][knownbits] Allow targets to call GISelKnownBits::computeKnownBitsImpl() Summary: It seems we missed that the target hook can't query the known-bits for the inputs to a target instruction. Fix that oversight Reviewers: aditya_nandakumar Subscribers: rovka, hiraditya, volkan, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67380 llvm-svn: 373264	2019-09-30 20:55:53 +00:00
Roger Ferrer Ibanez	5a2a14db0b	[TargetLowering] Simplify expansion of S{ADD,SUB}O ISD::SADDO uses the suggested sequence described in §2.4 of the RISC-V Spec v2.2. ISD::SSUBO uses the dual approach, but checks whether the second operand is (strictly) positive instead of negative. Differential Revision: https://reviews.llvm.org/D47927 llvm-svn: 373187	2019-09-30 07:58:50 +00:00
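The overflow checks can be sketched in Python on i8 values. The helpers are hypothetical, and `wrap` models two's-complement wraparound. For addition, overflow occurred exactly when `b < 0` disagrees with `sum < a`. For subtraction, the dual test compares `b > 0` with `diff < a`.

```python
def wrap(v: int, bits: int) -> int:
    """Two's-complement wrap of v to a signed `bits`-wide value."""
    m = 1 << bits
    v &= m - 1
    return v - m if v >= m >> 1 else v


def saddo(a: int, b: int, bits: int = 8):
    """Signed add with overflow flag: overflow iff (b < 0) != (sum < a)."""
    s = wrap(a + b, bits)
    return s, (b < 0) != (s < a)


def ssubo(a: int, b: int, bits: int = 8):
    """Signed sub with overflow flag: overflow iff (b > 0) != (diff < a)."""
    d = wrap(a - b, bits)
    return d, (b > 0) != (d < a)
```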
Ilya Biryukov	60e5e0b667	Revert r372333: [DAG][X86] Convert isNegatibleForFree/GetNegatedExpression to a target hook (PR42863) Reason: this caused severe compile time regressions in JAX. See email thread of original revision on llvm-commits for details: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20190923/697042.html llvm-svn: 372756	2019-09-24 13:48:02 +00:00
Craig Topper	1b7b4b467f	[SelectionDAG][Mips][Sparc] Don't allow SimplifyDemandedBits to constant fold TargetConstant nodes to a Constant. Summary: After the switch in SimplifyDemandedBits, it tries to create a constant when possible. If the original node is a TargetConstant the default in the switch will call computeKnownBits on the TargetConstant which will succeed. This results in the TargetConstant becoming a Constant. But TargetConstant exists to avoid being changed. I've fixed the two cases that relied on this in tree by explicitly making the nodes constant instead of target constant. The Sparc case is an old bug. The Mips case was recently introduced now that ImmArg on intrinsics gets turned into a TargetConstant when the SelectionDAG is created. I've removed the ImmArg since it lowers to generic code. Reviewers: arsenm, RKSimon, spatel Subscribers: jyknight, sdardis, wdng, arichardson, hiraditya, fedor.sergeev, jrtc27, atanasyan, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67802 llvm-svn: 372409	2019-09-20 16:49:51 +00:00
Simon Pilgrim	af6043557d	[DAG][X86] Convert isNegatibleForFree/GetNegatedExpression to a target hook (PR42863) This patch converts the DAGCombine isNegatibleForFree/GetNegatedExpression into overridable TLI hooks and includes a demonstration X86 implementation. The intention is to let us extend existing FNEG combines to work more generally with negatible float ops, allowing it to work with target specific combines and opcodes (e.g. X86's FMA variants). Unlike the SimplifyDemandedBits, we can't just handle target nodes through a Target callback, we need to do this as an override to allow targets to handle generic opcodes as well. This does mean that the target implementations have to duplicate some checks (recursion depth etc.). I've only begun to replace X86's FNEG handling here, handling FMADDSUB/FMSUBADD negation and some low impact codegen changes (some FMA negation propagation). We can build on this in future patches. Differential Revision: https://reviews.llvm.org/D67557 llvm-svn: 372333	2019-09-19 15:02:47 +00:00
Simon Pilgrim	c65dd89804	[DAG] Add SelectionDAG::MaxRecursionDepth constant As commented on D67557 we have a lot of uses of depth checks all using magic numbers. This patch adds the SelectionDAG::MaxRecursionDepth constant and moves over some general cases to use this explicitly. Differential Revision: https://reviews.llvm.org/D67711 llvm-svn: 372315	2019-09-19 12:58:43 +00:00
Craig Topper	b5ffbd0b14	[SimplifyDemandedBits] Use APInt::intersects instead of ANDing and comparing to 0 separately. NFC llvm-svn: 372158	2019-09-17 18:19:02 +00:00
Simon Pilgrim	b743e94cdc	[TargetLowering] SimplifyDemandedBits - add EXTRACT_SUBVECTOR support. Call SimplifyDemandedBits on the source vector. llvm-svn: 371923	2019-09-14 16:38:26 +00:00
Philip Reames	079e210463	[SDAG] Update generic code to conservatively check for isAtomic in addition to isVolatile This is the first sweep of generic code to add isAtomic bailouts where appropriate. The intention here is to have the switch from AtomicSDNode to LoadSDNode/StoreSDNode be close to NFC; that is, I'm not looking to allow additional optimizations at this time. That will come later. See D66309 for context. Differential Revision: https://reviews.llvm.org/D66318 llvm-svn: 371786	2019-09-12 22:49:17 +00:00
Tim Northover	36147adc0b	GlobalISel: add combiner to form indexed loads. Loosely based on DAGCombiner version, but this part is slightly simpler in GlobalISel because all address calculation is performed by G_GEP. That makes the inc/dec distinction moot so there's just pre/post to think about. No targets can handle it yet so testing is via a special flag that overrides target hooks. llvm-svn: 371384	2019-09-09 10:04:23 +00:00
Bjorn Pettersson	d065c81164	[CodeGen] Handle SMULFIXSAT with scale zero in TargetLowering::expandFixedPointMul Summary: Normally TargetLowering::expandFixedPointMul would handle SMULFIXSAT with scale zero by using an SMULO to compute the product and determine if saturation is needed (if overflow happened). But if SMULO isn't custom/legal it falls through and uses the same technique, using MULHS/SMUL_LOHI, as used for non-zero scales. The problem was that when checking for overflow (handling saturation) when not using MULO we did not expect to find a zero scale. So we ended up hitting an assertion when doing APInt::getLowBitsSet(VTSize, Scale - 1). This patch fixes the problem by adding a new special case for how saturation is computed when scale is zero. Reviewers: RKSimon, bevinh, leonardchan, spatel Reviewed By: RKSimon Subscribers: wuzish, nemanjai, hiraditya, MaskRay, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67071 llvm-svn: 371309	2019-09-07 12:16:23 +00:00
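A reference model for `llvm.smul.fix.sat` makes the scale-zero case concrete. With `scale == 0` there is no rescaling shift, so saturation reduces to clamping the full-width product. `smul_fix_sat` is a hypothetical Python helper. It rounds with an arithmetic shift (toward negative infinity), which we assume is one of the rounding directions LangRef permits.

```python
def smul_fix_sat(a: int, b: int, scale: int, bits: int = 8) -> int:
    """Reference model of llvm.smul.fix.sat on signed `bits`-wide operands."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    prod = (a * b) >> scale          # full-precision product, then rescale
    return max(lo, min(hi, prod))    # saturate instead of wrapping
```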
Bjorn Pettersson	5e331e4ce8	[Intrinsic] Add the llvm.umul.fix.sat intrinsic Summary: Add an intrinsic that takes 2 unsigned integers with the scale of them provided as the third argument and performs fixed point multiplication on them. The result is saturated and clamped between the largest and smallest representable values of the first 2 operands. This is a part of implementing fixed point arithmetic in clang where some of the more complex operations will be implemented as intrinsics. Patch by: leonardchan, bjope Reviewers: RKSimon, craig.topper, bevinh, leonardchan, lebedev.ri, spatel Reviewed By: leonardchan Subscribers: ychen, wuzish, nemanjai, MaskRay, jsji, jdoerfert, Ka-Ka, hiraditya, rjmccall, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D57836 llvm-svn: 371308	2019-09-07 12:16:14 +00:00
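The unsigned intrinsic described above can be modelled the same way. This is a minimal Python sketch that assumes truncating rounding of the rescaled product; the helper name is hypothetical. With scale 4 on i8, 0x18 (1.5) times 0x20 (2.0) gives 0x30 (3.0), and larger products clamp to 0xFF.

```python
def umul_fix_sat(a: int, b: int, scale: int, bits: int = 8) -> int:
    """Reference model of llvm.umul.fix.sat on unsigned `bits`-wide operands."""
    prod = (a * b) >> scale              # exact product, truncated to the scale
    return min(prod, (1 << bits) - 1)    # clamp to the largest unsigned value
```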
Shiva Chen	adfdcb9c26	[TargetLowering] Fix Bugzilla ID 43183 to avoid soften comparison broken with constant inputs Summary: This fixes Bugzilla ID 43183, which was triggered by the following commit: [RISCV] Avoid generating AssertZext for LP64 ABI when lowering floating LibCall llvm-svn: 370604	2019-09-01 04:52:54 +00:00
Simon Pilgrim	2d1e0899e9	[TargetLowering] SimplifyDemandedBits ADD/SUB/MUL - correctly inherit SDNodeFlags from the original node. Just disable NSW/NUW flags. This matches what we're already doing for the other situations for these nodes, it was just missed for the demanded constant case. Noticed by inspection - confirmed in offline discussion with @spatel. I've checked we have test coverage in the x86 extract-bits.ll and extract-lowbits.ll tests llvm-svn: 370497	2019-08-30 17:58:55 +00:00
Shiva Chen	b39876d8cd	[RISCV] Avoid generating AssertZext for LP64 ABI when lowering floating LibCall The patch fixed the issue that RV64 didn't clear the upper bits when returning a complex floating-point value with the lp64 ABI. float _Complex complex_add(float _Complex a, float _Complex b) { return a + b; } RealResult = zero_extend(RealA + RealB) ImageResult = ImageA + ImageB Return (RealResult | (ImageResult << 32)) The patch introduces shouldExtendTypeInLibCall target hook to suppress the AssertZext generation when lowering floating LibCall. Thanks to Eli's comments from the Bugzilla https://bugs.llvm.org/show_bug.cgi?id=42820 Differential Revision: https://reviews.llvm.org/D65497 llvm-svn: 370275	2019-08-28 23:40:37 +00:00
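The failure mode can be illustrated in Python by packing two single-precision bit patterns into one 64-bit value. For illustration only, the model assumes the real part's register holds a sign-extended 32-bit pattern. `pack_complex` and `f32_bits` are hypothetical helpers. If the zero_extend is dropped (as a wrongly placed AssertZext allows), the upper bits of the real part clobber the imaginary part.

```python
import struct

MASK32 = 0xFFFF_FFFF
MASK64 = (1 << 64) - 1


def f32_bits(x: float) -> int:
    """IEEE-754 single-precision bit pattern of x."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def pack_complex(real_reg: int, imag_reg: int, zero_extend: bool) -> int:
    """Form Real | (Imag << 32) in a 64-bit register.

    real_reg/imag_reg model 64-bit registers whose low 32 bits hold the float;
    the upper bits of real_reg are whatever the producer left there.
    """
    real = real_reg & MASK32 if zero_extend else real_reg
    return (real | ((imag_reg & MASK32) << 32)) & MASK64
```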
Kevin P. Neal	ddf13c00ed	[FPEnv] Add fptosi and fptoui constrained intrinsics. This implements constrained floating point intrinsics for FP to signed and unsigned integers. Quoting from D32319: The purpose of the constrained intrinsics is to force the optimizer to respect the restrictions that will be necessary to support things like the STDC FENV_ACCESS ON pragma without interfering with optimizations when these restrictions are not needed. Reviewed by: Andrew Kaylor, Craig Topper, Hal Finkel, Cameron McInally, Roman Lebedev, Kit Barton Approved by: Craig Topper Differential Revision: http://reviews.llvm.org/D63782 llvm-svn: 370228	2019-08-28 16:33:36 +00:00
Amaury Sechet	4f4387dd12	[TargetLowering] Add buildLegalVectorShuffle facility to help build legal shuffles Summary: There are at least 2 ways to express the same shuffle. Various pieces of code explicitly check for both options, but other places do not when they would benefit from doing it. This patch refactors the codebase to use buildLegalVectorShuffle in order to make that behavior more consistent. Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri Subscribers: javed.absar, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66804 llvm-svn: 370190	2019-08-28 12:00:06 +00:00
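We assume the two ways are the shuffle as written and the same shuffle with swapped operands and a commuted mask. The following Python sketch shows that equivalence. The helpers are hypothetical, and `-1` marks an undef lane, modelled as `None`.

```python
def shuffle(a, b, mask):
    """VECTOR_SHUFFLE model: i < n picks a[i], n <= i < 2n picks b[i - n], -1 is undef."""
    n = len(a)
    return [None if m < 0 else (a[m] if m < n else b[m - n]) for m in mask]


def commute_mask(mask, n):
    """Mask that yields the same result after swapping the two operands."""
    return [m if m < 0 else (m + n if m < n else m - n) for m in mask]
```

Presumably a helper like this falls back to the commuted form when the target rejects the original mask.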
Craig Topper	e7211bb567	[SelectionDAG][X86] Enable iX SimplifyDemandedBits to vXi1 SimplifyDemandedVectorElts simplification. Add a hack to X86 to avoid a regression Patch showing the effect of enabling bool vector oversimplification. Non-VLX builds can simplify a kshift shuffle, but VLX builds simplify: insert_subvector v8i zeroinitializer, v2i --> insert_subvector v8i undef, v2i Preventing the removal of the AND to clear the upper bits of result Differential Revision: https://reviews.llvm.org/D53022 llvm-svn: 369780	2019-08-23 17:14:58 +00:00
Shiva Chen	72a41e7b0d	[TargetLowering] Remove optional arguments passing to makeLibCall The patch introduces MakeLibCallOptions struct as suggested by @efriedma on D65497. The struct contains argument flags which will be passed to the makeLibCall function. The patch should not have any functional changes. Differential Revision: https://reviews.llvm.org/D65795 llvm-svn: 369622	2019-08-22 04:59:43 +00:00
Roman Lebedev	edfaee0811	[TargetLowering] x s% C == 0 fold: vector divisor with INT_MIN handling Summary: The general fold is only valid for positive divisors, which effectively means it is invalid for `INT_MIN` divisors, and we currently bail out if we see them. But that is too strict, we can just fix-up the results. For that, let's do a second computation 'in parallel': ``` Name: srem -> and Pre: isPowerOf2(C) %o = srem i8 %X, C %r = icmp eq %o, 0 => %n = and i8 %X, C-1 %r = icmp eq %n, 0 ``` https://rise4fun.com/Alive/Sup And then just blend results: if the divisor was `INT_MIN`, pick the value we got via bit-test, else pick the value from the general fold. There's an interesting observation - `ISD::ROTR` is set to `LegalizeAction::Expand` before AVX512, so we should not treat `INT_MIN` divisor as even; and as can be seen, while `@test_srem_odd_even_one` improves on all run-lines, `@test_srem_odd_even_INT_MIN` only improves for AVX512. Reviewers: RKSimon, craig.topper, spatel Reviewed By: RKSimon Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66300 llvm-svn: 369268	2019-08-19 15:01:42 +00:00
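The `INT_MIN` bit test from the Alive proof can be checked exhaustively on i8 with a Python sketch. The helpers are hypothetical, and `srem` models the C-style truncating remainder. `C - 1` is taken as an unsigned i8 pattern, so for `C = -128` the mask is `0x7F`.

```python
def srem(x: int, c: int) -> int:
    """C-style signed remainder (result has the sign of the dividend)."""
    r = abs(x) % abs(c)
    return -r if x < 0 else r


def srem_pow2_eq_zero(x: int, c: int, bits: int = 8) -> bool:
    """Bit-test form (x & (C - 1)) == 0, for C a power of two as an unsigned pattern."""
    mask = (c - 1) & ((1 << bits) - 1)   # C - 1 as an unsigned bits-wide pattern
    return (x & mask) == 0
```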
Daniel Sanders	0c47611131	Apply llvm-prefer-register-over-unsigned from clang-tidy to LLVM Summary: This clang-tidy check is looking for unsigned integer variables whose initializer starts with an implicit cast from llvm::Register and changes the type of the variable to llvm::Register (dropping the llvm:: where possible). Partial reverts in: X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister X86FixupLEAs.cpp - Some functions return unsigned and arguably should be MCRegister X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister HexagonBitSimplify.cpp - Function takes BitTracker::RegisterRef which appears to be unsigned& MachineVerifier.cpp - Ambiguous operator==() given MCRegister and const Register PPCFastISel.cpp - No Register::operator-=() PeepholeOptimizer.cpp - TargetInstrInfo::optimizeLoadInstr() takes an unsigned& MachineTraceMetrics.cpp - MachineTraceMetrics lacks a suitable constructor Manual fixups in: ARMFastISel.cpp - ARMEmitLoad() now takes a Register& instead of unsigned& HexagonSplitDouble.cpp - Ternary operator was ambiguous between unsigned/Register HexagonConstExtenders.cpp - Has a local class named Register, used llvm::Register instead of Register. 
PPCFastISel.cpp - PPCEmitLoad() now takes a Register& instead of unsigned& Depends on D65919 Reviewers: arsenm, bogner, craig.topper, RKSimon Reviewed By: arsenm Subscribers: RKSimon, craig.topper, lenary, aemerson, wuzish, jholewinski, MatzeB, qcolombet, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, wdng, nhaehnle, sbc100, jgravelle-google, kristof.beyls, hiraditya, aheejin, kbarton, fedor.sergeev, javed.absar, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, tpr, PkmX, jocewei, jsji, Petar.Avramovic, asbirlea, Jim, s.egerton, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65962 llvm-svn: 369041	2019-08-15 19:22:08 +00:00
Simon Pilgrim	983e9118a2	Remove BitVector.h include. NFCI. BitVector type isn't used at all in the cpp file. llvm-svn: 369007	2019-08-15 14:39:28 +00:00
Roman Lebedev	676594305a	[CodeGen][SelectionDAG] More efficient code for X % C == 0 (SREM case) Summary: This implements an optimization described in Hacker's Delight 10-17: when `C` is constant, the result of `X % C == 0` can be computed more cheaply without actually calculating the remainder. The motivation is discussed here: https://bugs.llvm.org/show_bug.cgi?id=35479. One huge caveat: this signed case is only valid for positive divisors. While we can freely negate negative divisors, we can't negate `INT_MIN`, so for now if `INT_MIN` is encountered, we bailout. As a follow-up, it should be possible to handle that more gracefully via extra `and`+`setcc`+`select`. This passes llvm's test-suite, and from cursory(!) cross-examination the folds (the assembly) match those of GCC, and manual checking via alive did not reveal any issues (other than the `INT_MIN` case) Reviewers: RKSimon, spatel, hermord, craig.topper, xbolva00 Reviewed By: RKSimon, xbolva00 Subscribers: xbolva00, thakis, javed.absar, hiraditya, dexonsmith, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65366 llvm-svn: 368702	2019-08-13 14:57:37 +00:00
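For odd divisors, the Hacker's Delight check reduces to one multiply, one add and one unsigned compare. The Python sketch below covers only odd `d` in [3, 127] on i8. It omits the rotate needed for even divisors and the special cases (`d == 1`, negative divisors, `INT_MIN`). The helper name and the constants `P`, `A` are our own notation.

```python
W = 8
MASK = (1 << W) - 1


def srem_eq_zero_odd(x: int, d: int) -> bool:
    """(x s% d) == 0 for odd d in [3, 2^(W-1) - 1] without computing the remainder.

    P is the inverse of d mod 2^W and A = floor((2^(W-1) - 1) / d). The
    multiples of d in the signed range map onto [-A, A] under x * P, so adding
    A turns the test into an unsigned range check against 2 * A.
    """
    p = pow(d, -1, 1 << W)
    a = ((1 << (W - 1)) - 1) // d
    return ((x * p + a) & MASK) <= 2 * a
```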
Roman Lebedev	f4de7eda4a	[TargetLowering][NFC] prepareUREMEqFold(): fixup comment The comment initially matched the code, but the code was incorrect and was fixed after the initial revert, back when it was introduced, but the comment was never updated. llvm-svn: 368701	2019-08-13 14:57:08 +00:00
Hans Wennborg	5390d25f2b	Revert r368276 "[TargetLowering] SimplifyDemandedBits - call SimplifyMultipleUseDemandedBits for ISD::EXTRACT_VECTOR_ELT" This introduced a false positive MemorySanitizer warning about use of uninitialized memory in a vectorized crc function in Chromium. That suggests maybe something is not right with this transformation. See https://crbug.com/992853#c7 for a reproducer. This also reverts the follow-up commits r368307 and r368308 which depended on this. > This patch attempts to peek through vectors based on the demanded bits/elt of a particular ISD::EXTRACT_VECTOR_ELT node, allowing us to avoid dependencies on ops that have no impact on the extract. > > In particular this helps remove some unnecessary scalar->vector->scalar patterns. > > The wasm shift patterns are annoying - @tlively has indicated that the wasm vector shift codegen are to be refactored in the near-term and isn't considered a major issue. > > Differential Revision: https://reviews.llvm.org/D65887 llvm-svn: 368660	2019-08-13 09:33:25 +00:00
Simon Pilgrim	05e8209e33	[TargetLowering] SimplifyDemandedBits - call SimplifyMultipleUseDemandedBits for ISD::TRUNCATE llvm-svn: 368553	2019-08-12 10:56:05 +00:00
Simon Pilgrim	e2e366797e	[TargetLowering] SimplifyDemandedBits - call SimplifyMultipleUseDemandedBits for ISD::EXTRACT_VECTOR_ELT This patch attempts to peek through vectors based on the demanded bits/elt of a particular ISD::EXTRACT_VECTOR_ELT node, allowing us to avoid dependencies on ops that have no impact on the extract. In particular this helps remove some unnecessary scalar->vector->scalar patterns. The wasm shift patterns are annoying - @tlively has indicated that the wasm vector shift codegen are to be refactored in the near-term and isn't considered a major issue. Differential Revision: https://reviews.llvm.org/D65887 llvm-svn: 368276	2019-08-08 10:37:03 +00:00
Simon Pilgrim	0eafe011ca	[TargetLowering] SimplifyDemandedBits - call SimplifyMultipleUseDemandedBits for ISD::VECTOR_SHUFFLE In particular this helps the SSE vector shift cvttps2dq+add+shl pattern by avoiding the need for zeros in shuffle style extensions to vXi32 types as we'll be shifting out those bits anyway llvm-svn: 368155	2019-08-07 11:43:13 +00:00
Aditya Nandakumar	c8ac029d0a	[GISel]: Add GISelKnownBits analysis https://reviews.llvm.org/D65698 This adds a KnownBits analysis pass for GISel. This was done as a pass (compared to static functions) so that we can add other features such as caching queries (within a pass and across passes) in the future. This patch only adds the basic pass boilerplate, and implements a lazy non caching knownbits implementation (ported from SelectionDAG). I've also hooked up the AArch64PreLegalizerCombiner pass to use this - there should be no compile time regression as the analysis is lazy. llvm-svn: 368065	2019-08-06 17:18:29 +00:00
Simon Pilgrim	dae5ddad9d	[TargetLowering] SimplifyMultipleUseDemandedBits - return UNDEF for undemanded ops If we demand no bits/elts from an Op, just return UNDEF llvm-svn: 368043	2019-08-06 14:30:42 +00:00
Craig Topper	5a4989e2ac	[TargetLowering][X86] Teach SimplifyDemandedVectorElts to replace the base vector of INSERT_SUBVECTOR with undef if none of the elements are demanded even if the node has other users. Summary: The SimplifyDemandedVectorElts function can replace with undef when no elements are demanded, but due to how it interacts with TargetLoweringOpts, it can only do this when the node has no other users. Remove a now unneeded DAG combine from the X86 backend. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65713 llvm-svn: 367788	2019-08-04 17:30:41 +00:00
Bill Wendling	41a2847a9a	Emit diagnostic if an inline asm constraint requires an immediate Summary: An inline asm call can result in an immediate after inlining. Therefore emit a diagnostic here if constraint requires an immediate but one isn't supplied. Reviewers: joerg, mgorny, efriedma, rsmith Reviewed By: joerg Subscribers: asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, s.egerton, MaskRay, jyknight, dylanmckay, javed.absar, fedor.sergeev, jrtc27, Jim, krytarowski, eraman, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60942 llvm-svn: 367750	2019-08-03 05:52:47 +00:00
Simon Pilgrim	794f7591ec	[TargetLowering] SimplifyMultipleUseDemandedBits - don't assume INSERT_VECTOR_ELT value type is simple. Noticed by inspection - this was copied from the X86 target equivalent where we can assume it's legal/simple. llvm-svn: 367721	2019-08-02 21:07:07 +00:00
Simon Pilgrim	1d183b407a	[TargetLowering] SimplifyMultipleUseDemandedBits - Add ISD::INSERT_VECTOR_ELT handling Allow us to peek through vector insertions to avoid dependencies on entire insertion chains. llvm-svn: 367588	2019-08-01 17:46:44 +00:00
Simon Pilgrim	603f94aa2a	[TargetLowering] SimplifyMultipleUseDemandedBits - add BITCAST pass through support (Reapplied) This allows us to peek through BITCASTs, attempt to simplify the source operand, and then bitcast back. This reapplies rL367091 which was reverted at rL367118 - we were inconsistently peeking through the bitcasts to the source value. Fixes PR42777 llvm-svn: 367174	2019-07-27 14:11:59 +00:00
Simon Pilgrim	8a52671782	[SelectionDAG] Check for any recursion depth greater than or equal to limit instead of just equal the limit. If anything called the recursive isKnownNeverNaN/computeKnownBits/ComputeNumSignBits/SimplifyDemandedBits/SimplifyMultipleUseDemandedBits with an incorrect depth then we could continue to recurse if we'd already exceeded the depth limit. This replaces the limit check (Depth == 6) with a (Depth >= 6) to make sure that we don't circumvent it. This causes a couple of regressions as a mixture of calls (SimplifyMultipleUseDemandedBits + combineX86ShufflesRecursively) were calling with depths that were already over the limit. I've fixed SimplifyMultipleUseDemandedBits to not do this. combineX86ShufflesRecursively is trickier as we get a lot of regressions if we reduce its own limit from 8 to 6 (it also starts at Depth == 1 instead of Depth == 0 like the others....) - I'll see what I can do in future patches. llvm-svn: 367171	2019-07-27 12:48:46 +00:00
Simon Pilgrim	3ff6126487	[TargetLowering] Add depth limit to SimplifyMultipleUseDemandedBits We're getting reports of massive compile time increases because SimplifyMultipleUseDemandedBits was losing track of the depth and not exiting early. No repro yet, but consider this a pre-emptive commit. llvm-svn: 367169	2019-07-27 12:23:36 +00:00
Nico Weber	13f337c4cb	Revert r367091, it caused PR42777. llvm-svn: 367118	2019-07-26 14:58:42 +00:00
Simon Pilgrim	9758407bf1	[TargetLowering] SimplifyMultipleUseDemandedBits - add SIGN_EXTEND_INREG support. llvm-svn: 367096	2019-07-26 09:41:08 +00:00
Simon Pilgrim	b32ceb79b0	[TargetLowering] SimplifyMultipleUseDemandedBits - add BITCAST pass through support. This allows us to peek through BITCASTs, attempt to simplify the source operand, and then bitcast back. llvm-svn: 367091	2019-07-26 08:38:39 +00:00
Roman Lebedev	017e272c3a	[Codegen] (X & (C l>>/<< Y)) ==/!= 0 --> ((X <</l>> Y) & C) ==/!= 0 fold Summary: This was originally reported in D62818. https://rise4fun.com/Alive/oPH InstCombine does the opposite fold, in hope that `C l>>/<< Y` expression will be hoisted out of a loop if `Y` is invariant and `X` is not. But as it is seen from the diffs here, if it didn't get hoisted, the produced assembly is almost universally worse. Much like with my recent "hoist add/sub by/from const" patches, we should get almost universal win if we hoist constant, there is almost always an "and/test by imm" instruction, but "shift of imm" not so much, so we may avoid having to materialize the immediate, and thus need one less register. And since we now shift not by constant, but by something else, the live-range of that something else may reduce. Special care needs to be applied not to disturb x86 `BT` / hexagon `tstbit` instruction pattern. And to not get into endless combine loop. Reviewers: RKSimon, efriedma, t.p.northover, craig.topper, spatel, arsenm Reviewed By: spatel Subscribers: hiraditya, MaskRay, wuzish, xbolva00, nikic, nemanjai, jvesely, wdng, nhaehnle, javed.absar, tpr, kristof.beyls, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62871 llvm-svn: 366955	2019-07-24 22:57:22 +00:00
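The fold above rests on a bit-level identity: `X & (C l>> Y)` and `(X << Y) & C` test the same pairs of bits of `X` and `C`, only aligned differently, so one is zero exactly when the other is (and likewise for the mirrored `<<`/`l>>` form). A minimal exhaustive check in Python, assuming a 6-bit width chosen only to keep the search small; the helper names are illustrative, not LLVM APIs:

```python
W = 6
MASK = (1 << W) - 1


def shl(v: int, s: int) -> int:
    """Logical shift left, truncated to W bits."""
    return (v << s) & MASK


def lshr(v: int, s: int) -> int:
    """Logical shift right of a W-bit value."""
    return (v & MASK) >> s


def fold_holds() -> bool:
    """Check both directions of the fold for every X, C and in-range shift amount Y."""
    for x in range(1 << W):
        for c in range(1 << W):
            for y in range(W):  # shift amounts >= W are poison, so they are not checked
                # (X & (C l>> Y)) == 0  <=>  ((X << Y) & C) == 0
                if ((x & lshr(c, y)) == 0) != ((shl(x, y) & c) == 0):
                    return False
                # (X & (C << Y)) == 0  <=>  ((X l>> Y) & C) == 0
                if ((x & shl(c, y)) == 0) != ((lshr(x, y) & c) == 0):
                    return False
    return True
```

Because the constant `C` is now the AND mask rather than a shifted value, the rewritten form can use an "and/test by immediate" instruction, which is the codegen win the message describes.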
Sanjay Patel	10dad95a75	[SDAG] convert (sub x, 1) to (add x, -1) in ctpop expansion; NFC We canonicalize to the add form, so create that directly for efficiency. llvm-svn: 366914	2019-07-24 15:43:50 +00:00
Simon Pilgrim	0e8359aec1	[TargetLowering] SimplifyMultipleUseDemandedBits - add VECTOR_SHUFFLE support. If all the demanded elts are from one operand and are inline, then we can use the operand directly. The changes are mainly from SSE41 targets, which have blendvpd but not cmpgtq, allowing the v2i64 comparison to be simplified as we only need the signbit from alternate v4i32 elements. llvm-svn: 366817	2019-07-23 15:35:55 +00:00
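The shuffle rule can be stated per lane: a demanded lane may bypass the shuffle only if its mask entry selects the same lane index from one fixed operand. A sketch in Python, assuming the usual shuffle-mask convention (indices `0..n-1` pick from operand 0, `n..2n-1` from operand 1, `-1` is undef); the function name is hypothetical:

```python
from typing import Optional, Sequence


def shuffle_passthrough_operand(mask: Sequence[int],
                                demanded: Sequence[bool]) -> Optional[int]:
    """Return 0 or 1 if every demanded lane reads its own lane from that operand.

    Returns None when the demanded lanes need both operands or are permuted.
    """
    n = len(mask)
    from_op0 = from_op1 = True
    for lane, (m, want) in enumerate(zip(mask, demanded)):
        if not want or m < 0:
            continue  # undemanded or undef lanes place no constraint
        from_op0 = from_op0 and m == lane
        from_op1 = from_op1 and m == lane + n
    if from_op0:
        return 0
    if from_op1:
        return 1
    return None


# Hypothetical v4i32 shuffle where only the odd lanes are demanded.
print(shuffle_passthrough_operand([0, 5, 2, 7], [False, True, False, True]))  # prints 1
```

In the v2i64-as-v4i32 case from the message, the odd lanes are the ones holding each 64-bit element's sign bit, so a mask like this one lets the demanded lanes come straight from a single operand.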
Simon Pilgrim	743d45ee25	[TargetLowering] Add SimplifyMultipleUseDemandedBits This patch introduces the DAG version of SimplifyMultipleUseDemandedBits, which attempts to peek through ops (mainly and/or/xor so far) that don't contribute to the demandedbits/elts of a node - which means we can do this even in cases where we have multiple uses of an op, which normally requires us to demand all bits/elts. The intention is to remove a similar function - SelectionDAG::GetDemandedBits - once SimplifyMultipleUseDemandedBits has matured. The InstCombine version of SimplifyMultipleUseDemandedBits can constant fold which I haven't added here yet, and so far I've only wired this up to some basic binops (and/or/xor/add/sub/mul) to demonstrate its use. We do see a couple of regressions that need to be addressed: AMDGPU unsigned dot product codegen retains an AND mask (for ZERO_EXTEND) that it previously removed (but otherwise the dotproduct codegen is a lot better). X86/AVX2 has poor handling of vector ANY_EXTEND/ANY_EXTEND_VECTOR_INREG - it prematurely gets converted to ZERO_EXTEND_VECTOR_INREG. The code owners have confirmed it's OK for these cases to be fixed up in future patches. Differential Revision: https://reviews.llvm.org/D63281 llvm-svn: 366799	2019-07-23 12:39:08 +00:00
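The core idea can be shown on a single binop with a constant operand: if the constant cannot change any demanded bit, a user that only reads those bits can take `x` directly, even when `(op x, c)` has other users that still need it. A Python sketch restricted to constant operands (an assumption made for brevity; the DAG version reasons about known bits of arbitrary operands), with hypothetical names:

```python
import operator

OPS = {"and": operator.and_, "or": operator.or_, "xor": operator.xor}


def can_bypass(opcode: str, c: int, demanded: int) -> bool:
    """True if (x <opcode> c) may be replaced by x when only `demanded` bits are read."""
    if opcode == "and":
        return (c & demanded) == demanded  # the mask keeps every demanded bit
    if opcode in ("or", "xor"):
        return (c & demanded) == 0  # the constant touches no demanded bit
    return False


def bypass_is_sound(width: int = 4) -> bool:
    """Exhaustively confirm that every bypass allowed above preserves the demanded bits."""
    for opcode, fn in OPS.items():
        for c in range(1 << width):
            for demanded in range(1 << width):
                if not can_bypass(opcode, c, demanded):
                    continue
                for x in range(1 << width):
                    if (fn(x, c) & demanded) != (x & demanded):
                        return False
    return True
```

Nothing is rewritten in place here: the original node survives for its other users, and only the caller that asked about its demanded bits gets the simpler value.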
Roman Lebedev	cd9b19484b	[Codegen][SelectionDAG] X u% C == 0 fold: non-splat vector improvements Summary: Four things here: 1. Generalize the fold to handle non-splat divisors. Reasonably trivial. 2. Unban power-of-two divisors. I don't see any reason why they should be illegal. * There is no ban in Hacker's Delight * I think the ban came from the same bug that caused the miscompile in the base patch - in `floor((2^W - 1) / D)` we were dividing by `D0` instead of `D`, and we were ensuring that `D0` is not `1`, which made sense. 3. Unban `1` divisors. I no longer believe Hacker's Delight actually says that the fold is invalid for `D = 0`. Further considerations: * We know that * `(X u% 1) == 0` can be constant-folded to `1`, * `(X u% 1) != 0` can be constant-folded to `0`, * Also, we know that * `X u<= -1` can be constant-folded to `1`, * `X u> -1` can be constant-folded to `0`, * https://godbolt.org/z/7jnZJX https://rise4fun.com/Alive/oF6p * We know we will end up with the following: `(setule/setugt (rotr (mul N, P), K), Q)` * Therefore, for given new DAG nodes and comparison predicates (`ule`/`ugt`), we will still produce the correct answer if: `Q` is an all-ones constant; and both `P` and `K` are anything other than `undef`. * The fold will indeed produce `Q = all-ones`. 4. Try to re-splat the `P` and `K` vectors - we don't care about their values for the lanes where divisor was `1`. Reviewers: RKSimon, hermord, craig.topper, spatel, xbolva00 Reviewed By: RKSimon Subscribers: hiraditya, javed.absar, dexonsmith, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63963 llvm-svn: 366637	2019-07-20 16:33:15 +00:00
Sanjay Patel	138328e45c	[SDAG] commute setcc operands to match a subtract If we have: R = sub X, Y P = cmp Y, X ...then flipping the operands in the compare instruction can allow using a subtract that sets compare flags. Motivated by diffs in D58875 - not sure if this changes anything there, but this seems like a good thing independent of that. There's a more involved version of this transform already in IR (in instcombine although that seems misplaced to me) - see "swapMayExposeCSEOpportunities()". Differential Revision: https://reviews.llvm.org/D63958 llvm-svn: 365711	2019-07-10 23:23:54 +00:00
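Commuting a compare means swapping its operands and replacing the predicate with its mirror image (`lt` with `gt`, `le` with `ge`; `eq`/`ne` are symmetric), so that `cmp Y, X` becomes `cmp X, Y` with the same operand order as `sub X, Y`. A Python sketch with illustrative signed predicate names (unsigned predicates mirror the same way); the helpers are hypothetical:

```python
import operator

# Predicate that gives the same answer once the two operands are swapped.
SWAPPED = {"eq": "eq", "ne": "ne", "lt": "gt", "gt": "lt", "le": "ge", "ge": "le"}
EVAL = {"eq": operator.eq, "ne": operator.ne, "lt": operator.lt,
        "gt": operator.gt, "le": operator.le, "ge": operator.ge}


def commute_setcc(lhs, rhs, pred):
    """Rewrite (setcc lhs, rhs, pred) as the equivalent (setcc rhs, lhs, pred')."""
    return rhs, lhs, SWAPPED[pred]


def match_sub_operands(sub_ops, cmp_ops, pred):
    """If the compare reads the subtract's operands in reverse order, commute it."""
    if cmp_ops == (sub_ops[1], sub_ops[0]):
        lhs, rhs, new_pred = commute_setcc(cmp_ops[0], cmp_ops[1], pred)
        return (lhs, rhs), new_pred
    return cmp_ops, pred
```

After the swap, `sub X, Y` and `cmp X, Y` compute the same difference, which is what lets a flag-setting subtract serve both.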
Nick Desaulniers	8728e45706	[TargetLowering] support BlockAddress as "i" inline asm constraint Summary: This allows passing address of labels to inline assembly "i" input constraints. Fixes pr/42502. Reviewers: ostannard Reviewed By: ostannard Subscribers: void, echristo, nathanchance, ostannard, javed.absar, hiraditya, llvm-commits, srhines Tags: #llvm Differential Revision: https://reviews.llvm.org/D64167 llvm-svn: 365664	2019-07-10 17:08:25 +00:00
Simon Pilgrim	9285bf0fb9	[TargetLowering] SimplifyDemandedBits - just call computeKnownBits for BUILD_VECTOR cases. Don't do this locally, computeKnownBits does this better (and can handle non-constant cases as well). A next step would be to actually simplify non-constant elements - building on what we already do in SimplifyDemandedVectorElts. llvm-svn: 365309	2019-07-08 11:00:39 +00:00
Roman Lebedev	7c8ee375d8	[NFC][TargetLowering] Some preparatory cleanups around 'prepareUREMEqFold()' from D63963 llvm-svn: 364921	2019-07-02 13:21:23 +00:00
Benjamin Kramer	ed13fef477	[SelectionDAG] Do minnum->minimum at legalization time instead of building time The SDAGBuilder behavior stems from the days when we didn't have fast math flags available in SDAG. We do now and doing the transformation in the legalizer has the advantage that it also works for vector types. llvm-svn: 364743	2019-07-01 11:00:23 +00:00
Roman Lebedev	29d05c005f	[CodeGen] [SelectionDAG] More efficient code for X % C == 0 (UREM case) (try 3) Summary: I'm submitting a new revision since i don't understand how to reclaim/reopen/take over the existing one, D50222. There is no such action in "Add Action" menu... This implements an optimization described in Hacker's Delight 10-17: when `C` is constant, the result of `X % C == 0` can be computed more cheaply without actually calculating the remainder. The motivation is discussed here: https://bugs.llvm.org/show_bug.cgi?id=35479. This is a recommit, the original commit rL364563 was reverted in rL364568 because test-suite detected miscompile - the new comparison constant 'Q' was being computed incorrectly (we divided by `D0` instead of `D`). Original patch D50222 by @hermord (Dmytro Shynkevych) Notes: - In principle, it's possible to also handle the `X % C1 == C2` case, as discussed on bugzilla. This seems to require an extra branch on overflow, so I refrained from implementing this for now. - An explicit check for when the `REM` can be reduced to just its LHS is included: the `X % C` == 0 optimization breaks `test1` in `test/CodeGen/X86/jump_sign.ll` otherwise. I hadn't managed to find a better way to not generate worse output in this case. - The `test/CodeGen/X86/jump_sign.ll` regresses, and is being fixed by a followup patch D63390. Reviewers: RKSimon, craig.topper, spatel, hermord, xbolva00 Reviewed By: RKSimon, xbolva00 Subscribers: dexonsmith, kristina, xbolva00, javed.absar, llvm-commits, hermord Tags: #llvm Differential Revision: https://reviews.llvm.org/D63391 llvm-svn: 364600	2019-06-27 21:52:10 +00:00
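The Hacker's Delight trick replaces the division with three cheap steps: multiply by the modular inverse `P` of the odd part `D0` of the divisor, rotate right by `K` (the number of trailing zeros of the divisor), and do an unsigned compare against `Q = floor((2^W - 1) / D)`. The miscompile mentioned in this recommit came from computing that bound with `D0` instead of `D`. A scalar Python sketch, assuming an 8-bit width so it can be checked exhaustively; the function names are illustrative and `pow(d0, -1, m)` needs Python 3.8+:

```python
W = 8
MASK = (1 << W) - 1


def rotr(v: int, k: int) -> int:
    """Rotate a W-bit value right by k, for 0 <= k < W."""
    return ((v >> k) | (v << (W - k))) & MASK


def urem_eq_zero_fold(x: int, d: int) -> bool:
    """Decide x % d == 0 without dividing x, for 0 < d < 2**W."""
    k = (d & -d).bit_length() - 1  # K: trailing zero count of d
    d0 = d >> k                    # D0: odd part of d
    p = pow(d0, -1, 1 << W)        # P: inverse of D0 modulo 2**W
    q = MASK // d                  # Q: floor((2**W - 1) / D), using D, not D0
    return rotr((x * p) & MASK, k) <= q


print(urem_eq_zero_fold(252, 6), urem_eq_zero_fold(255, 6))  # prints True False
```

Only the multiply, rotate and compare depend on `x`; `P`, `K` and `Q` are constants computed once per divisor, which is why the fold only applies when `C` is a constant.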
Roman Lebedev	0a2b7b79fa	Revert "[CodeGen] [SelectionDAG] More efficient code for X % C == 0 (UREM case) (try 2)" Appears to break test-suite on http://lab.llvm.org:8011/builders/clang-cmake-x86_64-sde-avx512-linux/builds/23790 FAIL: burg.execution_time FAIL: spiff.execution_time FAIL: employ.execution_time FAIL: llu.execution_time FAIL: gramschmidt.execution_time FAIL: fdtd-apml.execution_time This reverts commit r364563. llvm-svn: 364568	2019-06-27 17:22:31 +00:00
Roman Lebedev	0627b09863	[CodeGen] [SelectionDAG] More efficient code for X % C == 0 (UREM case) (try 2) Summary: I'm submitting a new revision since i don't understand how to reclaim/reopen/take over the existing one, D50222. There is no such action in "Add Action" menu... Original patch D50222 by @hermord (Dmytro Shynkevych) This implements an optimization described in Hacker's Delight 10-17: when `C` is constant, the result of `X % C == 0` can be computed more cheaply without actually calculating the remainder. The motivation is discussed here: https://bugs.llvm.org/show_bug.cgi?id=35479. Original patch author: @hermord (Dmytro Shynkevych)! Notes: - In principle, it's possible to also handle the `X % C1 == C2` case, as discussed on bugzilla. This seems to require an extra branch on overflow, so I refrained from implementing this for now. - An explicit check for when the `REM` can be reduced to just its LHS is included: the `X % C` == 0 optimization breaks `test1` in `test/CodeGen/X86/jump_sign.ll` otherwise. I hadn't managed to find a better way to not generate worse output in this case. - The `test/CodeGen/X86/jump_sign.ll` regresses, and is being fixed by a followup patch D63390. Reviewers: RKSimon, craig.topper, spatel, hermord, xbolva00 Reviewed By: RKSimon, xbolva00 Subscribers: xbolva00, javed.absar, llvm-commits, hermord Tags: #llvm Differential Revision: https://reviews.llvm.org/D63391 llvm-svn: 364563	2019-06-27 16:45:42 +00:00
Simon Pilgrim	83e1a1e79b	[TargetLowering] SimplifyDemandedVectorElts - add shift/rotate support. llvm-svn: 364548	2019-06-27 14:25:54 +00:00
Simon Pilgrim	c692a8dc51	[TargetLowering] SimplifyDemandedBits - use DemandedElts to better identify partial splat shift amounts llvm-svn: 364541	2019-06-27 13:48:43 +00:00
Sanjay Patel	685c5cbc65	[SDAG] expand ctpop != 1 Change the generic ctpop expansion to more efficiently handle a check for not-a-power-of-two value: (ctpop x) != 1 --> (x == 0) || ((x & x-1) != 0) This is the inverted predicate sibling pattern that was added with: D63004 This should have been done before I changed IR canonicalization to favor this form with: rL364246 ...so if this requires revert/changing, the earlier commit may also need to be modified. llvm-svn: 364319	2019-06-25 14:46:52 +00:00
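Both directions of the power-of-two check rely on `x & (x - 1)` clearing the lowest set bit, so it is zero exactly when `x` has at most one bit set; the extra `x != 0` or `x == 0` term separates out the zero case. A Python sketch over 16-bit values, using the canonical `add x, -1` form for the decrement (the helper names are hypothetical):

```python
W = 16
MASK = (1 << W) - 1


def dec(x: int) -> int:
    """(add x, -1) in W-bit wraparound arithmetic, i.e. the canonical (sub x, 1)."""
    return (x + MASK) & MASK


def ctpop_eq_1(x: int) -> bool:
    # (ctpop x) == 1  -->  (x != 0) && ((x & (x - 1)) == 0)
    return x != 0 and (x & dec(x)) == 0


def ctpop_ne_1(x: int) -> bool:
    # (ctpop x) != 1  -->  (x == 0) || ((x & (x - 1)) != 0)
    return x == 0 or (x & dec(x)) != 0
```

The two expansions are exact logical negations of each other, which is why the `!= 1` form is described as the inverted-predicate sibling of the `== 1` form.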
Simon Pilgrim	1a18bb6f25	[TargetLowering] SimplifyDemandedBits - add ANY_EXTEND_VECTOR_INREG support Add 'lowest' demanded elt -> bitcast fold to all *_EXTEND_VECTOR_INREG cases. Reapplies rL363856. llvm-svn: 364311	2019-06-25 13:25:57 +00:00
Simon Pilgrim	36953ce769	[TargetLowering] SimplifyDemandedBits ZERO_EXTEND_VECTOR_INREG -> ANY_EXTEND_VECTOR_INREG Simplify ZERO_EXTEND_VECTOR_INREG if the extended bits are not required. Matches what we already do for ZERO_EXTEND. Reapplies rL363850 but now with legality checks added at rL364290 llvm-svn: 364303	2019-06-25 12:57:43 +00:00
Sanjay Patel	e4ef62291b	[SDAG] improve expansion of ctpop+setcc This should not cause any visible change in output, but it's more efficient because we were producing non-canonical 'sub x, 1' and 'setcc ugt x, 0'. As mentioned in the TODO, we should also be handling the inverse predicate. llvm-svn: 364302	2019-06-25 12:49:35 +00:00
Simon Pilgrim	69fc111184	[TargetLowering] SimplifyDemandedBits SIGN_EXTEND_VECTOR_INREG -> ANY/ZERO_EXTEND_VECTOR_INREG Simplify SIGN_EXTEND_VECTOR_INREG if the extended bits are not required/known zero. Matches what we already do for SIGN_EXTEND. Reapplies rL363802 but now with legality checks added at rL364290 llvm-svn: 364299	2019-06-25 12:19:12 +00:00
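These vector-in-register changes extend two scalar identities: when only the low (source-width) bits of an extension are demanded, zero-, sign- and any-extension are interchangeable; and when the source's sign bit is known zero, sign-extension equals zero-extension. A per-lane Python sketch, assuming 8-bit lanes extended to 16 bits and modelling ANY_EXTEND's unspecified high bits as an arbitrary value (all names are illustrative):

```python
W_IN, W_OUT = 8, 16
LOW = (1 << W_IN) - 1            # bits copied from the source lane
HIGH = ((1 << W_OUT) - 1) ^ LOW  # bits produced by the extension


def zext(x: int) -> int:
    return x


def sext(x: int) -> int:
    return (x | HIGH) if x >> (W_IN - 1) else x


def aext(x: int, garbage: int) -> int:
    # ANY_EXTEND leaves the high bits unspecified; model them as `garbage`.
    return x | (garbage & HIGH)


def identities_hold() -> bool:
    for x in range(1 << W_IN):
        for garbage in (0, HIGH, 0x5A00):
            # Only low bits demanded: the extension kind does not matter.
            if not (zext(x) & LOW) == (sext(x) & LOW) == (aext(x, garbage) & LOW):
                return False
        # Sign bit known zero: SIGN_EXTEND may become ZERO_EXTEND.
        if x >> (W_IN - 1) == 0 and sext(x) != zext(x):
            return False
    return True
```

The `*_EXTEND_VECTOR_INREG` nodes apply the same per-lane reasoning to the low elements of a wider input vector.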
Simon Pilgrim	49b3778e32	[TargetLowering] SimplifyDemandedBits - legal checks for SIGN/ZERO_EXTEND -> ZERO/ANY_EXTEND As part of the fix for rL364264 + rL364272 - limit the *_EXTEND conversion to !TLO.LegalOperations || isOperationLegal cases. We'll improve X86 legality in future commits. llvm-svn: 364290	2019-06-25 10:51:15 +00:00
Roman Lebedev	cdd43eac4f	[Codegen] TargetLowering::SimplifySetCC(): omit urem when possible Summary: This addresses the regression that is being exposed by D50222 in `test/CodeGen/X86/jump_sign.ll` The missing fold, at least partially, looks trivial: https://rise4fun.com/Alive/Zsln i.e. if we are comparing with zero, and comparing the `urem`-by-non-power-of-two, and the `urem` is of something that may at most have a single bit set (or no bits set at all), the `urem` is not needed. Reviewers: RKSimon, craig.topper, xbolva00, spatel Reviewed By: xbolva00, spatel Subscribers: xbolva00, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63390 llvm-svn: 364286	2019-06-25 10:01:42 +00:00
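The omitted `urem` follows from divisibility: if `X` is zero or a single power of two, its only divisors are powers of two, so for a constant `C` that is not a power of two, `(X u% C) == 0` holds exactly when `X == 0`. An exhaustive 8-bit Python check of that reasoning (names are illustrative):

```python
W = 8


def is_pow2(c: int) -> bool:
    return c != 0 and (c & (c - 1)) == 0


def urem_is_redundant() -> bool:
    """Check (X u% C) == 0  <=>  X == 0 for X with at most one bit set, C not a power of 2."""
    single_bit_values = [0] + [1 << i for i in range(W)]
    for c in range(1, 1 << W):
        if is_pow2(c):
            continue  # excluded by the fold's precondition
        for x in single_bit_values:
            if (x % c == 0) != (x == 0):
                return False
    return True
```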
Craig Topper	079924b0b7	Revert r363802, r363850, and r363856 "[TargetLowering] SimplifyDemandedBits..." This reverts the following patches. "[TargetLowering] SimplifyDemandedBits SIGN_EXTEND_VECTOR_INREG -> ANY/ZERO_EXTEND_VECTOR_INREG" "[TargetLowering] SimplifyDemandedBits ZERO_EXTEND_VECTOR_INREG -> ANY_EXTEND_VECTOR_INREG" "[TargetLowering] SimplifyDemandedBits - add ANY_EXTEND_VECTOR_INREG support" We can end up with an any_extend_vector_inreg with a 256 bit result type and a 128 bit input type. This is allowed by the ISD opcode, but the generic operation legalizer is only able to expand cases where the total vector width is the same. The X86 backend creates these mismatched cases for zext_vec_inreg/sext_vec_inreg. The SimplifyDemandedBits changes are allowing those nodes to become aext_vec_inreg. For the zext/sext cases, the X86 backend has Custom handling and never lets them get to the generic legalizer. We need to do the same for aext_vec_inreg. llvm-svn: 364264	2019-06-25 01:32:42 +00:00
Simon Pilgrim	f05369768c	[TargetLowering] SimplifyDemandedBits - add ANY_EXTEND_VECTOR_INREG support Move 'lowest' demanded elt -> bitcast fold out of ZERO_EXTEND_VECTOR_INREG into ANY_EXTEND_VECTOR_INREG case. llvm-svn: 363856	2019-06-19 18:34:58 +00:00
Simon Pilgrim	6016fb726c	[TargetLowering] SimplifyDemandedBits ZERO_EXTEND_VECTOR_INREG -> ANY_EXTEND_VECTOR_INREG Simplify ZERO_EXTEND_VECTOR_INREG if the extended bits are not required. Matches what we already do for ZERO_EXTEND. llvm-svn: 363850	2019-06-19 18:00:24 +00:00
Simon Pilgrim	c3994f77cb	[TargetLowering] SimplifyDemandedBits SIGN_EXTEND_VECTOR_INREG -> ANY/ZERO_EXTEND_VECTOR_INREG Simplify SIGN_EXTEND_VECTOR_INREG if the extended bits are not required/known zero. Matches what we already do for SIGN_EXTEND. llvm-svn: 363802	2019-06-19 13:58:02 +00:00
Simon Pilgrim	5bef886cd8	[TargetLowering] SimplifyDemandedBits - Cleanup ANY_EXTEND handling Match SIGN_EXTEND + ZERO_EXTEND handling - will be adding ANY_EXTEND_VECTOR_INREG support in a future patch. llvm-svn: 363716	2019-06-18 18:22:30 +00:00
Simon Pilgrim	032b54f8e8	[TargetLowering] SimplifyDemandedBits - Merge ZERO_EXTEND+ZERO_EXTEND_VECTOR_INREG handling Other than adding consistent demanded elts handling which was a trivial addition, the other differences in functionality will be added in later patches. llvm-svn: 363713	2019-06-18 18:08:30 +00:00
Simon Pilgrim	b6e7108dcd	[TargetLowering] SimplifyDemandedBits - Merge SIGN_EXTEND+SIGN_EXTEND_VECTOR_INREG handling Other than adding consistent demanded elts handling which was a trivial addition, the other differences in functionality will be added in later patches. llvm-svn: 363710	2019-06-18 17:57:53 +00:00
Simon Pilgrim	9aa25be149	[TargetLowering] SimplifyDemandedVectorElts - support MUL and ANY_EXTEND_VECTOR_INREG Also fold ANY_EXTEND_VECTOR_INREG -> BITCAST if we only need the bottom element. Fixes temporary regression introduced in rL363693. llvm-svn: 363694	2019-06-18 15:49:35 +00:00
Simon Pilgrim	4e0648a541	[TargetLowering] Add MachineMemOperand::Flags to allowsMemoryAccess tests (PR42123) As discussed on D62910, we need to check whether particular types of memory access are allowed, not just their alignment/address-space. This NFC patch adds a MachineMemOperand::Flags argument to allowsMemoryAccess and allowsMisalignedMemoryAccesses, and wires up calls to pass the relevant flags to them. If people are happy with this approach I can then update X86TargetLowering::allowsMisalignedMemoryAccesses to handle misaligned NT load/stores. Differential Revision: https://reviews.llvm.org/D63075 llvm-svn: 363179	2019-06-12 17:14:03 +00:00
David Bolvansky	dcf5e6abdf	[TargetLowering] Simplify (ctpop x) == 1 Reviewers: craig.topper, spatel, RKSimon, bkramer Reviewed By: spatel Subscribers: javed.absar, lebedev.ri, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63004 llvm-svn: 362912	2019-06-09 18:18:57 +00:00
Tim Northover	607c8a9d14	IR: make getParamByValType Just Work. NFC. Most parts of LLVM don't care whether the byval type is derived from an explicit Attribute or from the parameter's pointee type, so it makes sense for the main access function to just return the right value. The very few users who do care (only BitcodeReader so far) can find out how it's specified by accessing the Attribute directly. llvm-svn: 362642	2019-06-05 20:37:47 +00:00
Simon Pilgrim	5a81af547c	[TargetLowering] SimplifyDemandedBits - pull out shift value type. NFCI. Will be used more in an upcoming patch. llvm-svn: 362595	2019-06-05 10:59:04 +00:00
Simon Pilgrim	88522ce388	[TargetLowering] SimplifyDemandedBits - don't use OriginalDemanded variables in analysis. These might have been replaced in multiple use cases. llvm-svn: 362322	2019-06-02 10:12:55 +00:00
Simon Pilgrim	30a6caa3e7	[TargetLowering] SimplifyDemandedVectorElts - use same arg names as SimplifyDemandedBits. NFCI. Helps with debugging as we recurse between them. llvm-svn: 362321	2019-06-02 10:03:56 +00:00
Tim Northover	b7141207a4	Reapply: IR: add optional type to 'byval' function parameters When we switch to opaque pointer types we will need some way to describe how many bytes a 'byval' parameter should occupy on the stack. This adds a (for now) optional extra type parameter. If present, the type must match the pointee type of the argument. The original commit did not remap byval types when linking modules, which broke LTO. This version fixes that. Note to front-end maintainers: if this causes test failures, it's probably because the "byval" attribute is printed after attributes without any parameter after this change. llvm-svn: 362128	2019-05-30 18:48:23 +00:00
Tim Northover	71ee3d0237	Revert "IR: add optional type to 'byval' function parameters" The IRLinker doesn't delve into the new byval attribute when mapping types, and this breaks LTO. llvm-svn: 362029	2019-05-29 20:46:38 +00:00
Tim Northover	6e07f16fae	IR: add optional type to 'byval' function parameters When we switch to opaque pointer types we will need some way to describe how many bytes a 'byval' parameter should occupy on the stack. This adds a (for now) optional extra type parameter. If present, the type must match the pointee type of the argument. Note to front-end maintainers: if this causes test failures, it's probably because the "byval" attribute is printed after attributes without any parameter after this change. llvm-svn: 362012	2019-05-29 19:12:48 +00:00
Simon Pilgrim	95b8d9bbf8	[SelectionDAG] computeKnownBits - support constant pool values from target This patch adds the overridable TargetLowering::getTargetConstantFromLoad function which allows targets to return any constant value loaded by a LoadSDNode node - only X86 makes use of this so far but everything should be in place for other targets. computeKnownBits then uses this function to improve codegen, notably vector code after legalization. A future commit will do the same for ComputeNumSignBits but computeKnownBits sees the bigger benefit. This required a couple of fixes: * SimplifyDemandedBits must early-out for getTargetConstantFromLoad cases to prevent infinite loops of constant regeneration (similar to what we already do for BUILD_VECTOR). * Fix a DAGCombiner::visitTRUNCATE issue as we had trunc(shl(v8i32),v8i16) <-> shl(trunc(v8i16),v8i32) infinite loops after legalization on AVX512 targets. Differential Revision: https://reviews.llvm.org/D61887 llvm-svn: 361620	2019-05-24 10:03:11 +00:00
Kees Cook	c2187c20a4	[TargetLowering] Extend bool args to inline-asm according to getBooleanType Summary: This extends Krzysztof Parzyszek's X86-specific solution (https://reviews.llvm.org/D60208) to the generic code pointed out by James Y Knight. Reviewers: kparzysz, craig.topper, nickdesaulniers Subscribers: efriedma, sdardis, nemanjai, javed.absar, eraman, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, llvm-commits, srhines, void, nickdesaulniers, jyknight Tags: #llvm Differential Revision: https://reviews.llvm.org/D60224 llvm-svn: 361404	2019-05-22 16:16:15 +00:00
Kees Cook	a7a687e500	[TargetLowering] Add blank line (test commit) llvm-svn: 361403	2019-05-22 16:02:13 +00:00
Leonard Chan	0bada7ce6c	[Intrinsic] Signed Fixed Point Saturation Multiplication Intrinsic Add an intrinsic that takes 2 signed integers with the scale of them provided as the third argument and performs fixed point multiplication on them. The result is saturated and clamped between the largest and smallest representable values of the first 2 operands. This is a part of implementing fixed point arithmetic in clang where some of the more complex operations will be implemented as intrinsics. Differential Revision: https://reviews.llvm.org/D55720 llvm-svn: 361289	2019-05-21 19:17:19 +00:00
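In scale-`s` fixed point, a value `v` is stored as the integer `v * 2^s`, so the product of two such integers carries `2s` fractional bits and must be shifted right by `s` before being clamped to the signed range of the operand width. A Python model of that arithmetic; it rounds toward negative infinity (Python's `>>`), which is an assumption of this sketch rather than a statement about the intrinsic's required rounding:

```python
def smul_fix_sat(a: int, b: int, scale: int, width: int) -> int:
    """Saturating signed fixed-point multiply of two `width`-bit values with `scale` fraction bits."""
    lo = -(1 << (width - 1))
    hi = (1 << (width - 1)) - 1
    if not (lo <= a <= hi and lo <= b <= hi):
        raise ValueError("operands must fit in the signed width")
    product = (a * b) >> scale  # exact product, then drop `scale` fraction bits
    return max(lo, min(hi, product))  # saturate instead of wrapping


# Q4.4 examples (width 8, scale 4): 1.5 * 2.0 = 3.0, stored as 48 / 16
print(smul_fix_sat(24, 32, 4, 8))    # 48
print(smul_fix_sat(127, 127, 4, 8))  # 127 (saturated)
```

Computing the full-width product before shifting matters: shifting each operand first would throw away fractional bits that still contribute to the result.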
Dylan McKay	e967308da4	Add TargetLoweringInfo hook for explicitly setting the ABI calling convention endianness Summary: The endianness used in the calling convention does not always match the endianness of the target on all architectures, namely AVR. When an argument is too large to be legalised by the architecture and is split for the ABI, a new hook TargetLoweringInfo::shouldSplitFunctionArgumentsAsLittleEndian is queried to find the endianness that function arguments must be laid out in. This approach was recommended by Eli Friedman. Originally reported in https://github.com/avr-rust/rust/issues/129. Patch by Carl Peto. Reviewers: bogner, t.p.northover, RKSimon, niravd, efriedma Reviewed By: efriedma Subscribers: JDevlieghere, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62003 llvm-svn: 361222	2019-05-21 06:38:02 +00:00
Nikita Popov	9060b6df97	[SDAG] Vector op legalization for overflow ops Fixes issue reported by aemerson on D57348. Vector op legalization support is added for uaddo, usubo, saddo and ssubo (umulo and smulo were already supported). As usual, by extracting TargetLowering methods and calling them from vector op legalization. Vector op legalization doesn't really deal with multiple result nodes, so I'm explicitly performing a recursive legalization call on the result value that is not being legalized. There are some existing test changes because expansion happens earlier, so we don't get a DAG combiner run in between anymore. Differential Revision: https://reviews.llvm.org/D61692 llvm-svn: 361166	2019-05-20 16:09:22 +00:00
Sanjay Patel	99d6420a82	[SDAG] fix unused variable warning and unneeded indirection; NFC llvm-svn: 360640	2019-05-14 00:57:31 +00:00
Sanjay Patel	3a13d970aa	[SDAG, x86] allow targets to override test for binop opcodes This follows the pattern of the existing isCommutativeBinOp(). x86 shows improvements from vector narrowing for the min/max opcodes. llvm-svn: 360639	2019-05-14 00:39:40 +00:00
Nick Desaulniers	c33f754e74	[TargetLowering] Handle multi depth GEPs w/ inline asm constraints Summary: X86TargetLowering::LowerAsmOperandForConstraint had better support than TargetLowering::LowerAsmOperandForConstraint for arbitrary depth getelementpointers for "i", "n", and "s" extended inline assembly constraints. Hoist its support from the derived class into the base class. Link: https://github.com/ClangBuiltLinux/linux/issues/469 Reviewers: echristo, t.p.northover Reviewed By: t.p.northover Subscribers: t.p.northover, E5ten, kees, jyknight, nemanjai, javed.absar, eraman, hiraditya, jsji, llvm-commits, void, craig.topper, nathanchance, srhines Tags: #llvm Differential Revision: https://reviews.llvm.org/D61560 llvm-svn: 360604	2019-05-13 17:27:44 +00:00
Simon Pilgrim	d3cedee3c6	[TargetLowering] Add SimplifyDemandedBits support for ZERO_EXTEND_VECTOR_INREG More work for PR39709. llvm-svn: 360592	2019-05-13 15:51:26 +00:00
Simon Pilgrim	d845bc3d0c	TargetLowering::SimplifyDemandedBits - early-out for UNDEF ops. NFCI. llvm-svn: 360579	2019-05-13 12:44:03 +00:00
Craig Topper	61e556d2bd	Recommit r358887 "[TargetLowering][AMDGPU][X86] Improve SimplifyDemandedBits bitcast handling" I've included a new fix in X86RegisterInfo to prevent PR41619 without reintroducing r359392. We might be able to improve that in the base class implementation of shouldRewriteCopySrc somehow. But this hopefully enables forward progress on SimplifyDemandedBits improvements for now. Original commit message: This patch adds support for BigBitWidth -> SmallBitWidth bitcasts, splitting the DemandedBits/Elts accordingly. The AMDGPU backend needed an extra (srl (and x, c1 << c2), c2) -> (and (srl x, c2), c1) combine to encourage BFE creation, I investigated putting this in DAGCombine but it caused a lot of noise on other targets - some improvements, some regressions. The X86 changes are all definite wins. llvm-svn: 360552	2019-05-13 04:03:35 +00:00
Simon Pilgrim	605a840747	[DAG] Add SimplifyDemandedBits support for BITREVERSE Pulled out of D58017 while I continue to investigate the BSWAP regression on PPC llvm-svn: 360534	2019-05-11 20:56:05 +00:00
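For BITREVERSE the demanded-bits mapping is the reversal itself: bit `i` of the result comes from bit `W-1-i` of the operand, so the operand's demanded mask is the reversed result mask. A Python sketch with hypothetical helpers, checked exhaustively for 8-bit values:

```python
W = 8


def bitreverse(v: int, width: int = W) -> int:
    out = 0
    for i in range(width):
        if (v >> i) & 1:
            out |= 1 << (width - 1 - i)
    return out


def operand_demanded_bits(result_demanded: int) -> int:
    """Bits of the BITREVERSE operand that feed the demanded result bits."""
    return bitreverse(result_demanded)


def mapping_is_exact() -> bool:
    for x in range(1 << W):
        for demanded in range(1 << W):
            src_demanded = operand_demanded_bits(demanded)
            # Clearing undemanded operand bits must not change any demanded result bit.
            if (bitreverse(x) & demanded) != bitreverse(x & src_demanded):
                return False
    return True
```

The same reversal applies to known bits: known zeros and ones of the result are the operand's known zeros and ones, reversed.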
Craig Topper	55a71b575c	Revert r359392 and r358887 Reverts "[X86] Remove (V)MOV64toSDrr/m and (V)MOVDI2SSrr/m. Use 128-bit result MOVD/MOVQ and COPY_TO_REGCLASS instead" Reverts "[TargetLowering][AMDGPU][X86] Improve SimplifyDemandedBits bitcast handling" Eric Christopher and Jorge Gorbe Moya reported some issues with these patches to me off list. Removing the CodeGenOnly instructions has changed how fneg is handled during fast-isel with sse/sse2. We're now emitting fsub -0.0, x instead of moving to the integer domain (in a GPR), xoring the sign bit, and then moving back to xmm. This is because the fast isel table no longer contains an entry for (f32/f64 bitcast (i32/i64)) so the target independent fneg code fails. The use of fsub changes the behavior of nan with respect to -O2 codegen which will always use a pxor. NOTE: We still have a difference with double with -m32 since the move to GPR doesn't work there. I'll file a separate PR for that and add test cases. Since removing the CodeGenOnly instructions was fixing PR41619, I'm reverting r358887 which exposed that PR. Though I wouldn't be surprised if that bug can still be hit independent of that. This should hopefully get Google back to green. I'll work with Simon and other X86 folks to figure out how to move forward again. llvm-svn: 360066	2019-05-06 19:29:24 +00:00
Simon Pilgrim	308b5ec1ff	[TargetLowering] SimplifySetCC - remove repeated variable. NFCI. Also reduce scope of Temp variable. llvm-svn: 359911	2019-05-03 18:02:33 +00:00
Simon Pilgrim	bc876df3a5	[TargetLowering] ShrinkDemandedConstant - reduce scope of TLO.DAG variable. NFCI. Only ever used in one block llvm-svn: 359890	2019-05-03 14:38:24 +00:00
Simon Pilgrim	e798e3a346	[TargetLowering] expandUnalignedStore - cleanup EVT variables. NFCI. Avoid duplicated EVTs and rename Store/Load VTs to avoid -Wshadow warnings. llvm-svn: 359877	2019-05-03 12:55:25 +00:00
Sjoerd Meijer	0ed4619679	[TargetLowering] findOptimalMemOpLowering. NFCI. This was a local static function in SelectionDAG, which I've promoted to TargetLowering so that I can reuse it to estimate the cost of a memory operation in D59787. Differential Revision: https://reviews.llvm.org/D59766 llvm-svn: 359543	2019-04-30 10:09:15 +00:00
Simon Pilgrim	6276ce0142	[TargetLowering][AMDGPU][X86] Improve SimplifyDemandedBits bitcast handling This patch adds support for BigBitWidth -> SmallBitWidth bitcasts, splitting the DemandedBits/Elts accordingly. The AMDGPU backend needed an extra (srl (and x, c1 << c2), c2) -> (and (srl x, c2), c1) combine to encourage BFE creation. I investigated putting this in DAGCombine but it caused a lot of noise on other targets - some improvements, some regressions. The X86 changes are all definite wins. Differential Revision: https://reviews.llvm.org/D60462 llvm-svn: 358887	2019-04-22 14:04:35 +00:00
Craig Topper	3b1239d2a8	[TargetLowering][X86] Teach SimplifyDemandedBits to use ShrinkDemandedOp on ISD::SHL nodes. If the upper bits of the SHL result aren't used, we might be able to use a narrower shift. For example, on X86 this can turn a 64-bit shift into a 32-bit shift, enabling a smaller encoding. Differential Revision: https://reviews.llvm.org/D60358 llvm-svn: 358257	2019-04-12 06:49:28 +00:00
Simon Pilgrim	d7cc0ec581	[TargetLowering] SimplifyDemandedBits - add ISD::INSERT_SUBVECTOR support llvm-svn: 358019	2019-04-09 16:52:21 +00:00
Simon Pilgrim	55f79ef9fe	[TargetLowering] SimplifyDemandedBits - Remove GetDemandedSrcMask lambda. NFCI. An older version of this could return false but now that this always succeeds we can just inline and simplify it. llvm-svn: 357999	2019-04-09 12:29:26 +00:00
Simon Pilgrim	345eacd555	[TargetLowering] SimplifyDemandedBits - call SimplifyDemandedBits in bitcast handling When bitcasting from a source op to a larger bitwidth op, split the demanded bits and OR them on top of one another and demand those merged bits in the SimplifyDemandedBits call on the source op. llvm-svn: 357992	2019-04-09 10:27:59 +00:00
Simon Pilgrim	9f74df7d5b	[TargetLowering] SimplifyDemandedBits - use DemandedElts in bitcast handling Be more selective in the SimplifyDemandedBits -> SimplifyDemandedVectorElts bitcast call based on the demanded elts. llvm-svn: 357942	2019-04-08 20:59:38 +00:00
Simon Pilgrim	561ba38623	[DAG] Pull out ComputeNumSignBits call to make debugging easier. NFCI. llvm-svn: 357861	2019-04-07 11:49:33 +00:00
Simon Pilgrim	e24441aab0	[TargetLowering] Add SimplifyDemandedBits support for ISD::INSERT_VECTOR_ELT This helps us relax the extension of a lot of scalar elements before they are inserted into a vector. It exposes an issue in DAGCombiner::convertBuildVecZextToZext, as some/all of the zero-extensions may be relaxed to ANY_EXTEND, so we need to handle that case to avoid a couple of AVX2 VPMOVZX test regressions. Once this is in, it should be easier to fix a number of remaining failures to fold loads into VBROADCAST nodes. Differential Revision: https://reviews.llvm.org/D59484 llvm-svn: 356989	2019-03-26 12:32:01 +00:00
Simon Pilgrim	94e8f152c1	[TargetLowering] SimplifyDemandedBits trunc(srl(x, C1)) - early out for out of range C1. NFCI. llvm-svn: 356810	2019-03-22 20:53:49 +00:00
Sanjay Patel	6a6e808b69	[TargetLowering] improve the default expansion of uaddsat/usubsat This is a subset of what was proposed in: D59006 ...and may overlap with test changes from: D59174 ...but it seems like a good general optimization to turn selects into bitwise-logic when possible because we never know exactly what can happen at this stage of DAG combining depending on how the target has defined things. Differential Revision: https://reviews.llvm.org/D59066 llvm-svn: 356332	2019-03-17 14:57:40 +00:00
Simon Pilgrim	8fbe439345	[SelectionDAG] Add SimplifyDemandedBits handling for ISD::SCALAR_TO_VECTOR Fixes a lot of constant folding mismatches between i686 and x86_64 llvm-svn: 356273	2019-03-15 17:00:55 +00:00
Simon Pilgrim	360ce82db2	[DAG] Move integer setcc %x, %x folding into FoldSetCC First step towards PR40800 - I intend to move the float case in a separate future patch. I had to tweak the (overly reduced) thumb2 test and the x86 widening test change is annoying (no longer rematerializable) but we should address this separately. Differential Revision: https://reviews.llvm.org/D59244 llvm-svn: 356040	2019-03-13 11:08:57 +00:00
Nikita Popov	149bc099f6	[SDAG] Expand pow2 mulo using shifts Expand MULO with constant power of two operand into a shift. The overflow is checked with (x << shift) >> shift == x, where the right shift will be logical for umulo and arithmetic for smulo (with the exception of multiplications by signed_min). Differential Revision: https://reviews.llvm.org/D59041 llvm-svn: 355937	2019-03-12 16:57:25 +00:00
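The overflow check described in the last commit can be modeled on plain 32-bit integers. The sketch below is not LLVM code: `umuloPow2`, `smuloPow2` and the result structs are hypothetical names that only mirror the semantics of multiplying by `1 << Shift` and then shifting back (logical for the unsigned case, arithmetic for the signed case). It assumes `Shift < 32` for the unsigned helper and `Shift < 31` for the signed one, since `1 << 31` is signed_min, which the commit treats as a special case.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of UMULO(X, 1 << Shift) on i32; assumes Shift < 32.
struct UMulResult { uint32_t Value; bool Overflow; };

UMulResult umuloPow2(uint32_t X, unsigned Shift) {
  uint32_t Res = X << Shift;               // the multiply becomes a shift
  bool Overflow = (Res >> Shift) != X;     // logical shift back; lost bits => overflow
  return {Res, Overflow};
}

// Hypothetical model of SMULO(X, 1 << Shift) on i32; assumes Shift < 31
// (multiplying by signed_min is the excluded case).
struct SMulResult { int32_t Value; bool Overflow; };

SMulResult smuloPow2(int32_t X, unsigned Shift) {
  // Shift in the unsigned domain to avoid signed-overflow UB; the conversion
  // back wraps modulo 2^32 (guaranteed in C++20, the usual behavior before).
  int32_t Res = static_cast<int32_t>(static_cast<uint32_t>(X) << Shift);
  // Arithmetic shift back (guaranteed for negative values in C++20).
  bool Overflow = (Res >> Shift) != X;
  return {Res, Overflow};
}
```

For example, `umuloPow2(0x80000000u, 1)` reports overflow because the shifted-out top bit cannot be recovered, while `smuloPow2(-0x40000000, 1)` does not, since -2^31 still fits in i32.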

1231 Commits