llvm-project

Commit Graph

Author	SHA1	Message	Date
Sanjay Patel	bf5748a1af	[x86] fold vector (X > -1) & Y to shift+andn and (pcmpgt X, -1), Y --> pandn (vsrai X, BitWidth-1), Y This avoids the -1 constant vector in favor of an arithmetic shift instruction if it exists (the ISA is still not complete after all these years...). We catch this pattern late in combining by matching PCMPGT, so it should not interfere with more general folds. Differential Revision: https://reviews.llvm.org/D113603	2021-11-12 08:17:46 -05:00
Florian Hahn	819bca9b90	[SCEV] Use APIntOps::umin to select best max BC count (NFC). Suggested in D102267, but I missed this in the committed version.	2021-11-12 12:20:01 +00:00
Neubauer, Sebastian	d1f45ed58f	[AMDGPU][NFC] Fix typos Differential Revision: https://reviews.llvm.org/D113672	2021-11-12 11:37:21 +01:00
Simon Moll	751aa6c280	[VE][NFCi] Remove unused tablegen parameters TableGen has started warning about unused template parameters in the isel patterns. Remove those. Reviewed By: kaz7 Differential Revision: https://reviews.llvm.org/D113675	2021-11-12 08:19:50 +01:00
Markus Lavin	4e94e25c90	Fix minor deficiency in machine-sink. Register uses that are MRI->isConstantPhysReg() should not inhibit sinking transformation. Reviewed By: StephenTozer Differential Revision: https://reviews.llvm.org/D111531	2021-11-12 08:01:13 +01:00
Kazu Hirata	2ca45adf24	[CodeGen, Target] Use MachineRegisterInfo::use_operands (NFC)	2021-11-11 22:28:55 -08:00
Serge Pavlov	3057e850b8	[X86] Preserve FPSW when popping x87 stack When compiler converts x87 operations to stack model, it may insert instructions that pop top stack element. To do it the compiler inserts instruction FSTP right after the instruction that calculates value on the stack. It can break the code that uses FPSW set by the last instruction. For example, an instruction FXAM is usually followed by FNSTSW, but FSTP is inserted after FXAM. As FSTP leaves condition code in FPSW undefined, the compiler produces incorrect code. With this change FSTP in inserted after the FPSW consumer if the last instruction sets FPSW. Differential Revision: https://reviews.llvm.org/D113335	2021-11-12 12:00:09 +07:00
Luís Ferreira	665b4138d9	[DebugInfo] run clang-format on some unformatted files This trivial patch runs clang-format on some unformatted files before doing logic changes and prevent hard to review diffs. Differential Revision: https://reviews.llvm.org/D113572	2021-11-11 18:59:41 -08:00
Phoebe Wang	74b979abcd	[X86][FP16] Avoid to generate VZEXT_MOVL with i16 This fixes the crash due to lacking VZEXT_MOVL support with i16. Reviewed By: LuoYuanke, RKSimon Differential Revision: https://reviews.llvm.org/D113661	2021-11-12 09:32:29 +08:00
Nikita Popov	986416251b	[InstCombine] Drop redundant fold for and/or of icmp eq/ne (NFCI) This handles a special case of foldAndOrOfICmpsUsingRanges() with two equality predicates.	2021-11-11 20:25:40 +01:00
Min-Yih Hsu	99152a4164	[M68k][NFC] Rename 'GlSel' -> 'GISel' AArch64 as well as other targets use the abbrev "GISel" so we'd better to be consistent with them. NFC.	2021-11-11 11:01:09 -08:00
Simon Pilgrim	94a901a50a	[X86] Move LowerFunnelShift below LowerShift. NFC. Makes it easier to reuse the various vector shift helpers defined above LowerShift	2021-11-11 18:45:51 +00:00
Simon Pilgrim	010b09b0c5	[DAG] reassociateOpsCommutative - test getNode result directly. NFC Matches the clean code style we use directly above	2021-11-11 18:45:50 +00:00
Mircea Trofin	f64eee1625	[NFC][InlineAdvisor] Inform advisor when the module is invalidated This avoids unnecessary re-calculation of module-wide features in the MLInlineAdvisor. In cases where function passes don't invalidate functions (and, thus, don't invalidate the module), but we re-process a CGSCC, we currently refreshed module features unnecessarily. The overhead of fetching cached results (albeit they weren't themselves invalidated) was noticeable in certain modules' compilations. We don't want to just invalidate the advisor object, though, via the analysis manager, because we'd then need to re-create expensive state (like the model evaluator in the ML 'development' mode). Reviewed By: phosek Differential Revision: https://reviews.llvm.org/D113644	2021-11-11 10:23:49 -08:00
Nikita Popov	84e273cced	[InstCombine] Handle undefs in and of icmp eq zero fold For the scalar/splat case, this fold is subsumed by foldLogOpOfMaskedICmps(). However, the conjugated fold for "or" also supports splats with undef. Make both code paths consistent by using m_ZeroInt() for the "and" implementation as well. https://alive2.llvm.org/ce/z/tN63cu https://alive2.llvm.org/ce/z/ufB_Ue	2021-11-11 19:07:07 +01:00
Jordan Rupprecht	da4822f6c8	[PowerPC][NFC] Ignore unused var in release builds. Note we can't inline this call into assert because `isIntS16Immediate` has a side effect. But we only look at the return value in asserts builds.	2021-11-11 08:57:40 -08:00
Craig Topper	ee7a006ce4	[RISCV] Promote f16 ceil/floor/round/roundeven/nearbyint/rint/trunc intrinsics to f32 libcalls. Previously these would crash. I don't think these can be generated directly from C. Not sure if any optimizations can introduce them. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D113527	2021-11-11 08:28:41 -08:00
Victor Huang	18fe0a0d9e	[PowerPC] PPC backend optimization to lower int_ppc_tdw/int_ppc_tw intrinsics to TDI/TWI machine instructions This patch adds the backend optimization to match XL behavior for the two builtins __tdw and __tw that when the second input argument is an immediate, emitting tdi/twi instructions instead of td/tw. Reviewed By: nemanjai, amyk, PowerPC Differential revision: https://reviews.llvm.org/D112285	2021-11-11 09:52:00 -06:00
Sanjay Patel	11522cfcad	[DAGCombiner] add fold for vselect based on mask of signbit, part 3 (Cond0 s> -1) ? N1 : 0 --> ~(Cond0 s>> BW-1) & N1 https://alive2.llvm.org/ce/z/mGCBrd This was suggested as a potential enhancement in D113212 (also `7e30404c3b` ). There's an improvement for AArch that could be generalized ( X > -1 --> X >= 0 ). For x86, we have a counter-acting fold for most cases that turns the shift+not back into a setcc, so that needs a work-around to get more cases to use "pandn": D113603 Note that this pattern (and a previous one) are not currently canonical forms in IR: https://alive2.llvm.org/ce/z/e4o96b Adding swapped variants is left as a TODO item here, but is planned as a near-term follow-up patch. Differential Revision: https://reviews.llvm.org/D113426	2021-11-11 10:27:37 -05:00
Jay Foad	417add4d4e	[CodeGen] Tweak whitespace in LiveInterval printing When printing a LiveInterval, tweak the use of single and double spaces to try to make it clearer that the valnos are associated with the preceding range or subrange, not the following subrange. Compare the output before and then after this patch: %1 [32r,144r:0) 0@32r L000000000000000C [32r,144r:0) 0@32r L00000000000000F3 [32r,32d:0) 0@32r weight:0.000000e+00 %1 [32r,144r:0) 0@32r L000000000000000C [32r,144r:0) 0@32r L00000000000000F3 [32r,32d:0) 0@32r weight:0.000000e+00 Differential Revision: https://reviews.llvm.org/D113671	2021-11-11 15:19:32 +00:00
Kazu Hirata	ce227ce3b3	[CodeGen] Use MachineInstr::operands (NFC)	2021-11-11 07:10:30 -08:00
Jay Foad	491beae71d	[TwoAddressInstruction] Update LiveIntervals after rewriting INSERT_SUBREG to COPY Also add subranges to an existing live interval when introducing a new subreg def. Differential Revision: https://reviews.llvm.org/D113044	2021-11-11 12:24:59 +00:00
Jay Foad	6abbc3a420	[LiveIntervals] Update subranges in processTiedPairs In TwoAddressInstructionPass::processTiedPairs when updating live intervals after moving the last use of RegB back to the newly inserted copy, update any affected subranges as well as the main range. Differential Revision: https://reviews.llvm.org/D110411	2021-11-11 12:24:59 +00:00
Simon Pilgrim	82b74363a9	[DAG] reassociateOpsCommutative - peek through bitcasts to find constants Now that FoldConstantArithmetic can fold bitcasted constants, we should peek through bitcasts of binop operands to try and find foldable constants	2021-11-11 12:00:22 +00:00
Simon Pilgrim	098ea29641	[DAG] FoldConstantArithmetic - fold intop(bitcast(buildvector(c1)),bitcast(buildvector(c1))) -> bitcast(intop(buildvector(c1'),buildvector(c2'))) Enable FoldConstantArithmetic to constant fold bitcasted constant build vectors. These have typically been bitcasted for type legalization purposes. By extracting the raw constant bit data, performing the constant fold, and then casting the constant bit data back to the (legalized) type, we can perform constant folding on integer types after legalization. This in particular helps 32-bit targets which need to handle vXi64 build vectors - during legalization the (unsupported) i64 elements are split to create a bitcasted v2Xi32 build vector. Addresses some regressions in D113192. Differential Revision: https://reviews.llvm.org/D113564	2021-11-11 11:35:18 +00:00
Florian Hahn	c2ed9fd054	[AArch64] Use custom lowering for {U,S}INT_TO_FP with i8. With fullfp16, it is cheaper to cast the {U,S}INT_TO_FP operand to i16 first, rather than promoting it to i32. The custom lowering for {U,S}INT_TO_FP already supports that, it just needs to be used. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D113601	2021-11-11 08:47:15 +00:00
duanbo.db	53dc525828	[LoopInfo] Fix function getInductionVariable The way function gets the induction variable is by judging whether StepInst or IndVar in the phi statement is one of the operands of CMP. But if the LatchCmpOp0/LatchCmpOp1 is a constant, the subsequent comparison may result in null == null, which is meaningless. This patch fixes the typo. Reviewed By: Whitney Differential Revision: https://reviews.llvm.org/D112980	2021-11-11 16:22:42 +08:00
David Green	703ded8dda	[AArch64] Allow FP16 vector fixed point converts This extends performFpToIntCombine to work on FP16 vectors as well as the f32 and f64 vectors it already supported. Differential Revision: https://reviews.llvm.org/D113297	2021-11-11 07:32:52 +00:00
Bin Cheng	bf76e64854	[BPI] Push exit block rather than exiting ones in getSccExitBlocks The function BranchProbabilityInfo::SccInfo::getSccExitBlocks is supposed to collect all exit blocks for SCC rather than all exiting blocks. This patch fixes the typo. Reviewed By: ebrevnov Differential Revision: https://reviews.llvm.org/D113344	2021-11-11 14:22:19 +08:00
Craig Topper	0963291991	[TypePromotion] Fix a hardcoded use of 32 as the size being promoted to. At least I think that's what the 32 here is. Use RegisterBitWidth instead. While there replace zext with zextOrSelf to simplify the code. Reviewed By: samparker, dmgreen Differential Revision: https://reviews.llvm.org/D113495	2021-11-10 22:12:39 -08:00
Kazu Hirata	642a361b7e	[llvm] Use make_early_inc_range (NFC)	2021-11-10 19:56:35 -08:00
kpyzhov	c9690092c8	[AMDGPU] Small correction in SITargetLowering::performOrCombine(). Differential Revision: https://reviews.llvm.org/D113203	2021-11-10 21:07:27 -05:00
Craig Topper	4183522e80	[RISCV] Promote f16 frem with Zfh. Add riscv64 coverage for f32 and f64 frem. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D113531	2021-11-10 17:35:07 -08:00
Stanislav Mekhanoshin	476ab0f809	[AMDGPU] Fixed stack pointer init with architected flat scratch Even if wave offset is not present we still need to do the rest of the initialization. The mov into s32 was missing in the kernels. Fixes: SWDEV-310935 Differential Revision: https://reviews.llvm.org/D113628	2021-11-10 17:18:38 -08:00
Matt Arsenault	c7a0c2d0f7	AMDGPU: Report large stack usage for recursive calls We were previously setting an ignored bit in the kernel headers. The current behavior is to add the large amount on top of the statically known size of a single stack frame. I'm not sure if we should just use the large size as the entire reported size instead.	2021-11-10 20:02:01 -05:00
Nikita Popov	0242a6adf7	[InstCombine] Support splat vectors in some or of icmp folds Replace m_ConstantInt() with m_APInt() in order to support splat constants in addition to scalar integers.	2021-11-10 22:59:09 +01:00
Nikita Popov	861adaf2ad	[InstCombine] Support splat vectors in some and of icmp folds Replace m_ConstantInt() with m_APInt() to support splat vectors in addition to scalar integers.	2021-11-10 22:37:54 +01:00
Nikita Popov	58ebc79a64	[InstCombine] Strip offset when folding and/or of icmps When folding and/or of icmps, look through add of a constant and adjust the icmp range instead. Effectively, this decomposes X + C1 < C2 style range checks back into a normal range. This allows us to fold comparisons involving two range checks or one range check and some other condition. We had a fold for a really specific case of this (or of range check and eq, and only one one side!) while this handles it in fully generality. Differential Revision: https://reviews.llvm.org/D113510	2021-11-10 22:01:52 +01:00
Craig Topper	9ee5cec688	[RISCV] Prevent bad legalizer behavior when bitcasting fixed vectors to i64 on RV32 with Zve32. Similar to D113219, we need to make sure we don't create a vXi64 vector when it isn't legal. This fixes an error found by an expensive checks build.	2021-11-10 11:58:49 -08:00
Roman Lebedev	a70d74323e	[X86][Costmodel] `getReplicationShuffleCost()`: implement cost model for 8 bit-wide elements with AVX512VBMI VBMI introduced VPERMB, so cost-model i8 replication shuffle using it. Note that we can still model i8 replication shufflle without VBMI, by promoting to i16/i32. That will be done in follow-ups. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D113479	2021-11-10 22:52:40 +03:00
Roman Lebedev	c6e894b9b2	[X86][Costmodel] `getReplicationShuffleCost()`: implement cost model for 16 bit-wide elements with AVX512BW BWI introduced VPERMW, so cost-model i16 replication shuffle using it. Note that we can still model i16 replication shufflle without BWI, by promoting to i32. That will be done in follow-ups. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D113478	2021-11-10 22:52:39 +03:00
Roman Lebedev	4101c7bf19	[X86][Costmodel] `getReplicationShuffleCost()`: implement cost model for 32/64 bit-wide elements with AVX512F This models lowering to `vpermd`/`vpermq`/`vpermps`/`vpermpd`, that take a single input vector and a single index vector, and are cross-lane. So far i haven't seen evidence that replication ever results in demanding more than a single input vector per output vector. This results in shockingly lesser costs :) Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D113350	2021-11-10 22:52:33 +03:00
Sanjay Patel	a8abd19b10	[x86] simplify code; NFC We bail out if the types don't match, so it's clearer to have a single variable to show that common type.	2021-11-10 13:29:57 -05:00
Sanjay Patel	5424fb164a	[x86] fix formatting; NFC	2021-11-10 13:29:57 -05:00
Zarko Todorovski	ed4a91300b	[NFC][llvm][M68k] Inclusive language: reword comment Rewording the comment to avoid the use of blacklist.	2021-11-10 13:28:32 -05:00
Stanislav Mekhanoshin	5731381594	[InstCombine] Relax and reorganize one use checks in the ~(a \| b) & c Since there is just a single check for LHS in ~(A \| B) & C \| ... transforms and multiple RHS checks inside with more coming I am removing m_OneUse checks for LHS and adding new checks for RHS. This is non essential as long as there is total benefit. In addition (~(A \| B) & C) \| (~(A \| C) & B) --> (B ^ C) & ~A checks were overly restrictive, it should be good without any additional checks. Differential Revision: https://reviews.llvm.org/D113141	2021-11-10 10:14:12 -08:00
Fraser Cormack	b1d8d70b9d	[SelectionDAG] Replace the Chain in LOAD->VP_LOAD widening The introduction of this legalization, D111248, forgot to replace the old chain with the new. This could manifest itself in the old (illegally-typed) value remaining in the DAG, though the simple test cases didn't catch this. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D113561	2021-11-10 17:49:12 +00:00
Craig Topper	57bc7b1089	[RISCV] Prevent crashes when bitcasting between fixed vectors and scalars. Not all scalar element types are allowed in vectors so we may not be able to bitcast to a 1 element vector to use insert/extract. This will become a bigger issue when the Zve extensions are commited. For now, I'm using the ELEN limit to limit the element types. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D113219	2021-11-10 09:21:52 -08:00
Simon Pilgrim	a1e0aa75ca	[X86] combineMulToPMADDWD - remove useless TODO We should always be able to use PMULUDQ/PMULDQ in PMADDWD patterns with greater than 32-bit extended integer sources	2021-11-10 16:56:44 +00:00
David Green	509b397dd5	[AArch64] Combine vector fptoi.sat(fmul) to fixed point fcvtz Similar to D113199 but dealing with the vector size, this extends the fptosi+fmul to fixed point fold to handle fptosi.sat nodes that are equally viable, so long as the saturation width matches the output width. Differential Revision: https://reviews.llvm.org/D113200	2021-11-10 16:12:48 +00:00

1 2 3 4 5 ...

152407 Commits