There are two different senses in which a block can be "address-taken".
There can be a BlockAddress involved, which means we need to map the
IR-level value to some specific block of machine code. Or there can be
constructs inside a function which involve using the address of a basic
block to implement certain kinds of control flow.
Mixing these together causes a problem: if target-specific passes mark
arbitrary blocks "address-taken" and we also have a BlockAddress, we
can't actually tell which MachineBasicBlock corresponds to the
BlockAddress.
So split this into two separate bits: one for BlockAddress, and one for
the machine-specific bits.
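A rough sketch of the shape of the split (illustrative, simplified names
only; the real bits live on MachineBasicBlock):

  // Two separate bits instead of one conflated "address taken" flag.
  struct BlockAddressBits {
    const void *AddressTakenIRBlock = nullptr; // which IR block a BlockAddress maps to
    bool MachineBlockAddressTaken = false;     // set freely by target-specific passes
    bool isIRBlockAddressTaken() const { return AddressTakenIRBlock != nullptr; }
    bool isMachineBlockAddressTaken() const { return MachineBlockAddressTaken; }
  };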
Discovered while trying to sort out related stuff on D102817.
Differential Revision: https://reviews.llvm.org/D124697
Previously, LegalizeDAG didn't account for the fact that mask.compress's
passthrough may be a floating-point value, and this led to a crash in
getConstant, which doesn't support floating-point types.
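For reference, a scalar model of the compress semantics the expansion
has to honor (standalone illustration, not the LegalizeDAG code; a
floating-point element type is the case that crashed):

  #include <array>
  #include <cstddef>

  // Active lanes of src are packed to the front; the remaining lanes
  // are taken from the passthrough vector.
  template <typename T, std::size_t N>
  std::array<T, N> compress(const std::array<T, N> &src,
                            const std::array<bool, N> &mask,
                            const std::array<T, N> &passthru) {
    std::array<T, N> out = passthru;
    std::size_t j = 0;
    for (std::size_t i = 0; i < N; ++i)
      if (mask[i])
        out[j++] = src[i];
    return out;
  }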
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D131947
I had hoped to make this a generic fold in DAGCombine, but there's quite a few regressions in Thumb2 MVE that need addressing first.
Fixes regressions from D106675.
This is to avoid f16->i64 being lowered to `__fixhfdi/__fixunshfdi` on 32-bit targets, since neither libgcc nor compiler-rt provides them. https://godbolt.org/z/cjWEsea5v
It also helps to improve performance by promoting the vector type.
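A standalone sketch of the equivalent conversion (assuming the lowering
goes through f32 first; `_Float16` requires a toolchain with f16
support):

  #include <cstdint>

  // f16->f32 is exact, and f32->i64 has lowerings/libcalls that do
  // exist on 32-bit targets, so no __fixhfdi is needed.
  int64_t f16ToI64(_Float16 h) {
    float f = (float)h;  // fpext f16 -> f32
    return (int64_t)f;   // fptosi f32 -> i64
  }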
Reviewed By: LuoYuanke
Differential Revision: https://reviews.llvm.org/D131828
The linker may convert such an ADD into a LEA, so we must not
use the EFLAGS output.
This causes miscompiles with -fsanitize=null after
bacdf80f42 added
llvm.threadlocal.address -- previously, global variables were known to
be non-null, but the intrinsic is not currently known to return
nonnull. (That should be corrected, but it shouldn't have caused
miscompiles!)
Differential Revision: https://reviews.llvm.org/D131716
lowerShuffleWithVPMOV currently only matches shuffle(truncate(x)) patterns, but on VLX targets the truncate isn't usually necessary to make the VPMOV node worthwhile (as we're only targeting v16i8/v8i16 shuffles, we almost always end up with a PSHUFB node instead). PACKSS/PACKUS are still preferred over VPMOV due to their lower uop count.
Fixes the remaining regression from the fixes in rG293899c64b75
This patch makes the variants of the `mm*_cast*` Intel intrinsics that use `shufflevector(freeze(poison), ..)` emit efficient assembly.
(These intrinsics are planned to use `shufflevector(freeze(poison), ..)` after shufflevector's semantics update; relevant thread: D103874)
To do so, this patch
1. Updates `LowerAVXCONCAT_VECTORS` in X86ISelLowering.cpp to recognize a `FREEZE(UNDEF)` operand of `CONCAT_VECTORS` in addition to `UNDEF`.
2. Updates X86InstrVecCompiler.td to recognize `insert_subvector` with a `FREEZE(UNDEF)` vector as its first operand.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D130339
This addresses various regressions in D131260, and is also a useful optimization in itself.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D131358
This specific optimisation is handled in OptimizeBlock in BranchFolding,
so it is redundant here. As discussed on the review thread, I've verified that
we have test coverage for that optimisation within test/CodeGen/X86 by
disabling the BranchFolding version of this transform after applying
this patch and rerunning the test suite.
Differential Revision: https://reviews.llvm.org/D129204
The register operand of DBG_VALUE is not selected to a proper register
bank in either the AArch64 or the X86 backend, which would cause
getRegClass to crash after global ISel. After discussion, we think the
MIR should assume all virtual registers have a proper register class
assigned after global ISel, so this patch fixes the gap of DBG_VALUE
for AArch64 and X86.
Differential Revision: https://reviews.llvm.org/D129037
The `BMI` instruction `tzcnt` has better performance than `bsf` on new
processors. Its encoding has a mandatory '0xf3' prefix on top of
`bsf`'s. If we force-emit the `rep` prefix for `bsf`, we gain better
performance when the same code runs on new processors.
GCC has already done it this way: https://c.godbolt.org/z/6xere6fs1
Fixes #34191
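As a sketch of why the substitution is safe (the builtins below just
model the two instructions):

  #include <cstdint>

  // Old processors ignore the 0xf3 prefix and execute bsf; new ones
  // decode "rep bsf" as tzcnt. The two agree for any nonzero input and
  // differ only at zero: bsf leaves the destination undefined, tzcnt
  // returns the operand width.
  unsigned bsfModel(uint32_t x)   { return __builtin_ctz(x); } // undefined for x == 0
  unsigned tzcntModel(uint32_t x) { return x ? __builtin_ctz(x) : 32; }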
Reviewed By: craig.topper, skan
Differential Revision: https://reviews.llvm.org/D130956
1) The overloaded (instruction-based) method is a wrapper around the current (opcode-based) method; a sketch of the pattern follows this list.
2) This patch also changes a few callsites (VectorCombine.cpp,
SLPVectorizer.cpp, CodeGenPrepare.cpp) to call the overloaded method.
3) This is a split of D128302.
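A minimal sketch of the wrapper pattern (hypothetical names, not the
actual interface):

  // The instruction-based overload just unpacks the opcode and
  // forwards to the existing opcode-based method, so callers holding
  // an instruction don't have to do it themselves.
  struct Instr {
    unsigned Opcode = 0;
    unsigned getOpcode() const { return Opcode; }
  };
  struct CostInterface {
    bool isCheapOp(unsigned Opcode) const { return Opcode != 0; }             // existing method (stub)
    bool isCheapOp(const Instr &I) const { return isCheapOp(I.getOpcode()); } // new wrapper
  };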
Differential Revision: https://reviews.llvm.org/D131114
The isOnlyUserOf check prevented the fold if the chain result had any
users. What we really care about is that the data result from the
AND is only used by the TEST, and the flag results from the ANDs
aren't used at all. It's ok if the chain has users; we just need
to replace those users with the chain from the TESTrm.
Reviewed By: LuoYuanke
Differential Revision: https://reviews.llvm.org/D131117
If we're going to emit a rep prefix before bsf as proposed in
D130956, it makes sense to promote i16 operations to i32 to avoid
the false dependency of tzcntw.
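A scalar model of the promotion (assuming the usual CTTZ promotion
trick of OR'ing in a bit just above the original width):

  #include <cstdint>

  // The 32-bit count writes the full destination register, so there is
  // no tzcntw-style false dependency; setting bit 16 first makes the
  // zero-input result come out as 16, matching the i16 semantics.
  unsigned cttz16(uint16_t x) {
    return __builtin_ctz((uint32_t)x | 0x10000u);
  }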
Reviewed By: skan, pengfei
Differential Revision: https://reviews.llvm.org/D130995
This patch ensures consistency in the construction of FP_ROUND nodes
such that they always use ISD::TargetConstant instead of ISD::Constant.
This additionally fixes a bug in the AArch64 SVE backend where patterns
were matching against TargetConstant nodes and sometimes failing when
passed a Constant node.
Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D130370
The `BMI` instruction `tzcnt` has better performance than `bsf` on new
processors. Its encoding has a mandatory '0xf3' prefix on top of
`bsf`'s. If we force-emit the `rep` prefix for `bsf`, we gain better
performance when the same code runs on new processors.
GCC has already done it this way: https://c.godbolt.org/z/6xere6fs1
Fixes #34191
Reviewed By: skan
Differential Revision: https://reviews.llvm.org/D130956
The problem Alexander reported on D127982 was caused by an optimization
for AVX512-FP16 instructions. We must limit it to when the feature is enabled.
During the investigation, I found we didn't expand fp_round/fp_extend
without F16C. This may result in a runtime crash, so change them too.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D130817
Fix all instances of:
*** Bad machine code: Kill missing from LiveVariables ***
in the X86 CodeGen tests with D129213 applied, which adds verification
of LiveIntervals after the TwoAddressInstruction pass runs.
Differential Revision: https://reviews.llvm.org/D129634
Only PACKSS/PACKUS faux shuffles make use of the demanded elts at the moment, but this at least improves the handling of a couple of truncation patterns.
Noticed by inspection and I can't seem to make a test case, but SSE arithmetic bit shifts clamp to the max shift amount (i.e. create a sign splat) - combineVectorShiftImm already does something similar.
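A scalar model of the clamping behaviour (illustrative only):

  #include <cstdint>

  // SSE arithmetic shifts clamp the amount rather than masking it, so
  // shifting by >= the bit width behaves like shifting by width-1 and
  // splats the sign bit (all-zeros or all-ones).
  int16_t psrawLane(int16_t v, unsigned amt) {
    if (amt > 15)
      amt = 15;                 // clamp to the max shift amount
    return (int16_t)(v >> amt); // 0 or -1 once clamped
  }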
If the LHS op has a single use then using the more general AND op is likely to allow commutation, load folding, generic folds etc.
Updated version: the original version rG057db2002bb3 didn't correctly account for multiple uses of the mask when folding "OR(AND(X,C),AND(Y,~C)) -> OR(AND(X,C),ANDNP(C,Y))" in canonicalizeBitSelect.
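The identity behind the fold, as a scalar model:

  #include <cstdint>

  // (X & C) | (Y & ~C) selects bits of X where the mask C is set and
  // bits of Y where it is clear; writing the second term as ~C & Y
  // (ANDNP's operation) reuses C directly instead of needing a
  // separately materialized ~C.
  uint32_t bitSelect(uint32_t X, uint32_t Y, uint32_t C) {
    return (X & C) | (~C & Y);
  }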
We will insert a new operand which is identical to the Dest for complex
FMUL with a mask. https://godbolt.org/z/eTEdnYv3q
Complex FMA and FMUL with maskz don't have this problem.
Reviewed By: LuoYuanke, skan
Differential Revision: https://reviews.llvm.org/D130638
Currently the X86 shuffle lowering widens the element type of a
shuffle if the mask element values are adjacent. For the example below:
%t2 = add nsw <16 x i32> %t0, %t1
%t3 = sub nsw <16 x i32> %t0, %t1
%t4 = shufflevector <16 x i32> %t2, <16 x i32> %t3,
      <16 x i32> <i32 16, i32 17, i32 2, i32 3, i32 4,
                  i32 5, i32 6, i32 7, i32 8, i32 9, i32 10,
                  i32 11, i32 12, i32 13, i32 14, i32 15>
ret <16 x i32> %t4
the compiler would transform the shuffle into
%t4 = shufflevector <8 x i64> %t2, <8 x i64> %t3,
      <8 x i32> <i32 8, i32 1, i32 2, i32 3, i32 4,
                 i32 5, i32 6, i32 7>
This may lose the opportunity for ISel to select masked instructions when
avx512 is enabled.
This patch prevents the transform when the avx512 feature is enabled.
Thanks to Simon for the idea.
Differential Revision: https://reviews.llvm.org/D129537
With SSE4.1 and above we were using 3 multiply instructions. This
was due to type legalization widening to v4i32 and the low half
being done with pmulld while the high half used two pmuldq/pmuludq.
Instead of that, we can use a single pmuludq/pmuldq to calculate
the full product at once, extract the high and low bits and compare
to check for overflow.
I've restricted SMULO to sse4.1 to get pmuldq. We can probably
do a fixup to pmuludq on earlier targets, but that's for another day.
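A scalar model of the new expansion (the vector version does the same
per lane with pmuludq/pmuldq):

  #include <cstdint>

  // Unsigned: one widening multiply; overflow iff the high half of the
  // full product is nonzero.
  bool umulo32(uint32_t a, uint32_t b, uint32_t &lo) {
    uint64_t full = (uint64_t)a * b;
    lo = (uint32_t)full;
    return (uint32_t)(full >> 32) != 0;
  }

  // Signed: overflow iff the high half is not the sign-extension of
  // the low half.
  bool smulo32(int32_t a, int32_t b, int32_t &lo) {
    int64_t full = (int64_t)a * b;
    lo = (int32_t)full;
    return full != (int64_t)lo;
  }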
I was going through my git stash and found an early version of this patch
from a year or two ago, so I went ahead and finished it.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D130432