Introduced masks where they were previously missing and improved
target-dependent cost models to avoid returning incorrect cost results
after adding masks.
Differential Revision: https://reviews.llvm.org/D100486
Before this patch `Args` was used by SLP to pass a broadcast's arguments.
This patch changes this: `Args` is now used to pass the operands of
the shuffle.
Differential Revision: https://reviews.llvm.org/D124202
Instead of reporting a fatal error, this patch emits an error message and
exits when shapes are not pre-defined. This makes compilation fail without
crashing.
Differential Revision: https://reviews.llvm.org/D124342
The related instructions are:
VPERMD/Q/PS/PD
VRANGEPD/PS/SD/SS
VGETMANTSS/SD/SH
VGETMANTPS/PD - mem version only
VPMULLQ
VFMULCSH/PH
VFCMULCSH/PH
Differential Revision: https://reviews.llvm.org/D116072
This is x86 specific, and adds statefulness to
MachineModuleInfo. Instead of explicitly tracking this, infer whether we
need to declare the symbol based on the previously inserted reference.
This produces a small change in the output due to the move from
AsmPrinter::doFinalization to X86's emitEndOfAsmFile. This will now be
emitted relative to other end-of-file fields, which I'm assuming doesn't
matter (e.g. the __morestack_addr declaration is now after the
.note.GNU-split-stack part).
This also produces another small change in code if the module happened
to define/declare __morestack_addr, but I assume that's invalid and
doesn't really matter.
This is used to emit one field in doFinalization for the module. We
can accumulate this when emitting all individual functions directly in
the AsmPrinter, rather than accumulating additional state in
MachineModuleInfo.
Move the special case behavior predicate into MachineFrameInfo to
share it. This now promotes it to generic behavior. I'm assuming this
is fine because no other target implements adjustForSegmentedStacks,
or has tests using the split-stack attribute.
ValueMap should only be necessary if the IR values can be
replaced. This is only used during codegen, when it's illegal to
change the underlying IR. This allows using the default copy
constructor for X86MachineFunctionInfo.
I'm not happy about targets keeping state here that's only used in one
specific pass, but we don't have a better place to put it right now.
Calling hasOneUse can be expensive on nodes with multiple results,
especially when some results are Chains. By checking the opcode first,
we can avoid walking the uses if it isn't an interesting node,
and thus avoid calling hasOneUse on a node that might have many uses.
Found by profiling the IR given in D123857.
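As a minimal sketch of the ordering idea (illustrative only - the opcode parameter is a stand-in, not the code from this patch; SDNode::getOpcode() and SDNode::hasOneUse() are the relevant SelectionDAG APIs):
```cpp
#include "llvm/CodeGen/SelectionDAGNodes.h"
using namespace llvm;

// Put the O(1) opcode test before the use-list walk, so hasOneUse() is
// never called on nodes we don't care about.
bool worthCombining(const SDNode *N, unsigned InterestingOpc) {
  return N->getOpcode() == InterestingOpc && // cheap field load
         N->hasOneUse();                     // may walk many uses
}
```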
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D123881
Checking the opcode is cheap. hasOneUse might not be if the node has
multiple results. By checking the opcode we can rule out nodes
with multiple results we aren't interested in.
znver1/2 models were incorrectly modelling these as 3-cycle latency instructions on the wrong pipe, and znver1 ymm variants also require double pumping.
Now matches AMD SoG, Agner and instlatx64 numbers.
Thanks to @fabian-r for the report
An unsigned int 0 will be converted to float/double -0.0 when the rounding
mode is set to 'FE_DOWNWARD'. Use the FILD instruction instead of SSE
instructions on 32-bit targets when strictfp is enabled.
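A small source-level illustration of the hazard (my own example, not from the patch): an expansion that computes the conversion with a floating-point subtraction can round zero down to -0.0 once FE_DOWNWARD is active.
```cpp
#include <cfenv>
#include <cstdio>

int main() {
  std::fesetround(FE_DOWNWARD);
  volatile unsigned u = 0;
  // Under strictfp the conversion must honor the rounding mode and still
  // yield +0.0; a lowering built on 'x - bias' arithmetic can return -0.0.
  double d = static_cast<double>(u);
  std::printf("%g\n", d); // expected: 0, not -0
}
```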
Differential Revision: https://reviews.llvm.org/D123660
This patch adds support for inline assembly address operands using the "p"
constraint on X86 and SystemZ.
This was in fact broken on X86 (see example at
https://reviews.llvm.org/D110267, Nov 23).
These operands should probably be treated the same as memory operands by
CodeGenPrepare, which has been noted with a "TODO" there.
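For reference, a minimal use of the constraint (my own example; the `%a` operand modifier that prints the operand as a bare address is an assumption worth double-checking against the GCC/Clang inline-asm docs):
```cpp
// Pass a pointer to inline asm as an address operand via "p".
void prefetchLine(const void *Ptr) {
  asm volatile("prefetcht0 %a0" : : "p"(Ptr));
}
```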
Review: Xiang Zhang and Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D122220
This reverts the functional changes of D103427 but keeps its tests, and
reimplements the functionality by reusing the existing 32-bit
MASKMOVDQU and VMASKMOVDQU instructions, as suggested by skan in review.
These instructions were previously predicated on Not64BitMode. This
reimplementation restores the disassembly of a class of instructions,
which will see a test added in followup patch D122449.
In 64-bit mode these instructions are special-cased in
X86MCInstLower::Lower, because we use flags with one meaning for subtly
different things: we have an AdSize32 class which indicates both that
the instruction needs a 0x67 prefix and that the text form of the
instruction implies a 0x67 prefix. These instructions are special in
needing a 0x67 prefix but having a text form that does *not* imply a
0x67 prefix, so we encode this in MCInst as an instruction that has an
explicit address size override.
Note that originally VMASKMOVDQU64 was special cased to be excluded from
disassembly, as we cannot distinguish between VMASKMOVDQU and
VMASKMOVDQU64 and rely on the fact that these are indistinguishable, or
close enough to it, at the MCInst level that it does not matter which we
use. Because VMASKMOVDQU now receives special casing, even though it
does not make a difference in the current implementation, as a
precaution VMASKMOVDQU is excluded from disassembly rather than
VMASKMOVDQU64.
Reviewed By: RKSimon, skan
Differential Revision: https://reviews.llvm.org/D122540
znver1/2 models were incorrectly modelling these as single uop instructions, instead of the microcoded nightmares they really are.
Now matches AMD SoG, Agner and instlatx64 numbers.
Fixes #54811
extract_subvector(insert_subvector(V,X,C1),C1) -> insert_subvector(extract_subvector(V,C1),X,0)
More aggressively attempt to reduce the width of an extract_subvector source - we currently only do this if we're inserting into a zero vector (i.e. canonicalizing to the AVX implicit zero upper elts pattern).
But if we're extracting from the same point as the inner insert_subvector then the fold is still relatively trivial - we can probably do even better if we can ensure the subvector isn't badly split.
When walking the user chain to get the shape of a phi node, if another
phi node appears in the chain, we should walk to the users of that phi
node instead of the original phi node.
When we fold vselect(cond, pshufb(x), pshufb(y)) -> or(pshufb(x), pshufb(y)), ensure we convert all undef elements to zero elements - this should help us expose more known zero elements for deeper chains of these cases.
Noticed while triaging Issue #54819
znver2 is mainly a search+replace of the znver1 model, but for no clear reason some lines have been moved around - try to keep these in sync (no actual changes in the models).
Don't try to directly use the with.overflow flag result in a cmov
if we need to materialize constants between the instruction
producing the overflow flag and the cmov. The current code is
careful to check that there are no other instructions in between,
but misses the constant materialization case (which may clobber
eflags via xor or constant expression evaluation).
Fixes https://github.com/llvm/llvm-project/issues/54369.
Differential Revision: https://reviews.llvm.org/D122825
Adjust the PMULLD entry to match the Intel AoM numbers - PMULLD is a uop nightmare on SLM and we should model it as such.
We had reports of internal regressions the last time this was attempted (rG13a0f83a05ff), but no public repros, and tests I did last year when I had access to a SLM box failed to see anything. My hunch is that the more aggressive PMULLD -> PMADDWD folds we now perform might have helped. We can revisit this again if we ever receive an actual repro.
Fixes #36407
rGa3b8695bf592 enabled this for znver3, but AMD SoG, Agner and uops.info all agree that even znver1 has a fast per-lane shuffle op (VPSHUFB), while cross-lane shuffles seem to be slow (PERMPS etc.)
Fixes #44140
Differential Revision: https://reviews.llvm.org/D123306
smin(x, 0):
(select (x < 0), x, 0) -> ((x >> (size_in_bits(x)-1))) & x
smax(x, 0):
(select (x > 0), x, 0) -> (~(x >> (size_in_bits(x)-1))) & x
Since the comparison is testing for a positive value, we have to invert the
sign-bit mask in the smax case, so we only do that transform if the target
has a bitwise 'and not' instruction (making the invert free).
The transform is performed only when CMP has a single user, to avoid
increasing the total instruction count.
https://alive2.llvm.org/ce/z/euUnNm
https://alive2.llvm.org/ce/z/37339J
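A quick scalar check of the two identities (my own snippet; it assumes an arithmetic right shift on signed int, which mainstream implementations provide):
```cpp
#include <cassert>

int smin0(int x) { return (x >> 31) & x; }  // smin(x, 0)
int smax0(int x) { return ~(x >> 31) & x; } // smax(x, 0)

int main() {
  for (int x : {-7, -1, 0, 1, 42}) {
    assert(smin0(x) == (x < 0 ? x : 0));
    assert(smax0(x) == (x > 0 ? x : 0));
  }
}
```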
Differential Revision: https://reviews.llvm.org/D123109
Use the same enum as the other atomic instructions for consistency, in
preparation for addition of another strategy.
Introduce a new "Expand" option, since the store expansion does not
use cmpxchg. Alternatively, the existing CmpXChg strategy could be
renamed to Expand.
```
X86::MMX_MOVD64from64rr -> X86::MMX_MOVQ64mr
X86::MMX_MOVD64grr -> X86::MMX_MOVD64mr
```
These two entries were added in llvm-svn: 372770.
I think these two should be reversible.
Reviewed By: RKSimon, pengfei
Differential Revision: https://reviews.llvm.org/D122217
Add void casts to mark the variables used, next to the places where
they are used in assert or `LLVM_DEBUG()` expressions.
Differential Revision: https://reviews.llvm.org/D123117
Intuitively, the memory folding pair should have the same mnemonic.
This patch removes
```
{X86::SENDUIPI,X86::VMXON}
```
in the auto-generated table.
And `NotMemoryFoldable` for `TPAUSE` and `CLWB` can be saved.
```
{X86::MOVLHPSrr,X86::MOVHPSrm}
{X86::VMOVLHPSZrr,X86::VMOVHPSZ128rm}
{X86::VMOVLHPSrr,X86::VMOVHPSrm}
```
It seems the three pairs above were mistakenly removed,
but we can add them back manually later.
Reviewed By: Amir
Differential Revision: https://reviews.llvm.org/D122477
The current implementation only recognizes the absolute-value operation
implemented via a select instruction. This patch adds support for the abs
intrinsic.
Differential Revision: https://reviews.llvm.org/D122777
Without VBMI, we are better off permuting v16i32 sub-lanes, even though it's a variable shuffle, if it allows us to then shuffle v64i8 inlane repeated masks (PSHUFB etc.)
Fixes #54658
Allows us to fold XOR(X, MIN_SIGNED_VALUE) == ADD(X, MIN_SIGNED_VALUE) into LEA patterns
As mentioned on PR52267.
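The equivalence holds because flipping the sign bit is the same as adding it: any carry out of the top bit simply falls off the register. A quick check (my own snippet):
```cpp
#include <cassert>
#include <cstdint>

int main() {
  for (uint32_t x : {0u, 1u, 0x7fffffffu, 0x80000000u, 0xdeadbeefu})
    assert((x ^ 0x80000000u) == x + 0x80000000u);
}
```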
Differential Revision: https://reviews.llvm.org/D122815
As noticed on PR39174, if we're extracting a single non-constant bit index, then try to use BT+SETCC instead to avoid messing around moving the shift amount to the ECX register, using slow x86 shift ops etc.
Recommitted with a fix to ensure we zext/trunc the SETCC result to the original type.
Differential Revision: https://reviews.llvm.org/D122891
As noticed on PR39174, if we're extracting a single non-constant bit index, then try to use BT+SETCC instead to avoid messing around moving the shift amount to the ECX register, using slow x86 shift ops etc.
Differential Revision: https://reviews.llvm.org/D122891
This approach is used by AArch64/RISCV to make frame-setup/frame-destroy
instructions contiguous instead of being interleaved by CFI instructions. Code
checking `MBBI->getFlag(MachineInstr::FrameSetup) || MBBI->isCFIInstruction()`
can be simplified to just check FrameSetup.
This helps locate all CFI instructions in the prologue, which can be handy to use
.cfi_remember_state/.cfi_restore_state to decrease unwind table size (D114545).
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D122541
This inverts a fold recently added to IR with:
3491f2f4b0
We can put -bidirectional on the Alive2 examples to show that
the reverse transforms work:
https://alive2.llvm.org/ce/z/8iVQwB
The motivation for the IR change was to improve matching to
'fabs' in IR (see https://github.com/llvm/llvm-project/issues/38828 ),
but it regressed x86 codegen for 'not-quite-fabs' patterns like
(X > -X) ? X : -X.
I.e., when there is no fast-math (nsz), the cmp+select is not a proper
fabs operation, but it does map nicely to the unusual NAN semantics
of MINSS/MAXSS.
I drafted this as a target-independent fold, but it doesn't appear to
help any other targets and seems to cause regressions for SystemZ at
least.
Differential Revision: https://reviews.llvm.org/D122726
The AMX combiner would store undef or zero to the stack and invoke tileload
to load the data into a tile register. To avoid the store/load, we can
materialize the undef or zero value with tilezero.
Differential Revision: https://reviews.llvm.org/D122714
As mentioned on D122482, if we've generated a masked overflow test see if we can fold it to X86ISD::BT to feed a X86ISD::ADC/SBB
Differential Revision: https://reviews.llvm.org/D122572
If we're not relying on the flag result, we can fold the constants together into the RHS immediate operand and set the LHS operand to zero, simplifying for further folds.
We could do something similar if the flag result is in use and the constant fold doesn't affect it, but I don't have any real test cases for this yet.
As suggested by @davezarzycki on Issue #35256
Differential Revision: https://reviews.llvm.org/D122482
Based off the script from D103695, we were exaggerating the cost of the v2i64 comparison expansion using instruction count instead of effective throughput
This is used for f16 emulation. We emulate f16 for SSE2 targets and
above. The refactoring makes future code cleaner.
Reviewed By: LuoYuanke
Differential Revision: https://reviews.llvm.org/D122475
1. Add comments to explain why we set `isAsmParserOnly` for XACQUIRE and XRELEASE
2. Check `X86Inst` in the constructor of `RecognizableInstrBase` so that
we can avoid the case where one of its fields is not initialized but is
accessed by a user. (e.g. in X86EVEX2VEXTablesEmitter.cpp)
3. Move `Rec` from `RecognizableInstrBase` to `RecognizableInstr` to reduce
size of `RecognizableInstrBase`
4. Remove out-of-date comments for shouldBeEmitted() (filter() was removed)
5. Add a basic field `IsAsmParserOnly` and remove the field
`ShouldBeEmitted` b/c we can deduce it w/ little overhead
Asan complained about uninitialized bool
`invalid-bool-load`
llvm/lib/Target/X86/AsmParser/X86Operand.h:389:12: runtime error: load
of value 171, which is not a valid value for type 'bool'
Differential Revision: https://reviews.llvm.org/D122405
MMX_MOVD64from64rr moves an MMX register to a 64-bit GPR.
MMX_MOVD64from64mr is the memory version of moving a MMX register to a
64-bit GPR. It requires the REX.W bit to be set. There are no isel
patterns that use this instruction.
MMX_MOVQ64mr is the MMX register store instruction. It doesn't
require a REX.W prefix. This makes it one byte shorter to encode
than MMX_MOVD64from64mr in many cases.
Both store instructions output the same mnemonic string. The assembler
would choose MMX_MOVQ64mr if it were to parse the output, which is
another reason using it is the correct thing to do.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D122241
As suggested on PR35908, if we are adding/subtracting an extracted bit, attempt to use BT instead to fold the op and use a ADC/SBB op.
Reapply with extra type legality checks - LowerAndToBT was originally only used during lowering; now that it can occur earlier we might encounter illegal types that we can either promote to i32 or just bail on.
Differential Revision: https://reviews.llvm.org/D122084
AVX512 has excellent broadcast ops for everything but vXi1 bool vectors - so if we're broadcasting a comparison result, see if we can broadcast the comparison operands instead.
Ensure we don't attempt to fold to illegal types to ADC/SBB nodes.
After D122084 its possible for ADD(X,AND(SRL(Y,Z),1) patterns to be matched before type legalization.
As suggested on PR35908, if we are adding/subtracting an extracted bit, attempt to use BT instead to fold the op and use a ADC/SBB op.
Differential Revision: https://reviews.llvm.org/D122084
Instead of taking a SkipDefs parameter, rename to getCondSrcNoFromDesc
and have it return the source operand number. Make getCondFromMI
responsible for adding the number of Defs for MI instructions.
While there, remove some unneeded casts to unsigned and check for
negative numbers instead of explicitly comparing against -1. Less than 0
is easier for a compiler to codegen.
Differential Revision: https://reviews.llvm.org/D122113
This is not an NFC change b/c we add more instructions like
IMUL16/32/64r, MOV16ao16 and MOV16rr_REV etc. to the list.
But I think it's reasonable.
Reviewed By: Amir
Differential Revision: https://reviews.llvm.org/D122063
Split combineAddOrSubToADCOrSBB into wrapper (which handles ADDs with commuted args) and the real combine, which no longer has to account for commutation.
I'm intending to extend combineAddOrSubToADCOrSBB to detect patterns other than just X86ISD::SETCC, so we need to detect all patterns without detecting them as part of a commutation swap.
Rename hasCMPXCHG16B() to canUseCMPXCHG16B() to make it less like other
feature functions. Add a similar canUseCMPXCHG8B() that aliases
hasCX8() to keep similar naming.
Differential Revision: https://reviews.llvm.org/D121978
If a X86ISD::BLENDV op appears before legalization (in this test case due to the icmp_slt x, 0) its constant mask was being treated as a vselect mask (mask != 0) instead of blendv (mask < 0)
This just prevents constant folding entirely for non-VSELECT ops.
We can use MOVMSK+TEST/BT to extract individual bool elements even if the index isn't constant
This relies on combineBitcastvxi1 so some AVX512 cases still aren't optimized as they avoid MOVMSK usage.
- Rename Mode*Bit to Is*Bit to match X86Subtarget.
- Rename FeatureLAHFSAHF to FeatureLAHFSAHF64 to match X86Subtarget.
- Use consistent capitalization.
Reviewed By: skan
Differential Revision: https://reviews.llvm.org/D121975
The compiler only emits a comment for `Int_MemBarrier`, so it should
be marked as a meta-instruction, which can help improve the accuracy
of debug locations.
Reviewed By: pengfei
Differential Revision: https://reviews.llvm.org/D121879
Splat loads are inexpensive on X86. For a 2-lane vector we need just one
instruction: `movddup (%reg), %xmm0`. Using the standard Splat score leads
to worse code. This patch adds a new score dedicated to splat loads.
Please note that a splat is usually three IR instructions:
- It is usually a load and 2 inserts:
%ld = load double, double* %gep
%ins1 = insertelement <2 x double> poison, double %ld, i32 0
%ins2 = insertelement <2 x double> %ins1, double %ld, i32 1
- But it can also be a load, an insert and a shuffle:
%ld = load double, double* %gep
%ins = insertelement <2 x double> poison, double %ld, i32 0
%shf = shufflevector <2 x double> %ins, <2 x double> poison, <2 x i32> zeroinitializer
Because of this some of the lit tests contain more IR instructions.
Differential Revision: https://reviews.llvm.org/D121354
We favor 'and' and 'test' in earlier phases of optimization,
and that's usually the better option, but we can save a few
instruction bytes by converting a mask constant to a shift here.
Differential Revision: https://reviews.llvm.org/D121147
Fix prefix emission order to emit REX immediately before the opcode (SDM vol2,
2.1, Figure 2-1). According to SDM vol2 2.2.1, "Other placements are ignored".
This fix has a side effect of outputting segment override prefix in a different
order than previously (benign).
Follow-up to https://reviews.llvm.org/D120592
Reviewed By: skan, craig.topper
Differential Revision: https://reviews.llvm.org/D120871
Print and emit redundant Address-Size override prefix if it's set on the
instruction.
Reviewed By: skan
Differential Revision: https://reviews.llvm.org/D120592
Optimize a pattern where a sequence of 8/16 or 32 bits is tested for
zero: LLVM normalizes this towards an `AND` with a mask, which is usually
good, but does not work well on X86 when the mask does not fit into a
64-bit register. This DAGToDAG peephole transforms sequences like:
```
movabsq $562941363486720, %rax # imm = 0x1FFFE00000000
testq %rax, %rdi
```
to
```
shrq $33, %rdi
testw %di, %di
```
The result has a shorter encoding and saves a register if the tested
value isn't used otherwise.
Differential Revision: https://reviews.llvm.org/D121320
This replaces the attempt in 20af71f8ec to use combineToExtendBoolVectorInReg to create X86ISD::BLENDV masks directly, instead we use it to canonicalize the iX bitcast to a sign-extended mask and then truncate it back to vXi1 prior to legalization breaking it apart.
Fixes #53760
Discussed extensively on D98232. The functionality introduced in D35816
never worked correctly. In D98232, it was fixed, but, as it was
introducing a large compile-time regression, and the value of the
original patch was called into doubt, we disabled it by default
everywhere. A year later, it appears that caused no grief, so it seems
safe to remove the disabled code.
This should be accompanied by re-opening bug 26810.
Differential Revision: https://reviews.llvm.org/D121128
If we're comparing a value against zero, strip away any zero-extension and perform the comparison on the pre-extended value
Fixes #38308
Differential Revision: https://reviews.llvm.org/D121472
Move out the table and prepare the code to reuse it for the reverse mapping.
Follows the example of memory folding/unfolding tables in X86InstrFoldTables.cpp
Preparation step to unify `llvm::X86::getRelaxedOpcodeArith` and
`getShortArithOpcode` in BOLT X86MCPlusBuilder.cpp.
Addresses https://lists.llvm.org/pipermail/llvm-dev/2022-January/154526.html
Reviewed By: skan, MaskRay
Differential Revision: https://reviews.llvm.org/D121402
If the SETCC fp-condcode is supported on SSE as a single CMPPS/PD op then we can use convertIntLogicToFPLogic to reduce EFLAGS and XMM->GPR traffic like we do for AVX targets.
Differential Revision: https://reviews.llvm.org/D121210
Fix a number of issues with MCSymbolizer::tryAddingSymbolicOperand()
in X86Disassembler:
* Pass instruction size instead of immediate size.
* Correctly adjust the value of PC-relative operands.
* Set operand offset to zero when the operand is specified
implicitly.
Reviewed By: Amir, skan
Differential Revision: https://reviews.llvm.org/D121065
If the shift amount has been zero-extended, peek through as this might help us further canonicalize the shift amount.
Fixes regression mentioned in rG147cfcbef1255ba2b4875b76708dab1a685085f5
This completes the removal of uses of SelectionDAG::getSplatValue started in D119090 - by avoiding extracting the splatted element we make it a lot easier to zero-extend the bottom 64-bits of the shift amount and fixes issues we had on 32-bit targets where i64 isn't legal.
I've removed the old version of getTargetVShiftNode that took the scalar shift amount argument and LowerRotate can finally efficiently handle vXi16 rotates-by-scalar (using the same code as general funnel-shifts).
The only regression we see is in the X86-AVX2 PR52719 test case in vector-shift-ashr-256.ll - this is now hitting the same problem as the X86-AVX1 case (failure to simplify a multi-use X86ISD::VBROADCAST_LOAD) which I intend to address in a follow up patch.
Based off Agner and AMD SoG tables, the XOP VPHADD/VPHSUB unary horizontal ops are as fast as basic arithmetic ops, not the slower SSSE3 binary horizontal add/sub ops. This also matches what the bdver2 model already lists.
Noticed while investigating reduction add optimizations.
This wraps up D119053. The 2 headers are moved as described, the file
headers and include guards are fixed, all files where the old paths were
detected are updated (simple grep through the repo), and it is all
`clang-format`-ed.
Differential Revision: https://reviews.llvm.org/D119876
For i16/32/64 vectors, if the upper bits are known to be zero, then we can try to truncate to vXi8 (if it's worth it) and perform this as a PSADBW to add+zext each v4i8 subvector to an i64 sum, which we can then reduce together.
This addresses some of the PR42674 test cases where the source data was vXi8 but had been extended to match a wider unsigned integer accumulator.
Differential Revision: https://reviews.llvm.org/D120193
Introduce an option to expand all CMOV groups into hammocks, matching GCC's
`-fno-if-conversion2` flag. The motivation is to leave CMOV conversion
opportunities to a binary optimizer that can make the decision based on branch
misprediction rate (available e.g. in Intel's LBR).
Reviewed By: MaskRay, skan
Differential Revision: https://reviews.llvm.org/D119777
Using getSplatValue causes poor codegen due to not always being able to remove the EXTRACT_VECTOR_ELT created inside getSplatValue.
The vXi16 shifts/rotates are still showing occasional regressions but vXi8 is a definite improvement.
combineX86ShuffleChain no longer has to assume that the shuffle inputs are the right size, so don't create unnecessary nodes messing up oneuse limits as detailed on Issue #45319
combineX86ShuffleChain no longer has to assume that the shuffle inputs are the right size, so don't create unnecessary nodes messing up oneuse limits as detailed on Issue #45319
Removing widening from combineX86ShufflesRecursively will be the next step, followed by removing combineX86ShuffleChainWithExtract entirely
With only a load-fold the diffs look neutral. If there's a load and store (rmw)
fold opportunity as shown in the test based on #53862, then we end up with an
extra instruction.
Fixes #53862
Differential Revision: https://reviews.llvm.org/D120281
Peek through if we're extracting a non-zero'th subvector in an attempt to fold the extract into a lane-crossing shuffle
This also exposes a failure to fold extract_subvector(movddup(x),c) -> movddup(extract_subvector(x,c))
Extension to PR45974: unless we actually combine the target shuffles we shouldn't be generating temporary nodes, as they may interfere with the one-use checks in the shuffle recursions.
When building 32b x86 code as PIC, the existing handling of "i"
constraints is conservative since generally we have to go through the
GOT to find references to functions.
But generally, BlockAddresses from C code refer to the Function in the
current TU. Permit BlockAddresses to be used with the "i" constraint
for those cases.
I regressed this in
commit 4edb9983cb ("[SelectionDAG] treat X constrained labels as i for asm")
Fixes: https://github.com/llvm/llvm-project/issues/53868
Reviewed By: efriedma, MaskRay
Differential Revision: https://reviews.llvm.org/D119905
After D118128 relaxed the heuristic to require only one EFLAGS generating operand, it now makes sense to avoid X86ISD::SMUL/UMULO duplication as well.
Differential Revision: https://reviews.llvm.org/D119578
It's not particularly user-friendly to have to call `initLRU` everywhere. Also,
it wasn't particularly great that the LRU for registers used in a sequence was
also initialized by `initLRU`.
This patch hides this stuff behind some helper functions:
* `isAvailableAcrossAndOutOfSeq`
* `isAnyUnavailableAcrossOrOutOfSeq`
* `isAvailableInsideSeq`
This allows the user to avoid calling `initLRU` explicitly. Also, it allows
us to separate initializing the used-in-sequence LRU from the main LRU.
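A rough usage sketch (the signature is assumed from the description above, so treat it as illustrative rather than the exact API):
```cpp
#include "llvm/CodeGen/MachineOutliner.h"
using namespace llvm;

// Liveness is queried through the candidate; no explicit initLRU call.
bool regIsFreeAcrossCandidate(outliner::Candidate &C, Register Reg,
                              const TargetRegisterInfo &TRI) {
  return C.isAvailableAcrossAndOutOfSeq(Reg, TRI);
}
```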
Since both ARM and AArch64 check LR liveness in `insertOutlinedCall`, this
refactor requires that we de-const the Candidate there.
Some other quality-of-code improvements:
* LRUs in outliner::Candidate now have more descriptive names
* Use `Register` instead of `unsigned` in some places
* Improve readability in some places by using ranges rather than `std::for_each`
This is a preparatory commit for a larger compile time related change for the
AArch64 outliner.
Extend the existing split where we already do this for v32i16/v64i8
We can end up trying to use PCMPEQ/GT if the result needs to be sign-extended (typically due to the DAGCombiner::foldSextSetcc fold).
Fixes #53842
The "avoid trailing call pass" makes sure that no function ends with a call instruction for the purpose of the unwinder.
It starts off by skipping over any non-real instruction, which is approximated via the Pseudo and Meta properties. This sadly leads to issues when the last machine instruction is a STATEPOINT, as it is skipped despite lowering to a call.
This patch fixes the use of a statepoint in the trailing call position by making sure call instructions are not skipped.
Differential Revision: https://reviews.llvm.org/D119644
This is a retry of b4b97ec813 - that was reverted because it
could cause miscompiles by illegally reordering memory operations.
A new test based on #53695 is added here to verify we do not have
that same problem.
extract_vec_elt (load X), C --> scalar load (X+C)
As noted in the comment, DAGCombiner has this fold -- and the code in this
patch is adapted from DAGCombiner::scalarizeExtractedVectorLoad() -- but
x86 should benefit even if the loaded vector has other uses as long as we
apply some other x86-specific conditions. The motivating example from #50310
is shown in vec_int_to_fp.ll.
Fixes #50310
Fixes #53695
Differential Revision: https://reviews.llvm.org/D118376
D108887 fixed alignment mismatch by changing the caller's alignment in
ABI. However, we found some cases that still assume the alignment is the
vector size. This patch fixes them to avoid runtime crashes.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D114536
Extend the existing fold to use SimplifyMultipleUseDemandedBits as well as SimplifyDemandedVectorElts/SimplifyDemandedBits when attempting to simplify based off known zero vector elements.
To find uniform shift/rotation amounts, we currently use SelectionDAG::getSplatValue which creates a node that extracts the scalar value from the source vector, this makes it more difficult for later combines to remove the extraction and stay on the SIMD unit, and can be a problem when the scalar type is illegal (i.e. i64 vs v2i64 on 32-bit targets).
This patch begins to use SelectionDAG::getSplatSourceVector (which SelectionDAG::getSplatValue uses internally) and adds a new variant of getTargetVShiftNode that takes the source vector and the splat index, and adjusts the vector in place to create the zero-extended value suitable for the SSE PSLL/PSRL/PSRA uniform instructions.
I'm still addressing a number of regressions when used for normal vector shifts, so I've just handled the funnelshift/rotation lowering for this first patch. I can then focus on the yak shaving (SimplifyDemandedBits/Elts in particular) necessary to always use SelectionDAG::getSplatSourceVector.
Differential Revision: https://reviews.llvm.org/D119090
Pulled out of D106237, this replaces the X86ISD::AVG DAG node with the
generic ISD::AVGCEILU. It doesn't remove the detectAVGPattern method,
but the extra generic ISel matching does alter the existing test.
Differential Revision: https://reviews.llvm.org/D119073
Replace the *_EXTEND node with the raw operands, this will make it easier to use combineToExtendBoolVectorInReg for any boolvec extension combine.
Cleanup prep for Issue #53760
The introduction and some examples are on this page:
https://devblogs.microsoft.com/cppblog/announcing-jmc-stepping-in-visual-studio/
The `/JMC` flag enables these instrumentations:
- Insert, at the beginning of every function immediately after the prologue,
a call to `void __fastcall __CheckForDebuggerJustMyCode(unsigned char *JMC_flag)`.
The argument for `__CheckForDebuggerJustMyCode` is the address of a boolean
global variable (the global variable is initialized to 1) with the name
convention `__<hash>_<filename>`. All such global variables are placed in
the `.msvcjmc` section.
- The `<hash>` part of `__<hash>_<filename>` has a one-to-one mapping
with a directory path. MSVC uses some unknown hashing function. Here I
used DJB.
- Add a dummy/empty COMDAT function `__JustMyCode_Default`.
- Add `/alternatename:__CheckForDebuggerJustMyCode=__JustMyCode_Default` link
option via ".drectve" section. This is to prevent failure in
case `__CheckForDebuggerJustMyCode` is not provided during linking.
Implementation:
All the instrumentations are implemented in an IR codegen pass. The pass is placed immediately before CodeGenPrepare pass. This is to not interfere with mid-end optimizations and make the instrumentation target-independent (I'm still working on an ELF port in a separate patch).
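Conceptually, the instrumented code looks roughly like this at the source level (a sketch only - the hash value and function name are made up, and the real pass inserts the call at the IR level):
```cpp
// Flag global placed in .msvcjmc, initialized to 1; the name follows the
// __<hash>_<filename> convention described above.
unsigned char __A1B2C3_example_cpp = 1;

extern "C" void __fastcall __CheckForDebuggerJustMyCode(unsigned char *JMC_flag);

void someFunction() {
  // Inserted immediately after the prologue of every instrumented function.
  __CheckForDebuggerJustMyCode(&__A1B2C3_example_cpp);
  // ... original body ...
}
```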
Reviewed By: hans
Differential Revision: https://reviews.llvm.org/D118428
This new-ish LEA-fixup code path creates two substitutions for an
instruction number -- this is incorrect because each Value should be
replaced by a single replacement Value. Fix by deleting the duplicate
substitution. Add some test coverage for this path with debug-info
attached.
Differential Revision: https://reviews.llvm.org/D119232
This ensures that the Windows unwinder will work at every instruction
boundary, and allows other targets to read and write flags without
setting up a frame pointer.
Fixes GH-46875
Differential Revision: https://reviews.llvm.org/D119391
This fix is similar to 3cf3ffce240e ("Fix the TCRETURNmi64 bug differently."):
after allocating registers for index+base, we will only have one register left.
This bug affects linux kernel compilation for x86 target. Error happens when compiling kmod_si476x_core.
clang complains:
error: ran out of registers during register allocation
The full command is:
clang -Wp,-MMD,drivers/mfd/.si476x-cmd.o.d -nostdinc -isystem /opt/toolchain/main/lib/clang/14.0.0/include -I./arch/x86/include -I./arch/x86/include/generated -I./include -I./arch/x86/include/uapi -I./arch/x86/include/generated/uapi -I./include/uapi -I./include/generated/uapi -include ./include/linux/compiler-version.h -include ./include/linux/kconfig.h -include ./include/linux/compiler_types.h -D__KERNEL__ -Qunused-arguments -fmacro-prefix-map=./= -Wall -Wundef -Werror=strict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -fshort-wchar -fno-PIE -Werror=implicit-function-declaration -Werror=implicit-int -Werror=return-type -Wno-format-security -std=gnu89 -no-integrated-as --prefix=/usr/bin/ -Werror=unknown-warning-option -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -mno-avx -fcf-protection=none -m32 -msoft-float -mregparm=3 -freg-struct-return -fno-pic -mstack-alignment=4 -march=atom -mtune=atom -mtune=generic -Wa,-mtune=generic32 -ffreestanding -Wno-sign-compare -fno-asynchronous-unwind-tables -fno-delete-null-pointer-checks -Wno-frame-address -Wno-address-of-packed-member -O2 -Wframe-larger-than=1024 -fno-stack-protector -Wno-format-invalid-specifier -Wno-gnu -mno-global-merge -Wno-unused-but-set-variable -Wno-unused-const-variable -fomit-frame-pointer -ftrivial-auto-var-init=pattern -fno-stack-clash-protection -falign-functions=32 -Wdeclaration-after-statement -Wvla -Wno-pointer-sign -Wno-array-bounds -fno-strict-overflow -fno-stack-check -Werror=date-time -Werror=incompatible-pointer-types -Wno-initializer-overrides -Wno-format -Wno-sign-compare -Wno-format-zero-length -Wno-pointer-to-enum-cast -Wno-tautological-constant-out-of-range-compare -DKBUILD_MODFILE='"drivers/mfd/si476x-core"' -DKBUILD_BASENAME='"si476x_cmd"' -DKBUILD_MODNAME='"si476x_core"' -D__KBUILD_MODNAME=kmod_si476x_core -c -o drivers/mfd/si476x-cmd.o drivers/mfd/si476x-cmd.c
-------------
LLVM cannot compile the following code for the 32-bit x86 target: the tail call (TCRETURNmi) uses 2 registers for index+base, while we want to use more than one register for passing function args, and that is impossible.
This fix is similar to 3cf3ffce240e ("Fix the TCRETURNmi64 bug differently.").
We will only use a tail call when it uses <=1 registers for passing args.
```
struct BIG_PARM {
int ver;
};
static struct {
int (*foo) (struct BIG_PARM* a, void *b);
int (*bar) (struct BIG_PARM* a);
int (*zoo0) (void);
int (*zoo1) (void);
int (*zoo2) (void);
int (*zoo3) (void);
int (*zoo4) (void);
} vtable[] = {
[0] = {
.foo = (int (*)(struct BIG_PARM* a, void *b))0xdeadbeef,
},
};
int something(struct BIG_PARM *a, void* b) {
return vtable[a->ver].foo(a,b);
}
```
```
$ clang -std=gnu89 -m32 -mregparm=3 -mtune=generic -fno-strict-overflow -O2 -c t0.c -o t0.c.o
error: ran out of registers during register allocation
1 error generated.
```
Reviewed By: pengfei
Differential Revision: https://reviews.llvm.org/D118312
Previously the `let Predicates = ...` line only applied to the rr version, and
so VMOVSH was being emitted whenever HasAVX512 (the default) applied. This is
not right.
There are a few relevant forward declarations in there that may require
downstream users to add explicit includes:
llvm/MC/MCContext.h no longer includes llvm/BinaryFormat/ELF.h, llvm/MC/MCSubtargetInfo.h, llvm/MC/MCTargetOptions.h
llvm/MC/MCObjectStreamer.h no longer includes llvm/MC/MCAssembler.h
llvm/MC/MCAssembler.h no longer includes llvm/MC/MCFixup.h, llvm/MC/MCFragment.h
Counting preprocessed lines required to rebuild llvm-project on my setup:
before: 1052436830
after: 1049293745
Which is significant and backs up the change, in addition to the usual
benefits of decreased coupling between headers and compilation units.
Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D119244
The "-fzero-call-used-regs" option tells the compiler to zero out
certain registers before the function returns. It's also available as a
function attribute: zero_call_used_regs.
The two upper categories are:
- "used": Zero out used registers.
- "all": Zero out all registers, whether used or not.
The individual options are:
- "skip": Don't zero out any registers. This is the default.
- "used": Zero out all used registers.
- "used-arg": Zero out used registers that are used for arguments.
- "used-gpr": Zero out used registers that are GPRs.
- "used-gpr-arg": Zero out used GPRs that are used as arguments.
- "all": Zero out all registers.
- "all-arg": Zero out all registers used for arguments.
- "all-gpr": Zero out all GPRs.
- "all-gpr-arg": Zero out all GPRs used for arguments.
This is used to help mitigate Return-Oriented Programming exploits.
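For example, the function-attribute form can be applied per function (a usage sketch; the option spellings are the ones listed above):
```cpp
// Zero out used GPRs that carried arguments before this function returns.
__attribute__((zero_call_used_regs("used-gpr-arg")))
int sum(int a, int b) {
  return a + b;
}
```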
Reviewed By: nickdesaulniers
Differential Revision: https://reviews.llvm.org/D110869
Most Intel CPU scheduler files lumped the by-immediate and by-1 instructions
together, but uops.info shows they are quite different.
For the most part the by-1 instructions were pretty accurate to the uops.info
data, except that the latency was modelled as 3 instead of the 2 that
uops.info indicates. The by-immediate instructions need 7 or 8 uops and have
higher latency. It looks like the 8-bit by-immediate instructions may need
even more uops, but I just lumped them with the 16/32/64-bit ones.
Noticed while checking out PR53648. So mostly I cared about the by 1
instructions.
Reviewed By: RKSimon, pengfei
Differential Revision: https://reviews.llvm.org/D119217
As suggested by @craig.topper, relaxing LEA matching to only require the ADD to be fed from a single op with EFLAGS helps avoid duplication when the EFLAGS are consumed in a later, dependent instruction.
There was some concern about whether the heuristic is too simple, not taking into account lost loads that can't fold by using a LEA, but some basic tests (included in select-lea.ll) don't suggest that's really a problem.
Differential Revision: https://reviews.llvm.org/D118128
This is no-functional-change-intended because only the
x86 target enables the TLI hook currently.
We can add fmul/fdiv opcodes to the switch similar to the
proposal D119111, but we don't need to make other changes
like enabling target-specific combines.
We can also add integer opcodes (add, or, shl, etc.) to
the switch because this function is called from all of the
generic binary opcodes.
The goal is to incrementally enable the profitable diffs
from D90113 while avoiding regressions.
Differential Revision: https://reviews.llvm.org/D119150
In many cases, calls to isShiftedMask are immediately followed with checks to determine the size and position of the bitmask.
This patch adds variants of APInt::isShiftedMask, isShiftedMask_32 and isShiftedMask_64 that return these values as additional arguments.
I've updated a number of cases that were either performing separate size/position calculations or had created their own local wrapper versions of these.
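A sketch of how the new form reads (the exact signature is assumed from the description, so treat it as illustrative):
```cpp
#include "llvm/Support/MathExtras.h"
#include <cstdint>

// Recover the position and length of a shifted mask in one call instead of
// computing them separately after isShiftedMask_64 succeeds.
bool decomposeMask(uint64_t Mask, unsigned &MaskIdx, unsigned &MaskLen) {
  // On success: Mask == (((1ULL << MaskLen) - 1) << MaskIdx)
  return llvm::isShiftedMask_64(Mask, MaskIdx, MaskLen);
}
```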
Differential Revision: https://reviews.llvm.org/D119019
This is effectively inverting the transform added with D116804
because the downside of the false dependency of something like
"sbb %eax, %eax" is much greater than the upside of eliminating
a zeroing instruction on (all?) Intel CPUs.
Differential Revision: https://reviews.llvm.org/D118843
This reverts commit b4b97ec813.
As discussed in post-commit feedback at:
https://reviews.llvm.org/D118376
...there's a stage 2 failure on a Mac running a clang-refactor tool test.
As noted in D116804, we want to effectively invert that patch
for CPUs (intel) that don't break the false dependency on
sbb %eax, %eax
So we will likely want to create that here in the
X86DAGToDAGISel::Select() case for X86::SETCC_CARRY.
This is an intentionally limited/different form of D90113.
That patch bravely tries to generalize folds where we pull
a binop into the arms of a select:
N0 + (Cond ? 0 : FVal) --> Cond ? N0 : (N0 + FVal)
...but it is not universally profitable.
This is the inverse of IR canonicalization as discussed in
D113442.
We know that this transform is not entirely profitable even
within x86, so we only handle x86 vector fadd/fsub as a 1st
step. The intent is to prevent AVX512 regressions as mentioned
in D113442.
The plan is to port this to DAGCombiner (so it will eventually
look more like D90113) and add more types/cases in pieces with
many more tests to verify that we are seeing improvements.
Differential Revision: https://reviews.llvm.org/D118644
None of the external users actually touch these (they're purely used internally down the recursive call) - it's trivial to add another wrapper if anything ever does want to track known elements.
Based on the output of include-what-you-use.
This is a big chunk of changes. It is very likely to break downstream code
unless they took a lot of care in avoiding hidden header dependencies, something
the LLVM codebase doesn't do that well :-/
I've tried to summarize the biggest changes below:
- llvm/include/llvm-c/Core.h: no longer includes llvm-c/ErrorHandling.h
- llvm/IR/DIBuilder.h no longer includes llvm/IR/DebugInfo.h
- llvm/IR/IRBuilder.h no longer includes llvm/IR/IntrinsicInst.h
- llvm/IR/LLVMRemarkStreamer.h no longer includes llvm/Support/ToolOutputFile.h
- llvm/IR/LegacyPassManager.h no longer includes llvm/Pass.h
- llvm/IR/Type.h no longer includes llvm/ADT/SmallPtrSet.h
- llvm/IR/PassManager.h no longer includes llvm/Pass.h nor llvm/Support/Debug.h
And the usual count of preprocessed lines:
$ clang++ -E -Iinclude -I../llvm/include ../llvm/lib/IR/*.cpp -std=c++14 -fno-rtti -fno-exceptions | wc -l
before: 6400831
after: 6189948
200k lines less to process is not that bad ;-)
Discourse thread on the topic: https://llvm.discourse.group/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D118652
We already call SimplifyDemandedVectorElts using whether each vector mask element is zero/nonzero, this just extends this to also try SimplifyDemandedBits using the demanded bits mask generated from the nonzero elements.
This also requires an additional TargetLowering::SimplifyDemandedBits DemandedBits/DemandedElts wrapper.
Limit this to SSE41 - AVX1 targets to avoid UNPCKL(PSHUFB,PSHUFB), pre-SSE41 we don't have PACKUSDW/BLENDW and with AVX2 we can perform this as PERMQ(PSHUFB()).
Allows pow2 mask tests to avoid an unnecessary constant load.
Noticed while investigating how to extend MatchVectorAllZeroTest to support more allof/anyof patterns.
Without AVX512 (which can efficiently extend/truncate to vXi16/vXi32), unpacking/packing to vXi16 is more efficient than relying on the (uops-heavy) PBLENDV shift expansion.
extract_vec_elt (load X), C --> scalar load (X+C)
As noted in the comment, DAGCombiner has this fold -- and the code in this
patch is adapted from DAGCombiner::scalarizeExtractedVectorLoad() -- but
x86 should benefit even if the loaded vector has other uses as long as we
apply some other x86-specific conditions. The motivating example from #50310
is shown in vec_int_to_fp.ll.
Fixes #50310
Differential Revision: https://reviews.llvm.org/D118376
Previous folds by combineSetCCMOVMSK might have converted these to CMP when changing the bitwidth, and the CMP->SUB fold might not have happened (or will happen)
rG9103b73fe052 was assuming that we could OR/AND with the source vector, but that will fail on float/double vectors without bitcasting - it also missed the case that any_of checks might be testing less than all the source elements
InstCombine performs this more generally with SimplifyUsingDistributiveLaws, but we don't need anything that complex here - this is mainly to fix up cases where logic ops get created late on during lowering, often in conjunction with sext/zext ops for type legalization.
https://alive2.llvm.org/ce/z/gGpY5v
This reverts commit ef82063207.
- It conflicts with the existing llvm::size in STLExtras, which will now
never be called.
- Calling it without llvm:: breaks C++17 compat
There are no further reasons to limit this to cmpeq-with-zero; the outstanding regressions with lowering to PTEST have now been addressed.
Improves codegen for Issue #53379
vXi16 allof(cmp()) reduction patterns will have to pack the comparison
results to vXi8 to use PMOVMSKB.
If we're reducing cmpeq(), then we can compare the vXi8 halves directly - similar to what we already do for vXi64 -> vXi32 for cases without PCMPEQQ.
As suggested on PR53379, for all-of icmp-eq patterns, we can use ptest(sub(x,y)) on SSE41+ targets
This is a generalization of the existing allof(cmpeq(x,0)) -> ptest(x) pattern
We can probably extend this further, in particularly to handle 256-bit cases on pre-AVX2 targets, but this part of the generalization is pretty trivial
Fixes Issue #53379
MSVC currently doesn't support 80-bit long double. ICC supports it when
the option `/Qlong-double` is specified. Changing the alignment of f80
to 16 bytes so that we can be compatible with ICC's option.
Reviewed By: rnk, craig.topper
Differential Revision: https://reviews.llvm.org/D115942
Pulled out of D106237, this folds truncstore(extend(x)) back to store(x)
if the original store was legal. This can come up due to the order we
fold nodes. A fold from X86 needs to be adjusted to prevent infinite
loops, to have it pick the operand of a trunc more directly.
Differential Revision: https://reviews.llvm.org/D117901
Intel's CET/IBT requires every indirect branch target to be an ENDBR instruction. Because of that, the compiler needs to correctly emit these instructions in function prologues. Because this is a security feature, it is desirable that only actual indirect-branch-targeted functions are emitted with ENDBRs. While it is possible to identify address-taken functions through LTO, minimizing these ENDBR instructions remains a hard task for user-space binaries because exported functions may end up being reachable through PLT entries, which will use an indirect branch. Because this cannot be determined at compile time, the compiler currently emits ENDBRs in every non-local-linkage function.
Despite the challenge presented for user-space, the kernel landscape is different as no PLTs are used. With the intent of providing the most fit ENDBR emission for the kernel, kernel developers proposed an optimization named "ibt-seal" which replaces the ENDBRs for NOPs directly in the binary. The discussion of this feature can be seen in [1].
This diff brings the enablement of the flag -mibt-seal, which in combination with LTO enforces a different policy for ENDBR placement when the code model is set to "kernel". In this scenario, the compiler will only emit ENDBRs for address-taken functions, ignoring non-address-taken functions that don't have local linkage.
A comparison between an LTO-compiled kernel binaries without and with the -mibt-seal feature enabled shows that when -mibt-seal was used, the number of ENDBRs in the vmlinux.o binary patched by objtool decreased from 44383 to 33192, and that the number of superfluous ENDBR instructions nopped-out decreased from 11730 to 540.
The 540 missed superfluous ENDBRs need to be investigated further, but hypotheses are: assembly code not being taken care of by the compiler, kernel exported symbols mechanisms creating bogus address taken situations or even these being removed due to other binary optimizations like kernel's static_calls. For now, I assume that the large drop in the number of ENDBR instructions already justifies the feature being merged.
[1] - https://lkml.org/lkml/2021/11/22/591
Reviewed By: xiangzhangllvm
Differential Revision: https://reviews.llvm.org/D116070
AVX512 doesn't provide an ADDSUB instruction, but if we've built this from a build vector of scalar fsub/fadd elements we can still lower to blend(fsub,fadd).
Many of the x86 scheduler models are not accounting for their microarch's ability to handle dependency-breaking zero idioms (pxor xmm0,xmm0 etc.), which is causing some notable differences when comparing llvm-mca reports to iaca, uops.info etc.
These are based on the Intel AoMs and Agner's docs which list the instructions handled on each cpu model - there may be more, although tbh the xor/pxor/xorps/xorpd are by far the most commonly encountered.
Once this is in place we also need to review missing support for 'allones' idioms and reg-reg move elimination, but this needs fixing first.
@lebedev.ri The Barcelona test changes are due to the cpu still being tagged as using the SandyBridge model, if/when you get back to D63628 these will need to be addressed.
Based on an original patch by @andreadb (Andrea Di Biagio)
Differential Revision: https://reviews.llvm.org/D117497
This allows us to match shuffle<1,3,5,7,9,11,13,15> style shift+trunc/pack patterns as well as the existing shuffle<0,2,4,6,8,10,12,14> style shuffle trunc/pack patterns
In the future, interleaving patterns might benefit from an even more general implementation for higher strides
Allows shuffle combining through per-element shift nodes
This exposed a number of issues with shuffle combining with target intrinsics that are lowered to nodes later during legalization - in particular shuffle combining and SimplifyDemandedVectorElts were being called after canonicalizeShuffleWithBinOps, meaning that shuffles didn't have a chance to be combined away before the shuffle(binop(x,y)) -> binop(shuffle(x),shuffle(y)) fold.
Partial element rotate patterns (e.g. for element insertion on Issue #53124) were being split if every lane wasn't crossing, but really there's a good repeated mask hiding in there.
This is required to query the legality more precisely in the LoopVectorizer.
This adds another TTI function named 'forceScalarizeMaskedGather/Scatter'
to work around the hack introduced for MVE, where
isLegalMaskedGather/Scatter would return an answer by second-guessing
where the function was called from, based on the Type passed in (vector
vs scalar). The new interface makes this explicit. It is also used by
X86 to check for vector widths where gather/scatters aren't profitable
(or don't exist) for certain subtargets.
Differential Revision: https://reviews.llvm.org/D115329
MSVC currently doesn't support 80-bit long double. ICC supports it when
the option `/Qlong-double` is specified. Changing the alignment of f80
to 16 bytes so that we can be compatible with ICC's option.
Reviewed By: rnk, craig.topper
Differential Revision: https://reviews.llvm.org/D115942
memory-barrier instructions to providing targets and developers a convenient
way to explicitly declare which instructions are memory-barriers.
Differential Revision: https://reviews.llvm.org/D116779
This is a re-commit of e2c7ee0743 which
was reverted in a2a58d91e8 and
ea81cea816. This includes a fix to
consistently check for EFLAGS being live-out. See phabricator
review.
Original Summary:
This extends `optimizeCompareInstr` to re-use previous comparison
results if the previous comparison was with an immediate that was 1
bigger or smaller. Example:
```
CMP x, 13
...
CMP x, 12 ; can be removed if we change the SETg
SETg ...  ; x > 12 changed to `SETge` (x >= 13) removing CMP
```
Motivation: This often happens because SelectionDAG canonicalization
tends to add/subtract 1 often when optimizing for fallthrough blocks.
Example for `x > C` the fallthrough optimization switches true/false
blocks with `!(x > C)` --> `x <= C` and canonicalization turns this into
`x < C + 1`.
Differential Revision: https://reviews.llvm.org/D110867
This diff renames emitCalleeSavedFrameMoves to avoid conflicts with
non-virtual methods of derived classes having the same name but different semantics.
E.g. the class AArch64FrameLowering used to have (non-virtual) "emitCalleeSavedFrameMoves"
but it started to override TargetFrameLowering::emitCalleeSavedFrameMoves after
https://github.com/llvm/llvm-project/commit/c3e6555616 though its usage and semantics didn't change.
P.S. for x86 there was no conflict because the signature of
non-virtual X86FrameLowering::emitCalleeSavedFrameMoves is different
Test plan: make check-all
Differential revision: https://reviews.llvm.org/D114140
This is a suggested follow-up to D116765.
This removes a clear of the register operand, so it is better
for code size, but it does potentially create a false register
dependency on surrounding code. If that is a problem, it should
be solvable using dependency-breaking code that is used for
other instructions.
Differential Revision: https://reviews.llvm.org/D116804
To enable this on all targets there's still a number of regressions due to getSplatValue/getTargetVShiftNode but these don't really affect pre-AVX targets.
This is very similar to the existing ROTL/ROTR support for scalar shifts in LowerRotate, I think as time goes on we should be able to share much of this code in helpers between Funnel Shift + Rotation lowering.
select (X != 0), -1, Y --> 0 - X; or (sbb), Y
select (X != 0), Y, -1 --> X - 1; or (sbb), Y
We already had these x86 carry-flag transforms, but one was over-specified to
handle a "0" select arm only. That's just a special-case of the more general
pattern (the 'or' will be deleted if Y is zero).
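A scalar model of the carry trick behind the first line (my own snippet): `0 - X` sets the borrow exactly when X is nonzero, and `sbb r, r` turns that borrow into 0 or -1.
```cpp
#include <cassert>
#include <cstdint>

uint32_t fold(uint32_t X, uint32_t Y) {
  uint32_t borrow = X != 0;   // carry flag produced by '0 - X' (neg)
  uint32_t sbb = 0u - borrow; // sbb r, r => 0 or 0xFFFFFFFF
  return sbb | Y;             // or (sbb), Y
}

int main() {
  for (uint32_t X : {0u, 1u, 5u})
    for (uint32_t Y : {0u, 7u, ~0u})
      assert(fold(X, Y) == ((X != 0) ? ~0u : Y));
}
```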
This is part of solving #53006, but it misses that example because some other
combine has already converted that exact pattern into math ops.
Differential Revision: https://reviews.llvm.org/D116765