llvm-project

Author	SHA1	Message	Date
Phoebe Wang	12b203ea7c	[X86][FP16] Add the missing legal action for EXTRACT_SUBVECTOR Fixes #57340 Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D132563	2022-08-24 23:25:07 +08:00
Simon Pilgrim	9317e6311f	[TTI] Add SK_Splice shuffle mask detection and X86 costs Enables fixed sized vectors to detect SK_Splice shuffle patterns and provides basic X86 cost support Differential Revision: https://reviews.llvm.org/D132374	2022-08-23 20:07:30 +01:00
Philip Reames	df20ff9ae2	[TTI] Kill last couple uses of OperandValueKind in targets [nfc] Use the accessor methods on the containing class instead so that we can change the representation.	2022-08-23 08:54:41 -07:00
Philip Reames	c9608d57b8	[TTI] Plumb through OperandValueInfo in getMemoryOpCost [NFC] This has the effect of exposing the power-of-two property for use in memory op costing, but no target actually uses it yet. The main point of this change is simple consistency with the recently changed getArithmeticInstrCost, and to remove the last (interface) use of OperandValueKind.	2022-08-23 07:55:42 -07:00
Philip Reames	104fa367ee	[TTI] Use OperandValueInfo in getArithmeticInstrCost implementation [NFC] This change completes the process of replacing OperandValueKind and OperandValueProperties which were previously passed independently in this API with a single container class which contains both. This is the change which motivated the whole sequence which preceded it. In an original spike version of this change, I'd noticed a nasty bug: I'd changed the signature without changing names, and as a result, we silently passed additional information through a callsite which previously dropped the power-of-two fact. This might be harmless in most cases, but at least a couple clearly depended for correctness on not passing that property through. I did my best to split off prior changes which reduced the scope of this one, and which made it possible to use compiler assistance. For instance, every parameter which changes type in this change also changes name. This was intentional to make sure that every call site possibly affected must show up in the diff. This let me audit each one closely.	2022-08-22 15:16:39 -07:00
Philip Reames	478cf94378	[X86][AArch64][WebAsm][RISCV] Query operand properties instead of using enums directly [nfc] This is part of an ongoing transition to use OperandValueInfo which combines OperandValueKind and OperandValueProperties. This change adds some accessor methods and uses them to simplify backend code. The primary motivation of doing so is removing uses of the parameters so that an upcoming api change is less error prone.	2022-08-22 13:37:59 -07:00
Philip Reames	5e87a020a5	[X86][TTI] Rename OpNInfo to OpNKind [nfc] Both are reasonable names; this is solely that an upcoming change can use the OpNInfo name, and the compiler can tell me if I forgot to update something (instead of silently passing along properties that might not hold.)	2022-08-22 13:37:59 -07:00
Simon Pilgrim	dd5b48976c	[CostModel][X86] getShuffleCost - treat SK_Splice as SK_PermuteTwoSrc SK_Splice should be equivalent to a PALIGNR instruction etc. - but as discussed on D132308, until full fixed vector support for SK_Splice is in place, just assume it's an SK_PermuteTwoSrc.	2022-08-22 10:51:08 +01:00
Simon Pilgrim	7ff2a9f250	[CostModel][X86] Add CodeSize handling for fadd/fsub/fmul/fsqrt ops Eventually this will be part of the cost table lookup	2022-08-21 17:42:11 +01:00
Simon Pilgrim	3c4391b4bb	Revert rG15de7aaae52ef4be9f9ff3b130804e5b5ccd29f4 "[CostModel][X86] Add CodeSize/SizeLatency handling for fadd/fsub/fmul/fsqrt ops" This is unintentionally affecting some backend tests	2022-08-21 16:51:45 +01:00
Simon Pilgrim	15de7aaae5	[CostModel][X86] Add CodeSize/SizeLatency handling for fadd/fsub/fmul/fsqrt ops Eventually this will be part of the cost table lookup	2022-08-21 16:39:57 +01:00
Simon Pilgrim	5263155d5b	[CostModel] Add CostKind argument to getShuffleCost Defaults to TCK_RecipThroughput - as most explicit calls were assuming TCK_RecipThroughput (vectorizers) or were just doing a before-vs-after comparison (vectorcombiner). Calls via getInstructionCost were just dropping the CostKind, so again there should be no change at this time (as getShuffleCost and its expansions don't use CostKind yet) - but it will make it easier for us to better account for size/latency shuffle costs in inline/unroll passes in the future. Differential Revision: https://reviews.llvm.org/D132287	2022-08-21 10:54:51 +01:00
Kazu Hirata	8b1b0d1d81	Revert "Use std::is_same_v instead of std::is_same (NFC)" This reverts commit `c5da37e42d`. This patch seems to break builds with some versions of MSVC.	2022-08-20 23:00:39 -07:00
Kazu Hirata	c5da37e42d	Use std::is_same_v instead of std::is_same (NFC)	2022-08-20 22:36:26 -07:00
Kazu Hirata	258531b7ac	Remove redundant initialization of Optional (NFC)	2022-08-20 21:18:28 -07:00
Simon Pilgrim	fa96383506	[X86] Fold PMULUDQ(X,1) -> AND(X,(1<<32)-1) 'getZeroExtendInReg' Fix cases where shl/srem/urem expansion results in a mulh/mul_lohi(x,1) 'pass through' that gets lowered to pmuludq. Fixes #56684	2022-08-20 14:58:25 +01:00
Simon Pilgrim	a7441289e2	[X86] Fix znver1 256-bit ALU/Logic/Blend uop counts ymm instructions are double pumped on znver1 - noticed while trying to review size-latency costkinds numbers for D132216 Matches AMD 17h SOG / Agner / uops.info	2022-08-19 19:09:39 +01:00
Alexey Bataev	0e7ed32c71	[SLP]Cost for a constant buildvector. In many cases constant buildvector results in a vector load from a constant/data pool. Need to consider this cost too. Differential Revision: https://reviews.llvm.org/D126885	2022-08-19 08:02:42 -07:00
Alexey Bataev	d53e245951	[COST][NFC]Introduce OperandValueKind in getMemoryOpCost, NFC. Added OperandValueKind OpdInfo parameter to getMemoryOpCost functions to better estimate cost with immediate values. Part of D126885.	2022-08-19 07:33:00 -07:00
Simon Pilgrim	b864cad7b4	[CostModel][X86] Adjust SLM select costs to match poor throughput of pblendvb/blendvpd/blendvps	2022-08-18 17:05:38 +01:00
Simon Pilgrim	55b1a147f2	[CostModel][X86] getArithmeticInstrCost - use MUL/DIV/REM expansions for all cost kinds The cost tables still assume throughput, but the general expansion patterns should be good for any cost kind	2022-08-18 14:18:54 +01:00
Simon Pilgrim	fdec50182d	[CostModel] Replace getUserCost with getInstructionCost * Replace getUserCost with getInstructionCost, covering all cost kinds. * Remove getInstructionLatency, it's not implemented by any backends, and we should fold the functionality into getUserCost (now getInstructionCost) to make it easier for targets to handle the cost kinds with their existing cost callbacks. Original Patch by @samparker (Sam Parker) Differential Revision: https://reviews.llvm.org/D79483	2022-08-18 11:55:23 +01:00
Haohai Wen	f4410d471f	[X86] Add schedule module for Alderlake-P The X86SchedAlderlakeP.td file is automatically generated by schedtool (D130897). Most of the instructions' scheduling information is based on measured ADL-P data from uops.info. Some data is from GLC throughput/latency data provided in Intel documentation. The remaining instructions' scheduling information is taken from the Skylake client schedule model in order to get a relatively complete model. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D130959	2022-08-18 16:40:14 +08:00
Daniil Fukalov	7ed3d81333	[NFCI] Move cost estimation from TargetLowering to TargetTransformInfo. TargetLowering had two remaining InstructionCost-related members, `getTypeLegalizationCost()` and `getScalingFactorCost()`, but all other costs are processed in TTI. E.g. it is inconvenient to use other TTI members in these two functions when they are overridden in a target. Minor refactoring: `getTypeLegalizationCost()` now doesn't need a DataLayout parameter - it was always passed from TTI. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D117723	2022-08-18 00:38:55 +03:00
Nick Desaulniers	6b0e2fa6f0	[SelectionDAG] make INLINEASM_BR use MachineBasicBlocks instead of BlockAddresses As part of re-architecting callbr to no longer use blockaddresses (https://reviews.llvm.org/D129288), we don't really need them in MIR. They make comparing MachineBasicBlocks of indirect targets during MachineVerifier a PITA. Suggested by @efriedma from the discussion: https://reviews.llvm.org/D130290#3669531 Reviewed By: efriedma, void Differential Revision: https://reviews.llvm.org/D130316	2022-08-17 09:34:31 -07:00
Eli Friedman	cfd2c5ce58	Untangle the mess which is MachineBasicBlock::hasAddressTaken(). There are two different senses in which a block can be "address-taken". There can be a BlockAddress involved, which means we need to map the IR-level value to some specific block of machine code. Or there can be constructs inside a function which involve using the address of a basic block to implement certain kinds of control flow. Mixing these together causes a problem: if target-specific passes are marking random blocks "address-taken", if we have a BlockAddress, we can't actually tell which MachineBasicBlock corresponds to the BlockAddress. So split this into two separate bits: one for BlockAddress, and one for the machine-specific bits. Discovered while trying to sort out related stuff on D102817. Differential Revision: https://reviews.llvm.org/D124697	2022-08-16 16:15:44 -07:00
Bing1 Yu	807b8cb06c	[X86] Fix a lowering issue of mask.compress which has undef float passthrough Previously, LegalizeDAG didn't check whether mask.compress's passthrough might be float, and this led to a getConstant crash since getConstant doesn't support fp Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D131947	2022-08-16 17:54:45 +08:00
Simon Pilgrim	a7b85e4c0c	[X86] Freeze shl(x,1) -> add(x,x) vector fold (PR50468) Vector fold shl(x,1) -> add(freeze(x),freeze(x)) to avoid the undef issues identified in PR50468 Differential Revision: https://reviews.llvm.org/D106675	2022-08-15 16:17:21 +01:00
Simon Pilgrim	41bdb8cd36	[X86] Fold insert_vector_elt(undef, elt, 0) --> scalar_to_vector(elt) I had hoped to make this a generic fold in DAGCombine, but there's quite a few regressions in Thumb2 MVE that need addressing first. Fixes regressions from D106675.	2022-08-15 14:56:30 +01:00
Simon Pilgrim	8b47e29fa0	[X86] combineVectorShiftImm - fold (shl (add X, X), C) -> (shl X, (C + 1)) Noticed while investigating the regressions in D106675	2022-08-14 17:42:02 +01:00
Phoebe Wang	8b69549dc5	[X86][FP16] Promote FP16->[U]INT to FP16->FP32->[U]INT This is to avoid f16->i64 being lowered to `__fixhfdi/__fixunshfdi` on 32-bits since neither libgcc nor compiler-rt provide them. https://godbolt.org/z/cjWEsea5v It also helps to improve the performance by promoting the vector type. Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D131828	2022-08-14 09:37:33 +08:00
Kazu Hirata	109df7f9a4	[llvm] Qualify auto in range-based for loops (NFC) Identified with readability-qualified-auto.	2022-08-13 12:55:42 -07:00
James Y Knight	4d7f9b7489	X86: Don't fold TEST into ADD ...@GOTTPOFF/GOTNTPOFF/INDNTPOFF The linker may convert such an ADD into a LEA, so we must not use the EFLAGS output. This causes miscompiles with -fsanitize=null after `bacdf80f42` added llvm.threadlocal.address -- previously, global variables were known to be non-null, but the intrinsic is not currently known to return nonnull. (That should be corrected, but it shouldn't've caused miscompiles!) Differential Revision: https://reviews.llvm.org/D131716	2022-08-12 20:52:00 +00:00
Simon Pilgrim	6ba5fc2dee	[X86] lowerShuffleWithVPMOV - support direct lowering to VPMOV on VLX targets lowerShuffleWithVPMOV currently only matches shuffle(truncate(x)) patterns, but on VLX targets the truncate isn't usually necessary to make the VPMOV node worthwhile (as we're only targeting v16i8/v8i16 shuffles we're almost always ending up with a PSHUFB node instead). PACKSS/PACKUS are still preferred vs VPMOV due to their lower uop count. Fixes the remaining regression from the fixes in rG293899c64b75	2022-08-11 17:40:07 +01:00
Simon Pilgrim	5dcf0c342b	[X86] lowerShuffleWithVPMOV - remove oneuse constraints on shuffle(trunc(x),undef) -> vpmov(x) lowering These were added in rG057bdd63 but shuffle combining has gotten a lot better at folding different vector widths since then.	2022-08-11 14:06:42 +01:00
aqjune	02e56e2533	[CodeGen] Generate efficient assembly for freeze(poison) version of `mm_cast` intel intrinsics This patch makes the variants of `mm_cast` intel intrinsics that use `shufflevector(freeze(poison), ..)` emit efficient assembly. (These intrinsics are planned to use `shufflevector(freeze(poison), ..)` after shufflevector's semantics update; relevant thread: D103874) To do so, this patch 1. Updates `LowerAVXCONCAT_VECTORS` in X86ISelLowering.cpp to recognize `FREEZE(UNDEF)` operand of `CONCAT_VECTOR` in addition to `UNDEF` 2. Updates X86InstrVecCompiler.td to recognize `insert_subvector` of `FREEZE(UNDEF)` vector as its first operand. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D130339	2022-08-11 13:36:21 +09:00
Amaury Séchet	9bceb8981d	[X86] (0 - SetCC) | C -> (zext (not SetCC)) * (C + 1) - 1 if we can get a LEA out of it. This addresses various regressions in D131260, and is also a useful optimization in itself. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D131358	2022-08-10 15:12:00 +00:00
Simon Pilgrim	b92c7dc211	[X86] Use DAG.getFreeze() to create freeze node. NFC.	2022-08-10 15:03:56 +01:00
Alex Bradbury	7e7860c5d7	[X86][NFCI] Remove target-specific branch optimisation that's handled in BranchFolding This specific optimisation is handled in OptimizeBlock in BranchFolding so is redundant. As discussed on the review thread, I've verified that we have test coverage for that optimisation within test/CodeGen/X86 by disabling the BranchFolding version of this transform after applying this patch and rerunning the test suite. Differential Revision: https://reviews.llvm.org/D129204	2022-08-10 10:35:31 +01:00
Phoebe Wang	c7ec6e19d5	[X86][BF16] Make backend type bf16 follow the psABI The X86 psABI has been updated to support the __bf16 type, whose ABI is the same as FP16. See https://discourse.llvm.org/t/patch-add-optional-bfloat16-support/63149 This is an alternative to D129858, which has less code modification and supports the vector type as well. Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D130832	2022-08-10 08:58:56 +08:00
Luo, Yuanke	aaf6c7b05c	[globalisel] Select register bank for DBG_VALUE The register operand of DBG_VALUE is not assigned a proper register bank in either AArch64 or X86. This would cause a getRegClass crash after global ISel. After discussion, we think the MIR should assume that all virtual registers have been assigned a proper register class after global ISel, so this patch fixes the gap for DBG_VALUE in AArch64 and X86. Differential Revision: https://reviews.llvm.org/D129037	2022-08-09 13:11:51 +08:00
Fangrui Song	de9d80c1c5	[llvm] LLVM_FALLTHROUGH => [[fallthrough]]. NFC With C++17 there is no Clang pedantic warning or MSVC C5051.	2022-08-08 11:24:15 -07:00
Simon Pilgrim	9ea54ac9ce	[X86] X86ISelDAGToDAG.cpp - use auto for all values derived from cast/dyn_cast (style). NFC.	2022-08-08 14:35:06 +01:00
Kazu Hirata	ba0407ba86	[llvm] Use range-based for loops (NFC)	2022-08-07 00:16:21 -07:00
Kazu Hirata	54199d805a	[x86] Remove unused declaration processWaitCnt (NFC) The declaration was introduced without a corresponding definition on Jan 2, 2022 in commit `85e6e748d4`.	2022-08-07 00:16:19 -07:00
Kazu Hirata	a2d4501718	[llvm] Fix comment typos (NFC)	2022-08-07 00:16:14 -07:00
Krzysztof Parzyszek	2bc390bdd6	[RDF] Use default TargetOperandInfo if not given in constructor All current in-tree users use the default implementation.	2022-08-06 14:32:52 -05:00
Dawid Jurczak	1bd31a6898	[NFC] Add SmallVector constructor to allow creation of SmallVector<T> from ArrayRef of items convertible to type T Extracted from https://reviews.llvm.org/D129781 and address comment: https://reviews.llvm.org/D129781#3655571 Differential Revision: https://reviews.llvm.org/D130268	2022-08-05 13:35:41 +02:00
Phoebe Wang	2312b747b8	[X86] Move getting module flag into `runOnMachineFunction` to reduce compile-time. NFCI Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D131245	2022-08-05 01:58:17 -07:00
Phoebe Wang	7f648d27a8	Reland "[X86][MC] Always emit `rep` prefix for `bsf`" The `BMI` instruction `tzcnt` has better performance than `bsf` on newer processors. Its encoding is that of `bsf` with a mandatory '0xf3' prefix. If we always emit a `rep` prefix for `bsf`, the same code gains better performance when run on newer processors. GCC already does this: https://c.godbolt.org/z/6xere6fs1 Fixes #34191 Reviewed By: craig.topper, skan Differential Revision: https://reviews.llvm.org/D130956	2022-08-05 10:22:48 +08:00
Mingming Liu	bc8f2f3649	[AArch64][TTI][NFC] Overload method 'getVectorInstrCost' to provide vector instruction itself, as a context information for cost estimation. 1) Overloaded (instruction-based) method is a wrapper around the current (opcode-based) method. 2) This patch also changes a few callsites (VectorCombine.cpp, SLPVectorizer.cpp, CodeGenPrepare.cpp) to call the overloaded method. 3) This is a split of D128302. Differential Revision: https://reviews.llvm.org/D131114	2022-08-04 12:58:25 -07:00
Phoebe Wang	6f867f9102	[X86] Support ``-mindirect-branch-cs-prefix`` for call and jmp to indirect thunk This is to address feature request from https://github.com/ClangBuiltLinux/linux/issues/1665 Reviewed By: nickdesaulniers, MaskRay Differential Revision: https://reviews.llvm.org/D130754	2022-08-04 15:12:15 +08:00
Craig Topper	91e8079cd5	[X86] Teach PostprocessISelDAG to fold ANDrm+TESTrr when chain result is used. The isOnlyUserOf prevented the fold if the chain result had any users. What we really care about is that the data result from the AND is only used by the TEST, and the flags results from the ANDs aren't used at all. It's ok if the chain has users, we just need to replace those users with the chain from the TESTrm. Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D131117	2022-08-03 21:00:22 -07:00
Craig Topper	84e9194828	Revert "[X86][MC] Always emit `rep` prefix for `bsf`" This reverts commit `c2066d19cd`. It's causing failures on the build bots.	2022-08-03 14:51:34 -07:00
Craig Topper	ff91b2d9df	[X86] Promote i16 CTTZ/CTTZ_ZERO_UNDEF always. If we're going to emit a rep prefix before bsf as proposed in D130956, it makes sense to promote i16 operations to i32 to avoid the false dependency of tzcntw. Reviewed By: skan, pengfei Differential Revision: https://reviews.llvm.org/D130995	2022-08-03 13:12:20 -07:00
David Truby	9a976f3661	[llvm] Always use TargetConstant for FP_ROUND ISD Nodes This patch ensures consistency in the construction of FP_ROUND nodes such that they always use ISD::TargetConstant instead of ISD::Constant. This additionally fixes a bug in the AArch64 SVE backend where patterns were matching against TargetConstant nodes and sometimes failing when passed a Constant node. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D130370	2022-08-03 14:02:11 +01:00
Phoebe Wang	c2066d19cd	[X86][MC] Always emit `rep` prefix for `bsf` The `BMI` instruction `tzcnt` has better performance than `bsf` on newer processors. Its encoding is that of `bsf` with a mandatory '0xf3' prefix. If we always emit a `rep` prefix for `bsf`, the same code gains better performance when run on newer processors. GCC already does this: https://c.godbolt.org/z/6xere6fs1 Fixes #34191 Reviewed By: skan Differential Revision: https://reviews.llvm.org/D130956	2022-08-03 17:09:36 +08:00
Liu, Chen3	5bbb0a831f	[X86] Use `X86MemOperand` instead of `Operand` for `i32mem_TC` and `i64mem_TC` To fix a build failure when X86_GEN_FOLD_TABLES is enabled. Differential Revision: https://reviews.llvm.org/D131049	2022-08-03 16:17:51 +08:00
Phoebe Wang	23021d4d8c	[X86][FP16] Fix vector_shuffle and lowering without f16c feature problems The problem Alexander reported on D127982 was caused by an optimization for AVX512-FP16 instructions. We must limit it to when the feature is enabled. During the investigation, I found we didn't expand fp_round/fp_extend without F16C. This may result in a runtime crash, so change them too. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D130817	2022-08-02 22:26:41 +08:00
Sotiris Apostolakis	995b61cdac	[SelectOpti] Auto-disable other cmov optis when the new select-opti pass is enabled Reviewed By: davidxl Differential Revision: https://reviews.llvm.org/D129817	2022-08-02 00:19:59 +00:00
Jay Foad	a5a7a9da39	[X86] Fix updating LiveVariables in convertToThreeAddress Fix all instances of: * Bad machine code: Kill missing from LiveVariables * in the X86 CodeGen tests with D129213 applied, which adds verification of LiveIntervals after the TwoAddressInstruction pass runs. Differential Revision: https://reviews.llvm.org/D129634	2022-08-01 13:45:21 +01:00
Kazu Hirata	bf6021709a	Use drop_begin (NFC)	2022-07-31 15:17:09 -07:00
Simon Pilgrim	acb5abb7d3	[X86] getFauxShuffleMask - use DemandedElts variant of getTargetShuffleInputs. NFCI. We don't specify the demanded elts yet, this patch just rewires the getTargetShuffleInputs calls and gives an "all demanded elts" mask.	2022-07-31 12:15:04 +01:00
Simon Pilgrim	9cdba33337	[X86] combineX86ShufflesRecursively - determine demanded elts to pass to getTargetShuffleInputs Only PACKSS/PACKUS faux shuffles make use of the demanded elts at the moment, but this at least improves the handling of a couple of truncation patterns.	2022-07-31 11:30:40 +01:00
Simon Pilgrim	df457f583a	[X86] Use std::tie so we can have more meaningful variable names for demanded bits/elts pairs. NFCI. .first + .second were proving difficult to keep track of.	2022-07-30 18:57:15 +01:00
Simon Pilgrim	a14f94c20c	[X86] computeKnownBitsForTargetNode - out of range X86ISD::VSRAI doesn't fold to zero Noticed by inspection and I can't seem to make a test case, but SSE arithmetic bit shifts clamp to the max shift amount (i.e. create a sign splat) - combineVectorShiftImm already does something similar.	2022-07-30 17:55:39 +01:00
Simon Pilgrim	813459ed2b	[X86] combineSelect fold 'smin' style pattern select(pcmpgt(RHS, LHS), LHS, RHS) -> select(pcmpgt(LHS, RHS), RHS, LHS) if pcmpgt(LHS, RHS) already exists Avoids repeated commuted comparisons when we're performing min/max and clamp patterns	2022-07-30 15:31:36 +01:00
Simon Pilgrim	bc2c4f6c85	[X86] combineAndnp - constant fold ANDNP(C,X) -> AND(~C,X) (REAPPLIED) If the LHS op has a single use then using the more general AND op is likely to allow commutation, load folding, generic folds etc. Updated version - original version rG057db2002bb3 didn't correctly account for multiple uses of the mask that might be folding "OR(AND(X,C),AND(Y,~C)) -> OR(AND(X,C),ANDNP(C,Y))" in canonicalizeBitSelect	2022-07-29 15:12:26 +01:00
Florian Hahn	f912bab111	Revert "[X86][DAGISel] Don't widen shuffle element with AVX512" This reverts commit `5fb4134210`. This patch is causing crashes when building llvm-test-suite when optimizing for CPUs with AVX512. Reproducer crashing with llc: target datalayout = "e-m:o-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128" target triple = "x86_64-apple-macosx" define i32 @test(<32 x i32> %0) #0 { entry: %1 = mul <32 x i32> %0, <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1> %2 = tail call i32 @llvm.vector.reduce.add.v32i32(<32 x i32> %1) ret i32 %2 } ; Function Attrs: nocallback nofree nosync nounwind readnone willreturn declare i32 @llvm.vector.reduce.add.v32i32(<32 x i32>) #1 attributes #0 = { "min-legal-vector-width"="0" "target-cpu"="skylake-avx512" } attributes #1 = { nocallback nofree nosync nounwind readnone willreturn }	2022-07-28 15:26:42 +01:00
Phoebe Wang	726d9f8e8c	[X86][MC] Avoid emitting incorrect warning for complex FMUL We will insert a new operand which is identical to the Dest for complex FMUL with a mask. https://godbolt.org/z/eTEdnYv3q Complex FMA and FMUL with maskz don't have this problem. Reviewed By: LuoYuanke, skan Differential Revision: https://reviews.llvm.org/D130638	2022-07-28 13:58:34 +08:00
Kazu Hirata	3f3930a451	Remove redundant virtual specifiers (NFC) Identified with tidy-modernize-use-override.	2022-07-25 23:00:59 -07:00
Luo, Yuanke	5fb4134210	[X86][DAGISel] Don't widen shuffle element with AVX512 Currently the X86 shuffle lowering widens the element type of a shuffle if adjacent mask elements are consecutive. For the example below %t2 = add nsw <16 x i32> %t0, %t1 %t3 = sub nsw <16 x i32> %t0, %t1 %t4 = shufflevector <16 x i32> %t2, <16 x i32> %t3, <16 x i32> <i32 16, i32 17, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15> ret <16 x i32> %t4 the compiler would transform the shuffle to %t4 = shufflevector <8 x i64> %t2, <8 x i64> %t3, <8 x i32> <i32 8, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7> This may lose the opportunity to let ISel select a masked instruction when avx512 is enabled. This patch prevents the transform when the avx512 feature is enabled. Thanks to Simon for the idea. Differential Revision: https://reviews.llvm.org/D129537	2022-07-26 11:56:03 +08:00
Craig Topper	00060a7b97	[X86] Custom type legalize v2i32 smulo/umulo to use a single pmuldq/pmuludq. With SSE4.1 and above we were using 3 multiply instructions. This was due to type legalization widening to v4i32 and the low half being done with pmulld while the high half used two pmuldq/pmuludq. Instead of that, we can use a single pmuludq/pmuldq to calculate the full product at once, extract the high and low bits and compare to check for overflow. I've restricted SMULO to sse4.1 to get pmuldq. We can probably do a fixup to pmuludq on earlier targets, but that's for another day. I was going through my git stash and found an early version of this patch from a year or two ago so I went ahead and finished it. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D130432	2022-07-25 09:12:35 -07:00
Kazu Hirata	b5188591a0	[llvm] Remove redundant virtual specifiers (NFC) Identified with modernize-use-override.	2022-07-24 21:50:35 -07:00
Simon Pilgrim	0708771cce	[DAG] MaskedVectorIsZero - don't bother with (-1).isSubsetOf mask check. NFC. Just use KnownBits::isZero() to ensure all the bits are known zero.	2022-07-24 13:12:21 +01:00
Simon Pilgrim	69d1e805ce	[X86] combineAndnp - remove unused variable. NFC.	2022-07-24 11:32:44 +01:00
Simon Pilgrim	ce81a0df67	[X86][SSE] Enable X86ISD::ANDNP constant folding	2022-07-24 11:07:34 +01:00
Simon Pilgrim	293899c64b	[X86] Don't assume an AND/ANDNP element is undef/undemanded just because one element is undef For mask ops like these, the other operand's corresponding element might be zero (result = zero) - so we must demand all the bits and that element. This appears to be what D128570 was trying to fix - both sides of the funnel shift mask of the vXi64 (legalized to v2Xi32) were incorrectly simplifying the upper 32-bit halves to undef, resulting in bad folds later on. I intend to address the test case regressions, but this close to the release branch I'd prefer to get a fix in first.	2022-07-24 10:53:38 +01:00
Simon Pilgrim	676a03d8a5	[X86] matchBinaryShuffle - limit SHUFFLE(X,Y) -> OR(X,Y) cases to where X + Y are the same width as the result Minor bit of prep work toward not unnecessarily widening shuffle operands in combineX86ShufflesRecursively, instead only widening in combineX86ShuffleChain if we actually find a match - see Issue #45319	2022-07-23 16:56:45 +01:00
Arnold Schwaighofer	58e6ee0e1f	llvm.swift.async.context.addr cannot be modeled as NoMem because we don't want it to be cse'd across async suspends An async suspend models the split between two partial async functions. `llvm.swift.async.context.addr` will have a different value in the two partial functions so it is not correct to generally CSE the instruction. rdar://97336162 Differential Revision: https://reviews.llvm.org/D130201	2022-07-22 11:50:58 -07:00
Phoebe Wang	02fe96b240	[X86][FP16] Do not split FP64->FP16 to FP64->FP32->FP16 Truncation from double to half is not always identical to truncating to float first and then to half. https://godbolt.org/z/56s9517hd On the other hand, expanding to float and then to double is always identical to expanding to double directly. https://godbolt.org/z/Ye8vbYPnY Reviewed By: RKSimon, skan Differential Revision: https://reviews.llvm.org/D130151	2022-07-22 08:36:05 +08:00
Haohai Wen	d946fb8d95	[X86] Make sure load size is not larger than stack slot Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D130084	2022-07-20 12:17:44 +08:00
Sanjay Patel	f0dd12ec5c	[x86] use zero-extending load of a byte outside of loops too (2nd try) The first attempt missed changing test files for tools (update_llc_test_checks.py). Original commit message: This implements the main suggested change from issue #56498. Using the shorter (non-extending) instruction with only -Oz ("minsize") rather than -Os ("optsize") is left as a possible follow-up. As noted in the bug report, the zero-extending load may have shorter latency/better throughput across a wide range of x86 micro-arches, and it avoids a potential false dependency. The cost is an extra instruction byte. This could cause perf ups and downs from secondary effects, but I don't think it is possible to account for those in advance, and that will likely also depend on exact micro-arch. This does bring LLVM x86 codegen more in line with existing gcc codegen, so if problems are exposed they are more likely to occur for both compilers. Differential Revision: https://reviews.llvm.org/D129775	2022-07-19 21:27:08 -04:00
Sanjay Patel	95401b0153	Revert "[x86] use zero-extending load of a byte outside of loops too" This reverts commit `9d1ea1774c`. Some tests of update_llc_test_checks.py were not updated.	2022-07-19 17:37:22 -04:00
Sanjay Patel	9d1ea1774c	[x86] use zero-extending load of a byte outside of loops too This implements the main suggested change from issue #56498. Using the shorter (non-extending) instruction with only -Oz ("minsize") rather than -Os ("optsize") is left as a possible follow-up. As noted in the bug report, the zero-extending load may have shorter latency/better throughput across a wide range of x86 micro-arches, and it avoids a potential false dependency. The cost is an extra instruction byte. This could cause perf ups and downs from secondary effects, but I don't think it is possible to account for those in advance, and that will likely also depend on exact micro-arch. This does bring LLVM x86 codegen more in line with existing gcc codegen, so if problems are exposed they are more likely to occur for both compilers. Differential Revision: https://reviews.llvm.org/D129775	2022-07-19 16:43:47 -04:00
Bing1 Yu	e01bf5a3e2	[X86] Promote v32f16's fadd into v32f32's fadd when it is avx512 without avx512fp16 Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D130059	2022-07-19 14:37:50 +08:00
Matt Arsenault	8d0383eb69	CodeGen: Remove AliasAnalysis from regalloc This was stored in LiveIntervals, but not actually used for anything related to LiveIntervals. It was only used in one check for if a load instruction is rematerializable. I also don't think this was entirely correct, since it was implicitly assuming constant loads are also dereferenceable. Remove this and rely only on the invariant+dereferenceable flags in the memory operand. Set the flag based on the AA query upfront. This should have the same net benefit, but has the possible disadvantage of making this AA query nonlazy. Preserve the behavior of assuming pointsToConstantMemory implying dereferenceable for now, but maybe this should be changed.	2022-07-18 17:23:41 -04:00
Benjamin Kramer	9234a7c0df	[X86][FP16] Don't crash when lowering SELECT on fp16 vectors This is a regression from `f187948162`	2022-07-18 13:41:00 +02:00
Phoebe Wang	f187948162	[X86][FP16] Enable vector support for FP16 emulation This is a follow-up to D107082; it enables vector support according to the psABI. Reviewed By: skan Differential Revision: https://reviews.llvm.org/D127982	2022-07-16 09:38:58 +08:00
Phoebe Wang	190518da4b	[X86] Use generic tuning for "x86-64" if "tune-cpu" is not specified This is an alternative to D129154. See discussions on https://discourse.llvm.org/t/fast-scalar-fsqrt-tuning-in-x86/63605 Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D129647	2022-07-15 10:05:08 +08:00
David Green	3e0bf1c7a9	[CodeGen] Move instruction predicate verification to emitInstruction D25618 added a method to verify the instruction predicates for an emitted instruction, through verifyInstructionPredicates added into <Target>MCCodeEmitter::encodeInstruction. This is a very useful idea, but the implementation inside MCCodeEmitter made it only fire for object files, not assembly which most of the llvm test suite uses. This patch moves the code into the <Target>_MC::verifyInstructionPredicates method, inside the InstrInfo. This allows it to be called from other places, such as in this patch where it is called from the <Target>AsmPrinter::emitInstruction methods which should trigger for both assembly and object files. It can also be called from other places such as verifyInstruction, but that is not done here (it tends to catch errors earlier, but in reality just shows all the mir tests that have incorrect feature predicates). The interface was also simplified slightly, moving computeAvailableFeatures into the function so that it does not need to be called externally. The ARM, AMDGPU (but not R600), AVR, Mips and X86 backends all currently show errors in the test-suite, so have been disabled with FIXME comments. Recommitted with some fixes for the leftover MCII variables in release builds. Differential Revision: https://reviews.llvm.org/D129506	2022-07-14 09:33:28 +01:00
David Green	95252133e1	Revert "Move instruction predicate verification to emitInstruction" This reverts commit `e2fb8c0f4b` as it does not build for Release builds, and some buildbots are giving more warnings than I saw locally. Reverting to fix those issues.	2022-07-13 13:28:11 +01:00
David Green	e2fb8c0f4b	Move instruction predicate verification to emitInstruction D25618 added a method to verify the instruction predicates for an emitted instruction, through verifyInstructionPredicates added into <Target>MCCodeEmitter::encodeInstruction. This is a very useful idea, but the implementation inside MCCodeEmitter made it only fire for object files, not assembly which most of the llvm test suite uses. This patch moves the code into the <Target>_MC::verifyInstructionPredicates method, inside the InstrInfo. This allows it to be called from other places, such as in this patch where it is called from the <Target>AsmPrinter::emitInstruction methods which should trigger for both assembly and object files. It can also be called from other places such as verifyInstruction, but that is not done here (it tends to catch errors earlier, but in reality just shows all the mir tests that have incorrect feature predicates). The interface was also simplified slightly, moving computeAvailableFeatures into the function so that it does not need to be called externally. The ARM, AMDGPU (but not R600), AVR, Mips and X86 backends all currently show errors in the test-suite, so have been disabled with FIXME comments. Differential Revision: https://reviews.llvm.org/D129506	2022-07-13 12:53:32 +01:00
Simon Pilgrim	66bfd1ba8c	[X86] Move isInRange(ArrayRef<int>) inside assert to fix NDEBUG builds. NFC. Fix unused static function warning introduced by D129207	2022-07-12 21:51:07 +01:00
Nick Desaulniers	2240d72f15	[X86] initial -mfunction-return=thunk-extern support Adds support for: * `-mfunction-return=<value>` command line flag, and * `__attribute__((function_return("<value>")))` function attribute Where the supported <value>s are: * keep (disable) * thunk-extern (enable) thunk-extern enables clang to change ret instructions into jmps to an external symbol named __x86_return_thunk, implemented as a new MachineFunctionPass named "x86-return-thunks", keyed off the new IR attribute fn_ret_thunk_extern. The symbol __x86_return_thunk is expected to be provided by the runtime the compiled code is linked against and is not defined by the compiler. Enabling this option alone doesn't provide mitigations without corresponding definitions of __x86_return_thunk! This new MachineFunctionPass is very similar to "x86-lvi-ret". The <value>s "thunk" and "thunk-inline" are currently unsupported. It's not clear yet that they are necessary: whether the thunk pattern they would emit is beneficial or used anywhere. Should the <value>s "thunk" and "thunk-inline" become necessary, x86-return-thunks could probably be merged into x86-retpoline-thunks which has pre-existing machinery for emitting thunks (which could be used to implement the <value> "thunk"). Has been found to build+boot with corresponding Linux kernel patches. This helps the Linux kernel mitigate RETBLEED. * CVE-2022-23816 * CVE-2022-28693 * CVE-2022-29901 See also: * "RETBLEED: Arbitrary Speculative Code Execution with Return Instructions." * AMD SECURITY NOTICE AMD-SN-1037: AMD CPU Branch Type Confusion * TECHNICAL GUIDANCE FOR MITIGATING BRANCH TYPE CONFUSION REVISION 1.0 2022-07-12 * Return Stack Buffer Underflow / Return Stack Buffer Underflow / CVE-2022-29901, CVE-2022-28693 / INTEL-SA-00702 SystemZ may eventually want to support "thunk-extern" and "thunk"; both options are used by the Linux kernel's CONFIG_EXPOLINE. 
This functionality has been available in GCC since the 8.1 release, and was backported to the 7.3 release. Many thanks to folks who provided discreet review off-list due to the embargoed nature of this hardware vulnerability. Many Bothans died to bring us this information. Link: https://www.youtube.com/watch?v=IF6HbCKQHK8 Link: https://github.com/llvm/llvm-project/issues/54404 Link: https://gcc.gnu.org/legacy-ml/gcc-patches/2018-01/msg01197.html Link: https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/advisory-guidance/return-stack-buffer-underflow.html Link: https://arstechnica.com/information-technology/2022/07/intel-and-amd-cpus-vulnerable-to-a-new-speculative-execution-attack/?comments=1 Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ce114c866860aa9eae3f50974efc68241186ba60 Link: https://www.intel.com/content/www/us/en/security-center/advisory/intel-sa-00702.html Link: https://www.intel.com/content/www/us/en/security-center/advisory/intel-sa-00707.html Reviewed By: aaron.ballman, craig.topper Differential Revision: https://reviews.llvm.org/D129572	2022-07-12 09:17:54 -07:00
Xiang1 Zhang	a45dd3d814	[X86] Support -mstack-protector-guard-symbol Reviewed By: nickdesaulniers Differential Revision: https://reviews.llvm.org/D129346	2022-07-12 10:17:00 +08:00
Xiang1 Zhang	643786213b	Revert "[X86] Support -mstack-protector-guard-symbol" This reverts commit `efbaad1c4a` due to missing review info.	2022-07-12 10:14:32 +08:00
Xiang1 Zhang	efbaad1c4a	[X86] Support -mstack-protector-guard-symbol	2022-07-12 10:13:48 +08:00
spupyrev	eecd41aa09	Revert "Rebase: [Facebook] [MC] Introduce NeverAlign fragment type" This reverts commit `6d0528636a`.	2022-07-11 09:50:47 -07:00
Rafael Auler	6d0528636a	Rebase: [Facebook] [MC] Introduce NeverAlign fragment type Summary: Introduce NeverAlign fragment type. The intended usage of this fragment is to insert it before a pair of macro-op fusion eligible instructions. NeverAlign fragment ensures that the next fragment (first instruction in the pair) does not end at a given alignment boundary by emitting a minimal size nop if necessary. In effect, it ensures that a pair of macro-fusible instructions is not split by a given alignment boundary, which is a precondition for macro-op fusion in modern Intel Cores (64B = cache line size, see Intel Architecture Optimization Reference Manual, 2.3.2.1 Legacy Decode Pipeline: Macro-Fusion). This patch introduces functionality used by BOLT when emitting code with MacroFusion alignment already in place. The use case is different from BoundaryAlign and instruction bundling: - BoundaryAlign can be extended to perform the desired alignment for the first instruction in the macro-op fusion pair (D101817). However, this approach has higher overhead due to reliance on relaxation as BoundaryAlign requires in the general case - see https://reviews.llvm.org/D97982#2710638. - Instruction bundling: the intent of NeverAlign fragment is to prevent the first instruction in a pair ending at a given alignment boundary, by inserting at most one minimum size nop. It's OK if either instruction crosses the cache line. Padding both instructions using bundles to not cross the alignment boundary would result in excessive padding. There's no straightforward way to request instruction bundling to avoid a given end alignment for the first instruction in the bundle. LLVM: https://reviews.llvm.org/D97982 Manual rebase conflict history: https://phabricator.intern.facebook.com/D30142613 Test Plan: sandcastle Reviewers: #llvm-bolt Subscribers: phabricatorlinter Differential Revision: https://phabricator.intern.facebook.com/D31361547	2022-07-11 09:31:52 -07:00
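The padding rule described for the NeverAlign fragment can be modeled in a few lines. This is a minimal Python sketch, not the MC implementation. The function name `never_align_padding` is hypothetical. It assumes that a single 1-byte nop is the minimal nop and that the alignment is a power of two greater than 1.

```python
def never_align_padding(offset: int, first_insn_size: int, alignment: int) -> int:
    """Hypothetical model of the NeverAlign rule.

    Returns how many nop bytes to emit before the first instruction of a
    macro-fusible pair so that it does not *end* exactly on an alignment
    boundary. A single 1-byte nop is assumed to be the minimal nop.
    """
    assert alignment > 1 and (alignment & (alignment - 1)) == 0, "power of two expected"
    end = offset + first_insn_size
    # Ending exactly on the boundary would put the boundary between the pair.
    return 1 if end % alignment == 0 else 0


# A 3-byte instruction starting at offset 61 would end on the 64-byte boundary,
# splitting it from the following jcc, so one nop byte is emitted.
print(never_align_padding(61, 3, 64))  # 1
# Starting at offset 60, it ends at 63 and no padding is needed.
print(never_align_padding(60, 3, 64))  # 0
```

Because the rule only ever adds one byte, the first instruction may then cross the boundary, which the commit message states is acceptable.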
Simon Pilgrim	97868fb972	[X86] isTargetShuffleEquivalent - attempt to match SM_SentinelZero shuffle mask elements using known bits If the combined shuffle mask requires zero elements, we don't currently have much chance of matching them against the expected source vector. This patch uses the SelectionDAG::MaskedVectorIsZero wrapper to attempt to determine if the expected element we want to use is already known to be zero. I've also tightened up the ExpectedMask assertion to always be in range - we're never giving it a target shuffle mask that has sentinels at all - allowing us to remove some of the confusing bounds checks. This attempts to address some of the regressions uncovered by D129150 where we more aggressively fold shuffles as AND / 'clear' masks which results in more combined shuffles using SM_SentinelZero. Differential Revision: https://reviews.llvm.org/D129207	2022-07-11 15:29:44 +01:00
Nicolai Hähnle	ede600377c	ManagedStatic: remove many straightforward uses in llvm (Reapply after revert in `e9ce1a5880` due to Fuchsia test failures. Removed changes in lib/ExecutionEngine/ other than error categories, to be checked in more detail and reapplied separately.) Bulk remove many of the more trivial uses of ManagedStatic in the llvm directory, either by defining a new getter function or, in many cases, moving the static variable directly into the only function that uses it. Differential Revision: https://reviews.llvm.org/D129120	2022-07-10 10:29:15 +02:00
Nicolai Hähnle	e9ce1a5880	Revert "ManagedStatic: remove many straightforward uses in llvm" This reverts commit `e6f1f06245`. Reverting due to a failure on the fuchsia-x86_64-linux buildbot.	2022-07-10 09:54:30 +02:00
Nicolai Hähnle	e6f1f06245	ManagedStatic: remove many straightforward uses in llvm Bulk remove many of the more trivial uses of ManagedStatic in the llvm directory, either by defining a new getter function or, in many cases, moving the static variable directly into the only function that uses it. Differential Revision: https://reviews.llvm.org/D129120	2022-07-10 09:15:08 +02:00
Phoebe Wang	8fb083d33e	[X86][FP16] Add constrained FP support for scalar emulation This is a follow up patch to support constrained FP in FP16 emulation. Reviewed By: skan Differential Revision: https://reviews.llvm.org/D128114	2022-07-08 20:33:42 +08:00
Sanjay Patel	8b75671314	[SDAG] try to replace subtract-from-constant with xor This is almost the same as the abandoned D48529, but it allows splat vector constants too. This replaces the x86-specific code that was added with the alternate patch D48557 with the original generic combine. This transform is a less restricted form of an existing InstCombine and the proposed SDAG equivalent for that in D128080: https://alive2.llvm.org/ce/z/OUm6N_ Differential Revision: https://reviews.llvm.org/D128123	2022-07-08 08:14:24 -04:00
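The xor replacement is safe because subtracting X from C cannot borrow when every bit that may be set in X is also set in C. The brute-force Python check below sketches that condition for 8-bit values. In it, `maybe_ones` stands in for the complement of the known-zero bits that computeKnownBits would report. The helper names are hypothetical and this is not the DAGCombiner code. The check only exercises the sufficient direction of the condition.

```python
WIDTH = 8
MASK = (1 << WIDTH) - 1


def submasks(m: int):
    """Yield every value whose set bits are a subset of m (from m down to 0)."""
    x = m
    while True:
        yield x
        if x == 0:
            return
        x = (x - 1) & m


def fold_applies(c: int, maybe_ones: int) -> bool:
    """Sufficient condition sketched here: every bit that may be 1 in X is also 1 in C."""
    return (maybe_ones & ~c & MASK) == 0


def sub_equals_xor_for_all(c: int, maybe_ones: int) -> bool:
    """Check (C - X) mod 2**WIDTH == C ^ X for every X consistent with maybe_ones."""
    return all(((c - x) & MASK) == (c ^ x) for x in submasks(maybe_ones))


# X is, say, a zero-extended 4-bit value (only the low nibble may be set) and C = 15.
print(fold_applies(0x0F, 0x0F), sub_equals_xor_for_all(0x0F, 0x0F))  # True True
# C = 4 does not cover bit 0 of X: 4 - 1 == 3 but 4 ^ 1 == 5.
print(fold_applies(0x04, 0x01), sub_equals_xor_for_all(0x04, 0x01))  # False False
```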
Haohai Wen	18a1085e02	[X86] Fix collectLeaves for adds used by phi that forms loop When the add has additional users, we should identify whether the add's user (rather than the root's) is a phi that forms a loop. Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D129169	2022-07-08 10:39:02 +08:00
Phoebe Wang	6c535f9f1b	[X86][FP16] Fix crash when lowering copysign for f16 This addresses the assertion failure reported in https://reviews.llvm.org/D107082#3635612. Not sure whether it is a problem with promoting FCOPYSIGN + libcall FP_ROUND. The promotion sets the rounding mode to 1 `a442c62888/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp (L4810-L4814)`, while the libcall cannot handle a rounding mode equal to 1 `a442c62888/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp (L4324-L4328)`. So change the action to Expand to work around the problem. Reviewed By: clementval, MaskRay Differential Revision: https://reviews.llvm.org/D129294	2022-07-07 19:17:26 -07:00
Nicolai Hähnle	64a78c8501	Remove unnecessary includes of ManagedStatic.h Differential Revision: https://reviews.llvm.org/D129115	2022-07-07 14:29:20 +02:00
Tim Northover	8d9dc83f35	X86: add newline to end of FMA instruction comments. The newline is used by Disassembler.cpp (`emitComments`) to work out how to format them properly, and if there's no newline it goes into an infinite loop. Unfortunately I couldn't get llvm-objdump to be affected, only the MacOS otool utility which dlopens libLTO.	2022-07-07 12:35:28 +01:00
Simon Pilgrim	fbb51ac0ba	[X86] LowerShift - lower some shuffles directly to X86ISD::PSHUFLW nodes. These are expected to lower to X86ISD::PSHUFLW but we were seeing some regressions in D129150 because it had managed to exploit the masking of the shift amounts to create unintended clear masks instead.	2022-07-06 18:01:03 +01:00
Shilei Tian	1023ddaf77	[LLVM] Add the support for fmax and fmin in atomicrmw instruction This patch adds support for `fmax` and `fmin` operations in the `atomicrmw` instruction. For now (at least in this patch), the instruction will be expanded to a CAS loop. There are already a couple of targets supporting the feature. I'll create another patch(es) to enable them accordingly. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D127041	2022-07-06 10:57:53 -04:00
Paul Robinson	08e4fe6c61	[X86] Add RDPRU instruction Add support for the RDPRU instruction on Zen2 processors. User-facing features: - Clang option -m[no-]rdpru to enable/disable the feature - Support is implicit for znver2/znver3 processors - Preprocessor symbol __RDPRU__ to indicate support - Header rdpruintrin.h to define intrinsics - "rdpru" mnemonic supported for assembler code Internal features: - Clang builtin __builtin_ia32_rdpru - IR intrinsic @llvm.x86.rdpru Differential Revision: https://reviews.llvm.org/D128934	2022-07-06 07:17:47 -07:00
Craig Topper	2bfca35614	[X86] Disable combineVectorSizedSetCCEquality for soft float. The vector types aren't legal with soft float. Also disable under NoImplicitFloat for good measure. Fixes PR56351. Differential Revision: https://reviews.llvm.org/D129060	2022-07-04 08:33:30 -07:00
Simon Pilgrim	26708fa166	Revert rG057db2002bb3: [X86] combineAndnp - constant fold ANDNP(C,X) -> AND(~C,X) If the LHS op has a single use then using the more general AND op is likely to allow commutation, load folding, generic folds etc. Reverted due to reports from @alexfh about it causing an infinite loop (repro still pending).	2022-07-01 10:36:09 +01:00
Simon Pilgrim	e961e05d59	[SLP][X86] Add 32-bit vector stores to help vectorization opportunities Building on the work on D124284, this patch tags v4i8 and v2i16 vector loads as custom, enabling SLP to try to vectorize these types ending in a partial store (using the SSE MOVD instruction) - we already do something similar for 64-bit vector types. Differential Revision: https://reviews.llvm.org/D127604	2022-06-30 20:25:50 +01:00
Amir Ayupov	cb75faf40c	[X86][BOLT] Use getOperandType to determine memory access size Generate INSTRINFO_OPERAND_TYPE table in X86GenInstrInfo.inc. This diff adds support for instructions that were previously reported as having memory access size 0. It replaces the heuristic of looking at instruction register width to determine memory access width by instead checking the memory operand type using tablegen-provided tables. Reviewed By: skan Differential Revision: https://reviews.llvm.org/D126116	2022-06-30 00:25:32 -07:00
Luo, Yuanke	5cb0979870	[X86][AMX] Split greedy RA for tile register When we fill the shape into the tile configuration memory, the shape is taken from the AMX pseudo instruction. However, the register holding the shape may be split or spilled by the greedy RA. That causes us to fill the shape into the config memory after ldtilecfg is executed, so the shape configuration would be wrong. This patch splits tile register allocation from greedy register allocation, so that after tile registers are allocated, the shape registers are still virtual registers. The shape register may only be redefined or multi-defined by the phi elimination pass or the two-address pass, which doesn't affect tile register configuration. Differential Revision: https://reviews.llvm.org/D128584	2022-06-29 10:35:43 +08:00
Craig Topper	3706bdad4a	[X86] Remove unnecessary COPY from EmitLoweredCascadedSelect. I believe we already checked that the destination of the first CMOV is only used by the second CMOV so I don't think there is any reason we need the PHI to write the register that was used by the first CMOV. We can directly use the second CMOV destination and avoid the copy. This may be a left over from when the cascaded select handling was part of the main algorithm before it was refactored in D35685. Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D128124	2022-06-28 09:33:33 -07:00
Simon Pilgrim	0b998053db	[X86] combineConcatVectorOps - IsConcatFree must check extraction index Identified in the regression reported by @alexfh on rGb5d7beeb9792 - IsConcatFree wasn't ensuring the subvector extraction index matched the position it would be concatenated back into.	2022-06-27 11:46:49 +01:00
Nikita Popov	1061511008	[X86PreAMXConfig] Use IRBuilder to insert instructions (NFC) Use an IRBuilder to insert instructions in preWriteTileCfg(). While here, also remove some unnecessary bool return values. There are some test changes because the IRBuilder folds "trunc i16 8 to i8" to "i8 8", and that has knock-on effects on instruction naming. I ran into this when converting tests to opaque pointers and noticed that this pass introduces unnecessary "bitcast ptr to ptr" instructions.	2022-06-22 17:28:48 +02:00
Nikita Popov	fbb72530fe	[X86PreAMXConfig] Use MapVector to fix non-determinism We generate code by iterating over this map, so make sure that the order is deterministic.	2022-06-22 16:57:33 +02:00
Vasileios Porpodas	7a9ad25769	Recommit "[SLP][X86] Improve reordering to consider alternate instruction bundles" This reverts commit `6d6268dcbf`. Review: https://reviews.llvm.org/D125712	2022-06-21 18:35:29 -07:00
Vasileios Porpodas	6d6268dcbf	Revert "[SLP][X86] Improve reordering to consider alternate instruction bundles" This reverts commit `6f88acf410`.	2022-06-21 17:07:21 -07:00
Vasileios Porpodas	6f88acf410	[SLP][X86] Improve reordering to consider alternate instruction bundles During the reordering transformation we should try to avoid reordering bundles like fadd,fsub because this may block them being matched into a single vector instruction in x86. We do this by checking if a TreeEntry is such a pattern and adding it to the list of TreeEntries with orders that need to be considered. Differential Revision: https://reviews.llvm.org/D125712	2022-06-21 16:44:48 -07:00
Simon Pilgrim	ac4cb1775b	[X86] fold (and (mul x, c1), c2) -> (mul x, (and c1, c2)) iff c2 is all/no bits mask Noticed on D128216 - if we're zeroing out vector elements of a mul/mulh result then see if we can merge the and-mask into the mul by just multiplying by zero. Ideally we'd make this generic (similar to the existing foldSelectWithIdentityConstant?), but these cases are appearing very late, after the constants have been lowered to constant-pool loads.	2022-06-21 15:10:43 +01:00
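The fold holds lane by lane for two reasons. An all-ones lane of c2 leaves the product unchanged, and an all-zeros lane turns both sides into 0 because x * 0 == 0. The small Python model below uses wrapping 16-bit lanes and hypothetical helper names. It covers only mul, not mulh, and shows both why the fold works and why a partial mask is excluded.

```python
LANE_BITS = 16
LANE_MASK = (1 << LANE_BITS) - 1


def vmul(a, b):
    """Lane-wise wrapping multiply of two equal-length lists of lane values."""
    return [(x * y) & LANE_MASK for x, y in zip(a, b)]


def vand(a, b):
    """Lane-wise bitwise AND."""
    return [x & y for x, y in zip(a, b)]


def is_all_or_no_bits(mask):
    """Each lane of c2 must be all ones or all zeros for the fold to apply."""
    return all(m in (0, LANE_MASK) for m in mask)


x = [3, 0x1234, 0xFFFF, 7]
c1 = [5, 9, 2, 0x8001]
c2 = [LANE_MASK, 0, LANE_MASK, 0]

assert is_all_or_no_bits(c2)
before = vand(vmul(x, c1), c2)   # (and (mul x, c1), c2)
after = vmul(x, vand(c1, c2))    # (mul x, (and c1, c2))
print(before == after)  # True

# A partial mask such as 0x00FF is not covered: (0x100 * 1) & 0xFF != 0x100 * (1 & 0xFF).
print(vand(vmul([0x100], [1]), [0x00FF]) == vmul([0x100], vand([1], [0x00FF])))  # False
```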
Simon Pilgrim	057db2002b	[X86] combineAndnp - constant fold ANDNP(C,X) -> AND(~C,X) If the LHS op has a single use then using the more general AND op is likely to allow commutation, load folding, generic folds etc.	2022-06-21 12:31:01 +01:00
Simon Pilgrim	843d43e62a	[X86] computeKnownBitsForTargetNode - add X86ISD::VBROADCAST_LOAD handling This requires us to override the isTargetCanonicalConstantNode callback introduced in D128144, so we can recognise the various cases where a VBROADCAST_LOAD constant is being reused at different vector widths to prevent infinite loops.	2022-06-21 11:48:01 +01:00
Kazu Hirata	7a47ee51a1	[llvm] Don't use Optional::getValue (NFC)	2022-06-20 22:45:45 -07:00
Phoebe Wang	edcc68e86f	[X86] Make sure SF is updated when optimizing for `jg/jge/jl/jle` This fixes issue #56103. Reviewed By: mingmingl Differential Revision: https://reviews.llvm.org/D128122	2022-06-21 09:09:27 +08:00
Simon Pilgrim	8254966062	[X86] LowerINSERT_VECTOR_ELT - always lower v32i8/v16i16 allones insertions on AVX1 as OR ops v32i8/v16i16 blend shuffles on AVX1 will expand to OR(AND,ANDN) patterns which can be easily broken by other combines	2022-06-20 18:43:03 +01:00
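Inserting an all-ones element is equivalent to OR-ing with a constant that is all-ones in that lane and zero elsewhere, which is why the insertion can be lowered as an OR. The Python model below treats a v16i16 vector as a list of 16-bit lanes, and its helper names are hypothetical. It illustrates the equivalence only, not the X86ISelLowering code.

```python
LANE_MASK = 0xFFFF  # 16-bit lanes, as in v16i16


def insert_element(vec, value, idx):
    """Model of INSERT_VECTOR_ELT: replace lane idx with value."""
    out = list(vec)
    out[idx] = value
    return out


def or_with_allones_lane(vec, idx):
    """OR with a constant that is all-ones in lane idx and zero in every other lane."""
    const = [LANE_MASK if i == idx else 0 for i in range(len(vec))]
    return [a | b for a, b in zip(vec, const)]


v = list(range(16))
print(insert_element(v, LANE_MASK, 5) == or_with_allones_lane(v, 5))  # True
```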
Kazu Hirata	e0e687a615	[llvm] Don't use Optional::hasValue (NFC)	2022-06-20 10:38:12 -07:00
Simon Pilgrim	e4a124dda5	[DAG] Fold (srl (shl x, c1), c2) -> and(shl/srl(x, c3), m) Similar to the existing (shl (srl x, c1), c2) fold Part of the work to fix the regressions in D77804 Differential Revision: https://reviews.llvm.org/D125836	2022-06-20 08:37:38 +01:00
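The fold in the title can be checked exhaustively on a small bit width. In this model, c3 is |c1 - c2|, and the shift direction depends on which of c1 and c2 is larger. The mask m is the set of result bits that survive both original shifts, computed as srl(shl(all-ones, c1), c2). The Python sketch assumes 8-bit values and uses hypothetical helper names. It models the bit-level identity only, not the DAGCombiner code.

```python
W = 8
ALL = (1 << W) - 1


def shl(x: int, c: int) -> int:
    """Logical shift left, truncated to W bits."""
    return (x << c) & ALL


def srl(x: int, c: int) -> int:
    """Logical shift right."""
    return x >> c


def original(x: int, c1: int, c2: int) -> int:
    """(srl (shl x, c1), c2)"""
    return srl(shl(x, c1), c2)


def folded(x: int, c1: int, c2: int) -> int:
    """and(shl/srl(x, c3), m) with c3 = |c1 - c2|."""
    m = srl(shl(ALL, c1), c2)  # bits that survive both original shifts
    shifted = shl(x, c1 - c2) if c1 >= c2 else srl(x, c2 - c1)
    return shifted & m


print(original(0b10110110, 3, 5), folded(0b10110110, 3, 5))  # 5 5
print(original(0b10110110, 5, 3), folded(0b10110110, 5, 3))  # 24 24
```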
Amir Ayupov	c0128549b0	[TableGen][X86] Add Size field to X86MemOperand class Set Size appropriately in operand definitions and query it for dumping memory operand size table `getMemOperandSize` (follow-up use D126116) and `X86Disassembler::getMemOperandSize`. Excerpt from a produced `getMemOperandSize` table for X86: ``` static int getMemOperandSize(int OpType) { switch (OpType) { default: return 0; case OpTypes::i8mem: case OpTypes::i8mem_NOREX: return 8; case OpTypes::f16mem: case OpTypes::i16mem: return 16; case OpTypes::f32mem: case OpTypes::i32mem: return 32; ... ``` Reviewed By: skan, pengfei Differential Revision: https://reviews.llvm.org/D127787	2022-06-19 11:46:56 -07:00
Simon Pilgrim	ba3f2667b6	[DAG] Add MaskedVectorIsZero helper Equivalent to MaskedValueIsZero, except it checks whether all of the demanded vector elements are known to be zero	2022-06-19 17:56:30 +01:00
Simon Pilgrim	41455dd1dc	[X86] Remove isTargetShuffleSplat and just use SelectionDAG::isSplatValue shuffle(splat(x)) -> splat(x), it doesn't have to be a target specific broadcast	2022-06-19 11:22:57 +01:00
Kazu Hirata	129b531c9c	[llvm] Use value_or instead of getValueOr (NFC)	2022-06-18 23:07:11 -07:00
Kazu Hirata	47b39c5157	[X86] Use default member initialization (NFC) Identified with modernize-use-default-member-init.	2022-06-18 12:11:58 -07:00
Kazu Hirata	1590d39f2e	[X86] Use default member initialization (NFC) Identified with modernize-use-default-member-init.	2022-06-18 12:08:07 -07:00
Kazu Hirata	7c987bb4d9	[X86] Use default member initialization (NFC) Identified with modernize-use-default-member-init.	2022-06-18 12:05:34 -07:00
Simon Pilgrim	ac3f967382	[X86] canonicalizeShuffleWithBinOps - merge shuffles across binops if either source op is a known splat The shuffle of a splat (with no undefs) should always be removed	2022-06-18 17:14:00 +01:00
Simon Pilgrim	f42f2b7005	[X86] canonicalizeShuffleWithBinOps - merge unary shuffles across binops if either source op is a foldable load This mostly handles folding of constants that have already become loads, but we expose some generic load cases as well. This also exposes the chance to merge unary shuffles across X86ISD::ANDNP nodes with different scalar widths	2022-06-18 15:58:54 +01:00
Kazu Hirata	621f58e716	[Target, CodeGen] Use isImm(), isReg(), etc (NFC)	2022-06-18 07:41:04 -07:00
Simon Pilgrim	3c9123af9f	[X86] isShuffleFoldableLoad - ensure the load has one use. We'll only fold the load if it has one use. Makes no difference to existing tests but will be necessary for an upcoming patch to improve load folding as part of canonicalizeShuffleWithBinOps.	2022-06-18 14:51:55 +01:00
Phoebe Wang	655ba9c8a1	Reland "Reland "Reland "Reland "[X86][RFC] Enable `_Float16` type support on X86 following the psABI"""" This resolves problems reported in commit `1a20252978`. 1. Promote to float lowering for nodes XINT_TO_FP 2. Bail out f16 from shuffle combine because the vector type is not legal in this version	2022-06-17 21:34:05 +08:00
Benjamin Kramer	1a20252978	Revert "Reland "Reland "Reland "[X86][RFC] Enable `_Float16` type support on X86 following the psABI"""" This reverts commit `04a3d5f3a1`. I see two more issues: - uitofp/sitofp from i32/i64 to half now generates __floatsihf/__floatdihf, which exists in neither compiler-rt nor libgcc - This crashes when legalizing the bitcast: ``` ; RUN: llc < %s -mcpu=skx define void @main.45(ptr nocapture readnone %retval, ptr noalias nocapture readnone %run_options, ptr noalias nocapture readnone %params, ptr noalias nocapture readonly %buffer_table, ptr noalias nocapture readnone %status, ptr noalias nocapture readnone %prof_counters) local_unnamed_addr { entry: %fusion = load ptr, ptr %buffer_table, align 8 %0 = getelementptr inbounds ptr, ptr %buffer_table, i64 1 %Arg_1.2 = load ptr, ptr %0, align 8 %1 = getelementptr inbounds ptr, ptr %buffer_table, i64 2 %Arg_0.1 = load ptr, ptr %1, align 8 %2 = load half, ptr %Arg_0.1, align 8 %3 = bitcast half %2 to i16 %4 = and i16 %3, 32767 %5 = icmp eq i16 %4, 0 %6 = and i16 %3, -32768 %broadcast.splatinsert = insertelement <4 x half> poison, half %2, i64 0 %broadcast.splat = shufflevector <4 x half> %broadcast.splatinsert, <4 x half> poison, <4 x i32> zeroinitializer %broadcast.splatinsert9 = insertelement <4 x i16> poison, i16 %4, i64 0 %broadcast.splat10 = shufflevector <4 x i16> %broadcast.splatinsert9, <4 x i16> poison, <4 x i32> zeroinitializer %broadcast.splatinsert11 = insertelement <4 x i16> poison, i16 %6, i64 0 %broadcast.splat12 = shufflevector <4 x i16> %broadcast.splatinsert11, <4 x i16> poison, <4 x i32> zeroinitializer %broadcast.splatinsert13 = insertelement <4 x i16> poison, i16 %3, i64 0 %broadcast.splat14 = shufflevector <4 x i16> %broadcast.splatinsert13, <4 x i16> poison, <4 x i32> zeroinitializer %wide.load = load <4 x half>, ptr %Arg_1.2, align 8 %7 = fcmp uno <4 x half> %broadcast.splat, %wide.load %8 = fcmp oeq <4 x half> %broadcast.splat, %wide.load %9 = bitcast <4 x half> 
%wide.load to <4 x i16> %10 = and <4 x i16> %9, <i16 32767, i16 32767, i16 32767, i16 32767> %11 = icmp eq <4 x i16> %10, zeroinitializer %12 = and <4 x i16> %9, <i16 -32768, i16 -32768, i16 -32768, i16 -32768> %13 = or <4 x i16> %12, <i16 1, i16 1, i16 1, i16 1> %14 = select <4 x i1> %11, <4 x i16> %9, <4 x i16> %13 %15 = icmp ugt <4 x i16> %broadcast.splat10, %10 %16 = icmp ne <4 x i16> %broadcast.splat12, %12 %17 = or <4 x i1> %15, %16 %18 = select <4 x i1> %17, <4 x i16> <i16 -1, i16 -1, i16 -1, i16 -1>, <4 x i16> <i16 1, i16 1, i16 1, i16 1> %19 = add <4 x i16> %18, %broadcast.splat14 %20 = select i1 %5, <4 x i16> %14, <4 x i16> %19 %21 = select <4 x i1> %8, <4 x i16> %9, <4 x i16> %20 %22 = bitcast <4 x i16> %21 to <4 x half> %23 = select <4 x i1> %7, <4 x half> <half 0xH7E00, half 0xH7E00, half 0xH7E00, half 0xH7E00>, <4 x half> %22 store <4 x half> %23, ptr %fusion, align 16 ret void } ``` llc: llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp:977: void (anonymous namespace)::SelectionDAGLegalize::LegalizeOp(llvm::SDNode *): Assertion `(TLI.getTypeAction(DAG.getContext(), Op.getValueType()) == TargetLowering::TypeLegal || Op.getOpcode() == ISD::TargetConstant || Op.getOpcode() == ISD::Register) && "Unexpected illegal type!"' failed.	2022-06-17 09:43:07 +02:00
Phoebe Wang	04a3d5f3a1	Reland "Reland "Reland "[X86][RFC] Enable `_Float16` type support on X86 following the psABI""" Fix the crash on lowering X86ISD::FCMP.	2022-06-17 12:12:17 +08:00
Paul Robinson	ff0122dcce	[PS5] Emit ud2 for ubsan trap	2022-06-16 11:20:10 -07:00
Paul Robinson	77b00098f2	[PS5] Use same debug trap instruction as PS4	2022-06-16 11:03:03 -07:00
Frederik Gossen	3cd5696a33	Revert "Reland "Reland "[X86][RFC] Enable `_Float16` type support on X86 following the psABI""" This reverts commit `e1c5afa47d`. This introduces crashes in the JAX backend on CPU. A reproducer in LLVM is below. Let me know if you have trouble reproducing this. ; ModuleID = '__compute_module' source_filename = "__compute_module" target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128" target triple = "x86_64-grtev4-linux-gnu" @0 = private unnamed_addr constant [4 x i8] c"\00\00\00?" @1 = private unnamed_addr constant [4 x i8] c"\1C}\908" @2 = private unnamed_addr constant [4 x i8] c"?\00\\4" @3 = private unnamed_addr constant [4 x i8] c"%ci1" @4 = private unnamed_addr constant [4 x i8] zeroinitializer @5 = private unnamed_addr constant [4 x i8] c"\00\00\00\C0" @6 = private unnamed_addr constant [4 x i8] c"\00\00\00B" @7 = private unnamed_addr constant [4 x i8] c"\94\B4\C22" @8 = private unnamed_addr constant [4 x i8] c"^\09B6" @9 = private unnamed_addr constant [4 x i8] c"\15\F3M?" 
@10 = private unnamed_addr constant [4 x i8] c"e\CC\\;" @11 = private unnamed_addr constant [4 x i8] c"d\BD/>" @12 = private unnamed_addr constant [4 x i8] c"V\F4I=" @13 = private unnamed_addr constant [4 x i8] c"\10\CB,<" @14 = private unnamed_addr constant [4 x i8] c"\AC\E3\D6:" @15 = private unnamed_addr constant [4 x i8] c"\DC\A8E9" @16 = private unnamed_addr constant [4 x i8] c"\C6\FA\897" @17 = private unnamed_addr constant [4 x i8] c"%\F9\955" @18 = private unnamed_addr constant [4 x i8] c"\B5\DB\813" @19 = private unnamed_addr constant [4 x i8] c"\B4W_\B2" @20 = private unnamed_addr constant [4 x i8] c"\1Cc\8F\B4" @21 = private unnamed_addr constant [4 x i8] c"~3\94\B6" @22 = private unnamed_addr constant [4 x i8] c"3Yq\B8" @23 = private unnamed_addr constant [4 x i8] c"\E9\17\17\BA" @24 = private unnamed_addr constant [4 x i8] c"\F1\B2\8D\BB" @25 = private unnamed_addr constant [4 x i8] c"\F8t\C2\BC" @26 = private unnamed_addr constant [4 x i8] c"\82[\C2\BD" @27 = private unnamed_addr constant [4 x i8] c"uB-?" 
@28 = private unnamed_addr constant [4 x i8] c"^\FF\9B\BE" @29 = private unnamed_addr constant [4 x i8] c"\00\00\00A" ; Function Attrs: uwtable define void @main.158(ptr %retval, ptr noalias %run_options, ptr noalias %params, ptr noalias %buffer_table, ptr noalias %status, ptr noalias %prof_counters) #0 { entry: %fusion.invar_address.dim.1 = alloca i64, align 8 %fusion.invar_address.dim.0 = alloca i64, align 8 %0 = getelementptr inbounds ptr, ptr %buffer_table, i64 1 %Arg_0.1 = load ptr, ptr %0, align 8, !invariant.load !0, !dereferenceable !1, !align !2 %1 = getelementptr inbounds ptr, ptr %buffer_table, i64 0 %fusion = load ptr, ptr %1, align 8, !invariant.load !0, !dereferenceable !1, !align !2 store i64 0, ptr %fusion.invar_address.dim.0, align 8 br label %fusion.loop_header.dim.0 return: ; preds = %fusion.loop_exit.dim.0 ret void fusion.loop_header.dim.0: ; preds = %fusion.loop_exit.dim.1, %entry %fusion.indvar.dim.0 = load i64, ptr %fusion.invar_address.dim.0, align 8 %2 = icmp uge i64 %fusion.indvar.dim.0, 3 br i1 %2, label %fusion.loop_exit.dim.0, label %fusion.loop_body.dim.0 fusion.loop_body.dim.0: ; preds = %fusion.loop_header.dim.0 store i64 0, ptr %fusion.invar_address.dim.1, align 8 br label %fusion.loop_header.dim.1 fusion.loop_header.dim.1: ; preds = %fusion.loop_body.dim.1, %fusion.loop_body.dim.0 %fusion.indvar.dim.1 = load i64, ptr %fusion.invar_address.dim.1, align 8 %3 = icmp uge i64 %fusion.indvar.dim.1, 1 br i1 %3, label %fusion.loop_exit.dim.1, label %fusion.loop_body.dim.1 fusion.loop_body.dim.1: ; preds = %fusion.loop_header.dim.1 %4 = getelementptr inbounds [3 x [1 x half]], ptr %Arg_0.1, i64 0, i64 %fusion.indvar.dim.0, i64 0 %5 = load half, ptr %4, align 2, !invariant.load !0, !noalias !3 %6 = fpext half %5 to float %7 = call float @llvm.fabs.f32(float %6) %constant.121 = load float, ptr @29, align 4 %compare.2 = fcmp ole float %7, %constant.121 %8 = zext i1 %compare.2 to i8 %constant.120 = load float, ptr @0, align 4 %multiply.95 = 
fmul float %7, %constant.120 %constant.119 = load float, ptr @5, align 4 %add.82 = fadd float %multiply.95, %constant.119 %constant.118 = load float, ptr @4, align 4 %multiply.94 = fmul float %add.82, %constant.118 %constant.117 = load float, ptr @19, align 4 %add.81 = fadd float %multiply.94, %constant.117 %multiply.92 = fmul float %add.82, %add.81 %constant.116 = load float, ptr @18, align 4 %add.79 = fadd float %multiply.92, %constant.116 %multiply.91 = fmul float %add.82, %add.79 %subtract.87 = fsub float %multiply.91, %add.81 %constant.115 = load float, ptr @20, align 4 %add.78 = fadd float %subtract.87, %constant.115 %multiply.89 = fmul float %add.82, %add.78 %subtract.86 = fsub float %multiply.89, %add.79 %constant.114 = load float, ptr @17, align 4 %add.76 = fadd float %subtract.86, %constant.114 %multiply.88 = fmul float %add.82, %add.76 %subtract.84 = fsub float %multiply.88, %add.78 %constant.113 = load float, ptr @21, align 4 %add.75 = fadd float %subtract.84, %constant.113 %multiply.86 = fmul float %add.82, %add.75 %subtract.83 = fsub float %multiply.86, %add.76 %constant.112 = load float, ptr @16, align 4 %add.73 = fadd float %subtract.83, %constant.112 %multiply.85 = fmul float %add.82, %add.73 %subtract.81 = fsub float %multiply.85, %add.75 %constant.111 = load float, ptr @22, align 4 %add.72 = fadd float %subtract.81, %constant.111 %multiply.83 = fmul float %add.82, %add.72 %subtract.80 = fsub float %multiply.83, %add.73 %constant.110 = load float, ptr @15, align 4 %add.70 = fadd float %subtract.80, %constant.110 %multiply.82 = fmul float %add.82, %add.70 %subtract.78 = fsub float %multiply.82, %add.72 %constant.109 = load float, ptr @23, align 4 %add.69 = fadd float %subtract.78, %constant.109 %multiply.80 = fmul float %add.82, %add.69 %subtract.77 = fsub float %multiply.80, %add.70 %constant.108 = load float, ptr @14, align 4 %add.68 = fadd float %subtract.77, %constant.108 %multiply.79 = fmul float %add.82, %add.68 %subtract.75 = fsub float 
%multiply.79, %add.69 %constant.107 = load float, ptr @24, align 4 %add.67 = fadd float %subtract.75, %constant.107 %multiply.77 = fmul float %add.82, %add.67 %subtract.74 = fsub float %multiply.77, %add.68 %constant.106 = load float, ptr @13, align 4 %add.66 = fadd float %subtract.74, %constant.106 %multiply.76 = fmul float %add.82, %add.66 %subtract.72 = fsub float %multiply.76, %add.67 %constant.105 = load float, ptr @25, align 4 %add.65 = fadd float %subtract.72, %constant.105 %multiply.74 = fmul float %add.82, %add.65 %subtract.71 = fsub float %multiply.74, %add.66 %constant.104 = load float, ptr @12, align 4 %add.64 = fadd float %subtract.71, %constant.104 %multiply.73 = fmul float %add.82, %add.64 %subtract.69 = fsub float %multiply.73, %add.65 %constant.103 = load float, ptr @26, align 4 %add.63 = fadd float %subtract.69, %constant.103 %multiply.71 = fmul float %add.82, %add.63 %subtract.67 = fsub float %multiply.71, %add.64 %constant.102 = load float, ptr @11, align 4 %add.62 = fadd float %subtract.67, %constant.102 %multiply.70 = fmul float %add.82, %add.62 %subtract.66 = fsub float %multiply.70, %add.63 %constant.101 = load float, ptr @28, align 4 %add.61 = fadd float %subtract.66, %constant.101 %multiply.68 = fmul float %add.82, %add.61 %subtract.65 = fsub float %multiply.68, %add.62 %constant.100 = load float, ptr @27, align 4 %add.60 = fadd float %subtract.65, %constant.100 %subtract.64 = fsub float %add.60, %add.62 %multiply.66 = fmul float %subtract.64, %constant.120 %constant.99 = load float, ptr @6, align 4 %divide.4 = fdiv float %constant.99, %7 %add.59 = fadd float %divide.4, %constant.119 %multiply.65 = fmul float %add.59, %constant.118 %constant.98 = load float, ptr @3, align 4 %add.58 = fadd float %multiply.65, %constant.98 %multiply.64 = fmul float %add.59, %add.58 %constant.97 = load float, ptr @7, align 4 %add.57 = fadd float %multiply.64, %constant.97 %multiply.63 = fmul float %add.59, %add.57 %subtract.63 = fsub float %multiply.63, 
%add.58 %constant.96 = load float, ptr @2, align 4 %add.56 = fadd float %subtract.63, %constant.96 %multiply.62 = fmul float %add.59, %add.56 %subtract.62 = fsub float %multiply.62, %add.57 %constant.95 = load float, ptr @8, align 4 %add.55 = fadd float %subtract.62, %constant.95 %multiply.61 = fmul float %add.59, %add.55 %subtract.61 = fsub float %multiply.61, %add.56 %constant.94 = load float, ptr @1, align 4 %add.54 = fadd float %subtract.61, %constant.94 %multiply.60 = fmul float %add.59, %add.54 %subtract.60 = fsub float %multiply.60, %add.55 %constant.93 = load float, ptr @10, align 4 %add.53 = fadd float %subtract.60, %constant.93 %multiply.59 = fmul float %add.59, %add.53 %subtract.59 = fsub float %multiply.59, %add.54 %constant.92 = load float, ptr @9, align 4 %add.52 = fadd float %subtract.59, %constant.92 %subtract.58 = fsub float %add.52, %add.54 %multiply.58 = fmul float %subtract.58, %constant.120 %9 = call float @llvm.sqrt.f32(float %7) %10 = fdiv float 1.000000e+00, %9 %multiply.57 = fmul float %multiply.58, %10 %11 = trunc i8 %8 to i1 %12 = select i1 %11, float %multiply.66, float %multiply.57 %13 = fptrunc float %12 to half %14 = getelementptr inbounds [3 x [1 x half]], ptr %fusion, i64 0, i64 %fusion.indvar.dim.0, i64 0 store half %13, ptr %14, align 2, !alias.scope !3 %invar.inc1 = add nuw nsw i64 %fusion.indvar.dim.1, 1 store i64 %invar.inc1, ptr %fusion.invar_address.dim.1, align 8 br label %fusion.loop_header.dim.1 fusion.loop_exit.dim.1: ; preds = %fusion.loop_header.dim.1 %invar.inc = add nuw nsw i64 %fusion.indvar.dim.0, 1 store i64 %invar.inc, ptr %fusion.invar_address.dim.0, align 8 br label %fusion.loop_header.dim.0 fusion.loop_exit.dim.0: ; preds = %fusion.loop_header.dim.0 br label %return } ; Function Attrs: nocallback nofree nosync nounwind readnone speculatable willreturn declare float @llvm.fabs.f32(float %0) #1 ; Function Attrs: nocallback nofree nosync nounwind readnone speculatable willreturn declare float @llvm.sqrt.f32(float 
%0) #1 attributes #0 = { uwtable "denormal-fp-math"="preserve-sign" "no-frame-pointer-elim"="false" } attributes #1 = { nocallback nofree nosync nounwind readnone speculatable willreturn } !0 = !{} !1 = !{i64 6} !2 = !{i64 8} !3 = !{!4} !4 = !{!"buffer: {index:0, offset:0, size:6}", !5} !5 = !{!"XLA global AA domain"}	2022-06-15 18:04:42 -04:00


22902 Commits