llvm-project

Commit Graph

Author	SHA1	Message	Date
Craig Topper	5b06ccb611	Revert "foo" This reverts commit `2138ef354a`. Forgot to squash	2022-10-03 12:15:41 -07:00
Craig Topper	a55cdcae3e	Revert "[RISCV] Add missing VL arguments to the creation of RISCVISD::VMV_V_X_VL nodes." This reverts commit `4c03c9f375`. Forgot to squash	2022-10-03 12:15:28 -07:00
Craig Topper	4c03c9f375	[RISCV] Add missing VL arguments to the creation of RISCVISD::VMV_V_X_VL nodes. VMV_V_X_VL nodes should always have a passthru, a splat, and a VL. We were sometimes missing the VL. This went unnoticed because these cases were all selected into the following node to form a .vx or .vi instruction. The ComplexPattern that does this, doesn't check the VL operand. I've added an assert to the ComplexPattern to catch if the operand is missing. @qcolombet spotted some of these in D134703.	2022-10-03 12:13:21 -07:00
Craig Topper	2138ef354a	foo	2022-10-03 12:13:20 -07:00
Chris Bieneman	c0a76c2c71	[DirectX] Add DXIL metadata `dx.shaderModel` This captures the target shader model and pipeline stage into the DXIL metadata for consumption by the DirectX runtime. Reviewed By: python3kgae Differential Revision: https://reviews.llvm.org/D134469	2022-10-03 13:00:11 -05:00
Craig Topper	31bca38ad1	[RISCV] Pass the destination register to getVLENFactoredAmount instead of returning it. NFC This is a refactor for another patch. For now we move the vreg creation to the caller. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D135008	2022-10-03 10:59:35 -07:00
Zain Jaffal	966411790e	[AArch64] Add support to loop vectorization for non temporal loads Currently, AArch64 doesn't support vectorization for non temporal loads because `isLegalNTLoad` is not implemented for the target. This patch applies similar functionality as `D73158` but for non temporal loads Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D131964	2022-10-03 17:06:47 +01:00
Sam Clegg	664a5c6d03	[WebAssembly] Fix return type of __builtin_return_address under wasm64 Differential Revision: https://reviews.llvm.org/D135005	2022-10-03 08:31:52 -07:00
Amara Emerson	3daf7ddaef	[GlobalISel] Allow prelegalizer combiners to have access to LegalizerInfo. Before, the isPreLegalize() query in CombinerHelper only checked for the presence of a LegalizerInfo object. This is problematic when we want to have a combine actually check for legality in a pre-legalizer combine pass, since if we pass a LegalizerInfo object to the constructor it causes the combines to think that we're running post legalizer, which isn't true. This change fixes it to instead check an explicit bool that passes to signal whether the pass will be run before or after legalization. Doing so exposed a bug in the extending loads combine, which tried to check for legality of candidate extending loads if LegalizerInfo was present. Since we only ran it pre-legalizer and therefore with a null LegalizerInfo, it never actually ran. Also fixes the legality checks to keep the tests passing. Differential Revision: https://reviews.llvm.org/D135044	2022-10-03 07:36:18 +01:00
David Green	5e1a9d319d	[ARM] Add lowering for bf16 neon vtrn, vzip and vuzp. These go via Dag2Dag, which are better based on element sizes not the exact element types.	2022-10-02 15:34:37 +01:00
David Green	f2fde99461	[ARM] More bf16 shuffle handling, including perfect shuffles.	2022-10-02 14:31:51 +01:00
David Green	8193f0d1d2	[ARM] Add tablegen patterns for bf16 vrev	2022-10-02 13:42:14 +01:00
David Green	58369c8631	[ARM] Add tablegen patterns for bf16 vext This adds missing tablegen patterns for VEXT, identical to the fp16 patterns as they only use baseline Neon operations. Part of fixing #57770.	2022-10-02 12:45:58 +01:00
Craig Topper	5bbc5eb55f	[RISCV] Use _TIED form of VWADD(U)_WX/VWSUB(U)_WX to avoid early clobber. One of the sources is the same size as the destination so that source doesn't have an overlap with the destination register. By using the _TIED form we avoid an early clobber constraint for that source. This matches what was already done for intrinsics. ConvertToThreeAddress will fix it if it can't stay tied.	2022-10-01 16:34:39 -07:00
Craig Topper	85db4f10e3	[RISCV] Minor tablegen formatting cleanup. NFC	2022-10-01 15:59:25 -07:00
zhongyunde	4a549be9c3	[AArch64] Lower multiplication by a negative constant to shl+sub+shl Change the costmodel to lower a = b * C where C = -(2^n - 2^m) to lsl w8, w0, m sub w0, w8, w0, lsl n Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D134934	2022-10-01 21:27:42 +08:00
Filipp Zhinkin	945a1468c9	[ARM] Support all versions of AND, ORR, EOR and BIC in optimizeCompareInstr Combine cmp with zero and all versions of AND, ORR, EOR and BIC instructions into S-suffixed versions. Related issue: https://github.com/llvm/llvm-project/issues/57122 Reviewed By: efriedma, samtebbs Differential Revision: https://reviews.llvm.org/D131786	2022-10-01 12:41:37 +03:00
Carl Ritson	a35013bec6	[AMDGPU][GFX11] Mitigate VALU mask write hazard VALU use of an SGPR (pair) as mask followed by SALU write to the same SGPR can cause incorrect execution of subsequent SALU reads of the SGPR. Reviewed By: foad, rampitec Differential Revision: https://reviews.llvm.org/D134151	2022-10-01 16:21:24 +09:00
Craig Topper	9273f860c0	[RISCV] Prevent performCombineVMergeAndVOps from creating cycles in the DAG. If True has a Chain result, the other operands of the vmerge may depend on it through that Chain. We need to ensure it isn't a predecessor of those operands. Reviewed By: fakepaper56 Differential Revision: https://reviews.llvm.org/D134980	2022-09-30 20:01:45 -07:00
Craig Topper	de0de294eb	[RISCV] Update cost of vector roundeven to match round which uses the same sequence but a different FRM value. Reviewed By: reames, eopXD Differential Revision: https://reviews.llvm.org/D134978	2022-09-30 20:01:35 -07:00
Yeting Kuo	cefb7aab61	[VP][RISCV] Add vp.copysign and RISC-V support. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D134935	2022-10-01 10:19:10 +08:00
Matthias Braun	189900eb14	X86: Stop assigning register costs for longer encodings. This stops reporting CostPerUse 1 for `R8`-`R15` and `XMM8`-`XMM31`. This was previously done because instruction encodings require a REX prefix when using them, resulting in longer instruction encodings. I found that this regresses the quality of the register allocation as the costs impose an ordering on eviction candidates. I also feel that there is a bit of an impedance mismatch as the actual costs occur when encoding instructions using those registers, but the order of VReg assignments is not primarily ordered by number of Defs+Uses. I did extensive measurements with the llvm-test-suite with SPEC2006 + SPEC2017 included, internal services showed similar patterns. Generally there are a lot of improvements but also a lot of regressions. But on average the allocation quality seems to improve at a small code size regression. Results for measuring static and dynamic instruction counts: Dynamic Counts (scaled by execution frequency) / Optimization Remarks: Spills+FoldedSpills -5.6% Reloads+FoldedReloads -4.2% Copies -0.1% Static / LLVM Statistics: regalloc.NumSpills mean -1.6%, geomean -2.8% regalloc.NumReloads mean -1.7%, geomean -3.1% size..text mean +0.4%, geomean +0.4% Static / LLVM Statistics: mean -2.2%, geomean -3.1%) regalloc.NumSpills mean -2.6%, geomean -3.9%) regalloc.NumReloads mean +0.6%, geomean +0.6%) size..text Static / LLVM Statistics: regalloc.NumSpills mean -3.0% regalloc.NumReloads mean -3.3% size..text mean +0.3%, geomean +0.3% Differential Revision: https://reviews.llvm.org/D133902	2022-09-30 16:01:33 -07:00
Peter Collingbourne	0caa9d4b1e	AArch64: Don't use RETA[AB] when ShadowCallStack is enabled. When returning from a function with both SCS and PAC-RET enabled, we need to authenticate the return address from the stack and then load from the SCS, but this was happening in the reverse order when RETA[AB] were being used. Fix it by disabling the use of RETA[AB] when SCS is enabled. Fixes pr58072. Differential Revision: https://reviews.llvm.org/D134931	2022-09-30 12:33:23 -07:00
Xiang Li	a80a888de5	[DirectX backend] Support global ctor for DXILBitcodeWriter. 1. Save typed pointer type for GlobalVariable/Function instead of the ObjectType. This will allow use GlobalVariable/Function as value. 2. Save target type for global ctors for Constant. 3. In DXILBitcodeWriter::getTypeID, check PointerMap first for Constant case. Reviewed By: beanz Differential Revision: https://reviews.llvm.org/D133283	2022-09-30 11:27:23 -07:00
Florian Hahn	fe49ba84d3	[AArch64] Reflow comment in AArch64IselLowering.cpp (NFC).	2022-09-30 17:17:04 +01:00
Zain Jaffal	fca8730793	[AArch64] Refactor opcode selection for LowerMUL (NFC) Move the logic for selecting `NewOpc` out of `LowerMUL` Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D134875	2022-09-30 16:48:02 +01:00
Philip Reames	1e3c179519	[RISCV] Address post commit review comments from D134881	2022-09-30 08:31:40 -07:00
Saleem Abdulrasool	519a73111b	RISCV: adjust relocation emission Simplify and make the pair-wise relocation more precise. If either of the symbol references are textual, the relocation must be delayed. If the difference is across sections, delay it as well which partially matches the behaviour of gas. We unfortunately do not handle the case where the difference references a symbol that is not yet defined. In such a case, we simply fail to resolve the difference, which should hopefully not be too onerous (particularly since no other target supports cross-section references and it is not clear if this was intentional on the part of RISCV). Differential Revision: https://reviews.llvm.org/D132262 Reviewed By: @MaskRay	2022-09-30 15:28:48 +00:00
Philip Reames	2b5960028e	[RISCV] Branchless lowering for select (and (x , 0x1) == 0), y, (z ^ y) ) and select (and (x , 0x1) == 0), y, (z \| y) ) This code is directly ported from the X86 backend which applies the same rewrite (along with several others). Planning on looking more closely at the other branchless variants from x86 to see if any are worth porting in future changes. Motivation here is the coremark crc8 routine from https://github.com/eembc/coremark/blob/main/core_util.c#L165. This patch significantly reduces the number of unpredictable branches in the workload. Differential Revision: https://reviews.llvm.org/D134881	2022-09-30 08:24:32 -07:00
Ray Wang	4c786c9747	[RISCV] Remove some unused var decl. NFC Differential Revision: https://reviews.llvm.org/D134707	2022-09-30 08:08:15 -07:00
Pierre van Houtryve	d8258508d4	[AMDGPU][GISel] Update `isCanonicalized` Recognize more opcodes in the function. Fixes some regressions introduced in D134857 for fdiv.f16 too. Depends on D134857 Reviewed By: arsenm, foad Differential Revision: https://reviews.llvm.org/D134862	2022-09-30 14:13:35 +00:00
Pierre van Houtryve	9a67a6b72a	[AMDGPU][GISel] Legalize V2S16 G_BUILD_VECTOR Preparation patch for D134354 to make V2S16 G_BUILD_VECTOR legal. Also removes RegBankInfo's scalarization of small BUILD_VECTORs, replacing it with InstructionSelector logic instead. This allows for V2S16 BUILD_VECTOR instructions to survive all the way to ISel so we can select FMA/MAD_MIX instructions in D134354. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D134433	2022-09-30 14:04:53 +00:00
Ivan Kosarev	a964099ce5	[AMDGPU][SetWavePriority] Fix dealing with MBBInfo records. Happened earlier than I anticipated. :) Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D134726	2022-09-30 14:27:50 +01:00
Zain Jaffal	661403b85c	[AArch64] Add support for 128-bit non temporal loads. Adding to the work done in `D131773` here we add support to 128-bit loads. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D132559	2022-09-30 11:04:04 +01:00
gonglingqin	853a1b7236	[LoongArch] Clean up redundant code introduced by conflict resolution. NFC	2022-09-30 16:45:21 +08:00
Yeting Kuo	1cc02b05b7	[SelectionDAG] Add helper function to check whether a SDValue is neutral element. NFC. Using this helper makes working with neutral elements easier. Although I have only found one case so far, I think it will have more chances to be used since so many combines are related to neutral elements. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D133866	2022-09-30 11:29:11 +08:00
Amara Emerson	7653586d88	[AArch64][GlobalISel] Implement another combine for shufflevector->AArch64 G_EXT. This is a port of an existing optimization in AArch64 ISelLowering, handling a case when the same input vector can be used for both ext inputs. Differential Revision: https://reviews.llvm.org/D134891	2022-09-29 22:53:24 +01:00
Philip Reames	900364fccf	[RISCV] Minor code motion in InsertVSETVLI [nfc]	2022-09-29 14:01:57 -07:00
Bjorn Pettersson	0513b0305a	[X86] Avoid miscompile in combineOr (X86ISelLowering.cpp) In combineOr (X86ISelLowering.cpp) there is a DAG combine that rewrites a "(0 - SetCC) \| C" pattern into something simpler given that a LEA can be used. Another requirement is that C has some specific value, for example 1 or 7. When checking those requirements the code used a 32-bit unsigned variable to store the value of C. So for a 64-bit OR this could miscompile in case any of the 32 most significant bits in C were non zero. This patch fixes the bug by using a large enough type for the C value. The faulty code seems to have been introduced by commit `9bceb8981d` (D131358). Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D134892	2022-09-29 21:24:31 +02:00
zhongyunde	4d15e7b21b	[AArch64] Lower multiplication by a constant (NFC) Refactor according to https://reviews.llvm.org/D134706#inline-1298952 Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D134848	2022-09-30 01:37:28 +08:00
zhongyunde	62a51c357c	[AArch64] Lower multiplication by a constant int to shl+sub+shl Decompose the const 14 can be separated from D132322 Change the costmodel to lower a = b * C where C = 2^n - 2^m to lsl w8, w0, n sub w0, w8, w0, lsl m Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D134706	2022-09-30 01:31:06 +08:00
Chris Bieneman	5d4dd53570	Revert "[DirectX backend] Support global ctor for DXILBitcodeWriter." This reverts commit `26129766df`. The reverted commit broke in-tree unit tests for the DirectX backend.	2022-09-29 11:58:27 -05:00
Dmitry Preobrazhensky	485c539391	[AMDGPU][MC][GFX11] Disable non-null src0 for s_waitcnt_*cnt Differential Revision: https://reviews.llvm.org/D134809	2022-09-29 19:56:03 +03:00
David Green	4c4e544cd8	[ARM] Add an option for disabling omitting DLS. Useful for testing, this option disables when `DLS lr, lr` gets removed.	2022-09-29 17:42:45 +01:00
Philip Reames	02bfe2de7c	[RISCV] Adjust vector immediate store materialization cost This change updates the costs to make constant pool loads match their actual cost, and adds the broadcast special case to avoid too many regressions. We really need more information about the constants being rematerialized, but this is an incremental improvement. Differential Revision: https://reviews.llvm.org/D134746	2022-09-29 07:37:13 -07:00
eopXD	02a982829c	[RISCV] Add lowering for llvm.roundeven Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D134785	2022-09-29 06:08:14 -07:00
Fangrui Song	04a65d62a0	Revert D134638 "[Clang][LoongArch] Add inline asm support for constraints k/m/ZB/ZC" This reverts commit `b7baddc755`. Broke CodeGen/X86/callbr-asm-kill.mir We shall pay attention when adding new constraints.	2022-09-29 00:54:56 -07:00
Weining Lu	b7baddc755	[Clang][LoongArch] Add inline asm support for constraints k/m/ZB/ZC k: A memory operand whose address is formed by a base register and (optionally scaled) index register. m: A memory operand whose address is formed by a base register and offset that is suitable for use in instructions with the same addressing mode as st.w and ld.w. ZB: An address that is held in a general-purpose register. The offset is zero. ZC: A memory operand whose address is formed by a base register and offset that is suitable for use in instructions with the same addressing mode as ll.w and sc.w. Differential Revision: https://reviews.llvm.org/D134638	2022-09-29 15:02:08 +08:00
Abinav Puthan Purayil	3759398b4b	[AMDGPU] Report minimum scratch size in code object v5 and later by default This change sets -amdgpu-assume-{external-call-stack-size \| dynamic-stack-object-size} options to zero by default for code object v5 and later. The runtime is expected to adjust the scratch size if the amdhsa_uses_dynamic_stack bit in the kernel descriptor is set. Differential Revision: https://reviews.llvm.org/D128346	2022-09-29 09:52:45 +05:30
gonglingqin	dc3c5a78f2	[LoongArch] Add fp_to_sint support for soft floating point Differential Revision: https://reviews.llvm.org/D134692	2022-09-29 10:25:35 +08:00
WANG Xuerui	3155f6c508	[LoongArch] Expand llvm.stacksave and llvm.stackrestore As in commit `bfb00d4c1c` ("[RISCV] Allow lowering of dynamic_stackalloc, stacksave, stackrestore"). Differential Revision: https://reviews.llvm.org/D134435	2022-09-29 09:07:44 +08:00
wanglei	036b170c24	[LoongArch] Produce a R_LARCH_32_PCREL relocation LoongArchELFObjectWriter::getRelocType check IsPCRel for FK_Data_4 (which we produce a R_LARCH_32_PCREL relocation for if IsPCRel). R_LARCH_32_PCREL is required for FDE relocation. Differential Revision: https://reviews.llvm.org/D134715	2022-09-29 09:04:44 +08:00
wanglei	7b1bdfbeb0	[LoongArch] Override TargetSubtargetInfo::getSelectionDAGInfo The target selection DAG lowering information is needed for SelectionDAGBuilder to lower a call like memcmp into an optimized form. Differential Revision: https://reviews.llvm.org/D134712	2022-09-29 08:46:53 +08:00
Jessica Paquette	95dabac7a5	[AArch64][GlobalISel] Make G_PTRTOINT only legal for s64 + p0 A few issues: 1. There was no legalizer test for G_PTRTOINT 2. Same clamping issue as in many other opcodes 3. AArch64 pointers can only be 64b, so in reality we always have to trunc or extend with any size other than p0 anyway. This seems to actually produce more correct selection for narrow types as well. Differential Revision: https://reviews.llvm.org/D107588	2022-09-28 16:20:24 -07:00
Jessica Paquette	a7aaafde2e	[AArch64][GlobalISel] Implement custom legalization for s32/s64 G_FCOPYSIGN This is intended to be equivalent to the s32 + s64 cases in AArch64TargetLowering::LowerFCOPYSIGN. Widen everything and then use G_BIT + a mask to handle the actual copysign operation. Then, narrow back down to s32/s64. I wasn't sure about what the best/most canonical INSERT_SUBREG-selectable pattern is. I chose G_INSERT_VECTOR_ELT + an undef vector because it produces reasonably okay codegen. (It doesn't produce INSERT_SUBREG right now though.) If there's a better way to do this then I'm happy to change it. We also have a couple codegen deficiencies with how we emit vector constants right now. (We need a GISel equivalent to the tryAdvSIMDModImm64 stuff) Differential Revision: https://reviews.llvm.org/D108725	2022-09-28 16:03:22 -07:00
Florian Mayer	0401dc2913	[MTE] [HWASan] unify isInterestingAlloca Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D134779	2022-09-28 15:52:34 -07:00
Jessica Paquette	4957ee6529	[AArch64][GlobalISel] Add a target-specific G_BIT opcode. This is necessary for custom-legalizing G_FCOPYSIGN. This is equivalent to the BIT instruction (bitwise insert if true). Add selection testcases for imported patterns. Differential Revision: https://reviews.llvm.org/D108714	2022-09-28 15:48:35 -07:00
Xiang Li	26129766df	[DirectX backend] Support global ctor for DXILBitcodeWriter. 1. Save typed pointer type for GlobalVariable/Function instead of the ObjectType. This will allow use GlobalVariable/Function as value. 2. Save target type for global ctors for Constant. 3. In DXILBitcodeWriter::getTypeID, check PointerMap first for Constant case. Reviewed By: beanz Differential Revision: https://reviews.llvm.org/D133283	2022-09-28 13:23:56 -07:00
Stanislav Mekhanoshin	5a3fe9a039	[AMDGPU] Move SIModeRegisterDefaults to SI MFI It does not belong to a general AMDGPU MFI. Differential Revision: https://reviews.llvm.org/D134666	2022-09-28 13:13:40 -07:00
Nico Weber	90f7f24b20	try to fix build yet more after `16544cbe64`	2022-09-28 15:40:52 -04:00
Baptiste	b556726ccc	[AMDGPU] Avoid flushing the vmcnt counter in loop preheaders if not necessary One of the conditions to flush the vmcnt counter in loop preheaders is: The loop contains a use of a vgpr that is defined out of the loop. The code currently checks if a waitcnt is needed by looking at the score of that vgpr in the score brackets. This is not enough and may cause the generation of an unnecessary vmcnt flush. This patch fixes that case. Differential Revision: https://reviews.llvm.org/D130313	2022-09-28 13:05:50 -04:00
Jon Chesterfield	35f2584ef9	[amdgpu] Error, instead of miscompile, anonymous kernels using lds The association between kernel and struct is done by symbol name. This doesn't work robustly for anonymous kernels as shown by the modified test case. An alternative association between function and struct can be constructed if necessary, probably though metadata, but on the basis that we currently miscompile anonymous kernels and that they are difficult to construct from application code and difficult to call from the runtime, this patch makes it a fatal error for now. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D134741	2022-09-28 16:30:04 +01:00
Matt Arsenault	7a84624079	AMDGPU: Make various vector undefs legal Surprisingly these were getting legalized to something zero initialized. This fixes an infinite loop when combining some vector types. Also fixes zero initializing some undef values. SimplifyDemandedVectorElts / SimplifyDemandedBits are not checking for the legality of the output undefs they are replacing unused operations with. This resulted in turning vectors into undefs that were later re-legalized back into zero vectors.	2022-09-28 10:48:52 -04:00
Matt Devereau	0a4771a7e8	[AArch64][SVE] Expand gather index to 32 bits instead of 64 bits For gathers which load in 8 and 16 bit data then use that data as an index, the index can be extended to 32 bits instead of 64 bits Differential Revision: https://reviews.llvm.org/D130692	2022-09-28 14:42:12 +00:00
Florian Hahn	eba84971ae	Revert "[AARCH64][CostModel] Modified the cost of mask vector load/store" This reverts commit `1c62af3e23`. The commit causes the test below to fail. Revert for now to get the bots back to green. Failing test: lvm/test/Transforms/LoopVectorize/AArch64/masked-op-cost.ll	2022-09-28 15:35:13 +01:00
Florian Hahn	2d3c260362	[AArch64] break non-temporal loads over 256 into 256-loads and a smaller load Currently over 256 non-temporal loads are broken inefficiently. For example, `v17i32` gets broken into 2 128-bit loads. It is better if we can use 256-bit loads instead. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D133421	2022-09-28 15:20:26 +01:00
Jon Chesterfield	80ba432821	[amdgpu][nfc] Allocate kernel-specific LDS struct deterministically A kernel may have an associated struct for laying out LDS variables. This patch puts that instance, if present, at a deterministic address by allocating it at the same time as the module scope instance. This is relatively likely to be where the instance was allocated anyway (~NFC) but will allow later patches to calculate where a given field can be found, which means a function which is only reachable from a single kernel will be able to access a LDS variable with zero overhead. That will be particularly helpful for applications that instantiate a function template containing LDS variables once per kernel. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D127052	2022-09-28 14:55:16 +01:00
Archibald Elliott	ff4027d152	[ARM] Support fp16/bf16 using t constraint fp16 and bf16 values can be used in GCC's inline assembly using the "t" constraint, which means "VFP floating-point registers s0-s31" - fp16 and bf16 values are stored in S registers too. This change ensures that LLVM is compatible with GCC for programs that use fp16 and the 't' constraint. Fixes #57753 Differential Revision: https://reviews.llvm.org/D134553	2022-09-28 14:48:21 +01:00
liqinweng	1c62af3e23	[AARCH64][CostModel] Modified the cost of mask vector load/store Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D134413	2022-09-28 19:40:29 +08:00
Carl Ritson	266b5dbc5d	[AMDGPU] Add MIMG NSA threshold configuration attribute Make MIMG NSA minimum addresses threshold an attribute that can be set on a function or configured via command line. This enables frontend tuning which allows increased NSA usage where beneficial. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D134780	2022-09-28 20:03:18 +09:00
Simon Pilgrim	759bedade5	Fix MSVC "not all control paths return a value" warning. NFCI.	2022-09-28 10:56:37 +01:00
wanglei	983a0ae5cf	[LoongArch] Specify registers used in DWARF exception handling Defines LoongArch registers for getExceptionPointerRegister() and getExceptionSelectorRegister(). Differential Revision: https://reviews.llvm.org/D134709	2022-09-28 17:53:16 +08:00
Cullen Rhodes	3918ef07c4	[AArch64][SVE] Remove redundant ptest after match/nmatch These instructions are flag setting so the ptest is redundant, the TableGen class wasn't setting the element size for the predicate causing the checks in AArch64InstrInfo::optimizePTestInstr to fail.	2022-09-28 08:23:23 +00:00
eopXD	9677d70eb2	[VP][RISCV] Add vp.floor, vp.round, vp.roundeven and their RISC-V support Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D134759	2022-09-27 19:45:58 -07:00
gonglingqin	95d2367647	[LoongArch] Expand FSIN/FCOS/FSINCOS/FPOW/FREM Differential Revision: https://reviews.llvm.org/D134628	2022-09-28 09:42:41 +08:00
Florian Mayer	979db5343f	[HWASan] [NFC] use auto* over auto& for pointers	2022-09-27 18:19:25 -07:00
Han-Kuan Chen	c595c874cb	[RISCV] Lower BUILD_VECTOR to RISCVISD::VID_VL if it is floating-point type. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D133688	2022-09-27 17:25:34 -07:00
Philip Reames	f6d110e26f	[LAA] Make getPtrStride return Option instead of overloading zero as error value [nfc] This is purely NFC restructure in advance of a change which actually exposes zero strides. This is mostly because I find this interface confusing each time I look at it.	2022-09-27 15:55:44 -07:00
Quentin Colombet	ce35e8b426	[RISCV][ISel] Remove the commutative flag on SUB I wasn't able to produce a testcase for that because right now VWSUB is only generated from VWSUB_W and from there to trigger the commutative bug we would need to grab VWSUB where the splat value is on the LHS, which is currently not matched. Differential Revision: https://reviews.llvm.org/D134701	2022-09-27 20:15:01 +00:00
Philip Reames	b54c571a01	[RISCV] Extend strided load/store pattern matching to non-loop cases The motivation here is to enable a change I'm exploring in vectorizer to prefer base + offset_vector addressing for scatter/gather. The form the vectorizer would end up emitting would be a gep whose vector operand is an add of the scalar IV (splated) and the index vector. This change makes sure we can recognize that pattern as well as the current code structure. As a side effect, it might improve scatter/gathers from other sources. Differential Revision: https://reviews.llvm.org/D134755	2022-09-27 12:56:58 -07:00
eopXD	163cb33854	[VP][RISCV] Add vp.ceil and RISC-V support Previous commit `8b00b24f85` failed to add the `int_ceil` anchor for the llvm.ceil.* section in LangRef.rst Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D134586	2022-09-27 12:04:09 -07:00
eopXD	384b8b3da7	Revert "[VP][RISCV] Add vp.ceil and RISC-V support" This reverts commit `8b00b24f85`.	2022-09-27 11:12:57 -07:00
eopXD	8b00b24f85	[VP][RISCV] Add vp.ceil and RISC-V support Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D134586	2022-09-27 11:08:27 -07:00
Krzysztof Parzyszek	7da2b91887	[Hexagon] Unify getSizeOfs in HexagonVectorCombine, NFC	2022-09-27 10:51:52 -07:00
Krzysztof Parzyszek	9c9e877b7e	[Hexagon] Move function to a different class, NFC "Sector" is a concept from AlignVectors, so the check for it should be there.	2022-09-27 10:32:52 -07:00
Stefan Gränitz	ed8409dfa0	[ObjC][ARC] Fix target register for call expanded from CALL_RVMARKER on Windows Fix regression https://github.com/llvm/llvm-project/issues/56952 for Clang CodeGen on Windows. In the Windows ABI the instruction sequence that is expanded from CALL_RVMARKER should use RCX as target register and not RDI. Reviewed By: rnk, fhahn Differential Revision: https://reviews.llvm.org/D134441	2022-09-27 18:49:40 +02:00
David Green	401481daac	[AArch64] Remove incorrect zero element insert-bitcast patterns These two patterns are not working as intended, as shown in D134022. They need to insert the value into the new register, not override it.	2022-09-27 17:08:17 +01:00
Philip Reames	77b202f974	[RISCV] Rename getVectorImmCost to getStoreImmCost [nfc] My original intent had been to reuse this for arithmetic instructions as well, but due to the availability of a immediate splat encoding there, we will need different heuristics. So specialize the existing code for the store case.	2022-09-27 08:22:13 -07:00
wanglei	823ce6ad18	[LoongArch] Add some comments for expand pseudo-inst pass. NFC Differential Revision: https://reviews.llvm.org/D134708	2022-09-27 20:26:07 +08:00
Kazushi (Jam) Marukawa	de8013201f	[VE] Change to expand FPOW VE doesn't have FPOW instruction, so this patch makes llvm expand it. Reviewed By: efocht Differential Revision: https://reviews.llvm.org/D134695	2022-09-27 20:03:10 +09:00
WANG Xuerui	c2a44b591e	[LoongArch] Support lowering frames larger than 2048 bytes Differential Revision: https://reviews.llvm.org/D134582	2022-09-27 18:58:33 +08:00
David Sherwood	fbb119412f	[AArch64] Add Neoverse V2 CPU support Adds support for the Neoverse V2 CPU to the AArch64 backend. Differential Revision: https://reviews.llvm.org/D134352	2022-09-27 07:56:08 +00:00
Paulo Matos	1bd1a44070	[WebAssembly] Use intrinsics for table.get/set instructions Initial table.get/set implementation would match and lower combinations of GEP+load/store to table.get/set instructions. However, this is error prone due to potential combinations of GEP+load/store we don't implement, and load/store optimizations. By changing the code to using intrinsics, we avoid both issues and simplify the code. New builtins implemented: * @llvm.wasm.table.get.externref * @llvm.wasm.table.get.funcref * @llvm.wasm.table.set.externref * @llvm.wasm.table.set.funcref Reviewed By: asb, tlively Differential Revision: https://reviews.llvm.org/D134436	2022-09-27 09:16:30 +02:00
Yeting Kuo	04e1301f3d	[VP][RISCV] Add vp.maxnum and vp.minnum intrinsics and RISC-V support. Add vp.maxnum and vp.minnum which are vector predicted intrinsics of llvm.maxnum and llvm.minnum. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D134639	2022-09-27 13:36:45 +08:00
Vitaly Buka	20a80d60a8	Revert "[AMDGPU] Move SIModeRegisterDefaults to SI MFI" Break msan bots. Details in D134666. This reverts commit `0ce96e06ee`.	2022-09-26 22:22:09 -07:00
Paul Scoropan	ce004fb4f2	[PowerPC] XCOFF exception section support on the direct assembler path This feature implements support for making entries in the exception section on XCOFF on the direct assembly path using the ".except" pseudo-op. It also provides functionality to lower entries (comprised of language and reason codes) into the exception section through the use of annotation metadata attached to llvm.ppc.trap/trapd/tw/tdw intrinsics. Integrated assembler support will be provided in another review. https://reviews.llvm.org/D133030 needs to merge first for LIT tests Reviewed By: shchenz, RKSimon Differential Revision: https://reviews.llvm.org/D132146	2022-09-26 22:24:20 -04:00
Stanislav Mekhanoshin	0ce96e06ee	[AMDGPU] Move SIModeRegisterDefaults to SI MFI It does not belong to a general AMDGPU MFI. Differential Revision: https://reviews.llvm.org/D134666	2022-09-26 13:20:24 -07:00
Krzysztof Parzyszek	dfaf7a2846	[Hexagon] Avoid some unnecessary sign-extend instructions Simplify (sext_inreg (extractu ...)) -> (extract ...) where appropriate.	2022-09-26 12:30:18 -07:00
Krzysztof Parzyszek	d6c0a5be7f	[Hexagon] Make sure we can still shift scalar vectors by non-splats	2022-09-26 11:25:06 -07:00
Changpeng Fang	dee4bc4a4e	AMDGPU: Handle new address pattern in LowerKernelAttributes introduced by opaque pointers Summary: With opaque pointer support, the "ptr" type is introduced and thus BitCast is not necessary in some cases. This work takes care of this change, and recognizes the new address patterns to do appropriate optimizations. Reviewers: arsenm Differential Revision: https://reviews.llvm.org/D134596	2022-09-26 09:31:52 -07:00
Kazushi (Jam) Marukawa	1cef30b9d3	[VE] Disable automatic maxnum/minnum selection Disable FMAX/FMIN selection from select_cc in VEInstrInfo.td because of the lack of NaN consideration. This patch removes such selection from VEInstrInfo.td and lets llvm work on it in combineMinNumMaxNum. Reviewed By: efocht Differential Revision: https://reviews.llvm.org/D134595	2022-09-26 22:04:02 +09:00
Kazushi (Jam) Marukawa	76c76e9ab4	[VE] Support smax/smin Support smax/smin in VEInstrInfo.td. Remove obsolete patterns for smax/smin. Add regression tests for smax/smin/umax/umin. Reviewed By: efocht Differential Revision: https://reviews.llvm.org/D134583	2022-09-26 22:02:57 +09:00
Momchil Velikov	6602110152	[ARM] Enable and/cmp0 folding The `CodeGenPrepare` pass can sink bitwise `and` used by compare to zero into the basic blocks where the users are. This operation is guarded by a lowering hook, which is disabled for ARM. In the ARM architecture versions from v7-M up these two operations can be folded into a `tst rN, #imm` instruction. Sinking of `and` can also enable the cmov-to-bfi DAG combiner. This patch fixes some benchmark regressions caused by https://reviews.llvm.org/D129370 as well as scoring slightly better overall. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D134360	2022-09-26 11:31:23 +01:00
David Green	bebc96956b	[AArch64] Enable FeatureFuseAdrpAdd for all Arm cpus The commit D120104 enabled FeatureFuseAdrpAdd for -mcpu=generic, allowing the linker to relax adrp;add pairs where possible. D132075 extended that to neoverse-n1, this patch extends it to all other cortex and neoverse cpus for the same reasons. Differential Revision: https://reviews.llvm.org/D134521	2022-09-26 09:55:10 +01:00
gonglingqin	a6d699b55d	[LoongArch] Add codegen support for strict_fsetccs and any_fsetcc Differential Revision: https://reviews.llvm.org/D134274	2022-09-26 13:05:36 +08:00
wanglei	75265c7f49	[LoongArch] Lower BlockAddress/JumpTable This patch uses a unified interface for lower GlobalAddress ConstantPool BlockAddress and JumpTable. This patch allows lowering addresses by using PC-relative addressing for DSO-local symbols, and accessing the address through the global offset table for DSO-preemptable symbols. Remove hardcoded `MininumJumpTableEntries` for test lower JumpTable. Also updated some test cases using ConstantPool, due to the addition of relocation information. Differential Revision: https://reviews.llvm.org/D134431	2022-09-26 10:52:54 +08:00
Yeting Kuo	43c5fbdd3a	[VP][RISCV] Add vp.sqrt intrinsic and RISC-V support. The patch modeled vp.fabs patch D132793. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D133690	2022-09-26 10:47:40 +08:00
WANG Xuerui	ad6fe32032	[LoongArch] Support 'generic' as a valid CPU name As the LoongArch port is largely modeled after RISCV it has the same behavior of not accepting `generic` as a CPU name. For better compatibility with consumers of LLVM (e.g. mesa) follow D121149's suit and treat `generic` the same as an empty CPU name. Differential Revision: https://reviews.llvm.org/D134412	2022-09-26 10:20:13 +08:00
Ruiling Song	bf25a48985	Add override for runOnFunction() Fix build-bot failure.	2022-09-26 10:19:35 +08:00
WANG Xuerui	d2ac89b64e	[LoongArch] Support fastcc and treat it as ccc As explained in D68559 the `fastcc` calling convention may be requested under certain conditions, hence the need for supporting it. But unlike RISCV we actually treat it exactly like ccc, without actually inventing any performance hack right here. And CSKY does the same thing. This is going to fix a few more test cases with native LoongArch builds. Differential Revision: https://reviews.llvm.org/D134443	2022-09-26 10:15:00 +08:00
WANG Xuerui	f89f0990db	[LoongArch] Support llvm.thread.pointer For `__builtin_thread_pointer` to work, among other things. Similar to D76828 for RISCV. Differential Revision: https://reviews.llvm.org/D134368	2022-09-26 09:56:42 +08:00
Ruiling Song	cf14c7caac	AMDGPU: Add a pass to rewrite certain undef in PHI For the pattern of IR (%if terminates with a divergent branch.), divergence analysis will report %phi as uniform to help optimal code generation. ``` %if \| \ \| %then \| / %endif: %phi = phi [ %uniform, %if ], [ %undef, %then ] ``` In the backend, %phi and %uniform will be assigned a scalar register. But the %undef from %then will make the scalar register dead in %then. This will likely cause the register to be overwritten in %then. To fix the issue, we will rewrite %undef as %uniform. For details, please refer to the comment in AMDGPURewriteUndefForPHI.cpp. Currently there are no test changes shown, but this is mandatory for later changes. Reviewed by: sameerds Differential Revision: https://reviews.llvm.org/D133840	2022-09-26 09:54:47 +08:00
Weining Lu	394f30919a	[Clang][LoongArch] Add inline asm support for constraints f/l/I/K This patch adds support for constraints `f`, `l`, `I`, `K` according to [1]. The remaining constraints (`k`, `m`, `ZB`, `ZC`) will be added later as they are a little more complex than the others. f: A floating-point register (if available). l: A signed 16-bit constant. I: A signed 12-bit constant (for arithmetic instructions). K: An unsigned 12-bit constant (for logic instructions). For now, no need to support register alias (e.g. `$a0`) in llvm as clang will correctly decode the usage of register name aliases into their official names. And AFAIK, the not yet upstreamed `rustc` for LoongArch will always use official register names (e.g. `$r4`). [1] https://gcc.gnu.org/onlinedocs/gccint/Machine-Constraints.html Differential Revision: https://reviews.llvm.org/D134157	2022-09-26 08:49:58 +08:00
James Y Knight	4f188ef89c	[AVR] Fix useDeprecatedPositionallyEncodedOperands errors. This is a follow-on to https://reviews.llvm.org/D134073. It renames a few fields to have consistent names, as well as renaming operands to match the field names. The encoder behavior is unchanged by this cleanup, but a few instructions were previously being disassembled incorrectly, and have been corrected by this change. All of the affected instructions were missing disassembly tests, which are now added. Differential Revision: https://reviews.llvm.org/D134185	2022-09-25 17:55:09 -04:00
James Y Knight	a8c59bcc01	[AMDGPU] Fix useDeprecatedPositionallyEncodedOperands errors in R600. This is a follow-on to https://reviews.llvm.org/D134073. It renames a couple of fields to match their operands, as well as introducing sub-operand names where required. This change _only_ fixes the 'R600' half of the target, not the 'AMDGPU' half. Fixing the AMDGPU half will be a significantly more difficult change (which I've not yet attempted.) Differential Revision: https://reviews.llvm.org/D134078	2022-09-25 17:55:09 -04:00
James Y Knight	0f99958e79	[Lanai] Fix useDeprecatedPositionallyEncodedOperands errors. This is a follow-on to https://reviews.llvm.org/D134073. Lanai was almost clean: the only issue is that 'bit' behaves differently than 'bits<1>', because only the 'bits' type preserves unresolved references via 'keepUnsetBits()' in TableGen/Record.h. Thus, use bits instead. This issue _would_ have caused invalid instruction emission/decoding, except that the PQ bits were being overridden after the fact by code in 'adjustPqBits' in MCTargetDesc/LanaiMCCodeEmitter.cpp, and 'PostOperandDecodeAdjust' in Disassembler/LanaiDisassembler.cpp. Differential Revision: https://reviews.llvm.org/D134075	2022-09-25 17:55:09 -04:00
Simon Pilgrim	196f27bb56	[CostModel][X86] Add missing cost kinds for v2i64 icmp on SLM	2022-09-25 15:12:21 +01:00
Simon Pilgrim	faff990e9b	[X86] Fix Icelake VPMULLQ zmm pipes and adjust AVX512DQ v8i64 mul costs to match worse case Icelake PMULLQ throughput regressed cf SkylakeServer as its Pipe0 only Confirmed with Intel SOM, Agner and instlatx64	2022-09-25 14:18:08 +01:00
Petar Avramovic	dcc756d03e	[AMDGPU] Pattern for flat atomic fadd f64 intrinsic with local addr Fix regression from clang opencl test in builtins-fp-atomics-gfx90a.cl test_flat_add_local_f64 caused by D130579 Revert `a3becb333d`. Differential Revision: https://reviews.llvm.org/D134568	2022-09-25 13:25:41 +02:00
Philip Reames	5358968e13	[RISCV] Pattern match scalable strided load/store Very straightforward extension of the existing pattern matching pass to handle scalable types as well as fixed length types. The only extra bit beyond removing a bailout is recognizing stepvector. Differential Revision: https://reviews.llvm.org/D134502	2022-09-24 17:41:58 -07:00
Philip Reames	6e7c54ecaf	[RISCV] Add lowering for scalable @llvm.riscv.masked.strided.load/store The code previously assumed fixed length vectors; make the relevant code conditional. Having the lowering in place is necessary for an upcoming change to generalize scatter/gather matching to scalable vectors. Differential Revision: https://reviews.llvm.org/D134489	2022-09-24 17:41:57 -07:00
James Y Knight	5351878ba1	[TableGen] Add useDeprecatedPositionallyEncodedOperands option. Summary: The existing undefined-bitfield-to-operand matching behavior is very hard to understand, due to the combination of positional and named matching. This can make it difficult to track down a bug in a target's instruction definitions. Over the last decade, folks have tried to work-around this in various ways, but it's time to finally ditch the positional matching. With https://reviews.llvm.org/D131003, there are no longer cases that _require_ positional matching, and it's time to start removing usage and support for it. Therefore: add a (default-false) option, and set it to true only in those targets that require positional matching today. Subsequent changes will start cleaning up additional in-tree targets. NOTE TO OUT OF TREE TARGET MAINTAINERS: If this change breaks your build, you may restore the previous behavior simply by adding: let useDeprecatedPositionallyEncodedOperands = 1; to your target's InstrInfo tablegen definition. However, this is temporary -- the option will be removed in the future. If your target does not set 'decodePositionallyEncodedOperands', you may thus start migrating to named operands. However, if you _do_ currently set that option, I recommend waiting until a subsequent change lands, which adds decoder support for named sub-operands. Differential Revision: https://reviews.llvm.org/D134073	2022-09-24 09:40:45 -04:00
James Y Knight	a538d1f13a	[TableGen][CodeEmitterGen] Allow local names for sub-operands in a operand list. These names can then be matched by name against 'bits' fields in a record, to populate an instruction's encoding. This does _not_ yet change DecoderEmitter to allow by-name matching of sub-operands. Unlike the encoder, the decoder already defaulted to not supporting positional matching, and backends had workarounds in place for the missing decoding support. Additionally, use this new capability to allow the ARM and AArch64 backends not to require any positional operand matching. Differential Revision: https://reviews.llvm.org/D131003	2022-09-24 09:40:44 -04:00
Craig Topper	4f86c5cbb7	[RISCV] Rename RISCVScheduleB.td to RISCVScheduleZb.td. NFC	2022-09-23 21:38:42 -07:00
Craig Topper	3967abcc0b	[RISCV] Add missing scheduler classes to Zbkb and Zbkx instructions.	2022-09-23 21:38:42 -07:00
Craig Topper	cde3de5381	[RISCV] Remove a few remnants of Zbr I missed.	2022-09-23 21:21:51 -07:00
Craig Topper	19850cc2d8	Revert "[RISCV] Lower BUILD_VECTOR to RISCVISD::VID_VL if it is floating-point type." This reverts commit `dd53a0bb30`. We have seen crashes from this internally. Probably due to the use of RoundingMode::Dynamic.	2022-09-23 18:41:41 -07:00
Craig Topper	90a5d8499a	[RISCV] Promote f16 STRICT_FCEIL/FLOOR/TRUNC/NEARBYINT/RINT/ROUND,ROUNDEVEN to f32.	2022-09-23 14:01:51 -07:00
Jay Foad	ddfa0f62d8	[AMDGPU] Add GFX11 feature for subtargets with more VGPRs The full complement of physical VGPRs for GFX11 is 50% more than GFX10. Some subtargets have this, others stay the same as GFX10. This affects occupancy calculations. Differential Revision: https://reviews.llvm.org/D134522	2022-09-23 20:18:23 +01:00
Josh Stone	cb46ffdbf4	[X86] Use BuildStackAdjustment in stack probes This has the advantage of dealing with live EFLAGS, using LEA instead of SUB if needed to avoid clobbering. That also respects feature "lea-sp". We could allow unrolled stack probing from blocks with live-EFLAGS, if canUseAsEpilogue learns when emitStackProbeInlineGeneric will be used. Differential Revision: https://reviews.llvm.org/D134495	2022-09-23 09:30:32 -07:00
Josh Stone	26c37b461a	[X86] Don't allow prologue stack probing with live EFLAGS Fixes https://github.com/llvm/llvm-project/issues/49509 Differential Revision: https://reviews.llvm.org/D134494	2022-09-23 09:30:32 -07:00
Josh Stone	4dcfb09e40	[NFC][CodeGen] Use const MF in TargetLowering stack probe functions This makes them callable from places like canUseAsPrologue. Differential Revision: https://reviews.llvm.org/D134492	2022-09-23 09:30:32 -07:00
Petar Avramovic	6db7921b65	AMDGPU: Use tablegen patterns for buffer global and flat atomic fadd Remove manual selection for atomic fadd from global-isel. Stop pre-isel translation to AtomicLoadFAdd/G_ATOMICRMW_FADD which corresponds to llvm-ir's atomicrmw fadd instruction. global and flat atomic fadd patterns changes: Split rtn/no-rtn patterns Add missing patterns or fix predicates Remove atomicrmw patterns for v2f16 (atomic rmw doesn't support vectors). Patterns now check addrspace of pointer, added patterns for flat intrinsic with global addrspace pointer that selects into global atomic instruction. buffer atomic fadd patterns changes: Edit patterns to import into global-isel. Remove gfx6/gfx7 _addr64 and _offset patterns. Remove patterns that can't be reached (same pattern but different feature). Differential Revision: https://reviews.llvm.org/D130579	2022-09-23 17:52:10 +02:00
Petar Avramovic	5cee9047d5	AMDGPU: Improve atomicrmw fadd selection Use same atomicrmw fadd expansion rules for gfx908, gfx940 and gfx11 as for gfx90a. Add missing globalisel legalizer support for flat atomicrmw fadd f32 on gfx940 and gfx11. Isel support for gfx11 will be added in D130579. Differential Revision: https://reviews.llvm.org/D131560	2022-09-23 17:52:10 +02:00
Petar Avramovic	e03d36d4ae	[AMDGPU] Add FeatureFlatAtomicFaddF32Inst Feature used by targets that have flat_atomic_add_f32 instruction (gfx940 and gfx11). Remove isGFX940GFX11Plus. Add hasFlatAtomicFaddF32Inst Subtarget check for codegen. Differential Revision: https://reviews.llvm.org/D134532	2022-09-23 17:52:10 +02:00
Simon Pilgrim	a6e9141505	[TTI] Add OperandValueProperties::OP_NegatedPowerOf2 enum (PR51436) The mul by constant costmodels handle power-of-2 constants, but not negated-power-of-2, despite the backends handling both. This patch adds the OperandValueProperties::OP_NegatedPowerOf2 enum and wires it for use for basic mul cost analysis and SLP handling. Fixes #50778 Differential Revision: https://reviews.llvm.org/D111968	2022-09-23 14:03:18 +01:00
Hassnaa Hamdi	181f200a1c	[NFC]: AArch64-SVE modify some comments	2022-09-23 12:07:31 +00:00
Caroline Concatto	5431bf27bd	[AArch64] Remove svget/svset/svcreate from llvm This patch removes the aarch64 intrinsics svget/svset/svcreate from llvm. It also implements the InstCombine for vector.extract that used to be in svget. Depends on: D131547 Differential Revision: https://reviews.llvm.org/D131548	2022-09-23 10:48:43 +01:00
gonglingqin	ac295597a8	[LoongArch] Add codegen support for atomicrmw add/sub/nand/and/or/xor operation Differential Revision: https://reviews.llvm.org/D133755	2022-09-23 09:32:11 +08:00
Philip Reames	60c91fd364	[RISCV] Disallow scale for scatter/gather RISCV doesn't actually support a scaled form of indexed load and store. We previously handled this by forming the scaled SDNode, and then doing custom legalization during lowering. This patch instead adds a callback via TLI to prevent formation entirely. This has two effects: * First, the GEP gets expanded (and used). Instead of the shift being created with an SDLoc of the memory operation, it has the SDLoc of the GEP instruction. This avoids the scheduler perturbing IR order when there's no reason to. * Second, we fix what appears to be a bug in index calculation with RV32. The rules for GEPs require index calculation be done in particular bitwidth, and it appears the custom legalization code got this wrong for the case where index type exceeds pointer width. (Or at least, I trust the generic GEP lowering to be correct a lot more.) The DAGCombiner change to handle VPScatter/VPGather is technically separate, but is required to prevent a regression on those intrinsics. Differential Revision: https://reviews.llvm.org/D134382	2022-09-22 15:31:26 -07:00
Craig Topper	52708be182	[RISCV] Remove support for the unratified Zbe, Zbf, and Zbm extensions. These extensions do not appear to be on their way to ratification.	2022-09-22 13:04:41 -07:00
Simon Pilgrim	98907f8685	[CostModel][X86] Tidyup sdiv/srem/udiv/urem by constant cost tables Preparation for adding cost kinds handling This is necessary to eventually unblock D111968	2022-09-22 20:46:33 +01:00
Chris Bieneman	4959bfa060	[NFC] Refactor dxil metadata code DXIL relies on a whole bunch of IR metadata constructs being populated in the right shape. Rather than just hard coding or using complicated arrangements of constant data strings, let's make first-class objects that represent the metadata and manage reading and writing the metadata from the module. Reviewed By: python3kgae Differential Revision: https://reviews.llvm.org/D134397	2022-09-22 12:17:51 -05:00
Hassnaa Hamdi	f2072e0ae0	[AArch64-SVE]: Improve cost model for div/udiv/mul 128-bit vector operations Differential Revision: https://reviews.llvm.org/D132477	2022-09-22 16:50:55 +00:00
Simon Pilgrim	dc93202b44	[CostModel][X86] Remove duplicate ashr v4i64 cost table entry. NFCI.	2022-09-22 17:27:26 +01:00
Fraser Cormack	92d71c615d	[RISCV] Use structured bindings in common RVV lowering code This patch uses structured bindings to simplify a couple of specific cases when lowering RVV operations where we commonly declare two SDValues and immediately 'tie' them to the mask and vector length. There's also a couple places where we split vectors that structured bindings make sense to use. This patch tries to keep these sorts of changes minimal and to cases where the returned types are commonly understood, rather than applying this wholesale to the RISCV backend. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D134442	2022-09-22 16:38:40 +01:00
Philip Reames	e41765aa4d	[RISCV] Verify consistency of a couple TSFlags related to vector operands Various bits of existing code assume the presence of one operand implies the presence of another. Add verifier rules to catch violations. Differential Revision: https://reviews.llvm.org/D133810	2022-09-22 08:35:17 -07:00
Craig Topper	bf7c7696fe	[RISCV] Improve support for vector fp_to_sint_sat/uint_sat. The default fixed vector legalization is to unroll. The default scalable vector legalization is to clamp in the FP domain. The RVV vfcvt instructions have saturating behavior so we can use them directly. The only difference is that RVV instructions turn NaN into the max value, but the _SAT intrinsics want 0. I'm only supporting 1 step of narrowing for now. I think we can support more steps by using VNCLIP to saturate and narrow. The only case that needs 2 steps of widening is f16->i64 which we can do as f16->f32->i64. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D134400	2022-09-22 08:13:48 -07:00
Craig Topper	d6cb8f85bf	[RISCV] Formatting fixes to RISCV.td NFC Improve indentation. Fix the worst of the 80 column violations.	2022-09-22 08:12:59 -07:00
Tim Northover	677da09d02	AArch64: add support for newer Apple CPUs They're roughly ARMv8.6. This works in the .td file, but in AArch64TargetParser.def, marking them v8.6 brings in support for the SM4 cryptographic hash and we don't actually have that. So TargetParser side they're marked as v8.5, with the extra features (BF16 and I8MM added manually). Finally, A16 supports the HCX extension in addition to v8.6. This has no TargetParser implications.	2022-09-22 11:58:51 +01:00
Simon Pilgrim	e030be64d8	[CostModel][X86] Add partial CostKinds handling for funnelshifts/rotates This mainly just adds costs for the targets where we have actual funnelshift/rotate instructions (VBMI2/XOP etc.) - the cases where we expand still need addressing, although for many the default shift+or expansion, especially for uniform cases, isn't that bad. This was achieved with the 'cost-tables vs llvm-mca' script D103695	2022-09-22 11:24:11 +01:00
Simon Pilgrim	b2cd8118d0	[CostModel][X86] Add CostKinds handling for smax/smin/umax/umin instructions This was achieved with the 'cost-tables vs llvm-mca' script D103695	2022-09-22 10:19:23 +01:00
Ilia Diachkov	4421b24fe2	[SPIRV] fix build with clang and use PoisonValue instead of UndefValue The patch fixes the SPIRV backend build using clang. It also replaces UndefValue with PoisonValue in SPIRVRegularizer.cpp. Fixes: #57773 Differential Revision: https://reviews.llvm.org/D134071	2022-09-22 11:49:54 +03:00
Craig Topper	8b8e18e11f	[RISCV] Replace RISCVISD::GREV/GORC/SHFL/UNSHFL with BREV8/ORC_B/ZIP/UNZIP. With Zbp removed, we no longer need the generalized forms. The computeKnownBitsForTargetNode code brev8/orc.b is still based on the general form with the shift amount forced to 7.	2022-09-21 21:57:59 -07:00
Craig Topper	182aa0cbe0	[RISCV] Remove support for the unratified Zbp extension. This extension does not appear to be on its way to ratification. Still need some follow up to simplify the RISCVISD nodes.	2022-09-21 21:22:42 -07:00
Fanchen Kong	8a2729fea7	[WebAssembly] Improve codegen for loading scalars from memory to v128 Use load32_zero instead of load32_splat to load the low 32 bits from memory to v128. Test cases are added to cover this change. Reviewed By: tlively Differential Revision: https://reviews.llvm.org/D134257	2022-09-21 21:05:44 -07:00
Craig Topper	3b8ec0fde5	[RISCV] Remove some unused Predicates from tablegen. NFC Specifically predicates for extensions that are subsets of other extensions. These predicates should never be used. Should always check the superset extension or the superset ORed with the sub extension.	2022-09-21 18:26:43 -07:00
Chris Bieneman	e77c40ffbd	[NFC] Make dxil namespace consistent We have namespaces `DXIL` and `dxil`, which is just confusing. This renames `DXIL` -> `dxil` making everything consistent. While the LLVM coding standards don't have a clear direction here, I chose lower case because by my current unscientific count there are more places where we had the lowercase namespace than the uppercase.	2022-09-21 17:48:13 -05:00
Craig Topper	1d8a7adca6	[RISCV] Rename RISCVISD::SINT_TO_FP_VL/UINT_TO_FP_VL. NFC Name them after the instructions VFCVT_RTZ_X(U)_F_VL to make it clear that the ISD nodes don't have the poison semantics of ISD::SINT_TO_FP/UINT_TO_FP. I plan to reuse this node for a FP_TO_SINT_SAT/FP_TO_UINT_SAT patch and need the instruction semantics.	2022-09-21 15:33:04 -07:00
Fangrui Song	8805e5d1b7	[Hexagon] Fix -Wunused-variable in non-assertion builds after `f6e7ad5604`	2022-09-21 14:14:45 -07:00
Jay Foad	5c7ee894f8	AMDGPU: Stop validating earlyclobber operands in assembler This validation was introduced in D34003 for v_qsad/v_mqsad instructions but it applies to all instructions with earlyclobber operands, which now includes v_mad_i64/v_mad_u64. In all these cases I do not think there is documentation saying that the destination must not overlap the sources. Rather there are some cases where the instruction may not function correctly if there is an overlap, and we are using earlyclobber as a conservative way of preventing codegen from generating those cases. I think it is unhelpful for the assembler to enforce the earlyclobber restriction because it prevents assembling cases where the programmer knows that in fact the overlap is safe. See also: https://github.com/llvm/llvm-project/issues/57610 Differential Revision: https://reviews.llvm.org/D134272	2022-09-21 21:46:59 +01:00
Scott Linder	552539bdac	Revert "[NFC][AMDGPU] Refactor AMDGPUDisassembler" This reverts commit `f583151461`.	2022-09-21 18:48:42 +00:00
Krzysztof Parzyszek	f6e7ad5604	[Hexagon] Revamp type legalization of ext/trunc/sat in HVX Resizing operations (e.g. sign extension) in DAG can go from any width to any other width, e.g. i8 -> i32. If the input and the result differ by a factor larger than 2, the operation cannot be legal in HVX, since the only two legal vector sizes in HVX are a single vector and a pair of vectors. To simplify the legalization, such operations are expanded into steps that only double/halve the type size, so that each such step can be fully legalized on its own. The complication is that DAG will automatically fold these steps back into one, e.g. sext(sext) -> sext. To prevent that new HexagonISD nodes are introduced: TL_EXTEND and TL_TRUNCATE. Once legalized, these nodes are replaced with the original opcodes. The type legalization is now common to aext/sext/zext/trunc and Hexagon- specific ssat/usat nodes.	2022-09-21 11:25:27 -07:00
Florian Hahn	ac434afed8	[AArch64] Try to fold shuffle (tbl2, tbl2) to tbl4. shuffle (tbl2, tbl2) can be folded into a single tbl4 if the mask for the selected elements is constant. Reviewed By: t.p.northover Differential Revision: https://reviews.llvm.org/D133491	2022-09-21 19:15:56 +01:00
Simon Pilgrim	839ba13c3e	[CostModel][X86] Add vbmi2 costs for funnelshift/rotate intrinsics Add costs for the funnel shift instructions - fixes some discrepancies I was hitting with costs numbers from the 'cost-tables vs llvm-mca' script D103695	2022-09-21 13:48:22 +01:00
Kazushi (Jam) Marukawa	eaa263485d	[VE] Remove obsolete ANDrm patterns Remove obsolete ANDrm patterns for MIMM operands. We add these translations to optimize commonly used cast operations before we support MIMM operands directly by each instruction. Such translations are obsolete now. Reviewed By: efocht Differential Revision: https://reviews.llvm.org/D134341	2022-09-21 19:23:34 +09:00
David Green	4f78e022ee	[AArch64] Lower scalar sqxtn intrinsics to use fp registers The llvm.aarch64.neon.scalar.sqxtn.i32.i64 intrinsics take and return integer types, but operate on fp registers. This can create some inefficiencies in their lowering, where the registers are converted to fp a little too late. This patch adds lowering for the intrinsics, creating bitcasts to/from fp types to allow nicer folding later when the instructions are selected, especially around insert/extracts. Differential Revision: https://reviews.llvm.org/D134024	2022-09-21 10:46:43 +01:00
Kazushi (Jam) Marukawa	021d05a1ab	[VE][NFC] Change to use l2i/i2l to simplify code We previously added l2i/i2l macros to simplify EXTRACT_SUBREG/INSERT_SUBREG conversions. This patch changes VEInstrInfo.td to use such macros to simplify existing code. Reviewed By: efocht Differential Revision: https://reviews.llvm.org/D134118	2022-09-21 18:04:29 +09:00
Kazushi (Jam) Marukawa	337e54ec95	[VE] Add maxnum and minnum Add maxnum and minnum for float and double. Lowering is already implemented, so this patch changes them legal and adds regression tests. Reviewed By: efocht Differential Revision: https://reviews.llvm.org/D134108	2022-09-21 18:03:49 +09:00
Kazushi (Jam) Marukawa	3ee64ea5cf	[VE] Change to expand FMA VE has fused multiply-add instruction for only vector calculations. This patch forces to expand scalar FMA to multiply and add instructions. This patch also adds regression test. Reviewed By: efocht Differential Revision: https://reviews.llvm.org/D134107	2022-09-21 18:02:55 +09:00
David Green	9a20596f48	[AArch64] Insert/Extract of bitcast patterns This adds some quick tablegen patterns for vector_insert(bitcast(..)) and bitcast(vector_extract(..)), allowing us to avoid a round-trip through GPRs. Differential Revision: https://reviews.llvm.org/D134022	2022-09-21 09:54:17 +01:00
David Sherwood	64bef3d568	[AArch64][SME] Disable inlining when SME attributes require smstart/smstop or lazy-save. Inlining must be disabled when the call-site needs to toggle PSTATE.SM or when the callee's function body is executed in a different streaming mode than its caller. This is needed because function calls are the boundaries for streaming mode changes. More details about the SME attributes and design can be found in D131562. Differential Revision: https://reviews.llvm.org/D131581	2022-09-21 09:35:47 +01:00
Craig Topper	70a64fe7b1	[RISCV] Remove support for the unratified Zbt extension. This extension does not appear to be on its way to ratification. Out of the unratified bitmanip extensions, this one had the largest impact on the compiler. Posting this patch to start a discussion about whether we should remove these extensions. We'll talk more at the RISC-V sync meeting this Thursday. Reviewed By: asb, reames Differential Revision: https://reviews.llvm.org/D133834	2022-09-20 20:26:48 -07:00
jacquesguan	1cbf44bd50	[RISCV] Support peephole optimization to fold vmerge.vvm that has tail agnostic policy and unmasked intrinsics. This patch supports the tail agnostic part of D130442. Reviewed By: fakepaper56 Differential Revision: https://reviews.llvm.org/D132923	2022-09-21 10:56:37 +08:00
Changpeng Fang	3ae4c3589e	AMDGPU: Implicit kernel arguments related optimization when uniform-workgroup-size=true Summary: Under code object version 5, ockl_get_local_size returns the value computed by the expression: workgroup_id < hidden_block_count ? hidden_group_size : hidden_remainder For functions with the attribute uniform-work-group-size=true, we can evaluate workgroup_id < hidden_block_count as true, and thus hidden_group_size is returned for ockl_get_local_size. With uniform-workgroup-size=true, this work also sets all remainders to zero, and if there is reqd_work_group_size, we also set work-group-size to the required value from the metadata. Reviewers: arsenm and bcahoon Differential Revision: https://reviews.llvm.org/D131276	2022-09-20 17:25:52 -07:00
Scott Linder	f583151461	[NFC][AMDGPU] Refactor AMDGPUDisassembler Clean up ahead of a patch to fix bugs in the AMDGPUDisassembler. Use lit.local.cfg substitutions and more idiomatic use of split-file to simplify and extend existing kernel-descriptor disassembly tests. Add a comment to AMDHSAKernelDescriptor.h, as at least one small step towards keeping all kernel-descriptor sensitive code in sync. Reviewed By: kzhuravl, arsenm Differential Revision: https://reviews.llvm.org/D130105	2022-09-20 20:37:19 +00:00
Anshil Gandhi	a0c53524a5	[AMDGPU] Fix size of SOPK instructions to 4 bytes Instructions in SOPK format may not have 32-bit literal constants following the instruction. Differential Revision: https://reviews.llvm.org/D133972	2022-09-20 14:27:09 -06:00
Matt Arsenault	28e03692ae	AMDGPU: Fix expansion of 16-bit atomicrmw Fixes issue 57830	2022-09-20 14:47:40 -04:00
Anton Sidorenko	3cd503f181	[NFC][RISCV] Move calculations of SDNode policy operand idx to a separate function Since there is no guaranteed correspondence of SDNode and MI operands, we need getters similar to RISCVII::get*OpNum for SDNodes. More uses of getVecPolicyOpIdx will be added in D130895. Reviewed By: craig.topper, arcbbb Differential Revision: https://reviews.llvm.org/D134179	2022-09-20 10:36:47 -07:00
Philip Reames	eda2af575f	[RISCV][MC] Add support for experimental Zawrs extension This implements experimental support for the Zawrs extension as specified here: https://github.com/riscv/riscv-zawrs/releases/download/V1.0-rc3/Zawrs.pdf. Despite the 1.0 version name, this has not been ratified and there was a major change to proposed specification between rc2 and rc3. Once this is ratified, it'll move out of experimental status. This change adds assembly support, but does not include C language or IR intrinsics. We can decide if we want them, and handle that in a separate patch. Differential Revision: https://reviews.llvm.org/D133443	2022-09-20 10:15:11 -07:00
Jay Foad	f19cc793d2	[AMDGPU] Disable fp atomic to s_denorm_mode hazard for GFX11 This hazard only exists on GFX10. Differential Revision: https://reviews.llvm.org/D134276	2022-09-20 17:40:49 +01:00
David Green	cb375e8c1f	[AArch64] Enable LSLFast for modern OoO cpus This patch enables the LSLFast feature for Cortex-A76, Cortex-A77, Cortex-A78, Cortex-A78C, Cortex-A710, Cortex-X1, Cortex-X2, Neoverse N1, Neoverse N2, Neoverse V1 and the Neoverse 512TB pseudo-cpu, in-line with the software optimization guides for those CPUs. Differential Revision: https://reviews.llvm.org/D134273	2022-09-20 17:09:14 +01:00
Joe Nash	b982ba2a6e	[AMDGPU][GFX11] Use VGPR_32_Lo128 for VOP1,2,C Due to the encoding changes in GFX11, we had a hack in place that disables the use of VGPRs above 128. This patch removes the need for that hack. We introduce a new register class VGPR_32_Lo128 which is used for 16-bit operands of VOP1, VOP2, and VOPC instructions. This register class only has the low 128 VGPRs, but is otherwise identical to VGPR_32. Therefore, 16-bit VOP1, VOP2, and VOPC instructions are correctly limited to use the first 128 VGPRs, while the other instructions can freely use all 256. We introduce new pseudo-instructions used on GFX11 which have the suffix t16 (True 16) to use the VGPR_32_Lo128 register class. Reviewed By: foad, rampitec, #amdgpu Differential Revision: https://reviews.llvm.org/D133723	2022-09-20 09:56:28 -04:00
Simon Pilgrim	0015edeefd	Fix MSVC "result of 32-bit shift implicitly converted to 64 bits" warning. NFCI.	2022-09-20 14:24:07 +01:00
Caroline Concatto	d32b8fdbdb	[LLVM][AArch64] Replace aarch64.sve.ld by aarch64.sve.ldN.sret This patch removes the intrinsic aarch64.sve.ldN from tablegen in favour of using aarch64.sve.ldN.sret. Depends on: D133023 Differential Revision: https://reviews.llvm.org/D133025	2022-09-20 13:15:07 +01:00
gonglingqin	7328ff75ba	[LoongArch] Add codegen support for fmaxnum_ieee and fminnum_ieee Thanks for @xry111's previous bug fixes. See https://github.com/loongson/llvm-project/pull/1 for more details. Differential Revision: https://reviews.llvm.org/D133478	2022-09-20 19:22:32 +08:00
Simon Pilgrim	70582bc4d3	Fix MSVC "result of 32-bit shift implicitly converted to 64 bits" warnings. NFCI.	2022-09-20 10:35:32 +01:00
Serge Pavlov	181279ffcd	[X86][GlobalISel] Add support for sret demotion The change adds support for the cases when return value is passed in memory rather than in registers. Differential Revision: https://reviews.llvm.org/D134181	2022-09-20 11:47:53 +07:00
Craig Topper	94049db913	[RISCV] Make computeIncomingVLVTYPE more conservative when merging predecessor state. If we have already calculated the incoming state before, use that as our starting point to ensure we are conservative. This fixes an infinite loop found in our downstream where we allowed two waves of updates to propagate through a loop and the merge points allowed us to toggle back and forth between states. No small reproducer right now. Differential Revision: https://reviews.llvm.org/D134229	2022-09-19 15:57:55 -07:00
Alexander Timofeev	2e8817b90a	[AMDGPU] SIFixSGPRCopies reworking to use one pass over the MIR for analysis and lowering. This change finalizes the series of patches aiming to replace the old strategy of VGPR to SGPR copy lowering. # Following the https://reviews.llvm.org/D128252 and https://reviews.llvm.org/D130367 code parts that are no longer used were removed. # The first pass over the MachineFunction collects all the necessary information. # Lowering is done in 3 phases: - VGPR to SGPR copies analysis lowering - REG_SEQUENCE, PHIs, and SGPR to VGPR copies lowering - SCC copies lowering is done in a separate pass over the Machine Function Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D131246	2022-09-19 23:31:45 +02:00
Craig Topper	0cec96ab25	[RISCV] Manage the InQueue flag in insertvli correctly. We were only setting this flag the first time we added the blocks not when we mark them for revisiting. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D134193	2022-09-19 14:28:22 -07:00
Haojian Wu	eec19987c0	Fix one more unused warning in release build, NFC	2022-09-19 20:56:39 +02:00
Haojian Wu	20822e2d42	Fix an unused warning in release build, NFC	2022-09-19 20:45:51 +02:00
Krzysztof Parzyszek	94a71361d6	[Hexagon] Implement [SU]INT_TO_FP and FP_TO_[SU]INT for HVX	2022-09-19 11:11:20 -07:00
Krzysztof Parzyszek	ec51e38062	[Hexagon] Add HVX patterns for ISD::ABS	2022-09-19 10:12:15 -07:00
Krzysztof Parzyszek	3eee45cdc8	[Hexagon] Rework SplitHvxPairOp to be a general vector splitting utility Enable creating an idiom: V -> opJoin(SplitVectorOp(V))	2022-09-19 09:42:13 -07:00
Simon Pilgrim	6b4d409f69	[CostModel][X86] Add CostKinds handling for CTLZ_ZERO_UNDEF/CTTZ_ZERO_UNDEF instructions This was achieved with the 'cost-tables vs llvm-mca' script D103695	2022-09-19 17:37:58 +01:00
Krzysztof Parzyszek	e5844462f6	[Hexagon] Use proper output chain when widening HVX loads	2022-09-19 09:04:13 -07:00
Simon Pilgrim	135c9b2c4b	[CostModel][X86] Add CostKinds handling for vector ctlz instructions This was achieved with the 'cost-tables vs llvm-mca' script D103695	2022-09-19 16:44:09 +01:00
Simon Pilgrim	2538adde5c	[CostModel][X86] Add CostKinds handling for cttz This was achieved with the 'cost-tables vs llvm-mca' script D103695	2022-09-19 15:57:03 +01:00
Simon Pilgrim	d90a42d64c	[CostModel][X86] Add CTLZ_ZERO_UNDEF/CTTZ_ZERO_UNDEF cost handling Without LZCNT/BMI, the *_ZERO_UNDEF costs are cheaper as they can avoid the zero handling.	2022-09-19 14:06:33 +01:00
David Green	908b3b6ccb	[AArch64] Use fast-math-flags in isAssociativeAndCommutative Previously only using the UnsafeFPMath option, this now looks for the fast math flags on the instructions, using the same flags as other backends.	2022-09-19 11:34:00 +01:00
LiaoChunyu	2e74157ad4	[RISCV]Preserve (and X, 0xffff) in targetShrinkDemandedConstant shrinkdemandedconstant does some optimizations, but is not very friendly to riscv; use targetShrinkDemandedConstant to limit the damage. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D134155	2022-09-19 14:19:38 +08:00
LiaoChunyu	8fee91c435	[RISCV][NFC]Remove outdated comment from targetShrinkDemandedConstant Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D134154	2022-09-19 10:23:06 +08:00
Kazu Hirata	cf07277fb4	[X86] Fix the LEA optimization pass The LEA optimization pass visits each basic block of a given machine function. In each basic block, for each pair of LEAs that differ only in their displacement fields, we replace all uses of the second LEA with the first LEA while adjusting the displacement. Now, without this patch, after all the replacements are made, the following assert triggers: assert(MRI->use_empty(LastVReg) && "The LEA's def register must have no uses"); The replacement loop uses: for (MachineOperand &MO : llvm::make_early_inc_range(MRI->use_operands(LastVReg))) { which is equivalent to: for (auto UI = MRI->use_begin(LastVReg), UE = MRI->use_end(); UI != UE;) { MachineOperand &MO = *UI++; // <-- Look! That is, immediately after the post increment, make_early_inc_range already has the iterator for the next iteration in its mind. The problem is that in one iteration of the loop, we could replace two uses in a debug instruction like: DBG_VALUE_LIST !"r", !DIExpression(DW_OP_LLVM_arg, 0), %0:gr64, %0:gr64, ... So, the iterator for the next iteration becomes invalid. We end up traversing a garbage use list from that point on. In turn, we don't get to visit remaining uses. The patch fixes the problem by switching to a "draining" while loop: while (!MRI->use_empty(LastVReg)) { MachineOperand &MO = *MRI->use_begin(LastVReg); MachineInstr &MI = *MO.getParent(); The credit goes to Simon Pilgrim for reducing the test case. Fixes https://github.com/llvm/llvm-project/issues/57673 Differential Revision: https://reviews.llvm.org/D133631	2022-09-18 17:50:17 -07:00
Carl Ritson	930315f6aa	[AMDGPU] Fix isSGPRReg for special registers Special registers, e.g. MODE, do not have register classes so will cause null pointer exception if passed to isSGPRReg. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D134025	2022-09-19 08:49:43 +09:00
Kazu Hirata	20d764aff0	[llvm] Don't include SetVector.h (NFC) llvm/lib/ProfileData/RawMemProfReader.cpp uses SetVector without including SetVector.h, so this patch adds an appropriate #include there.	2022-09-17 12:36:43 -07:00
Sander de Smalen	bed214cf0f	[AArch64][SME] Add intrinsics for enabling/disabling ZA. This adds the intrinsics: * void @llvm.aarch64.sme.za.enable() -> smstart za * void @llvm.aarch64.sme.za.disable() -> smstop za Reviewed By: aemerson Differential Revision: https://reviews.llvm.org/D133894	2022-09-17 16:41:42 +00:00
Sander de Smalen	5fae000f36	[AArch64][SME] Disable tail-call optimization when streaming mode change or lazy-save may be required. When a streaming mode change is (or may be) required for a call, it will need to restore the original mode after the call, which prevents the use of tail-call optimization. The same holds true for a call that requires the lazy-save mechanism to be set up before the call, and possibly restored after. More details about the SME attributes and design can be found in D131562. Reviewed By: aemerson Differential Revision: https://reviews.llvm.org/D131579	2022-09-17 16:15:07 +00:00
Jessica Paquette	1076b31da8	[GlobalISel] Combine select + fcmp to fminnum/fmaxnum/fminimum/fmaximum This is a partial port of the code used by the SelectionDAGBuilder to translate selects. In particular, see matchSelectPattern in ValueTracking.cpp. This is a GISel-equivalent of the portion which handles fminnum/fmaxnum/fminimum/fmaximum. I tried to set it up so it'd be easy to add the non-FP cases. Those are simpler. On the AArch64-end, it seems like the FP cases are more important for perf right now, so I bit the bullet and went at the more complicated problem. :) I elected to do this as a post-legalize combine rather than in the IRTranslator because: deciding which fmax/fmin to use can depend on legalization rules; philosophically-speaking (TM), putting it in a combine just feels cleaner; being able to enable/disable the combine is handy. Another option would be to use the ValueTracking code in the IRTranslator and match what SelectionDAGBuilder::visitSelect does. I think that may be somewhat annoying since we'd need to write lowerings back into the selects in the legalizer. I'm not strongly opposed to the approach. We'd also want to be careful with vector selects once that's implemented, which explicitly check if a vector select is legal on the target. That'd probably need a hook. From what I can tell, doing this as a combine is probably a cleaner option long-term. Differential Revision: https://reviews.llvm.org/D116702	2022-09-16 13:35:46 -07:00
Craig Topper	61595c45af	[RISCV] Simplify some code in vector fp<->int handling. NFC We changed the way container types are selected since this code was written. We no longer need to use the largest type.	2022-09-16 12:56:42 -07:00
David Majnemer	8a868d8859	Revert "Revert "[clang, llvm] Add __declspec(safebuffers), support it in CodeView"" This reverts commit `cd20a18286` and adds a "let Heading" to NoStackProtectorDocs.	2022-09-16 19:39:48 +00:00
Simon Pilgrim	23cb1c42cd	[CostModel][X86] Update throughput costs for CTLZ ops This was achieved with an updated version of the 'cost-tables vs llvm-mca' script D103695 (and recent fixes to the bdver2 + alderlake models) Adding full CostKinds costs are affecting some other tests as they make assumptions about SizeLatency costs, so they need addressing first	2022-09-16 16:56:49 +01:00
Dmitry Preobrazhensky	ef8feb6359	[AMDGPU][MC][NFC] Correct error message Differential Revision: https://reviews.llvm.org/D134028	2022-09-16 18:22:08 +03:00
Sander de Smalen	bd4935c175	[AArch64][SME] Implement ABI for calls from streaming-compatible functions. When a function is streaming-compatible and calls a function with a normal or streaming interface, it may need to enable/disable streaming mode before the call, and needs to restore PSTATE.SM after the call. This patch implements this with a Pseudo node that gets expanded to a conditional branch and smstart/smstop node. More details about the SME attributes and design can be found in D131562. Reviewed By: aemerson Differential Revision: https://reviews.llvm.org/D131578	2022-09-16 14:48:37 +00:00
Simon Pilgrim	89e4cb603d	[X86] Add missing (unsupported) zmm vector move classes Although unsupported on HSW, we reuse this model for KNL which does require them Noticed when running the cost model fuzz script from D103695 with -mcpu=knl	2022-09-16 15:31:26 +01:00
Sander de Smalen	b00c36c295	[AArch64][SME] Implement ABI for calls to/from streaming functions. This patch implements the ABI for calls from: Normal -> Streaming Normal -> Streaming-compatible Streaming -> Normal Streaming -> Streaming-compatible Streaming -> Streaming The compiler inserts SMSTART/SMSTOP instructions before and after the call, depending on the required transition. More details about the SME attributes and design can be found in D131562. Reviewed By: aemerson Differential Revision: https://reviews.llvm.org/D131576	2022-09-16 14:07:47 +00:00
Florian Hahn	6b86b481e3	[AArch64] Use tbl for truncating vector FPtoUI conversions. On AArch64, doing the vector truncate separately after the fptoui conversion can be lowered more efficiently using tbl.4, building on D133495. https://alive2.llvm.org/ce/z/T538CC Depends on D133495 Reviewed By: t.p.northover Differential Revision: https://reviews.llvm.org/D133496	2022-09-16 14:57:43 +01:00
Simon Pilgrim	f8fa04295f	[CostModel][X86] Add CostKinds handling for vector integer comparisons These were based off a mixture of vector integer add/sub costs and the numbers from the 'cost-tables vs llvm-mca' script from D103695 - the extra costs for different predicates are still proving tricky to implement, but I've gotten most costs to within +/-1 now - the AVX512 are tricky as we still don't handle predicate results properly, so most of these were done by hand.	2022-09-16 13:03:41 +01:00
Florian Hahn	8491d01cc3	[AArch64] Lower vector trunc using tbl. Similar to using tbl to lower vector ZExts, tbl4 can be used to lower vector truncates. The initial version support i32->i8 conversions. Depends on D120571 Reviewed By: t.p.northover Differential Revision: https://reviews.llvm.org/D133495	2022-09-16 12:42:49 +01:00
Florian Hahn	5871f18827	[AArch64] Lower extending uitofp using tbl. On AArch64, doing the zero-extend separately first can be lowered more efficiently using tbl, building on D120571. https://alive2.llvm.org/ce/z/8Je595 Depends on D120571 Reviewed By: t.p.northover Differential Revision: https://reviews.llvm.org/D133494	2022-09-16 10:20:25 +01:00
Philip Reames	fdff1bb103	[RISCV] Verify merge operand is tied properly Differential Revision: https://reviews.llvm.org/D133957	2022-09-15 13:06:52 -07:00
Philip Reames	32cfafddb1	[RISCV] Verify VL operand on instructions if present These should only be immediate values or GPR registers. Differential Revision: https://reviews.llvm.org/D133953	2022-09-15 13:06:52 -07:00
Alexander Timofeev	fbdea5a2e9	[AMDGPU] Always select s_cselect_b32 for uniform 'select' SDNode This patch contains changes necessary to carry physical condition register (SCC) dependencies through the SDNode scheduler. It adds the edge in the SDNodeScheduler dependency graph instead of inserting the SCC copy between each definition and use. This approach lets the scheduler place instructions in an optimal way placing the copy only when the dependency cannot be resolved. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D133593	2022-09-15 22:03:56 +02:00
Florian Hahn	81a11da762	[CGP,AArch64] Replace zexts with shuffle that can be lowered using tbl. This patch extends CodeGenPrepare to lower zext v16i8 -> v16i32 in loops using a wide shuffle creating a v64i8 vector, selecting groups of 3 zero elements and an element from the input. This is profitable on AArch64 where such shuffles can be lowered to tbl instructions, but only in loops, because it requires materializing 4 masks, which can be done in the loop preheader. This is the only reason the transform is part of CGP. If there's a better alternative I missed, please let me know. The same goes for the shouldReplaceZExtWithShuffle hook which guards this. I am not sure if this transform will be beneficial on other targets, but it seems like there is no other convenient way. This improves the generated code for loops like the one below in combination with D96522. int foo(uint8_t *p, int N) { unsigned long long sum = 0; for (int i = 0; i < N ; i++, p++) { unsigned int v = *p; sum += (v < 127) ? v : 256 - v; } return sum; } https://clang.godbolt.org/z/Wco866MjY Reviewed By: t.p.northover Differential Revision: https://reviews.llvm.org/D120571	2022-09-15 19:18:13 +01:00
Sergei Barannikov	c6acb4eb0f	[SDAG] Add `getCALLSEQ_END` overload taking `uint64_t`s All in-tree targets pass pointer-sized ConstantSDNodes to the method. This overload reduces the amount of boilerplate code a bit. This also makes getCALLSEQ_END consistent with getCALLSEQ_START, which already takes uint64_ts.	2022-09-15 14:02:12 -04:00
Simon Pilgrim	94620e4fc3	[CostModel][X86] Add CostKinds handling for vector shift by generic/non-uniform shift amounts These are the worst case generic vector shift costs, where nothing is known about the shift amounts - in particular this should stop us using the default sizelatency cost of 1 for so many pre-AVX2 vector shifts that can often actually expand during lowering to +20 uops, just for 128-bit vectors, resulting in some horrible inline/unroll decisions. This was achieved with an updated version of the 'cost-tables vs llvm-mca' script D103695 (I'll update the patch soon for reference)	2022-09-15 16:51:58 +01:00
Jay Foad	3822a01e0b	[AMDGPU] Add GFX11 ds_bvh_stack_rtn_b32 instruction Differential Revision: https://reviews.llvm.org/D133928	2022-09-15 16:46:14 +01:00
Matt Arsenault	69153d6c0a	AMDGPU: Use GlobalPriority for largest register tuples Only do this for 16 and 32 register tuples, although we might want to extend to 8 tuples. It's incredibly expensive to spill these, and doing so majorly interferes with the ability to allocate anything else in the function. The lit tests show mostly sizeable improvements with a handful of tiny regressions with large vectors.	2022-09-15 11:45:02 -04:00
Sander de Smalen	45d28779c5	[AArch64][SME] Fix lowering of llvm.aarch64.get.pstatesm() A thread may not have access to SME or TPIDR2_EL0, so in order to safely query PSTATE.SM in a streaming-compatible function, the code should call `__arm_sme_state()`, as described in the ABI: `c2bb09c4d4` This means that the value of pstate.sm is: * 0 if the function is non-streaming. * 1 if the function has `arm_streaming` or `arm_locally_streaming`. * evaluated at runtime by a call to __arm_sme_state() otherwise. This patch also adds a calling convention for calls to SME support routines. At some point we can remove the need for the llvm.aarch64.get.pstatesm() intrinsic and use function calls (with the corresponding cc) directly instead. Reviewed By: aemerson Differential Revision: https://reviews.llvm.org/D131571	2022-09-15 15:14:13 +00:00
Matt Arsenault	63d1d37d35	RegAllocGreedy: Avoid overflowing priority bitfields The class priority is expected to be at most 5 bits before it starts clobbering bits used for other fields. Also clamp the instruction distance in case we have millions of instructions. AMDGPU was accidentally overflowing into the global priority bit in some cases. I think in principle we would have wanted this, but in the cases I've looked at, it had the counterintuitive effect and de-prioritized the large register tuple. Avoid using weird bit hack PPC uses for global priority. The AllocationPriority field is really 5 bits, and PPC was relying on overflowing this to 6-bits to forcibly set the global priority bit. Split this out as a separate flag to avoid having magic behavior for values above 31.	2022-09-15 10:38:40 -04:00
Dmitry Preobrazhensky	0e868aff43	[AMDGPU][MC][GFX11] Add validation of constant bus limitations for VOPD Differential Revision: https://reviews.llvm.org/D133881	2022-09-15 16:36:19 +03:00
Dmitry Preobrazhensky	c89e60bf1f	[AMDGPU][MC][GFX11] Add VOPD literals validation Differential Revision: https://reviews.llvm.org/D133864	2022-09-15 16:29:53 +03:00
Dmitry Preobrazhensky	8bb5c89205	[AMDGPU][MC][NFC] Refactor AMDGPUAsmParser::validateVOPLiteral Differential Revision: https://reviews.llvm.org/D133861	2022-09-15 16:26:14 +03:00
Simon Pilgrim	0ec028fe10	[CostModel][X86] Add CostKinds handling for vector shift by uniform/constuniform ops Vector shift by const uniform is the cheapest shift instruction we have, non-const uniform have a marginally higher cost - some targets 'splat' the amount internally to use the shift-per-element instruction, others see a higher cost for the explicit zeroing of the upper bits for the (64-bit) shift amount. This was achieved with an updated version of the 'cost-tables vs llvm-mca' script D103695 (I'll update the patch soon for reference)	2022-09-15 14:05:30 +01:00
wanglei	a65557d4b3	[LoongArch] Fixup value adjustment in applyFixup A complete implementation of `applyFixup` for D132323. Makes `LoongArchAsmBackend::shouldForceRelocation` to determine if the relocation types must be forced. This patch also adds range and alignment checks for `b*` instructions' operands, at which point the offset to a label is known. Differential Revision: https://reviews.llvm.org/D132818	2022-09-15 21:00:22 +08:00
Ivan Kosarev	693f816288	[AMDGPU][SILoadStoreOptimizer] Merge SGPR_IMM scalar buffer loads. Reviewed By: foad, rampitec Differential Revision: https://reviews.llvm.org/D133787	2022-09-15 13:48:51 +01:00
Ilia Diachkov	3544d200d9	[SPIRV] add IR regularization pass The patch adds the regularization pass that prepares LLVM IR for the IR translation. It also contains the following changes: - reduce indentation, make getNonParametrizedType, getSamplerType, getPipeType, getImageType, getSampledImageType static in SPIRVBuiltins, - rename mayBeOclOrSpirvBuiltin to getOclOrSpirvBuiltinDemangledName, - move isOpenCLBuiltinType, isSPIRVBuiltinType, isSpecialType from SPIRVGlobalRegistry.cpp to SPIRVUtils.cpp, renaming isSpecialType to isSpecialOpaqueType, - implement getTgtMemIntrinsic() in SPIRVISelLowering, - add hasSideEffects = 0 in Pseudo (SPIRVInstrFormats.td), - add legalization rule for G_MEMSET, correct G_BRCOND rule, - add capability processing for OpBuildNDRange in SPIRVModuleAnalysis, - don't correct types of registers holding constants and used in G_ADDRSPACE_CAST (SPIRVPreLegalizer.cpp), - lower memset/bswap intrinsics to functions in SPIRVPrepareFunctions, - change TargetLoweringObjectFileELF to SPIRVTargetObjectFile in SPIRVTargetMachine.cpp, - correct comments. 5 LIT tests are added to show the improvement. Differential Revision: https://reviews.llvm.org/D133253 Co-authored-by: Aleksandr Bezzubikov <zuban32s@gmail.com> Co-authored-by: Michal Paszkowski <michal.paszkowski@outlook.com> Co-authored-by: Andrey Tretyakov <andrey1.tretyakov@intel.com> Co-authored-by: Konrad Trifunovic <konrad.trifunovic@intel.com>	2022-09-15 15:53:44 +03:00
esmeyi	6e0e926c2f	[PowerPC] Converts to comparison against zero even when the optimization doesn't happen in peephole optimizer. Summary: Converting a comparison against 1 or -1 into a comparison against 0 can exploit record-form instructions for comparison optimization. The conversion will happen only when a record-form instruction can be used to replace the comparison during the peephole optimizer (see function optimizeCompareInstr). In post-RA, we also want to optimize the comparison by using the record form (see D131873) and it requires additional dataflow analysis to reliably find uses of the CR register set. It's reasonable to share the conversion between the peephole optimizer and the post-RA optimizer. Converting to comparison against zero even when the optimization doesn't happen in peephole optimizer may create additional opportunities for the post-RA optimization. Reviewed By: nemanjai Differential Revision: https://reviews.llvm.org/D131374	2022-09-15 06:06:25 -04:00
Marco Elver	72e7575ffe	[GlobalISel][AArch64] Fix pcsections for expanded atomics and add more tests Add fix for propagation of !pcsections metadata for expanded atomics, together with more tests for interesting atomic instructions (based on llvm/test/CodeGen/AArch64/GlobalISel/arm64-atomic.ll). Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D133710	2022-09-15 10:36:11 +02:00
Sheng	bea33f75e2	[M68k] Fix the crash of fast register allocator `MOVEM` is used to spill the register, which will cause problem with 1 byte data, since it only supports word (2 bytes) and long (4 bytes) size. We change to use the normal `move` instruction to spill 1 byte data. Fixes #57660 Reviewed By: myhsu Differential Revision: https://reviews.llvm.org/D133636	2022-09-15 09:24:22 +08:00
Craig Topper	5888c157a7	[RISCV] Simplify some code in RISCVInstrInfo::verifyInstruction. NFCI This code was written as if it lived in the MC layer instead of the CodeGen layer. We get the MCInstrDesc directly from MachineInstr. And we can use RISCVSubtarget::is64Bit instead of going to the Triple. Differential Revision: https://reviews.llvm.org/D133905	2022-09-14 17:07:21 -07:00
Philip Reames	e395915ac0	[RISCV] Verify SEW/VecPolicy immediate values Copy the asserts from the printing code, and turn them into actual verifier rules. Doing this revealed an existing bug - see `0a14551`. Differential Revision: https://reviews.llvm.org/D133869	2022-09-14 14:45:16 -07:00
Philip Reames	0a145516a2	[RISCV] Fix a silent miscompile in copyPhysReg Found this when adding verifier rules. The case which arises is that we have a DefMBBI which has a VecPolicy operand. The code was not expecting this, and the unconditional copy of the last two operands resulted in the SEW and VecPolicy fields being added to the VMV_V_V as AVL and SEW respectively. Oddly, this appears to be silent in practice. There's no test change despite verifier changes proving that we definitely hit this in existing tests. Differential Revision: https://reviews.llvm.org/D133868	2022-09-14 14:45:01 -07:00
Piotr Sobczak	abd927e5a8	[AMDGPU] Check for num elts in SelectVOP3PMods The rest of the code section assumes there are exactly two elements in the vector (Lo, Hi), so add the check before entering the section. Differential Revision: https://reviews.llvm.org/D133852	2022-09-14 20:00:19 +02:00
David Spickett	3acaf04033	[LLVM][AArch64] Don't warn about clobbering X16 when Speculative Load Hardening is used SLH will fall back to a different technique if X16 is being used, so there is no need to warn for inline asm use. Only prevent other codegen from using it. Reviewed By: kristof.beyls Differential Revision: https://reviews.llvm.org/D133766	2022-09-14 15:19:53 +00:00
Zain Jaffal	d1dec04d76	[AArch64] Disable nontemporal load for Big Endian The current code for generating nontemporal load outputs the wrong assembly for big endian architecture. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D133789	2022-09-14 14:49:55 +01:00
Simon Pilgrim	854a4595b6	[CostModel][X86] getArithmeticInstrCost - move GLM/SLM custom costs AFTER constant shift -> multiply canonicalization Corrects the shift by constant costs to better account for them being converted to multiples for lowering - which demonstrates that we should probably be trying harder NOT to convert these to multiplies for some CPUs (v4i32 in particular).	2022-09-14 11:46:26 +01:00
Simon Pilgrim	40ab7875f8	[CostModel][X86] Fix throughput costs for AVX512BW v32i16 shifts Fixes regression from `a931dbfbd3`	2022-09-14 11:18:23 +01:00
Jon Chesterfield	cdb9738963	[amdgpu] Expand all ConstantExpr users of LDS variables in instructions Bug noted in D112717 can be sidestepped with this change. Expanding all ConstantExpr involved with LDS up front makes the variable specialisation simpler. Excludes ConstantExpr that don't access LDS to avoid disturbing codegen elsewhere. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D133422	2022-09-14 07:55:46 +01:00
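
The "draining" loop described in commit cf07277fb4 above can be illustrated outside of LLVM. The sketch below is only an approximation under stated assumptions: a `std::list` stands in for a register's use list, and the names `Use`, `UseList`, `replaceAllUsesInInstr`, and `drainUses` are hypothetical, not part of the `MachineRegisterInfo` API. It shows why re-reading the head of the list after every mutation stays valid even when one step removes several uses at once, as with a `DBG_VALUE_LIST` that mentions the same register twice.

```cpp
#include <cassert>
#include <cstddef>
#include <list>

// Hypothetical stand-in for one register operand: records the owning instruction.
struct Use {
  int InstrId;
};

// Hypothetical stand-in for the use list of a single virtual register.
using UseList = std::list<Use>;

// Rewrites every operand of instruction InstrId that refers to the register.
// All of those operands leave the use list in one step, which is what would
// invalidate an iterator that had been advanced ahead of the mutation.
std::size_t replaceAllUsesInInstr(UseList &Uses, int InstrId) {
  std::size_t Removed = 0;
  for (auto It = Uses.begin(); It != Uses.end();) {
    if (It->InstrId == InstrId) {
      It = Uses.erase(It); // erase returns the next valid iterator
      ++Removed;
    } else {
      ++It;
    }
  }
  return Removed;
}

// Draining loop: no iterator is held across the mutation; the head of the
// list is looked up afresh on every iteration until the list is empty.
// Returns the number of instructions that were rewritten.
std::size_t drainUses(UseList &Uses) {
  std::size_t Rewritten = 0;
  while (!Uses.empty()) {
    int InstrId = Uses.front().InstrId;
    std::size_t Removed = replaceAllUsesInInstr(Uses, InstrId);
    assert(Removed > 0 && "each iteration must shrink the use list");
    (void)Removed; // silence unused warnings when asserts are compiled out
    ++Rewritten;
  }
  return Rewritten;
}
```

With a use list holding two uses from instruction 1, one from instruction 2, and two from instruction 3, `drainUses` rewrites three instructions and leaves the list empty; a loop that cached the "next" iterator before rewriting instruction 1 could instead be left pointing at an erased node.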


69244 Commits