llvm-project

Commit Graph

Author	SHA1	Message	Date
David Green	5e1a9d319d	[ARM] Add lowering for bf16 neon vtrn, vzup and vuzp. These go via Dag2Dag, which are better based on element sizes not the exact element types.	2022-10-02 15:34:37 +01:00
David Green	f2fde99461	[ARM] More bf16 shuffle handling, including perfect shuffles.	2022-10-02 14:31:51 +01:00
David Green	8193f0d1d2	[ARM] Add tablegen patterns for bf16 vrev	2022-10-02 13:42:14 +01:00
David Green	58369c8631	[ARM] Add tablegen patterns for bf16 vext This adds missing tablegen patterns for VEXT, identical to the fp16 patterns as they only use baseline Neon operations. Part of fixing #57770.	2022-10-02 12:45:58 +01:00
Craig Topper	5bbc5eb55f	[RISCV] Use _TIED form of VWADD(U)_WX/VWSUB(U)_WX to avoid early clobber. One of the sources is the same size as the destination so that source doesn't have an overlap with the destination register. By using the _TIED form we avoid an early clobber contraint for that source. This matches what was already done for instrinsics. ConvertToThreeAddress will fix it if it can't stay tied.	2022-10-01 16:34:39 -07:00
Craig Topper	85db4f10e3	[RISCV] Minor tablegen formatting cleanup. NFC	2022-10-01 15:59:25 -07:00
zhongyunde	4a549be9c3	[AArch64] Lower multiplication by a negative constant to shl+sub+shl Change the costmodel to lower a = b * C where C = -(2^n - 2^m) to lsl w8, w0, m sub w0, w8, w0, lsl n Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D134934	2022-10-01 21:27:42 +08:00
Filipp Zhinkin	945a1468c9	[ARM] Support all versions of AND, ORR, EOR and BIC in optimizeCompareInstr Combine cmp with zero and all versions of AND, ORR, EOR and BIC instructions into S-suffixed versions. Related issue: https://github.com/llvm/llvm-project/issues/57122 Reviewed By: efriedma, samtebbs Differential Revision: https://reviews.llvm.org/D131786	2022-10-01 12:41:37 +03:00
Carl Ritson	a35013bec6	[AMDGPU][GFX11] Mitigate VALU mask write hazard VALU use of an SGPR (pair) as mask followed by SALU write to the same SGPR can cause incorrect execution of subsequent SALU reads of the SGPR. Reviewed By: foad, rampitec Differential Revision: https://reviews.llvm.org/D134151	2022-10-01 16:21:24 +09:00
Craig Topper	9273f860c0	[RISCV] Prevent performCombineVMergeAndVOps from creating cycles in the DAG. If True has a Chain result, the other operands of the vmerge may depend on it through that Chain. We need to ensure it isn't a predecessor of those operands. Reviewed By: fakepaper56 Differential Revision: https://reviews.llvm.org/D134980	2022-09-30 20:01:45 -07:00
Craig Topper	de0de294eb	[RISCV] Update cost of vector roundeven to match round which uses the same sequence but a different FRM value. Reviewed By: reames, eopXD Differential Revision: https://reviews.llvm.org/D134978	2022-09-30 20:01:35 -07:00
Yeting Kuo	cefb7aab61	[VP][RISCV] Add vp.copysign and RISC-V support. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D134935	2022-10-01 10:19:10 +08:00
Matthias Braun	189900eb14	X86: Stop assigning register costs for longer encodings. This stops reporting CostPerUse 1 for `R8`-`R15` and `XMM8`-`XMM31`. This was previously done because instruction encoding require a REX prefix when using them resulting in longer instruction encodings. I found that this regresses the quality of the register allocation as the costs impose an ordering on eviction candidates. I also feel that there is a bit of an impedance mismatch as the actual costs occure when encoding instructions using those registers, but the order of VReg assignments is not primarily ordered by number of Defs+Uses. I did extensive measurements with the llvm-test-suite wiht SPEC2006 + SPEC2017 included, internal services showed similar patterns. Generally there are a log of improvements but also a lot of regression. But on average the allocation quality seems to improve at a small code size regression. Results for measuring static and dynamic instruction counts: Dynamic Counts (scaled by execution frequency) / Optimization Remarks: Spills+FoldedSpills -5.6% Reloads+FoldedReloads -4.2% Copies -0.1% Static / LLVM Statistics: regalloc.NumSpills mean -1.6%, geomean -2.8% regalloc.NumReloads mean -1.7%, geomean -3.1% size..text mean +0.4%, geomean +0.4% Static / LLVM Statistics: mean -2.2%, geomean -3.1%) regalloc.NumSpills mean -2.6%, geomean -3.9%) regalloc.NumReloads mean +0.6%, geomean +0.6%) size..text Static / LLVM Statistics: regalloc.NumSpills mean -3.0% regalloc.NumReloads mean -3.3% size..text mean +0.3%, geomean +0.3% Differential Revision: https://reviews.llvm.org/D133902	2022-09-30 16:01:33 -07:00
Peter Collingbourne	0caa9d4b1e	AArch64: Don't use RETA[AB] when ShadowCallStack is enabled. When returning from a function with both SCS and PAC-RET enabled, we need to authenticate the return address from the stack and then load from the SCS, but this was happening in the reverse order when RETA[AB] were being used. Fix it by disabling the use of RETA[AB] when SCS is enabled. Fixes pr58072. Differential Revision: https://reviews.llvm.org/D134931	2022-09-30 12:33:23 -07:00
Xiang Li	a80a888de5	[DirectX backend] Support global ctor for DXILBitcodeWriter. 1. Save typed pointer type for GlobalVariable/Function instead of the ObjectType. This will allow use GlobalVariable/Function as value. 2. Save target type for global ctors for Constant. 3. In DXILBitcodeWriter::getTypeID, check PointerMap first for Constant case. Reviewed By: beanz Differential Revision: https://reviews.llvm.org/D133283	2022-09-30 11:27:23 -07:00
Florian Hahn	fe49ba84d3	[AArch64] Reflow comment in AArch64IselLowering.cpp (NFC).	2022-09-30 17:17:04 +01:00
Zain Jaffal	fca8730793	[AArch64] Refactor opcode selection for LowerMUL (NFC) Move the logic for selecting `NewOpc` out of `LowerMUL` Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D134875	2022-09-30 16:48:02 +01:00
Philip Reames	1e3c179519	[RISCV] Address post commit review comments from D134881	2022-09-30 08:31:40 -07:00
Saleem Abdulrasool	519a73111b	RISCV: adjust relocation emission Simplify and make the pair-wise relocation more precise. If either of the symbol references are textual, the relocation must be delayed. If the difference is across sections, delay it as well which partially matches the behaviour of gas. We unfortunately do not handle the case where the difference references a symbol that is not yet defined. In such a case, we simply fail to resolve the difference, which should hopefully not be too onerous (particularly since no other target supports cross-section references and it is not clear if this was intentional on the part of RISCV). Differential Revision: https://reviews.llvm.org/D132262 Reviewed By: @MaskRay	2022-09-30 15:28:48 +00:00
Philip Reames	2b5960028e	[RISCV] Branchless lowering for select (and (x , 0x1) == 0), y, (z ^ y) ) and select (and (x , 0x1) == 0), y, (z \| y) ) This code is directly ported from the X86 backend which applies the same rewrite (along with several others). Planning on looking more closely at the other branchless variants from x86 to see if any are worth porting in future changes. Motivation here is the coremark crc8 routine from https://github.com/eembc/coremark/blob/main/core_util.c#L165. This patch significantly reduces the number of unpredictable branches in the workload. Differential Revision: https://reviews.llvm.org/D134881	2022-09-30 08:24:32 -07:00
Ray Wang	4c786c9747	[RISCV] Remove some unused var decl. NFC Differential Revision: https://reviews.llvm.org/D134707	2022-09-30 08:08:15 -07:00
Pierre van Houtryve	d8258508d4	[AMDGPU][GISel] Update `isCanonicalized` Recognize more opcodes in the function. Fixes some regressions introduced in D134857 for fdiv.f16 too. Depends on D134857 Reviewed By: arsenm, foad Differential Revision: https://reviews.llvm.org/D134862	2022-09-30 14:13:35 +00:00
Pierre van Houtryve	9a67a6b72a	[AMDGPU][GISel] Legalize V2S16 G_BUILD_VECTOR Preparation patch for D134354 to make V2S16 G_BUILD_VECTOR legal. Also removes RegBankInfo's scalarization of small BUILD_VECTORs, replacing it with InstructionSelector logic instead. This allows for V2S16 BUILD_VECTOR instructions to survive all the way to ISel so we can select FMA/MAD_MIX instructions in D134354. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D134433	2022-09-30 14:04:53 +00:00
Ivan Kosarev	a964099ce5	[AMDGPU][SetWavePriority] Fix dealing with MBBInfo records. Happened earlier than I anticipated. :) Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D134726	2022-09-30 14:27:50 +01:00
Zain Jaffal	661403b85c	[AArch64] Add support for 128-bit non temporal loads. Adding to the work done in `D131773` here we add support to 128-bit loads. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D132559	2022-09-30 11:04:04 +01:00
gonglingqin	853a1b7236	[LoongArch] Clean up redundant code introduced by conflict resolution. NFC	2022-09-30 16:45:21 +08:00
Yeting Kuo	1cc02b05b7	[SelectionDAG] Add helper function to check whether a SDValue is neutral element. NFC. Using this helper makes work about neutral elements more easier. Although I only find one case now, I think it will have more chance to be used since so many combine works are related to neutral elements. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D133866	2022-09-30 11:29:11 +08:00
Amara Emerson	7653586d88	[AArch64][GlobalISel] Implement another combine for shufflevector->AArch64 G_EXT. This is a port of an existing optimization in AArch64 ISelLowering, handling a case when the same input vector can be used for both ext inputs. Differential Revision: https://reviews.llvm.org/D134891	2022-09-29 22:53:24 +01:00
Philip Reames	900364fccf	[RISCV] Minor code motion in InsertVSETVLI [nfc]	2022-09-29 14:01:57 -07:00
Bjorn Pettersson	0513b0305a	[X86] Avoid miscompile in combineOr (X86ISelLowering.cpp) In combineOr (X86ISelLowering.cpp) there is a DAG combine that rewrite a "(0 - SetCC) \| C" pattern into something simpler given that a LEA can be used. Another requirement is that C has some specific value, for example 1 or 7. When checking those requirements the code used a 32-bit unsigned variable to store the value of C. So for a 64-bit OR this could miscompile in case any of the 32 most significant bits in C were non zero. This patch adds fixes the bug by using a large enough type for the C value. The faulty code seem to have been introduced by commit `9bceb8981d` (D131358). Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D134892	2022-09-29 21:24:31 +02:00
zhongyunde	4d15e7b21b	[AArch64] Lower multiplication by a constant (NFC) Refactor according https://reviews.llvm.org/D134706#inline-1298952 Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D134848	2022-09-30 01:37:28 +08:00
zhongyunde	62a51c357c	[AArch64] Lower multiplication by a constant int to shl+sub+shl Decompose the const 14 can be separated from D132322 Change the costmodel to lower a = b * C where C = 2^n - 2^m to lsl w8, w0, n sub w0, w8, w0, lsl m Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D134706	2022-09-30 01:31:06 +08:00
Chris Bieneman	5d4dd53570	Revert "[DirectX backend] Support global ctor for DXILBitcodeWriter." This reverts commit `26129766df`. The reverted commit broke in-tree unit tests for the DirectX backend.	2022-09-29 11:58:27 -05:00
Dmitry Preobrazhensky	485c539391	[AMDGPU][MC][GFX11] Disable non-null src0 for s_waitcnt_*cnt Differential Revision: https://reviews.llvm.org/D134809	2022-09-29 19:56:03 +03:00
David Green	4c4e544cd8	[ARM] Add an option for disabling omitting DLS. Useful for testing, this option disables when `DLS lr, lr` gets removed.	2022-09-29 17:42:45 +01:00
Philip Reames	02bfe2de7c	[RISCV] Adjust vector immediate store materialization cost This change updates the costs to make constant pool loads match their actual cost, and adds the broadcast special case to avoid too many regressions. We really need more information about the constants being rematerialized, but this is an incremental improvement. Differential Revision: https://reviews.llvm.org/D134746	2022-09-29 07:37:13 -07:00
eopXD	02a982829c	[RISCV] Add lowering for llvm.roundeven Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D134785	2022-09-29 06:08:14 -07:00
Fangrui Song	04a65d62a0	Revert D134638 "[Clang][LoongArch] Add inline asm support for constraints k/m/ZB/ZC" This reverts commit `b7baddc755`. Broke CodeGen/X86/callbr-asm-kill.mir We shall pay attention when adding new constraints.	2022-09-29 00:54:56 -07:00
Weining Lu	b7baddc755	[Clang][LoongArch] Add inline asm support for constraints k/m/ZB/ZC k: A memory operand whose address is formed by a base register and (optionally scaled) index register. m: A memory operand whose address is formed by a base register and offset that is suitable for use in instructions with the same addressing mode as st.w and ld.w. ZB: An address that is held in a general-purpose register. The offset is zero. ZC: A memory operand whose address is formed by a base register and offset that is suitable for use in instructions with the same addressing mode as ll.w and sc.w. Differential Revision: https://reviews.llvm.org/D134638	2022-09-29 15:02:08 +08:00
Abinav Puthan Purayil	3759398b4b	[AMDGPU] Report minimum scratch size in code object v5 and later by default This change sets -amdgpu-assume-{external-call-stack-size \| dynamic-stack-object-size} options to zero by default for code object v5 and later. The runtime is expected to adjust the scratch size if the amdhsa_uses_dynamic_stack bit in the kernel descriptor is set. Differential Revision: https://reviews.llvm.org/D128346	2022-09-29 09:52:45 +05:30
gonglingqin	dc3c5a78f2	[LoongArch] Add fp_to_sint support for soft floating point Differential Revision: https://reviews.llvm.org/D134692	2022-09-29 10:25:35 +08:00
WANG Xuerui	3155f6c508	[LoongArch] Expand llvm.stacksave and llvm.stackrestore As in commit `bfb00d4c1c` ("[RISCV] Allow lowering of dynamic_stackalloc, stacksave, stackrestore"). Differential Revision: https://reviews.llvm.org/D134435	2022-09-29 09:07:44 +08:00
wanglei	036b170c24	[LoongArch] Produce a R_LARCH_32_PCREL relocation LoongArchELFObjectWriter::getRelocType check IsPCRel for FK_Data_4 (which we produce a R_LARCH_32_PCREL relocation for if IsPCRel). R_LARCH_32_PCREL is required for FDE relocation. Differential Revision: https://reviews.llvm.org/D134715	2022-09-29 09:04:44 +08:00
wanglei	7b1bdfbeb0	[LoongArch] Override TargetSubtargetInfo::getSelectionDAGInfo The target selection DAG lowering information is needed for SelectionDAGBuilder to lower a call like memcmp into an optimized form. Differential Revision: https://reviews.llvm.org/D134712	2022-09-29 08:46:53 +08:00
Jessica Paquette	95dabac7a5	[AArch64][GlobalISel] Make G_PTRTOINT only legal for s64 + p0 A few issues: 1. There was no legalizer test for G_PTRTOINT 2. Same clamping issue as in many other opcodes 3. AArch64 pointers can only be 64b, so in reality we always have to trunc or extend with any size other than p0 anyway. This seems to actually produce more correct selection for narrow types as well. Differential Revision: https://reviews.llvm.org/D107588	2022-09-28 16:20:24 -07:00
Jessica Paquette	a7aaafde2e	[AArch64][GlobalISel] Implement custom legalization for s32/s64 G_FCOPYSIGN This is intended to be equivalent to the s32 + s64 cases in AArch64TargetLowering::LowerFCOPYSIGN. Widen everything and then use G_BIT + a mask to handle the actual copysign operation. Then, narrow back down to s32/s64. I wasn't sure about what the best/most canonical INSERT_SUBREG-selectable pattern is. I chose G_INSERT_VECTOR_ELT + an undef vector because it produces reasonably okay codegen. (It doesn't produce INSERT_SUBREG right now though.) If there's a better way to do this then I'm happy to change it. We also have a couple codegen deficiencies with how we emit vector constants right now. (We need a GISel equivalent to the tryAdvSIMDModImm64 stuff) Differential Revision: https://reviews.llvm.org/D108725	2022-09-28 16:03:22 -07:00
Florian Mayer	0401dc2913	[MTE] [HWASan] unify isInterestingAlloca Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D134779	2022-09-28 15:52:34 -07:00
Jessica Paquette	4957ee6529	[AArch64][GlobalISel] Add a target-specific G_BIT opcode. This is necessary for custom-legalizing G_FCOPYSIGN. This is equivalent to the BIT instruction (bitwise insert if true). Add selection testcases for imported patterns. Differential Revision: https://reviews.llvm.org/D108714	2022-09-28 15:48:35 -07:00
Xiang Li	26129766df	[DirectX backend] Support global ctor for DXILBitcodeWriter. 1. Save typed pointer type for GlobalVariable/Function instead of the ObjectType. This will allow use GlobalVariable/Function as value. 2. Save target type for global ctors for Constant. 3. In DXILBitcodeWriter::getTypeID, check PointerMap first for Constant case. Reviewed By: beanz Differential Revision: https://reviews.llvm.org/D133283	2022-09-28 13:23:56 -07:00
Stanislav Mekhanoshin	5a3fe9a039	[AMDGPU] Move SIModeRegisterDefaults to SI MFI It does not belong to a general AMDGPU MFI. Differential Revision: https://reviews.llvm.org/D134666	2022-09-28 13:13:40 -07:00

1 2 3 4 5 ...

69035 Commits