llvm-project

Commit Graph

Author	SHA1	Message	Date
Sanjay Patel	bf5748a1af	[x86] fold vector (X > -1) & Y to shift+andn and (pcmpgt X, -1), Y --> pandn (vsrai X, BitWidth-1), Y This avoids the -1 constant vector in favor of an arithmetic shift instruction if it exists (the ISA is still not complete after all these years...). We catch this pattern late in combining by matching PCMPGT, so it should not interfere with more general folds. Differential Revision: https://reviews.llvm.org/D113603	2021-11-12 08:17:46 -05:00
Neubauer, Sebastian	d1f45ed58f	[AMDGPU][NFC] Fix typos Differential Revision: https://reviews.llvm.org/D113672	2021-11-12 11:37:21 +01:00
Simon Moll	751aa6c280	[VE][NFCi] Remove unused tablegen parameters TableGen has started warning about unused template parameters in the isel patterns. Remove those. Reviewed By: kaz7 Differential Revision: https://reviews.llvm.org/D113675	2021-11-12 08:19:50 +01:00
Kazu Hirata	2ca45adf24	[CodeGen, Target] Use MachineRegisterInfo::use_operands (NFC)	2021-11-11 22:28:55 -08:00
Serge Pavlov	3057e850b8	[X86] Preserve FPSW when popping x87 stack When the compiler converts x87 operations to the stack model, it may insert instructions that pop the top stack element. To do so, the compiler inserts an FSTP instruction right after the instruction that calculates the value on the stack. This can break code that uses the FPSW set by the last instruction. For example, FXAM is usually followed by FNSTSW, but FSTP is inserted after FXAM. As FSTP leaves the condition code in FPSW undefined, the compiler produces incorrect code. With this change, FSTP is inserted after the FPSW consumer if the last instruction sets FPSW. Differential Revision: https://reviews.llvm.org/D113335	2021-11-12 12:00:09 +07:00
Phoebe Wang	74b979abcd	[X86][FP16] Avoid generating VZEXT_MOVL with i16 This fixes the crash caused by the lack of VZEXT_MOVL support for i16. Reviewed By: LuoYuanke, RKSimon Differential Revision: https://reviews.llvm.org/D113661	2021-11-12 09:32:29 +08:00
Min-Yih Hsu	99152a4164	[M68k][NFC] Rename 'GlSel' -> 'GISel' AArch64 as well as other targets use the abbreviation "GISel", so we had better be consistent with them. NFC.	2021-11-11 11:01:09 -08:00
Simon Pilgrim	94a901a50a	[X86] Move LowerFunnelShift below LowerShift. NFC. Makes it easier to reuse the various vector shift helpers defined above LowerShift	2021-11-11 18:45:51 +00:00
Jordan Rupprecht	da4822f6c8	[PowerPC][NFC] Ignore unused var in release builds. Note we can't inline this call into assert because `isIntS16Immediate` has a side effect. But we only look at the return value in asserts builds.	2021-11-11 08:57:40 -08:00
Craig Topper	ee7a006ce4	[RISCV] Promote f16 ceil/floor/round/roundeven/nearbyint/rint/trunc intrinsics to f32 libcalls. Previously these would crash. I don't think these can be generated directly from C. Not sure if any optimizations can introduce them. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D113527	2021-11-11 08:28:41 -08:00
Victor Huang	18fe0a0d9e	[PowerPC] PPC backend optimization to lower int_ppc_tdw/int_ppc_tw intrinsics to TDI/TWI machine instructions This patch adds a backend optimization to match XL behavior for the two builtins __tdw and __tw: when the second input argument is an immediate, tdi/twi instructions are emitted instead of td/tw. Reviewed By: nemanjai, amyk, PowerPC Differential revision: https://reviews.llvm.org/D112285	2021-11-11 09:52:00 -06:00
Florian Hahn	c2ed9fd054	[AArch64] Use custom lowering for {U,S}INT_TO_FP with i8. With fullfp16, it is cheaper to cast the {U,S}INT_TO_FP operand to i16 first, rather than promoting it to i32. The custom lowering for {U,S}INT_TO_FP already supports that, it just needs to be used. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D113601	2021-11-11 08:47:15 +00:00
David Green	703ded8dda	[AArch64] Allow FP16 vector fixed point converts This extends performFpToIntCombine to work on FP16 vectors as well as the f32 and f64 vectors it already supported. Differential Revision: https://reviews.llvm.org/D113297	2021-11-11 07:32:52 +00:00
Kazu Hirata	642a361b7e	[llvm] Use make_early_inc_range (NFC)	2021-11-10 19:56:35 -08:00
kpyzhov	c9690092c8	[AMDGPU] Small correction in SITargetLowering::performOrCombine(). Differential Revision: https://reviews.llvm.org/D113203	2021-11-10 21:07:27 -05:00
Craig Topper	4183522e80	[RISCV] Promote f16 frem with Zfh. Add riscv64 coverage for f32 and f64 frem. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D113531	2021-11-10 17:35:07 -08:00
Stanislav Mekhanoshin	476ab0f809	[AMDGPU] Fixed stack pointer init with architected flat scratch Even if wave offset is not present we still need to do the rest of the initialization. The mov into s32 was missing in the kernels. Fixes: SWDEV-310935 Differential Revision: https://reviews.llvm.org/D113628	2021-11-10 17:18:38 -08:00
Matt Arsenault	c7a0c2d0f7	AMDGPU: Report large stack usage for recursive calls We were previously setting an ignored bit in the kernel headers. The current behavior is to add the large amount on top of the statically known size of a single stack frame. I'm not sure if we should just use the large size as the entire reported size instead.	2021-11-10 20:02:01 -05:00
Craig Topper	9ee5cec688	[RISCV] Prevent bad legalizer behavior when bitcasting fixed vectors to i64 on RV32 with Zve32. Similar to D113219, we need to make sure we don't create a vXi64 vector when it isn't legal. This fixes an error found by an expensive checks build.	2021-11-10 11:58:49 -08:00
Roman Lebedev	a70d74323e	[X86][Costmodel] `getReplicationShuffleCost()`: implement cost model for 8 bit-wide elements with AVX512VBMI VBMI introduced VPERMB, so cost-model i8 replication shuffle using it. Note that we can still model i8 replication shuffle without VBMI, by promoting to i16/i32. That will be done in follow-ups. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D113479	2021-11-10 22:52:40 +03:00
Roman Lebedev	c6e894b9b2	[X86][Costmodel] `getReplicationShuffleCost()`: implement cost model for 16 bit-wide elements with AVX512BW BWI introduced VPERMW, so cost-model i16 replication shuffle using it. Note that we can still model i16 replication shuffle without BWI, by promoting to i32. That will be done in follow-ups. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D113478	2021-11-10 22:52:39 +03:00
Roman Lebedev	4101c7bf19	[X86][Costmodel] `getReplicationShuffleCost()`: implement cost model for 32/64 bit-wide elements with AVX512F This models lowering to `vpermd`/`vpermq`/`vpermps`/`vpermpd`, which take a single input vector and a single index vector, and are cross-lane. So far I haven't seen evidence that replication ever results in demanding more than a single input vector per output vector. This results in shockingly lower costs :) Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D113350	2021-11-10 22:52:33 +03:00
Sanjay Patel	a8abd19b10	[x86] simplify code; NFC We bail out if the types don't match, so it's clearer to have a single variable to show that common type.	2021-11-10 13:29:57 -05:00
Sanjay Patel	5424fb164a	[x86] fix formatting; NFC	2021-11-10 13:29:57 -05:00
Zarko Todorovski	ed4a91300b	[NFC][llvm][M68k] Inclusive language: reword comment Rewording the comment to avoid the use of blacklist.	2021-11-10 13:28:32 -05:00
Craig Topper	57bc7b1089	[RISCV] Prevent crashes when bitcasting between fixed vectors and scalars. Not all scalar element types are allowed in vectors so we may not be able to bitcast to a 1 element vector to use insert/extract. This will become a bigger issue when the Zve extensions are committed. For now, I'm using the ELEN limit to limit the element types. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D113219	2021-11-10 09:21:52 -08:00
Simon Pilgrim	a1e0aa75ca	[X86] combineMulToPMADDWD - remove useless TODO We should always be able to use PMULUDQ/PMULDQ in PMADDWD patterns with greater than 32-bit extended integer sources	2021-11-10 16:56:44 +00:00
David Green	509b397dd5	[AArch64] Combine vector fptoi.sat(fmul) to fixed point fcvtz Similar to D113199 but dealing with the vector size, this extends the fptosi+fmul to fixed point fold to handle fptosi.sat nodes that are equally viable, so long as the saturation width matches the output width. Differential Revision: https://reviews.llvm.org/D113200	2021-11-10 16:12:48 +00:00
Andrew Savonichev	00aa0aeb06	[NVPTX] Add imm variants for surface and texture instructions Texture/sampler/surface operands can be either a register or an immediate (an index of .texref, .samplerref or .surfref). TableGen declarations for these instructions used to only have Int64Regs operands, so this caused issues when the machine verifier is turned on: * Bad machine code: Expected a register operand. * - function: bar - basic block: %bb.0 (0x55b144d99ab8) - instruction: %4:int32regs = SULD_1D_I32_TRAP 0, killed %2:int32regs - operand 1: 0 The solution is to duplicate these instructions for all possible operand types (i16imm and Int64Regs). Since this would essentially double the amount of code in TableGen, the patch also does some refactoring for the original instructions to keep things manageable. Differential Revision: https://reviews.llvm.org/D112232	2021-11-10 19:05:03 +03:00
Nico Weber	e23c6cc54e	[aarch64/mac] Correctly disassemble @TLVPPAGE(OFF) relocs `llvm-otool -tV foo.o` and `llvm-objdump --macho -d foo.o` would previously fail on object files containing @TLVPPAGE or @TLVPPAGEOFF relocs. Move the llvm-objdump-specific test from llvm/test/MC/AArch64/arm64-tls-modifiers-darwin.s to the new llvm/test/tools/llvm-objdump/MachO/disassemble-arm64-tlv-modifers.test and put the test for this fix in that new file. Fixes PR52356. Differential Revision: https://reviews.llvm.org/D112843	2021-11-10 10:41:18 -05:00
Sanjay Patel	be9e892e9d	[x86] shorten function name; NFC	2021-11-10 09:44:55 -05:00
Nemanja Ivanovic	5840f7197d	[PowerPC] Respect rounding mode in the back end Currently, the floating point instructions that depend on rounding mode are correctly marked in the PPC back end with an implicit use of the RM register. Similarly, instructions that explicitly define the register are marked with an implicit def of the same register. So for the most part, RM-using code won't be moved across RM-setting instructions. However, calls are not marked as RM-setting instructions so code can be moved across calls. This is generally desired, but so is the ability to turn off this behaviour with an appropriate option - and -frounding-math really should be that option. This patch provides a set of call instructions (for direct and indirect calls) that are marked with an implicit def of the RM register. These will be used for calls that are marked with the strictfp attribute. Differential revision: https://reviews.llvm.org/D111433	2021-11-10 08:19:58 -06:00
Andrew Savonichev	e201232ece	[NFC][AArch64] Handle processLogicalImmediate error If processLogicalImmediate fails, we should return from the function without changing InsInstrs or DelInstrs. This happens for CodeGen/AArch64/urem-seteq-nonzero.ll LIT test as described in https://reviews.llvm.org/D99662#2662296. Callers of genAlternativeCodeSequence skip patterns where InsInstrs stays empty, so this does not cause any issues now. Differential Revision: https://reviews.llvm.org/D100047	2021-11-10 16:57:24 +03:00
Kazu Hirata	ef2d0e0f20	[llvm] Use MachineBasicBlock::{successors,predecessors} (NFC)	2021-11-09 23:05:15 -08:00
Amara Emerson	af4dc633f8	[AArch64][GlobalISel] Fix atomic truncating stores from generating invalid copies. If the source reg is a 64b vreg, then we need to emit a subreg copy to a 32b gpr before we select sub-64b variants like STLRW.	2021-11-09 20:47:50 -08:00
Matt Arsenault	90ff148719	AMDGPU: Account for implicit argument alignment for kernarg segment If a kernel had no formal arguments but did have the implicit arguments, we were reporting a required kernarg alignment of 4. For some reason we require an 8-byte alignment for this, even though there's no real advantage and I don't see where this is documented in the ABI. The code object header code also claims the minimum alignment is 16, which is what I thought you always got at runtime anyway so I don't know why this matters.	2021-11-09 17:48:37 -05:00
Alexander Shaposhnikov	b705e13341	[CodeGen][Outliner] Clean up dead code Clean up dead code in X86InstrInfo.cpp and AArch64InstrInfo.cpp Test plan: make check-all Differential revision: https://reviews.llvm.org/D111151	2021-11-09 21:39:38 +00:00
Yonghong Song	8d499bd5bc	BPF: change btf_type_tag BTF output format For a declaration like the one below: int __tag1 * __tag1 __tag2 g Commit `41860e602a` ("BPF: Support btf_type_tag attribute") implemented the following encoding: VAR(g) -> __tag1 --> __tag2 -> pointer -> __tag1 -> pointer -> int Some further experiments with linux btf_type_tag support, esp. with generating attributes in vmlinux.h, and also some internal discussion showed the following format is more desirable: VAR(g) -> pointer -> __tag2 -> __tag1 -> pointer -> __tag1 -> int The format makes it similar to other modifiers like 'const', e.g., const int g which has encoding VAR(g) -> PTR -> CONST -> int Differential Revision: https://reviews.llvm.org/D113496	2021-11-09 11:34:25 -08:00
Benjamin Kramer	194897eccf	[ARM] Fix unused variable warning in Release builds	2021-11-09 18:55:52 +01:00
Ard Biesheuvel	a19da876ab	[ARM] implement support for TLS register based stack protector Implement support for loading the stack canary from a memory location held in the TLS register, with an optional offset applied. This is used by the Linux kernel to implement per-task stack canaries, which is impossible on SMP systems when using a global variable for the stack canary. Reviewed By: nickdesaulniers Differential Revision: https://reviews.llvm.org/D112768	2021-11-09 18:19:47 +01:00
Simon Pilgrim	d510fd2bed	[X86] combineMulToPMADDWD - handle any pow2 vector type and split to legal types combineMulToPMADDWD is currently limited to legal types, but there's no reason why we can't handle any larger type that the existing SplitOpsAndApply code can use to split to legal X86ISD::VPMADDWD ops. This also exposed a missed opportunity for pre-SSE41 targets to handle SEXT ops from types smaller than vXi16 - without PMOVSX instructions these will always be expanded to unpack+shifts, so we can cheat and convert this into a ZEXT(SEXT()) sequence to make it a valid PMADDWD op. Differential Revision: https://reviews.llvm.org/D110995	2021-11-09 15:20:43 +00:00
Kazu Hirata	cba40c4ede	[llvm] Use MachineBasicBlock::{successors,predecessors} (NFC)	2021-11-09 07:11:14 -08:00
Sergei Larin	a721ddbae9	Update MaxMinLatency even if dependencies have already been scheduled. Covers an extremely rare corner case in internal bookkeeping.	2021-11-09 06:47:49 -08:00
Andrew Savonichev	b702276ad0	[AArch64] Add Machine InstCombiner patterns for FMUL indexed variant This patch adds DUP+FMUL => FMUL_indexed pattern to InstCombiner. FMUL_indexed is normally selected during instruction selection, but it does not work in cases when VDUP and VMUL are in different basic blocks. Differential Revision: https://reviews.llvm.org/D99662	2021-11-09 15:30:19 +03:00
Roman Lebedev	d484cc152b	[TTI] Adjust `getReplicationShuffleCost()` interface It is trivial to produce DemandedSrcElts given DemandedReplicatedElts, so don't pass the former. Also, it isn't really useful so far to have the overload taking the Mask, so just inline it.	2021-11-09 14:07:59 +03:00
Shao-Ce SUN	1c81941f19	[NFC][RISCV] Fix wrong predicates of vfwredsum	2021-11-09 17:19:50 +08:00
Kazu Hirata	c375cdc932	[Hexagon] Use MachineBasicBlock::{successors,predecessors} (NFC)	2021-11-09 00:26:06 -08:00
Carlos Galvez	7ecec3f0f5	[CUDA] Bump supported CUDA version to 11.5 Differential Revision: https://reviews.llvm.org/D113249	2021-11-09 08:20:53 +00:00
Wouter van Oortmerssen	4a0c89a6cf	[WebAssembly] Fix fixBrTableIndex removing instruction without checking uses Fixes: https://bugs.llvm.org/show_bug.cgi?id=52352 Differential Revision: https://reviews.llvm.org/D113230	2021-11-08 15:53:44 -08:00
Craig Topper	376233113e	[RISCV] Use TargetConstant for CSR number for READ_CSR/WRITE_CSR. This is consistent with what we do for other operands that are required to be constants. I don't think this results in any real changes. The pattern match code for isel treats ConstantSDNode and TargetConstantSDNode the same.	2021-11-08 15:10:24 -08:00
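
The fold described in bf5748a1af, and (pcmpgt X, -1), Y --> pandn (vsrai X, BitWidth-1), Y, can be checked by brute force on a single lane. The following is a minimal sketch, assuming 8-bit lanes modelled as Python integers; the helper names (to_signed, pcmpgt, sra, pandn) are hypothetical stand-ins for the instruction semantics and are not LLVM APIs.

```python
BITS = 8                  # assumed lane width for this illustration
MASK = (1 << BITS) - 1    # all-ones lane, i.e. the bit pattern of -1


def to_signed(v):
    """Interpret the low BITS bits of v as a two's-complement integer."""
    v &= MASK
    return v - (1 << BITS) if v & (1 << (BITS - 1)) else v


def pcmpgt(a, b):
    """Signed greater-than compare: all-ones if a > b, otherwise zero."""
    return MASK if to_signed(a) > to_signed(b) else 0


def sra(x, amount):
    """Arithmetic shift right of one lane (Python's >> on ints is arithmetic)."""
    return (to_signed(x) >> amount) & MASK


def pandn(a, b):
    """x86-style and-not: (NOT a) AND b."""
    return (~a & b) & MASK


def original(x, y):
    # and (pcmpgt X, -1), Y  -- needs an all-ones constant for the compare
    return pcmpgt(x, MASK) & y


def folded(x, y):
    # pandn (sra X, BitWidth-1), Y  -- the sign bit is smeared, then inverted
    return pandn(sra(x, BITS - 1), y)


def check_all():
    """Compare both forms over every pair of 8-bit lane values."""
    return all(original(x, y) == folded(x, y)
               for x in range(1 << BITS)
               for y in range(1 << BITS))


if __name__ == "__main__":
    print("fold holds for all 8-bit lanes:", check_all())
```

The shift by BitWidth-1 yields all-ones exactly when X is negative, so inverting it reproduces the X > -1 mask without materializing a -1 vector.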

64867 Commits