HVX v60 only has splats that take a 32-bit word as input, while v62+
has splats that take an 8- or 16-bit value. This makes writing output
patterns that need to use a splat annoying, because the entire output
pattern needs to be replicated for various versions of HVX.
To avoid this, the patterns will always use the pseudos, and then the
pseudos will be handled using a post-ISel hook.
This change pulls some code from the DirectX backend into a new
LLVMFrontendHLSL library to share utility data structures between the
HLSL code generation in Clang and the backend in LLVM.
This is a small refactoring as a first step toward getting the code into
the right structure, getting the library built, and getting the dependencies correct.
Fixes #58000 (https://github.com/llvm/llvm-project/issues/58000)
Reviewed By: python3kgae
Differential Revision: https://reviews.llvm.org/D135110
getPrimitiveSizeInBits returns 0 for pointers, so we need to query
the size via DataLayout instead.
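A minimal sketch of the DataLayout-based query (illustrative only; the helper
name is an assumption, not code from the patch):

  #include "llvm/IR/DataLayout.h"
  #include "llvm/IR/Type.h"

  // Ty->getPrimitiveSizeInBits() is 0 for pointer types, while
  // DataLayout::getTypeSizeInBits() knows the target's pointer width.
  static llvm::TypeSize getValueSizeInBits(const llvm::DataLayout &DL,
                                           llvm::Type *Ty) {
    return DL.getTypeSizeInBits(Ty); // also correct for pointers
  }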
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D135976
When DXC prints IR output it adds a bunch of IR comments in a header
that describe the DXIL metadata in a more human-readable format. This
pass will serve that purpose for LLVM by printing that information ahead
of the IR printer.
Reviewed By: python3kgae
Differential Revision: https://reviews.llvm.org/D135802
This patch changes the preferred disassembly for SVE vector lists.
For instance, instead of printing this assembly:
ld4d { z1.d, z2.d, z3.d, z4.d }, p0/z, [x0]
it will print this:
ld4d { z1.d-z4.d }, p0/z, [x0]
Differential Revision: https://reviews.llvm.org/D135952
Add a compile-time flag for enabling streaming mode.
When streaming mode is enabled, lower basic loads and stores of fixed-width
vectors to generate code that is compatible with streaming mode.
Differential Revision: https://reviews.llvm.org/D133433
Correct v_cndmask_b32 to support abs/neg modifiers in dpp/sdwa/e64 variants.
Correct v_cndmask_b16 for proper disassembly of abs/neg modifiers in e64_dpp variants.
Differential Revision: https://reviews.llvm.org/D135900
Functions with `aarch64_sme_pstatesm_body` will emit an SMSTART at the start
of the function and an SMSTOP at the end of the function, such that all
operations use the right value for vscale.
Because the placement of these nodes is critically important (i.e. no
vscale-dependent operations should be done before SMSTART has been issued),
we require gluing the CopyFromReg to the Entry node such that we can
insert the SMSTART as part of that glued chain.
More details about the SME attributes and design can be found
in D131562.
Reviewed By: aemerson
Differential Revision: https://reviews.llvm.org/D131582
CCMP/CCMN's second operand supports constants from 0 to 31. When CCMP's second operand is in the range [-31, -1], we can replace it with CCMN to avoid an extra mov; for example, comparing against -5 can be encoded as a CCMN with immediate 5 instead of first materializing -5 in a register.
Fix: #57034
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D135939
This reverts commit 50e0aced45.
This could accidentally start producing invalid code in some
cases (in particular, if compiling with -mstack-alignment=16, which
one could expect to be a no-op for a target where the stack is always
aligned to 16 bytes anyway).
In the Linux PIC model, there are 4 cases of value/label addressing:
Case 1: Function call or Label jmp inside the module.
Case 2: Data access (such as global variable, static variable) inside the module.
Case 3: Function call or Label jmp outside the module.
Case 4: Data access (such as global variable) outside the module.
Because the current LLVM inline asm architecture is designed not to "recognize"
the asm code, it is quite troublesome to treat memory addressing differently for
the same value/address used in different instructions.
For example, in the PIC model, a function call may go through the PLT or be
directly PC-relative, while a lea/mov of a function address may use the GOT.
This patch fixes/refines case 1 and case 2 in inline asm.
Because inline asm currently does not support jumping to an outside label, this
patch mainly focuses on fixing the function call addressing bugs in inline asm.
Reviewed By: Pengfei, RKSimon
Differential Revision: https://reviews.llvm.org/D133914
Inputs to crnor can come from operands with chains so
if it is being used simply to negate such an operand,
the repeated input cannot be CSE'd. This patch just
adds a code-gen only instruction for this that takes
a single input and duplicates it in the encoding of
the underlying crnor.
Differential revision: https://reviews.llvm.org/D133577
Same with (select C, X, -1), (select C, 0, X), and (select C, X, 0).
There's a DAGCombine after we turn the select into select_cc, but
that may introduce a setcc that didn't previously exist. We could
add more DAGCombines to remove the extra setcc, but this seemed lower
effort.
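A sketch of the underlying branchless identities (assuming C is a 0/1 value;
this illustrates the math, not the exact instruction sequence emitted):

  #include <cstdint>

  // C is assumed to be 0 or 1.
  int64_t sel_x_m1(int64_t C, int64_t X) { return X | (C - 1); } // select C, X, -1
  int64_t sel_0_x (int64_t C, int64_t X) { return X & (C - 1); } // select C, 0,  X
  int64_t sel_x_0 (int64_t C, int64_t X) { return X & (0 - C); } // select C, X,  0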
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D135833
When removing frame indices on PowerPC, we need to scavenge
a GPR to materialize a large constant if the stack offset
for the spill/reload cannot be reached by a D-Form
instruction. However, in a perfect storm of conditions,
we may not have GPRs available to scavenge, thereby
requiring an emergency spill. If such an emergency
spill also needs to be spilled to a location with a
large offset, it would itself require register scavenging
thereby creating an infinite loop.
This patch detects when the scavenger cannot scavenge
a register and the spill/reload is to a location with
a large offset. It then stashes a GPR into a VSR so
that it can use the GPR to materialize the constant
(rather than scavenging a GPR).
Fixes: https://github.com/llvm/llvm-project/issues/52894
Differential revision: https://reviews.llvm.org/D124841
Allows us to use combineX86ShuffleChainWithExtract to combine targetshuffle(low_subvector(x), high_subvector(x)) -> low_subvector(targetshuffle(x)) style patterns.
This is currently very limited (it must have a v2i64/v2f64 result), but while triaging I noticed we might be able to extend this to allow more types for targets with suitable variable cross lane shuffle support.
Fixes #58339
Similar to D69390 for RISCV, use a guaranteed non-existing insn for
llvm.trap and the break insn for llvm.debugtrap.
Differential Revision: https://reviews.llvm.org/D134365
The e_flags of existing object files are all 0x3 which happens to be
compatible. From this commit on, all LoongArch objects produced with
upstream LLVM will be of object file ABI v1, which is already supported
by binutils' master branch (to be released as 2.40), and is allowed by
the same binutils version to interlink with v0 objects so the existing
distributions have time to migrate.
Differential Revision: https://reviews.llvm.org/D134601
To achieve this, we need this observation:
`uzp1` is just an `xtn` that operates on two registers.
For example, given the following register with type v2i64:
LSB_______MSB
x0 x1 x2 x3
Applying xtn on it we get:
x0 x2
This is equivalent to bitcasting it to v4i32 and then applying uzp1 on it:
x0 x1 x2 x3
|
uzp1
v
x0 x2 <value from other register>
We can transform xtn to uzp1 by this observation, and vice versa.
This observation only works on little-endian targets. Big-endian targets have
a problem: the uzp1 cannot be replaced by xtn since there is a discrepancy
in the behavior of uzp1 between little endian and big endian.
To illustrate, take the following for example:
LSB____________________MSB
x0 x1 x2 x3
On little endian, uzp1 grabs x0 and x2, which is right; on big endian, it
grabs x3 and x1, which doesn't match what I saw in the documentation. But,
since I'm new to AArch64, take my word with a pinch of salt. This behavior was
observed in gdb; maybe there's an issue in the order of the values it prints?
Whatever the reason is, the execution result given by qemu just doesn't match.
So I disable this on big-endian targets temporarily until we find the crux.
Fixes #57502
Reviewed By: dmgreen, mingmingl
Co-authored-by: Mingming Liu <mingmingl@google.com>
Differential Revision: https://reviews.llvm.org/D133850
First patch in a series adding MC layer support for Scalable Matrix
Extension 2 (SME2).
This patch adds the following feature:
sme2
The 2022 Architecture Extension release adds other feature flags (e.g. sme2.1)
that will be handled in follow-up patches.
The reference can be found here:
https://developer.arm.com/documentation/ddi0602/2022-09
Differential Revision: https://reviews.llvm.org/D135448
fp16 and bf16 values can be used in GCC's inline assembly using the "w"
constraint, which means "VFP floating-point registers d0-d31" - fp16 and
bf16 values are stored in S registers (which alias the D registers).
This change ensures that LLVM is compatible with GCC for programs that
use fp16 and the 'w' constraint.
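A hypothetical example of the kind of source this affects (the asm body is
intentionally empty; only the constraint handling matters here):

  // Build for an ARM target with fp16 support; previously LLVM rejected the
  // "w" constraint for __fp16 operands that GCC accepts.
  __fp16 keep_in_vfp_reg(__fp16 h) {
    asm volatile("" : "+w"(h)); // force h into a VFP register (S register)
    return h;
  }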
Differential Revision: https://reviews.llvm.org/D135662
This matches what was done for the ARM implementation (where getting
the instruction sizes right is even more tricky, and hence needed
tighter testing).
This will allow catching any future cases where prologs and epilogs
don't match the instructions within them.
Differential Revision: https://reviews.llvm.org/D131394
If the stack is realigned, we've emitted a frame pointer and
already terminated the SEH prologue, making this dead code since
a07787c9a5.
The immediate to this SEH opcode was entirely bogus - we don't
know how many bytes the AND operation adjusts the SP, and by
doing "NumBytes & andMaskEncoded" (where andMaskEncoded was the
immediate bitpattern for the AND instruction), the immediate to the
opcode was total gibberish.
This hasn't had any practical effect, since the original stack
pointer always was restored from the frame pointer afterwards anyway.
Differential Revision: https://reviews.llvm.org/D135815
If inline asm has a VGPR def, it must have come from a VGPR write
somewhere inside the asm. This should be further extended to all
read after write hazards.
V6_vzb and V6_vshuffeb can use any 2 resources in a packet, while
V6_vunpackub/V6_vpackeb both need a shift resource.
Also, add patterns for shifting vectors of i8.
Continuing the theme of adding branchless lowerings for simple selects, this time handle the 0 arm case. This is very common for various umin idioms, etc.
Differential Revision: https://reviews.llvm.org/D135600
Without this, unwinding through functions that do use PAC
would fail if PAC actually was active.
Differential Revision: https://reviews.llvm.org/D135103
This new opcode was initially documented as "pac_sign_return_address"
in https://github.com/MicrosoftDocs/cpp-docs/pull/4202, but was
soon afterwards renamed into "pac_sign_lr" in
https://github.com/MicrosoftDocs/cpp-docs/pull/4209, as the other
name was unwieldy, and there were no other external references to
that name anywhere.
Rename our external .seh assembler directive - it hasn't been merged
for very long yet, so there's probably no external use to account for.
Rename all other internal references to the opcode similarly.
Differential Revision: https://reviews.llvm.org/D135762
If an AM* atomic memory access instruction uses the same register for
rd and rj, execution will trigger an Instruction Non-defined Exception.
If an AM* atomic memory access instruction uses the same register for
rd and rk, the execution result is uncertain.
Reference: https://github.com/loongson/LoongArch-Documentation
Differential Revision: https://reviews.llvm.org/D135641
Fix crash issue of D129537 and reopen it.
Currently the X86 shuffle lowering would widen the element type for a
shuffle if adjacent mask elements are consecutive. For the example below
%t2 = add nsw <16 x i32> %t0, %t1
%t3 = sub nsw <16 x i32> %t0, %t1
%t4 = shufflevector <16 x i32> %t2, <16 x i32> %t3,
<16 x i32> <i32 16, i32 17, i32 2, i32 3, i32 4,
i32 5, i32 6, i32 7, i32 8, i32 9, i32 10,
i32 11, i32 12, i32 13, i32 14, i32 15>
ret <16 x i32> %t4
The compiler would transform the shuffle to
%t4 = shufflevector <8 x i64> %t2, <8 x i64> %t3,
<8 x i32> <i32 8, i32 1, i32 2, i32 3, i32 4,
i32 5, i32 6, i32 7>
This may lose the opportunity to let ISel select a mask instruction when
avx512 is enabled.
This patch prevents the transform when the avx512 feature is enabled.
Thanks to Simon for the idea.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D130830
After setting up the FP, the rest of the prologue doesn't need to
be replayed for unwinding the stack frame.
This allows reverting the functional parts of
2f7fbf8376 (but fixing inconsistent
duplicate setting of HasWinCFI).
Differential Revision: https://reviews.llvm.org/D135686
Given this is an OR reduction, the two are equivalent and later
optimizations (AArch64InstrInfo::optimizePTestInstr) may rewrite the
sequence to use the flag-setting variant of instruction X, to remove the
PTEST altogether.
Reviewed By: paulwalker-arm, bsmith
Differential Revision: https://reviews.llvm.org/D134946
The BRKNS instruction is unlike the other instructions that set flags
since it has an all active implicit predicate, so the existing
PTEST(PG, BRKN(PG, A, B)) -> BRKNS(PG, A, B)
in AArch64InstrInfo::optimizePTestInstr is incorrect, however
PTEST(PTRUE_B(31), BRKN(PG, A, B)) -> BRKNS(PG, A, B)
is correct.
Spotted by @paulwalker-arm in D134946.
Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D135655
The patch selects VSELECT/VP_MERGE_VL which uses fmadd/fnmsub as the true operand
and the addend of the fmadd/fnmsub as the false operand.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D135330
If the source is implicit_def, the register allocator won't have
any constraint on what register it picks for the destination. This
doesn't give the user much control of what register is being used.
So in my mind that means the only reason to honor the policy operand
is to control what policy is used in vsetvli to maybe avoid a vtype
change. Given the other optimizations we do on the policy field, I
don't think allowing the user this control is reliable.
Therefore, I think we should use agnostic policies if the source is
undef.
This should give better performance on some CPUs for VP intrinsics where
there is no merge operand and the backend adds IMPLICIT_DEF to the instruction.
Differential Revision: https://reviews.llvm.org/D135396
This adds infrastructural pieces for an analysis to compute the DXIL
shader flags. In this state the analysis can compute two fairly
straightforward feature flags for use of double-precision floating
point values and the DX 11.1 extended double support.
This patch conflicts with D135190; conflicts will be resolved prior
to merging.
Reviewed By: python3kgae
Differential Revision: https://reviews.llvm.org/D135393
Unfortunately, we have broken handling of this in the ROCm 5.3 runtime.
The runtime is expected to handle this correctly when v5 becomes
the default.
Differential Revision: https://reviews.llvm.org/D134714
Make support more generic to support future instructions.
Currently NFC.
Reviewed By: foad, arsenm
Differential Revision: https://reviews.llvm.org/D135678
`getFunctionParamOptimizedAlign` was being passed a null function
argument when getting the callee of a bitcasted function symbol. This is
because `CallBase::getCalledFunction` does not look through bitcasts.
There is already code to handle this case in
`NVPTXTargetLowering::getArgumentAlignment`, which is now hoisted into
an NVPTX util.
The alignment computation now gracefully handles computing alignment of
virtual functions with a check for null.
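A sketch of the look-through logic being hoisted (approximate; the actual
helper name and its location in the NVPTX utilities may differ):

  #include "llvm/IR/Function.h"
  #include "llvm/IR/InstrTypes.h"

  // CallBase::getCalledFunction() returns null when the callee is a bitcasted
  // function symbol, so strip pointer casts off the called operand instead.
  static const llvm::Function *getBitcastedCallee(const llvm::CallBase *CB) {
    return llvm::dyn_cast_or_null<llvm::Function>(
        CB->getCalledOperand()->stripPointerCasts());
  }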
Reference: https://gcc.gnu.org/onlinedocs/gccint/Machine-Constraints.html
k: A memory operand whose address is formed by a base register and
(optionally scaled) index register.
m: A memory operand whose address is formed by a base register and
offset that is suitable for use in instructions with the same
addressing mode as st.w and ld.w.
ZB: An address that is held in a general-purpose register. The offset
is zero.
ZC: A memory operand whose address is formed by a base register and
offset that is suitable for use in instructions with the same
addressing mode as ll.w and sc.w.
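A hypothetical C-level illustration of how one of these constraints is used
from inline asm (the asm body is left empty on purpose; only the constraint
handling is of interest):

  // "ZB": the address operand is a bare general-purpose register, zero offset.
  void touch(int *p) {
    asm volatile("" ::"ZB"(*p) : "memory");
  }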
Note:
The INLINEASM SDNode flags in the tests below are updated because the
newly introduced enum `Constraint_k` is added before `Constraint_m`.
llvm/test/CodeGen/AArch64/GlobalISel/irtranslator-inline-asm.ll
llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-inline-asm.ll
llvm/test/CodeGen/X86/callbr-asm-kill.mir
This patch passes `ninja check-all` on a X86 machine with all official
targets and the LoongArch target enabled.
Differential Revision: https://reviews.llvm.org/D134638
These are harmless for the unwinder - the unwinder doesn't need to
handle them to be able to unwind correctly.
Only add the opcodes when the branch target is in a SEH prologue;
for jumptables e.g. within a function, we shouldn't add any SEH
opcodes.
Differential Revision: https://reviews.llvm.org/D135277
Static and dynamic TLS addresses are lowered in the DAG stage according
to the different TLS models.
TLS address will be lowered to pseudo instruction and then expanded by
the `LoongArch Pre-RA pseudo instruction expansion` pass.
Differential Revision: https://reviews.llvm.org/D134713
As suggested on D135572, return Optional<> from getAllocSizeArgs()
rather than the peculiar pair(0, 0) sentinel.
The method on Attribute itself does not return Optional, because
the attribute must exist in that case.
The patch fixes lowering of anonymous functions, removes file/linkage
info for builtin call demangling, and adds a relevant test demonstrating
the fixed problem.
Differential Revision: https://reviews.llvm.org/D135390
These instructions already had errors for operands that could not share
the same register:
VCMUL, VMULL, VQDMULL.
This extends that to a few others:
VREV64, VQDMULLqr, VCADD and VHCADD.
Only the i32 types require the error.
Differential Revision: https://reviews.llvm.org/D135560
The return type is two u8 values packed into a 16-bit VGPR, so this instruction
should be True16.
Reviewed By: dp
Differential Revision: https://reviews.llvm.org/D135478
This commit fixes https://github.com/llvm/llvm-project/issues/57326.
Currently we would take a Mask out and directly use it by doing
auto Mask = SVI->getShuffleMask();
However, if the mask is undef, this Mask is not initialized. It might be
a vector of -1 or random integers.
This would cause an out-of-bound read later when trying to find a
StartMask.
This change checks that all indices in the Mask are in the allowed range,
and fixes the out-of-bound accesses.
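A rough sketch of the kind of guard added (names and exact placement are
approximations, not the literal code from the patch):

  #include "llvm/ADT/STLExtras.h"
  #include "llvm/IR/Instructions.h"

  // Validate the mask before using its elements as indices; undef elements
  // come back as -1 from getShuffleMask().
  static bool maskIsInRange(const llvm::ShuffleVectorInst *SVI,
                            unsigned NumElts) {
    return llvm::all_of(SVI->getShuffleMask(), [NumElts](int Idx) {
      return Idx >= 0 && Idx < (int)NumElts;
    });
  }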
Differential Revision: https://reviews.llvm.org/D132634
On many AArch64 processors (Cortex A78, Neoverse N1/N2/V1, etc.), ADD with LSL shift (shift amount <= 4) has smaller latency and higher
throughput than ADD with a larger shift (shift amount > 4). This is at least a no-op for the rest of the processors.
Differential Revision: https://reviews.llvm.org/D135208
This is the AIX part of the update after https://reviews.llvm.org/D117225.
It fixes the issue that AIX64 with vector pairs enabled saw redundant
spills/reloads of callee-saved vector registers.
Based on original patch by: Kai Luo
Reviewed By: lkail
Differential Revision: https://reviews.llvm.org/D133466
Otherwise eliminateFrameIndex cannot figure out how to fixup the stack
offset with its stateless logic, because there wouldn't be an immediate
slot for it to trivially write to, and it may not be easy to transform
the surrounding code to make it work.
This fixes a fairly common crash when compiling moderately complex code with
Clang.
Differential Revision: https://reviews.llvm.org/D135251
This patch fixes the failure of llvm/test/CodeGen/Generic/vector.ll and
CodeGen/PowerPC/2007-11-19-VectorSplitting.ll for a LoongArch native build.
Differential Revision: https://reviews.llvm.org/D134798
There are intrinsics for most scalar instructions and almost all HVX
instructions. What's somewhat painful is that there are two intrinsics
for each HVX instruction: one for 64- and one for 128-byte mode.
Instead of checking the current codegen settings every time, this
function would simply return the right intrinsic.
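A sketch of the idea, using one HVX intrinsic pair that does exist
(llvm.hexagon.V6.vaddw and its .128B twin); the helper itself and its
signature are illustrative assumptions:

  #include "llvm/IR/Intrinsics.h"
  #include "llvm/IR/IntrinsicsHexagon.h"

  // Pick the 64-byte or 128-byte flavour of an HVX intrinsic once,
  // instead of re-checking the codegen settings at every use.
  static llvm::Intrinsic::ID getVAddWIntrinsic(bool Use128B) {
    return Use128B ? llvm::Intrinsic::hexagon_V6_vaddw_128B
                   : llvm::Intrinsic::hexagon_V6_vaddw;
  }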
Typically when you match something, you want to check the result.
Fix a couple warnings in the AMDGPUPostLegalizerCombiner which appear as a
result of this.
Differential Revision: https://reviews.llvm.org/D135491
Apparently StackColoring depends on SlotIndexes, but not
LiveIntervals. If regalloc fast were manually requested, LiveIntervals
would be dropped before SILowerSGPRSpills but not SlotIndexes.
SILowerSGPRSpills preserved SlotIndexes, but only through
LiveIntervals. As a result, SILowerSGPRSpills was incorrectly
reporting it preserved SlotIndexes. Start updating these directly,
instead of depending on LiveIntervals also being available.
1. `length(value/type)`: return the number of elements in the vector
input,
2. `getHvxTy(elem_type)`: return the HVX vector type with the element
type provided.
These will help write things more succinctly.
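Approximate shapes of the two helpers (a sketch only; details such as passing
the HVX register width in bytes as a parameter are assumptions):

  #include "llvm/IR/DerivedTypes.h"

  // Number of elements in a fixed-width vector type.
  static unsigned length(llvm::Type *Ty) {
    return llvm::cast<llvm::FixedVectorType>(Ty)->getNumElements();
  }

  // HVX vector type with the given element type, for a given HVX register
  // width in bytes (64 or 128).
  static llvm::VectorType *getHvxTy(llvm::Type *ElemTy, unsigned HwLenBytes) {
    unsigned NumElems = (HwLenBytes * 8) / ElemTy->getScalarSizeInBits();
    return llvm::FixedVectorType::get(ElemTy, NumElems);
  }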
EVT can be created for any Type, and so this function can now be used to
check if a given Type, as-is, is an HVX type (as opposed to a type that may
be subject to legalization to an HVX type).
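A sketch of how such a check could look (the helper name and the exact
subtarget query are assumptions; HexagonSubtarget.h is a target-local header):

  #include "llvm/CodeGen/ValueTypes.h"

  // EVT::getEVT works for any Type, so we can ask whether the type, exactly
  // as written, is already an HVX vector type.
  static bool isHvxTypeAsIs(llvm::Type *Ty, const llvm::HexagonSubtarget &HST) {
    llvm::EVT VT = llvm::EVT::getEVT(Ty, /*HandleUnknown=*/true);
    return VT.isSimple() && HST.isHVXVectorType(VT.getSimpleVT());
  }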
The scalar counterpart of this is `llvm.trunc`. However, the name
ISD::VP_TRUNC is already taken by `trunc` of the LLVM IR. Naming this
`vp.ftrunc` would likely cause confusion with `vp.fptrunc`. So we add
`vp.roundtozero`, which looks similar to `vp.roundeven`.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D135233
Now only DXILTranslateMetadata uses DXILResources, so DXILResourceWrapper is only used by DXILTranslateMetadata.
Once we add lowering for createHandle, DXILResourceWrapper will be used in more passes.
We can also add resource index allocation in DXILResourceWrapper.
Reviewed By: beanz
Differential Revision: https://reviews.llvm.org/D135190
I tend to think we should ignore the policy bit in vsetvli insertion
if the tied operand is IMPLICIT_DEF. But that raises questions about
what the policy operand on RVV intrinsics means if you also pass
vundefined().
This change at least fixes some cases. I'll post a separate patch
for vsetvli insertion for discussion.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D135386