llvm-project

Commit Graph

Author	SHA1	Message	Date
Simon Pilgrim	ded8866f4a	[X86][Atom] Fix vector fp<->int resource/throughputs Match what's documented in the Intel AOM - almost all the conversion instructions require BOTH ports (apart from the MMX cvtpi2ps/cvtpi2ps instructions which we already override) - this was being incorrectly modelled as EITHER port. Now that we can use in-order models in llvm-mca, the atom model is a good "worst case scenario" analysis for x86.	2021-07-07 16:52:34 +01:00
Irina Dobrescu	5888a194c1	[AArch64][GlobalISel] Lower vector types for min/max Differential Revision: https://reviews.llvm.org/D105433	2021-07-07 15:34:03 +01:00
Zarko Todorovski	ee6ca9c7df	[AIX] Use VSSRC/VSFRC Register classes for f32/f64 callee arguments on P8 and above Adding usage of VSSRC and VSFRC when adding the live in registers on AIX. This matches the behaviour of the rest of PPC Subtargets. Reviewed By: nemanjai, #powerpc Differential Revision: https://reviews.llvm.org/D104396	2021-07-07 09:18:20 -04:00
Simon Pilgrim	4c7e9a3852	[CostModel][X86] Adjust sext/zext SSE/AVX legalized costs based on llvm-mca reports. Update costs based on the worst case costs from the script in D103695. Move to using legalized types wherever possible, which allows us to prune the cost tables.	2021-07-07 13:58:27 +01:00
Simon Pilgrim	a7da0296a6	[CostModel][X86] Adjust sitofp/uitofp SSE/AVX legalized costs based on llvm-mca reports. Update (mainly) vXi8/vXi16 -> vXf32/vXf64 sitofp/uitofp costs based on the worst case costs from the script in D103695. Move to using legalized types wherever possible, which allows us to prune the cost tables.	2021-07-07 12:03:45 +01:00
Jay Foad	ce098ccc1c	[AMDGPU] Simplify tablegen files. NFC. There is no need to cast records to strings before comparing them.	2021-07-07 09:19:23 +01:00
Stanislav Mekhanoshin	b16400449f	[AMDGPU] isPassEnabled() helper to check cl::opt and OptLevel We have several checks for both cl::opt and OptLevel over our pass config, although these checks do not work properly if the default value of a cl::opt is false. Create a helper to use instead and properly handle it. NFC for now. Differential Revision: https://reviews.llvm.org/D105517	2021-07-06 21:53:35 -07:00
Nemanja Ivanovic	3553698de7	[PowerPC] Re-enable combine for i64 BSWAP on targets without LDBRX The combine was disabled in `4e22c7265d` as it caused failures in the ppc64be-multistage (bootstrap) bot. It turns out that the combine did not correctly update the MMO for the high load which caused aliased stores to be reported as unaliased. This patch fixes that problem and re-enables the combine.	2021-07-06 20:42:01 -05:00
Eli Friedman	56b3e9edc4	[AArch64] Sync isDef32 to the current x86 version. We should probably come up with some better way to do this, but let's make sure to catch known issues for now.	2021-07-06 17:05:01 -07:00
Stanislav Mekhanoshin	a0ab45799b	[AMDGPU] Move atomic expand past infer address spaces There are cases where infer address spaces pass cannot yet infer an address space in the opt pipeline and then in the llc pipeline it runs too late for atomic expand pass to benefit from a specific address space. Move atomic expand pass past the infer address spaces. Fixes: SWDEV-293410 Differential Revision: https://reviews.llvm.org/D105511	2021-07-06 15:53:32 -07:00
Stanislav Mekhanoshin	5915d33874	[AMDGPU] Do not run IR optimizations at -O0 Differential Revision: https://reviews.llvm.org/D105515	2021-07-06 15:29:52 -07:00
Stanislav Mekhanoshin	aff66b7eef	[AMDGPU] Fix pass name of AMDGPULowerKernelAttributes. NFC. This was obviously copy-pasted.	2021-07-06 15:03:31 -07:00
Krzysztof Parzyszek	94e01d579c	[Hexagon] Generate trap/undef if misaligned access is detected This applies to memory accesses to (compile-time) constant addresses (such as memory-mapped registers). Currently when a misaligned access to such an address is detected, a fatal error is reported. This change will emit a remark, and the compilation will continue with a trap, and "undef" (for loads) emitted. This fixes https://llvm.org/PR50838. Differential Revision: https://reviews.llvm.org/D50524	2021-07-06 14:52:23 -05:00
Craig Topper	12d51f95fe	[RISCV] Implement lround/llround/lrint/llrint with fcvt instruction with -fno-math-errno These are fp->int conversions using either RMM or dynamic rounding modes. The lround and lrint opcodes have a return type of either i32 or i64 depending on sizeof(long) in the frontend which should follow xlen. llround/llrint should always return i64 so we'll need a libcall for those on rv32. The frontend will only emit the intrinsics if -fno-math-errno is in effect otherwise a libcall will be emitted which will not use these ISD opcodes. gcc also does this optimization. Reviewed By: arcbbb Differential Revision: https://reviews.llvm.org/D105206	2021-07-06 11:43:22 -07:00
Jonas Paulsson	458eac2573	[SystemZ] Support the 'N' code for the odd register in inline-asm. The odd register of a (128 bit) register pair is accessed with the 'N' code with an inline assembly operand. Review: Ulrich Weigand Differential Revision: https://reviews.llvm.org/D105502	2021-07-06 19:46:49 +02:00
Craig Topper	2b5e53111a	[RISCV] Add support for matching vwmul(u) and vwmacc(u) from fixed vectors. This adds a DAG combine to detect sext/zext inputs and emit a new ISD opcode. The extends will either be removed or replaced with narrower extends. Isel patterns are used to match add and widening mul to vwmacc similar to the recently added vmacc patterns. There's still some work to be done to match vmulsu. We should also rewrite splats that were extended as scalars and then splatted. Reviewed By: arcbbb Differential Revision: https://reviews.llvm.org/D104802	2021-07-06 10:24:31 -07:00
Simon Pilgrim	b298308ba2	[CostModel][X86] fptosi/fptoui to i8/i16 are truncated from fptosi to i32 Provide a generic fallback that performs the fptosi to i32 types, then truncates to sub-i32 scalars. These numbers can be tweaked for specific sse levels, but we should get the default handling in place first.	2021-07-06 17:28:03 +01:00
Jonas Paulsson	37a92f3b03	[SystemZ] Generate XC loop for memset 0 of variable length. Benchmarking has shown that it is worthwhile to implement a variable length memset of 0 with XC (exclusive or) like gcc does, instead of using a libcall. This requires the use of the EXecute Relative Long (EXRL) instruction which can now be done in a framework that can also be used with other target instructions (not just XC). Review: Ulrich Weigand Differential Revision: https://reviews.llvm.org/D103865	2021-07-06 18:07:31 +02:00
Bradley Smith	5ab9000fbb	[AArch64][SVE] Fix selection failures for scalable MLOAD nodes with passthru Differential Revision: https://reviews.llvm.org/D105348	2021-07-06 14:17:23 +00:00
Simon Pilgrim	6f3f9535fc	[CostModel][X86] i8/i16 sitofp/uitofp are sext/zext to i32 for sitofp Provide a generic fallback that extends sub-i32 scalars before using the existing sitofp instructions. These numbers can be tweaked for specific sse levels, but we should get the default handling in place first. We get the extension for free for non-vector loads.	2021-07-06 13:58:52 +01:00
Kerry McLaughlin	a7512401e5	[LV] Prevent vectorization with unsupported element types. This patch adds a TTI function, isElementTypeLegalForScalableVector, to query whether it is possible to vectorize a given element type. This is called by isLegalToVectorizeInstTypesForScalable to reject scalable vectorization if any of the instruction types in the loop are unsupported, e.g: int foo(__int128_t* ptr, int N) #pragma clang loop vectorize_width(4, scalable) for (int i=0; i<N; ++i) ptr[i] = ptr[i] + 42; This example currently crashes if we attempt to vectorize since i128 is not a supported type for scalable vectorization. Reviewed By: sdesmalen, david-arm Differential Revision: https://reviews.llvm.org/D102253	2021-07-06 13:06:21 +01:00
Peter Waller	c5dfee44b9	[CodeGen][AArch64][SVE] Use ld1r[bhsd] for vector splat from memory This avoids the use of the vector unit for copying from scalar to vector. There is an extra ptrue instruction, but a predicate register with the ptrue pattern populated is likely to be free in the context of real code. Tests were generated from a template to cover the axes mentioned at the top of the test file. Co-authored-by: Francesco Petrogalli <francesco.petrogalli@arm.com> Differential Revision: https://reviews.llvm.org/D103170	2021-07-06 12:03:54 +00:00
Jay Foad	c9d747e9cd	[AMDGPU] Remove outdated comment and tidy up. NFC. This was left over from D94746.	2021-07-06 11:29:36 +01:00
Sebastian Neubauer	db646de3ee	[AMDGPU] Set optional PAL metadata Set informational fields in the .shader_functions table. Also correct the documentation, .scratch_memory_size and .lds_size are integers. Differential Revision: https://reviews.llvm.org/D105116	2021-07-06 11:58:00 +02:00
Albion Fung	7d10dd60ce	[PowerPC] Implement Load and Reserve and Store Conditional Builtins This patch implements the load and reserve and store conditional builtins for the PowerPC target, in order to have feature parity with xlC on AIX. Differential revision: https://reviews.llvm.org/D105236	2021-07-05 21:35:41 -05:00
David Green	a77e2d196c	[ARM] Fix arm.mve.pred.v2i range upper limit The range metadata specifies a half open range, so our top limit was one off.	2021-07-05 21:06:30 +01:00
Sushma Unnibhavi	086370faee	[M68k][GlobalISel] Lower outgoing return values in IRTranslator Implementation of lowerReturn in the IRTranslator for the M68k backend. Differential Revision: https://reviews.llvm.org/D105332	2021-07-05 11:39:09 -07:00
Tiehu Zhang	d4ed965b2d	[AArch64ISelDAGToDAG] Fix ORRWrs/ORRXrs usefulbits calculation bug For the following case: t8: i32 = or t7, t4 t10: i32 = ORRWrs t8, t8, TargetConstant:i32<73> Current code wrongly returns (t8 >> shiftConstant) as the UsefulBits of t8, which in fact is (t8 | (t8 >> shiftConstant)). Reviewed by: sdesmalen, mdchen Differential Revision: https://reviews.llvm.org/D102759	2021-07-06 00:38:42 +08:00
Paul Walker	88522455c0	Fix typo in help text for -aarch64-enable-branch-targets.	2021-07-05 16:15:40 +01:00
Caroline Concatto	a2c5c56055	[AArch64][CostModel] Add cost model for experimental.vector.splice This patch adds a new ShuffleKind SK_Splice and then handles the cost in getShuffleCost, as in experimental.vector.reverse. Differential Revision: https://reviews.llvm.org/D104630	2021-07-05 14:30:24 +01:00
Wang, Pengfei	9ab99f773f	[X86] Twist shuffle mask when folding HOP(SHUFFLE(X,Y),SHUFFLE(X,Y)) -> SHUFFLE(HOP(X,Y)) This patch fixes PR50823. The shuffle mask has to be twisted twice to get the correct one, due to the difference between the inner HOP and the outer one. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D104903	2021-07-05 21:29:42 +08:00
Simon Pilgrim	5db826e4ce	[CostModel][X86] Handle costs for insert/extractelement with non-immediate indices via stack Determine the insert/extractelement costs when performing this as a sequence of aliased loads+stores via the stack.	2021-07-05 13:26:53 +01:00
Simon Pilgrim	65e4240fa1	[CostModel][X86] Adjust i32/i64 to f32/f64 scalar based on llvm-mca reports (+ Agner). Older SSE targets have slower gpr->fpu scalar conversions - we also need to account for uitofp i32 > f32/f64 being lowered as sitofp i64 -> f32/f64	2021-07-05 13:26:53 +01:00
Bradley Smith	cc273983f7	[AArch64][SVE] Improve fixed length codegen for common vector shuffle case Improve codegen when lowering the common vector shuffle case from the vectorizer (op1[last]:op2[0:last-1]). This patch only handles this common case as it is difficult to handle this more generally when using fixed length vectors, due to being unable to use the SVE ext instruction. Differential Revision: https://reviews.llvm.org/D105289	2021-07-05 12:09:27 +01:00
Sjoerd Meijer	ee752134ac	[AArch64] Cost-model i8 vector loads/stores Loads of <4 x i8> vectors were modeled as extremely expensive. And while we don't have a load instruction that supports this, it isn't that expensive to create a vector of i8 elements. The codegen for this was fixed/optimised in D105110. This now tweaks the cost model and enables SLP vectorisation of my motivating case loadi8.ll. Differential Revision: https://reviews.llvm.org/D103629	2021-07-05 11:25:10 +01:00
David Stuttard	b8173c3178	[AMDGPU] Stop mulhi from doing 24 bit mul for uniform values Added support to check if architecture supports s_mulhi which is used as part of the decision whether or not to use valu 24 bit mul (if the mulhi gets transformed to a valu op anyway, then may as well use it). This is an extension of the work in D97063 Differential Revision: https://reviews.llvm.org/D103321 Change-Id: I80b1323de640a52623d69ac005a97d06a5d42a14	2021-07-05 10:33:23 +01:00
Craig Topper	21a1bcbd4d	[RISCV] Pass FeatureBitset by reference rather than by value. NFCI FeatureBitset is 4 64-bit values in an array. It's better passed by reference rather than copying it. I may be adding FeatureBitset as an argument to another function and noticed this while working on that.	2021-07-04 23:11:40 -07:00
Nikita Popov	a213f735d8	[IR] Deprecate GetElementPtrInst::CreateInBounds without element type This API is not compatible with opaque pointers, the method accepting an explicit pointer element type should be used instead. Thankfully there were few in-tree users. The BPF case still ends up using the pointer element type for now and needs something like D105407 to avoid doing so.	2021-07-04 16:49:30 +02:00
Paul Walker	287d39dd5a	[NFC] Fix a few whitespace issues and typos.	2021-07-04 11:49:58 +01:00
Nikita Popov	fabc17192e	[IRBuilder] Add type argument to CreateMaskedLoad/Gather Same as other CreateLoad-style APIs, these need an explicit type argument to support opaque pointers. Differential Revision: https://reviews.llvm.org/D105395	2021-07-04 12:17:59 +02:00
David Green	fbc329efbd	[AArch64] Add S/UQXTRN tablegen patterns. This adds simple patterns for signed and unsigned saturating extract narrow instructions. They combine a min/max/truncate into a single instruction, providing that the immediates on the min/max are correct for the saturation type. This is just handled in tablegen with some extra patterns. v2i64->v2i32 is not handled here as the min/max nodes are not legal, making the lowering quite different. Differential Revision: https://reviews.llvm.org/D103263	2021-07-03 07:57:19 +01:00
Kai Luo	c063946476	[AIX] Adjust CSR order to avoid breaking ABI regarding traceback Allocate non-volatile registers in order to be compatible with ABI, regarding gpr_save. Quoted from https://www.ibm.com/docs/en/ssw_aix_72/assembler/assembler_pdf.pdf page 55, > The preferred method of using GPRs is to use the volatile registers first. Next, use the nonvolatile registers in descending order, starting with GPR31. This patch is based on @jsji 's initial draft. Tested on test-suite and SPEC, found no degradation. Reviewed By: jsji, ZarkoCA, xingxue Differential Revision: https://reviews.llvm.org/D100167	2021-07-03 04:45:26 +00:00
Krzysztof Parzyszek	df88c26f0d	[OpaquePtr] Add type parameter to emitLoadLinked Differential Revision: https://reviews.llvm.org/D105353	2021-07-02 13:07:40 -05:00
Krzysztof Parzyszek	81b42ca951	[Hexagon] Handle opaque pointers in vector combine	2021-07-02 13:07:40 -05:00
Amir Ayupov	884bc6a6ed	[X86] Modify LOOP, HLT control flow attributes Add missing control flow attributes: - LOOP: isBranch, isTerminator - HLT: isTerminator This helps downstream disassemblers (such as BOLT) reconstruct the control flow graph more accurately. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D102297	2021-07-02 10:34:29 -07:00
Simon Pilgrim	e5fdff1cf8	[X86][SLM] Keep similar scheduler costs types together. NFCI. The SLM model is inconsistent about where it kept its 'unsupported' schedule classes - better to keep them close to similar classes. I'm not sure why some ymm classes are defined and others are unsupported though (but I haven't altered them) - the only SLM-like CPU supporting any ymm is KNL and that currently uses the HSW model.	2021-07-02 14:50:24 +01:00
Simon Pilgrim	d867634fbd	[CostModel][X86] Update comment describing source of costs - we now use llvm-mca more than IACA	2021-07-02 14:29:32 +01:00
Simon Pilgrim	d181fd918d	[CostModel][X86] Drop some hard coded fp<->int scalarization costs Scalarization costs handling is a lot better now, and the hard coded costs were higher than the worst case numbers from the script in D103695	2021-07-02 14:29:32 +01:00
Simon Pilgrim	2aecffcd40	[CostModel][X86] Find AVX conversion costs using legalized types if custom types didn't match Building on rG2a1ef8784ad9a, fallback to attempting to match against legalized types like we do for SSE targets.	2021-07-02 13:49:31 +01:00
Simon Pilgrim	cdca1785d3	[CostModel][X86] Adjust uitofp(vXi64) SSE/AVX legalized costs based on llvm-mca reports. Update v4i64 -> v4f32/v4f64 uitofp costs based on the worst case costs from the script in D103695. Fixes a few regressions before we start adding AVX costs for legalized types.	2021-07-02 13:09:00 +01:00
Florian Hahn	1a248233a5	[AArch64] Use custom lowering for fp16 vector copysign. The custom copysign lowering already supports fp16. Use it. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D105277	2021-07-02 11:15:30 +01:00
Roman Lebedev	c2c0d3ea89	Revert "[WebAssembly] Implementation of global.get/set for reftypes in LLVM IR" This reverts commit `4facbf213c`. ``` ****************** FAIL: LLVM :: CodeGen/WebAssembly/funcref-call.ll (44466 of 44468) **************** TEST 'LLVM :: CodeGen/WebAssembly/funcref-call.ll' FAILED ****************** Script: -- : 'RUN: at line 1'; /builddirs/llvm-project/build-Clang12/bin/llc < /repositories/llvm-project/llvm/test/CodeGen/WebAssembly/funcref-call.ll --mtriple=wasm32-unknown-unknown -asm-verbose=false -mattr=+reference-types | /builddirs/llvm-project/build-Clang12/bin/FileCheck /repositories/llvm-project/llvm/test/CodeGen/WebAssembly/funcref-call.ll -- Exit Code: 2 Command Output (stderr): -- llc: /repositories/llvm-project/llvm/include/llvm/Support/LowLevelTypeImpl.h:44: static llvm::LLT llvm::LLT::scalar(unsigned int): Assertion `SizeInBits > 0 && "invalid scalar size"' failed. ```	2021-07-02 11:49:51 +03:00
Paulo Matos	4facbf213c	[WebAssembly] Implementation of global.get/set for reftypes in LLVM IR Reland of `31859f896`. This change implements new DAG nodes GLOBAL_GET/GLOBAL_SET, and lowering methods for loads and stores of reference types from IR globals. Once the lowering creates the new nodes, tablegen pattern matches those and converts them to Wasm global.get/set. Differential Revision: https://reviews.llvm.org/D104797	2021-07-02 09:46:28 +02:00
Matt Arsenault	32a73198fc	Mips/GlobalISel: Use accurate memory LLTs	2021-07-01 20:08:14 -04:00
Eli Friedman	0176ac9503	[AArch64] Optimize SVE bitcasts of unpacked types. Target-independent code only knows how to spill to the stack; instead, use AArch64ISD::REINTERPRET_CAST. Differential Revision: https://reviews.llvm.org/D104573	2021-07-01 15:35:48 -07:00
David Green	3d48775b89	[ARM] Reassociate BFI D104868 removed an (incorrect) fold for distributing BFI instructions in a chain, combining them into a single instruction. BFIs like that are hard to test, as the patterns are often destroyed before they become BFIs. But it can come up in places, with chains of BFIs that can be combined. This patch adds a replacement, which reassociates BFI instructions with non-overlapping insertion masks so that low bits are inserted first. This can end up sorting the nodes so that adjacent inserts are next to one another, allowing the existing folds to combine into a single BFI. Differential Revision: https://reviews.llvm.org/D105096	2021-07-01 21:08:13 +01:00
Matt Arsenault	99c7e918b5	GlobalISel: Use LLT in call lowering callbacks This preserves the memory type so the lowerings can rely on them.	2021-07-01 12:15:54 -04:00
Bradley Smith	2668727929	[SelectionDAG] Implement PromoteIntRes_INSERT_SUBVECTOR Inserting into a smaller-than-legal scalable vector would result in an internal compiler error. For example, inserting a <vscale x 4 x i8> into a <vscale x 8 x i8> (both illegal vector types for SVE) would cause a crash. This crash was happening because there was no code to promote (legalise) the result of an INSERT_SUBVECTOR node. This patch implements PromoteIntRes_INSERT_SUBVECTOR, which legalises the ISD node. This is currently done by going through memory. This is necessary because of the requirement that the SubVec parameter of the INSERT_SUBVECTOR node must be smaller than the Vec parameter, which means that INSERT_SUBVECTOR cannot always have a legal result/operand types. Co-Authored-by: Joe Ellis <joe.ellis@arm.com> Differential Revision: https://reviews.llvm.org/D102766	2021-07-01 17:05:53 +01:00
Stanislav Mekhanoshin	661577e698	[AMDGPU] Fix immediate sign during V_MOV_B64_PSEUDO expansion Creating a V_MOV_B32 with zero extended immediate source prevented conversion to V_BFREV_B32. Differential Revision: https://reviews.llvm.org/D105235	2021-07-01 09:00:29 -07:00
Irina Dobrescu	71d5b0a757	[AArch64][GlobalISel]Legalise some vector types for min/max Differential Revision: https://reviews.llvm.org/D105200	2021-07-01 16:29:38 +01:00
Simon Pilgrim	5e5ba14b4d	[CostModel][X86] Adjust fp<->int vXi32 SSE legalized costs based on llvm-mca reports. Building on rG2a1ef8784ad9a, adjust the SSE cost tables to use the legalized types based on the worst case costs from the script in D103695. To account for different numbers of src/dst legalized type registers we must scale the cost by maximum of the src/dst, not just use src	2021-07-01 15:34:20 +01:00
Sam Tebbs	24d76419d6	[ARM] Transform a floating-point to fixed-point conversion to a VCVT_fix Much like fixed-point to floating-point conversion, the converse can also be transformed into a fixed-point VCVT. This patch transforms multiplications of floating point numbers by 2^n into a VCVT_fix. The exception is that a float to fixed conversion with 1 fractional bit ends up being an FADD (FADD(x, x) emulates FMUL(x, 2)) rather than an FMUL so there is a special case for that. This patch also moves the code from https://reviews.llvm.org/D103903 into a separate function as fixed to float and float to fixed are very similar. Differential Revision: https://reviews.llvm.org/D104793	2021-07-01 15:10:40 +01:00
Bradley Smith	01b846674d	[AArch64][SVE] Add support for fixed length MSCATTER/MGATHER Since gather lowering can now lower to nodes that may need expansion via the vector legalizer, do MGATHER lowering via vector legalizer. Additionally, as part of adding passthru support for fixed typed gathers, fix passthru support for scalable types. Depends on D104910 Differential Revision: https://reviews.llvm.org/D104217	2021-07-01 12:13:59 +01:00
Simon Pilgrim	2a1ef8784a	[CostModel][X86] getCastInstrCost - attempt to match custom cast/conversion before legalized types. Move the (SSE-only) generic, legalized type conversion matching after the specific, custom conversion cases, allowing us to properly provide cost overrides. The next step will be to clean up some of the weird existing costs and then to enable AVX+ legalized costs, which will let us strip out a lot of the cost tables entries.	2021-07-01 12:06:40 +01:00
Jeremy Morse	47c3fe2a22	[DebugInfo][InstrRef][1/4] Support transformations that widen values Very late in compilation, backends like X86 will perform optimisations like this: $cx = MOV16rm $rax, ... -> $rcx = MOV64rm $rax, ... Widening the load from 16 bits to 64 bits. Seeing how the lower 16 bits remain the same, this doesn't affect execution. However, any debug instruction reference to the defined operand now refers to a 64 bit value, not a 16 bit one, which might be unexpected. Elsewhere in codegen, there's often this pattern: CALL64pcrel32 @foo, implicit-def $rax %0:gr64 = COPY $rax %1:gr32 = COPY %0.sub_32bit Where we want to refer to the definition of $eax by the call, but don't want to refer to the copies (they don't define values in the way LiveDebugValues sees it). To solve this, add a subregister field to the existing "substitutions" facility, so that we can describe a field within a larger value definition. I would imagine that this would be used most often when a value is widened, and we need to refer to the original, narrower definition. Differential Revision: https://reviews.llvm.org/D88891	2021-07-01 11:19:27 +01:00
Qiu Chaofan	07f0faed11	[NFC][Scheduler] Refactor tryCandidate to return boolean This patch changes return type of tryCandidate from void to bool: 1. Methods in some targets already follow this convention. 2. This would help if some target wants to re-use generic code. 3. It looks more intuitive if these try-methods return the same type. We may need to change return type of them from bool to some enum further, to make it less confusing. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D103951	2021-07-01 14:31:47 +08:00
Jun Ma	ae5433945f	[AArch64][SVEIntrinsicOpts] Convert cntb/h/w/d to vscale intrinsic or constant. As mentioned above. Differential Revision: https://reviews.llvm.org/D104852	2021-07-01 10:09:47 +08:00
Fangrui Song	17858da022	[AArch64] Remove unneeded ExternalSymbolSDNode code for machine constraint "S". NFC ExternalSymbolSDNode is used for implicitly generated libcalls, but an address-taking operation cannot reference an ExternalSymbolSDNode.	2021-06-30 17:52:56 -07:00
Min-Yih Hsu	557bed31e4	Reapply "[M68k][GlobalISel] Formal arguments lowering in IRTranslator" Implementation of formal arguments lowering in the IRTranslator for the M68k backend Differential Revision: https://reviews.llvm.org/D104542	2021-06-30 17:13:45 -07:00
Matt Arsenault	28f2f66200	GlobalISel: Use LLT in memory legality queries This enables proper lowering of non-byte sized loads. We still aren't faithfully preserving memory types everywhere, so the legality checks still only consider the size.	2021-06-30 17:44:13 -04:00
Jonas Paulsson	7aef99351a	[MCStreamer] Move emission of attributes section into MCELFStreamer Enable the emission of a GNU attributes section by reusing the code for emitting the ARM build attributes section. The GNU attributes follow the exact same section format as the ARM BuildAttributes section, so this can be factored out and reused for GNU attributes generally. The immediate motivation for this is to emit a GNU attributes section for the vector ABI on SystemZ (https://reviews.llvm.org/D105067). Review: Logan Chien, Ulrich Weigand Differential Revision: https://reviews.llvm.org/D102894	2021-06-30 16:00:27 -05:00
Jon Roelofs	a642872476	[GISel] Support llvm.memcpy.inline Differential revision: https://reviews.llvm.org/D105072	2021-06-30 12:39:05 -07:00
Stanislav Mekhanoshin	381ded345b	[AMDGPU] Add S_MOV_B64_IMM_PSEUDO for wide constants This is to allow 64 bit constant rematerialization. If a constant is split into two separate moves initializing sub0 and sub1 like now, RA cannot rematerialize a 64 bit register. This gives 10-20% uplift in a set of huge apps heavily using double precision math. Fixes: SWDEV-292645 Differential Revision: https://reviews.llvm.org/D104874	2021-06-30 11:45:38 -07:00
David Green	cd76f43b49	[ARM] Set the immediate cost of GEP operands to 0 This prevents constant gep operands from being hoisted by the Constant Hoisting pass, leaving them to CodegenPrepare which can usually do a better job at splitting large offsets. This can, in general, improve performance and decrease codesize, especially for v6m where many constants have a high cost. Differential Revision: https://reviews.llvm.org/D104877	2021-06-30 19:19:03 +01:00
zhijian	9a9e6189d7	[AIX][XCOFF][BUG-Fixed] need to switch back to text section after emitting a dummy eh structure Summary: in the patch https://reviews.llvm.org/D103651 [AIX][XCOFF] generate eh_info when vector registers are saved according to the traceback table. When generating eh_info, it switches to another section; when it is done, it needs to switch back to the text section again. Reviewers: Jason Liu Differential Revision: https://reviews.llvm.org/105195	2021-06-30 13:56:37 -04:00
Simon Pilgrim	59fa435ea6	[X86] Canonicalize SGT/UGT compares with constants to use SGE/UGE to reduce the number of EFLAGs reads. (PR48760) This demonstrates a possible fix for PR48760 - for compares with constants, canonicalize the SGT/UGT condition code to use SGE/UGE which should reduce the number of EFLAGs bits we need to read. As discussed on PR48760, some EFLAG bits are treated independently which can require additional uops to merge together for certain CMOVcc/SETcc/etc. modes. I've limited this to cases where the constant increment doesn't result in a larger encoding or additional i64 constant materializations. Differential Revision: https://reviews.llvm.org/D101074	2021-06-30 18:46:50 +01:00
Craig Topper	0f1f92156f	[ARM] Fix incorrect assignment of Changed variable in MVEGatherScatterLowering::optimiseOffsets. I believe this Changed flag should be initialized to false, otherwise the if (!Changed) is always dead. This doesn't manifest in a functional issue because the PHINode checks will fail if nothing changed. They are identical to the earlier checks that must have already failed to get into this else block. While there remove an else after return to reduce indentation. Differential Revision: https://reviews.llvm.org/D105159	2021-06-30 07:52:57 -07:00
Simon Pilgrim	47941d601d	[CostModel][X86] Adjust fp<->int vXi32 AVX1+ costs based on llvm-mca reports Based off the worst case numbers generated by D103695, the AVX1/2/512 sitofp/uitofp/fptosi/fptoui costs were higher than necessary (based off instruction counts instead of actual throughput). The SSE costs still need further fixes, but I hit an issue with the order in which SSE costs are checked - we need to check CUSTOM costs (with non-legal types) first, and then fallback to LEGALIZED types. I'm looking at this now, and this should let us start thinning out a lot of the duplicates in the costs tables. Then we can finally start work on vXi64 / vXi16 / vXi8 / vXi1 integers, which should let us look at sub-128-bit vectorization (D103925).	2021-06-30 15:23:34 +01:00
alex-t	e585b332e4	[AMDGPU] PHI node cost should not be counted for the size and latency. Details: https://reviews.llvm.org/D96805 changed the GCNTTIImpl::getCFInstrCost to return 1 for the PHI nodes for the TTI::TCK_CodeSize and TTI::TCK_SizeAndLatency. This is incorrect because the value moves that are the result of the PHI lowering are inserted into the basic block predecessors - not into the block itself. As a result of this change LoopRotate and LoopUnroll were broken because of the incorrect Loop header and loop body size/cost estimation. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D105104	2021-06-30 16:11:17 +03:00
Simon Pilgrim	fcd0cb3921	Fix MSVC "32-bit shift implicitly converted to 64 bits" warning.	2021-06-30 13:23:53 +01:00
madhur13490	a7ed55f64c	[AMDGPU] Simplify getReservedNumSGPRs This is a followup patch on D103636 where it seemed checking on amdgpu-calls and amdgpu-stack-objects is unnecessary. Removing these checks didn't regress any tests functionally. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D104513	2021-06-30 16:19:39 +05:30
Florian Mayer	a24f104645	[MTE] Remove redundant helper function. Looking at PostDominatorTree::dominates, we can see that it has the same logic (with the addition of handling Phi nodes - which are not used as inputs in this pass) as the helper function. Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D105141	2021-06-30 11:11:26 +01:00
Igor Kudrin	657e067bb5	[ARMInstPrinter] Print the target address of a branch instruction This follows other patches that changed printing immediate values of branch instructions to target addresses, see D76580 (x86), D76591 (PPC), D77853 (AArch64). As observing immediate values might sometimes be useful, they are printed as comments for branch instructions. // llvm-objdump -d output (before) 000200b4 <_start>: 200b4: ff ff ff fa blx #-4 <thumb> 000200b8 <thumb>: 200b8: ff f7 fc ef blx #-8 <_start> // llvm-objdump -d output (after) 000200b4 <_start>: 200b4: ff ff ff fa blx 0x200b8 <thumb> @ imm = #-4 000200b8 <thumb>: 200b8: ff f7 fc ef blx 0x200b4 <_start> @ imm = #-8 // GNU objdump -d. 000200b4 <_start>: 200b4: faffffff blx 200b8 <thumb> 000200b8 <thumb>: 200b8: f7ff effc blx 200b4 <_start> Differential Revision: https://reviews.llvm.org/D104701	2021-06-30 16:35:28 +07:00
Igor Kudrin	17bcae8906	[ARM][NFC] Remove an unused method `ARMInstPrinter::printMveAddrModeQOperand()` was added in D62680, but was never used. It looks like `printT2AddrModeImm8Operand<false>()` is used instead. Differential Revision: https://reviews.llvm.org/D105124	2021-06-30 15:55:37 +07:00
Sjoerd Meijer	b062fff87a	Recommit "[AArch64] Custom lower <4 x i8> loads" This recommits D104782 including a fix for adding a wrong operand to the new load node. Differential Revision: https://reviews.llvm.org/D105110	2021-06-30 09:18:06 +01:00
Tony Tye	7f19aa73c2	[AMDGPU] Update gfx90a memory model support Update AMDGPU gfx90a memory model to make coarse grain memory allocations consistent when fine grained system scope atomic acquire and release is performed. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D105137	2021-06-30 04:05:22 +00:00
Steffen Larsen	3644726a78	[Clang][NVPTX] Add NVPTX intrinsics and builtins for CUDA PTX 6.5 and 7.0 WMMA and MMA instructions Adds NVPTX builtins and intrinsics for the CUDA PTX `wmma.load`, `wmma.store`, `wmma.mma`, and `mma` instructions added in PTX 6.5 and 7.0. PTX ISA description of - `wmma.load`: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#warp-level-matrix-instructions-wmma-ld - `wmma.store`: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#warp-level-matrix-instructions-wmma-st - `wmma.mma`: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#warp-level-matrix-instructions-wmma-mma - `mma`: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#warp-level-matrix-instructions-mma Overview of `wmma.mma` and `mma` matrix shape/type combinations added with specific PTX versions: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#warp-level-matrix-shape Authored-by: Steffen Larsen <steffen.larsen@codeplay.com> Co-Authored-by: Stuart Adams <stuart.adams@codeplay.com> Reviewed By: tra Differential Revision: https://reviews.llvm.org/D104847	2021-06-29 15:44:07 -07:00
Matt Arsenault	990278d026	CodeGen: Store LLT instead of uint64_t in MachineMemOperand GlobalISel is relying on regular MachineMemOperands to track all of the memory properties of accesses. Just the raw byte size is insufficient to disambiguate all situations. For example, if we need to split an unaligned extending load, we need to know the number of bits in the original source value and can't infer it from the result type. This is also a problem for extending vector loads. This does decrease the maximum representable size from the full uint64_t bytes to a maximum of 16-bits. No in-tree testcases hit this, other than places using UINT64_MAX for unknown sizes. This may be an issue for G_MEMCPY and co., although they can just use unknown size for large static sizes. This also has potential for backend abuse by relying on the type when it really shouldn't be relevant after selection. This does not include the necessary MIR printer/parser changes to represent this.	2021-06-29 17:38:51 -04:00
Craig Topper	3b6dfa381e	[RISCV] Protect the SHL/SRA/SRL handlers in LowerOperation against being called for an illegal i32 shift amount. It seems it is possible for DAG combine to create a shl with an i64 result type and an i32 shift amount. This is ok before type legalization since the types don't need to match in SelectionDAG. This results in type legalization calling LowerOperation to legalize just the amount. We weren't expecting this, so we asserted on not finding a fixed vector shift. To fix this, I've added a check for the fixed vector case and returned SDValue() to get the default type legalizer. I've factored all shifts together and added a fixed vector specific handler to avoid repeating similar code for each in LowerOperation. The particular case I found was exposed by D104581, but the bad shift is created after that patch triggers.	2021-06-29 09:45:13 -07:00
Piotr Sobczak	f38a8b54ea	[AMDGPU] Fix 224-bit spills Related to D104622. Differential Revision: https://reviews.llvm.org/D105109	2021-06-29 17:52:16 +02:00
Dylan Fleming	c3d3defd11	[SVE] Added CodeGen support for inserting an element into a predicate vector Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D104722	2021-06-29 14:55:40 +01:00
Ben Shi	c85175c5f6	[AVR] Fix a bug in prologue of ISR The r1 register should be cleared in prologue of ISR as it is used as constant zero. Reviewed By: dylanmckay Differential Revision: https://reviews.llvm.org/D99467	2021-06-29 21:44:50 +08:00
Tim Northover	c82957e792	ARM: fix vacuously true assertion to actually check what it should. NFC.	2021-06-29 14:24:03 +01:00
David Green	371ee32e01	[ARM] Fold extract of ARM_BUILD_VECTOR This adds a small fold for extract (ARM_BUILD_VECTOR) to fold to the original node. This can help simplify the resulting codegen in some cases. Differential Revision: https://reviews.llvm.org/D104860	2021-06-29 11:03:19 +01:00
Krzysztof Parzyszek	9c5ed8d567	[Hexagon] Add patterns to load i1 This fixes https://llvm.org/PR50853	2021-06-28 12:17:30 -05:00
Sjoerd Meijer	3a7cea2858	Revert "[AArch64] Custom lower <4 x i8> loads" This reverts commit `51e434fc25` because of a build bot failure in test-suite::GCC-C-execute-pr60960.test that I need to investigate.	2021-06-28 17:44:46 +01:00
Jay Foad	75cacc6775	[AMDGPU] Use opName instead of PseudoName in VOP2 multiclasses. NFC. This is just for consistency with all other instruction multiclasses that pass around pseudo names as arguments.	2021-06-28 16:46:35 +01:00
David Spickett	558d9e8228	[llvm][ARM] Treat xscale arch as an alias of armv5te Previously xscale was known to everything apart from the ELF streamer so we would crash as soon as you tried to output an object file. Reviewed By: nickdesaulniers Differential Revision: https://reviews.llvm.org/D104776	2021-06-28 15:20:24 +00:00
Bradley Smith	c089e29aa4	[AArch64][SVE] DAG combine SETCC_MERGE_ZERO of a SETCC_MERGE_ZERO This helps remove extra comparisons when generating masks for fixed length masked operations. Differential Revision: https://reviews.llvm.org/D104910	2021-06-28 15:06:06 +01:00
Brendon Cahoon	f9f5d41545	[AMDGPU][GlobalISel] Legalize and select G_SBFX and G_UBFX Adds legalizer, register bank select, and instruction select support for G_SBFX and G_UBFX. These opcodes generate scalar or vector ALU bitfield extract instructions for AMDGPU. The instructions allow either constant or register values for the offset and width operands. The 32-bit scalar version is expanded to a sequence that combines the offset and width into a single register. There are no 64-bit vgpr bitfield extract instructions, so the operations are expanded to a sequence of instructions that implement the operation. If the width is a constant, then the 32-bit bitfield extract instructions are used. Moved the AArch64 specific code for creating G_SBFX to CombinerHelper.cpp so that it can be used by other targets. Only bitfield extracts with constant offset and width values are handled currently. Differential Revision: https://reviews.llvm.org/D100149	2021-06-28 09:06:44 -04:00
Lucas Prates	88b1135e72	[Aarch64] Adding support for Armv9-A Realm Management Extension This adds support for Armv9-A's Realm Management Extension, including three new system registers - MFAR_EL3, GPCCR_EL3 and GPTBR_EL3 - and four new TLBI instructions. The reference for the Realm Management Extension can be found at: https://developer.arm.com/documentation/ddi0615/aa. Based on patches by Victor Campos. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D104773	2021-06-28 13:45:22 +01:00
David Green	a1c0f09a89	[ARM] Add an extra fold for f32 extract(vdup(i32)) This adds another small fold for extract of a vdup, between an i32 and an f32, converting to a BITCAST. This allows some extra folding to happen, simplifying the resulting code. Differential Revision: https://reviews.llvm.org/D104857	2021-06-28 08:54:03 +01:00
Min-Yih Hsu	04242bdca9	Revert "[M68k][GloballSel] Formal arguments lowering in IRTranslator" This reverts commit `8f43407a07` due to failure on its associated test.	2021-06-27 23:22:40 -07:00
Sushma Unnibhavi	8f43407a07	[M68k][GloballSel] Formal arguments lowering in IRTranslator Implementation of formal arguments lowering in the IRTranslator for the M68k backend Differential Revision: https://reviews.llvm.org/D104542	2021-06-27 16:13:05 -07:00
Craig Topper	010f0f000f	Revert "[RISCV] Use zexti32/sexti32 in srliw/sraiw isel patterns to improve usage of those instructions." I thought this might help with another optimization I was thinking about, but I don't think it will. So it just wastes compile time calling computeKnownBits for no benefit. This reverts commit `81b2f95971`.	2021-06-27 10:33:43 -07:00
Craig Topper	81f6d7c082	[X86] Tighten up some inline assembly constraint handling. Don't allow vectors to split into GPRs for 'r' and other scalar constraints. Prevents assertion in getCopyToPartsVector. Makes PR50907 give a better error instead of crashing.	2021-06-26 22:57:22 -07:00
David Green	41d8149ee9	[ARM] Lower MVETRUNC to stack operations The MVETRUNC node truncates two wide vectors to a single vector with narrower elements. This is usually lowered to a series of extract/insert elements, going via GPR registers. This patch changes that to instead use a pair of truncating stores and a stack reload. This cuts down the number of instructions at the expense of some stack space. Differential Revision: https://reviews.llvm.org/D104515	2021-06-26 22:12:57 +01:00
David Green	5955812927	[ARM] Introduce MVETRUNC ISel lowering Currently, when encountering store(trunc(..)) where the trunc is double a legal vector length in MVE, we split the node into two different stores each performing half of the trunc from the wider type. This works well for efficiently lowering wider than legal types, else the trunc becomes a series of individual lane moves. Unfortunately this splitting is currently one of the first combines attempted, so it can happen before any other combines which might be more preferable. This patch instead introduces the concept of an MVETRUNC ISel node that the trunc is initially lowered to, to keep it intact as a single item as opposed to splitting it up. This allows us to push the store(trunc(..)) combine later, allowing other optimisations to potentially happen on the trunc first. The store(trunc(..)) splitting can then be done later in the legalisation period if needed, or else fall back to a buildvector as before. This can also be used in the future to lower to loads/stores, as opposed to the more expensive lane extracts/inserts. Some extra combines are added to keep all the existing tests happy. Differential Revision: https://reviews.llvm.org/D91921	2021-06-26 22:00:26 +01:00
Craig Topper	81b2f95971	[RISCV] Use zexti32/sexti32 in srliw/sraiw isel patterns to improve usage of those instructions.	2021-06-26 11:57:26 -07:00
David Green	0f83d37a14	[ARM] MVE vabd This adds MVE lowering for VABDS/VABDU, using the code ported from AArch64 in D91937. Differential Revision: https://reviews.llvm.org/D91938	2021-06-26 19:41:32 +01:00
David Green	2887f14639	[ISel] Port AArch64 SABD and UABD to DAGCombine This ports the AArch64 SABD and UABD over to DAG Combine, where they can be used by more backends (notably MVE in a follow-up patch). The matching code has changed very little, just to handle legal operations and types differently. It selects from (ABS (SUB (EXTEND a), (EXTEND b))), producing an abds/abdu which is zexted to the original type. Differential Revision: https://reviews.llvm.org/D91937	2021-06-26 19:34:16 +01:00
Jim Lin	779d2b0a42	[RISCV][NFC] Combine the control flow for different RetOp of interrupt function Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D104838	2021-06-26 17:28:03 +08:00
Craig Topper	d4f4a1ba62	[RISCV] Add DAG combine to detect opportunities to replace (i64 (any_extend (i32 X)) with sign_extend. If type legalization is going to insert a sign_extend for other users of X and we can fold the sign_extend into ADDW/MULW/SUBW, it is better to replace the ANY_EXTEND so we don't end up with a separate ADD/MUL/SUB instruction for the users of the ANY_EXTEND. I'm only handling setcc uses right now, but there are other instructions that force sign_extends like ashr. There are probably other *W instructions we could use in addition to ADDW/SUBW/MULW. My motivating case was a loop terminating compare and a phi use as seen in the new test file. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D104581	2021-06-25 23:16:37 -07:00
Eric Astor	e074d580b2	[ms] [llvm-ml] Disable C-style comments	2021-06-25 23:09:13 -04:00
Luo, Yuanke	36003c20ad	[X86] Selecting fld0 for undefined value in fast ISEL. When opt-bisect-limit is set on the command line to some value that is less than the ISel pass and CurBisectNum expires, the "DAG to DAG" pass lowers its opt level to O0. However, "processimpdefs" and "X86 FP Stackifier" are not stopped by the CurBisectNum expiration, so an undefined fp0 is generated. This causes a crash in the "X86 FP Stackifier" pass, because the Stackifier doesn't expect any undefined fp value. Here is the scenario that causes the compiler crash. successors: %bb.26 liveins: $r14 ST_FPrr $st0, implicit-def $fpsw, implicit $fpcw renamable $rdi = MOV64ri @.str.3.16422 renamable $rdx = LEA64r %stack.6, 1, $noreg, 0, $noreg ADJCALLSTACKDOWN64 0, 0, 0, implicit-def $rsp, implicit-def dead $eflags, implicit-def $ssp, implicit $rsp, implicit $ssp dead $esi = MOV32r0 implicit-def dead $eflags, implicit-def $rsi CALL64pcrel32 @foo, implicit $rsp, implicit $ssp, implicit $rdi, implicit $rsi, implicit $rdx, implicit-def dead $fp0 renamable $xmm0 = MOVSDrm_alt %stack.10, 1, $noreg, 0, $noreg :: (load 8 from %stack.10) ADJCALLSTACKUP64 0, 0, implicit-def $rsp, implicit-def dead $eflags, implicit-def $ssp, implicit $rsp, implicit $ssp renamable $fp2 = CHS_Fp80 killed undef renamable $fp0, implicit-def $fpsw JMP_1 %bb.26 The CALL64pcrel32 marks fp0 dead, so llvm frees the stack slot for fp0 and the stack becomes empty. The later instruction CHS_Fp80 uses the undefined register fp0; the original code assumes there must be a stack slot for the src register (fp0) without considering that it is undefined, so llvm reports an error. We had some discussion in https://reviews.llvm.org/D104440 and decided to fix it in fast ISel. The fix is to lower an undefined fp value to a zero value, which relieves the "X86 FP Stackifier" pass of this burden. Thanks to Craig for the suggestion and the initial patch to fix it. Differential Revision: https://reviews.llvm.org/D104678	2021-06-26 08:43:09 +08:00
Jon Chesterfield	50ad3478bd	Disable ReplaceLDS pass, patch up tests to match Most tests passed with an extra argument to explicitly enable the pass. One does not and was deleted as part of this change. I can't see why the codegen would be different between default on and default off but switched on. The deleted test can be retrieved from the project history. This would be a revert, but git revert was not clean. Disabling the pass and leaving it in tree is less likely to cause breakage elsewhere than patching up the git revert conflicts on unfamiliar code. It'll be landed without review, as @hsmhsm is believed unavailable at present. Differential Revision: https://reviews.llvm.org/D104962	2021-06-26 01:36:42 +01:00
Nemanja Ivanovic	4e22c7265d	[PowerPC] Disable combine 64-bit bswap(load) without LDBRX This causes failures on the big endian bootstrap bot. Disabling this combine temporarily until I can get a proper fix.	2021-06-25 15:11:22 -05:00
Ulrich Weigand	b2674670f2	[SystemZ] Add support for .reloc assembler directive Add support for the .reloc directive along the lines of other back-ends. This fixes a regression after https://reviews.llvm.org/D104080 was merged, since that patch presupposed support for .reloc.	2021-06-25 21:51:10 +02:00
Craig Topper	0f3bc00a7d	[X86] Simplify part of the isel for X86ISD::FCMP/STRICT_FCMP/STRICT_FCMPS. We don't need to have the compare output a value and then copy it to FPSW for use by FNSTSW. Instead we can just have the compare output Glue and glue the FNSTSW to it. InstrEmitter effectively performed this optimization when emitting the Machine IR. Doing it directly simplifies the code and reduces the work in InstrEmitter. There's no change in the machine IR at the end of isel before and after this change.	2021-06-25 11:39:01 -07:00
Sander de Smalen	b732e6c9a8	Revert "[GlobalISel] NFC: Have LLT::getSizeInBits/Bytes return a TypeSize." This patch seems to be causing build errors, reverting it for now. This reverts commit `aeab9d9570`.	2021-06-25 17:37:16 +01:00
Sander de Smalen	aeab9d9570	[GlobalISel] NFC: Have LLT::getSizeInBits/Bytes return a TypeSize. To reflect that the size may be scalable, a TypeSize is returned instead of an unsigned. In places where the result is used, it currently relies on an implicit cast of TypeSize -> uint64_t, which asserts that the type is not scalable. This patch is NFC for fixed-width vectors. Reviewed By: aemerson Differential Revision: https://reviews.llvm.org/D104454	2021-06-25 17:06:50 +01:00
Jay Foad	c3cc9d1eb2	[AMDGPU] Removed unused Predicate HasOffset3fBug. NFC. The predicate definition didn't make sense anyway because it was defined as being the opposite of what the name suggests.	2021-06-25 16:58:44 +01:00
Krzysztof Parzyszek	8a9ec39bd0	[Hexagon] Convert getTypeAlignment to return Align Plus some minor related changes of the same nature.	2021-06-25 10:53:14 -05:00
Sander de Smalen	c9acd2f32e	[GlobalISel] NFC: Change LLT::changeNumElements to LLT::changeElementCount. Reviewed By: aemerson Differential Revision: https://reviews.llvm.org/D104453	2021-06-25 15:54:00 +01:00
Sander de Smalen	968980ef08	[GlobalISel] NFC: Change LLT::scalarOrVector to take ElementCount. Reviewed By: aemerson Differential Revision: https://reviews.llvm.org/D104452	2021-06-25 11:26:16 +01:00
Sjoerd Meijer	51e434fc25	[AArch64] Custom lower <4 x i8> loads This custom lowers <4 x i8> vector loads using a 32-bit load, followed by 2 SSHLL instructions to extend it to e.g. a <4 x i32> vector. Before, it was really inefficient and expensive to construct a <4 x i32> for this as 4 byte loads and 4 moves were used. With this improvement SLP vectorisation might for example become profitable, see D103629. Differential Revision: https://reviews.llvm.org/D104782	2021-06-25 09:53:51 +01:00
Qiu Chaofan	a08fc1361a	[PowerPC] Change VSRpRC allocation order On PowerPC, VSRpRC represents the pairs of even and odd VSX registers, and VRRC corresponds to the upper 32 VSX registers. In some cases, extra copies are produced when handling incoming VRRC arguments with VSRpRC. This patch changes the allocation order of VSRpRC to eliminate this kind of copy. Stack frame sizes may increase if non-volatile registers are allocated, and some other vector copies happen. They need to be fixed in future changes. Reviewed By: nemanjai Differential Revision: https://reviews.llvm.org/D104855	2021-06-25 16:04:41 +08:00
Amara Emerson	f9b3840c3d	[ARM] Fix crash in chained BFI combine due to incorrectly RAUW'ing a node. For a bfi chain like: a = bfi input, x, y b = bfi a, x', y' The previous code was RAUW'ing a with x, mutating the second 'b' bfi, and when SelectionDAG's CSE code ended up deleting it unexpectedly, bad things happened. There's no need to RAUW in this case because we can just return our newly created replacement BFI node. It also looked incorrect because it didn't account for other users of the 'a' bfi. Since it seems that chains of more than 2 BFI nodes are hard/impossible to produce without this combine kicking in at some point, I've removed that functionality since it had no test coverage. rdar://79095399 Differential Revision: https://reviews.llvm.org/D104868	2021-06-24 23:35:47 -07:00
Fraser Cormack	ab1bd25593	[RISCV] Permit larger RVV stacks and stack offsets This patch teaches the compiler to generate code to handle larger RVV stack sizes and stack offsets which resolve an amount larger than 2047 vector registers in size. The previous behaviour was asserting on such large values as it was only able to materialize the constant by feeding it to the 12-bit immediate of an `ADDI` instruction. The compiler can now materialize this amount into a temporary register before continuing with the computation. A test case for this scenario is included which also checks that the temporary register used to materialize the amount doesn't require an additional spill slot over what we're already reserving for RVV code. Reviewed By: rogfer01 Differential Revision: https://reviews.llvm.org/D104727	2021-06-25 07:17:33 +01:00
Serge Pavlov	b36d214bed	[X86] Add description of FXAM instruction Previously this instruction could be used only in assembler. This change makes it available for compiler also. Scheduling information was copied from FTST instruction, hopefully this can be a satisfactory approximation. Differential Revision: https://reviews.llvm.org/D104853	2021-06-25 12:26:51 +07:00
Kai Luo	b904574b3d	[PowerPC] Move PPCBranchSelector as close to asm printer as possible Currently, PPCBranchSelector is not immediately preceding asm printer pass. `-debug-pass=Structure` gives ``` PowerPC Branch Selector Contiguously Lay Out Funclets StackMap Liveness Analysis Live DEBUG_VALUE analysis Lazy Machine Block Frequency Analysis Machine Optimization Remark Emitter Linux PPC Assembly Printer ``` After the patch ``` Contiguously Lay Out Funclets StackMap Liveness Analysis Live DEBUG_VALUE analysis PowerPC Branch Selector Lazy Machine Block Frequency Analysis Machine Optimization Remark Emitter Linux PPC Assembly Printer ``` Reviewed By: nemanjai, #powerpc Differential Revision: https://reviews.llvm.org/D104762	2021-06-25 02:05:19 +00:00
Nemanja Ivanovic	dcccb2f594	[PowerPC] Fix bswap combine for big endian systems Commit `0464586ac5` added a combine for a 64-bit load feeding a bswap but the implementation is only correct for little endian systems. This fixes it for big endian systems.	2021-06-24 18:04:50 -05:00
Martin Storsjö	42f74e8249	[llvm] Rename StringRef _lower() method calls to _insensitive() This is a mechanical change. This actually also renames the similarly named methods in the SmallString class, however these methods don't seem to be used outside of the llvm subproject, so this doesn't break building of the rest of the monorepo.	2021-06-25 00:22:01 +03:00
Krzysztof Parzyszek	d09218a82e	[Hexagon] Opaquify pointer usage in GEP commoning	2021-06-24 16:06:36 -05:00
Nemanja Ivanovic	0464586ac5	[PowerPC] Combine 64-bit bswap(load) without LDBRX When targeting CPUs that don't have LDBRX, we end up producing code that is very inefficient and large for this common idiom. This patch just optimizes it to two 32-bit LWBRX instructions along with a merge. This fixes https://bugs.llvm.org/show_bug.cgi?id=49610 Differential revision: https://reviews.llvm.org/D104836	2021-06-24 15:11:47 -05:00
Aakanksha Patil	3453f3dd46	[AMDGPU] Add gfx1035 target Differential Revision: https://reviews.llvm.org/D104804	2021-06-24 14:32:41 -04:00
Pablo Barrio	571c8c5263	[AArch64][v8.3A] Avoid inserting implicit landing pads (PACISP) PACISP have the advantage that they are in HINT space, meaning they can be run successfully in hardware without PAuth support - they will just behave as a NOP. However, PACISP are also implicit landing pads (think of an extra BTI jc). Therefore, they allow indirect jumps of all kinds into them, potentially inserting new gadgets. This patch replaces PACISP by PACI* LR, SP when compiling explicitly for hardware with full PAuth support. PACI* is not in the HINT space, therefore it will fault when run in hardware without PAuth support, but it is also not a landing pad, making programs safer in newer HW. Differential Revision: https://reviews.llvm.org/D101920	2021-06-24 18:24:32 +01:00
Anirudh Prasad	631362665c	[AsmParser][SystemZ][z/OS] Support for emitting labels in upper case - Currently, the emitting of labels in the parsePrimaryExpr function is case independent. It just takes the identifier and emits it. - However, for HLASM the emitting of labels is case independent. We are emitting them in upper case only, to enforce case independence. So we need to ensure that at the time of parsing the label we are emitting the upper case (in `parseAsHLASMLabel`), but also, when we are processing a PC-relative relocatable expression, we need to ensure we emit it in upper case (in `parsePrimaryExpr`) - To achieve this a new MCAsmInfo attribute has been introduced which corresponding targets can override if needed. Reviewed By: abhina.sreeskantharajan, uweigand Differential Revision: https://reviews.llvm.org/D104715	2021-06-24 12:50:11 -04:00
David Green	1113e06821	[ARM] Extend narrow values to allow using truncating scatters As a minor adjustment to the existing lowering of offset scatters, this extends any smaller-than-legal vectors into full vectors using a zext, so that the truncating scatters can be used. Due to the way MVE legalizes the vectors this should be cheap in most situations, and will prevent the vector from being scalarized. Differential Revision: https://reviews.llvm.org/D103704	2021-06-24 13:09:11 +01:00
Florian Hahn	a54c6fc083	[X86] Exclude invalid element types for bitcast/broadcast folding. It looks like the fold introduced in `63f3383ece` can cause crashes if the type of the bitcasted value is not a valid vector element type, like x86_mmx. To resolve the crash, reject invalid vector element types. The way it is done in the patch is a bit clunky. Perhaps there's a better way to check? Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D104792	2021-06-24 12:39:01 +01:00
Rosie Sumpter	0c4651f0a8	[CostModel][AArch64] Improve cost model for vector reduction intrinsics OR, XOR and AND entries are added to the cost table. An extra cost is added when vector splitting occurs. This is done to address the issue of a missed SLP vectorization opportunity due to unreasonably high costs being attributed to the vector Or reduction (see: https://bugs.llvm.org/show_bug.cgi?id=44593). Differential Revision: https://reviews.llvm.org/D104538	2021-06-24 12:02:58 +01:00
Simon Pilgrim	c4d3eedc7f	[X86] Fold nested select_cc to select (cmp*ge/le Cond0, Cond1), LHS, Y) select (cmpeq Cond0, Cond1), LHS, (select (cmpugt Cond0, Cond1), LHS, Y) --> (select (cmpuge Cond0, Cond1), LHS, Y) etc, We already perform this fold in DAGCombiner for MVT::i1 comparison results, but these can still appear after legalization (in x86 case with MVT::i8 results), where we need to be more careful about generating new comparison codes. Pulled out of D101074 to help address the remaining regressions. Differential Revision: https://reviews.llvm.org/D104707	2021-06-24 11:27:57 +01:00
Sander de Smalen	d5e14ba88c	[GlobalISel] NFC: Change LLT::vector to take ElementCount. This also adds new interfaces for the fixed- and scalable case: * LLT::fixed_vector * LLT::scalable_vector The strategy for migrating to the new interfaces was as follows: * If the new LLT is a (modified) clone of another LLT, taking the same number of elements, then use LLT::vector(OtherTy.getElementCount()), or, if the number of elements is halved/doubled, use .divideCoefficientBy(2) or operator*. That is because there is no reason to specifically restrict the types to 'fixed_vector'. If the algorithm works on the number of elements (as unsigned), then just use fixed_vector. This will need to be fixed up in the future when modifying the algorithm to also work for scalable vectors, and will then need additional tests to confirm the behaviour works the same for scalable vectors. * If the test used the `/*Scalable=*/true` flag of LLT::vector, then this is replaced by LLT::scalable_vector. Reviewed By: aemerson Differential Revision: https://reviews.llvm.org/D104451	2021-06-24 11:26:12 +01:00
Fraser Cormack	a4729f7f88	[RISCV] Lower RVV vector SELECTs to VSELECTs This patch optimizes the code generation of vector-type SELECTs (LLVM select instructions with scalar conditions) by custom-lowering to VSELECTs (LLVM select instructions with vector conditions) by splatting the condition to a vector. This avoids the default expansion path which would either introduce control flow or fully scalarize. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D104772	2021-06-24 10:12:51 +01:00
Carl Ritson	98f48723f2	[AMDGPU] Add 224-bit vector types and link 192-bit types to MVTs Add SReg_224, VReg_224, AReg_224, etc. Link 224-bit types with v7i32/v7f32. Link existing 192-bit types to newly added v3i64/v3f64/v6i32/v6f32. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D104622	2021-06-24 12:41:22 +09:00
Jon Chesterfield	660cae84c3	Revert "[AMDGPU] [IndirectCalls] Don't propagate attributes to address taken functions and their callees" This reverts commit `6a3beb1f68`. Test case that triggers an infinite loop before the revert is at the review for D103138.	2021-06-24 02:33:50 +01:00
Stanislav Mekhanoshin	d274d64ef4	[AMDGPU] Check for pointer operand while refining LDS align Also skips the propagation if alignment is 1. Differential Revision: https://reviews.llvm.org/D104796	2021-06-23 12:27:55 -07:00
David Green	8cfc080132	[ARM] Limit v6m unrolling with multiple live outs v6m cores only have a limited number of registers available. Unrolling can mean we spend more on stack spills and reloads than we save from the unrolling. This patch adds an extra heuristic to put a limit on the unroll count for loops with multiple live out values, as measured from the LCSSA phi nodes. Differential Revision: https://reviews.llvm.org/D104659	2021-06-23 16:36:37 +01:00
Craig Topper	a37cf17834	[RISCV] Add explicit copy to V0 in the masked vmsge(u).vx intrinsic handling. This is consistent with our other masked vector instructions. Previously we found cases where not doing this broke fast reg alloc.	2021-06-23 08:04:42 -07:00
Jay Foad	a16cb95a3a	[AMDGPU] Remove unused multiclass MUBUF_Real_gfx10_with_name	2021-06-23 14:37:28 +01:00
Nikita Popov	8c01deb8e6	[ARMParallelDSP] Remove unnecessary wrapper function (NFC) AreSequentialAccesses() forwards directly to isConsecutiveAccess() and has an unnecessary template parameter to boot.	2021-06-23 15:27:54 +02:00
Jay Foad	dfb8c08739	[AMDGPU] Stop using LegacyLegalizerInfo. NFCI. Differential Revision: https://reviews.llvm.org/D103684	2021-06-23 10:50:32 +01:00
Jay Foad	c65f3f562b	[AMDGPU] Simplify collectReachableCallees. NFCI. Don't use SCC iterators when we're only interested in reachability. Use df_begin/df_end inline to find reachable nodes. Differential Revision: https://reviews.llvm.org/D104704	2021-06-23 09:11:29 +01:00
Stanislav Mekhanoshin	2b43209ee3	[AMDGPU] Propagate LDS align into instructions Differential Revision: https://reviews.llvm.org/D104316	2021-06-23 00:57:16 -07:00
Martin Storsjö	1cb7849a55	Revert "[AArch64LoadStoreOptimizer] Recommit: Generate more STPs by renaming registers earlier" This reverts commit `ea011ec5ed`. This still causes some miscompiles, I'll follow up in the phabricator review with a sample of that issue (which is part of the sample of the previous issue).	2021-06-23 09:54:16 +03:00
Min-Yih Hsu	dfafd56daa	[M68k] Fix incorrect #include-ed file in M68kSubtarget In https://reviews.llvm.org/rG2193347e72fa , a cpp file is accidentally included instead of its header file counterpart. This patch fixes this error.	2021-06-22 23:02:21 -07:00
Jim Lin	5cb5225cf5	[M68k] Refactor codegen patterns for logic operations and add tests for it Refactor pat for and, or and xor operation and add missing tests for it Reviewed By: myhsu Differential Revision: https://reviews.llvm.org/D104626	2021-06-23 13:25:24 +08:00
David Green	015c27caa2	[ARM] Change some Gather/Scatter interface types to Instructions. NFC These returned Values are cast to an Instruction already, this just cleans up the interface a little to match the expected types.	2021-06-22 19:11:39 +01:00
Matt Arsenault	39f8a792f0	AMDGPU: Try to eliminate clearing of high bits of 16-bit instructions These used to consistently be zeroed pre-gfx9, but gfx9 made the situation complicated since now some still do and some don't. This also manages to pick up a few cases that the pattern fails to optimize away. We handle some cases with instruction patterns, but some get through. In particular this improves the integer cases.	2021-06-22 13:42:49 -04:00
Matt Arsenault	9ad8a1f6fb	AMDGPU: Fix high 16-bit optimization on gfx9 We can do this optimization in the majority of cases, but we currently don't have a way to do it. We do not track/model which instructions have which behavior, the control bit to change the high bit behavior, or making use of preserved bits at all. This is a bit fuzzy since we don't know precisely how the source instruction will be lowered, but that only really matters in one case (for fma_mixlo). We do need to fixup some of these cases after selection, but the pattern helps eliminate many of these zexts.	2021-06-22 13:16:45 -04:00
zhijian	bd240b3d77	[AIX][XCOFF] generate eh_info when vector registers are saved according to the traceback table. Summary: generate eh_info when vector registers are saved according to the traceback table. ``` struct eh_info_t { unsigned version; /* EH info version 0 */ #if defined(__64BIT__) char _pad[4]; /* padding */ #endif unsigned long lsda; /* Pointer to Language Specific Data Area */ unsigned long personality; /* Pointer to the personality routine */ }; ``` The values of lsda and personality are zero when the number of saved vector registers is greater than zero and the function has no personality routine. Reviewers: Jason Liu Differential Revision: https://reviews.llvm.org/D103651	2021-06-22 13:01:31 -04:00
Stanislav Mekhanoshin	d797a7f8da	[AMDGPU] Use performOptimizedStructLayout for LDS sort This gives better packing. Differential Revision: https://reviews.llvm.org/D104331	2021-06-22 09:58:10 -07:00
Matt Arsenault	a7786badb7	AMDGPU: Move zeroed FP high bits optimization to patterns	2021-06-22 12:47:56 -04:00
Meera Nakrani	ea011ec5ed	[AArch64LoadStoreOptimizer] Recommit: Generate more STPs by renaming registers earlier This is a recommit that fixes unwanted STP generation by checking that the base register has not been modified or used elsewhere. Our initial motivating case was memcpy's with alignments > 16. The loads/stores, to which small memcpy's expand, are kept together in several places so that we get a sequence like this for a 64 bit copy: LD w0 LD w1 ST w0 ST w1 The load/store optimiser can generate a LDP/STP w0, w1 from this because the registers read/written are consecutive. In our case however, the sequence is optimised during ISel, resulting in: LD w0 ST w0 LD w0 ST w0 This instruction reordering allows reuse of registers. Since the registers are no longer consecutive (i.e. they are the same), it inhibits LDP/STP creation. The approach here is to perform renaming: LD w0 ST w0 LD w1 ST w1 to enable the folding of the stores into a STP. We do not yet generate the LDP due to a limitation in the renaming implementation, but plan to look at that in a follow-up so that we fully support this case. While this was initially motivated by certain memcpy's, this is a general approach and thus is beneficial for other cases too, as can be seen in some test changes. Differential Revision: https://reviews.llvm.org/D103597	2021-06-22 15:29:13 +00:00
Thomas Johnson	2ef1fbfe0e	Add norm sub-target feature to table gen for ARC This adds the `norm` sub-target feature (without backing implementation for now) to table gen. Differential Revision: https://reviews.llvm.org/D104558	2021-06-22 14:39:29 +03:00
Eli Friedman	74909e4b6e	Rename MachineMemOperand::getOrdering -> getSuccessOrdering. Since this method can apply to cmpxchg operations, make sure it's clear what value we're actually retrieving. This will help ensure we don't accidentally ignore the failure ordering of cmpxchg in the future. We could potentially introduce a getOrdering() method on AtomicSDNode that asserts the operation isn't cmpxchg, but not sure that's worthwhile. Differential Revision: https://reviews.llvm.org/D103338	2021-06-21 16:49:27 -07:00
Eli Friedman	bf0d0671a1	[ARM] Make sure we don't transform unaligned store to stm on Thumb1. This isn't likely to come up in practice; the combination of compiler flags required to hit this issue should be rare. Found by inspection.	2021-06-21 14:32:42 -07:00
Fangrui Song	c618692218	[AArch64][X86] Allow 64-bit label differences lower to IMAGE_REL_*_REL32 `IMAGE_REL_ARM64_REL64/IMAGE_REL_AMD64_REL64` do not exist and `.quad a - .` is currently not representable. For instrumentation, `.quad a - .` is useful for representing a cross-section reference in a metadata section, to allow ELF medium/large code models. The COFF limitation makes such generic instrumentation inconvenient. I plan to make a PGO/coverage metadata section field relative in D104556. Differential Revision: https://reviews.llvm.org/D104564	2021-06-21 14:32:25 -07:00
Craig Topper	c2e01ee4a5	[RISCV] Remove extra character from a comment. NFC	2021-06-21 12:52:02 -07:00
Jonas Paulsson	b2cd98d5fe	[SystemZ] Fix some typos in comments.	2021-06-21 13:50:54 -05:00
Craig Topper	9080659ac7	[RISCV] Add isel patterns to match vmacc/vmadd/vnmsub/vnmsac from add/sub and mul. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D104163	2021-06-21 11:27:44 -07:00
Sam Tebbs	bbe16b7af2	[ARM] Transform a fixed-point to floating-point conversion into a VCVT_fix Conversion from a fixed-point number to a floating-point number is done by multiplying the fixed-point number by 2^(-n) where n is the number of fractional bits. Currently this is lowered to a vcvt (integer to floating-point) then a vmul, but it can instead be lowered directly to a vcvt (fixed-point to floating-point). This patch enables such transformations as long as the multiplication factor is a power of 2. Differential Revision: https://reviews.llvm.org/D103903	2021-06-21 14:14:09 +01:00
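A Python sketch of the arithmetic behind this fold, assuming a fixed-point integer `raw` with n fractional bits represents raw * 2^-n. The vcvt+vmul pair can become one fixed-point vcvt only when the constant multiplier is exactly 2^-n. The helper names are hypothetical, and the range of n accepted by the actual VCVT fixed-point encodings is not modelled.

```python
import math

def fixed_to_float(raw: int, frac_bits: int) -> float:
    """Value of a fixed-point integer with `frac_bits` fractional bits."""
    return raw * 2.0 ** -frac_bits

def fraction_bits_from_multiplier(c: float):
    """Return n if c == 2**-n for some n >= 1, else None.

    Models the check that an int->float conversion followed by a multiply
    by c can be folded into a single fixed-point -> float conversion.
    """
    mantissa, exponent = math.frexp(c)  # c == mantissa * 2**exponent
    if mantissa != 0.5:
        return None  # not a positive power of two
    n = 1 - exponent  # c == 2**(exponent - 1) == 2**-n
    return n if n >= 1 else None
```

For example, multiplying a converted integer by 0.125 is the same as treating it as a fixed-point value with 3 fractional bits, while a multiplier such as 0.1 cannot be folded.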
Bradley Smith	9e7329e37e	[AArch64][SVE] Wire up vscale_range attribute to SVE min/max vector queries Differential Revision: https://reviews.llvm.org/D103702	2021-06-21 13:00:36 +01:00
Jordan Rupprecht	b650778dc4	[NFC] Wrap entire assert-only block in LLVM_DEBUG	2021-06-21 04:01:27 -07:00
Sebastian Neubauer	bbd7424402	[AMDGPU] Fix linking with shared libraries AMDGPULDSUtils depends on llvm::CallGraph.	2021-06-21 11:11:13 +02:00
Ruiling Song	208332de8a	[AMDGPU] Add Optimize VGPR LiveRange Pass. This pass aims to optimize VGPR live-range in a typical divergent if-else control flow. For example: ``` def(a) if(cond) use(a) ... // A else use(a) ``` As AMDGPU accesses VGPRs with respect to the active mask, we can mark `a` as dead in region A. For details, please refer to the comments in the implementation file. The pass is enabled by default; the frontend can disable it through "-amdgpu-opt-vgpr-liverange=false". Differential Revision: https://reviews.llvm.org/D102212	2021-06-21 15:25:55 +08:00
hsmahesha	80fd5fa526	[AMDGPU] Replace non-kernel function uses of LDS globals by pointers. The main motivation behind pointer replacement of LDS use within non-kernel functions is to prevent the subsequent LDS lowering pass from directly packing LDS (assume large LDS) into a struct type, which would otherwise cause a huge memory allocation for the struct instance within every kernel. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D103225	2021-06-21 11:51:49 +05:30
Fangrui Song	521d373274	Fix -Wunused-variable and -Wunused-but-set-variable in -DLLVM_ENABLE_ASSERTIONS=off build. NFC	2021-06-20 11:09:07 -07:00
Craig Topper	b663f30fa4	[RISCV] Prevent formation of shXadd(.uw) and add.uw if it prevents the use of addi. If the outer add has an simm12 immediate operand we should prefer it instead of materializing it in a register. This would guarantee an extra instruction and temporary register. Since we don't check one use on the shl or zext we might generate more instructions if there is an additional user.	2021-06-19 12:10:42 -07:00
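A small Python sketch of the immediate check behind this heuristic. addi takes a signed 12-bit immediate, i.e. the range [-2048, 2047]. The sketch assumes the outer add's other operand is either a known constant or absent (`None`); `should_form_shxadd` is a hypothetical name, not a function in the RISC-V backend.

```python
SIMM12_MIN, SIMM12_MAX = -2048, 2047

def is_simm12(value: int) -> bool:
    """True if value fits addi's signed 12-bit immediate field."""
    return SIMM12_MIN <= value <= SIMM12_MAX

def should_form_shxadd(outer_add_imm) -> bool:
    """Hypothetical helper: skip shXadd/add.uw formation when the outer
    add's constant could instead be folded into an addi."""
    return outer_add_imm is None or not is_simm12(outer_add_imm)
```

A constant such as 100 stays in the addi, while 4096 would have to be materialized in a register anyway, so forming shXadd costs nothing extra there.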
Roman Lebedev	834aafa55b	[NFC] AMD Zen 3: fix typo in a comment	2021-06-19 22:05:17 +03:00
Fangrui Song	59d90fe817	Simplify some typedef struct	2021-06-19 11:36:44 -07:00
Michael Liao	940efa4f69	[amdgpu] Improve the conversion from f32 to i64. - Take the same principle as the conversion from f64 to i64 with extra necessary pre- and post-processing. It helps to reduce that conversion sequence by half compared to the legacy one. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D104427	2021-06-19 12:46:48 -04:00
Tomas Matheson	18dbe68978	[ARM][NFC] Tidy up subtarget frame pointer routines getFramePointerReg only depends on information in ARMSubtarget, so move it in there so it can be accessed from more places. Make use of ARMSubtarget::getFramePointerReg to remove duplicated code. The main use of useR7AsFramePointer is getFramePointerReg, so inline it. Differential Revision: https://reviews.llvm.org/D104476	2021-06-19 17:00:45 +01:00
Ben Shi	d934b72809	[RISCV] Optimize add-mul in the zba extension with SHADD This patch does the following optimization. Rx + Ry * 18 => (SH1ADD (SH3ADD Ry, Ry), Rx) Rx + Ry * 20 => (SH2ADD (SH2ADD Ry, Ry), Rx) Rx + Ry * 24 => (SH3ADD (SH1ADD Ry, Ry), Rx) Rx + Ry * 36 => (SH2ADD (SH3ADD Ry, Ry), Rx) Rx + Ry * 40 => (SH3ADD (SH2ADD Ry, Ry), Rx) Rx + Ry * 72 => (SH3ADD (SH3ADD Ry, Ry), Rx) Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D104588	2021-06-19 14:33:27 +08:00
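A Python sketch checking the arithmetic behind these patterns, assuming Zba semantics for shNadd (rd = rs2 + (rs1 << N)) on RV64 with wrap-around at 64 bits. `shadd`, `DECOMPOSITIONS` and `mul_add` are hypothetical names. In each chain the inner shNadd takes the multiplied operand as both inputs, producing 3x, 5x or 9x of it. The outer shNadd then shifts that result by 1, 2 or 3 and adds the other operand.

```python
MASK64 = (1 << 64) - 1

def shadd(n: int, rs1: int, rs2: int) -> int:
    """Zba shNadd: rd = rs2 + (rs1 << n), wrapping to 64 bits."""
    return ((rs1 << n) + rs2) & MASK64

# multiplier -> (inner shift, outer shift); (1 + 2**inner) * 2**outer == multiplier
DECOMPOSITIONS = {18: (3, 1), 20: (2, 2), 24: (1, 3),
                  36: (3, 2), 40: (2, 3), 72: (3, 3)}

def mul_add(rx: int, ry: int, k: int) -> int:
    """Compute rx + ry * k using two chained shNadd operations."""
    inner, outer = DECOMPOSITIONS[k]
    scaled = shadd(inner, ry, ry)        # ry * (1 + 2**inner)
    return shadd(outer, scaled, rx)      # rx + scaled * 2**outer
```

Each multiplier factors as (1 + 2^a) * 2^b with a, b in {1, 2, 3}, which is why exactly these constants are handled.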
Matt Arsenault	d6467e00df	AMDGPU: Fix infinite loop in DAG combine with fneg + fma We were not reporting isFNegFree for v2f32, although it is effectively free after legalization. The generic combine was pulling fneg out of the fma source operands, and the AMDGPU combine was doing the opposite.	2021-06-18 19:09:03 -04:00
Matt Arsenault	ad4a18251a	AMDGPU: Fix assert on m0_lo16/m0_hi16 These get added (redundantly) to the bundle expanded for indirect register accesses. We hit this path only when there is a call in the function.	2021-06-18 18:48:53 -04:00
Craig Topper	ac87133f1d	[RISCV] Teach vsetvli insertion to remember when predecessors have same AVL and SEW/LMUL ratio if their VTYPEs otherwise mismatch. Previously we went directly to unknown state on VTYPE mismatch. If we instead remember the partial match, we can use this to still use X0, X0 vsetvli in successors if AVL and needed SEW/LMUL ratio match. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D104069	2021-06-18 12:16:07 -07:00
Anshil Gandhi	2e5dc4a1ef	[AMDGPU] [CodeGen] Fold negate llvm.amdgcn.class into test mask Implemented the transformation of xor (llvm.amdgcn.class x, mask), -1 into llvm.amdgcn.class(x, ~mask). Added LIT tests as well. Differential Revision: https://reviews.llvm.org/D104049	2021-06-18 13:04:12 -06:00
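A Python model of why this fold is valid, using binary64 bit patterns for simplicity (the intrinsic also works on other float types). It assumes the 10-bit class mask layout used by V_CMP_CLASS: bit 0 signaling NaN, bit 1 quiet NaN, then -inf, -normal, -subnormal, -0, +0, +subnormal, +normal, +inf. Every value falls into exactly one class, so "not in mask" equals "in the complement of mask". The complement is taken within the 10 class bits here, which is an assumption of the sketch; all helper names are hypothetical.

```python
import struct

CLASS_BITS = ["snan", "qnan", "ninf", "nnorm", "nsubnorm", "nzero",
              "pzero", "psubnorm", "pnorm", "pinf"]
ALL_CLASSES = (1 << len(CLASS_BITS)) - 1  # 0x3ff

def f64_bits(x: float) -> int:
    """IEEE-754 binary64 bit pattern of x."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]

def class_bit(bits: int) -> int:
    """One-hot class bit for a binary64 bit pattern."""
    sign = bits >> 63
    exp = (bits >> 52) & 0x7FF
    frac = bits & ((1 << 52) - 1)
    if exp == 0x7FF:
        if frac == 0:
            name = "ninf" if sign else "pinf"
        else:
            name = "qnan" if frac >> 51 else "snan"  # bit 51 is the quiet bit
    elif exp == 0:
        if frac == 0:
            name = "nzero" if sign else "pzero"
        else:
            name = "nsubnorm" if sign else "psubnorm"
    else:
        name = "nnorm" if sign else "pnorm"
    return 1 << CLASS_BITS.index(name)

def amdgcn_class(bits: int, mask: int) -> bool:
    """True if the value's class bit is set in mask."""
    return (class_bit(bits) & mask) != 0

def fold_negated_class(mask: int) -> int:
    """Mask for class(x, mask') == not class(x, mask)."""
    return ~mask & ALL_CLASSES
```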
Jingu Kang	78b75b452b	[AArch64] Add TableGen patterns to generate uaddlv uaddv(uaddlp(x)) ==> uaddlv(x) addp(uaddlp(x)) ==> uaddlv(x) Differential Revision: https://reviews.llvm.org/D104236	2021-06-18 17:23:26 +01:00
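A Python model of why these folds hold, assuming unsigned lanes and ignoring register-level details. Summing adjacent pairs into double-width lanes and then summing across cannot overflow the double width for AArch64's lane counts (16 * 255 = 4080 fits in 16 bits), so the result equals a single widening add across the vector. The function names mirror the instructions but are only illustrative.

```python
def uaddlp(x):
    """UADDLP: add adjacent unsigned lanes pairwise into double-width lanes.
    The sum of two b-bit values always fits in 2*b bits, so no wrap occurs."""
    return [x[i] + x[i + 1] for i in range(0, len(x), 2)]

def uaddv(x, bits):
    """UADDV: sum all lanes; the result wraps to the lane width."""
    return sum(x) % (1 << bits)

def addp_scalar(x, bits):
    """ADDP (scalar): add the two lanes of a 2-lane vector, wrapping to the lane width."""
    assert len(x) == 2
    return (x[0] + x[1]) % (1 << bits)

def uaddlv(x, bits):
    """UADDLV: sum all unsigned `bits`-wide lanes into a 2*bits-wide scalar."""
    return sum(x) % (1 << (2 * bits))
```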
Luke	c2e97ba85e	[RISCV] Don't enable Interleaved Access Vectorization The patch https://reviews.llvm.org/D101469 is intended to enable loop unrolling, not interleaved access vectorization. The method bool enableInterleavedAccessVectorization() should not be implemented.	2021-06-18 12:32:30 +08:00
Igor Kudrin	85ec210751	[objdump][ARM] Fix evaluating the target address of a Thumb BLX(i) The instruction can be 16-bit aligned while targeting 32-bit aligned code. To calculate the target address correctly, the address of the instruction has to be adjusted. Differential Revision: https://reviews.llvm.org/D104446	2021-06-18 10:40:55 +07:00
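A Python sketch of the address calculation, following the ARM Architecture Reference Manual definition of BLX (immediate) in Thumb state: target = Align(PC, 4) + imm32, where PC reads as the instruction address + 4. The function name is hypothetical and this is not the llvm-objdump code. For an instruction at a 2-byte-aligned (but not 4-byte-aligned) address, adding the offset to the unaligned PC would give a target that is not 4-byte aligned.

```python
def thumb_blx_imm_target(insn_addr: int, imm32: int) -> int:
    """Target of a Thumb BLX (immediate), which switches to ARM state.

    In Thumb state the PC reads as the instruction address + 4; BLX(i)
    rounds that down to a multiple of 4 before adding the offset.
    """
    pc = insn_addr + 4
    return (pc & ~3) + imm32
```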
Carl Ritson	a10aeb3b32	[AMDGPU] Remove duplicate setOperationAction for v4i16/v4f16 (NFC)	2021-06-18 12:38:54 +09:00
Heejin Ahn	1d891d44f3	[WebAssembly] Rename event to tag We recently decided to change 'event' to 'tag', and 'event section' to 'tag section', out of the rationale that the section contains a generalized tag that references a type, which may be used for something other than exceptions, and the name 'event' can be confusing in the web context. See - https://github.com/WebAssembly/exception-handling/issues/159#issuecomment-857910130 - https://github.com/WebAssembly/exception-handling/pull/161 Reviewed By: tlively Differential Revision: https://reviews.llvm.org/D104423	2021-06-17 20:34:19 -07:00
Jim Lin	e7bf451056	[M68k][NFC] Fix indentation in M68kInstrArithmetic.td Merely fix indentation Reviewed By: myhsu Differential Revision: https://reviews.llvm.org/D104434	2021-06-18 09:49:04 +08:00
Saleem Abdulrasool	116841c623	RISCV: clean up target expression handling The target specific expression handling was slightly regressed by `bbea64250f`. This restores the proper sub-expression evaluation to allow for constant folding within the expression. We explicitly discard the layout and assembler when evaluating the expression to avoid any symbolic computation and instead use `evaluateAsRelocatable` to canonicalise and constant fold only. We can also simplify the expression handling - none of the target variants support symbolic difference. This simplifies the logic for that and adds additional tests to ensure that we do not accidentally regress here in the future. Reviewed By: maskray Differential Revision: https://reviews.llvm.org/D104473	2021-06-17 13:35:32 -07:00
Jon Roelofs	7b06120882	[AArch64][GISel] and+or+shl => bfi This fixes a GISEL vs SDAG regression that showed up at -Os in 256.bzip2 In `_getAndMoveToFrontDecode`: gisel: ``` and w9, w0, #0xff orr w9, w9, w8, lsl #8 ``` sdag: ``` bfi w0, w8, #8, #24 ``` Differential revision: https://reviews.llvm.org/D103291	2021-06-17 12:52:59 -07:00
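A Python check that the two sequences in the commit message compute the same 32-bit value, assuming the usual AArch64 BFI semantics (insert the low `width` bits of the source at bit `lsb`, keep the rest of the destination). The helper names are hypothetical. Because `orr ... lsl #8` pushes the top 8 bits of `w8` past bit 31, only its low 24 bits survive, which is exactly the 24-bit field that BFI inserts.

```python
MASK32 = 0xFFFFFFFF

def bfi(dst: int, src: int, lsb: int, width: int) -> int:
    """AArch64 BFI on W registers: copy the low `width` bits of src into
    dst at bit `lsb`, leaving the other bits of dst unchanged."""
    field = ((1 << width) - 1) << lsb
    return ((dst & ~field) | ((src << lsb) & field)) & MASK32

def and_orr_lsl(w0: int, w8: int) -> int:
    """The GlobalISel sequence: and w9, w0, #0xff; orr w9, w9, w8, lsl #8."""
    return ((w0 & 0xFF) | (w8 << 8)) & MASK32
```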
Roman Lebedev	69caacc626	[X86] AMD Zen 3: don't confuse shift and shuffle, NFC These proc res groups occupy the exact same pipes, so this doesn't affect the modelling, but it's confusing nonetheless.	2021-06-17 21:07:35 +03:00
Haojian Wu	53f5f14136	fix an -Wunused-variable warning in release build, NFC	2021-06-17 18:48:47 +02:00
Saleem Abdulrasool	bbea64250f	RISCV: adjust handling of relocation emission for RISCV This re-architects the RISCV relocation handling to bring the implementation closer in line with the implementation in binutils. We would previously aggressively resolve the relocation. With this restructuring, we will always emit a paired relocation for any symbolic difference of the type of S±T[±C] where S and T are labels and C is a constant. GAS has a special target hook controlled by `RELOC_EXPANSION_POSSIBLE` which indicates that a fixup may be expanded into multiple relocations. This is used by the RISCV backend to always emit a paired relocation - either ADD[WIDTH] + SUB[WIDTH] for text relocations or SET[WIDTH] + SUB[WIDTH] for a debug info relocation. Irrespective of whether linker relaxation support is enabled, symbolic difference is always emitted as a paired relocation. This change also sinks the target specific behaviour down into the target specific area rather than exposing it to the shared relocation handling. In the process, we also sink the "special" handling for debug information down into the RISCV target. Although this improves the path for the other targets, this is not necessarily entirely ideal either. The changes in the debug info emission could be done through another type of hook as this functionality would be required by any other target which wishes to do linker relaxation. However, as there are no other targets in LLVM which currently do this, this is a reasonable thing to do until such time as the code needs to be shared. Improve the handling of the relocation (and add a reduced test case from the Linux kernel) to ensure that we handle complex expressions for symbolic difference. This ensures that we correctly relocate symbols with the addends normalized and associated with the addition portion of the paired relocation. 
This change also addresses some review comments from Alex Bradbury about the relocations meant for use in the DWARF CFA being named incorrectly (using ADD6 instead of SET6) in the original change which introduced the relocation type. This resolves the issues with the symbolic difference emission sufficiently to enable building the Linux kernel with clang+IAS+lld (without linker relaxation). Resolves PR50153, PR50156! Fixes: ClangBuiltLinux/linux#1023, ClangBuiltLinux/linux#1143 Reviewed By: nickdesaulniers, maskray Differential Revision: https://reviews.llvm.org/D103539	2021-06-17 08:20:02 -07:00
Simon Pilgrim	cdb4fcf9a1	[X86] combineSelect - refactor MIN/MAX detection code to make it easier to add additional select(setcc,x,y) folds. NFCI. I need to add some additional handling to address some of the regressions from D101074	2021-06-17 13:50:59 +01:00
Fraser Cormack	fed1503e85	[RISCV][VP] Lower FP VP ISD nodes to RVV instructions With the exception of `frem`, this patch supports the current set of VP floating-point binary intrinsics by lowering them to RVV instructions. It does so by using the existing `RISCVISD *_VL` custom nodes as an intermediate layer. Both scalable and fixed-length vectors are supported by using this method. The `frem` node is unsupported due to a lack of available instructions. For fixed-length vectors we could scalarize but that option is not (currently) available for scalable-vector types. The support is intentionally left out so that it is equivalent for both vector types. The matching of vector/scalar forms is currently lacking, as scalable vector types do not lower to the custom `VFMV_V_F_VL` node. We could either make floating-point scalable vector splats lower to this node, or support the matching of multiple kinds of splat via a `ComplexPattern`, much like we do for integer types. Reviewed By: rogfer01 Differential Revision: https://reviews.llvm.org/D104237	2021-06-17 10:04:00 +01:00
Bjorn Pettersson	4c7f820b2b	Update @llvm.powi to handle different int sizes for the exponent This can be seen as a follow up to commit `0ee439b705`, that changed the second argument of __powidf2, __powisf2 and __powitf2 in compiler-rt from si_int to int. That was to align with how those runtimes are defined in libgcc. One thing that seems to have been missing in that patch was to make sure that the rest of LLVM also handles that the argument now depends on the size of int (not using the si_int machine mode for 32-bit). When using __builtin_powi for a target with 16-bit int, clang crashed. And when emitting libcalls to those rtlib functions (typically when lowering @llvm.powi), the backend would always prepare the exponent argument as an i32 which caused miscompiles when the rtlib was compiled with 16-bit int. The solution used here is to use an overloaded type for the second argument in @llvm.powi. This way clang can use the "correct" type when lowering __builtin_powi, and then later when emitting the libcall it is assumed that the type used in @llvm.powi matches the rtlib function. One thing that needed some extra attention was that when vectorizing calls, several passes did not support that several arguments could be overloaded in the intrinsics. This patch allows overload of a scalar operand by adding hasVectorInstrinsicOverloadedScalarOpd, with an entry for powi. Differential Revision: https://reviews.llvm.org/D99439	2021-06-17 09:38:28 +02:00
Fangrui Song	1a76bff626	RISCVFixupKinds.h: Don’t duplicate function or class name at the beginning of the comment && fix some comments	2021-06-16 10:42:43 -07:00
Sushma Unnibhavi	2193347e72	[M68k][GlobalISel] Adding initial GlobalISel infrastructure Wiring up GlobalISel for the M68k backend Differential Revision: https://reviews.llvm.org/D101819	2021-06-16 10:48:38 -06:00
Dylan Fleming	2a936be388	[SVE] Selection failure with scalable insertelements Reviewed By: efriedma, CarolineConcatto Differential Revision: https://reviews.llvm.org/D104244	2021-06-16 15:38:31 +01:00
Jay Foad	66234ce49f	[AMDGPU] Set VOP3P flag on Real instructions This does not affect codegen but might benefit llvm-mca.	2021-06-16 15:00:45 +01:00
David Spickett	e4ecd83fe9	[llvm][AArch64] Handle arrays of struct properly (from IR) This only applies to FastIsel. GlobalIsel seems to sidestep the issue. This fixes https://bugs.llvm.org/show_bug.cgi?id=46996 One of the things we do in llvm is decide if a type needs consecutive registers. Previously, we just checked if it was an array or not (plus an SVE specific check that is not changing here). This causes some confusion when you have arbitrary IR like: ``` %T1 = type { double, i1 }; define [ 1 x %T1 ] @foo() { entry: ret [ 1 x %T1 ] zeroinitializer } ``` We see it is an array so we call CC_AArch64_Custom_Block which bails out when it sees the i1, a type we don't want to put into a block. This leaves the location of the double in some kind of intermediate state and leads to odd codegen, which then crashes the backend because it doesn't know how to implement what it's been asked for. You get this: ``` renamable $d0 = FMOVD0 $w0 = COPY killed renamable $d0 ``` Rather than this: ``` $d0 = FMOVD0 $w0 = COPY $wzr ``` The backend knows how to copy 64 bit to 64 bit registers, but not 64 to 32. It can certainly be taught how but the real issue seems to be us even trying to assign a register block in the first place. This change makes the logic of AArch64TargetLowering::functionArgumentNeedsConsecutiveRegisters a bit more in depth. If we find an array, also check that all the nested aggregates in that array have a single member type. Then CC_AArch64_Custom_Block's assumption of a type that looks like [ N x type ] will be valid and we get the expected codegen. New tests have been added to exercise these situations. Note that some of the output is not ABI compliant. The aim of this change is to simply handle these situations and not to make our processing of arbitrary IR ABI compliant. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D104123	2021-06-16 13:56:01 +00:00
Jay Foad	7f3ac6714a	[AMDGPU] Set SALU, VALU and other instruction type flags on Real instructions This does not affect codegen but might benefit llvm-mca.	2021-06-16 13:36:02 +01:00
Jay Foad	24ffc343f9	[AMDGPU] Set IsAtomicRet and IsAtomicNoRet on Real instructions This does not affect codegen but might benefit llvm-mca.	2021-06-16 12:23:29 +01:00
Jay Foad	323b3e645d	[AMDGPU] Set mayLoad and mayStore on Real instructions This does not affect codegen but might benefit llvm-mca.	2021-06-16 12:10:23 +01:00
David Green	0a714eaa51	[ARM] Correct type of setcc results for FP vectors Under MVE v4f32 and v8f16 vectors should be using v4i1/v8i1 predicates for the setcc result type, as they have predicated registers for those types. Setting this correctly prevents some inefficient optimizations from happening.	2021-06-16 11:11:03 +01:00
Jay Foad	6f778fed8e	[AMDGPU] Set more flags on Real instructions This does not affect codegen, which only tests these flags on Pseudo instructions, but might help llvm-mca which has to work with Real instructions. In particular setting LGKM_CNT on DS instructions helps with the problem identified in D104149. Differential Revision: https://reviews.llvm.org/D104293	2021-06-16 09:58:50 +01:00
Jay Foad	37109974af	[AMDGPU] Use defvar in SOPInstructions.td. NFC. Factor out repeated !cast<SOP*_Pseudo>(NAME) into a new "defvar ps", just to improve readability and maintainability. Differential Revision: https://reviews.llvm.org/D104306	2021-06-16 09:16:45 +01:00
Roman Lebedev	308f6a5245	[NFC][X86] lowerVECTOR_SHUFFLE(): drop FIXME about widening to i128 (YMM half) element type As per the discussion in D103818, so far, this does not appear to be worthwhile. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D103818	2021-06-16 10:24:33 +03:00
Saleem Abdulrasool	17bdc0ff6f	X86: balance the frame prologue and epilogue on Win64 This was broken in `ba1509da7b`. The Win64 frame would not perform the setup of the Swift async context parameter but would tear down the setup in the epilogue resulting in crashes. This ensures that we do the full setup when we do the tear down. Although this is non-conforming to the Win64 calling convention, it corrects the setup and exposes the actual issue that the change introduced: incorrect frame setup. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D104246	2021-06-15 20:13:52 -07:00
Wenlei He	76de2f4a9c	CMake: allow overriding CMAKE_CXX_VISIBILITY_PRESET This allows overriding the `CMAKE_CXX_VISIBILITY_PRESET` on the command line. For example, setting the value to `default` lets PIC LLVM static libraries be converted to DSOs, without the need to rebuild LLVM with BUILD_SHARED_LIBS=ON. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D104168	2021-06-15 15:51:18 -07:00
Nemanja Ivanovic	821a8f680e	[PowerPC] Fix spilling of paired VSX registers We have added STXVP/LXVP for spilling and restoring the registers but we neglected to add FI elimination code for these. The result is that we end up producing impossible MachineInstr's that have register operands in place of immediates.	2021-06-15 14:13:17 -05:00
Bob Haarman	3bc899b4de	[X86] avoid assert with varargs, soft float, and no-implicit-float Fixes: - PR36507 Floating point varargs are not handled correctly with -mno-implicit-float - PR48528 __builtin_va_start assumes it can pass SSE registers when using -Xclang -msoft-float -Xclang -no-implicit-float On x86_64, floating-point parameters are normally passed in XMM registers. For va_start, we spill those to memory so va_arg can find them. There is an interaction here with -msoft-float and -no-implicit-float: When -msoft-float is in effect, instead of passing floating-point parameters in XMM registers, they are passed in general-purpose registers. When -no-implicit-float is in effect, it "disables implicit floating-point instructions" (per the LangRef). The intended effect is to not have the compiler generate floating-point code unless explicit floating-point operations are present in the source code, but what exactly counts as an explicit floating-point operation is not specified. The existing behavior of LLVM here has led to some surprises and PRs. This change modifies the behavior as follows:
| soft | no-implicit | old behavior | new behavior |
|------|-------------|--------------|--------------|
| no | no | spill XMM regs | spill XMM regs |
| yes | no | don't spill XMM | don't spill XMM |
| no | yes | don't spill XMM | spill XMM regs |
| yes | yes | assert | don't spill XMM |
In particular, this avoids the assert that happens when -msoft-float and -no-implicit-float are both in effect. This seems like a perfectly reasonable combination: If we don't want to rely on hardware floating-point support, we want to both avoid using float registers to pass parameters and avoid having the compiler generate floating-point code that wasn't in the original program. Instead of crashing the compiler, the new behavior is to not synthesize floating-point code in this case. This fixes PR48528. The other interesting case is when -no-implicit-float is in effect, but -msoft-float is not. 
In that case, any floating-point parameters that are present will be in XMM registers, and so we have to spill them to correctly handle those. This fixes PR36507. The spill is conditional on %al indicating that parameters are present in XMM registers, so no floating-point code will be executed unless the function is called with floating-point parameters. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D104001	2021-06-15 11:27:35 -07:00
David Green	93aa445e16	Revert "[ARM] Extend narrow values to allow using truncating scatters" This commit adds nodes that might not always be used, which the expensive checks builder does not like. Reverting for now to think up a better way of handling it.	2021-06-15 18:19:25 +01:00
Arthur Eubanks	be5d454f3f	[NFC][OpaquePtr] Avoid calling getPointerElementType() Pointee types are going away soon. For this, we mostly just care about store/load types, which are already available without the pointee types. The other intrinsics always use i8*. Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D103719	2021-06-15 09:53:12 -07:00
Arthur Eubanks	25b2126b9e	[NFC] Remove redundant variable Differential Revision: https://reviews.llvm.org/D103706	2021-06-15 09:53:11 -07:00
David Green	b9bd2936f9	[ARM] Extend narrow values to allow using truncating scatters As a minor adjustment to the existing lowering of offset scatters, this extends any smaller-than-legal vectors into full vectors using a zext, so that the truncating scatters can be used. Due to the way MVE legalizes the vectors this should be cheap in most situations, and will prevent the vector from being scalarized. Differential Revision: https://reviews.llvm.org/D103704	2021-06-15 17:45:14 +01:00
David Green	680d3f8f17	[ARM] Use rq gather/scatters for smaller v4 vectors A pointer will always fit into an i32, so a rq offset gather/scatter can be used with v4i8 and v4i16 gathers, using a base of 0 and the Ptr as the offsets. The rq gather can then correctly extend the type, allowing us to use the gathers without falling back to scalarizing. This patch rejigs tryCreateMaskedGatherOffset in the MVEGatherScatterLowering pass to decompose the Ptr into Base:0 + Offset:Ptr (with a scale of 1), if the Ptr could not be decomposed from a GEP. v4i32 gathers will already use qi gathers, this extends that to v4i8 and v4i16 gathers using the extending rq variants. Differential Revision: https://reviews.llvm.org/D103674	2021-06-15 17:06:15 +01:00
David Green	09924cbab7	[ARM] Rejig some of the MVE gather/scatter lowering pass. NFC This adjusts some of how the gather/scatter lowering pass passes around data and where certain gathers/scatters are created from. It should not affect code generation on its own, but allows other patches to more clearly reason about the code. A number of extra test cases were also added for smaller gathers/scatters that can be extended, and some of the test comments were updated.	2021-06-15 15:38:39 +01:00
Roman Lebedev	88da6c1ead	[X86] Schedule-model second (mask) output of GATHER instruction Much like `mulx`'s `WriteIMulH`, there are two outputs of AVX2 GATHER instructions. This was changed back in rL160110, but the sched model change wasn't present. So right now, for sched models that are marked as complete (`znver3` only now), codegen'ning `GATHER` results in a crash: ``` DefIdx 1 exceeds machine model writes for early-clobber renamable $ymm3, dead early-clobber renamable $ymm2 = VPGATHERDDYrm killed renamable $ymm3(tied-def 0), undef renamable $rax, 4, renamable $ymm0, 0, $noreg, killed renamable $ymm2(tied-def 1) :: (load 32, align 1) ``` https://godbolt.org/z/Ks7zW7WGh I'm guessing we need to deal with this like we deal with `WriteIMulH`. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D104205	2021-06-15 12:04:33 +03:00
Craig Topper	4017d0335a	[X86] Use EVT::getVectorVT instead of changeVectorElementType in reduceVMULWidth. Changing vector element type doesn't work for v6i32->v6i16 now that v6i32 is an MVT and v6i16 is not. I would like to fix this in changeVectorElementType, but you need an LLVMContext to call getVectorVT which we can't get from an MVT. Fixes PR50709.	2021-06-14 22:07:04 -07:00
Kai Luo	1c450c3d7e	[PowerPC] Export 16 byte load-store instructions Export `lq`, `stq`, `lqarx` and `stqcx.` in preparation for implementing 16-byte lock free atomic operations on AIX. Add a new register class `g8prc` for these instructions, since these instructions require an even-odd register pair. Reviewed By: nemanjai, jsji, #powerpc Differential Revision: https://reviews.llvm.org/D103010	2021-06-15 01:56:10 +00:00
Huihui Zhang	1c096bf09f	[SVE][LSR] Teach LSR to enable simple scaled-index addressing mode generation for SVE. Currently, Loop Strength Reduction is not handling loops with scalable stride very well. Take a loop vectorized with scalable vector type <vscale x 8 x i16> for instance (refer to the added test/CodeGen/AArch64/sve-lsr-scaled-index-addressing-mode.ll). Memory accesses are incremented by "16*vscale", while the induction variable is incremented by "8*vscale". The scaling factor "2" needs to be extracted to build the candidate formula, i.e., "reg(%in) + 2*reg({0,+,(8 * %vscale)})", so that the addrec register reg({0,+,(8*vscale)}) can be reused among Address and ICmpZero LSRUses to enable optimal solution selection. This patch allows LSR getExactSDiv to recognize special cases like "C1*X*Y /s C2*X*Y", and pull out "C1 /s C2" as the scaling factor whenever possible. Without this change, LSR is missing candidate formulas with a proper scale factor to leverage the target scaled-index addressing mode. Note: This patch doesn't fully fix AArch64 isLegalAddressingMode for scalable vectors, but allows simple valid scales to pass through. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D103939	2021-06-14 16:42:34 -07:00
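A toy Python model of the special case this commit teaches `getExactSDiv` to handle: dividing C1*X*Y by C2*X*Y pulls out C1 /s C2 as the scale when the symbolic factors match and the constants divide exactly. This is an illustration over a hypothetical (constant, factor list) representation, not LLVM's SCEV API; `exact_sdiv` is a made-up name.

```python
from collections import Counter

def exact_sdiv(lhs, rhs):
    """Divide two products of the form (constant, symbolic factors).

    Returns the integer q with lhs == q * rhs when both sides have exactly
    the same symbolic factors and the constants divide evenly; otherwise
    returns None. For example 16*vscale /s 8*vscale gives 2.
    """
    c1, factors1 = lhs
    c2, factors2 = rhs
    if c2 == 0 or Counter(factors1) != Counter(factors2):
        return None
    if c1 % c2 != 0:
        return None  # not an exact division
    return c1 // c2  # exact, so floor division equals signed division
```

In the loop from the message, the memory stride 16*vscale and the induction-variable stride 8*vscale share the factor vscale, so the scale 2 can be extracted.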
Piotr Sobczak	e0c382a9d5	[AMDGPU] Limit runs of fixLdsBranchVmemWARHazard The code in fixLdsBranchVmemWARHazard looks for patterns of a vmem/lds access followed by a branch, followed by an lds/vmem access. The handling of the hazard requires an arbitrary number of instructions to process. In the worst case where a function has a vmem access, but no lds accesses, all instructions are examined only to conclude that the hazard cannot occur. Add a pre-processing stage that detects whether both lds and vmem accesses are present in the function and only then does the more costly search. This patch significantly improves compilation time in the cases where the hazard cannot happen. In one pathological case I looked at, IsHazardInst is needlessly called 88.6 million times. The numbers could also be improved by introducing a map around the inner calls to ::getWaitStatesSince in fixLdsBranchVmemWARHazard, but nothing will beat not running fixLdsBranchVmemWARHazard at all in the cases detected by shouldRunLdsBranchVmemWARHazardFixup(). Differential Revision: https://reviews.llvm.org/D104219	2021-06-14 22:30:23 +02:00
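The pre-processing stage amounts to one cheap linear scan that rules the hazard out before the expensive search runs. A minimal sketch, assuming a simplified per-instruction classification: `InstKind` and `shouldRunHazardSearch` are hypothetical stand-ins, whereas the actual pass uses shouldRunLdsBranchVmemWARHazardFixup() over MachineInstrs.

```cpp
#include <cassert>
#include <vector>

// Hypothetical instruction classification; the real pass inspects MachineInstrs.
enum class InstKind { Other, LDS, VMEM };

// The LDS/branch/VMEM WAR hazard needs both kinds of access in the function,
// so if either kind is missing the costly search can be skipped entirely.
bool shouldRunHazardSearch(const std::vector<std::vector<InstKind>> &Blocks) {
  bool HasLDS = false, HasVMEM = false;
  for (const auto &BB : Blocks) {
    for (InstKind K : BB) {
      HasLDS |= (K == InstKind::LDS);
      HasVMEM |= (K == InstKind::VMEM);
      if (HasLDS && HasVMEM)
        return true; // early exit once both kinds have been seen
    }
  }
  return false;
}
```

The scan is linear in the number of instructions and exits early, so it is cheap compared with repeated backward searches such as the ::getWaitStatesSince calls mentioned above.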
Saleem Abdulrasool	8c8dbc1082	X86: pass swift_async context in R14 on Win64 Pass swift_async context in a callee-saved register rather than as a regular parameter. This is similar to the Swift `self` and `error` parameters.	2021-06-14 11:02:21 -07:00
Fraser Cormack	c75e454cb9	[RISCV] Transform unaligned RVV vector loads/stores to aligned ones This patch adds support for loading and storing unaligned vectors via an equivalently-sized i8 vector type, which has support in the RVV specification for byte-aligned access. This offers a more optimal path for handling of unaligned fixed-length vector accesses, which are currently scalarized. It also prevents crashing when `LegalizeDAG` sees an unaligned scalable-vector load/store operation. Future work could be to investigate loading/storing via the largest vector element type for the given alignment, in case that would be more optimal on hardware. For instance, a 4-byte-aligned nxv2i64 vector load could be loaded as nxv4i32 instead of as nxv16i8. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D104032	2021-06-14 18:12:18 +01:00
zhijian	7ed515d168	[AIX][XCOFF] emit vector info of traceback table. Summary: emit vector info of traceback table. Reviewers: Jason Liu, Hubert Tong Differential Revision: https://reviews.llvm.org/D93659	2021-06-14 11:15:22 -04:00
Jingu Kang	08ce52ef5e	[AArch64] Improve SAD pattern Given a vecreduce_add node, detect the below pattern and convert it to the node sequence with UABDL, [S|U]ABD and UADDLP. i32 vecreduce_add( v16i32 abs( v16i32 sub( v16i32 [sign|zero]_extend(v16i8 a), v16i32 [sign|zero]_extend(v16i8 b)))) =================> i32 vecreduce_add( v4i32 UADDLP( v8i16 add( v8i16 zext( v8i8 [S|U]ABD low8:v16i8 a, low8:v16i8 b v8i16 zext( v8i8 [S|U]ABD high8:v16i8 a, high8:v16i8 b Differential Revision: https://reviews.llvm.org/D104042	2021-06-14 15:48:51 +01:00
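The rewrite is valid because the sum of absolute differences can be computed in narrow lanes without overflow: each 8-bit absolute difference fits in 8 unsigned bits, two of them fit in 16 bits, and a pairwise widening add fits in 32 bits. A scalar C++ model of that arithmetic identity (illustrative only, not LLVM code; `sadReference` and `sadRewritten` are hypothetical names) might look like this:

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstdlib>

using V16 = std::array<uint8_t, 16>;

// Absolute difference of two bytes, read as signed (SABD) or unsigned (UABD).
// The result always fits in 8 unsigned bits.
static uint8_t abd(uint8_t x, uint8_t y, bool isSigned) {
  int a = isSigned ? static_cast<int8_t>(x) : x;
  int b = isSigned ? static_cast<int8_t>(y) : y;
  return static_cast<uint8_t>(std::abs(a - b));
}

// Original pattern: vecreduce_add(abs(sub(ext(a), ext(b)))) in 32-bit lanes.
uint32_t sadReference(const V16 &a, const V16 &b, bool isSigned) {
  uint32_t sum = 0;
  for (int i = 0; i < 16; ++i) {
    int32_t x = isSigned ? static_cast<int8_t>(a[i]) : a[i];
    int32_t y = isSigned ? static_cast<int8_t>(b[i]) : b[i];
    sum += static_cast<uint32_t>(std::abs(x - y));
  }
  return sum;
}

// Rewritten sequence: ABD on the low and high 8-lane halves, zero-extend and
// add in 16-bit lanes (v8i16), pairwise widening add (UADDLP, v4i32), reduce.
uint32_t sadRewritten(const V16 &a, const V16 &b, bool isSigned) {
  std::array<uint16_t, 8> add16{};
  for (int i = 0; i < 8; ++i)
    add16[i] = static_cast<uint16_t>(abd(a[i], b[i], isSigned) +
                                     abd(a[i + 8], b[i + 8], isSigned));
  std::array<uint32_t, 4> pairs{};
  for (int i = 0; i < 4; ++i)
    pairs[i] = static_cast<uint32_t>(add16[2 * i]) + add16[2 * i + 1];
  return pairs[0] + pairs[1] + pairs[2] + pairs[3];
}
```

Both functions return the same sum for any inputs; the worst case per 16-bit lane is 255 + 255 = 510, well within range.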
Eric Astor	f09e200b31	[ms] [llvm-ml] When parsing MASM, "jmp short" instructions are case insensitive Handle "short" in a case-insensitive fashion in MASM. Required to correctly parse z_Windows_NT-586_asm.asm from the OpenMP runtime. Reviewed By: thakis Differential Revision: https://reviews.llvm.org/D104195	2021-06-13 18:36:00 -04:00
Eric Astor	56edcbc2ad	Fix misspelled instruction in X86 assembly parser Did not correctly handle "jecxz short <address>". Discovered while working on LLVM-ML; shows up in z_Windows_NT-586_asm.asm from the OpenMP runtime Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D104194	2021-06-13 18:34:15 -04:00
LemonBoy	5be3a1a064	[SPARC] Legalize truncation and extension between fp128 and half Lower truncations and extensions between fp128 and half values into libcalls. Expand truncating stores into separate truncation and store operations. Reviewed By: jrtc27 Differential Revision: https://reviews.llvm.org/D104185	2021-06-13 20:05:15 +02:00
David Green	bee2f618d5	[ARM] Introduce t2WhileLoopStartTP This adds t2WhileLoopStartTP, similar to the t2DoLoopStartTP added in D90591. It keeps a reference to both the tripcount register and the element count register, so that the ARMLowOverheadLoops pass in the backend can pick the correct one without having to search for it from the operand of a VCTP. Differential Revision: https://reviews.llvm.org/D103236	2021-06-13 13:55:34 +01:00
Kristina Bessonova	f6b9836b09	[ARM][NEON] Combine base address updates for vld1Ndup intrinsics Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D103836	2021-06-13 11:18:32 +02:00
Luo, Yuanke	5be314f79b	[X86] Check immediate before get it. For CMP imm instruction, when operand 1 is a symbol address we should check whether it is an immediate first. Here is the example code. `CMP64mi32 $noreg, 8, killed renamable $rcx, @d, $noreg, @a, implicit-def $eflags` Many thanks to Craig Topper for the test case to reproduce this issue. Differential Revision: https://reviews.llvm.org/D104037	2021-06-13 15:40:52 +08:00
Luo, Yuanke	1e72b9d52f	Revert "[X86] Check immediate before get it." This reverts commit `9eb2f723c2`.	2021-06-13 13:55:38 +08:00
Luo, Yuanke	9eb2f723c2	[X86] Check immediate before get it. For CMP imm instruction, when operand 1 is a symbol address we should check whether it is an immediate first. Here is the example code. `CMP64mi32 $noreg, 8, killed renamable $rcx, @d, $noreg, @a, implicit-def $eflags` Many thanks to Craig Topper for the test case to reproduce this issue. Differential Revision: https://reviews.llvm.org/D104037	2021-06-13 09:08:40 +08:00
Craig Topper	c997867dc0	[X86] Add ISD::FREEZE and ISD::AssertAlign to the list of opcodes that don't guarantee upper 32 bits are zero. The freeze issue was reported here https://llvm.discourse.group/t/bug-or-feature-freeze-instruction/3639 I don't have a test for AssertAlign. I just noticed it was missing and assume it should be similar to the other two Asserts. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D104178	2021-06-12 09:52:29 -07:00
Florian Hahn	5cd66420cc	Revert "[X86FixupLEAs] Transform the sequence LEA/SUB to SUB/SUB" This reverts commit `1b748faf2b` because it breaks building the llvm-test-suite with -verify-machineinstrs on X86: http://green.lab.llvm.org/green/job/test-suite-verify-machineinstrs-x86_64-O3/9585/ Running llc -verify-machineinstr on X86 crashes on the IR below: target datalayout = "e-m:o-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128" %struct.widget = type { i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, [16 x [16 x i16]], [6 x [32 x i32]], [16 x [16 x i32]], [4 x [12 x [4 x [4 x i32]]]], [16 x i32], i8, i32, i32*, i32, i32, i32, i32, i32, %struct.baz, %struct.wobble.1, i32, i32, i32, i32, i32, i32, %struct.quux.2, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, [3 x i32], i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32**, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, [3 x [2 x i32]], [3 x [2 x i32]], i32, i32, i64, i64, %struct.zot.3, %struct.zot.3, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32 } %struct.baz = type { i32, i32, i32, i32, i32, i32, i32, i32, i32, %struct.snork, %struct.wombat.0, %struct.wobble, i32, i32, i32, i32, i32, i32, i32, i32, i32 (%struct.widget, %struct.eggs), i32, i32, i32, i32 } %struct.snork = type { %struct.spam, %struct.zot, i32 (%struct.wombat, %struct.widget, %struct.snork) } %struct.spam = type { i32, i32, i32, i32, i8, i32 } %struct.zot = type { i32, i32, i32, i32, i32, i8, i32* } %struct.wombat = type { i32, i32, i32, i32, i32, i32, i32, i32, void (i32, i32, i32, i32), void (%struct.wombat, %struct.widget, %struct.zot)* } %struct.wombat.0 = type { [4 x [11 x %struct.quux]], [2 x [9 x %struct.quux]], [2 x [10 x 
%struct.quux]], [2 x [6 x %struct.quux]], [4 x %struct.quux], [4 x %struct.quux], [3 x %struct.quux] } %struct.quux = type { i16, i8 } %struct.wobble = type { [2 x %struct.quux], [4 x %struct.quux], [3 x [4 x %struct.quux]], [10 x [4 x %struct.quux]], [10 x [15 x %struct.quux]], [10 x [15 x %struct.quux]], [10 x [5 x %struct.quux]], [10 x [5 x %struct.quux]], [10 x [15 x %struct.quux]], [10 x [15 x %struct.quux]] } %struct.eggs = type { [1000 x i8], [1000 x i8], [1000 x i8], i32, i32, i32, i32, i32, i32, i32, i32 } %struct.wobble.1 = type { i32, [2 x i32], i32, i32, %struct.wobble.1, %struct.wobble.1, i32, [2 x [4 x [4 x [2 x i32]]]], i32, i64, i64, i32, i32, [4 x i8], [4 x i8], i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32 } %struct.quux.2 = type { i32, i32, i32, i32, i32, %struct.quux.2* } %struct.zot.3 = type { i64, i16, i16, i16 } define void @blam(%struct.widget* %arg, i32 %arg1) local_unnamed_addr { bb: %tmp = load i32, i32* undef, align 4 %tmp2 = sdiv i32 %tmp, 6 %tmp3 = sdiv i32 undef, 6 %tmp4 = load i32, i32* undef, align 4 %tmp5 = icmp eq i32 %tmp4, 4 %tmp6 = select i1 %tmp5, i32 %tmp3, i32 %tmp2 %tmp7 = getelementptr inbounds [4 x [4 x i32]], [4 x [4 x i32]]* undef, i64 0, i64 0, i64 0 %tmp8 = zext i16 undef to i32 %tmp9 = zext i16 undef to i32 %tmp10 = load i16, i16* undef, align 2 %tmp11 = zext i16 %tmp10 to i32 %tmp12 = zext i16 undef to i32 %tmp13 = zext i16 undef to i32 %tmp14 = zext i16 undef to i32 %tmp15 = load i16, i16* undef, align 2 %tmp16 = zext i16 %tmp15 to i32 %tmp17 = zext i16 undef to i32 %tmp18 = sub nsw i32 %tmp8, %tmp9 %tmp19 = shl nsw i32 undef, 1 %tmp20 = add nsw i32 %tmp19, %tmp18 %tmp21 = sub nsw i32 %tmp11, %tmp12 %tmp22 = shl nsw i32 undef, 1 %tmp23 = add nsw i32 %tmp22, %tmp21 %tmp24 = sub nsw i32 %tmp13, %tmp14 %tmp25 = shl nsw i32 undef, 1 %tmp26 = add nsw i32 %tmp25, %tmp24 %tmp27 = sub nsw i32 %tmp16, %tmp17 %tmp28 = shl nsw i32 undef, 1 %tmp29 = add nsw i32 %tmp28, %tmp27 %tmp30 = sub 
nsw i32 %tmp20, %tmp29 %tmp31 = sub nsw i32 %tmp23, %tmp26 %tmp32 = shl nsw i32 %tmp30, 1 %tmp33 = add nsw i32 %tmp32, %tmp31 store i32 %tmp33, i32* undef, align 4 %tmp34 = mul nsw i32 %tmp31, -2 %tmp35 = add nsw i32 %tmp34, %tmp30 store i32 %tmp35, i32* undef, align 4 %tmp36 = select i1 %tmp5, i32 undef, i32 undef br label %bb37 bb37: ; preds = %bb %tmp38 = load i32, i32* undef, align 4 %tmp39 = ashr i32 %tmp38, %tmp6 %tmp40 = load i32, i32* undef, align 4 %tmp41 = sdiv i32 %tmp39, %tmp40 store i32 %tmp41, i32* undef, align 4 ret void }	2021-06-12 11:41:38 +01:00
Florian Hahn	e087b4f149	Revert "[X86FixupLEAs] Sub register usage of LEA dest should block LEA/SUB optimization" This reverts commit `f35bcea1d4` because it depends on `1b748faf2b`, which breaks building the llvm-test-suite with -verify-machineinstrs on X86. See 154adc0f135cff3f8a8861c335d2b88c8049d098 for more details.	2021-06-12 11:40:47 +01:00
madhur13490	c27e8141b3	[AMDGPU][IndirectCalls] Fix register usage propagation for indirect/external calls This patch computes the max SGPRs and VGPRs used by the module in the presence of indirect calls and uses that as the register requirement for functions/kernels that make indirect calls. This patch also refactors code in AMDGPUSubTarget.cpp, adding "base" variants of getMaxNumSGPRs that are used by both the MachineFunction version and the new Function version. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D103636	2021-06-12 11:59:34 +05:30
Matt Arsenault	a845dc1e56	AMDGPU/GlobalISel: Remove leftover hack for argument memory sizes Since the call lowering code now tries to respect the tablegen reported argument types, this is no longer necessary.	2021-06-11 13:45:25 -04:00
Matt Arsenault	6dd54dada3	AMDGPU/GlobalISel: Fix indentation	2021-06-11 13:45:25 -04:00
Guozhi Wei	f35bcea1d4	[X86FixupLEAs] Sub register usage of LEA dest should block LEA/SUB optimization In function searchALUInst, sub register usage of LEA dest should also block LEA/SUB optimization, otherwise the sub register usage gets an undefined value. This patch fixes https://bugs.llvm.org/show_bug.cgi?id=50615. Differential Revision: https://reviews.llvm.org/D103922	2021-06-11 09:45:56 -07:00
Simon Pilgrim	61cdaf66fe	[ADT] Remove APInt/APSInt toString() std::string variants <string> is currently the highest impact header in a clang+llvm build: https://commondatastorage.googleapis.com/chromium-browser-clang/llvm-include-analysis.html One of the most common places this is being included is the APInt.h header, which needs it for an old toString() implementation that returns std::string - an inefficient method compared to the SmallString versions that it actually wraps. This patch replaces these APInt/APSInt methods with a pair of llvm::toString() helpers inside StringExtras.h, adjusts users accordingly and removes the <string> from APInt.h - I was hoping that more of these users could be converted to use the SmallString methods, but it appears that most end up creating a std::string anyhow. I avoided trying to use the raw_ostream << operators as well as I didn't want to lose having the integer radix explicit in the code. Differential Revision: https://reviews.llvm.org/D103888	2021-06-11 13:19:15 +01:00
Zarko Todorovski	c1bb75febe	[PowerPC] Allow wa inline asm to also accept floating point arguments GCC documentation for the `wa` constraint states that: ``` wa A VSX register (VSR), vs0…vs63. This is either an FPR (vs0…vs31 are f0…f31) or a VR (vs32…vs63 are v0…v31). ``` This technically means that we could accept floating point parameters. In fact, gcc itself does. The following testcase compiles and runs on all PPC platforms with GCC, whereas clang/llc will assert: ``` #include <stdio.h> double foo ( vector double a ) { double b, c; asm("xvabsdp %x0, %x2 \n" "xxsldwi %x1, %x0, %x0, 2 \n" : "+wa" (b), "=wa" (c) : "wa" (a) ); return b+c; } int main(void) { vector double a = {-3., -4.}; double t = foo( a ); printf("%g\n", t); } ``` This patch allows clang/llc to build and run this testcase. Reviewed By: nemanjai, #powerpc Differential Revision: https://reviews.llvm.org/D103409	2021-06-11 07:19:10 -04:00
Koutheir Attouchi	789708617d	Do not generate calls to the 128-bit function __multi3() on 32-bit ARM Re-applying this patch after bots failures. Should be fine now. The function __multi3() is undefined on 32-bit ARM, so a call to it should never be emitted. Instead, plain instructions need to be generated to perform 128-bit multiplications. Differential Revision: https://reviews.llvm.org/D103906	2021-06-11 11:45:21 +01:00
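Without a __multi3() helper, a 32-bit target has to build the 128-bit product from 32x32->64-bit multiplies. The following hypothetical C++ model shows the schoolbook limb multiplication such an expansion amounts to; it is an illustration of the arithmetic, not the instruction sequence LLVM actually emits, and `U128`/`mul128` are illustrative names.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// A 128-bit unsigned value as four 32-bit limbs, least significant first,
// roughly how a 32-bit target holds it in registers.
using U128 = std::array<uint32_t, 4>;

// Low 128 bits of a * b using only 32x32->64-bit multiplies and 64-bit adds.
U128 mul128(const U128 &a, const U128 &b) {
  U128 r{};
  for (int i = 0; i < 4; ++i) {
    uint64_t carry = 0;
    // Only partial products landing in limbs 0..3 matter modulo 2^128.
    for (int j = 0; i + j < 4; ++j) {
      // (2^32-1)^2 + (2^32-1) + (2^32-1) == 2^64-1, so this never overflows.
      uint64_t t = static_cast<uint64_t>(a[i]) * b[j] + r[i + j] + carry;
      r[i + j] = static_cast<uint32_t>(t);
      carry = t >> 32;
    }
    // A carry out of limb 3 is dropped: the result is taken modulo 2^128.
  }
  return r;
}
```

For example, squaring 2^64 - 1 produces the limbs {1, 0, 0xFFFFFFFE, 0xFFFFFFFF}, i.e. 2^128 - 2^65 + 1.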
Rosie Sumpter	d7c219a506	[CostModel][AArch64] Improve the cost estimate of CTPOP intrinsic Added a case for CTPOP to AArch64TTIImpl::getIntrinsicInstrCost so that the cost estimate matches the codegen in test/CodeGen/AArch64/arm64-vpopcnt.ll Differential Revision: https://reviews.llvm.org/D103952	2021-06-11 11:15:46 +01:00
Bing1 Yu	56d5c46b49	[X86] Support __tile_stream_loadd intrinsic for new AMX interface Adding support for __tile_stream_loadd intrinsic. Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D103784	2021-06-11 17:28:43 +08:00
Simon Pilgrim	5e6bfb661e	[Analysis] Pass RecurrenceDescriptor as const reference. NFCI. We were passing the RecurrenceDescriptor by value to most of the reduction analysis methods, despite it being rather bulky with TrackingVH members (that can be costly to copy). In all these cases we're only using the RecurrenceDescriptor for rather basic purposes (access to types/kinds etc.). Differential Revision: https://reviews.llvm.org/D104029	2021-06-11 10:24:14 +01:00
Qiu Chaofan	bc104fdcec	[PowerPC] Relax register superclasses for paired memops Relaxing the superclass constraint for VSX register classes helps reduce 32-byte spills and copies when register pressure is high. In the affected test cases, some introduce more copies due to the new allocation order. However, this patch should not be the root cause, and we may be able to fix it elsewhere in register allocation. Reviewed By: nemanjai Differential Revision: https://reviews.llvm.org/D104006	2021-06-11 14:54:03 +08:00
Hsiangkai Wang	643b6407fa	[RISCV] Avoid scalar outgoing arguments overwriting vector frame objects. When using FP to access stack objects, the scalable stack objects will be put at the lower end of the frame. It looks like ``` |-------------------| <-- FP | callee-saved regs | |-------------------| | scalar local vars | |-------------------| | RVV local vars | |-------------------| <-- SP ``` If there are scalar arguments that need to be passed through memory and there are vector objects on the stack accessed via FP, the outgoing scalar arguments will overwrite the vector objects. It looks like ``` |-------------------| <-- FP | callee-saved regs | |-------------------| | scalar local vars | |-------------------| |-------------------| | RVV local vars | | outgoing args | <- outgoing arguments |-------------------| <-- SP |-------------------| overwrite from here. ``` In this patch, we reserve the stack for the outgoing arguments before function calls if FP is used for access and there are scalable vector frame objects. It looks like ``` |-------------------| <-- FP | callee-saved regs | |-------------------| | scalar local vars | |-------------------| | RVV local vars | |-------------------| | outgoing args | |-------------------| <-- SP ``` Differential Revision: https://reviews.llvm.org/D103622	2021-06-11 12:26:29 +08:00
Craig Topper	420bd5ee8e	[RISCV] Use ComputeNumSignBits/MaskedValueIsZero in RISCVDAGToDAGISel::selectSExti32/selectZExti32. This helps us select W instructions in more cases. Most of the affected tests have had the sign_extend_inreg or AND folded into sextload/zextload. Differential Revision: https://reviews.llvm.org/D104079	2021-06-10 19:06:45 -07:00
Amara Emerson	670edf3ee0	[AArch64][GlobalISel] Fix incorrectly generating uxtw/sxtw for addressing modes. When the extend is from 8 or 16 bits, the addressing modes don't support those extensions, but we weren't checking that and therefore always generated the 32->64b extension mode. Fun. Differential Revision: https://reviews.llvm.org/D104070	2021-06-10 16:59:39 -07:00
Jessica Paquette	933df6ca79	[AArch64][GlobalISel] Legalize scalar G_CTTZ + G_CTTZ_ZERO_UNDEF This adds legalization for scalar G_CTTZ and G_CTTZ_ZERO_UNDEF. Vector support requires handling vector G_BITREVERSE, which I haven't gotten around to yet. For G_CTTZ_ZERO_UNDEF, we just lower it to G_CTTZ. For G_CTTZ, we match SelectionDAG's lowering to a G_BITREVERSE + G_CTLZ. e.g. https://godbolt.org/z/nPEseYh1s (With this patch, we have slightly worse codegen than SDAG for types smaller than s32; it seems like we're missing a combine.) Also, this adds in a function to build G_BITREVERSE to MachineIRBuilder. Differential Revision: https://reviews.llvm.org/D104065	2021-06-10 15:29:51 -07:00
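The G_CTTZ lowering rests on the identity cttz(x) == ctlz(bitreverse(x)): reversing the bits turns trailing zeros into leading zeros. A portable C++ sketch of that identity follows; the helpers are deliberately simple illustrative loops, not the target instructions the legalized code is selected to.

```cpp
#include <cassert>
#include <cstdint>

// Reverse the bit order of a 32-bit value (what G_BITREVERSE computes).
uint32_t bitreverse32(uint32_t x) {
  uint32_t r = 0;
  for (int i = 0; i < 32; ++i) {
    r = (r << 1) | (x & 1u);
    x >>= 1;
  }
  return r;
}

// Count leading zeros, defined as 32 for a zero input (as for G_CTLZ).
unsigned ctlz32(uint32_t x) {
  unsigned n = 0;
  for (uint32_t mask = 0x80000000u; mask != 0 && (x & mask) == 0; mask >>= 1)
    ++n;
  return n;
}

// cttz(x) == ctlz(bitreverse(x)). For x == 0 both sides give 32, so this also
// covers G_CTTZ's defined zero case; G_CTTZ_ZERO_UNDEF can reuse it unchanged.
unsigned cttz32(uint32_t x) { return ctlz32(bitreverse32(x)); }
```

Because the zero input is well defined on both sides, lowering G_CTTZ_ZERO_UNDEF to G_CTTZ, as the message describes, loses nothing.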
David Green	5d5b686f6b	[ARM] Fix Changed status in MVEGatherScatterLoweringPass. Now that we are calling SimplifyInstructionsInBlock, make sure we update Changed when it reports alterations.	2021-06-10 21:53:04 +01:00
David Green	e0c605f638	[ARM] Ensure instructions are simplified prior to GatherScatter lowering. Surprisingly, not all instructions are always simplified after unrolling and before MVE gather/scatter lowering. Notably dead gather operations can be left around which cause the gather/scatter lowering pass to crash if there are multiple gathers, some of which are dead. This patch ensures they are simplified before we modify anything, which can change some of the existing tests, including making them no-longer test what they originally tested. This uses a combination of disabling the gather/scatter lowering pass and adjusting the test to keep them as before. Differential Revision: https://reviews.llvm.org/D103150	2021-06-10 20:18:12 +01:00
Jessica Paquette	1b894ccdc9	[AArch64][GlobalISel] Mark some G_BITREVERSE types as legal + select them We fall back on G_CTTZ_ZERO_UNDEF a lot when building clang for arm64 with gisel. Handling this will require that we can handle G_BITREVERSE. This patch marks G_BITREVERSE instructions with natively supported types as legal. We get selection on these types for free via the importer. Differential Revision: https://reviews.llvm.org/D103999	2021-06-10 10:33:52 -07:00
Benjamin Kramer	3dceffd0fd	[AArch64] Silence fallthrough warning. NFC. AArch64TargetTransformInfo.cpp:302:3: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough] default: ^	2021-06-10 17:23:37 +02:00
Luo, Yuanke	63233da723	[X86][NFC] Fix typo.	2021-06-10 22:49:11 +08:00
Irina Dobrescu	de79919e9e	[AArch64] Add cost tests for bitreverse This patch includes cost tests for bit reverse as well as some adjustments to the cost model. Differential Revision: https://reviews.llvm.org/D102755	2021-06-10 14:51:33 +01:00
David Green	9872551ca0	[ARM] Skip debug during vpt block creation Debug info is currently preventing VPT block creation, leading to different codegen. This patch attempts to skip any debug instructions during vpt block creation, making sure they do not interfere. Differential Revision: https://reviews.llvm.org/D103610	2021-06-10 14:49:04 +01:00
Timm Bäder	a9e4f91adf	[llvm][PPC] Add missing case for 'I' asm memory operands From https://llvm.org/docs/LangRef.html#asm-template-argument-modifiers: I: Print the letter ‘i’ if the operand is an integer constant, otherwise nothing. Used to print ‘addi’ vs ‘add’ instructions. Differential Revision: https://reviews.llvm.org/D103968	2021-06-10 12:52:50 +02:00
David Spickett	64de8763aa	Revert "Implementation of global.get/set for reftypes in LLVM IR" This reverts commit `31859f896c`. Causing SVE and RISCV-V test failures on bots.	2021-06-10 10:11:17 +00:00
Simon Pilgrim	4eb47e3cd4	[TargetLowering] getABIAlignmentForCallingConv - pass DataLayout by const reference. NFCI. Avoid unnecessary copies and match every other method in TargetLowering that takes DataLayout as an argument.	2021-06-10 10:55:24 +01:00
Paulo Matos	31859f896c	Implementation of global.get/set for reftypes in LLVM IR This change implements new DAG nodes GLOBAL_GET/GLOBAL_SET, and lowering methods for loads and stores of reference types from IR globals. Once the lowering creates the new nodes, tablegen pattern matches those and converts them to Wasm global.get/set. Reviewed By: tlively Differential Revision: https://reviews.llvm.org/D95425	2021-06-10 10:07:45 +02:00
Martin Storsjö	99653702fd	Revert "[AArch64LoadStoreOptimizer] Generate more STPs by renaming registers earlier" This reverts commit `d96ea46629`, as it caused various misoptimizations, see https://reviews.llvm.org/D103597 for discussion on the issues.	2021-06-10 10:30:13 +03:00
hsmahesha	f6632f11ed	[AMDGPU] Fix missing lowering of LDS used in global scope. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D103431	2021-06-10 08:40:01 +05:30
Jinsong Ji	4a89ed373c	[AIX] Add traceback ssp canary bit support We will need to set the ssp canary bit in traceback table to communicate with unwinder about the canary. Reviewed By: #powerpc, shchenz Differential Revision: https://reviews.llvm.org/D103202	2021-06-10 02:40:02 +00:00
Craig Topper	8dfd0810f2	[RISCV] Remove unused method from RISCVInsertVSETVLI. NFC If this becomes needed its trivial to add it back.	2021-06-09 15:35:26 -07:00
Nico Weber	68a1d9a1f5	Revert "Do not generate calls to the 128-bit function __multi3() on 32-bit ARM" This reverts commit `64e9aa3302`. Breaks check-llvm everywhere, see https://reviews.llvm.org/D103906	2021-06-09 13:21:05 -04:00
Koutheir Attouchi	64e9aa3302	Do not generate calls to the 128-bit function __multi3() on 32-bit ARM The function __multi3() is undefined on 32-bit ARM, so a call to it should never be emitted. Instead, plain instructions need to be generated to perform 128-bit multiplications. Differential Revision: https://reviews.llvm.org/D103906	2021-06-09 16:21:16 +01:00
Craig Topper	765ef4bb2a	[X86] Check destination element type before forming VTRUNCS/VTRUNCUS in combineTruncateWithSat. Fixes crash reported here https://reviews.llvm.org/D73607 Using a store to keep the trunc intact. Returning v16i24 would cause the trunc to be optimized away in SelectionDAGBuilder. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D103940	2021-06-09 07:08:17 -07:00
Yvan Roux	6c78dbd4ca	[ARM] Fix Machine Outliner LDRD/STRD handling in Thumb mode. This is a fix for PR50481. Immediate values for AddrModeT2_i8s4 are already scaled in the MCInst operand. This patch changes the number of bits and scale factor to reflect that state when checking stack offset status. AddrModeT2_i7s[2|4] also have this particularity but since MVE instructions are not outlined, just move these cases to the unhandled ones. Differential Revision: https://reviews.llvm.org/D103167	2021-06-09 15:37:21 +02:00
Simon Pilgrim	630820bafc	[X86][SLM] Adjust XMM non-PMULLD throughput costs to half rate. Match what's reported in the costs table, Agner's tables and the Intel AOM	2021-06-09 13:51:40 +01:00
Meera Nakrani	d96ea46629	[AArch64LoadStoreOptimizer] Generate more STPs by renaming registers earlier Our initial motivating case was memcpy's with alignments > 16. The loads/stores, to which small memcpy's expand, are kept together in several places so that we get a sequence like this for a 64 bit copy: LD w0 LD w1 ST w0 ST w1 The load/store optimiser can generate a LDP/STP w0, w1 from this because the registers read/written are consecutive. In our case however, the sequence is optimised during ISel, resulting in: LD w0 ST w0 LD w0 ST w0 This instruction reordering allows reuse of registers. Since the registers are no longer consecutive (i.e. they are the same), it inhibits LDP/STP creation. The approach here is to perform renaming: LD w0 ST w0 LD w1 ST w1 to enable the folding of the stores into a STP. We do not yet generate the LDP due to a limitation in the renaming implementation, but plan to look at that in a follow-up so that we fully support this case. While this was initially motivated by certain memcpy's, this is a general approach and thus is beneficial for other cases too, as can be seen in some test changes. Differential Revision: https://reviews.llvm.org/D103597	2021-06-09 11:25:26 +00:00
Fraser Cormack	502edebd9d	[ValueTypes][RISCV] Cap RVV fixed-length vectors by size This patch changes RVV's policy for its supported list of fixed-length vector types by capping by vector size rather than element count. Now all 1024-byte vectors (of supported element types) are supported, rather than all 256-element vectors. This is a more natural fit for the architecture, and allows us to, for example, improve the support for vector bitcasts. This change necessitated the addition of some new simple types to avoid "regressing" on the number of currently-supported vectors. We round out the 1024-byte types by adding `v512i8`, `v1024i8`, `v512i16` and `v512f16`. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D103884	2021-06-09 12:15:37 +01:00
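Capping by size means the maximum element count now depends on the element width, which explains why exactly those four new types were needed. A small C++ check of that arithmetic; `maxElementsForSizeCap` is a hypothetical helper for illustration, not an LLVM API.

```cpp
#include <cassert>

// Size cap taken from the commit message: fixed-length vectors up to 1024 bytes.
constexpr unsigned kMaxFixedVectorBytes = 1024;

// Hypothetical helper (not an LLVM API): the largest element count a vector of
// EltBits-wide elements can have without exceeding the byte cap.
constexpr unsigned maxElementsForSizeCap(unsigned EltBits) {
  return kMaxFixedVectorBytes * 8 / EltBits;
}

// 8-bit elements now reach 1024 lanes (hence v512i8 and v1024i8), and 16-bit
// elements reach 512 lanes (hence v512i16 and v512f16).
static_assert(maxElementsForSizeCap(8) == 1024, "i8 reaches v1024i8");
static_assert(maxElementsForSizeCap(16) == 512, "i16/f16 reach 512 lanes");
static_assert(maxElementsForSizeCap(32) == 256, "i32/f32 reach 256 lanes");
static_assert(maxElementsForSizeCap(64) == 128, "i64/f64 reach 128 lanes");
```

The previous policy capped by element count (256 elements) instead, so only the narrow element types gain new, wider vector types under the size cap.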
Fraser Cormack	e8f1f89103	[RISCV] Support CONCAT_VECTORS on scalable masks This patch is a simple fix which registers CONCAT_VECTORS as custom-lowered for scalable mask vectors. This follows the pattern of all other scalable-vector types, as the default expansion of CONCAT_VECTORS cannot handle scalable types, and even if it did it'd go through the stack and generate worse code. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D103896	2021-06-09 09:07:44 +01:00
Kai Luo	bf58600bad	[PowerPC] Make sure the first probe is full size or is the last probe when stack is realigned When `-fstack-clash-protection` is enabled and the stack has to be realigned, some parts of the redzone are written prior to the probe, so the probe might overwrite content already written in the redzone. To avoid this, we have to make sure the first probe is at full probe size or is the last probe so that we can skip the redzone. It also fixes an ABI violation on PPC where `r1` isn't updated atomically. This fixes https://bugs.llvm.org/show_bug.cgi?id=49903. Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D100290	2021-06-09 06:35:35 +00:00
Jim Lin	242ddd5089	[RISCV][NFC] Add a single space after comma for VType In most cases, there is a single space after a comma in assembly operands. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D103790	2021-06-09 11:18:22 +08:00
Kai Luo	c87c294397	[PowerPC][Dwarf] Assign MMA register's dwarf register number to negative value According to ELF V2 ABI, `0` should be the dwarf number of `r0`. Currently MMA's register also uses `0` as its dwarf number, this confuses `RegisterInfoEmitter` and generates wrong dwarf -> llvm mapping. ``` extern const MCRegisterInfo::DwarfLLVMRegPair PPCDwarfFlavour1Dwarf2L[] = { { 0U, PPC::VSRp31 }, ``` This leads to wrong cfi output in https://reviews.llvm.org/D100290. Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D103761	2021-06-09 02:24:01 +00:00
Brendon Cahoon	294efbbd3e	Reland "[AMDGPU] Add gfx1013 target" This reverts commit `211e584fa2`. Fixed a use-after-free error that caused the sanitizers to fail.	2021-06-08 21:15:35 -04:00
Jonas Paulsson	8b32e25bc2	[SystemZ] Return true from convertSetCCLogicToBitwiseLogic for scalar integer. Review: Ulrich Weigand	2021-06-08 16:27:28 -05:00
Jonas Paulsson	d5e4f28c0a	[SystemZ] Return true from isMaskAndCmp0FoldingBeneficial(). Return true if the mask is a constant uint of 2 bytes, in which case TMLL is available. Review: Ulrich Weigand	2021-06-08 15:42:46 -05:00
Brendon Cahoon	211e584fa2	Revert "[AMDGPU] Add gfx1013 target" This reverts commit `ea10a86984`. A sanitizer buildbot reports an error.	2021-06-08 16:29:41 -04:00
David Green	d7853bae94	[ARM] Generate VDUP(Const) from constant buildvectors If we cannot otherwise use a VMOVimm/VMOVFPimm/VMVNimm, fall back to producing a VDUP(const) as opposed to a constant pool load. This will at least be smaller codesize and can allow the VDUP to be folded into other instructions. Differential Revision: https://reviews.llvm.org/D103808	2021-06-08 20:51:33 +01:00
Michael Liao	27332968d8	[amdgpu] Add `-enable-ocl-mangling-mismatch-workaround`. - Add `-enable-ocl-mangling-mismatch-workaround` to work around the mismatch on OCL name mangling so far. Reviewed By: yaxunl, rampitec Differential Revision: https://reviews.llvm.org/D103920	2021-06-08 15:42:27 -04:00
Nick Desaulniers	3787ee4571	reland [IR] make -stack-alignment= into a module attr Relands commit `433c8d950c` with fixes for MIPS. Similar to D102742, specifying the stack alignment via CodegenOpts means that this flag gets dropped during LTO, unless the command line is re-specified as a plugin opt. Instead, encode this information as a module level attribute so that we don't have to expose this llvm internal flag when linking the Linux kernel with LTO. Looks like external dependencies might need a fix: * https://github.com/llvm-hs/llvm-hs/issues/345 * https://github.com/halide/Halide/issues/6079 Link: https://github.com/ClangBuiltLinux/linux/issues/1377 Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D103048	2021-06-08 10:59:46 -07:00
Simon Pilgrim	01b77159e3	PPCISelLowering.cpp - don't dereference a dyn_cast<>. dyn_cast<> can return nullptr which we would then dereference - use cast<> which will assert that the type is correct.	2021-06-08 17:59:05 +01:00
Brendon Cahoon	ea10a86984	[AMDGPU] Add gfx1013 target Differential Revision: https://reviews.llvm.org/D103663	2021-06-08 12:49:49 -04:00
Craig Topper	8b4c80d380	Further improve register allocation for vwadd(u).wv, vwsub(u).wv, vfwadd.wv, and vfwsub.wv. The first source has the same EEW as the destination, but we're using earlyclobber which prevents them from ever being the same register. This patch attempts to work around this. -For unmasked .wv, add a special TIED pseudo that pretends like the first operand and the destination must be the same register. This disables the earlyclobber for that source. Mark the instruction as convertible to 3 address form which will switch it to the original untied pseudo when the TwoAddressInstructionPass decides that keeping them tied would require an extra copy. This uses code in RISCVInstrInfo.cpp to do the conversion to the untied opcode. The untie test cases show that we can generate the untied version. Not sure it was profitable to do it in this case, but they have really simple IR. Reviewed By: arcbbb Differential Revision: https://reviews.llvm.org/D103552	2021-06-08 09:43:43 -07:00
Craig Topper	c57bce9cc5	[RISCV] Remove ForceTailAgnostic flag from vmv.s.x, vfmv.s.f and reductions. In 0.9 these were defined to leave elements other than 0 in the destination unmodified. They were changed to use the tail policy in 0.10. I missed that update. I assume no one has noticed because in order cores treat tail agnostic the same as tail undisturbed. I believe Spike and QEMU do the same. Reviewed By: arcbbb, frasercrmck Differential Revision: https://reviews.llvm.org/D103736	2021-06-08 09:22:40 -07:00
Nick Desaulniers	a596b54d47	Revert "[IR] make -stack-alignment= into a module attr" This reverts commit `433c8d950c`. Breaks the MIPS build.	2021-06-08 08:55:50 -07:00
Nick Desaulniers	433c8d950c	[IR] make -stack-alignment= into a module attr Similar to D102742, specifying the stack alignment via CodegenOpts means that this flag gets dropped during LTO, unless the command line is re-specified as a plugin opt. Instead, encode this information as a module level attribute so that we don't have to expose this llvm internal flag when linking the Linux kernel with LTO. Looks like external dependencies might need a fix: * https://github.com/llvm-hs/llvm-hs/issues/345 * https://github.com/halide/Halide/issues/6079 Link: https://github.com/ClangBuiltLinux/linux/issues/1377 Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D103048	2021-06-08 08:31:04 -07:00
Simon Moll	2b626aba44	[VE][NFC] IRBuilder<> -> IRBuilderBase VE's TTI broke with the switch from IRBuilder<> to IRBuilderBase. Follow that change so that it compiles again.	2021-06-08 13:55:49 +02:00
Kerry McLaughlin	5db52751a5	[CostModel] Return an invalid cost for memory ops with unsupported types Fixes getTypeConversion to return `TypeScalarizeScalableVector` when a scalable vector type cannot be legalized by widening/splitting. When this is the method of legalization found, getTypeLegalizationCost will return an Invalid cost. The getMemoryOpCost, getMaskedMemoryOpCost & getGatherScatterOpCost functions already call getTypeLegalizationCost and will now also return an Invalid cost for unsupported types. Reviewed By: sdesmalen, david-arm Differential Revision: https://reviews.llvm.org/D102515	2021-06-08 12:07:36 +01:00
Simon Pilgrim	49d3a367c0	[CostModel][X86] Improve AVX1/AVX2 truncation costs Based on the worst-case numbers generated by D103695, we were overestimating the cost of a number of vector truncations: AVX2: v2i32->v2i8, v2i64->v2i16 + v4i64->v4i32 AVX1: v2i32->v2i8, v4i64->v4i16 + v16i16->v16i8 Once we have a working set of conversion costs, the intention is to clean up the tables and use legalized types a lot more to reduce the number of entries we currently have.	2021-06-08 10:41:03 +01:00
Simon Pilgrim	27f3041c88	NVPTXTargetLowering::LowerReturn - Pass DataLayout by reference. NFCI.	2021-06-08 10:41:01 +01:00
Craig Topper	7c4e9a6826	[RISCV] Use 0 for Log2SEW for vle1/vse1 intrinsics to enable vsetvli optimization. Missed in D103299.	2021-06-07 22:41:14 -07:00
Craig Topper	ae3ab4f0ec	[RISCV] Masked compares should use a tail agnostic policy. Writes of a mask result are always tail agnostic. Unfortunately, this seems to have made codegen worse. I can only think this must be because the vsetvli was acting as some sort of barrier that prevented some code movement in the scheduler. Reviewed By: arcbbb Differential Revision: https://reviews.llvm.org/D103331	2021-06-07 21:43:44 -07:00
Craig Topper	7a105b5768	[RISCV] Use AVL Operand instead of GPR for tied mask pseudo for vwadd.wv and similar. I mistakenly copied this from an older version of our internal repo.	2021-06-07 21:16:50 -07:00
Carl Ritson	c8bbfb8cf5	[AMDGPU] Allow oversize vaddr in GFX10 MIMG assembly As a follow up to D103672, we should allow vaddr to be larger than required when assembling GFX10 MIMG instructions. Reviewed By: dp Differential Revision: https://reviews.llvm.org/D103733	2021-06-08 11:57:07 +09:00
Carl Ritson	f8816c7400	[AMDGPU] Add v5f32/VReg_160 support for MIMG instructions Avoid having to round up to v8f32/VReg_256 when only 5 VGPRs are required for a MIMG address operand. Maintain _V8 instruction variants of pseudo instructions allowing assembly prior to GFX10 to work as-is. Currently the validator can tell for GFX10 what the correct size is, so will disallow oversize address registers. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D103672	2021-06-08 11:11:40 +09:00
Craig Topper	0aa941654f	[RISCV] Use bitfields to shrink the size of the vector load/store intrinsics to pseudo instruction lookup tables.	2021-06-07 17:57:51 -07:00
Ben Shi	c705b7b04d	[RISCV] Optimize bitwise and with constant for the Zbs extension This patch optimizes (and r i) to (BCLRI (BCLRI r, i0), i1), in which i = ~((1<<i0) | (1<<i1)), or to (BCLRI (ANDI r, i0), i1), in which i = i0 & ~(1<<i1). Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D103743	2021-06-08 07:26:00 +08:00
Craig Topper	9b92ae01ee	[RISCV] Store Log2 of EEW in the vector load/store intrinsic to pseudo lookup tables. NFCI This uses 3 bits of data instead of 7. I'm wondering if we can use bitfields for the lookup table key where this would matter. I also renamed the shift_amount template to log2 since it is used with more than just an srl now.	2021-06-07 15:47:45 -07:00
Stanislav Mekhanoshin	05289dfb62	[AMDGPU] Handle constant LDS uses from different kernels This allows lowering an LDS variable into a kernel structure even if it is used in a constant expression from different kernels. Differential Revision: https://reviews.llvm.org/D103655	2021-06-07 15:39:08 -07:00
hsmahesha	713ca2f360	[AMDGPU] Introduce command line switch to control super aligning of LDS. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D103817	2021-06-08 03:58:13 +05:30
Harald van Dijk	75521bd9d8	[X32] Add Triple::isX32(), use it. So far, support for x86_64-linux-gnux32 has been handled by explicit comparisons of Triple.getEnvironment() to GNUX32. This worked as long as x86_64-linux-gnux32 was the only X32 environment to worry about, but we now have x86_64-linux-muslx32 as well. To support this, this change adds an isX32() function and uses it. It replaces all checks for GNUX32 or MuslX32 by isX32(), except for the following: - Triple::isGNUEnvironment() and Triple::isMusl() are supposed to treat GNUX32 and MuslX32 differently. - computeTargetTriple() needs to be able to transform triples to add or remove X32 from the environment and needs to map GNU to GNUX32, and Musl to MuslX32. - getMultiarchTriple() completely lacks any Musl support and retains the explicit check for GNUX32 as it can only return x86_64-linux-gnux32. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D103777	2021-06-07 20:48:39 +01:00
Craig Topper	f30f8b4f12	[RISCV] Lower i8/i16 bswap/bitreverse to grevi/greviw with Zbp. Include known bits support so we know we don't need to zext the output if the input was already zero extended. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D103757	2021-06-07 10:31:51 -07:00
Craig Topper	8c6bd6c22f	[RISCV] Don't enable loop vectorizer interleaving if the V extension isn't enabled. This can cause the vectorizer to generate interleaved scalar code which might be ok for some CPUs, but definitely not all. Disable it to restore the previous scalar behavior. Differential Revision: https://reviews.llvm.org/D103787	2021-06-07 10:20:59 -07:00
Sebastian Neubauer	96e1fcb1e0	[AMDGPU] Use s_add_i32 for address additions This allows the add instruction to be converted to s_addk_i32 and v_add_nc_u32 instead of needing v_add_co_u32 when converting to a VALU instruction. Differential Revision: https://reviews.llvm.org/D103322	2021-06-07 16:09:48 +02:00
hsmahesha	52ffbfdffc	[AMDGPU] Increase alignment of LDS globals if necessary before LDS lowering. Before packing LDS globals into a sorted structure, make sure that their alignment is properly updated based on their size. This will make sure that the members of sorted structure are properly aligned, and hence it will further reduce the probability of unaligned LDS access. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D103261	2021-06-07 18:00:41 +05:30
Bradley Smith	60c9b5f35c	[AArch64][SVE] Improve codegen for dupq SVE ACLE intrinsics Use llvm.experimental.vector.insert instead of storing into an alloca when generating code for these intrinsics. This defers the codegen of the generated vector to instruction selection, allowing existing shufflevector style optimizations to apply. Additionally, introduce a new target transform that can recognise fixed predicate patterns in the svbool variants of these intrinsics. Differential Revision: https://reviews.llvm.org/D103082	2021-06-07 12:21:38 +01:00
Simon Pilgrim	432eff22ab	[CostModel][X86] Add 512-bit bswap costs	2021-06-06 22:36:34 +01:00
Simon Pilgrim	ae973380c5	[CostModel][X86] Improve AVX512 FDIV costs Add missing v16f32/v8f64 costs and adjust other costs as well based off the SkylakeServer model	2021-06-06 21:41:05 +01:00
Craig Topper	8bde5f06a1	[RISCV] Replace && with ||. Spotted by coverity. We should be exiting when the shift amount is greater than the bit width regardless of whether it is a power of 2. Reported by Simon Pilgrim here https://reviews.llvm.org/D96661 This requires getting a shift amount that is out of bounds that wasn't already optimized by SelectionDAG. This would be pretty tricky to construct a test for. Or it would require a non-power of 2 shift amount and a mask that has runs of ones and zeros of the next lowest power of 2 from that shift amount. I tried a little to produce a test for this, but didn't get it to work.	2021-06-06 13:09:51 -07:00
Simon Pilgrim	8ab8b3fad7	[X86][SSE] LowerFP_TO_INT - remove dead code. NFCI. Non-Strict v2f32->v2i64 cases have already early-returned to be handled by legalization.	2021-06-06 20:04:15 +01:00
Simon Pilgrim	4879c8f3b0	[X86][SSE] combineVectorTruncation - simplify PSHUFB-is-better logic. NFCI. OutSVT is guaranteed to be i8/i16 and we accept any InSVT that isn't i64	2021-06-06 20:04:14 +01:00
Simon Pilgrim	b69e16b5cc	X86MachObjectWriter.cpp - silence null dereference warnings. NFCI. The MCSymbol data should always be present for non-absolute sections so assert that it is to silence static analysis warnings.	2021-06-06 15:33:47 +01:00
Nikita Popov	1ffa6499ea	[TargetLowering] Use IRBuilderBase instead of IRBuilder<> (NFC) Don't require a specific kind of IRBuilder for TargetLowering hooks. This allows us to drop the IRBuilder.h include from TargetLowering.h. Differential Revision: https://reviews.llvm.org/D103759	2021-06-06 16:29:50 +02:00
Simon Pilgrim	0f938a6ed8	X86Operand.h - fix uninitialized variable warnings in constructor. NFCI.	2021-06-06 15:25:03 +01:00
Nikita Popov	9914200393	[CodeGen] Add missing includes (NFC) These currently rely on the IRBuilder.h include in TargetLowering.h. Make them explicit.	2021-06-06 15:48:27 +02:00
Simon Pilgrim	ab2d295552	BPFISelDAGToDAG.cpp - don't dereference a dyn_cast<> result. NFCI. Use cast<> instead which will assert that the cast is correct and not just return null. Fixes static analysis warnings.	2021-06-06 13:24:29 +01:00
David Green	12f53e5392	[AArch64] Remove AArch64ISD::NEG This NEG node is just a vector negation, easily represented as a SUB zero. Removing it from the one place it is generated is essentially an NFC, but can allow some extra folding. The updated tests are now loading different constant literals, which have already been negated. Differential Revision: https://reviews.llvm.org/D103703	2021-06-05 19:54:42 +01:00
Simon Pilgrim	be51737f59	Fix "not all control paths return a value" MSVC warning. NFCI.	2021-06-05 19:42:00 +01:00
Fangrui Song	06e7de795b	Fix some -Wunused-but-set-variable in -DLLVM_ENABLE_ASSERTIONS=off build	2021-06-04 23:34:43 -07:00
Jim Lin	170b70b74b	[RISCV] Replace (XLenVT (VLOp GPR:$vl)) with VLOpFrag This is for D100288 to reduce the changes. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D103682	2021-06-05 12:49:31 +08:00
Roman Lebedev	852497711d	[X86] AMD Zen 3: double the LoopMicroOpBufferSize While the IndVars issue (PR50384) has been resolved, and the compile performance improved, a new blocker emerged, the codegen machine instruction scheduling is also quadratic. So we still can't really specify the right value here. Filed PR50584.	2021-06-05 01:23:58 +03:00
Rong Xu	8d581857d7	[SampleFDO] New hierarchical discriminator for FS SampleFDO (llvm-profdata part) This patch was split from https://reviews.llvm.org/D102246 [SampleFDO] New hierarchical discriminator for Flow Sensitive SampleFDO This is for llvm-profdata part of change. It sets the bit masks for the profile reader in llvm-profdata. Also add an internal option "-fs-discriminator-pass" for show and merge command to process the profile offline. This patch also moved setDiscriminatorMaskedBitFrom() to SampleProfileReader::create() to simplify the interface. Differential Revision: https://reviews.llvm.org/D103550	2021-06-04 11:22:06 -07:00
Jessica Paquette	507d193ea7	[AArch64][GlobalISel] Handle multiple phis in fixupPHIOpBanks If we ended up with two phi instructions in a block, and we needed to fix up the banks for the first one, we'd end up inserting our COPY before the second phi. E.g. ``` %x = G_PHI ... %fixup = COPY ... %y = G_PHI ... ``` This is invalid MIR, and breaks assumptions made by the register allocator later down the line. With the verifier enabled, it also emits a verification error. This teaches fixupPHIOpBanks to walk past any phi instructions in the block when emitting the fixup copies. Here's an example of the crashing code (same as added testcase): https://godbolt.org/z/h5j1x3o6e Differential Revision: https://reviews.llvm.org/D103582	2021-06-04 09:59:36 -07:00
Mark Schimmel	12592a439a	Add commutable attribute to opcodes for ARC This patch sets the isCommutable attribute for several opcodes that have the "reg = OPCODE reg, reg" format. Differential Revision: https://reviews.llvm.org/D103653	2021-06-04 19:49:19 +03:00
Craig Topper	c653711fd3	[RISCV] Teach vsetvli insertion pass that operations on masks don't care about SEW/LMUL. All that really matters is that the VLMAX of the preceding instructions is the same as the VLMAX required by the mask operation. Also update the vmsge(u) handling to use the SEW/LMUL we use for other mask register operations. We were matching it to the compare before. Some cases will be improved if we fix masked compares to use tail agnostic policy. I think they ignore the tail policy anyway. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D103299	2021-06-04 09:17:46 -07:00
Bradley Smith	a85f5874e2	[AArch64] Remove SETCC of CSEL when the latter's condition can be inverted setcc (csel 0, 1, cond, X), 1, ne ==> csel 0, 1, !cond, X Where X is a condition code setting instruction. Co-authored-by: Paul Walker <paul.walker@arm.com> Differential Revision: https://reviews.llvm.org/D103256	2021-06-04 15:53:21 +01:00
Nicholas Guy	3043cbc436	[AArch64] Further enable UnrollAndJam Due to the dependency on runtime unrolling, UnJ is only enabled by default on in-order scheduling models, and if a cpu is specified through -mcpu. Differential Revision: https://reviews.llvm.org/D103604	2021-06-04 14:18:49 +01:00
Mirko Brkusanin	35ef4c940b	[AMDGPU][GlobalISel] Legalize G_ABS Legalize and select G_ABS so that we can use llvm.abs intrinsic Differential Revision: https://reviews.llvm.org/D102391	2021-06-04 14:46:43 +02:00
Dmitry Preobrazhensky	cd093cbb11	[AMDGPU][MC][NFC] Fixed typos in parser Differential Revision: https://reviews.llvm.org/D103680	2021-06-04 15:40:42 +03:00
Bradley Smith	e42ee2d509	[AArch64][SVE] Add support for using reverse forms of SVE2 shifts When using an ACLE intrinsic for an SVE2 shift, if the predicate passed has all relevant lanes active, then use a reversed version of the instruction if beneficial.	2021-06-04 12:56:53 +01:00
Tim Northover	b16ddd0375	AArch64: support atomic zext/sextloads	2021-06-04 09:45:51 +01:00
madhur13490	6a3beb1f68	[AMDGPU] [IndirectCalls] Don't propagate attributes to address taken functions and their callees Don't propagate launch bound related attributes to address taken functions and their callees. The idea is to do a traversal over the call graph starting at address taken functions and erase the attributes set by previous logic i.e. process(). This two phase approach makes sure that we don't miss out on deep nested callees from address taken functions as a function might be called directly as well as indirectly. This patch is also a reattempt of D94585, as latent issues in the hasAddressTaken function have been fixed in the recent past. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D103138	2021-06-04 11:36:56 +05:30
hsmahesha	753437fc1d	Revert "[AMDGPU] Increase alignment of LDS globals if necessary before LDS lowering." This reverts commit `d71ff907ef`.	2021-06-04 11:16:46 +05:30
hsmahesha	d71ff907ef	[AMDGPU] Increase alignment of LDS globals if necessary before LDS lowering. Before packing LDS globals into a sorted structure, make sure that their alignment is properly updated based on their size. This will make sure that the members of sorted structure are properly aligned, and hence it will further reduce the probability of unaligned LDS access. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D103261	2021-06-04 09:34:37 +05:30
Craig Topper	e9313fa33a	[RISCV] Simplify some code in RISCVInsertVSETVLI by calling an existing function that does the same thing. NFCI	2021-06-03 17:31:54 -07:00
Julien Pagès	37821155c9	[AMDGPU] Fix a crash when selecting a particular case of buffer_load_format_d16 In this particular example, we had a crash when compiling it for several architectures. This patch extends the legalization of extract_subvector to avoid this problem. Differential Revision: https://reviews.llvm.org/D103344	2021-06-03 16:40:18 -04:00
Nikita Popov	983565a6fe	[ADT] Move DenseMapInfo for ArrayRef/StringRef into respective headers (NFC) This is a followup to D103422. The DenseMapInfo implementations for ArrayRef and StringRef are moved into the ArrayRef.h and StringRef.h headers, which means that these two headers no longer need to be included by DenseMapInfo.h. This required adding a few additional includes, as many files were relying on various things pulled in by ArrayRef.h. Differential Revision: https://reviews.llvm.org/D103491	2021-06-03 18:34:36 +02:00
David Green	929c54379a	[ARM] Prettify gather/scatter debug comments. NFC	2021-06-03 12:33:03 +01:00
Fraser Cormack	8790e85255	[RISCV] Reserve an emergency spill slot for any RVV spills This patch addresses an issue in which fixed-length (VLS) vector RVV code could fail to reserve an emergency spill slot for their frame index elimination. This is because we were previously only reserving a spill slot when there were `scalable-vector` frame indices being used. However, fixed-length codegen uses regular-type frame indices if it needs to spill. This patch does the fairly brute-force method of checking ahead of time whether the function contains any RVV spill instructions, in which case it reserves one slot. Note that the second RVV slot is still only reserved for `scalable-vector` frame indices. This unfortunately causes quite a bit of churn in existing tests, where we chop and change stack offsets for spill slots. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D103269	2021-06-03 10:44:34 +01:00
Fangrui Song	aba67ba784	[MC] Delete unneeded MCAsmParser &Parser	2021-06-02 16:10:18 -07:00
Fangrui Song	c980d93d91	[MC] Change "unexpected tokens" to "expected newline" and remove unneeded "in .xxx directive"	2021-06-02 16:08:05 -07:00
Anshil Gandhi	1c5ff0b03f	[PowerPC] [GlobalISel] Implementation of formal arguments lowering in the IRTranslator for the PPC backend Differential Revision: https://reviews.llvm.org/D99812	2021-06-02 16:46:39 -06:00
Anshil Gandhi	3e5ddb83e3	Revert "Differential Revision: https://reviews.llvm.org/D99812 " This reverts commit `c729f2a48a`.	2021-06-02 16:36:00 -06:00
Simon Pilgrim	9f5d783d46	[X86][SSE] combineScalarToVector - only reuse broadcasts for scalar_to_vector if the source operands' scalar types match We were hitting an issue when the scalar_to_vector source was being implicitly truncated (in this case from i8 to vXi1) but we were also using the i8 source in a broadcast to a vXi8 value. Fixes PR50374	2021-06-02 22:05:40 +01:00
Min-Yih Hsu	344e919b1a	[CodeGen][NFC] Remove unused virtual function `TargetFrameLowering::emitCalleeSavedFrameMoves` with 4 arguments is not used anywhere in CodeGen. Thus it shouldn't be exposed as a virtual function. NFC. Differential Revision: https://reviews.llvm.org/D103328	2021-06-02 13:11:12 -07:00
Anshil Gandhi	c729f2a48a	Differential Revision: https://reviews.llvm.org/D99812	2021-06-02 14:09:52 -06:00
Rong Xu	6745ffe4fa	[SampleFDO] New hierarchical discriminator for FS SampleFDO (ProfileData part) This patch was split from https://reviews.llvm.org/D102246 [SampleFDO] New hierarchical discriminator for Flow Sensitive SampleFDO This is mainly for ProfileData part of change. It will load FS Profile when such profile is detected. For an extbinary format profile, create_llvm_prof tool will add a flag to profile summary section. For other format profiles, the users need to use an internal option (-profile-isfs) to tell the compiler that the profile uses FS discriminators. This patch also simplified the bit API used by FS discriminators. Differential Revision: https://reviews.llvm.org/D103041	2021-06-02 10:32:52 -07:00
Daniil Fukalov	0195e594fe	[TTI] NFC: Change getIntImmCodeSizeCost to return InstructionCost. This patch migrates the TTI cost interfaces to return an InstructionCost. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D102915	2021-06-02 16:04:11 +03:00
Irina Dobrescu	e971099a9b	[AArch64] Optimise bitreverse lowering in ISel Differential Revision: https://reviews.llvm.org/D103105	2021-06-02 12:51:12 +01:00
Fraser Cormack	3b0a33d0ad	[RISCV] Expand unaligned fixed-length vector memory accesses RVV vectors must be aligned to their element types, so anything less is unaligned. For regular loads and stores, our custom-lowering of fixed-length vectors meant that we opted out of LegalizeDAG's built-in unaligned expansion. This patch adds that logic in to our custom lower function. For masked intrinsics, we declare that anything unaligned is not legal, leaving the ScalarizeMaskedMemIntrin pass to do the expansion for us. Note that neither of these methods can handle the expansion of scalable-vector memory ops, so those cases are left alone by this patch. Scalable loads and stores already go through expansion by default but hit an assertion, and scalable masked intrinsics will silently generate incorrect code. It may be prudent to return an error in both of these cases. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D102493	2021-06-02 09:27:44 +01:00
Craig Topper	41ff1e0e29	[RISCV] Improve register allocation for masked vwadd(u).wv, vwsub(u).wv, vfwadd.wv, and vfwsub.wv. The first source has the same EEW as the destination, but we're using earlyclobber which prevents them from ever being the same register. To work around this, add a special TIED pseudo to use whenever the first source and merge operand are the same value. This allows us to use a single operand for the merge operand and first source which we can then tie to the destination. A tied source disables earlyclobber for that operand. Reviewed By: arcbbb Differential Revision: https://reviews.llvm.org/D103211	2021-06-01 18:59:00 -07:00
Stanislav Mekhanoshin	9e2e49328f	[AMDGPU] All GWS instructions need aligned VGPR on gfx90a Fixes: SWDEV-288006 Differential Revision: https://reviews.llvm.org/D103197	2021-06-01 17:08:03 -07:00
Arthur Eubanks	8961293851	[OpaquePtr] Create API to make a copy of a PointerType with some address space Some existing places use getPointerElementType() to create a copy of a pointer type with some new address space. Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D103429	2021-06-01 16:52:32 -07:00
Michael Benfield	00d19c6704	[various] Remove or use variables which are unused but set. This is in preparation for the -Wunused-but-set-variable warning. Differential Revision: https://reviews.llvm.org/D102942	2021-06-01 15:38:48 -07:00
Daniel Sanders	aaac268285	[globalisel][legalizer] Separate the deprecated LegalizerInfo from the current one It's still in use in a few places so we can't delete it yet but there's not many at this point. Differential Revision: https://reviews.llvm.org/D103352	2021-06-01 13:23:48 -07:00
Arthur Eubanks	2983053d23	[NFC][OpaquePtr] Explicitly pass GEP source type to IRBuilder in more places	2021-06-01 13:13:37 -07:00
madhur13490	3c874ce427	[AMDGPU][NFC] Remove author's name from codebase This must have made it into the code by accident. Differential Revision: https://reviews.llvm.org/D103484	2021-06-02 00:51:48 +05:30
Jessica Paquette	e7f501b5e7	[GlobalISel][AArch64] Combine and (lshr x, cst), mask -> ubfx x, cst, width Also add a target hook which allows us to get around custom legalization on AArch64. Differential Revision: https://reviews.llvm.org/D99283	2021-06-01 10:56:17 -07:00
Guozhi Wei	1b748faf2b	[X86FixupLEAs] Transform the sequence LEA/SUB to SUB/SUB This patch transforms the sequence lea (reg1, reg2), reg3 sub reg3, reg4 to two sub instructions sub reg1, reg4 sub reg2, reg4 Similar optimization can also be applied to LEA/ADD sequence. The modifications to TwoAddressInstructionPass are to ensure the operands of the ADD instruction have the expected order (the dest register of LEA should be the src register of ADD). Differential Revision: https://reviews.llvm.org/D101970	2021-06-01 10:31:30 -07:00
Jonas Paulsson	9ee3f16919	[SystemZ] Return true from hasBitPreservingFPLogic(). This is currently NFC on benchmarks and tests. Review: Ulrich Weigand	2021-06-01 11:52:50 -05:00
Craig Topper	896f9bc350	[RISCV] Remove earlyclobber from vnsrl/vnsra/vnclip(u) when the source and dest are a single vector register. This guarantees they meet this overlap exception: "The destination EEW is smaller than the source EEW and the overlap is in the lowest-numbered part of the source register group" Being a single register guarantees the overlap is always in the lowest-numbered part of the group. Reviewed By: frasercrmck, khchen Differential Revision: https://reviews.llvm.org/D103351	2021-06-01 09:17:52 -07:00
Craig Topper	5a5219a0f9	[RISCV] Remove earlyclobber from compares with LMUL<=1. Compares are considered a narrowing operation for register overlap. I believe for LMUL<=1 they meet this exception to allow overlap "The destination EEW is smaller than the source EEW and the overlap is in the lowest-numbered part of the source register group" Both the result and the sources will occupy a single register for LMUL<=1 so the overlap would always be in the "lowest-numbered part". Reviewed By: frasercrmck, HsiangKai Differential Revision: https://reviews.llvm.org/D103336	2021-06-01 09:08:11 -07:00
Fraser Cormack	4f500c402b	[RISCV] Support vector types in combination with fastcc This patch extends the RISC-V lowering of the 'fastcc' calling convention to vector types, both fixed-length and scalable. Without this patch, any function passing or returning vector types by value would throw a compiler error. Vectors are handled in 'fastcc' much as they are in the default calling convention, the noticeable difference being the extended set of scalar GPR registers that can be used to pass vectors indirectly. Reviewed By: HsiangKai Differential Revision: https://reviews.llvm.org/D102505	2021-06-01 10:31:18 +01:00
Andy Wingo	82f92e35c6	[WebAssembly][CodeGen] IR support for WebAssembly local variables This patch adds TargetStackID::WasmLocal. This stack holds locations of values that are only addressable by name -- not via a pointer to memory. For the WebAssembly target, these objects are lowered to WebAssembly local variables, which are managed by the WebAssembly run-time and are not addressable by linear memory. For the WebAssembly target IR indicates that an AllocaInst should be put on TargetStackID::WasmLocal by putting it in the non-integral address space WASM_ADDRESS_SPACE_WASM_VAR, with value 1. SROA will mostly lift these allocations to SSA locals, but any alloca that reaches instruction selection (usually in non-optimized builds) will be assigned the new TargetStackID there. Loads and stores to those values are transformed to new WebAssemblyISD::LOCAL_GET / WebAssemblyISD::LOCAL_SET nodes, which then lower to the type-specific LOCAL_GET_I32 etc instructions via tablegen patterns. Differential Revision: https://reviews.llvm.org/D101140	2021-06-01 11:31:39 +02:00
Roman Lebedev	a3b8695bf5	[X86] AMD Zen 3 has fast variable per-lane shuffles ... but lane-crossing shuffles are slow.	2021-06-01 10:46:05 +03:00
Roman Lebedev	cf9b1f7a0e	[X86] Split FeatureFastVariableShuffle tuning into Lane-Crossing and Per-Lane variants Currently, X86 backend only has a global one-size-fits-all `FeatureFastVariableShuffle` feature, which controls profitability of both the cross-lane and per-lane variable shuffles. I guess, this has been fine so far. But at least on AMD Zen 3, per-lane variable shuffles (e.g. `VPSHUFB`) are as fast as shuffles with a fixed/immediate mask, while lane-crossing shuffles (e.g. `VPERMPS`) perform worse. So to get the benefits of variable-mask shuffles, but not the drawbacks of lane-crossing shuffles, as suggested by @RKSimon, split the feature flag into two. Differential Revision: https://reviews.llvm.org/D103274	2021-06-01 10:39:36 +03:00
Albion Fung	db26cd30b6	[PowerPC] Improve f32 to i32 bitcast code gen The code gen for f32 to i32 bitcast is not currently the most efficient; this patch removes some unnecessary instructions generated. Differential revision: https://reviews.llvm.org/D100782	2021-05-31 16:00:58 -05:00
Arthur Eubanks	2c3afa3237	[OpaquePtr] Clean up some uses of Type::getPointerElementType() These depend on pointee types.	2021-05-31 09:54:57 -07:00
Arthur Eubanks	8815ce03e8	Remove "Rewrite Symbols" from codegen pipeline It breaks up the function pass manager in the codegen pipeline. With empty parameters, it looks at the -mllvm flag -rewrite-map-file. This is likely not in use. Add a check that we only have one function pass manager in the codegen pipeline. Some tests relied on the fact that we had a module pass somewhere in the codegen pipeline. addr-label.ll crashes on ARM due to this change. This is because an ARMConstantPoolConstant containing a BasicBlock to represent a blockaddress may hold an invalid pointer to a BasicBlock if the blockaddress is invalidated by its BasicBlock getting removed. In that case all referencing blockaddresses are RAUW a constant int. Making ARMConstantPoolConstant::CVal a WeakVH fixes the crash, but I'm not sure that's the right fix. As a workaround, create a barrier right before ISel so that IR optimizations can't happen while an ARMConstantPoolConstant has been created. Reviewed By: rnk, MaskRay, compnerd Differential Revision: https://reviews.llvm.org/D99707	2021-05-31 08:32:36 -07:00
Fraser Cormack	2b37c405cc	[RISCV] Scale scalably-typed split argument offsets by VSCALE This patch fixes a bug in lowering scalable-vector types in RISC-V's main calling convention. When scalable-vector types are split and passed indirectly, the target is responsible for scaling the offset -- initially set to the known-minimum store size -- by the scalable factor. Before this we were issuing overlapping loads or stores to the different parts, leading to incorrect codegen. Credit to @HsiangKai for spotting this. Reviewed By: HsiangKai Differential Revision: https://reviews.llvm.org/D103262	2021-05-31 10:43:13 +01:00
Fraser Cormack	eb23936591	[RISCV] Support vector conversions between fp and i1 This patch custom lowers FP_TO_[US]INT and [US]INT_TO_FP conversions between floating-point and boolean vectors. As the default action is scalarization, this patch both supports scalable-vector conversions and improves the code generation for fixed-length vectors. The lowering for these conversions can piggy-back on the existing lowering, which lowers the operations to a supported narrowing/widening conversion and then either an extension or truncation. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D103312	2021-05-31 09:55:39 +01:00
Andy Wingo	bc1ad6e3c4	Revert "[WebAssembly][CodeGen] IR support for WebAssembly local variables" This reverts commit `bf35f4af51`. There was an error in a shared-library build.	2021-05-31 10:55:15 +02:00
Andy Wingo	bf35f4af51	[WebAssembly][CodeGen] IR support for WebAssembly local variables This patch adds TargetStackID::WasmLocal. This stack holds locations of values that are only addressable by name -- not via a pointer to memory. For the WebAssembly target, these objects are lowered to WebAssembly local variables, which are managed by the WebAssembly run-time and are not addressable by linear memory. For the WebAssembly target, IR indicates that an AllocaInst should be put on TargetStackID::WasmLocal by putting it in the non-integral address space WASM_ADDRESS_SPACE_WASM_VAR, with value 1. SROA will mostly lift these allocations to SSA locals, but any alloca that reaches instruction selection (usually in non-optimized builds) will be assigned the new TargetStackID there. Loads and stores to those values are transformed to new WebAssemblyISD::LOCAL_GET / WebAssemblyISD::LOCAL_SET nodes, which then lower to the type-specific LOCAL_GET_I32 (etc.) instructions via tablegen patterns. Differential Revision: https://reviews.llvm.org/D101140	2021-05-31 10:40:38 +02:00
Ben Shi	22668c6e1f	[AVR][NFC] Refactor 8-bit & 16-bit shifts Reviewed By: dylanmckay Differential Revision: https://reviews.llvm.org/D98335	2021-05-31 10:30:46 +08:00
David Green	2176be556b	[ARM] Guard against loop variant gather ptr operands This ensures that the operands of any gather/scatter instructions that we attempt to push out of the loop are invariant, preventing invalid IR from being generated.	2021-05-30 18:02:14 +01:00
Ben Shi	86812faa5f	[AVR] Improve inline assembly Reviewed By: dylanmckay Differential Revision: https://reviews.llvm.org/D96394	2021-05-30 23:44:43 +08:00
Mindong Chen	71acce68da	[NFCI] Move DEBUG_TYPE definition below #includes When you try to define a new DEBUG_TYPE in a header file, a DEBUG_TYPE definition placed above the #includes in files that include it could result in redefinition warnings or even compile errors. Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D102594	2021-05-30 17:31:01 +08:00
David Green	65831422a9	[ARM] Guard against WhileLoopStart kill flags If the operand of the WhileLoopStart is flagged as killed, that currently gets propagated to both the t2CMPri, as the instruction is reverted, and the newly created t2DoLoopStart. Only the second should remain as killing the operand; the first should drop the flags.	2021-05-29 21:04:26 +01:00
Jessica Clarke	00dfd4f870	Revert "[RISCV] Remove -riscv-no-aliases in favour of new -M no-aliases" The replacement doesn't work for llc, but it is needed by patchable-function-entry.ll. This reverts commit `aa9a30b83a`.	2021-05-29 15:11:37 +01:00
Jessica Clarke	aa9a30b83a	[RISCV] Remove -riscv-no-aliases in favour of new -M no-aliases Whilst here, also remove a couple of unnecessary -o - instances. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D103201	2021-05-29 14:58:28 +01:00
Ulrich Weigand	c123c178b2	[SystemZ] Set getExtendForAtomicOps to ISD::ANY_EXTEND The implementation of subword atomics does not actually guarantee the result is zero-extended, which now caused build bot failures after https://reviews.llvm.org/D101342 was landed.	2021-05-29 12:15:18 +02:00
Luke	c4c3869554	[RISCV] Enable interleaved vectorization for RVV Enable interleaved vectorization for RVV. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D101469	2021-05-29 11:03:27 +08:00
Amara Emerson	018a9641ff	[AArch64][GlobalISel] Fix a crash during selection of a G_ZEXT(s8 = G_LOAD) We have special handling for a zext of a load <32b because the load does a zext for free. In that case, we just select the G_ZEXT as if it were a copy but this triggered the copy checking code to balk at the mismatched size. This was being hidden because normally these get combined into G_ZEXTLOAD but for atomics this doesn't happen. The test case here just uses a normal load because the particular atomic isn't supported yet anyway.	2021-05-28 16:35:24 -07:00
Craig Topper	bc6799f2f7	[RISCV] Add separate MxList tablegen classes for widening/narrowing and sext.zext.vf2/4/8. NFC This is cleaner than slicing the MxList to remove elements from the beginning or end since that requires hardcoding the size. I don't expect the size of the list to change, but we shouldn't repeat it in multiple places.	2021-05-28 14:06:19 -07:00
Eli Friedman	0b3b0a727a	[AArch64][RISCV] Make sure isel correctly honors failure orderings. If a cmpxchg specifies acquire or seq_cst on failure, make sure we generate code consistent with that ordering even if the success ordering is not acquire/seq_cst. At one point, it was ambiguous whether this sort of construct was valid, but the C++ standard and LLVM now accept arbitrary combinations of success/failure orderings. This doesn't address the corresponding issue in AtomicExpand. (This was reported as https://bugs.llvm.org/show_bug.cgi?id=33332 .) Fixes https://bugs.llvm.org/show_bug.cgi?id=50512. Differential Revision: https://reviews.llvm.org/D103284	2021-05-28 12:47:40 -07:00
Craig Topper	58cb649212	[RISCV] Add octuple to LMULInfo tablegen class, remove octuple_from_str. NFCI octuple_from_str was always used with the MX field from an LMULInfo. Might as well just precompute it and put it in the class.	2021-05-28 11:53:05 -07:00
Nemanja Ivanovic	e0c8265437	Revert "Fix "enumerator 'llvm::TargetStackID::WasmLocal' in switch of enum 'llvm::TargetStackID::Value' is not handled" MSVC warnings. NFCI." Since `ca5f07f8c4` already reverted the cause for this warning, this commit now causes warnings about a default label in a switch that covers the enum. This reverts commit `cf2eeb114c`.	2021-05-28 10:53:49 -05:00
Simon Pilgrim	cf2eeb114c	Fix "enumerator 'llvm::TargetStackID::WasmLocal' in switch of enum 'llvm::TargetStackID::Value' is not handled" MSVC warnings. NFCI.	2021-05-28 12:47:22 +01:00
Andy Wingo	ca5f07f8c4	Revert "[WebAssembly][CodeGen] IR support for WebAssembly local variables" This reverts commit `00ecf18979`, as it broke the AMDGPU build. Will reland later with a fix.	2021-05-28 12:42:12 +02:00
Tim Northover	9ff2eb1ea5	SwiftTailCC: teach verifier musttail rules applicable to this CC. SwiftTailCC has a different set of requirements than the C calling convention for a tail call. The exact argument sequence doesn't have to match, but fewer ABI-affecting attributes are allowed. Also make sure the musttail diagnostic triggers if a musttail call isn't actually a tail call.	2021-05-28 11:12:00 +01:00
Tim Northover	d88f96dff3	ARM: support mandatory tail calls for tailcc & swifttailcc This adds support for callee-pop conventions to the ARM backend so that it can ensure a call marked "tail" is actually a tail call.	2021-05-28 11:10:51 +01:00
Sebastian Neubauer	690f5b7a01	[AMDGPU] Fix function calls with flat scratch When flat scratch is used, the stack pointer needs to be added when writing arguments to the stack. For buffer instructions, this is done in SelectMUBUFScratchOffen and SelectMUBUFScratchOffset. Move that to call argument lowering, like it is done in GlobalISel. Differential Revision: https://reviews.llvm.org/D103166	2021-05-28 11:22:13 +02:00
Andy Wingo	00ecf18979	[WebAssembly][CodeGen] IR support for WebAssembly local variables This patch adds TargetStackID::WasmLocal. This stack holds locations of values that are only addressable by name -- not via a pointer to memory. For the WebAssembly target, these objects are lowered to WebAssembly local variables, which are managed by the WebAssembly run-time and are not addressable by linear memory. For the WebAssembly target, IR indicates that an AllocaInst should be put on TargetStackID::WasmLocal by putting it in the non-integral address space WASM_ADDRESS_SPACE_WASM_VAR, with value 1. SROA will mostly lift these allocations to SSA locals, but any alloca that reaches instruction selection (usually in non-optimized builds) will be assigned the new TargetStackID there. Loads and stores to those values are transformed to new WebAssemblyISD::LOCAL_GET / WebAssemblyISD::LOCAL_SET nodes, which then lower to the type-specific LOCAL_GET_I32 (etc.) instructions via tablegen patterns. Differential Revision: https://reviews.llvm.org/D101140	2021-05-28 11:07:41 +02:00
Amara Emerson	59a4ee9728	[AArch64][GlobalISel] Legalize oversize G_EXTRACT_VECTOR_ELT sources. Also changes the fewerElements helper to use the lookthrough constant helper instead of m_ICst, since m_ICst doesn't look through extends. Differential Revision: https://reviews.llvm.org/D103227	2021-05-27 23:52:24 -07:00
Jinsong Ji	b2581196eb	[AIX] Enable stackprotect feature AIX uses `__ssp_canary_word` instead of `__stack_chk_guard`. This patch updates the target hook to use the correct symbol, so that the basic stackprotect feature can work. The traceback will be handled in a follow-up patch. Reviewed By: #powerpc, shchenz Differential Revision: https://reviews.llvm.org/D103100	2021-05-28 02:18:15 +00:00
Craig Topper	0fa5aac292	[RISCV] Teach VSETVLI insertion to look through PHIs to prove we don't need to insert a vsetvli. If an instruction's AVL operand is a PHI node in the same block, we may be able to peek through the PHI to find vsetvli instructions that produce the AVL in other basic blocks. If we can prove those vsetvli instructions have the same VTYPE and were the last vsetvli in their respective blocks, then we don't need to insert a vsetvli for this pseudo instruction. Reviewed By: rogfer01 Differential Revision: https://reviews.llvm.org/D103277	2021-05-27 15:34:08 -07:00
Quinn Pham	62b5df7fe2	[PowerPC] Added multiple PowerPC builtins This is the first in a series of patches to provide builtins for compatibility with the XL compiler. Most of the builtins already had intrinsics and only needed to be implemented in the front end. Intrinsics were created for the three iospace builtins, eieio, and icbt. Pseudo instructions were created for eieio and iospace_eieio to ensure that nops were inserted before the eieio instruction. Reviewed By: nemanjai, #powerpc Differential Revision: https://reviews.llvm.org/D102443	2021-05-27 16:23:03 -05:00
Craig Topper	020df692d8	[RISCV] Fix typo, use addImm instead of addReg.	2021-05-27 14:04:51 -07:00
Simon Pilgrim	90d25808c4	[CostModel][X86] Improve accuracy of sext/zext to 256-bit vector costs on AVX1 targets Determined from llvm-mca analysis (btver2 vs bdver2 vs sandybridge), the split+extends+concat sequence on AVX1-capable targets is cheaper than the #ops that the cost was previously based on.	2021-05-27 18:17:50 +01:00
Craig Topper	527cd01314	[RISCV] Teach vsetvli insertion to use vsetvl x0, x0 form when we can tell that VLMAX and AVL haven't changed. This can help avoid needing a virtual register for the vsetvl output when the AVL is X0. For other register AVLs it can shorten the live range of the AVL register if it isn't needed later. There's probably no advantage when AVL is a 5-bit immediate that can use vsetivli. But do it anyway for consistency. Reviewed By: rogfer01 Differential Revision: https://reviews.llvm.org/D103215	2021-05-27 10:11:38 -07:00
Craig Topper	a105d3024e	[X86] Fold (shift undef, X)->0 for vector shifts by immediate. We could previously do this by accident through the later call to getTargetConstantBitsFromNode I think, but that only worked if N0 had a single use. This patch makes it explicit for undef and doesn't have a use count check. I think this is needed to move the (shl X, 1)->(add X, X) fold to isel for PR50468. We need to be sure X won't be IMPLICIT_DEF which might prevent the same vreg from being used for both operands. Differential Revision: https://reviews.llvm.org/D103192	2021-05-27 09:31:47 -07:00
Matt Arsenault	e892705d74	GlobalISel: Do not change register types in lowerLoad Adjusting the load register type is a widenScalar type action, not a lowering. lowerLoad should be reserved for operations that change the memory access size, such as unaligned load decomposition. With this trying to adjust the register type, it was hard to avoid infinite loops in the legalizer. Adds a bandaid to avoid regressing a few AArch64 tests, but I'm not sure what the exact condition is and there's probably a cleaner way to do this. For AMDGPU this regresses handling of some cases for unaligned loads, but the way this is currently working is a pretty ugly hack.	2021-05-27 11:49:37 -04:00
Matt Arsenault	5efc3bfd32	AMDGPU/GlobalISel: Use IncomingValueAssigner for implicit return This makes no real difference since we assign the same register either way.	2021-05-27 11:28:52 -04:00
Simon Pilgrim	fe8d97cbe5	[CostModel][X86] AVX512 truncation ops are slower than cost models indicate. The SkylakeServer model (and later IceLake/TigerLake targets according to Agner) have the PMOV truncations as uops=2, rthroughput=2 instructions. Noticed while trying to reduce the diffs between cost tables and llvm-mca analysis.	2021-05-27 16:07:42 +01:00
Fraser Cormack	5a80dc4988	[VP][SelectionDAG] Add a target-configurable EVL operand type This patch adds a way for the target to configure the type it uses for the explicit vector length operands of VP SDNodes. The type must be a legal integer type (there is still no target-independent legalization of this operand) and must currently be at least as big as i32, the type used by the IR intrinsics. An implicit zero-extension takes place on targets which choose a larger type. All VP nodes should be created with this type used for the EVL operand. This allows 64-bit RISC-V to avoid custom legalization of all VP nodes, keeping them in their target-independent form for that bit longer. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D103027	2021-05-27 15:27:36 +01:00
Matt Arsenault	ee35900089	AMDGPU/GlobalISel: Lower constant-32-bit zextload/sextload consistently We were accidentally leaning on code in lowerLoad which expands extending loads which should be removed.	2021-05-27 09:49:13 -04:00
Matt Arsenault	8a203ac6d2	AMDGPU/GlobalISel: Remove redundant parameter from function	2021-05-27 09:49:13 -04:00
Fraser Cormack	b7101e218c	[DAGCombine][RISCV] Don't try to trunc-store combined vector stores DAGCombine's `mergeStoresOfConstantsOrVecElts` optimization is told whether it's to use vector types and also whether it's to issue a truncating store. However, the truncating store code path assumes a scalar integer `ConstantSDNode`, and when using vector types it creates either a `BUILD_VECTOR` or `CONCAT_VECTORS` to store: neither of which is a constant. The `riscv64` target is able to expose a crash here because it switches on both code paths at the same time. The `f32` is stored as `i32` which must be promoted to `i64`, necessitating a truncating store. It also decides later that it prefers a vector store of `v2f32`. While vector truncating stores are legal, this combine is not able to emit them. We also don't have a test case. This patch adds an assert to catch this case more gracefully, and updates one of the callers of the function to turn off the use of truncating stores when preferring vectors. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D103173	2021-05-27 14:16:32 +01:00
Fraser Cormack	8c73a31c11	[RISCV] Allow passing fixed-length vectors via the stack The vector calling convention dictates that when the vector argument registers are exhausted, GPRs are used to pass the address via the stack. When the GPRs themselves are exhausted, at best we would previously crash with an assertion, and at worst we'd generate incorrect code. This patch addresses this issue by passing fixed-length vectors via the stack with their full fixed-length size and aligned to their element type size. Since the calling convention lowering can't yet handle scalable vector types, this patch adds a fatal error to make it clear that we are lacking in this regard. Reviewed By: HsiangKai Differential Revision: https://reviews.llvm.org/D102422	2021-05-27 14:14:07 +01:00
Fraser Cormack	772b58a641	[SelectionDAG][RISCV] Don't unroll 0/1-type bool VSELECTs This patch extends the cases in which the legalizer is able to express VSELECT in terms of XOR/AND/OR. When dealing with a VSELECT between boolean vector types, the mask itself is an all-zeros or all-ones value of the operand type, so a 0/1 boolean type behaves identically to a 0/-1 type. This greatly helps RISC-V which relies on expansion for these nodes. It also allows scalable-vector bool VSELECTs to use the default expansion, where before it would crash in SelectionDAG::UnrollVectorOp. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D103147	2021-05-27 10:08:57 +01:00
Sebastian Neubauer	0bb60dbe34	[AMDGPU][GlobalISel] Allow amdgpu_gfx calling conv Calling functions from shaders already works with the SelectionDAG. Differential Revision: https://reviews.llvm.org/D103183	2021-05-27 10:41:40 +02:00
Amara Emerson	74edfb2805	[AArch64][GlobalISel] Legalize non-power-of-2 vector elements for G_STORE. The rules were already there, it just needed re-ordering so the odd case didn't bail out too early.	2021-05-26 17:01:02 -07:00
Krzysztof Parzyszek	002f5e158d	[Hexagon] Restore handling of expanding shuffles Fixed bugs, added testcases. The byte-unpack is actually recognized by the DAG combiner, but the halfword-unpack is not.	2021-05-26 18:04:15 -05:00
Fangrui Song	5852582532	[AArch64] Support llvm-mc/llvm-objdump -M no-aliases This enables the no-aliases forms of many instructions. Depends on D103004 Reviewed By: tmatheson Differential Revision: https://reviews.llvm.org/D103005	2021-05-26 13:35:31 -07:00
Craig Topper	fdf10e6197	[RISCV] Use X0 as destination of inserted vsetvli when possible. We aren't going to connect the result to anything so we might as well avoid allocating a register. Reviewed By: frasercrmck, HsiangKai Differential Revision: https://reviews.llvm.org/D102031	2021-05-26 13:08:51 -07:00
Heejin Ahn	5dd86aadf0	[WebAssembly] Add TargetInstrInfo::getCalleeOperand DwarfDebug unconditionally assumes for all call instructions the 0th operand is the callee operand, which seems to be true for other targets, but not for WebAssembly. This adds a `TargetInstrInfo::getCalleeOperand` method whose default implementation returns `getOperand(0)` and makes WebAssembly override it to use its own utility method to get the callee operand. This also fixes an existing bug in `WebAssembly::getCalleeOp`, which was uncovered by this CL. Reviewed By: dschuff, djtodoro Differential Revision: https://reviews.llvm.org/D102978	2021-05-26 11:43:59 -07:00
Stanislav Mekhanoshin	5e2facb922	[AMDGPU] Fix kernel LDS lowering for constants There is a trivial but severe bug in the recent code collecting LDS globals used by kernel. It aborts scan on the first constant without scanning further uses. That leads to LDS overallocation with multiple kernels in certain cases. Differential Revision: https://reviews.llvm.org/D103190	2021-05-26 11:34:50 -07:00
Dmitry Preobrazhensky	13c6568c6e	[AMDGPU][MC][GFX90A] Corrected DS_GWS opcodes Corrected DS_GWS opcodes to use even aligned registers. Differential Revision: https://reviews.llvm.org/D103185	2021-05-26 21:31:50 +03:00
Fangrui Song	73a1179535	[llvm-mc] Add -M to replace -riscv-no-aliases and -riscv-arch-reg-names In objdump, many targets support `-M no-aliases`. Instead of having a `-*-no-aliases` for each target when LLVM adds the support, it makes more sense to introduce objdump style `-M`. -riscv-arch-reg-names is removed. -riscv-no-aliases has too many uses and thus is retained for now. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D103004	2021-05-26 10:43:32 -07:00
Craig Topper	9065118b64	[RISCV] Optimize SEW=64 shifts by splat on RV32. SEW=64 shifts only use the low log2(64) bits of the shift amount. If we're splatting a 64-bit value in 2 parts, we can avoid splatting the upper bits and just let the low bits be sign extended. They won't be read anyway. For the purposes of SelectionDAG semantics of the generic ISD opcodes, if hi was non-zero or bit 31 of the low is 1, the shift was already undefined, so it should be OK to replace high with the sign extension of low. In order to be able to find the split i64 value before it becomes a stack operation, I added a new ISD opcode that will be expanded to the stack spill in PreprocessISelDAG. This new node is conceptually similar to BuildPairF64, but I expanded it earlier so that we could go through regular isel to get the right VLSE opcode for the LMUL. BuildPairF64 is expanded in a CustomInserter. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D102521	2021-05-26 10:23:32 -07:00
Craig Topper	b2c7ac874f	[RISCV] Don't propagate VL/VTYPE across inline assembly in the Insert VSETVLI pass. It's conceivable someone could put a vsetvli in inline assembly so it's safer to consider them as barriers. The alternative would be to trust that the user marks VL and VTYPE registers as clobbers of the inline assembly if they do that, but that seems error prone. I'm assuming inline assembly in vector code is going to be rare. Reviewed By: frasercrmck, HsiangKai Differential Revision: https://reviews.llvm.org/D103126	2021-05-26 09:56:20 -07:00
Craig Topper	1b47a3de48	[RISCV] Enable cross basic block aware vsetvli insertion This patch extends D102737 to allow VL/VTYPE changes to be taken into account before adding an explicit vsetvli. We do this by using a data flow analysis to propagate VL/VTYPE information from predecessors until we've determined a value for every value in the function. We use this information to determine if a vsetvli needs to be inserted before the first vector instruction in the block. Differential Revision: https://reviews.llvm.org/D102739	2021-05-26 09:25:42 -07:00
Sebastian Neubauer	ea91a8cbab	[AMDGPU][NFC] Remove non-existing function header	2021-05-26 18:20:33 +02:00
Jonas Paulsson	d058262b14	[SystemZ] Support i128 inline asm operands. Support virtual, physical and tied i128 register operands in inline assembly. i128 is not really supported on SystemZ: it is not a legal type, and such a value will generally be split into two i64 parts. There are, however, some instructions that require a pair of GPR64 registers contained in the GR128 reg class, which is untyped. For inline assembly operands, it proved to be very cumbersome to first follow the general behavior of splitting an i128 operand into two parts and then later rebuild the INLINEASM MI to have one GR128 register. Instead, some minor common code changes were made to SelectionDAGBuilder to only create one GR128 register part to begin with. In particular: - getNumRegisters() now has an optional parameter "RegisterVT" which is passed by AddInlineAsmOperands() and GetRegistersForValue(). - The bitcasting in GetRegistersForValue is not performed if RegVT is Untyped. - The RC for a tied use in AddInlineAsmOperands() is now computed either from the tied def (virtual register), or by getMinimalPhysRegClass() (physical register). - InstrEmitter.cpp:EmitCopyFromReg() has been fixed so that the register class (DstRC) can also be computed for an illegal type. In the SystemZ backend, getNumRegisters(), splitValueIntoRegisterParts() and joinRegisterPartsIntoValue() have been implemented to handle i128 operands. Differential Revision: https://reviews.llvm.org/D100788 Review: Ulrich Weigand	2021-05-26 10:08:32 -05:00
Anirudh Prasad	1bc0e857bf	[SystemZ][z/OS] Enable the AllowAtInName attribute for the HLASM dialect - Currently, LLVM supports symbols of the name "token1@token2". - "token2" is used to identify whether an appropriate symbol reference can be used for the symbol. - Now, if the symbol reference couldn't be found, the AsmParser usually emits an error, unless the backend is configured to accept the "@" in a symbol name - Thus, this patch aims to do that. It sets the `AllowAtInName` attribute in the SystemZ backend for the HLASM dialect. - Setting this attribute ensures that, if a particular symbol reference is found, it uses that. If it doesn't, and there exists an "@" in the symbol name, it will use that instead of explicitly erroring out. Reviewed By: uweigand Differential Revision: https://reviews.llvm.org/D103111	2021-05-26 10:49:57 -04:00
jweightma	fcd32d62c0	[AMDGPU] Fix function pointer argument bug in AMDGPU Propagate Attributes pass. This patch fixes a bug in the AMDGPU Propagate Attributes pass where a call instruction with a function pointer argument is identified as a user of the passed function, and illegally replaces the called function of the instruction with the function argument. For example, given functions f and g with appropriate types, the following illegal transformation could occur without this fix: call void @f(void ()* @g) --> call void @g(void ()* @g.1) The solution introduced in this patch is to prevent the cloning and substitution if the instruction's called function and the function which might be cloned do not match. Reviewed By: arsenm, madhur13490 Differential Revision: https://reviews.llvm.org/D101847	2021-05-26 16:40:15 +02:00
Anirudh Prasad	b37a2fcd8d	[SystemZ][z/OS] Validate symbol names for z/OS for printing without quotes - Currently, before printing a label in MCSymbol.cpp (MCSymbol::print), the current code "validates" the label that is to be printed. - If it fails the validation step, then it prints the label within double quotes. - However, the validation is provided as a virtual function in MCAsmInfo.h (i.e. isAcceptableChar() function). So we can override this for the AD_HLASM dialect in SystemZMCAsmInfo.cpp. Reviewed By: uweigand Differential Revision: https://reviews.llvm.org/D103091	2021-05-26 10:37:09 -04:00
Luo, Yuanke	4ed2b6cccd	[X86][AMX] Fix a bug on tile config. The previous code detected whether an MBB is the bottom block to determine whether it is a backedge of a loop. We should check the latch block instead of the bottom block, and we should check that the header and the bottom block are in the same loop. Differential Revision: https://reviews.llvm.org/D103145	2021-05-26 21:57:49 +08:00
Andrew Savonichev	8ac66d61ea	[AArch64] Generate LD1 for anyext i8 or i16 vector load The existing LD1 patterns do not cover cases where result type does not match the memory type. This happens when illegal vector types are extended and scalarized, for example: load <2 x i16>* %v2i16 is lowered into: // first element (v4i32 (insert_subvector (v2i32 (scalar_to_vector (load anyext from i16))))) // other elements (v4i32 (insert_vector_elt (i32 (load anyext from i16)) idx)) Before this patch these patterns were compiled into LDR + INS. Now they are compiled into LD1. The problem was reported in PR24820: LLVM Generates abysmal code in simple situation. Differential Revision: https://reviews.llvm.org/D102938	2021-05-26 14:44:21 +03:00
Simon Pilgrim	21aec4fdc5	[X86][SLM] Fix vector PSHUFB + variable shift resource/throughputs Match what's documented in the Intel AOM (+Agner) - PSHUFB xmm is really slow, and mmx/xmm vector shifts are half rate. Noticed while working to get the cost tables to more closely match llvm-mca analysis, in this case for shifts and truncations.	2021-05-26 11:14:21 +01:00
Mirko Brkusanin	9601849984	[AMDGPU][GlobalISel] Stop foldInsertEltToCmpSelect from changing reg banks This function can change regbank for registers which already have a selected bank. Depending on the instruction where these registers were used it can cause instruction selection to fail. Differential Revision: https://reviews.llvm.org/D98515	2021-05-26 11:57:41 +02:00
Mirko Brkusanin	7386ad4e9e	Revert "[AMDGPU][GlobalISel] Stop foldInsertEltToCmpSelect from changing reg banks" This reverts commit `18c5444702`.	2021-05-26 11:57:41 +02:00
Tim Northover	8c5ac18d71	AArch64: support post-indexed stores to bfloat types.	2021-05-26 10:35:52 +01:00
Simon Pilgrim	66978466ba	[X86][Atom] Fix vector variable shift resource/throughputs Match what's documented in the Intel AOM - the non-immediate variants of the PSLL/PSRA/PSRL* shift instructions require BOTH ports - this was being incorrectly modelled as EITHER port. Now that we can use in-order models in llvm-mca, the atom model is a good "worst case scenario" analysis for x86.	2021-05-26 10:30:59 +01:00
Roman Lebedev	8c86161a0b	[NFC][X86] clang-format X86TTIImpl::getInterleavedMemoryOpCostAVX2() I plan to make changes to it, and undoing formatting each time is not going to be fun.	2021-05-26 12:27:47 +03:00
David Green	2cf0e52b85	[ARM] Add patterns for vmulh Now that vmulh can be selected, this adds the MVE patterns to make it legal and generate instructions. Differential Revision: https://reviews.llvm.org/D88011	2021-05-26 09:22:12 +01:00
Krzysztof Parzyszek	6a2869cf1e	[Hexagon] Remove unused function from HexagonISelDAGToDAGHVX.cpp It will be reintroduced shortly with an actual use. This change is simply to eliminate a compilation warning.	2021-05-25 14:47:15 -05:00
Stanislav Mekhanoshin	3975e3277f	[AMDGPU] Fix unused variable warning. NFC.	2021-05-25 12:32:28 -07:00
Stanislav Mekhanoshin	8de4db697f	[AMDGPU] Lower kernel LDS into a sorted structure Differential Revision: https://reviews.llvm.org/D102954	2021-05-25 11:29:29 -07:00
Krzysztof Parzyszek	e7c839b192	[Hexagon] Improve argument packing in vector shuffle selection	2021-05-25 12:48:14 -05:00
Mirko Brkusanin	18c5444702	[AMDGPU][GlobalISel] Stop foldInsertEltToCmpSelect from changing reg banks This function can change regbank for registers which already have a selected bank. Depending on the instruction where these registers were used it can cause instruction selection to fail.	2021-05-25 19:34:09 +02:00
Jinsong Ji	882e4cbd74	[AIX][AsmPrinter] Print Symbol in comments for TOC load We are using TOCEntry symbols like `LC..0` in TOC loads; this is hard to read, as it requires at least one additional step to figure out the loaded symbols. We should print out the name in comments. Reviewed By: #powerpc, shchenz Differential Revision: https://reviews.llvm.org/D102949	2021-05-25 16:37:40 +00:00
Simon Pilgrim	57250f2f3c	[X86][Atom] Fix vector PSHUFB resource/throughputs Match what's documented in the Intel AOM - the XMM variant of PSHUFB requires BOTH ports - this was being incorrectly modelled as EITHER port. Now that we can use in-order models in llvm-mca, the atom model is a good "worst case scenario" analysis for x86.	2021-05-25 17:31:45 +01:00
Simon Pilgrim	def6269779	[CostModel][X86] Improve accuracy of 256-bit non-uniform vector shifts on AVX1 Determined from llvm-mca analysis, AVX1 capable targets have a higher throughput for VPBLENDVB and shuffle ops, making it cheaper to perform shift+shuffle/select shift patterns.	2021-05-25 17:31:45 +01:00
Jonas Paulsson	e77cb4ae63	[SystemZ] Return true from preferZeroCompareBranch(). Review: Ulrich Weigand Differential Revision: https://reviews.llvm.org/D103057	2021-05-25 10:24:14 -05:00
Yonghong Song	6a2ea84600	BPF: Add more relocation kinds Currently, BPF only contains three relocations: R_BPF_NONE for no relocation, R_BPF_64_64 for LD_imm64 and normal 64-bit data relocations, and R_BPF_64_32 for call insn and normal 32-bit data relocations. Also, the .BTF and .BTF.ext sections contain symbols in allocated program and data sections. These two sections reserve 32-bit space to hold the offset relative to the symbol's section. When LLVM JIT is used, the LLVM ExecutionEngine RuntimeDyld may attempt to resolve relocations for .BTF and .BTF.ext, which we want to prevent, so we used R_BPF_NONE for such relocations. This all works fine until we try to link multiple objects: (1) R_BPF_64_64 handling of LD_imm64 vs. normal 64-bit data is different, so lld target->relocate() needs more context to do a correct job. (2) The same holds for R_BPF_64_32: more context is needed for lld target->relocate() to differentiate a call insn from a normal 32-bit data relocation. (3) Since relocations in .BTF and .BTF.ext are set to R_BPF_NONE, they will not be relocated properly when multiple .BTF/.BTF.ext sections are merged by lld. This patch addresses these issues by adding relocation kinds: R_BPF_64_ABS64 for normal 64-bit data relocations, R_BPF_64_ABS32 for normal 32-bit data relocations, and R_BPF_64_NODYLD32 for .BTF and .BTF.ext style relocations. The old R_BPF_64_{64,32} kinds now have narrower semantics: R_BPF_64_64 for LD_imm64 relocations and R_BPF_64_32 for call insn relocations. The existing mapping of R_BPF_64_64/R_BPF_64_32 to numeric values is maintained; they are the most common use cases for BPF programs and we want to maintain backward compatibility as much as possible. ExecutionEngine RuntimeDyld BPF relocations are adjusted as well: R_BPF_64_{ABS64,ABS32} relocations will be resolved properly and other relocations will be ignored. Two tests are added for RuntimeDyld. Not handling R_BPF_64_NODYLD32 in RuntimeDyldELF.cpp would result in a "Relocation type not implemented yet!" fatal error.
FK_SecRel_4 usages in BPFAsmBackend.cpp and BPFELFObjectWriter.cpp are removed as they are not triggered in the BPF backend. The BPF backend uses FK_SecRel_8 for LD_imm64 instruction operands. Differential Revision: https://reviews.llvm.org/D102712	2021-05-25 08:19:13 -07:00
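The sketch below shows the new scheme in miniature. It picks a relocation kind from the fixup's context, and it shows how a RuntimeDyld-style resolver would treat each kind. The `FixupContext` enum and the `selectBpfReloc`/`runtimeDyldResolves` helpers are hypothetical names. They are not LLVM's BPFELFObjectWriter or RuntimeDyldELF code, and numeric ELF values are deliberately left out.

```cpp
#include <cassert>

// Hypothetical description of what a BPF fixup refers to.
enum class FixupContext { LdImm64, CallInsn, Data64, Data32, BtfOffset32 };

// Relocation kinds named in the commit message (numeric values omitted).
enum class BpfReloc {
  R_BPF_NONE,
  R_BPF_64_64,
  R_BPF_64_32,
  R_BPF_64_ABS64,
  R_BPF_64_ABS32,
  R_BPF_64_NODYLD32
};

// Pick a relocation kind from the fixup's context, following the new scheme.
BpfReloc selectBpfReloc(FixupContext Ctx) {
  switch (Ctx) {
  case FixupContext::LdImm64:     return BpfReloc::R_BPF_64_64;       // LD_imm64 only
  case FixupContext::CallInsn:    return BpfReloc::R_BPF_64_32;       // call insn only
  case FixupContext::Data64:      return BpfReloc::R_BPF_64_ABS64;    // plain 64-bit data
  case FixupContext::Data32:      return BpfReloc::R_BPF_64_ABS32;    // plain 32-bit data
  case FixupContext::BtfOffset32: return BpfReloc::R_BPF_64_NODYLD32; // .BTF/.BTF.ext offsets
  }
  return BpfReloc::R_BPF_NONE;
}

// Per the message, RuntimeDyld resolves only the ABS kinds and ignores the rest.
bool runtimeDyldResolves(BpfReloc R) {
  return R == BpfReloc::R_BPF_64_ABS64 || R == BpfReloc::R_BPF_64_ABS32;
}
```

Keeping R_BPF_64_NODYLD32 separate from R_BPF_64_ABS32 lets lld relocate .BTF offsets when sections are merged. A JIT can still skip them.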
Joe Nash	b67ea3d0c9	[AMDGPU] Allow no-modifier operands in cvtDPP NFC, since no instructions have their AsmMatchConverter changed, but prepares for that to happen. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D103046 Change-Id: I6afefad899076de7b9a412374d09b95b29e012fa	2021-05-25 10:58:06 -04:00
Simon Pilgrim	c909ddddda	[CostModel][X86] Improve accuracy of vXi64 vector non-uniform shift costs on AVX2+ targets rG1ad4f887bd7692a9e63fb42586f0ece366f2fe01 incorrectly assumed that vXi64 non-uniform shifts were as slow as vXi32 shifts, but llvm-mca (and Agner) both confirm that Haswell/Broadwell run them at full rate.	2021-05-25 15:58:23 +01:00
Joe Nash	67c3707b31	[AMDGPU] More accurate names for dpp operand types NFC. Renames the variable in the dpp input operand generators from DstRC to OldRC, because that is what it actually sets. Also documents the importance of setting HasModifiers = 0 in the dpp8 asm string. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D103047 Change-Id: Ice69ae38f644de7f228a75ca47c43e88b1f7d9e1	2021-05-25 10:35:25 -04:00
Bradley Smith	f3c577ed38	[AArch64][SVE] Add fixed length codegen for FP_TO_{S,U}INT/{S,U}INT_TO_FP Depends on D102607 Differential Revision: https://reviews.llvm.org/D102777	2021-05-25 12:54:55 +01:00
Simon Pilgrim	68ef68f8ac	[CostModel][X86] Improve accuracy of vXi8/vXi16 vector non-uniform shift costs on AVX2/AVX512 targets Determined from llvm-mca analysis, AVX2+ capable targets have a higher throughput for VPBLENDVB and VPMOVZX ops, making it cheaper to perform shift+select patterns for vXi8 shifts or extend/shift/truncate for vXi16 shifts. Similarly AVX512BW can perform vXi8 as extend/shift/truncate patterns.	2021-05-25 11:35:57 +01:00
Christudasan Devadasan	e3b8e6d482	[AMDGPU] Remove dead declaration (NFC).	2021-05-25 16:04:04 +05:30
Kristina Bessonova	44843e2a04	[ARM][NEON] Combine base address updates for vld1x intrinsics Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D102855	2021-05-25 11:06:39 +02:00
David Spickett	0cd2629d97	[llvm][ARM] Remove non-existent arm1176j-s CPU This was removed in https://reviews.llvm.org/D52594 for clang. The one test using it has been updated to use the mpcore CPU as the linked clang change does. This is part of fixing https://bugs.llvm.org/show_bug.cgi?id=50454. Reviewed By: peter.smith Differential Revision: https://reviews.llvm.org/D103022	2021-05-25 08:56:55 +00:00
Ben Shi	bf77317049	[RISCV] Optimize xor/or with immediate in the zbs extension Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D102893	2021-05-25 14:14:09 +08:00
Christudasan Devadasan	90d784053f	AMDGPU/GlobalISel: Legalize G_[SU]DIVREM instructions Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D100726	2021-05-25 10:51:07 +05:30
Craig Topper	b510e4cf1b	[RISCV] Add a vsetvli insert pass that can be extended to be aware of incoming VL/VTYPE from other basic blocks. This is a replacement for D101938 for inserting vsetvli instructions where needed. This new version changes how we track the information in such a way that we can extend it to be aware of VL/VTYPE changes in other blocks. Given how much it changes the previous patch, I've decided to abandon the previous patch and post this from scratch. For now the pass consists of a single phase that assumes the incoming state from other basic blocks is unknown. A follow up patch will extend this with a phase to collect information about how VL/VTYPE change in each block and a second phase to propagate this information to the entire function. This will be used by a third phase to do the vsetvli insertion. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D102737	2021-05-24 11:47:27 -07:00
Heejin Ahn	a64ebb8637	[WebAssembly] Add NullifyDebugValueLists pass `WebAssemblyDebugValueManager` does not currently handle `DBG_VALUE_LIST`, which is a recent addition to LLVM. We tried to nullify them within the constructor of `WebAssemblyDebugValueManager` in D102589, but it made the class error-prone to use because it deletes instructions within the constructor and thus invalidates existing iterators within the BB, so the user of the class should take special care not to use invalidated iterators. This actually caused a bug in ExplicitLocals pass. Instead of trying to fix ExplicitLocals pass to make the iterator usage correct, which is possible but error-prone, this adds NullifyDebugValueLists pass that nullifies all `DBG_VALUE_LIST` instructions before we run WebAssembly specific passes in the backend. We can remove this pass after we implement handlers for `DBG_VALUE_LIST`s in `WebAssemblyDebugValueManager` and elsewhere. Fixes https://github.com/emscripten-core/emscripten/issues/14255. Reviewed By: dschuff Differential Revision: https://reviews.llvm.org/D102999	2021-05-24 11:36:01 -07:00
Craig Topper	3c0735c6d8	[X86] Call insertDAGNode on trunc/zext created in tryShiftAmountMod. This puts the new nodes in the proper place in the topologically sorted list of nodes. Fixes PR50431, which was introduced recently in D101944.	2021-05-24 10:23:22 -07:00
Roman Lebedev	c666208f63	[X86][Costmodel] getMaskedMemoryOpCost(): don't scalarize non-power-of-two vectors with legal element type This follows in the footsteps of similar `getMemoryOpCost()` changes, D100099/D100684. Intel SDM, `VPMASKMOV — Conditional SIMD Integer Packed Loads and Stores`: ``` Faults occur only due to mask-bit required memory accesses that caused the faults. Faults will not occur due to referencing any memory location if the corresponding mask bit for that memory location is 0. For example, no faults will be detected if the mask bits are all zero. ``` I.e., if the mask is all zeros, any address is fine. The prime use case of masked loads/stores is, e.g., tail-masking the loop remainder, where in the last iteration only the first few elements of a vector exist. So, similarly, I don't see why we must scalarize non-power-of-two vectors as long as the element type is something we can masked-store/load. We simply need to legalize it, widen the mask, and be done with it. And we already count the cost of widening the mask. Reviewed By: ABataev Differential Revision: https://reviews.llvm.org/D102990	2021-05-24 20:09:54 +03:00
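The mask widening mentioned above can be sketched as padding the mask with zero bits up to the next power-of-two width. According to the SDM text quoted above, lanes whose mask bit is 0 never fault, so the padding is harmless. `nextPow2` and `widenMask` are hypothetical helpers for illustration. They are not the X86 cost-model code, which only estimates the cost of this step.

```cpp
#include <cassert>
#include <vector>

// Smallest power of two >= N (returns 1 for N == 0).
unsigned nextPow2(unsigned N) {
  unsigned P = 1;
  while (P < N)
    P <<= 1;
  return P;
}

// Widen the mask of an N-element masked load/store to a power-of-two width.
// Padding lanes get mask bit 0, so they perform no memory access.
std::vector<bool> widenMask(const std::vector<bool> &Mask) {
  std::vector<bool> Wide(Mask);
  Wide.resize(nextPow2(static_cast<unsigned>(Mask.size())), false);
  return Wide;
}
```

For a 3-element tail the widened mask is `{1, 1, 1, 0}`. The extra lane is never touched, which is why scalarizing is unnecessary.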
luxufan	d70e9195a3	[RISCV] Optimize getVLENFactoredAmount function. If isPowerOf2_32(NumOfVReg - 1) or isPowerOf2_32(NumOfVReg + 1) holds for the local variable `NumOfVReg`, the ADDI and MUL instructions can be replaced with SLLI and ADD (or SUB) instructions. Based on original patch by StephenFan. Reviewed By: frasercrmck, StephenFan Differential Revision: https://reviews.llvm.org/D100577	2021-05-24 10:04:37 -07:00
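This change rests on two identities: x * (2^k + 1) = (x << k) + x and x * (2^k - 1) = (x << k) - x. Below is a minimal sketch, assuming a plain integer `VLENB` stands in for the register holding the vector register size in bytes. `scaleByNumOfVReg` is a hypothetical helper that models the arithmetic only, not the instructions the RISC-V backend emits.

```cpp
#include <cassert>
#include <cstdint>

// True if X is a nonzero power of two (same intent as isPowerOf2_32).
bool isPow2(uint32_t X) { return X != 0 && (X & (X - 1)) == 0; }

// log2 of a power of two.
unsigned log2Pow2(uint32_t X) {
  unsigned K = 0;
  while (X > 1) {
    X >>= 1;
    ++K;
  }
  return K;
}

// Compute VLENB * NumOfVReg. Use shift+add or shift+sub instead of a multiply
// when NumOfVReg is 2^k + 1 or 2^k - 1 (hypothetical helper).
uint64_t scaleByNumOfVReg(uint64_t VLENB, uint32_t NumOfVReg) {
  if (isPow2(NumOfVReg - 1)) // e.g. 3, 5, 9: models SLLI + ADD
    return (VLENB << log2Pow2(NumOfVReg - 1)) + VLENB;
  if (isPow2(NumOfVReg + 1)) // e.g. 7, 15: models SLLI + SUB
    return (VLENB << log2Pow2(NumOfVReg + 1)) - VLENB;
  return VLENB * NumOfVReg; // otherwise: models ADDI + MUL
}
```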
Simon Pilgrim	dcaca7206e	[CostModel][X86] Add missing SSE41 v2iX sext/zext costs Also fix existing v4i8->v4i16 sext cost to match the equivalents	2021-05-24 15:53:43 +01:00
thomasraoux	505933a489	[NVPTX] Fix lowering of frem for negative values to match fmod The frem result must have the sign of the dividend. The previous implementation produced the wrong sign for negative inputs; for example, frem(-16, 7) returned 5 instead of -2. The lowering should use ftrunc instead of floor to get the right behavior. Differential Revision: https://reviews.llvm.org/D102528	2021-05-24 07:45:03 -07:00
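The sign difference is easy to reproduce with the two remainder formulas. This sketch uses `double` and the naive expression x - trunc(x / y) * y. That expression can lose precision for large quotients and is not bit-exact with `fmod` in general. It only illustrates, with the values from the commit message, why floor gives the divisor's sign while trunc keeps the dividend's. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Remainder via trunc: the result keeps the sign of the dividend, like fmod.
double fremTrunc(double X, double Y) { return X - std::trunc(X / Y) * Y; }

// Remainder via floor (the old lowering): the result takes the divisor's sign.
double fremFloor(double X, double Y) { return X - std::floor(X / Y) * Y; }
```

For frem(-16, 7), the quotient is about -2.29. trunc gives -2, so the remainder is -16 + 14 = -2. floor gives -3, so the remainder is -16 + 21 = 5.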
Simon Pilgrim	1ad4f887bd	[CostModel][X86] Improve accuracy of vector non-uniform shift costs on XOP/AVX2 targets According to llvm-mca analysis, among the AVX2 targets Haswell/Broadwell has a non-uniform vector shift recip-throughput cost of 2 for both 128-bit and 256-bit vectors. XOP-capable targets have better 128-bit vector shifts, so improve the fallback in those cases.	2021-05-24 14:18:21 +01:00
Bradley Smith	e40513252a	[AArch64][SVE] Add fixed length codegen for FP_ROUND/FP_EXTEND Depends on D102498 Differential Revision: https://reviews.llvm.org/D102607	2021-05-24 13:02:30 +01:00
Bradley Smith	4bc14be259	[AArch64][SVE] Improve codegen for fixed length vector concat Differential Revision: https://reviews.llvm.org/D102498	2021-05-24 12:56:02 +01:00
David Green	543406a69b	[ARM] Allow findLoopPreheader to return headers with multiple loop successors The findLoopPreheader function currently does not find a preheader if it branches to multiple different loop headers. This patch adds an option to relax that, allowing ARMLowOverheadLoops to process more loops successfully. This helps with WhileLoopStart setup instructions that can branch/fall through to the low overhead loop and branch to a separate loop from the same preheader (but I don't believe it is possible for both loops to be low overhead loops). Differential Revision: https://reviews.llvm.org/D102747	2021-05-24 12:22:15 +01:00
David Green	53c42f7700	[ARM] Ensure WLS preheader blocks have branches during memcpy lowering This makes sure that the blocks created for lowering memcpy to loops end up with branches, even if they fall through to the successor. Otherwise IfCvt is getting confused with unanalyzable branches and creating invalid block layouts. The extra branches should be removed as the tail predicated loop is finalized in almost all cases.	2021-05-24 11:26:45 +01:00
David Green	6cc78b9245	[ARM] Fix inline memcpy trip count sequence The trip count for a memcpy/memset will be n/16 rounded up to the nearest integer, i.e. (n+15)>>4. The old code also included a BIC to clear one of the bits, which does not seem correct. This removes the extra BIC. Note that ideally this would never actually be generated: when creating a tail predicated loop, we DCE that setup code and let the WLSTP perform the trip count calculation. So this doesn't usually come up in testing (and apparently the ARMLowOverheadLoops pass does not do any sort of validation on the trip count). The incorrect BIC instructions are used only if the generation of the WLSTP fails. Differential Revision: https://reviews.llvm.org/D102629	2021-05-24 11:01:58 +01:00
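The corrected trip count is ceiling division by the 16-byte chunk size. Below is a minimal sketch of the arithmetic only. The helper name is hypothetical. Widening to 64 bits guards against `n + 15` wrapping; the message does not describe that guard.

```cpp
#include <cassert>
#include <cstdint>

// Number of 16-byte iterations needed to cover N bytes: ceil(N / 16) = (N + 15) >> 4.
// The addition is done in 64 bits so that N close to UINT32_MAX does not wrap.
uint32_t memcpyTripCount(uint32_t N) {
  return static_cast<uint32_t>((static_cast<uint64_t>(N) + 15) >> 4);
}
```

For example, 17 bytes need two iterations. The second one is a partial, predicated iteration.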
Fraser Cormack	7a211ed110	[RISCV] Prevent store combining from infinitely looping RVV code generation does not successfully custom-lower BUILD_VECTOR in all cases. When it resorts to default expansion, the BUILD_VECTOR may, on occasion, be expanded to scalar stores through the stack. Unfortunately, these stores may then be picked up by the post-legalization DAGCombiner, which merges them again. The merged store uses a BUILD_VECTOR which is then expanded, and so on. This patch addresses the issue by overriding the `mergeStoresAfterLegalization` hook. A lack of granularity in this method (it is passed only the scalar type) means we opt out in almost all cases when RVV fixed-length vector support is enabled. The only exception to this rule is mask vectors, which are always either custom-lowered or expanded to a load from a constant pool. Reviewed By: HsiangKai Differential Revision: https://reviews.llvm.org/D102913	2021-05-24 10:19:32 +01:00
Simon Pilgrim	243e588681	[CostModel][X86] Improve accuracy of vXi64 MUL costs on AVX2/AVX512 targets According to llvm-mca analysis, Haswell/Broadwell has the worst v4i64 recip-throughput cost of the AVX2 targets at 6 (vs the currently used cost of 8). Similarly, SkylakeServer (our only AVX512 target model) implements PMULLQ with an average cost of 1.5 (rounded up to 2.0), and the PMULUDQ sequence (without AVX512DQ) with a cost of 6.	2021-05-24 09:48:32 +01:00
Fangrui Song	249b40b558	[AArch64] Delete unneeded fixup_aarch64_ldr_pcrel_imm19 VK_GOT special case An AArch64 VK_GOT fixup must have a symbol. MCAssembler::evaluateFixup considers such a fixup not resolved. The code path cannot trigger.	2021-05-23 15:20:56 -07:00
Fangrui Song	fc82507c89	[AArch64][MC] Remove unneeded "in .xxx directive" from diagnostics The prevailing style does not add the message. The directive name is not useful because the next line replicates the error line which includes the directive.	2021-05-23 13:58:16 -07:00
Joerg Sonnenberger	30c413cda0	[SPARC] recognize the "rd %pc, reg" special form Differential Revision: https://reviews.llvm.org/D96312	2021-05-23 22:52:59 +02:00
Fangrui Song	ff8be66c02	[AArch64] Use \t in AsmStreamer to match the prevailing style	2021-05-23 11:35:42 -07:00
Simon Pilgrim	e4ec5cc8eb	[CostModel][X86] Align v2i64 MUL costs on SSE42+ targets with worst case Based on llvm-mca analysis of the worst case, sandybridge (which seems to match nehalem for this SSE sequence), vs btver2 + bdver2.	2021-05-23 16:20:57 +01:00
David Green	edc2dca405	[ARM] Add extra debug messages for gather/scatter lowering. NFC	2021-05-23 08:52:13 +01:00
Simon Pilgrim	fc01b9bdf8	[CostModel][X86] Align v4i64 MUL costs on AVX1 targets with worst case Based on llvm-mca analysis of the worst case, sandybridge (vs btver2 + bdver2). This cost is a lot less than what we were predicting (I think the old prediction was based on the total uop count).	2021-05-22 20:07:55 +01:00
Simon Pilgrim	6f9ac11e39	[CostModel][X86] Pull out X86/X64 scalar int arithmetic costs from SSE tables. NFCI. These aren't dependent on any SSE level (and don't tend to get quicker either).	2021-05-22 16:13:49 +01:00
Simon Pilgrim	7a898477bb	[CostModel][X86] vXi8 MUL is always promoted to vXi16	2021-05-22 11:56:49 +01:00
Simon Pilgrim	9bd0dc83b5	[CostModel][X86] Improve v8i32 MUL costs on AVX1 targets to account for slower btver2 BTVER2 has a 2 cycle throughput for v4i32 multiplies (same as SSE41 targets), which is only partially hidden by the subvector extracts/insert when splitting v8i32.	2021-05-22 11:13:07 +01:00
Roman Lebedev	8ed0864fd7	Reland [X86] X86TTIImpl::getInterleavedMemoryOpCostAVX2(): use getMemoryOpCost() Now that getMemoryOpCost() correctly handles all the vector variants, we should no longer hand-roll our own version of it, but use it directly. The AVX512 variant probably needs a similar change, but there it is less obvious. This was initially landed in `69ed93a435`, but was reverted in `6b95fd199d` because the patch it depends on was reverted.	2021-05-22 11:47:08 +03:00
Roman Lebedev	05a4e4a89c	Reland [X86][CostModel] X86TTIImpl::getMemoryOpCost(): rewrite vector handling again Instead of handling power-of-two sized vector chunks, try handling the large vector in a stream mode, decreasing the operational vector size once it no longer works for the elements left to process. Notably, this improves costs for overaligned loads - loading padding is fine. This more directly tracks when we need to insert/extract the YMM/XMM subvector, some costs fluctuate because of that. This was initially landed in `c02476f315`, but reverted in `5fddc3312b`, because the code made some very optimistic assumptions about invariants that didn't hold in practice. Reviewed By: RKSimon, ABataev Differential Revision: https://reviews.llvm.org/D100684	2021-05-22 11:46:32 +03:00
Nick Desaulniers	033138ea45	[IR] make stack-protector-guard-* flags into module attrs D88631 added initial support for: - -mstack-protector-guard= - -mstack-protector-guard-reg= - -mstack-protector-guard-offset= flags, and D100919 extended these to AArch64. Unfortunately, these flags aren't retained for LTO. Make them module attributes rather than TargetOptions. Link: https://github.com/ClangBuiltLinux/linux/issues/1378 Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D102742	2021-05-21 15:53:30 -07:00
Saleem Abdulrasool	6c6b3e3afe	RISCV: add a few deprecated aliases for CSRs This adds the {s,u,m}badaddr CSR aliases as well as the sptbr alias. These are for compatibility with binutils. Furthermore, these are used by the RISC-V Proxy Kernel and are required to enable building the Proxy Kernel with the LLVM IAS. The aliases here are deprecated. They are being introduced to provide a compatibility story for the existing GNU toolchain, which still supports the deprecated spellings in the assembler. However, to encourage the migration of existing code, we emit warnings indicating that the aliased CSRs are deprecated and should be replaced. Differential Revision: https://reviews.llvm.org/D101919 Reviewed By: Craig Topper	2021-05-21 13:52:58 -07:00
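For reference, the deprecated spellings are old names for CSRs that were renamed in the RISC-V privileged specification: {u,s,m}badaddr became {u,s,m}tval, and sptbr became satp. Below is a minimal sketch of alias resolution that reports deprecation so a caller can warn. The table and `resolveCSR` are hypothetical; LLVM's TableGen-generated CSR tables are not structured this way.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Deprecated CSR spellings mapped to the current names they alias
// (hypothetical table for illustration).
const std::map<std::string, std::string> DeprecatedCSRAliases = {
    {"ubadaddr", "utval"}, {"sbadaddr", "stval"},
    {"mbadaddr", "mtval"}, {"sptbr", "satp"}};

// Resolve a CSR name. The bool is true when a deprecated spelling was used,
// so the caller can emit a deprecation warning.
std::pair<std::string, bool> resolveCSR(const std::string &Name) {
  auto It = DeprecatedCSRAliases.find(Name);
  if (It != DeprecatedCSRAliases.end())
    return {It->second, true};
  return {Name, false};
}
```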
Simon Pilgrim	fe6c11c571	[CostModel][X86] Improve f64/v2f64/v4f64 FMUL costs on AVX1 targets to account for slower btver2 BTVER2 has a weaker f64 multiplier than other AVX1-era targets, so we need to bump the worst-case cost slightly. llvm-mca reports that the new vectorization in simplebb is beneficial on btver2, bdver2 and sandybridge AVX1 targets.	2021-05-21 18:12:13 +01:00
Benjamin Kramer	ea438b4898	[X86] Inline variable to avoid unused warning in Release builds. NFCI.	2021-05-21 18:28:46 +02:00
Simon Pilgrim	2fca555866	[CostModel][X86] Improve fneg costs These are always lowered as xor ops, so are always cheap	2021-05-21 17:23:45 +01:00
Florian Hahn	c2d44bd230	[X86] Lower calls with clang.arc.attachedcall bundle This patch adds support for lowering function calls with the `clang.arc.attachedcall` bundle. The goal is to expand such calls to the following sequence of instructions: callq @fn movq %rax, %rdi callq _objc_retainAutoreleasedReturnValue / _objc_unsafeClaimAutoreleasedReturnValue This sequence of instructions triggers Objective-C runtime optimizations, hence we want to ensure no instructions get moved in between them. This patch achieves that by adding a new CALL_RVMARKER ISD node, which gets turned into the CALL64_RVMARKER pseudo, which eventually gets expanded into the sequence mentioned above. The ObjC runtime function to call is determined by the argument in the bundle, which is passed through as a target constant to the pseudo. @ahatanak is working on using this attribute in the front- & middle-end. Together with the front- & middle-end changes, this should address PR31925 for X86. This is the X86 version of `46bc40e502`, which added similar support for AArch64. Reviewed By: ab Differential Revision: https://reviews.llvm.org/D94597	2021-05-21 16:33:58 +01:00
Jim Lin	4456805938	[X86] Don't fold (fneg (fma (fneg X), Y, (fneg Z))) to (fma X, Y, Z) Check if it has no signed zeros flag (nsz) in getNegatedExpression for x86. This patch fixed miscompilation: https://alive2.llvm.org/ce/z/XxwBAJ Reviewed By: RKSimon, spatel Differential Revision: https://reviews.llvm.org/D90901	2021-05-21 23:02:19 +08:00
Daniil Fukalov	e1cb98be2d	[TTI] NFC: Change getCostOfKeepingLiveOverCall to return InstructionCost. This patch migrates the TTI cost interfaces to return an InstructionCost. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D102831	2021-05-21 15:18:12 +03:00
Simon Pilgrim	3ae7f7ae0a	[CostModel][X86] Tweak fptoui v4f32->v4i32 + v8f32->v8i32 SSE/AVX costs Adjust for worst case for atom/slm (SSE), btver2/sandybridge (AVX1) and haswell/znver* (AVX2)	2021-05-21 12:09:31 +01:00
Simon Pilgrim	4865ed3020	[CostModel][X86] Match SSE41 legalized conversion costs as well as SSE2	2021-05-21 11:42:22 +01:00
Simon Pilgrim	eb6429d0fb	[CostModel][X86] Add uitofp v4f32->v4i32 + v8f32->v8i32 SSE/AVX costs These were using (default) scalarized values.	2021-05-21 11:30:15 +01:00
Luke Benes	e2815398ce	Fix warning: comparison of integer expressions of different signedness. NFC This patch resolves the Wsign-compare warning that I observed on armv7l and x86 with both gcc and clang. Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D102792	2021-05-21 18:23:27 +08:00
David Green	e7a6df68a6	[ARM] Fix the operand used for WLS in ARMLowOverheadLoops The loop start instructions handled by ARMLowOverheadLoops are: $lr = t2DoLoopStart $r0 $lr = t2DoLoopStartTP $r1, $r0 $lr = t2WhileLoopStartLR $r0, %bb, implicit-def dead $cpsr All three of these have LR as operand 0 and the trip count as operand 1. This patch updates a few places in ARMLowOverheadLoops where operand 0 was being used as the trip count for t2WhileLoopStartLR instructions. One place was removed entirely, as it no longer seems valid: the case it was trying to protect against should not be able to occur with our correct-by-construction low overhead loops. Differential Revision: https://reviews.llvm.org/D102620	2021-05-21 09:29:30 +01:00
Stanislav Mekhanoshin	4902885863	[AMDGPU] Request module used variables from LDS lowering as internal I do not see any practical difference but technically used.* variables are internal and a call to getGlobalVariable misses true as a second argument. NFC as far as I can tell. Differential Revision: https://reviews.llvm.org/D102884	2021-05-20 20:55:47 -07:00
Stanislav Mekhanoshin	748db5bfac	[AMDGPU] Fix module LDS selection Accesses to global module LDS variable start from null, but kernel also thinks its variables start address is null. Fixed by not using a null as an address. Differential Revision: https://reviews.llvm.org/D102882	2021-05-20 15:59:01 -07:00
Min-Yih Hsu	dccf5c7dfb	[M68k] Support for inline asm operands w/ simple constraints This patch adds support for inline assembly operands and some simple operand constraints, including register and constant operands. Differential Revision: https://reviews.llvm.org/D102585	2021-05-20 14:00:09 -07:00
Min-Yih Hsu	e620bea211	[M68k] Allow user to preserve certain registers Add `-ffixed-a[0-6]` and `-ffixed-d[0-7]` and the corresponding subtarget features to prevent certain registers from being allocated. Differential Revision: https://reviews.llvm.org/D102805	2021-05-20 13:57:22 -07:00
Jessica Clarke	e10958c807	[SelectionDAG][Mips][PowerPC][RISCV][WebAssembly] Teach computeKnownBits/ComputeNumSignBits about atomics Unlike normal loads these don't have an extension field, but we know from TargetLowering whether these are sign-extending or zero-extending, and so can optimise away unnecessary extensions. This was noticed on RISC-V, where sign extensions in the calling convention would result in unnecessary explicit extension instructions, but this also fixes some Mips inefficiencies. PowerPC sees churn in the tests as all the zero extensions are only for promoting 32-bit to 64-bit, but these zero extensions are still not optimised away as they should be, likely due to i32 being a legal type. This also simplifies the WebAssembly code somewhat, which currently works around the lack of target-independent combines with some ugly patterns that break once they're optimised away. Re-landed with correct handling in ComputeNumSignBits for Tmp == VTBits, where zero-extending atomics were incorrectly returning 0 rather than the (slightly confusing) required return value of 1. Re-landed again after D102819 fixed PowerPC to correctly zero-extend all of its atomics as it claimed to do, since the combination of that bug and this optimisation caused buildbot regressions. Reviewed By: RKSimon, atanasyan Differential Revision: https://reviews.llvm.org/D101342	2021-05-20 20:34:23 +01:00
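The Tmp == VTBits corner case can be modelled with a small function. This is a hypothetical sketch of the reasoning, not SelectionDAG's ComputeNumSignBits:

- A sign-extending atomic load from MemBits to VTBits bits has VTBits - MemBits + 1 known sign bits.
- A zero-extending one has VTBits - MemBits leading zeros.
- When no widening happens, the answer must fall back to 1, because every value has at least one sign bit.

```cpp
#include <cassert>

// How an atomic load extends its MemBits-wide value to VTBits bits
// (hypothetical enum for this sketch).
enum class AtomicExt { Sign, Zero, Any };

unsigned atomicNumSignBits(unsigned VTBits, unsigned MemBits, AtomicExt Ext) {
  switch (Ext) {
  case AtomicExt::Sign:
    // Bits MemBits-1 .. VTBits-1 are all copies of the loaded sign bit.
    return VTBits - MemBits + 1;
  case AtomicExt::Zero:
    // The top VTBits - MemBits bits are known zero. With no widening
    // (MemBits == VTBits) that would be 0, which is never a valid answer.
    return MemBits == VTBits ? 1 : VTBits - MemBits;
  case AtomicExt::Any:
    return 1; // Nothing is known about the upper bits.
  }
  return 1;
}
```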
Fraser Cormack	c74ab891fc	[RISCV] Ensure small mask BUILD_VECTORs aren't expanded The default expansion for BUILD_VECTORs -- save for going through shuffles -- is to go through the stack. This method only works when the type is at least byte-sized, so for v2i1 and v4i1 we would crash. This patch ensures that small mask-type BUILD_VECTORs are always handled without crashing. We lower to a SETCC of the equivalent i8 type. This also exposes some pre-existing issues where the lowering when optimizing for size results in larger code than without. Those will be tackled in future patches. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D102767	2021-05-20 19:12:29 +01:00
Simon Pilgrim	a26288e803	[X86][Atom] Fix vector fadd/fcmp/fmul resource/throughputs Match what's documented in the Intel AOM: fadd/fcmp use Port1 and fmul uses Port1, but in many cases BOTH ports are required, and this was being incorrectly modelled as EITHER port. Discovered while investigating the correct fptoui costs to fix the regressions in D101555. Now that we can use in-order models in llvm-mca, the atom model is a good "worst case scenario" analysis for x86.	2021-05-20 18:56:58 +01:00
Stefan Pintilie	45ad207e45	[PowerPC] Add fix to partword atomic operations Partword atomic binaries are not zero extended as they should be. This patch fixes them to ensure that they are zero extended. Reviewed By: nemanjai, #powerpc Differential Revision: https://reviews.llvm.org/D102819	2021-05-20 12:36:37 -05:00
Fraser Cormack	26bd2250c1	[RISCV] Ensure shuffle splat operands are type-legal The use of `SelectionDAG::getSplatValue` isn't guaranteed to return a type-legal splat value as it may implicitly extract a vector element from another shuffle. It is not permitted to introduce an illegal type when lowering shuffles. This patch addresses the crash by adding a boolean flag to `getSplatValue`, defaulting to false, which when set will ensure a type-legal return value. If it is unable to do that it will fail to return a splat value. I've been through the existing uses of `getSplatValue` in other targets and was unable to find a need or test cases showing a need to update their uses. In some cases, the call is made during `LegalizeVectorOps` which may still produce illegal scalar types. In other situations, the illegally-typed splat value may be quickly patched up to a legal type (such as any-extending the returned `extract_vector_elt` up to a legal type) before `LegalizeDAG` notices. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D102687	2021-05-20 18:00:03 +01:00
Wouter van Oortmerssen	3a293cbf13	[WebAssembly] Fix PIC/GOT codegen for wasm64 __table_base is now 64-bit, since in LLVM it represents a function pointer offset. __table_base32 is a copy in wasm32 for use in the elem init expr, since no truncation may be used there. New reloc R_WASM_TABLE_INDEX_REL_SLEB64 added. Differential Revision: https://reviews.llvm.org/D101784	2021-05-20 09:59:31 -07:00
Peter Waller	2d574a1104	[CodeGen][AArch64][SVE] Canonicalize intrinsic rdffr{ => _z} Follow up to D101357 / `3fa6510f6`. Supersedes D102330. Goal: Use flag-setting rdffrs instead of rdffr + ptest. Problem: RDFFR_P doesn't have a flag-setting equivalent. Solution: in instcombine, canonicalize to RDFFR_PP at the IR level, and rely on the RDFFR_PP+PTEST => RDFFRS_PP optimization in AArch64InstrInfo::optimizePTestInstr. While here: * Test that rdffr.z+ptest generates a rdffrs. * Use update_{test,llc}_checks.py on the tests. * Use sve attribute on functions. Differential Revision: https://reviews.llvm.org/D102623	2021-05-20 16:22:50 +00:00
Daniel Kiss	801ab71032	[ARM][AArch64] SLSHardening: make non-comdat thunks possible Linker scripts might not handle COMDAT sections. SLSHardening adds a new section for each __llvm_slsblr_thunk_xN. This new option allows the thunks to be generated into the normal text section to handle these exceptional cases. ,comdat or ,nocomdat can be added to harden-sls to control the codegen: -mharden-sls=[all|retbr|blr],nocomdat. Reviewed By: kristof.beyls Differential Revision: https://reviews.llvm.org/D100546	2021-05-20 17:07:05 +02:00
Joerg Sonnenberger	80836ee519	[SPARCv9] allow stw as alias for st Strictly speaking, the architecture manual no longer uses the st mnemonic, but that's a much more intrusive change for little gain. Differential Revision: https://reviews.llvm.org/D96313	2021-05-20 15:27:36 +02:00
Simon Pilgrim	62fca69a70	[CostModel][X86][AVX2] Improve 256-bit vector non-uniform shifts costs Haswell, Excavator and early Ryzen all have slower 256-bit non-uniform vector shifts (confirmed on AMDSoG/Agner/instlatx64 and llvm models) - so bump the worst case costs accordingly. Noticed while investigating PR50364	2021-05-20 12:16:16 +01:00
David Truby	bf3b6cf920	[llvm][sve] Lowering for VLS MLOAD/MSTORE This adds custom lowering for the MLOAD and MSTORE ISD nodes when passed fixed length vectors in SVE. This is done by converting the vectors to VLA vectors and using the VLA code generation. Fixed length extending loads and truncating stores currently produce correct code, but do not use the built in extend/truncate in the load and store instructions. This will be fixed in a future patch. Differential Revision: https://reviews.llvm.org/D101834	2021-05-20 10:50:59 +00:00
Luke	1595994b28	[RISCV] Add legality check for vectorizing reduction Check if it is legal to vectorize reduction. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D99509	2021-05-20 17:45:46 +08:00
Heejin Ahn	412a3381f7	[WebAssembly] Ignore filters in Emscripten EH landingpads We have been handling filters and landingpads incorrectly all along. We pass clauses' (catches') types to `__cxa_find_matching_catch` in JS glue code, which returns the thrown pointer and sets the selector using `setTempRet0()`. We apparently have been doing the same for filters' (exception specs') types; we pass them to `__cxa_find_matching_catch` just the same way as clauses. And `__cxa_find_matching_catch` treats all given types as clauses. This is a little surprising; maybe we intended to do something on the JS side and never did. Anyway, I don't think supporting exception specs in Emscripten EH is a priority, but this can actually cause incorrect results for normal catches when functions are inlined and the inlined spec type has a parent-child relationship with the catch's type. --- Below is an example of a bug that can happen when inlining and a class hierarchy are mixed. If you are busy, you can skip this part: ``` struct A {}; struct B : A {}; void bar() throw (B) { throw B(); } void foo() { try { bar(); } catch (A &) { fputs ("Expected result\n", stdout); } } ``` In the unoptimized code, `bar`'s landingpad will have a filter for `B` and `foo`'s landingpad will have a clause for `A`. But when `bar` is inlined into `foo`, `foo`'s landingpad has both a filter for `B` and a clause for `A`, and it passes both types to `__cxa_find_matching_catch`: ``` __cxa_find_matching_catch(typeinfo for B, typeinfo for A) ``` `__cxa_find_matching_catch` thinks both are clauses, and looks at the first type `B`, which belongs to a filter. Since the thrown type is `B`, it thinks the first type `B` is caught. But this makes it return an incorrect selector, because it is supposed to catch the exception using the second type `A`, which is a parent of `B`. As a result, `foo` in the example program above does not print "Expected result" but just throws the exception to the caller. 
(This would not happen if `A` and `B` were completely disjoint types, such as `float` and `int`.) Fixes https://bugs.llvm.org/show_bug.cgi?id=50357. Reviewed By: dschuff, kripken Differential Revision: https://reviews.llvm.org/D102795	2021-05-20 01:28:16 -07:00
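Below is a toy model of the fix. It assumes a landingpad can be represented as an ordered list of (is-filter, type name) entries. `LandingPadEntry`, `findMatchingCatchArgs`, and `inlinedFooLandingPad` are hypothetical names, not the actual Emscripten EH lowering code. The example landingpad mirrors `foo` after `bar` is inlined, with the filter for `B` listed before the clause for `A`.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One landingpad entry: either a catch clause or a filter (exception spec).
struct LandingPadEntry {
  bool IsFilter;
  std::string Type;
};

// foo's landingpad after inlining bar: a filter for B and a clause for A.
std::vector<LandingPadEntry> inlinedFooLandingPad() {
  return {{true, "B"}, {false, "A"}};
}

// Type list passed to __cxa_find_matching_catch. Before the fix, filter types
// were passed as if they were clauses; with IgnoreFilters, only clauses are passed.
std::vector<std::string>
findMatchingCatchArgs(const std::vector<LandingPadEntry> &LP, bool IgnoreFilters) {
  std::vector<std::string> Args;
  for (const LandingPadEntry &E : LP)
    if (!(IgnoreFilters && E.IsFilter))
      Args.push_back(E.Type);
  return Args;
}
```

Without the fix, the argument list is `B, A`, and a thrown `B` matches the filter type first. With filters ignored, only `A` is passed, so the `catch (A &)` handler is selected as expected.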
Caroline Concatto	9199b6535d	[CostModel][AArch64] Add missing costs for getShuffleCost with scalable vectors Differential Revision: https://reviews.llvm.org/D102490	2021-05-20 09:08:31 +01:00
Andrew Savonichev	a647100b43	[AArch64] Combine vector shift instructions in SelectionDAG bswap.v2i16 + sitofp in LLVM IR generate a sequence of: - REV32 + USHR for bswap.v2i16 - SHL + SSHR + SCVTF for sext to v2i32 and scvt The shift instructions are excessive as noted in PR24820, and they can be optimized to just SSHR. Differential Revision: https://reviews.llvm.org/D102333	2021-05-20 10:50:13 +03:00
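The redundancy is visible on a single 32-bit lane. This sketch assumes the byte-swapped i16 sits in the upper half of each lane after REV32, so all three shifts use an amount of 16. That layout and the helper names are assumptions for illustration, not taken from the patch. USHR #16 followed by SHL #16 only clears the low half, and the final SSHR #16 discards that half anyway. So the three shifts give the same result as a single SSHR #16. Signed conversion and arithmetic right shift of negative values are well-defined from C++20 on.

```cpp
#include <cassert>
#include <cstdint>

// Original per-lane sequence: USHR #16, then SHL #16 + SSHR #16 to sign-extend
// the 16-bit value into the 32-bit lane.
int32_t shiftSequence(uint32_t Lane) {
  uint32_t Ushr = Lane >> 16;             // USHR #16
  uint32_t Shl = Ushr << 16;              // SHL #16
  return static_cast<int32_t>(Shl) >> 16; // SSHR #16 (arithmetic)
}

// Combined form: a single arithmetic shift right by 16.
int32_t combinedShift(uint32_t Lane) {
  return static_cast<int32_t>(Lane) >> 16; // SSHR #16
}
```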
Ryan Prichard	65d0264ba2	[MC][ARM] Reject Thumb "ror rX, #0" The ROR instruction can only handle immediates between 1 and 31. The would-be encoding for ROR #0 is actually the RRX instruction. Reviewed By: nickdesaulniers Differential Revision: https://reviews.llvm.org/D102455	2021-05-19 15:05:39 -07:00
Sanjay Patel	f12f9beb04	[x86] propagate FMF from x86-specific intrinsic nodes to others during combining This is another FMF gap exposed by D90901, but I don't see a way to show the difference in a regression test as with: `f66ba4c` `6025663` We will see an asm difference if we add a test as part of D90901.	2021-05-19 14:25:09 -04:00
Sanjay Patel	f66ba4cfa7	[x86] propagate FMF from x86-specific intrinsic nodes to others during lowering This is another fast-math-flags failure exposed by D90901.	2021-05-19 13:11:15 -04:00
Jessica Paquette	84ae1cf8ed	Recommit "[GlobalISel] Simplify G_ICMP to true/false when the result is known" Add missing REQUIRES line to prelegalizer-combiner-icmp-to-true-false-known-bits.	2021-05-19 09:29:19 -07:00
Anirudh Prasad	f076da66b9	[AsmParser][SystemZ][z/OS] Introducing HLASM Parser support to AsmParser - Part 1 - This patch (one in a series of patches) introduces HLASM Parser support (for the first parameter of inline asm statements) to LLVM ([[ https://lists.llvm.org/pipermail/llvm-dev/2021-January/147686.html \| main RFC here ]]) - This patch in particular introduces HLASM Parser support for Z machine instructions. - The approach taken here was to subclass `AsmParser`, and make various functions and variables "protected" wherever appropriate. - The `HLASMAsmParser` class overrides the `parseStatement` function. Two new private functions `parseAsHLASMLabel` and `parseAsMachineInstruction` are introduced as well. The general syntax is laid out as follows (more information available in [[ https://www.ibm.com/support/knowledgecenter/SSENW6_1.6.0/com.ibm.hlasm.v1r6.asm/asmr1023.pdf \| HLASM V1R6 Language Reference Manual ]] - Chapter 2 - Instruction Statement Format): ``` <TokA><spaces.><TokB><spaces.><TokC><spaces.*><TokD> ``` 1. TokA is referred to as the Name Entry. This token is optional. 2. TokB is referred to as the Operation Entry. This token is mandatory. 3. TokC is referred to as the Operand Entry. This token is mandatory. 4. TokD is referred to as the Remarks Entry. This token is optional. - If TokA is provided, then we either parse TokA as a possible comment or as a label (Name Entry), TokB as the Operation Entry and so on. - If TokA is not provided (i.e. we have one or more spaces and then the first token), then we will parse the first token (i.e. TokB) as a possible Z machine instruction, TokC as the operands to the Z machine instruction and TokD as a possible Remark field. - In TokC (the Operand Entry), no spaces are allowed between operands. If a space occurs it is classified as an error. - TokD, if provided, is taken as is and emitted as a comment. 
The following additional approach was examined, but not taken: - Adding custom private-only functions to the base AsmParser class, and only invoking them for z/OS. While this would eliminate the need for another child class, these private functions would be of no use to any other target. Similarly, adding any pure virtual functions to the base MCAsmParser class and overriding them in AsmParser would have the same disadvantage. Testing: - This patch doesn't have tests added with it, for the sole reason that MCStreamer support and Object File support haven't been added for the z/OS target (yet). Hence, it's not possible to generate code outright for the z/OS target. They are in the process of being committed / process of being worked on. - Any comments / feedback on how to combat this "lack of testing" due to other missing required features is appreciated. Reviewed By: Kai, uweigand Differential Revision: https://reviews.llvm.org/D98276	2021-05-19 11:05:30 -04:00
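A simplified C++ sketch of the statement layout described in this commit, splitting one line into Name, Operation, Operand and Remarks entries by spaces. `HLASMFields`, `nextToken` and `splitHLASM` are hypothetical; comment statements and continuation lines are ignored, and this is not how `HLASMAsmParser` is implemented.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical container for the four entries of an HLASM statement.
struct HLASMFields {
  std::string_view Name, Operation, Operand, Remarks;
};

// Skip leading spaces, then consume and return the next space-free token.
constexpr std::string_view nextToken(std::string_view &S) {
  std::size_t Begin = S.find_first_not_of(' ');
  if (Begin == std::string_view::npos) {
    S = {};
    return {};
  }
  S.remove_prefix(Begin);
  std::size_t End = S.find(' ');
  std::string_view Tok = S.substr(0, End);
  S.remove_prefix(End == std::string_view::npos ? S.size() : End);
  return Tok;
}

constexpr HLASMFields splitHLASM(std::string_view Line) {
  HLASMFields F{};
  // A non-blank first column means the optional Name Entry (label) is present.
  if (!Line.empty() && Line.front() != ' ')
    F.Name = nextToken(Line);
  F.Operation = nextToken(Line);  // mandatory Operation Entry
  F.Operand = nextToken(Line);    // Operand Entry: no spaces allowed inside
  // Anything left, minus leading spaces, is kept as-is as the Remarks Entry.
  std::size_t Begin = Line.find_first_not_of(' ');
  if (Begin != std::string_view::npos)
    F.Remarks = Line.substr(Begin);
  return F;
}
```

Because the operand field may not contain spaces, the first space after it is enough to mark where the remarks begin.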
Wang, Pengfei	9d09d20448	Reapply "[X86] Limit X86InterleavedAccessGroup to handle the same type case only" The current implementation assumes the destination type of shuffle is the same as the decomposed ones. Add the check to avoid a crash when the condition is not satisfied. This fixes PR37616. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D102751	2021-05-19 22:27:16 +08:00
Simon Pilgrim	707fc2e2f2	Revert rG528bc10e95d5f9d6a338f9bab5e91d7265d1cf05 : "[X86FixupLEAs] Transform the sequence LEA/SUB to SUB/SUB" Reports on D101970 indicate this is causing failures on multi-stage compiles.	2021-05-19 15:01:20 +01:00
Simon Pilgrim	ab4e04a0f3	[X86][AVX] createVariablePermute - generalize the PR50356 fix for smaller indices vector as well Generalize the fix from rGd0902a8665b1 by ensuring we widen/narrow the indices subvector first and then perform the ZERO_EXTEND_VECTOR_INREG (if necessary), which should allow us to perform the variable permutes with source/destination/indices vectors of any widths.	2021-05-19 14:39:41 +01:00
Simon Pilgrim	b14f9a1ebd	[X86][Atom] Fix vector integer shift by immediate resource/throughputs Match what's documented in the Intel AOM (and Agner/instlatx64 agree) - these are all Port0 only. Now that we can use in-order models in llvm-mca, the atom model is a good "worst case scenario" analysis for x86.	2021-05-19 14:39:40 +01:00
Nico Weber	52a7797626	Revert "[GlobalISel] Simplify G_ICMP to true/false when the result is known" This reverts commit `892497c806`. Breaks tests, see comments on https://reviews.llvm.org/D102542	2021-05-19 09:02:27 -04:00
Wang, Pengfei	66513e2f20	Revert "[X86] Limit X86InterleavedAccessGroup to handle the same type case only" This reverts commit `ca23a38e37`. Revert due to EXPENSIVE_CHECKS fail.	2021-05-19 20:35:45 +08:00
Kristina Bessonova	d59a2a32b9	[ARM][NEON] Combine base address updates for vst1x intrinsics Differential Revision: https://reviews.llvm.org/D102256	2021-05-19 14:05:55 +02:00
Simon Pilgrim	222314d8b0	[X86] Atom (pre-SLM) doesn't support PTEST instructions	2021-05-19 12:25:29 +01:00
Simon Pilgrim	8c717920d8	[X86] Remove copy + paste typos in AtomWriteResPair comment. Remnants from when the Atom model was copied from the Btver2 model.	2021-05-19 12:25:28 +01:00
Wang, Pengfei	ca23a38e37	[X86] Limit X86InterleavedAccessGroup to handle the same type case only The current implementation assumes the destination type of shuffle is the same as the decomposed ones. Add the check to avoid a crash when the condition is not satisfied. This fixes PR37616. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D102751	2021-05-19 18:39:08 +08:00
Tim Northover	c1dc267258	MachineBasicBlock: add liveout iterator aware of which liveins are defined by the runtime. Using this in RegAllocFast reduces register pressure, and in some cases allows x86 code to compile that wouldn't before.	2021-05-19 11:00:24 +01:00
Fraser Cormack	ca2c245ba4	[RISCV] Support INSERT_VECTOR_ELT into i1 vectors Like the element extraction of these vectors, we choose to promote up to an i8 vector type and perform the insertion there. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D102697	2021-05-19 09:41:50 +01:00
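A C++ sketch of the promotion strategy, using hypothetical names and `std::array<bool, N>` to stand in for an i1 vector: widen each mask element to a byte, do an ordinary element insert on the byte vector, then turn it back into a mask by testing each byte against zero. How the RISC-V backend performs that final conversion is not described in the message, so it is only modeled abstractly here.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Insert Val at Idx in an i1 vector by widening to bytes (hypothetical sketch).
template <std::size_t N>
constexpr std::array<bool, N> insertMaskElt(const std::array<bool, N> &Mask,
                                            std::size_t Idx, bool Val) {
  std::array<std::uint8_t, N> Wide{};
  for (std::size_t I = 0; I < N; ++I)
    Wide[I] = Mask[I] ? 1 : 0;  // promote each i1 lane to an i8 lane
  Wide[Idx] = Val ? 1 : 0;      // ordinary element insert on the i8 vector
  std::array<bool, N> Out{};
  for (std::size_t I = 0; I < N; ++I)
    Out[I] = Wide[I] != 0;      // turn the i8 vector back into a mask
  return Out;
}
```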
Guozhi Wei	528bc10e95	[X86FixupLEAs] Transform the sequence LEA/SUB to SUB/SUB This patch transforms the sequence `lea (reg1, reg2), reg3; sub reg3, reg4` into two sub instructions `sub reg1, reg4; sub reg2, reg4`. A similar optimization can also be applied to the LEA/ADD sequence. The modifications to TwoAddressInstructionPass ensure that the operands of the ADD instruction are in the expected order (the dest register of LEA should be the src register of ADD). Differential Revision: https://reviews.llvm.org/D101970	2021-05-18 18:02:36 -07:00
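A C++ check of the arithmetic behind the transformation, with hypothetical function names. In AT&T syntax `sub reg3, reg4` computes `reg4 = reg4 - reg3`, so the rewrite relies on `r4 - (r1 + r2) == (r4 - r1) - r2`, which holds under 64-bit wrap-around. Only register values are modeled here; the EFLAGS written by the SUBs are not.

```cpp
#include <cstdint>

// Register result of the original pair: lea (r1, r2), r3 ; sub r3, r4
constexpr std::uint64_t leaThenSub(std::uint64_t R1, std::uint64_t R2, std::uint64_t R4) {
  std::uint64_t R3 = R1 + R2;  // LEA computes the sum (wrapping, no flags)
  return R4 - R3;
}

// Register result of the replacement: sub r1, r4 ; sub r2, r4
constexpr std::uint64_t subThenSub(std::uint64_t R1, std::uint64_t R2, std::uint64_t R4) {
  R4 -= R1;
  R4 -= R2;
  return R4;
}
```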
Fabian Sommer	5f2b276667	Default stack alignment of x86 NaCl to 16 bytes X86 NaCl generally requires the stack to be aligned to 16 bytes. This change was already implemented in two downstream NaCl compilers based on llvm. Reviewed By: dschuff Differential Revision: https://reviews.llvm.org/D102610	2021-05-18 15:16:59 -07:00
Neumann Hon	ec4706be8e	[SystemZ] [z/OS] Add XPLINK64 Calling Convention to SystemZ This patch adds the XPLINK64 calling convention to the SystemZ backend. It specifies and implements the argument passing and return value conventions. Reviewed By: uweigand Differential Revision: https://reviews.llvm.org/D101010	2021-05-18 16:52:47 -04:00
Simon Pilgrim	d0902a8665	[X86][AVX] createVariablePermute - correctly extend same-sized-vector indices (PR50356) D101838 incorrectly handled indices vectors of the same size but with higher element counts to just bitcast to the target indices type instead of performing a ZERO_EXTEND_VECTOR_INREG	2021-05-18 20:30:46 +01:00
Simon Pilgrim	99c0f16ea4	[X86] Use Skylake Server model for x86-64-v4 so we have full instruction coverage The x86-64-v4 generic cpu arch supports AVX512BW/DQ/CD/VLX, which isn't covered by the Haswell model; use the SkylakeServer model instead, which is a lot closer to what the arch represents. Differential Revision: https://reviews.llvm.org/D102553	2021-05-18 18:06:40 +01:00
Jessica Paquette	58c57e1b5f	[AArch64][GlobalISel] Prefer mov for s32->s64 G_ZEXT We can use an ORRWrs (mov) + SUBREG_TO_REG rather than a UBFX for G_ZEXT on s32->s64. This more closely matches what SDAG does, and is likely more power efficient etc. (Also fixed up arm64-rev.ll, which had a fallback check line that was entirely useless.) Simple example: https://godbolt.org/z/h1jKKdx5c Differential Revision: https://reviews.llvm.org/D102656	2021-05-18 10:00:00 -07:00
Roman Lebedev	75ea0abaae	[X86] AMD Zen 3: fix MULX modelling - don't forget about WriteIMulH (PR50387) Otherwise lack thereof will be caught by a defensive check during scheduling, and we'll crash. I've literally never seen this syntax before..	2021-05-18 19:58:04 +03:00
Jessica Paquette	892497c806	[GlobalISel] Simplify G_ICMP to true/false when the result is known Use existing KnownBits helpers from KnownBits.h to simplify G_ICMPs. E.g. `x == x -> true`, `x != x -> false`, `load(x) > 1 -> true` (when the load is known to be greater than 1), and so on. Differential Revision: https://reviews.llvm.org/D102542	2021-05-18 09:26:41 -07:00
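A standalone sketch of how known bits can decide an unsigned greater-than, in the spirit of the `load(x) > 1` example: the smallest value consistent with the known bits sets every unknown bit to 0, and the largest sets every unknown bit to 1. `Known32` and `knownUGT` are hypothetical names, not LLVM's `KnownBits` interface.

```cpp
#include <cstdint>
#include <optional>

// Standalone model of known bits for a 32-bit value (not LLVM's KnownBits).
struct Known32 {
  std::uint32_t Zero;  // bits known to be 0
  std::uint32_t One;   // bits known to be 1
  constexpr std::uint32_t umin() const { return One; }    // unknown bits -> 0
  constexpr std::uint32_t umax() const { return ~Zero; }  // unknown bits -> 1
};

// Decide `L ugt R` when the known bits force the answer; nullopt otherwise.
constexpr std::optional<bool> knownUGT(Known32 L, Known32 R) {
  if (L.umin() > R.umax())
    return true;
  if (L.umax() <= R.umin())
    return false;
  return std::nullopt;
}
```

For instance, a value with bit 1 known set compared against the constant 1 folds to true, while a completely unknown value stays undecided.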
Tim Northover	ba1509da7b	Recommit X86: support Swift Async context This adds support to the X86 backend for the newly committed swiftasync function parameter. If such a (pointer) parameter is present it gets stored into an augmented frame record (populated in IR, but generally containing enhanced backtrace for coroutines using lots of tail calls back and forth). The context frame is identical to AArch64 (primarily so that unwinders etc. don't get extra complexity). Specifically, the new frame record is [AsyncCtx, %rbp, ReturnAddr], and its presence is signalled by bit 60 of the stored %rbp being set to 1. %rbp still points to the frame pointer in memory for backwards compatibility (only partial on x86, but OTOH the weird AsyncCtx before the rest of the record is because of x86). Recommitted with a fix for unwind info when i386 pc-rel thunks are adjacent to a prologue.	2021-05-18 15:19:05 +01:00
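A minimal sketch, with hypothetical helper names, of the bit-60 tagging described here, as an unwinder might view it: the flag marks an extended `[AsyncCtx, %rbp, ReturnAddr]` record, and masking it off presumably recovers the plain frame pointer. It assumes a 64-bit saved frame pointer whose bit 60 is clear for ordinary frames.

```cpp
#include <cstdint>

// Bit 60 of the saved %rbp flags an extended [AsyncCtx, %rbp, ReturnAddr] record.
constexpr std::uint64_t kAsyncFrameBit = std::uint64_t{1} << 60;

// Value stored in the frame record when an async context is present.
constexpr std::uint64_t tagSavedFP(std::uint64_t FP) { return FP | kAsyncFrameBit; }

// Unwinder-side check for the flag.
constexpr bool hasAsyncContext(std::uint64_t SavedFP) {
  return (SavedFP & kAsyncFrameBit) != 0;
}

// Strip the flag to get the frame pointer back.
constexpr std::uint64_t untagSavedFP(std::uint64_t SavedFP) {
  return SavedFP & ~kAsyncFrameBit;
}
```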
Roman Lebedev	3cc3960766	[X86] AMD Zen 3: cap LoopMicroOpBufferSize to work around PR50384 (quadratic IndVars runtime) While I would like to keep the right value here, I would also like to be able to actually compile e.g. vanilla test-suite. 256 is a pretty random guess; it should be good enough for serious loops, but small enough to result in tolerable compile times for certain edge cases. https://bugs.llvm.org/show_bug.cgi?id=50384	2021-05-18 15:56:57 +03:00
Simon Pilgrim	560b709abe	[X86][AVX] Cleanup AVX2 vector integer truncation costs Noticed while investigating PR50364, the truncation costs for v4i64->v4i16/v4i8 and v8i32->v8i8 were way too optimistic for a shuffle sequence that usually matches the AVX1 codegen (they matched AVX512 numbers which have actual truncation instructions!).	2021-05-18 13:07:29 +01:00
Jay Foad	092a3ce569	[AMDGPU] Fix typo in comment	2021-05-18 10:15:49 +01:00
Neal (nealsid)	e89b60fcfc	Update MSVC version number in preprocessor check Passing template parameter packs to std::map doesn't work in VS 2017/2019, so this updates the preprocessor version check to use an alternate version in VS2019, as well. Reviewed By: DavidSpickett Differential Revision: https://reviews.llvm.org/D102260	2021-05-18 10:04:39 +01:00
Fraser Cormack	175bdf127d	[RISCV] Fix operand order in fixed-length VM(OR\|AND)NOT patterns Where the RVV specification writes `vs2, vs1`, our TableGen patterns use `rs1, rs2`. These differences can easily cause confusion. The VMANDNOT instruction performs `LHS && !RHS`, and similarly for VMORNOT. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D102606	2021-05-18 09:21:25 +01:00
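A small C++ model of the two mask instructions, treating a mask as a 64-bit integer with one bit per element (an illustrative simplification; the function names mirror the mnemonics but are hypothetical). It follows the RVV specification's `vs2, vs1` operand order, with `vs2` as the LHS of `LHS && !RHS`; swapping the operands changes the result, which is why the pattern order matters.

```cpp
#include <cstdint>

// vmandnot.mm vd, vs2, vs1: vd[i] = vs2[i] && !vs1[i]
constexpr std::uint64_t vmandnot(std::uint64_t Vs2, std::uint64_t Vs1) { return Vs2 & ~Vs1; }

// vmornot.mm vd, vs2, vs1: vd[i] = vs2[i] || !vs1[i]
constexpr std::uint64_t vmornot(std::uint64_t Vs2, std::uint64_t Vs1) { return Vs2 | ~Vs1; }
```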
Chen Zheng	15d4ed6d8c	[PowerPC] only check the load instruction result number 0. Reviewed By: nemanjai Differential Revision: https://reviews.llvm.org/D102596	2021-05-18 00:49:37 -04:00
Stanislav Mekhanoshin	45764efb69	[AMDGPU] Do not check denorm for LDS FP atomic with unsafe flag This is already how it is handled for global and flat atomics. Differential Revision: https://reviews.llvm.org/D102366	2021-05-17 16:53:09 -07:00
Eli Friedman	3dd49ec194	[AArch64][SVE] Implement extractelement of i1 vectors. The implementation just extends the vector to a larger element type, and extracts from that. Not fancy, but generates reasonable code. There was discussion in the review of doing the promotion in target-independent code, but I'm sticking with this to avoid making LegalizeDAG infrastructure more complicated. Differential Revision: https://reviews.llvm.org/D87651	2021-05-17 14:51:11 -07:00
Heejin Ahn	6e1c1dac4c	[WebAssembly] Nullify DBG_VALUE_LISTs in DebugValueManager WebAssemblyDebugValueManager class currently does not handle DBG_VALUE_LIST instructions correctly for two reasons, which are explained in https://bugs.llvm.org/show_bug.cgi?id=50361. This effectively nullifies DBG_VALUE_LISTs in WebAssemblyDebugValueManager so that the info will appear as "optimized out" in debuggers but still be at least correct in the meantime. Reviewed By: dschuff, jmorse Differential Revision: https://reviews.llvm.org/D102589	2021-05-17 13:47:36 -07:00
Mitch Phillips	6791a6b309	Revert "X86: support Swift Async context" This reverts commit `747e5cfb9f`. Reason: New frame layout broke the sanitizer unwinder. Not clear why, but seems like some of the changes aren't always guarded by Swift checks. See https://reviews.llvm.org/rG747e5cfb9f5d944b47fe014925b0d5dc2fda74d7 for more information.	2021-05-17 12:44:57 -07:00
Nick Desaulniers	0f41778919	[AArch64] Support customizing stack protector guard Follow up to D88631 but for aarch64; the Linux kernel uses the command line flags: 1. -mstack-protector-guard=sysreg 2. -mstack-protector-guard-reg=sp_el0 3. -mstack-protector-guard-offset=0 to use the system register sp_el0 for the stack canary, enabling the kernel to have a unique stack canary per task (like a thread, but not limited to userspace as the kernel can preempt itself). Address pr/47341 for aarch64. Fixes: https://github.com/ClangBuiltLinux/linux/issues/289 Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed By: xiangzhangllvm, DavidSpickett, dmgreen Differential Revision: https://reviews.llvm.org/D100919	2021-05-17 11:49:22 -07:00
Steffen Larsen	f226e28a88	[Clang][NVPTX] Add NVPTX intrinsics and builtins for CUDA PTX redux.sync instructions Adds NVPTX builtins and intrinsics for the CUDA PTX `redux.sync` instructions for `sm_80` architecture or newer. PTX ISA description of `redux.sync`: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-redux-sync Authored-by: Steffen Larsen <steffen.larsen@codeplay.com> Differential Revision: https://reviews.llvm.org/D100124	2021-05-17 09:46:59 -07:00
Stuart Adams	02c2468864	[Clang][NVPTX] Add NVPTX intrinsics and builtins for CUDA PTX cp.async instructions Adds NVPTX builtins and intrinsics for the CUDA PTX `cp.async` instructions for `sm_80` architecture or newer. PTX ISA description of `cp.async`: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#data-movement-and-conversion-instructions-asynchronous-copy https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-cp-async-mbarrier-arrive Authored-by: Stuart Adams <stuart.adams@codeplay.com> Co-Authored-by: Alexander Johnston <alexander@codeplay.com> Differential Revision: https://reviews.llvm.org/D100394	2021-05-17 09:46:59 -07:00
Stanislav Mekhanoshin	f4c0fdc6c9	[AMDGPU] Set unused dst_sel to '?' in the encoding This is to allow disasm with any bits in the unused fields. Differential Revision: https://reviews.llvm.org/D102526	2021-05-17 08:38:52 -07:00
Simon Pilgrim	41587466aa	[X86] Don't dereference a dyn_cast<> - use a cast<> instead. NFCI. dyn_cast<> can return null if the cast fails, by using cast<> we assert that the cast is correct helping to avoid a potential null dereference.	2021-05-17 15:58:32 +01:00
Jay Foad	472f856714	[AMDGPU] Tweak VOP3_INTERP16 profile Set the output register class based on the output type, instead of hard-coding VGPR_32. I think this is more correct. It doesn't make any difference at the moment because we use the same class for 16- and 32-bit results, but it might in future if we make more use of true 16-bit register classes. Differential Revision: https://reviews.llvm.org/D102622	2021-05-17 15:28:00 +01:00
Irina Dobrescu	50511df32e	[AArch64] Lower bitreverse in ISel Adding lowering support for bitreverse. Previously, lowering bitreverse would expand it into a series of other instructions. This patch makes it so this produces a single rbit instruction instead. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D102397	2021-05-17 13:35:27 +01:00
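For reference, a C++ model of what `llvm.bitreverse.i32` computes, which this change selects to a single RBIT on AArch64. The bit-by-bit loop is only for illustration and is not the expansion LLVM previously emitted; the function name is hypothetical.

```cpp
#include <cstdint>

// Reverse the order of the 32 bits of X (bit 0 <-> bit 31, and so on).
constexpr std::uint32_t bitreverse32(std::uint32_t X) {
  std::uint32_t R = 0;
  for (int I = 0; I < 32; ++I) {
    R = (R << 1) | (X & 1u);  // move the current low bit of X into R
    X >>= 1;
  }
  return R;
}
```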
Nemanja Ivanovic	511f4ae54e	[PowerPC] Add patterns for vselect of v1i128 These patterns are missing even though the underlying instruction doesn't really care about the type. Added these patterns to resolve https://bugs.llvm.org/show_bug.cgi?id=50084	2021-05-17 06:37:46 -05:00
Nemanja Ivanovic	74ae778176	[PowerPC] Do not emit dssall on AIX This instruction is a nop on all server cores (certainly on all cores that AIX supports) so it is fine to emit a nop instead of it. In fact, that is exactly what XL emits. So we emit a nop on AIX and we leave the codegen as is on other platforms since there may indeed be cores out there for which this actually does some prefetching.	2021-05-17 06:08:06 -05:00
Tim Northover	747e5cfb9f	X86: support Swift Async context This adds support to the X86 backend for the newly committed swiftasync function parameter. If such a (pointer) parameter is present it gets stored into an augmented frame record (populated in IR, but generally containing enhanced backtrace for coroutines using lots of tail calls back and forth). The context frame is identical to AArch64 (primarily so that unwinders etc. don't get extra complexity). Specifically, the new frame record is [AsyncCtx, %rbp, ReturnAddr], and its presence is signalled by bit 60 of the stored %rbp being set to 1. %rbp still points to the frame pointer in memory for backwards compatibility (only partial on x86, but OTOH the weird AsyncCtx before the rest of the record is because of x86).	2021-05-17 11:56:16 +01:00
Tim Northover	769ced3d57	AArch64: mark x22 livein if it's an async context that gets stored. This fixes a crash with expensive checks enabled (the verifier was not happy).	2021-05-17 11:56:03 +01:00
Tim Northover	82a0e808bb	IR/AArch64/X86: add "swifttailcc" calling convention. Swift's new concurrency features are going to require guaranteed tail calls so that they don't consume excessive amounts of stack space. This would normally mean "tailcc", but there are also Swift-specific ABI desires that don't naturally go along with "tailcc" so this adds another calling convention that's the combination of "swiftcc" and "tailcc". Support is added for AArch64 and X86 for now.	2021-05-17 10:48:34 +01:00
Jacob Bramley	900c898994	[AArch64] Lower fptoi.sat intrinsics. AArch64's fcvt instructions implement the saturating behaviour that the fpto*i.sat intrinsics require, in cases where the destination width matches the saturation width. Lowering them removes a lot of unnecessary generated code. Only scalar lowerings are supported for now. Differential Revision: https://reviews.llvm.org/D102353	2021-05-17 10:19:19 +01:00
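A C++ model of the semantics the `fptosi.sat` intrinsic requires for an `i32` result from a `double` source: NaN gives 0, out-of-range inputs clamp to the nearest representable bound, and everything else truncates toward zero. Without a saturating conversion instruction, this compare-and-select logic presumably has to be emitted explicitly. The function name is hypothetical.

```cpp
#include <cstdint>
#include <limits>

// Model of llvm.fptosi.sat.i32 for a double source.
constexpr std::int32_t fptosiSatI32(double X) {
  if (X != X)  // NaN converts to 0
    return 0;
  if (X <= -2147483648.0)  // at or below INT32_MIN: clamp
    return std::numeric_limits<std::int32_t>::min();
  if (X >= 2147483647.0)  // at or above INT32_MAX: clamp
    return std::numeric_limits<std::int32_t>::max();
  return static_cast<std::int32_t>(X);  // in range: truncate toward zero
}
```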
Ben Shi	7746e818a5	[RISCV] Optimize or/xor with immediate in the zbs extension Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D102398	2021-05-17 10:59:52 +08:00
Craig Topper	0a34ff8bcb	[RISCV] Replace AddiPair ComplexPattern with a PatLeaf. NFC The ComplexPattern is looking for an immediate in a certain range that has a single use. This can be handled with a PatLeaf since we aren't matching multiple patterns or checking any complicated relationships between nodes. This shrinks the isel table a little bit since tablegen no longer has to generate patterns with commuted operands. With the PatLeaf, tablegen can see we're matching an immediate which should always be on the right hand side of add. Reviewed By: benshi001 Differential Revision: https://reviews.llvm.org/D102510	2021-05-16 12:17:52 -07:00
Alessandro Decina	833e9b2ea7	[BPF] add support for 32 bit registers in inline asm Add "w" constraint type which allows selecting 32 bit registers. 32 bit registers were added in https://reviews.llvm.org/rGca31c3bb3ff149850b664838fbbc7d40ce571879. Differential Revision: https://reviews.llvm.org/D102118	2021-05-16 11:01:47 -07:00
David Green	dd5c52029d	[CPG][ARM] Optimize towards branch on zero in codegenprepare This adds a simple fold into codegenprepare that converts comparisons feeding branches into comparisons with zero where possible. For example: `%c = icmp ult %x, 8; br %c, bla, blb; %tc = lshr %x, 3` becomes `%tc = lshr %x, 3; %c = icmp eq %tc, 0; br %c, bla, blb`. As a first-order approximation, this can reduce the number of instructions needed to perform the branch, as the shift is (often) needed anyway. At the moment this does not affect very much, as llvm tends to prefer the opposite form, but it can protect against regressions from commits like rG9423f78240a2. Simple cases of Add and Sub are added along with Shift, since the comparison with zero can often be folded into the cpsr flags as well. Differential Revision: https://reviews.llvm.org/D101778	2021-05-16 17:54:06 +01:00
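A quick C++ check of the example's equivalence, assuming a 32-bit unsigned `%x`: `x ult 8` holds exactly when `x >> 3` is zero, so once the shift is computed anyway the branch can test its result against zero. The function names are illustrative.

```cpp
#include <cstdint>

constexpr bool viaCompare(std::uint32_t X) { return X < 8u; }        // icmp ult %x, 8
constexpr bool viaShift(std::uint32_t X) { return (X >> 3) == 0u; }  // icmp eq (lshr %x, 3), 0

// Check that both forms agree on a range of inputs around the boundary and at the top.
constexpr bool formsAgree() {
  for (std::uint32_t X = 0; X < 4096; ++X)
    if (viaCompare(X) != viaShift(X))
      return false;
  return viaCompare(0xFFFFFFFFu) == viaShift(0xFFFFFFFFu);
}
```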
Simon Pilgrim	262e4200d1	[X86][SSE] Pull out combineToHorizontalAddSub helper from inside (F)ADD/SUB combines (REAPPLIED). NFCI. The intention is to be able to run this from additional locations (such as shuffle combining) in the future. Reapplies rGb95a103808ac (after reversion at rGc012a388a15b), with SSE3/SSSE3 typo fix, test added at rG0afb10de1449.	2021-05-16 13:50:58 +01:00
Jinsong Ji	4b91f96a3e	[AIX][AsmPrinter] Print Global Variable in comments The default AsmPrinter prints GVs in comments, and AIX should do so too. This also fixes LLVM :: CodeGen/Generic/inline-asm-mem-clobber.ll. Reviewed By: hubert.reinterpretcast Differential Revision: https://reviews.llvm.org/D102534	2021-05-16 03:04:46 +00:00
Nico Weber	c012a388a1	Revert "[X86][SSE] Pull out combineToHorizontalAddSub helper from inside (F)ADD/SUB combines. NFCI." This reverts commit `b95a103808`. Makes clang assert very early in a Chromium build. See https://bugs.chromium.org/p/chromium/issues/detail?id=1209490#c1 for a standalone repro.	2021-05-15 12:20:02 -04:00
Simon Pilgrim	8cb04d891f	[X86] X86OptimizeLEAPass::replaceDebugValue - take a copy of the DebugLoc not a reference as it may be deleted. Fixes msan warning due to rG9ca2c50b3601	2021-05-15 16:28:20 +01:00
Simon Pilgrim	2ed89001e1	[X86] X86CmovConverterPass::convertCmovInstsToBranches - take a copy of the DebugLoc not a reference as it may be deleted. Fixes msan warning due to rG9ca2c50b3601	2021-05-15 16:13:34 +01:00
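A minimal sketch of the bug class behind these two fixes: holding a `const &` to data owned by an instruction that is then erased leaves a dangling reference, so the location must be copied first. The `Instr` type and `eraseKeepingLoc` function are hypothetical stand-ins for `MachineInstr` and the passes' code, with `std::string` playing the role of `DebugLoc`.

```cpp
#include <cassert>
#include <list>
#include <string>

// Hypothetical stand-in for a MachineInstr carrying a debug location.
struct Instr {
  std::string Opcode;
  std::string DebugLoc;
};

// Erase the instruction at It but keep its debug location for later use.
std::string eraseKeepingLoc(std::list<Instr> &Block, std::list<Instr>::iterator It) {
  // Copy, not `const std::string &`: the referenced object dies with the erase.
  std::string DL = It->DebugLoc;
  Block.erase(It);
  return DL;  // safe: DL owns its own data
}
```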
Simon Pilgrim	73635adb86	X86SpeculativeLoadHardeningPass::hardenValueInRegister - assert that we have an i8/i16/i32/i64 sized register. NFCI. Silence static analyzer warning for out-of-range access to the SubRegImms[] array.	2021-05-15 15:13:28 +01:00
Simon Pilgrim	f9b1208681	[X86][Atom] Fix vector integer multiplication resource/throughputs Match what's documented in the Intel AOM (and Agner/instlatx64 agree) - vector integer multiplies are pipelined - all Port0, throughput = 2 @ 128bits, 1 @ 64bits. Noticed while checking reduction costs - now that we can use in-order models in llvm-mca, the atom model is the "worst case scenario" we have in x86.	2021-05-15 14:25:48 +01:00
Simon Pilgrim	9ca2c50b36	[X86] Try to pass DebugLoc by const-ref to avoid costly TrackingMDNodeRef copies (REAPPLIED). NFCI. Reapply rG5ed56a821c06 (after reverted by rG7aa89c4a22fd) - don't take reference from struct that will be erased in X86FrameLowering::eliminateCallFramePseudoInstr	2021-05-15 13:23:28 +01:00
Brendon Cahoon	3f7b7e7393	[AMDGPU] Update SCC defs to VCC when uses are changed to VCC The FixSGPRCopies pass converts instructions to VALU when removing illegal VGPR to SGPR copies. Instructions that use SCC are changed to use VCC instead. When that happens, the pass must also change instructions that define SCC to define VCC. The pass was not changing the SCC definition when an ADDC is converted due to an input that is a VGPR to SGPR copy. However, the initial ADD instruction, which defines SCC, is not converted. This causes a compilation failure due to a use of an undefined physical register. This patch adds code that inserts the SCC definition into the MoveToVALU worklist when an SCC use is converted to a VCC use. Differential Revision: https://reviews.llvm.org/D102111	2021-05-14 18:05:05 -04:00
Mitch Phillips	7aa89c4a22	Revert "[X86] Try to pass DebugLoc by const-ref to avoid costly TrackingMDNodeRef copies. NFCI." This reverts commit `5ed56a821c`. Reason: Broke the MSan buildbots. See Phabricator for more info (https://reviews.llvm.org/rG5ed56a821c0622869739a3ae752eea97a1ee1f48).	2021-05-14 14:30:57 -07:00
Neumann Hon	8a7e2fb5f2	[SystemZ] [z/OS] Add SystemZCallingConventionRegisters class This patch adds the abstract class SystemZCallingConventionRegisters which is a SystemZ-specific class detailing special registers used by calling conventions on the target. SystemZELFRegisters and SystemZXPLINK64Registers implement this class for ELF and XPLINK64 respectively. Reviewed By: uweigand Differential Revision: https://reviews.llvm.org/D102370	2021-05-14 16:51:26 -04:00
Stanislav Mekhanoshin	6fb02596a2	[AMDGPU] Add support for architected flat scratch Add support for the readonly flat Scratch register initialized by the SPI. Differential Revision: https://reviews.llvm.org/D102432	2021-05-14 10:53:48 -07:00
Matt Arsenault	c7cff08f79	AMDGPU: Fix assert when rewriting saddr d16 loads moveOperands does not handle moving tied operands since it would generally have to fix up the tied operand references. Avoid the assert by untying and retying after the modification. These in place modifications really aren't manageable.	2021-05-14 13:24:19 -04:00
Roman Lebedev	1fc1c88704	[X86] AMD Zen 3: same-reg AVX YMM VPCMPGT{B,W,D,Q} is a zero-cycle(!) dep-breaking zero-idiom As measured by exegesis, and confirmed by ref docs.	2021-05-14 20:23:05 +03:00
Roman Lebedev	2f8572d8e2	[X86] AMD Zen 3: same-reg AVX XMM VPCMPGT{B,W,D,Q} is a zero-cycle(!) dep-breaking zero-idiom As measured by exegesis, and confirmed by ref docs.	2021-05-14 20:23:04 +03:00
Roman Lebedev	f8f7c765a0	[X86] AMD Zen 3: same-reg SSE XMM PCMPGT{B,W,D,Q} is a 1-cycle(!) dep-breaking zero-idiom As measured by exegesis, and confirmed by ref docs.	2021-05-14 20:23:04 +03:00
Roman Lebedev	26eeb6e650	[X86] AMD Zen 3: same-reg AVX YMM VPSUBUS{B,W} is a 1-cycle(!) dep-breaking zero-idiom Not really mentioned in ref docs, but measures as such. Yes, this one is also not zero-cycle.	2021-05-14 20:23:03 +03:00
Roman Lebedev	41a5dcdf87	[X86] AMD Zen 3: same-reg AVX XMM VPSUBUS{B,W} is a 1-cycle(!) dep-breaking zero-idiom Not really mentioned in ref docs, but measures as such. Yes, this one is also not zero-cycle.	2021-05-14 20:23:03 +03:00
Roman Lebedev	6733fe5c0d	[X86] AMD Zen 3: same-reg SSE XMM PSUBUS{B,W} is a 1-cycle(!) dep-breaking zero-idiom Not really mentioned in ref docs, but measures as such.	2021-05-14 20:23:03 +03:00
Roman Lebedev	555e1d2987	[X86] AMD Zen 3: same-reg AVX YMM VPSUBS{B,W} is a 1-cycle(!) dep-breaking zero-idiom Not really mentioned in ref docs, but measures as such. Yes, this one is also not zero-cycle.	2021-05-14 20:23:02 +03:00
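A scalar per-lane C++ model, with hypothetical function names, of why these same-register forms are zero idioms: the result does not depend on the register's value, so a core can treat the instruction as dependency-breaking, as the measurements in these commits indicate. Only the byte variants are modeled; the W/D/Q forms follow the same argument.

```cpp
#include <cstdint>

// One byte lane of PCMPGTB: all ones if A > B (signed), else zero.
constexpr std::uint8_t pcmpgtbLane(std::int8_t A, std::int8_t B) { return A > B ? 0xFF : 0x00; }

// One byte lane of PSUBUSB: unsigned subtraction saturating at zero.
constexpr std::uint8_t psubusbLane(std::uint8_t A, std::uint8_t B) {
  return A > B ? static_cast<std::uint8_t>(A - B) : 0;
}

// With both operands the same register, every lane is zero whatever its input.
constexpr bool sameRegIsZero() {
  for (int V = -128; V <= 127; ++V)
    if (pcmpgtbLane(static_cast<std::int8_t>(V), static_cast<std::int8_t>(V)) != 0)
      return false;
  for (int V = 0; V <= 255; ++V)
    if (psubusbLane(static_cast<std::uint8_t>(V), static_cast<std::uint8_t>(V)) != 0)
      return false;
  return true;
}
```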


63852 Commits