llvm-project

Commit Graph

Author	SHA1	Message	Date
Matt Arsenault	97f51f0489	AMDGPU: Remove redundant CCAction for i1	2020-12-15 17:00:27 -05:00
Craig Topper	028efac2d7	[RISCV] Only custom legalize i32 arguments to vector intrinsics on RV64.	2020-12-15 13:54:41 -08:00
Baptiste Saleil	57d83c3a90	[PowerPC] Enable paired vector type and intrinsics when MMA is disabled This patch enables the Clang type __vector_pair and its associated LLVM intrinsics even when MMA is disabled. With this patch, the type is now controlled by the PPC paired-vector-memops option. The builtins and intrinsics will be renamed to drop the mma prefix in another patch. Differential Revision: https://reviews.llvm.org/D91819	2020-12-15 15:14:11 -06:00
Tony	d5ea8f7010	[AMDGPU] Clarify scratch initialization - Clarify documentation on initializing scratch. - Rename compute_pgm_rsrc2 field for enabling scratch from ENABLE_SGPR_PRIVATE_SEGMENT_WAVEFRONT_OFFSET to ENABLE_PRIVATE_SEGMENT to match hardware definition. Differential Revision: https://reviews.llvm.org/D93271	2020-12-15 20:14:20 +00:00
Simon Pilgrim	712117338a	[X86] Explicitly use SDValue instead of auto. NFCI. Fix static analyzer warning about not using a SDValue&	2020-12-15 17:27:25 +00:00
Simon Pilgrim	b0e5aea557	[X86] Remove unnecessary SUBV_BROADCAST combines. NFCI. Noticed while dealing with D92645 - these are now handled by getFauxShuffleMask + shuffle combining code.	2020-12-15 16:54:34 +00:00
Paul Walker	632f4d2747	[NFC] Fix a few SVEInstrInfo related stylistic issues.	2020-12-15 16:10:38 +00:00
David Green	6cc3d80a84	[ARM] Match dual lane vmovs from insert_vector_elt MVE has a dual lane vector move instruction, capable of moving two general purpose registers into lanes of a vector register. They look like one of: vmov q0[2], q0[0], r2, r0 vmov q0[3], q0[1], r3, r1 They only accept these lane indices though (and only insert into an i32), either moving lanes 1 and 3, or 0 and 2. This patch adds some tablegen patterns for them, selecting from vector inserts elements. Because the insert_elements are know to be canonicalized to ascending order there are several patterns that we need to select. These lane indices are: 3 2 1 0 -> vmovqrr 31; vmovqrr 20 3 2 1 -> vmovqrr 31; vmov 2 3 1 -> vmovqrr 31 2 1 0 -> vmovqrr 20; vmov 1 2 0 -> vmovqrr 20 With the top one being the most common. All other potential patterns of lane indices will be matched by a combination of these and the individual vmov pattern already present. This does mean that we are selecting several machine instructions at once due to the need to re-arrange the inserts, but in this case there is nothing else that will attempt to match an insert_vector_elt node. Differential Revision: https://reviews.llvm.org/D92553	2020-12-15 15:58:52 +00:00
Ulrich Weigand	ebef92169c	[SystemZ] Remove most hard-coded R1D instances for sibcalls Indirect sibling calls need to use %r1 to hold the target address. This is currently hard-coded in many places. This is not only unnecessary, but makes future changes in this area difficult. This patch now encodes the target address as operand without hard coding a register in most places throughout the MI back-end. Code generation still always uses %r1, but this is now decided solely in one place in SystemZTargetLowering::LowerCall. NFC intended.	2020-12-15 16:31:18 +01:00
Kazushi (Jam) Marukawa	697226550e	[VE] Support FRAMEADDR Implement FRAMEADDR for VE. Add a regression test also. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D93295	2020-12-15 23:31:19 +09:00
Kazushi (Jam) Marukawa	2a2268a6db	[VE][NFC] Sort VEISD operations Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D93294	2020-12-15 23:29:16 +09:00
Sebastian Neubauer	91445979be	[AMDGPU] Unify flat offset logic Move getNumFlatOffsetBits from AMDGPUAsmParser and SIInstrInfo into AMDGPUBaseInfo. Differential Revision: https://reviews.llvm.org/D93287	2020-12-15 14:59:59 +01:00
Hsiangkai Wang	db48a6de77	[RISCV] Define vwadd/vwaddu/vwsub/vwsubu intrinsics. Define vwadd/vwaddu/vwsub/vwsubu intrinsics and lower to V instructions. Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com> Co-Authored-by: Hsiangkai Wang <kai.wang@sifive.com> Differential Revision: https://reviews.llvm.org/D93108	2020-12-15 20:15:06 +08:00
Paul Walker	b74c4dbb96	[SVE] Move INT_TO_FP i1 promotion into custom lowering. AddPromotedToType is being used to legalise INT_TO_FP operations when the source is a predicate. The point where this introduces vector extends might cause problems in the future so this patch falls back to manual promotion within custom lowering. Differential Revision: https://reviews.llvm.org/D90093	2020-12-15 11:57:07 +00:00
Simon Pilgrim	bd07092669	[X86] Remove trailing whitespace. NFC.	2020-12-15 10:11:38 +00:00
Simon Pilgrim	15a31389b2	[X86][AVX] LowerBUILD_VECTOR - reduce 256/512-bit build vectors with zero/undef upper elements + pad. As discussed on D92645, we don't do a good job of recognising when we don't require the full width of a ymm/zmm build vector because the upper elements are undef/zero. This commit allows us to make use of implicit zeroing of upper elements with AVX instructions, which we emulate in DAG with a INSERT_SUBVECTOR into the bottom of a undef/zero vector of the original type. This exposed a limitation in getTargetConstantBitsFromNode which didn't extract bits from INSERT_SUBVECTORs of different element widths which I've included as well to prevent a couple of regressions.	2020-12-15 10:11:38 +00:00
Kazushi (Jam) Marukawa	a2eb07aa55	[VE] Support atomic exchange instructions Support atomic exchange and atomic compare and exchange instructions. Change CAS and TS1AM instructions for ISel patterns. Add selectADDRzi pattern for them. Add TS1AM pseudo instruction also for better ISel. Add shouldExpandAtomicRMWInIR() function to expand all atomicrmw instructions except atomicrmw xchg. Add custom lower for i8/i16 atomicrmw xchg. Modify replaceFI to support CAS/TS1AM instructions which use "reg+disp" operands instead of "reg+imm+disp" operands. And, add several regression tests to check the correctness. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D93161	2020-12-15 17:43:11 +09:00
Hsiangkai Wang	14a91d676b	[RISCV][NFC] Define scalable vectors for half types. This is a preperation work for vfadd intrinsics. Differential Revision: https://reviews.llvm.org/D93275	2020-12-15 16:23:22 +08:00
Hsiangkai Wang	a6805a0e02	[RISCV] Define vadd/vsub/vrsub intrinsics and lower to V instructions. This patch is based on the proposal from Roger Ferrer Ibanez. http://lists.llvm.org/pipermail/llvm-dev/2020-October/145850.html Differential Revision: https://reviews.llvm.org/D93013	2020-12-15 12:56:49 +08:00
Craig Topper	b094eaa392	[RISCV] Prevent assertion in the assembler if vmerge or vfmerge are given a V0 destination.	2020-12-14 17:22:55 -08:00
Craig Topper	2cf12ae0cc	[RISCV] Handle Match_InvalidSImm5 in RISCVAsmParser::MatchAndEmitInstruction	2020-12-14 17:22:55 -08:00
Craig Topper	413596ee45	[RISCV] Teach debug output from assembly parser to print register names instead of enum values.	2020-12-14 17:22:55 -08:00
Changpeng Fang	ce0c0013d8	AMDGPU: If a store defines (alias) a load, it clobbers the load. Summary: If a store defines (must alias) a load, it clobbers the load. Fixes: SWDEV-258915 Reviewers: arsenm Differential Revision: https://reviews.llvm.org/D92951	2020-12-14 16:34:32 -08:00
Harald van Dijk	9eac818370	[X86] Fix variadic argument handling for x32 The X86-64 ABI defines va_list as typedef struct { unsigned int gp_offset; unsigned int fp_offset; void overflow_arg_area; void reg_save_area; } va_list[1]; This means the size, alignment, and reg_save_area offset will depend on whether we are in LP64 or in ILP32 mode, so this commit adds the checks. Additionally, the VAARG_64 pseudo-instruction assumed 64-bit pointers, so this commit adds a VAARG_X32 pseudo-instruction that behaves just like VAARG_64, except for assuming 32-bit pointers. Some of these changes were originally done by Michael Liao <michael.hliao@gmail.com>. Fixes https://bugs.llvm.org/show_bug.cgi?id=48428. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D93160	2020-12-14 23:47:27 +00:00
Stanislav Mekhanoshin	cf5845d6c4	[AMDGPU] Use multi-dword flat scratch for spilling Differential Revision: https://reviews.llvm.org/D93067	2020-12-14 14:19:29 -08:00
Zequan Wu	b6b522c4db	[NFC] cleanup cg-profile emission on TargetLowerinng Differential Revision: https://reviews.llvm.org/D93150	2020-12-14 13:07:44 -08:00
Reid Kleckner	55fc64bce0	[Hexagon] Tweak _MSC_VER workaround version My bot runs VS 2019, but it could not compile this code. Message: [55/2465] Building CXX object lib\Target\Hexagon\CMakeFiles\LLVMHexagonCodeGen.dir\HexagonVectorCombine.cpp.obj FAILED: lib/Target/Hexagon/CMakeFiles/LLVMHexagonCodeGen.dir/HexagonVectorCombine.cpp.obj ... C:\Program Files (x86)\Microsoft Visual Studio\2019\Professional\VC\Tools\MSVC\14.23.28105\include\map(71): error C2976: 'std::map': too few template arguments C:\Program Files (x86)\Microsoft Visual Studio\2019\Professional\VC\Tools\MSVC\14.23.28105\include\map(71): note: see declaration of 'std::map' The version in the path, 14.23, corresponds to _MSC_VER 1923, so raise the version floor to 1924. I have not tested with versions between 1924 and 1928 (latest), but the latest works with the variadic version.	2020-12-14 11:26:36 -08:00
Craig Topper	045304701b	[RISCV] Move vtype decoding and printing from RISCVInstPrinter to RISCVBaseInfo. Share with the assembly parser's debug output This moves the vtype decoding and printing to RISCVBaseInfo. This keeps all of the decoding code in the same area as the encoding code. This will make it easier to change the decoding for the 1.0 spec in the future. We're now sharing the printing with the debug output for operands in the assembler. This also fixes that debug output to include the tail and mask agnostic bits. Since the printing code works on the vtype immediate value, we now encode the immediate during parsing and store just the immediate in the operand.	2020-12-14 10:50:26 -08:00
Jonas Paulsson	653b97690f	[SystemZ] Improve handling of backchain offset. - New function SDValue getBackchainAddress() used by lowerDYNAMIC_STACKALLOC() and lowerSTACKRESTORE() to properly handle the backchain offset also with packed-stack. - Make a common function getBackchainOffset() for the computation of the backchain offset and use in some places (NFC). Review: Ulrich Weigand Differential Revision: https://reviews.llvm.org/D93171	2020-12-14 12:39:38 -06:00
Michael Liao	1fd1f638b6	[amdgpu] Fix a crash case when `V_CNDMASK` could be simplified. - Once an instruction is simplified, foldable candidates from it should be invalidated or skipped as the operand index is no longer valid. Differential Revision: https://reviews.llvm.org/D93174	2020-12-14 13:08:13 -05:00
Nemanja Ivanovic	bfdc19e778	[PowerPC] Restore stack ptr from frame ptr with setjmp If a function happens to: - call setjmp - do a 16-byte stack allocation - call a function that sets up a stack frame and longjmp's back The stack pointer that is restores by setjmp will no longer point to a valid back chain. According to the ABI, stack accesses in such a function are to be frame pointer based - so it is an error (quite obviously) to restore the stack from the back chain. We already restore the stack from the frame pointer when there are calls to fast_cc functions. We just need to also do that when there are calls to setjmp. This patch simply does that. This was pointed out by the Julia team. Differential revision: https://reviews.llvm.org/D92906	2020-12-14 11:34:16 -06:00
Stanislav Mekhanoshin	87d7757bbe	[SLP] Control maximum vectorization factor from TTI D82227 has added a proper check to limit PHI vectorization to the maximum vector register size. That unfortunately resulted in at least a couple of regressions on SystemZ and x86. This change reverts PHI handling from D82227 and replaces it with a more general check in SLPVectorizerPass::tryToVectorizeList(). Moved to tryToVectorizeList() it allows to restart vectorization if initial chunk fails. However, this function is more general and handles not only PHI but everything which SLP handles. If vectorization factor would be limited to maximum vector register size it would limit much more vectorization than before leading to further regressions. Therefore a new TTI callback getMaximumVF() is added with the default 0 to preserve current behavior and limit nothing. Then targets can decide what is better for them. The callback gets ElementSize just like a similar getMinimumVF() function and the main opcode of the chain. The latter is to avoid regressions at least on the AMDGPU. We can have loads and stores up to 128 bit wide, and <2 x 16> bit vector math on some subtargets, where the rest shall not be vectorized. I.e. we need to differentiate based on the element size and operation itself. Differential Revision: https://reviews.llvm.org/D92059	2020-12-14 08:49:40 -08:00
Jay Foad	07e92e6b60	[AMDGPU] Make use of HasSMemRealTime predicate. NFC. We have this subtarget feature so it makes sense to use it here. This is NFC because it's always defined by default on GFX8+. Differential Revision: https://reviews.llvm.org/D93202	2020-12-14 16:34:57 +00:00
Kazushi (Jam) Marukawa	aefedb1707	[VE] Add logical mask intrinsic instructions Add andm, orm, xorm, eqvm, nndm, negm, pcvm, lzvm, and tovm intrinsic instructions, a few pseudo instructions to expand logical intrinsic using VM512, a mechnism to expand such pseudo instructions, and regression tests. Also, assign vector mask types and vector mask register classes correctly. This is required to use VM512 registers as function arguments. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D93093	2020-12-15 01:34:31 +09:00
Simon Pilgrim	5f5a2547c1	[X86] LowerBUILD_VECTOR - track zero/nonzero elements with APInt masks. NFCI. Prep work for undef/zero 'upper elements' handling as proposed in D92645.	2020-12-14 16:28:45 +00:00
Kazushi (Jam) Marukawa	c9213e1b29	[VE] Correct addRegisterClass calls Correct addRegisterClass calls for vector mask registers. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D93212	2020-12-15 01:16:56 +09:00
diggerlin	15f2d4f198	[AIX] Fixed "comparison of unsigned expression >= 0 is always true" gcc warnings. Summary: fixed a Fixed "comparison of unsigned expression >= 0 is always true" gcc warnings. http://lab.llvm.org:8011/#/builders/5/builds/2407/steps/2/logs/stdio the error caused by patch https://reviews.llvm.org/D92398	2020-12-14 11:08:40 -05:00
Kerry McLaughlin	c5ced82c8e	[SVE][CodeGen] Lower scalable floating-point vector reductions Changes in this patch: - Minor changes to the LowerVECREDUCE_SEQ_FADD function added by @cameron.mcinally to also work for scalable types - Added TableGen patterns for FP reductions with unpacked types (nxv2f16, nxv4f16 & nxv2f32) - Asserts added to expandFMINNUM_FMAXNUM & expandVecReduceSeq for scalable types Reviewed By: cameron.mcinally Differential Revision: https://reviews.llvm.org/D93050	2020-12-14 11:45:42 +00:00
David Green	1de3e7fd62	[ARM] Improve handling of empty VPT blocks in tail predicated loops A vpt block that just contains either VPST;VCTP or VPT;VCTP, once the VCTP is removed will become invalid. This fixed the first by removing the now empty block and bails out for the second, as we have no simple way of converting a VPT to a VCMP. Differential Revision: https://reviews.llvm.org/D92369	2020-12-14 11:17:01 +00:00
Carl Ritson	62c246eda2	[AMDGPU][NFC] Rename opsel/opsel_hi/neg_lo/neg_hi with suffix 0 These parameters set a default value of 0, so I believe they should include a 0 suffix. This allows for versions which do not set a default value in future. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D93187	2020-12-14 20:01:56 +09:00
Carl Ritson	af4570cd3a	[AMDGPU][NFC] Remove unused VOP3Mods0Clamp This is unused and the selection function does not exist. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D93188	2020-12-14 20:00:58 +09:00
Sebastian Neubauer	5733167f54	[AMDGPU] Mark amdgpu_gfx functions as module entry function - Allows lds allocations - Writes resource usage into COMPUTE_PGM_RSRC1 registers in PAL metadata Differential Revision: https://reviews.llvm.org/D92946	2020-12-14 10:43:39 +01:00
QingShan Zhang	08e287aaf3	[PowerPC][FP128] Fix the incorrect signature for math library call The runtime library has two family library implementation for ppc_fp128 and fp128. For IBM Long double(ppc_fp128), it is suffixed with 'l', i.e(sqrtl). For IEEE Long double(fp128), it is suffixed with "ieee128" or "f128". We miss to map several libcall for IEEE Long double. Reviewed By: qiucf Differential Revision: https://reviews.llvm.org/D91675	2020-12-14 07:52:56 +00:00
Chen Zheng	4830d458dd	[MachineCombiner][NFC] Add MustReduceRegisterPressure goal add a new goal MustReduceRegisterPressure for machine combiner pass. PowerPC will use this new goal to do some register pressure related optimization. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D92068	2020-12-14 00:02:42 -05:00
Kazu Hirata	913515e465	[Target] Use llvm::is_contained (NFC)	2020-12-13 19:35:10 -08:00
Craig Topper	0261ce9e17	[X86] Add ExeDomain = SSEPackedSingle to cvtss2sd and cvtsd2ss instrutions. Prep for D92993	2020-12-13 12:35:33 -08:00
Craig Topper	fa31f337a2	[X86] Add isel patterns to form VPDPWSSD from (add (vpmaddwd X, Y), Z) when AVXVNNI is enabled. We already have these patterns for AVX512VNNI.	2020-12-13 12:02:07 -08:00
Simon Pilgrim	d5c434d7dd	[X86][SSE] combineX86ShufflesRecursively - add basic handling for combining shuffles of different widths (PR45974) If a faux shuffle uses smaller shuffle inputs, try to recursively combine with those inputs directly instead of widening them immediately. Then widen all smaller inputs at the bottom of the recursion. This will still mean we're generating nodes on the fly (PR45974) even if we don't combine to a new shuffle but it does help AVX2+ targets combine across xmm/ymm/zmm types, mainly as variable shuffles.	2020-12-13 17:18:07 +00:00
Florian Hahn	46bc40e502	Recommit "[AArch64] Lower calls with rv_marker attribute." This recommits `a87fccb3ff` with a fix to mark the destination operand of the marker instruction as def, to fix a machine verifier failure. This reverts the revert commit `c0f2cea7c0`.	2020-12-13 16:20:39 +00:00
Simon Pilgrim	47321c311b	[X86][SSE] combineReductionToHorizontal - add vXi8 ISD::MUL reduction handling (PR39709) Default expansion leads to repeated extensions/truncations to/from vXi16 which shuffle combining and demanded elts can't completely unravel. Better just to promote (any_extend) the input and perform a vXi16 reduction. We'll be able to remove a lot of this if we ever get decent legalization support for reduction intrinsics in SelectionDAG.	2020-12-13 15:22:54 +00:00
Chris Sears	36a23b33aa	X86: Correcting X86OutgoingValueHandler typo (NFC) https://reviews.llvm.org/D92631	2020-12-12 20:28:37 -05:00
Zarko Todorovski	ce4040a43d	[PPC] Check for PPC64 when emitting 64bit specific VSX nodes when pattern matching built vectors Some of the pattern matching in PPCInstrVSX.td and node lowering involving vectors assumes 64bit mode. This patch disables some of the unsafe pattern matching and lowering of BUILD_VECTOR in 32bit mode. Reviewed By: Xiangling_L Differential Revision: https://reviews.llvm.org/D92789	2020-12-12 15:28:28 -05:00
Krzysztof Parzyszek	baf931a842	[Hexagon] Reconsider getMask fix, return original mask, convert later The getPayload/getMask/getPassThrough functions should return values that could be composed into a masked load/store without any additional type casts. The previous fix violated that. Instead, convert scalar mask to a vector right before rescaling.	2020-12-12 13:27:22 -06:00
Krzysztof Parzyszek	2cf5310471	[Hexagon] Create vector masks for scalar loads/stores AlignVectors treats all loaded/stored values as vectors of bytes, and masks as corresponding vectors of booleans, so make getMask produce a 1-element vector for scalars from the start.	2020-12-12 11:12:17 -06:00
Harald van Dijk	f61e5ecb91	[X86] Avoid data16 prefix for lea in x32 mode The ABI demands a data16 prefix for lea in 64-bit LP64 mode, but not in 64-bit ILP32 mode. In both modes this prefix would ordinarily be ignored, but the instructions may be changed by the linker to instructions that are affected by the prefix. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D93157	2020-12-12 17:05:24 +00:00
David Green	a4823377fd	[ARM] Add basic masked load/store costs This adds some basic MVE masked load/store costs, notably changing the cost of legal loads/stores to the MVECostFactor and the cost of scalarized instructions to 8*NumElts. Differential Revision: https://reviews.llvm.org/D86538	2020-12-12 15:26:32 +00:00
Luo, Yuanke	e52bc1d2bb	[X86] Add chain in ISel for x86_tdpbssd_internal intrinsic.	2020-12-12 21:14:38 +08:00
Jonas Paulsson	42f628c842	Reapply "[SystemZFrameLowering] Don't overrwrite R1D (backchain) when probing." Fixed to properly compute the live-in lists of new blocks. Review: Ulrich Weigand Differential Revision: https://reviews.llvm.org/D92803	2020-12-11 18:25:47 -06:00
Jonas Paulsson	0c2d23933f	[SystemZTTIImpl] Allow some non-prefetched accesses in getMinPrefetchStride(). The performance improvement on LBM previously achieved with improved software prefetching (`36d4421`) have gone lost recently with `e00f189`. There now is one memory access in the loop that LoopDataPrefetch cannot handle (while before there was none) which the heuristic rejects. This patch adds a small margin by allowing 1 non-prefetched memory access for every 32 prefetched ones, so that the heuristic doesn't bail in this type of case. Review: Ulrich Weigand Differential Revision: https://reviews.llvm.org/D92985	2020-12-11 18:06:07 -06:00
diggerlin	7c8072ce2d	[AIX] Fixed a link error. Summary: "Speculative fix for link failure on bots" with a mention of "the clang-ppc64le-rhel bot fails on link: http://lab.llvm.org:8011/#/builders/57/builds/2307/steps/6/logs/stdio". PPCAsmPrinter.cpp:(.text._ZN12_GLOBAL__N_116PPCAIXAsmPrinter19emitFunctionBodyEndEv+0x2f8): undefined reference to `llvm::XCOFF::getNameForTracebackTableLanguageId(llvm::XCOFF::TracebackTable::LanguageID)' PPCAsmPrinter.cpp:(.text._ZN12_GLOBAL__N_116PPCAIXAsmPrinter19emitFunctionBodyEndEv+0x2170): undefined reference to `llvm::XCOFF::parseParmsType(unsigned int, unsigned int)'	2020-12-11 18:53:10 -05:00
diggerlin	997d286f2d	[AIX][XCOFF] emit traceback table for function in aix SUMMARY: 1. added a new option -xcoff-traceback-table to control whether generate traceback table for function. 2. implement the functionality of emit traceback table of a function. Reviewers: hubert.reinterpretcast, Jason Liu Differential Revision: https://reviews.llvm.org/D92398	2020-12-11 17:50:25 -05:00
Sanjay Patel	204bdc5322	[InstCombine][x86] fix insertion point bug in vector demanded elts fold (PR48476) This transform was added at: `c63799fc52` From what I see, it's the first demanded elements transform that adds a new instruction using the IRBuilder. There are similar folds in the generic demanded bits chunk of instcombine that also use the InsertPointGuard code pattern. The tests here would assert/crash because the new instruction was being added at the start of the demanded elements analysis rather than at the instruction that is being replaced.	2020-12-11 17:23:35 -05:00
Krzysztof Parzyszek	2d8cc5479b	[Hexagon] Workaround for compilation error with VS2017	2020-12-11 15:11:44 -06:00
Florian Hahn	c0f2cea7c0	Revert "[AArch64] Lower calls with rv_marker attribute ." This reverts commit `a87fccb3ff`. A test appears to fail with expensive checks. Reverting while I investigate.	2020-12-11 20:12:59 +00:00
Florian Hahn	a87fccb3ff	[AArch64] Lower calls with rv_marker attribute . This patch adds support for lowering function calls with the rv_marker attribute. The goal is to expand such calls to the following sequence of instructions: BL @fn mov x29, x29 This sequence of instructions triggers Objective-C runtime optimizations, hence we want to ensure no instructions get moved in between them. This patch achieves that by adding a new CALL_RVMARKER ISD node, which gets turned into the BLR_RVMARKER pseudo, which eventually gets expanded into the sequence mentioned above. The sequence is then marked as instruction bundle, to avoid anything being moved in between. @ahatanak is working on using this attribute in the front- & middle-end. Together with the front- & middle-end changes, this should address PR31925 for AArch64. Reviewed By: t.p.northover Differential Revision: https://reviews.llvm.org/D92569	2020-12-11 19:45:44 +00:00
Craig Topper	b577d2df7b	[RISCV] Add a pass to remove duplicate VSETVLI instructions in a basic block. Add simple pass for removing redundant vsetvli instructions within a basic block. This handles the case where the AVL register and VTYPE immediate are the same and no other instructions that change VTYPE or VL are between them. There are going to be more opportunities for improvement in this space as we development more complex tests. Differential Revision: https://reviews.llvm.org/D92679	2020-12-11 10:35:37 -08:00
Jay Foad	4f25e53982	[AMDGPU] Make use of emitRemovedIntrinsicError. NFC. Change-Id: I482bbf528255f2eacd3878ddfe7edb9a8f63d5c2	2020-12-11 14:02:14 +00:00
David Green	3f571be1c0	[ARM] Make t2DoLoopStartTP a terminator Although this was something that I was hoping we would not have to do, this patch makes t2DoLoopStartTP a terminator in order to keep it at the end of it's block, so not allowing extra MVE instruction between it and the end. With t2DoLoopStartTP's also starting tail predication regions, it also marks them as having side effects. The t2DoLoopStart is still not a terminator, giving it the extra scheduling freedom that can be helpful, but now that we have a TP version they can be treated differently. Differential Revision: https://reviews.llvm.org/D91887	2020-12-11 09:23:57 +00:00
Kazushi (Jam) Marukawa	87f308ab3d	[VE] Add vgt and vsc intrinsic instructions Add vgt and vsc intrinsic instructions and regression tests. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D93032	2020-12-11 18:23:43 +09:00
Hsiangkai Wang	5aa584ec71	[RISCV] Separate masked and unmasked definitions for pseudo instructions. Differential Revision: https://reviews.llvm.org/D93012	2020-12-11 14:02:56 +08:00
Craig Topper	b90e2d850e	[RISCV] Use tail agnostic policy for vsetvli instruction emitted in the custom inserter The compiler is making no effort to preserve upper elements. To do so would require another source operand tied with the destination and a different intrinsic interface to give control of this source to the programmer. This patch changes the tail policy to agnostic so that the CPU doesn't need to make an effort to preserve them. This is consistent with the RVV intrinsic spec here https://github.com/riscv/rvv-intrinsic-doc/blob/master/rvv-intrinsic-rfc.md#configuration-setting Differential Revision: https://reviews.llvm.org/D93080	2020-12-10 19:48:03 -08:00
Derek Schuff	8d396acac3	[WebAssembly] Support COMDAT sections in assembly syntax This CL changes the asm syntax for section flags, making them more like ELF (previously "passive" was the only option). Now we also allow "G" to designate COMDAT group sections. In these sections we set the appropriate comdat flag on function symbols, and also avoid auto-creating a new section for them. This also adds asm-based tests for the changes D92691 to go along with the direct-to-object tests. Differential Revision: https://reviews.llvm.org/D92952 This is a reland of rG4564553b8d8a with a fix to the lit pipeline in llvm/test/MC/WebAssembly/comdat.ll	2020-12-10 16:43:59 -08:00
Jonas Paulsson	bc7a61b703	Revert "[SystemZFrameLowering] Don't overrwrite R1D (backchain) when probing." Temporarily reverted. This reverts commit `ea475c77ff`.	2020-12-10 18:05:51 -06:00
Derek Schuff	dd1aa4fdd8	Revert "[WebAssembly] Support COMDAT sections in assembly syntax" This reverts commit `4564553b8d`. It broke several buildbots.	2020-12-10 15:55:33 -08:00
Derek Schuff	4564553b8d	[WebAssembly] Support COMDAT sections in assembly syntax This CL changes the asm syntax for section flags, making them more like ELF (previously "passive" was the only option). Now we also allow "G" to designate COMDAT group sections. In these sections we set the appropriate comdat flag on function symbols, and also avoid auto-creating a new section for them. This also adds asm-based tests for the changes D92691 to go along with the direct-to-object tests. Differential Revision: https://reviews.llvm.org/D92952	2020-12-10 14:46:24 -08:00
Craig Topper	e2006ed0f7	[RISCV] Simplify vector instruction handling in RISCVMCInstLower.cpp. Use RegisterClass::contains instead of going through getMinimalPhysRegClass and hasSuperClassEq. Remove the special case for NoRegister. It's identical to the handling for any other regsiter that isn't VRM2/M4/M8.	2020-12-10 13:40:00 -08:00
Jonas Paulsson	ea475c77ff	[SystemZFrameLowering] Don't overrwrite R1D (backchain) when probing. The loop-based probing done for stack clash protection altered R1D which corrupted the backchain value to be stored after the probing was done. By using R0D instead for the loop exit value, R1D is not modified. Review: Ulrich Weigand. Differential Revision: https://reviews.llvm.org/D92803	2020-12-10 15:06:18 -06:00
Amara Emerson	c29af37c6c	[AArch64] Don't try to compress jump tables if there are any inline asm instructions. Inline asm can contain constructs like .bytes which may have arbitrary size. In some cases, this causes us to miscalculate the size of blocks and therefore offsets, causing us to incorrectly compress a JT. To be safe, just bail out of the whole thing if we find any inline asm. Fixes PR48255 Differential Revision: https://reviews.llvm.org/D92865	2020-12-10 12:20:02 -08:00
Sam Elliott	12406ade06	[RISCV] Add (Proposed) Assembler Extend Pseudo-Instructions There is an in-progress proposal for the following pseudo-instructions in the assembler, to complement the existing `sext.w` rv64i instruction: - sext.b - sext.h - zext.b - zext.h - zext.w The `.b` and `.h` variants are available with rv32i and rv64i, and `zext.w` is only available with `rv64i`. These are implemented primarily as pseudo-instructions, as these instructions expand to multiple real instructions. In the case of `zext.b`, this expands to a single rv32/64i instruction, so it is implemented with an InstAlias (like `sext.w` is on rv64i). The proposal is available here: https://github.com/riscv/riscv-asm-manual/pull/61 Reviewed By: asb Differential Revision: https://reviews.llvm.org/D92793	2020-12-10 19:25:51 +00:00
Craig Topper	a1ae3c6ac9	[RISCV][LegalizeDAG] Expand SETO and SETUO comparisons. Teach LegalizeDAG to expand SETUO expansion when UNE isn't legal. If SETUNE isn't legal, UO can use the NOT of the SETO expansion. Removes some complex isel patterns. Most of the test changes are from using XORI instead of SEQZ. Differential Revision: https://reviews.llvm.org/D92008	2020-12-10 09:15:52 -08:00
Krzysztof Parzyszek	7c9afe9183	[Hexagon] Fix gcc6 compilation issue	2020-12-10 08:17:07 -06:00
Kerry McLaughlin	abe7775f5a	[SVE][CodeGen] Extend index of masked gathers This patch changes performMSCATTERCombine to also promote the indices of masked gathers where the element type is i8 or i16, and adds various tests for gathers with illegal types. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D91433	2020-12-10 13:54:45 +00:00
Haojian Wu	2fc4afda0f	Fix a -Wunused-variable warning in release build.	2020-12-10 14:52:45 +01:00
Kazushi (Jam) Marukawa	4b1e329255	[VE] Add vector reduce intrinsic instructions Add vrmax, vrmin, vfrmax, vfrmin, vrand, vror, and vrxor intrinsic instructions and regression tests. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D92941	2020-12-10 22:21:17 +09:00
David Green	0447f3508f	[ARM][RegAlloc] Add t2LoopEndDec We currently have problems with the way that low overhead loops are specified, with LR being spilled between the t2LoopDec and the t2LoopEnd forcing the entire loop to be reverted late in the backend. As they will eventually become a single instruction, this patch introduces a t2LoopEndDec which is the combination of the two, combined before registry allocation to make sure this does not fail. Unfortunately this instruction is a terminator that produces a value (and also branches - it only produces the value around the branching edge). So this needs some adjustment to phi elimination and the register allocator to make sure that we do not spill this LR def around the loop (needing to put a spill after the terminator). We treat the loop very carefully, making sure that there is nothing else like calls that would break it's ability to use LR. For that, this adds a isUnspillableTerminator to opt in the new behaviour. There is a chance that this could cause problems, and so I have added an escape option incase. But I have not seen any problems in the testing that I've tried, and not reverting Low overhead loops is important for our performance. If this does work then we can hopefully do the same for t2WhileLoopStart and t2DoLoopStart instructions. This patch also contains the code needed to convert or revert the t2LoopEndDec in the backend (which just needs a subs; bne) and the code pre-ra to create them. Differential Revision: https://reviews.llvm.org/D91358	2020-12-10 12:14:23 +00:00
Mirko Brkusanin	0c7cce54eb	[AMDGPU] Resolve issues when picking between ds_read/write and ds_read2/write2 Both ds_read_b128 and ds_read2_b64 are valid for 128bit 16-byte aligned loads but the one that will be selected is determined either by the order in tablegen or by the AddedComplexity attribute. Currently ds_read_b128 has priority. While ds_read2_b64 has lower alignment requirements, we cannot always restrict ds_read_b128 to 16-byte alignment because of unaligned-access-mode option. This was causing ds_read_b128 to be selected for 8-byte aligned loads regardles of chosen access mode. To resolve this we use two patterns for selecting ds_read_b128. One requires alignment of 16-byte and the other requires unaligned-access-mode option. Same goes for ds_write2_b64 and ds_write_b128. Differential Revision: https://reviews.llvm.org/D92767	2020-12-10 12:40:49 +01:00
David Green	b0ce615b2d	[ARM] Remove copies from low overhead phi inductions. The phi created in a low overhead loop gets created with a default register class it seems. There are then copied inserted between the low overhead loop pseudo instructions (which produce/consume GPRlr instructions) and the phi holding the induction. This patch removes those as a step towards attempting to make t2LoopDec and t2LoopEnd a single instruction, and appears useful in it's own right as shown in the tests. Differential Revision: https://reviews.llvm.org/D91267	2020-12-10 10:30:31 +00:00
Benjamin Kramer	eeb713bbe2	[Hexagon] Fold single-use variables into assert. NFCI. Silences unused variable warnings in Release builds.	2020-12-10 10:53:56 +01:00
Luo, Yuanke	f80b29878b	[X86] AMX programming model. This patch implements amx programming model that discussed in llvm-dev (http://lists.llvm.org/pipermail/llvm-dev/2020-August/144302.html). Thank Hal for the good suggestion in the RA. The fast RA is not in the patch yet. This patch implemeted 7 components. 1. The c interface to end user. 2. The AMX intrinsics in LLVM IR. 3. Transform load/store <256 x i32> to AMX intrinsics or split the type into two <128 x i32>. 4. The Lowering from AMX intrinsics to AMX pseudo instruction. 5. Insert psuedo ldtilecfg and build the def-use between ldtilecfg to amx intruction. 6. The register allocation for tile register. 7. Morph AMX pseudo instruction to AMX real instruction. Change-Id: I935e1080916ffcb72af54c2c83faa8b2e97d5cb0 Differential Revision: https://reviews.llvm.org/D87981	2020-12-10 17:01:54 +08:00
Stanislav Mekhanoshin	4617cc68f6	[AMDGPU] Fix expansion of 192 bit spills in PEI Differential Revision: https://reviews.llvm.org/D92979	2020-12-09 16:36:29 -08:00
Krzysztof Parzyszek	e3b2828b9d	[Hexagon] Silence warnings about unused objects	2020-12-09 17:54:10 -06:00
Krzysztof Parzyszek	43d1c7a564	[Hexagon] Fix build: move template specialization into namespace scope	2020-12-09 17:40:15 -06:00
Krzysztof Parzyszek	f5d07a05bb	[Hexagon] Realign HVX vectors wherever possible Introduce HexagonVectorCombine as a helper class for vector-related optimizations.	2020-12-09 17:11:25 -06:00
Saleem Abdulrasool	ee74d1b420	X86: use a data driven configuration of Windows x86 libcalls (NFC) Rather than creating a series of associated calls and ensuring that everything is lined up, use a table driven approach that ensures that they two always stay in sync.	2020-12-09 22:49:11 +00:00
Scott Linder	9260a99999	[MC][AMDGPU] Consume EndOfStatement in asm parser Avoids spurious newlines showing up in the output when emitting assembly via MC. Reviewed By: MaskRay, arsenm Differential Revision: https://reviews.llvm.org/D92690	2020-12-09 21:45:55 +00:00
Craig Topper	5ff5cf8e05	[X86] Use APInt::isSignedIntN instead of isIntN for 64-bit ANDs in X86DAGToDAGISel::IsProfitableToFold Pretty sure we meant to be checking signed 32 immediates here rather than unsigned 32 bit. I suspect I messed this up because in MathExtras.h we have isIntN and isUIntN so isIntN differs in signedness depending on whether you're using APInt or plain integers. This fixes a case where we didn't fold a constant created by shrinkAndImmediate. Since shrinkAndImmediate doesn't topologically sort constants it creates, we can fail to convert the Constant to a TargetConstant. This leads to very strange behavior later. Fixes PR48458.	2020-12-09 13:39:07 -08:00
Scott Linder	f5f4b8b60f	[AMDGPU][MC] Restore old error position for "too few operands" Revert part of https://reviews.llvm.org/D92084 to make it simpler to start consuming the EndOfStatement token within AMDGPU's ParseInstruction in a future patch. This also brings us back to what every other target currently does. A future change to move the position back to the end of the statement would likely need to audit all of the AMDGPUOperand SMLoc ranges, and determine the SMLoc for the last character of the last operand. Reviewed By: dp Differential Revision: https://reviews.llvm.org/D92960	2020-12-09 21:09:47 +00:00
Florian Hahn	77fd12a66e	[AArch64] Add aarch64_neon_vcmla{_rot{90,180,270}} intrinsics. Add builtins required to implement vcmla and rotated variants from the ACLE Reviewed By: t.p.northover Differential Revision: https://reviews.llvm.org/D92929	2020-12-09 19:46:49 +00:00
Kazushi (Jam) Marukawa	1a2147fead	[VE] Add vsum and vfsum intrinsic instructions Add vsum and vfsum intrinsic instructions and regression tests. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D92938	2020-12-10 01:11:53 +09:00
Kazushi (Jam) Marukawa	398f29fbb0	[VE] Add vfmk intrinsic instructions Add vfmk intrinsic instructions, a few pseudo instructions to expand vfmk intrinsic using VM512 correctly, and regression tests. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D92758	2020-12-10 00:08:20 +09:00
Simon Pilgrim	24184dbb82	[X86] Fold CONCAT(VPERMV3(X,Y,M0),VPERMV3(Z,W,M1)) -> VPERMV3(CONCAT(X,Z),CONCAT(Y,W),CONCAT(M0,M1)) Further prep work toward supporting different subvector sizes in combineX86ShufflesRecursively	2020-12-09 14:29:32 +00:00
Kerry McLaughlin	05edfc5475	[SVE][CodeGen] Add DAG combines for s/zext_masked_gather This patch adds the following DAGCombines, which apply if isVectorLoadExtDesirable() returns true: - fold (and (masked_gather x)) -> (zext_masked_gather x) - fold (sext_inreg (masked_gather x)) -> (sext_masked_gather x) LowerMGATHER has also been updated to fetch the LoadExtType associated with the gather and also use this value to determine the correct masked gather opcode to use. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D92230	2020-12-09 11:53:19 +00:00
Kerry McLaughlin	4519ff4b6f	[SVE][CodeGen] Add the ExtensionType flag to MGATHER Adds the ExtensionType flag, which reflects the LoadExtType of a MaskedGatherSDNode. Also updated SelectionDAGDumper::print_details so that details of the gather load (is signed, is scaled & extension type) are printed. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D91084	2020-12-09 11:19:08 +00:00
Tim Northover	45de42116e	AArch64: use correct operand for ubsantrap immediate. I accidentally pushed the wrong patch originally.	2020-12-09 10:17:16 +00:00
Fraser Cormack	af5fd65895	[RISCV] Fix missing def operand when creating VSETVLI pseudos The register operand was not being marked as a def when it should be. No tests for this in the main branch as there are not yet any pseudos without a non-negative VLIndex. Also change the type of a virtual register operand from unsigned to Register and adjust formatting. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D92823	2020-12-09 09:35:28 +00:00
Siddhesh Poyarekar	0ef0de65f1	Fix typo in llvm/lib/Target/README.txt Trivial typo, replace __builtin_objectsize with __builtin_object_size. Differential Revision: https://reviews.llvm.org/D92914	2020-12-09 10:12:26 +01:00
David Green	384383e15c	[ARM] Common inverse constant predicates to VPNOT This scans through blocks looking for constants used as predicates in MVE instructions. When two constants are found which are the inverse of one another, the second can be replaced by a VPNOT of the first, potentially allowing that not to be folded away into an else predicate of a vpt block. Differential Revision: https://reviews.llvm.org/D92470	2020-12-09 07:56:44 +00:00
Craig Topper	aaa925795f	[RISCV] Use SDLoc created early in RISCVDAGToDAGISel::Select instead of recreating it in multiple cases in the switch. NFC	2020-12-08 21:13:25 -08:00
Craig Topper	846f576bea	[RISCV] Add a table showing the layout of the fields in VTYPE. Rename MaskedOffAgnostic->MaskAgnostic. NFC	2020-12-08 20:41:57 -08:00
Jinsong Ji	45b08c41bf	[PowerPC] Set SubRegIndex offset for sub_vsx1/sub_pair1 We defined SubRegIndex for 256/512 regs, but we did not set the offset for higher part, so the offset of lower and higher part are the same. This may cause problem in assessing ranges of SubReg, it is great that this haven't affected any testcases, but I think we should fix it to avoid hidden bugs in the future. Reviewed By: bsaleil, #powerpc Differential Revision: https://reviews.llvm.org/D92864	2020-12-08 22:56:44 -05:00
Sam Clegg	1b6d879ec1	[WebAssembly] Fix code generated for atomic operations in PIC mode The main this this test does is to add the `IsNotPIC` predicate to the all the atomic instructions pattern that directly refer to `tglobaladdr`. This is because in PIC mode we need to generate separate instruction sequence (either a direct global.get, or __memory_base + offset) for accessing global addresses. As part of this change I noticed that many of the `Requires` attributes added to the instruction in `WebAssemblyInstrAtomics.td` were being honored. This is because the wrapped in a `let Predicates = [HasAtomics]` block and it seems that that outer wrapping overrides any `Requires` on defs within it. As a workaround I removed the outer `let` and added `HasAtomics` to all the inner `Requires`. I believe that all the instrucitons that don't have `Requires` explicit bottom out in `ATOMIC_I` and `ATOMIC_NRI` which have `HasAtomics` so this should not remove this predicate from any patterns (at least that is the idea). The alternative to this approach looks like implementing something like `PredicateControl` in `Mips.td` where we can split the predicates into groups so they don't clobber each other. Differential Revision: https://reviews.llvm.org/D92744	2020-12-08 18:41:32 -08:00
Chen Zheng	66a03d1022	[PowerPC] prepare more dq form for P10 pair load/store Reviewed By: steven.zhang Differential Revision: https://reviews.llvm.org/D92393	2020-12-08 21:01:40 -05:00
Craig Topper	a64998be99	[RISCV] Share VTYPE encoding code between the assembler and the CustomInserter for adding VSETVLI before vector instructions This merges the SEW and LMUL enums that each used into singles enums in RISCVBaseInfo.h. The patch also adds a new encoding helper to take SEW, LMUL, tail agnostic, mask agnostic and turn it into a vtype immediate. I also stopped storing the Encoding in the VTYPE operand in the assembler. It is easy to calculate when adding the operand which should only happen once per instruction. Differential Revision: https://reviews.llvm.org/D92813	2020-12-08 16:04:20 -08:00
Jessica Paquette	40d1fb2229	[AArch64][GlobalISel] Swap select operands when inverting condition code This was not obvious when reading the imported tablegen patterns in AArch64GenDAGISel. Update select-select.mir.	2020-12-08 14:17:26 -08:00
Jessica Paquette	21308c2b4c	[AArch64][GlobalISel] Check if G_SELECT has been optimized when folding binops `TryFoldBinOpIntoSelect` didn't have a check for `Optimized`, meaning you could end up folding twice. (e.g. a select with a G_ADD on the true side, and a G_SUB on the false side) Add in the missing `if` and a test.	2020-12-08 13:47:08 -08:00
Kazushi (Jam) Marukawa	95ea50e4ad	[VE] Correct LVLGen (LVL instruction insert pass) SX Aurora VE uses an intermediate representation similar to VP as its MIR. VE itself uses invidiual VL register as its own vector length register at the hardware level. So, LLVM needs to insert load VL (LVL) instruction just before vector instructions if the value of VL is changed. This LVLGen pass generates LVL instructions for such purpose. Previously, a bug is pointed out in D91416. This patch correct this bug and add a regression test. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D92716	2020-12-09 06:33:53 +09:00
Florian Hahn	4c69b1b98a	[AArch64] Fix rottype use in complex instr defs. It seems like the order here is wrong. Types like i32 do not take any arguments. Currently this is not a problem, because the patterns are not actually used with any nodes, but will fail once it is used with real ISD nodes. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D91345	2020-12-08 21:11:33 +00:00
Harald van Dijk	29c8ea6f1a	[X86] Handle localdynamic TLS model in x32 mode D92346 added TLS_(base_)addrX32 to handle TLS in x32 mode, but missed the different TLS models. This diff fixes the logic for the local dynamic model where `RAX` was used when `EAX` should be, and extends the tests to cover all four TLS models. Fixes https://bugs.llvm.org/show_bug.cgi?id=26472. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D92737	2020-12-08 21:06:00 +00:00
Austin Kerbow	4aa842a800	[AMDGPU] Add new pseudos for indirect addressing with VGPR Indexing It is possible for copies or spills to be inserted in the middle of indirect addressing sequences which use VGPR indexing. Spills to accvgprs could be effected by the indexing mode. Add new pseudo instructions that are expanded after register allocation to avoid the problematic spill or copy placement. Differential Revision: https://reviews.llvm.org/D91048	2020-12-08 12:24:12 -08:00
Craig Topper	98bca0a605	[RISCV] Add isel patterns for SBCLRI/SBSETI/SBINVI(W) instruction We can use these instructions for single bit immediates that are too large for ANDI/ORI/CLRI. The _10 test cases are to make sure that we still use ANDI/ORI/CLRI for small immediates. Differential Revision: https://reviews.llvm.org/D92262	2020-12-08 12:22:40 -08:00
Craig Topper	fb5b611af9	[RISCV] Detect more errors when parsing vsetvli in the assembler -Reject an "mf1" lmul -Make sure tail agnostic is exactly "tu" or "ta" not just that it starts with "tu" or "ta" -Make sure mask agnostic is exactly "mu" or "ma" not just that it starts with "mu" or "ma" Differential Revision: https://reviews.llvm.org/D92805	2020-12-08 11:25:39 -08:00
Craig Topper	88e58939dc	[RISCV] When parsing vsetvli in the assembler, use StringRef::getAsInteger instead of APInt's string constructor APInt's string constructor asserts on error. Since this is the parser and we don't yet know if the string is a valid integer we shouldn't use that. Instead use StringRef::getAsInteger which returns a bool to indicate success or failure. Since we no longer need APInt, use 'unsigned' instead. Differential Revision: https://reviews.llvm.org/D92801	2020-12-08 11:25:39 -08:00
Jessica Paquette	5b5d3fa9d9	[AArch64][GlobalISel] Fold G_SELECT cc, %t, (G_ADD %x, 1) -> CSINC %t, %x, cc This implements ``` G_SELECT cc, %true, (G_ADD %x, 1) -> CSINC %true, %x, cc G_SELECT cc, (G_ADD %x, 1), %false -> CSINC %x, %false, inv_cc ``` Godbolt example: https://godbolt.org/z/eoPqKq Differential Revision: https://reviews.llvm.org/D92868	2020-12-08 10:53:37 -08:00
Jessica Paquette	cd9a52b99e	[AArch64][GlobalISel] Fold binops on the true side of G_SELECT This implements the following folds: ``` G_SELECT cc, (G_SUB 0, %x), %false -> CSNEG %x, %false, inv_cc G_SELECT cc, (G_XOR x, -1), %false -> CSINV %x, %false, inv_cc ``` This is similar to the folds introduced in `5bc0bd05e6`. In `5bc0bd05e6` I mentioned that we may prefer to do this in AArch64PostLegalizerLowering. I think that it's probably better to do this in the selector. The way we select G_SELECT depends on what register banks end up being assigned to it. If we did this in AArch64PostLegalizerLowering, then we'd end up checking every G_SELECT to see if it's worth swapping operands. Doing it in the selector allows us to restrict the optimization to only relevant G_SELECTs. Also fix up some comments in `TryFoldBinOpIntoSelect` which are kind of confusing IMO. Example IR: https://godbolt.org/z/3qPGca Differential Revision: https://reviews.llvm.org/D92860	2020-12-08 10:42:59 -08:00
Jessica Paquette	ce199667f6	[AArch64][GlobalISel] Don't explicitly write to the zero register in emitCMN This case was missed in `78ccb0359d`. Differential Revision: https://reviews.llvm.org/D92438	2020-12-08 10:42:05 -08:00
Craig Topper	3e86fbc971	[RISCV] Replace custom isel code for RISCVISD::READ_CYCLE_WIDE with isel pattern This node returns 2 results and uses a chain. As long as we use a DAG as part of the pseudo instruction definition where we can use the "set" operator, it looks like tablegen can handle use a pattern for this without a problem. I believe the original implementation was copied from PowerPC. This also fixes the pseudo instruction so that it is marked as having side effects to match the definition of CSRRS and the RV64 instruction. And we don't need to explicitly clear mayLoad/mayStore since those can be inferred now. Differential Revision: https://reviews.llvm.org/D92786	2020-12-08 10:23:37 -08:00
Huihui Zhang	8e6fc1f97e	[AArch64][SVE] Add lowering for llvm.maxnum\|minnum for scalable type. LLVM intrinsic llvm.maxnum\|minnum is overloaded intrinsic, can be used on any floating-point or vector of floating-point type. This patch extends current infrastructure to support scalable vector type. This patch also fix a warning message of incorrect use of EVT::getVectorNumElements() for scalable type, when DAGCombiner trying to split scalable vector. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D92607	2020-12-08 09:35:53 -08:00
Jessica Paquette	b15491eb33	[AArch64][GlobalISel] Select G_SADDO and G_SSUBO We didn't have selector support for these. Selection code is similar to `getAArch64XALUOOp` in AArch64ISelLowering. Similar to that code, this returns the AArch64CC and the instruction produced. In SDAG, this is used to optimize select + overflow and condition branch + overflow pairs. (See `AArch64TargetLowering::LowerBR_CC` and `AArch64TargetLowering::LowerSelect`) (G_USUBO should be easy to add here, but it isn't legalized right now.) This also factors out the existing G_UADDO selection code, and removes an unnecessary check for s32/s64. AFAIK, we shouldn't ever get anything other than s32/s64. It makes more sense for this to be handled by the type assertion in `emitAddSub`. Differential Revision: https://reviews.llvm.org/D92610	2020-12-08 09:18:28 -08:00
Stefan Pintilie	2812c15156	[PowerPC] Fix missing nop after call to weak callee. Weak functions can be replaced by other functions at link time. Previously it was assumed that no matter what the weak callee function was replaced with it would still share the same TOC as the caller. This is no longer true as a weak callee with a TOC setup can be replaced by another function that was compiled with PC Relative and does not have a TOC at all. This patch makes sure that all calls to functions defined as weak from a caller that has a valid TOC have a nop after the call to allow a place for the linker to restore the TOC. Reviewed By: NeHuang Differential Revision: https://reviews.llvm.org/D91983	2020-12-08 09:38:44 -06:00
David Green	03e675fd12	[ARM] Turn pred_cast(xor(x, -1)) into xor(pred_cast(x), -1) This folds a not (an xor -1) though a predicate_cast, so that it can be turned into a VPNOT and potentially be folded away as an else predicate inside a VPT block. Differential Revision: https://reviews.llvm.org/D92235	2020-12-08 15:22:46 +00:00
David Green	91fb9eac0b	[ARM] Remove dead instructions before creating VPT block bundles We remove VPNOT instructions in VPT blocks as we create them, turning them into else predicates. We don't remove the dead instructions until after the block has been created though. Because the VPNOT will have killed the vpr register it used, this makes finalizeBundle add internal flags to the vpr uses of any instructions after the VPNOT. These incorrect flags can then confuse what is alive and what is not, leading to machine verifier problems. This patch removes them earlier instead, before the bundle is finalized so that kill flags remain valid. Differential Revision: https://reviews.llvm.org/D92227	2020-12-08 14:05:07 +00:00
David Sherwood	59f17b57d9	[SVE] Fix crashes with inline assembly All the crashes found compiling inline assembly are fixed in this patch by changing AArch64TargetLowering::getRegForInlineAsmConstraint to be more resilient to mismatched value and register types. For example, it makes no sense to request a predicate register for a nxv2i64 type and so on. Tests have been added here: test/CodeGen/AArch64/inline-asm-constraints-bad-sve.ll Differential Revision: https://reviews.llvm.org/D92554	2020-12-08 13:48:43 +00:00
Tim Northover	c5978f42ec	UBSAN: emit distinctive traps Sometimes people get minimal crash reports after a UBSAN incident. This change tags each trap with an integer representing the kind of failure encountered, which can aid in tracking down the root cause of the problem.	2020-12-08 10:28:26 +00:00
Qiu Chaofan	5e85a2ba16	[PowerPC] Implement intrinsic for DARN instruction Instruction darn was introduced in ISA 3.0. It means 'Deliver A Random Number'. The immediate number L means: - L=0, the number is 32-bit (higher 32-bits are all-zero) - L=1, the number is 'conditioned' (processed by hardware to reduce bias) - L=2, the number is not conditioned, directly from noise source GCC implements them in three separate intrinsics: __builtin_darn, __builtin_darn_32 and __builtin_darn_raw. This patch implements the same intrinsics. And this change also addresses Bugzilla PR39800. Reviewed By: steven.zhang Differential Revision: https://reviews.llvm.org/D92465	2020-12-08 14:08:52 +08:00
Esme-Yi	49599cb1a2	[PowerPC] Correct the bit-width definition for some imm operand in td. Summary: The imm operands of some instructions are not defined accurately in td. This is a small patch to correct these definitions. Reviewed By: steven.zhang Differential Revision: https://reviews.llvm.org/D91603	2020-12-08 03:20:12 +00:00
Jessica Paquette	d49f6491b6	[AArch64][GlobalISel] Refactor G_BRCOND selection `selectCompareBranch` was hard to understand. Also, it was being needlessly pessimistic with the `ProduceNonFlagSettingCondBr` case. It assumed that everything in `selectCompareBranch` would emit a TB(N)Z or C(B)NZ. That's not true; the G_FCMP + G_BRCOND case would never emit those instructions, and the G_ICMP + G_BRCOND case was capable of emitting an integer compare + Bcc. - Refactor `selectCompareBranch` into separate functions based off of what is feeding the G_BRCOND's condition. - Move G_BRCOND selection code from `select` to `selectCompareBranch`. - Remove duplicated constraint code from the code originally in `select`; `emitTestBit` already handles that, so no need to constrain twice. - Factor out the G_FCMP + G_BRCOND case into `selectCompareBranchFedByFCmp`. - Split the G_ICMP + G_BRCOND case into an optimization function, `tryOptCompareBranchFedByICmp` and a general selection function, `selectCompareBranchFedByICmp`. - Reduce the number of things passed to `tryOptAndIntoCompareBranch`. - Improve documentation. - Give some variables more descriptive names. Other than improving the code generation for functions with speculative_load_hardening by getting the logic correct, this is NFC. Differential Revision: https://reviews.llvm.org/D92582	2020-12-07 17:24:23 -08:00
Jessica Paquette	195a7af0ab	[AArch64][GlobalISel] Narrow 128-bit regs to 64-bit regs in emitTestBit When we have a 128-bit register, emitTestBit would incorrectly narrow to 32 bits always. If the bit number was > 32, then we would need a TB(N)ZX. This would cause a crash, as we'd have the wrong register class. (PR48379) This generalizes `narrowExtReg` into `moveScalarRegClass`. This also allows us to remove `widenGPRBankRegIfNeeded` entirely, since `selectCopy` correctly handles SUBREG_TO_REG etc. This does create some codegen changes (since `selectCopy` uses the `all` regclass variants). However, I think that these will likely be optimized away, and we can always improve the `selectCopy` code. It looks like we should revisit `selectCopy` at this point, and possibly refactor it into at least one `emit` function. Differential Revision: https://reviews.llvm.org/D92707	2020-12-07 15:04:33 -08:00
Amara Emerson	2ac4d0f45a	[AArch64] Fix some minor coding style issues in AArch64CompressJumpTables	2020-12-07 12:48:09 -08:00
Stanislav Mekhanoshin	dd89249498	[AMDGPU] Annotate vgpr<->agpr spills in asm Differential Revision: https://reviews.llvm.org/D92125	2020-12-07 11:25:25 -08:00
Stefan Pintilie	49921d1c3c	[PowerPC] Exploitation of xxeval instruction for AND and NAND The xxeval instruction was intorduced in Power PC in Power 10. The instruction accepts three vector registers and an immediate. Depending on the value of the immediate the instruction can be used to perform certain bitwise boolean operations (and, or, xor, ...) on the given vector registers. This patch implements the AND and NAND patterns that can be used by the instruction. Reviewed By: nemanjai, #powerpc, bsaleil, NeHuang, jsji Differential Revision: https://reviews.llvm.org/D92420	2020-12-07 12:36:54 -06:00
Craig Topper	5c819eb389	[RISCV] Form GORCI from (or (rotl/rotr X, Bitwidth/2), X). A rotate by half the bitwidth swaps the bottom and top half which is the same as one of the MSB GREVI stage. We have to do this as a special combine because we prefer to keep (rotl/rotr X, BitWidth/2) as a rotate rather than a single stage GREVI. Differential Revision: https://reviews.llvm.org/D92286	2020-12-07 10:28:04 -08:00
Simon Pilgrim	c86c024e10	[X86] Fix static analyzer warnings. NFCI. Replace '\|' with '\|\|' in condition, and fix case of SignedMode variable.	2020-12-07 18:23:55 +00:00
David Green	d9bf6245bf	[ARM] Revert low overhead loops with calls before registry allocation. This adds code to revert low overhead loops with calls in them before register allocation. Ideally we would not create low overhead loops with calls in them to begin with, but that can be difficult to always get correct. If we want to try and glue together t2LoopDec and t2LoopEnd into a single instruction, we need to ensure that no instructions use LR in the loop. (Technically the final code can be better too, as it doesn't need to use the same registers but that has not been optimized for here, as reverting loops with calls is expected to be very rare). It also adds a MVETailPredUtils.h header to share the revert code between different passes, and provides a place to expand upon, with RevertLoopWithCall becoming a place to perform other low overhead loop alterations like removing copies or combining LoopDec and End into a single instruction. Differential Revision: https://reviews.llvm.org/D91273	2020-12-07 15:44:40 +00:00
Kazushi (Jam) Marukawa	9d4501e2b4	[VE] Add vcp and vex intrinsic instructions Add vcp and vex intrinsic instructions and regression tests. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D92752	2020-12-07 22:56:55 +09:00
Kerry McLaughlin	111f559bbd	[SVE][CodeGen] Call refineIndexType & refineUniformBase from visitMGATHER The refineIndexType & refineUniformBase functions added by D90942 can also be used to improve CodeGen of masked gathers. These changes were split out from D91092 Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D92319	2020-12-07 13:20:19 +00:00
Petar Avramovic	3a042dcd2e	[AMDGPU] Fix default value of glc for mubuf rtn atomics Mubuf rtn atomics use GLC_1 thus default value for glc operand should be -1, see https://reviews.llvm.org/D90730. This allows us to report error when rtn atomic requires glc=1 but does not have glc operand in input. Differential Revision: https://reviews.llvm.org/D92654	2020-12-07 14:00:08 +01:00
Kerry McLaughlin	f6dd32fd35	[SVE][CodeGen] Lower scalable masked gathers Lowers the llvm.masked.gather intrinsics (scalar plus vector addressing mode only) Changes in this patch: - Add custom lowering for MGATHER, using getGatherVecOpcode() to choose the appropriate gather load opcode to use. - Improve codegen with refineIndexType/refineUniformBase, added in D90942 - Tests added for gather loads with 32 & 64-bit scaled & unscaled offsets. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D91092	2020-12-07 12:20:41 +00:00
Kazushi (Jam) Marukawa	03898b79fb	[VE] Add vrcp, vrsqrt, vcvt, vmrg, and vshf intrinsic instructions Add vrcp, vrsqrt, vcvt, vmrg, and vshf intrinsic instructions and regression tests. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D92750	2020-12-07 20:30:12 +09:00
Kazushi (Jam) Marukawa	67dbc8195d	[VE] Add vfmad, vfmsb, vfnmad, and vfnmsb intrinsic instructions Add vfmad, vfmsb, vfnmad, and vfnmsb intrinsic instructions and regression tests. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D92697	2020-12-07 19:28:17 +09:00
Esme-Yi	28fdeea952	[PowerPC] Add support for intrinsics dcbfps and dcbstps in P10. Summary: This patch added support for the intrinsics llvm.ppc.dcbfps and llvm.ppc.dcbstps. dcbfps and dcbstps are actually extended mnemonics of dcbf. dcbfps RA,RB ---> dcbf RA,RB,4 dcbstps RA,RB ---> dcbf RA,RB,6 Reviewed By: amyk, steven.zhang Differential Revision: https://reviews.llvm.org/D91323	2020-12-07 05:19:06 +00:00
Zi Xuan Wu	365c405411	[CSKY 2/n] Add basic tablegen infra for CSKY This introduce basic tablegen infra such as CSKY{InstrFormats,InstrInfo,RegisterInfo,}.td. For now, only add instruction definitions for basic CSKY ISA operations, and the instruction format and register info are almost complete. Our initial target is a working MC layer rather than codegen, so appropriate SelectionDAG patterns will come later. Differential Revision: https://reviews.llvm.org/D89180	2020-12-07 11:56:09 +08:00
Qiu Chaofan	efdd463050	[PowerPC] Fix chain for i1-to-fp operation A simple SELECT is used for converting i1 to floating types on ppc32, but in constrained cases, the chain is not handled properly. This patch will fix that. Reviewed By: steven.zhang Differential Revision: https://reviews.llvm.org/D92365	2020-12-07 10:38:56 +08:00
Fangrui Song	9fe1809f8c	[X86] Delete 3 unused declarations	2020-12-06 15:13:39 -08:00
Kazu Hirata	68de75ec55	[Mips] Use llvm::is_contained (NFC)	2020-12-06 10:12:55 -08:00
Simon Pilgrim	0101fb73de	[X86] Fold MOVMSK(ICMP_SGT(X,-1)) -> NOT(MOVMSK(X))) Noticed while triaging PR37506	2020-12-06 17:56:41 +00:00
Layton Kifer	ac522f8700	[DAGCombiner] Fold (sext (not i1 x)) -> (add (zext i1 x), -1) Move fold of (sext (not i1 x)) -> (add (zext i1 x), -1) from X86 to DAGCombiner to improve codegen on other targets. Differential Revision: https://reviews.llvm.org/D91589	2020-12-06 11:52:10 -05:00
Simon Pilgrim	db900995ed	[CostModel][X86] getGatherScatterOpCost - use default implementation for alt costkinds Noticed while looking at D92701 - we only really handle TCK_RecipThroughput gather/scatter costs - for now drop back to the default implementation for non-legal gathers/scatters.	2020-12-06 14:08:26 +00:00
Fangrui Song	467b669915	[TargetMachine] Delete asan workaround `687b83ceab` has fixed the X86FastISel bug. We can revert the workaround now. Actually, the commit introduced a bug that ppc64 should be excluded.	2020-12-06 00:33:11 -08:00
Fangrui Song	687b83ceab	[X86FastISel] Fix MO_GOTPCREL GlobalValue reference in static relocation model This fixes the bug referenced by `5582a79876` which was exposed by `961f31d8ad`. With this change, `movq src@GOTPCREL, %rcx` => `movq src@GOTPCREL(%rip), %rcx`	2020-12-05 23:13:28 -08:00
Fangrui Song	a4cadc2df9	[TargetMachine] Don't imply dso_local for memprof in static relocation model The workaround is no longer needed with my previous commit to MemProfiler.cpp	2020-12-05 21:39:03 -08:00
Vitaly Buka	19e7741fef	[TargetMachine] Set dso_local for memprof Similar to `5582a79876`	2020-12-05 21:11:04 -08:00
Craig Topper	5fc8f90f0a	[RISCV] Replace a custom SDTypeProfile with SDTIntBinOp which should be sufficient here. On the surface this would be slightly less optimal for the isel table, but due to a tablegen issue with HW mode this ends up generating a smaller isel table.	2020-12-05 20:18:22 -08:00
Fangrui Song	5582a79876	[TargetMachine] Set dso_local if asan is detected AddressSanitizer instrumentation does not set dso_local on non-thread-local global variables in -fno-pic and it seems to rely on implied dso_local to work. Add a hack until we have fixed AddressSanitizer to call setDSOLocal() as appropriate. Thanks to Vitaly Buka for reporting the issue and suggesting the way to detect asan.	2020-12-05 17:51:10 -08:00
Fangrui Song	109e70d357	[TargetMachine] Drop implied dso_local for an edge case (extern_weak + non-pic + hidden) This does not deserve special handling. The code should be added to Clang instead if deemed useful. With this simplification, we can additionally delete the PIC extern_weak special case.	2020-12-05 15:52:33 -08:00
Fangrui Song	930b3398c7	[TargetMachine] Clean up TargetMachine::shouldAssumeDSOLocal after x86-32 specific hack is moved to X86Subtarget With my previous commit, X86Subtarget::classifyGlobalReference has learned to use MO_NO_FLAG for 32-bit ELF -fno-pic code, the x86-32 special case in TargetMachine::shouldAssumeDSOLocal can be removed. Since we no longer imply dso_local for function declarations, we can drop the ppc64 special case as well. This is NFC in terms of Clang emitted assembly.	2020-12-05 15:13:42 -08:00
Fangrui Song	a084c0388e	[TargetMachine] Don't imply dso_local on function declarations in Reloc::Static model for ELF/wasm clang/lib/CodeGen/CodeGenModule sets dso_local on applicable function declarations, we don't need to duplicate the work in TargetMachine:shouldAssumeDSOLocal. (Actually the long-term goal (started by r324535) is to drop TargetMachine::shouldAssumeDSOLocal.) By not implying dso_local, we will respect dso_local/dso_preemptable specifiers set by the frontend. This allows the proposed -fno-direct-access-external-data option to work with -fno-pic and prevent a canonical PLT entry (SHN_UNDEF with non-zero st_value) when taking the address of a function symbol. This patch should be NFC in terms of the Clang emitted assembly because the case we don't set dso_local is a case Clang sets dso_local. However, some tests don't set dso_local on some function declarations and expose some differences. Most tests have been fixed to be more robust in the previous commit.	2020-12-05 14:54:37 -08:00
Fangrui Song	37f0c8df47	[X86] Emit @PLT for x86-64 and keep unadorned symbols for x86-32 This essentially reverts the x86-64 side effect of r327198. For x86-32, @PLT (R_386_PLT32) is not suitable in -fno-pic mode so the code forces MO_NO_FLAG (like a forced dso_local) (https://bugs.llvm.org//show_bug.cgi?id=36674#c6). For x86-64, both `call/jmp foo` and `call/jmp foo@PLT` emit R_X86_64_PLT32 (https://sourceware.org/bugzilla/show_bug.cgi?id=22791) so there is no difference using @PLT. Using @PLT is actually favorable because this drops a difference with -fpie/-fpic code and makes it possible to avoid a canonical PLT entry when taking the address of an undefined function symbol.	2020-12-05 13:17:47 -08:00
Fangrui Song	db13a138bd	[TargetMachine] Move X86 specific shouldAssumeDSOLocal logic to X86Subtarget::classifyGlobalFunctionReference	2020-12-05 12:32:50 -08:00
Fangrui Song	68edf39ede	[TargetMachine] Simplify shouldAssumeDSOLocal by processing ExternalSymbolSDNode early The function accrues many `GV` nullness checks. Process `!GV` (ExternalSymbolSDNode) early to simplify code. Also improve a comment added in r327198 (intrinsics is a subset of ExternalSymbolSDNode). Intended to be NFC.	2020-12-05 11:40:18 -08:00
Dmitry Preobrazhensky	a0b3a9391c	[AMDGPU][MC] Improved diagnostics message for sym/expr operands See bug 48295 (https://bugs.llvm.org/show_bug.cgi?id=48295) Reviewers: rampitec Differential Revision: https://reviews.llvm.org/D92088	2020-12-05 14:05:53 +03:00
Dmitry Preobrazhensky	e97dd11977	[AMDGPU][MC] Corrected error position for invalid MOVREL src See bug 47518 (https://bugs.llvm.org/show_bug.cgi?id=47518) Reviewers: rampitec Differential Revision: https://reviews.llvm.org/D92084	2020-12-05 13:23:14 +03:00
Fangrui Song	1ab9327d1c	[TargetMachine][CodeGenModule] Delete unneeded ppc32 special case from shouldAssumeDSOLocal PPCMCInstLower does not actually call shouldAssumeDSOLocal for ppc32 so this is dead. Actually Clang ppc32 does produce a pair of absolute relocations which match GCC. This also fixes a comment (R_PPC_COPY and R_PPC64_COPY do exist).	2020-12-05 00:42:07 -08:00
Fangrui Song	2ec43a7b22	[TargetMachine] Delete wasm special case from shouldAssumeDSOLocal	2020-12-04 23:22:47 -08:00
Kazu Hirata	2dc4a14e4d	[AMDGPU] Use llvm::is_contained (NFC)	2020-12-04 21:42:55 -08:00
Hsiangkai Wang	3c12307c7a	[RISCV] Formatting for easier reading (NFC) Authored-by: Hsiangkai Wang <kai.wang@sifive.com>	2020-12-04 23:11:36 -06:00
Fangrui Song	961f31d8ad	[TargetMachine] Don't imply dso_local on global variable declarations in Reloc::Static model clang/lib/CodeGen/CodeGenModule sets dso_local on applicable global variables, we don't need to duplicate the work in TargetMachine:shouldAssumeDSOLocal. (Actually the long-term goal (started by r324535) is to remove as much additional implied dso_local in TargetMachine:shouldAssumeDSOLocal as possible.) By not implying dso_local, we will respect dso_local/dso_preemptable specifiers set by the frontend. This allows the proposed -fno-direct-access-external-data option to work with -fno-pic and prevent copy relocations. This patch should be NFC in terms of the Clang behavior because the case we don't set dso_local is a case Clang sets dso_local. However, some tests don't set dso_local on some `external global` and expose some differences. Most tests have been fixed to be more robust in previous commits.	2020-12-04 19:03:41 -08:00
Craig Topper	03fc4f2e9a	[RISCV] Use fcvt.h/d/f.w if the input is an assertsexti32 not just when the input is sext_inreg.	2020-12-04 18:40:02 -08:00
Kazushi (Jam) Marukawa	23034a4a63	[VE] Add vfsqrt, vfcmp, vfmax, and vfmin intrinsic instructions Add vfsqrt, vfcmp, vfmax, and vfmin intrinsic instructions and regression tests. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D92651	2020-12-05 07:52:14 +09:00
Craig Topper	5baef6353e	[RISCV] Initial infrastructure for code generation of the RISC-V V-extension The companion RFC (http://lists.llvm.org/pipermail/llvm-dev/2020-October/145850.html) gives lots of details on the overall strategy, but we summarize it here: LLVM IR involving vector types is going to be selected using pseudo instructions (only MachineInstr). These pseudo instructions contain dummy operands to represent the vector type being operated and the vector length for the operation. These two dummy operands, as set by instruction selection, will be used by the custom inserter to prepend every operation with an appropriate vsetvli instruction that ensures the vector architecture is properly configured for the operation. Not in this patch: later passes will remove the redundant vsetvli instructions. Register classes of tuples of vector registers are used to represent vector register groups (LMUL > 1). Those pseudos are eventually lowered into the actual instructions when emitting the MCInsts. About the patch: Because there is a bit of initial infrastructure required, this is the minimal patch that allows us to select instructions for 3 LLVM IR instructions: load, add and store vectors of integers. LLVM IR operations have "whole-vector" semantics (as in they generate values for all the elements). Later patches will extend the information represented in TableGen. Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com> Co-Authored-by: Evandro Menezes <evandro.menezes@sifive.com> Co-Authored-by: Craig Topper <craig.topper@sifive.com> Differential Revision: https://reviews.llvm.org/D89449	2020-12-04 11:39:30 -08:00
Evgeny Leviant	993eaf2d69	Recommit [TableGen][SchedModels] Fix read/write variant substitution Original commit rG112b3cb6ba49 introduced non-determinism in subtarget generator due to iteration over DenseMap. New patch fixes this changing ProcModelMapTy from DenseMap to std::map.	2020-12-04 21:50:34 +03:00
Craig Topper	c55d9af8c0	[AArch64] Add custom lowering for ISD::ABS Instead of trying to pattern match the code produced by ISD::ABS expansion, just custom legalize ISD::ABS to the desired sequence. The one test change is because a DAG combine for (neg (abs)) is no longer firing because ISD::ABS is now Custom instead of Expand. Differential Revision: https://reviews.llvm.org/D92154	2020-12-04 10:45:31 -08:00
Craig Topper	ad923edfc1	[RISCV] Add support for printing pcrel immediates as absolute addresses in llvm-objdump This makes the llvm-objdump output much more readable and closer to binutils objdump. This builds on D76591 It requires changing the OperandType for certain immediates to "OPERAND_PCREL" so tablegen will generate code to pass the instruction's address. This means we can't do the generic check on these instructions in verifyInstruction any more. Should I add it back with explicit opcode checks? Or should we add a new operand flag to control the passing of address instead of matching the name? Differential Revision: https://reviews.llvm.org/D92147	2020-12-04 10:34:12 -08:00
Jinsong Ji	c8ec685ca5	[llvm-exegesis][PowerPC] Add more register classes This PR adds more register class support in PowerPC, mark OperandType for imm and memory operands. Also added more unit tests for SnippetGenerator. Reviewed By: #powerpc, steven.zhang Differential Revision: https://reviews.llvm.org/D88044	2020-12-04 15:02:12 +00:00
Kazushi (Jam) Marukawa	e936d1e113	[VE] Add vfadd, vfsub, vfmul, and vfdiv intrinsic instructions Add vfadd, vfsub, vfmul, and vfdiv intrinsic instructions and regression tests. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D92649	2020-12-04 21:58:51 +09:00
Simon Pilgrim	b96a521077	[X86] LowerRotate - enable custom lowering of ROTL/ROTR vXi16 on VBMI2 targets.	2020-12-04 12:16:59 +00:00
Simon Pilgrim	d073805be6	[X86] LowerRotate - VBMI2 targets can lower vXi16 rotates using funnel shifts. Ideally we'd do this inside DAGCombine but until we can make the FSHL/FSHR opcodes legal for VBMI2 it won't help us.	2020-12-04 11:29:23 +00:00
Simon Pilgrim	df1ddc4234	[X86] Let VBMI2 non-VLX targets still use funnel shifts instructions	2020-12-04 11:06:43 +00:00
QingShan Zhang	c25b039e21	[PowerPC] Fix the regression caused by commit `9c588f53fc` Add a TypeLegal check for MVT::i1 and add the test.	2020-12-04 10:22:13 +00:00
Simon Pilgrim	8eedd18fcb	[X86] Remove unnecessary bitcast. NFC. The X86ISD::SUBV_BROADCAST node is already VT	2020-12-04 09:44:57 +00:00
Xiang1 Zhang	f2e2924463	[X86] Unbind the ebx with GOT address in regcall calling convention No register can be allocated for indirect call when it use regcall calling convention and passed 5/5+ args. For example: call vreg (ag1, ag2, ag3, ag4, ag5, ...) --> 5 regs (EAX, ECX, EDX, ESI, EDI) used for pass args, 1 reg (EBX )used for hold GOT point, so no regs can be allocated to vreg. The Intel386 architecture provides 8 general purpose 32-bit registers. RA mostly use 6 of them (EAX, EBX, ECX, EDX, ESI, EDI). 5 of this regs can be used to pass function arguments (EAX, ECX, EDX, ESI, EDI). EBX used to hold the GOT pointer when making function calls via the PLT. ESP and EBP usually be "reserved" in register allocation. Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D91020	2020-12-04 10:00:13 +08:00
Craig Topper	3fcdf9ca78	[RISCV] Rename FPCCToExtend->FPOpToExpand and FPOpToExtend->FPOpToExpand. NFC These are used to call setOperationAction/setCondCodeAction with the Expand action so it seems that Expand is a better name than Extend.	2020-12-03 16:00:49 -08:00
Fangrui Song	86fa896363	Revert D90844 "[TableGen][SchedModels] Fix read/write variant substitution" This reverts commit `112b3cb6ba`. D90844 made lib/Target/AArch64/AArch64GenSubtargetInfo.inc non-deterministic.	2020-12-03 14:24:29 -08:00
Craig Topper	a18d5e3e9f	[RISCV] Merge FMV_H_X_RV32/FMV_H_X_RV64 into a single opcode. Same with FMV_X_ANYEXTH_RV32/RV64 Rather than having a different opcode for RV32 and RV64. Let's just say the integer type is XLenVT and use a single opcode for both modes. Differential Revision: https://reviews.llvm.org/D92538	2020-12-03 11:12:40 -08:00
Craig Topper	92c0d5d958	[RISCV] Remove RISCVMergeBaseOffsetOpt from the -O0 pass pipeline. Internally the pass skips any function with the optnone attribute. But that still requires checking each function. If the opt level is set to None we might as well just skip putting in the pipeline at all. This what is already done for many of the passes added by TargetPassConfig. Differential Revision: https://reviews.llvm.org/D92511	2020-12-03 09:58:25 -08:00
Kazu Hirata	a333071754	[X86] Remove DecodeVPERMVMask and DecodeVPERMV3Mask This patch removes the variants of DecodeVPERMVMask and DecodeVPERMV3Mask that take "const Constant C" as they are not used anymore. They were introduced on Sep 8, 2015 in commit `e88038f235`. The last use of DecodeVPERMVMask(const Constant C, ...) was removed on Feb 7, 2016 in commit `73fc26b44a`. The last use of DecodeVPERMV3Mask(const Constant *C, ...) was removed on May 28, 2018 in commit `dcfcfdb0d1`. Differential Revision: https://reviews.llvm.org/D91926	2020-12-03 09:12:02 -08:00
Ahmed Bougacha	f77c948d56	[Triple][MachO] Define "arm64e", an AArch64 subarch for Pointer Auth. This also teaches MachO writers/readers about the MachO cpu subtype, beyond the minimal subtype reader support present at the moment. This also defines a preprocessor macro to allow users to distinguish __arm64__ from __arm64e__. arm64e defaults to an "apple-a12" CPU, which supports v8.3a, allowing pointer-authentication codegen. It also currently defaults to ios14 and macos11. Differential Revision: https://reviews.llvm.org/D87095	2020-12-03 07:53:59 -08:00
Baptiste Saleil	45ec3a37b0	[PowerPC] Fix for excessive ACC copies due to PHI nodes When using accumulators in loops, they are passed around in PHI nodes of unprimed accumulators, causing the generation of additional prime/unprime instructions. This patch detects these cases and changes these PHI nodes to primed accumulator PHI nodes. We also add IR and MIR test cases for several PHI node cases. Differential Revision: https://reviews.llvm.org/D91391	2020-12-03 09:51:23 -06:00
Yonghong Song	286daafd65	[BPF] support atomic instructions Implement fetch_<op>/fetch_and_<op>/exchange/compare-and-exchange instructions for BPF. Specially, the following gcc intrinsics are implemented. __sync_fetch_and_add (32, 64) __sync_fetch_and_sub (32, 64) __sync_fetch_and_and (32, 64) __sync_fetch_and_or (32, 64) __sync_fetch_and_xor (32, 64) __sync_lock_test_and_set (32, 64) __sync_val_compare_and_swap (32, 64) For __sync_fetch_and_sub, internally, it is implemented as a negation followed by __sync_fetch_and_add. For __sync_lock_test_and_set, despite its name, it actually does an atomic exchange and return the old content. https://gcc.gnu.org/onlinedocs/gcc-4.1.1/gcc/Atomic-Builtins.html For intrinsics like __sync_{add,sub}_and_fetch and __sync_bool_compare_and_swap, the compiler is able to generate codes using __sync_fetch_and_{add,sub} and __sync_val_compare_and_swap. Similar to xadd, atomic xadd, xor and xxor (atomic_<op>) instructions are added for atomic operations which do not have return values. LLVM will check the return value for __sync_fetch_and_{add,and,or,xor}. If the return value is used, instructions atomic_fetch_<op> will be used. Otherwise, atomic_<op> instructions will be used. All new instructions only support 64bit and 32bit with alu32 mode. old xadd instruction still supports 32bit without alu32 mode. For encoding, please take a look at test atomics_2.ll. Differential Revision: https://reviews.llvm.org/D72184	2020-12-03 07:38:00 -08:00
dfukalov	2ce38b3f03	[NFC] Reduce include files dependency. 1. Removed #include "...AliasAnalysis.h" in other headers and modules. 2. Cleaned up includes in AliasAnalysis.h. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D92489	2020-12-03 18:25:05 +03:00
Paul C. Anagnostopoulos	415fab6f67	[TableGen] Eliminate the 'code' type Update the documentation. Rework various backends that relied on the code type. Differential Revision: https://reviews.llvm.org/D92269	2020-12-03 10:19:11 -05:00

... 2 3 4 5 6 ...

60633 Commits