The old canRealignStack called TRI::canRealignStack and hasReservedCallFrame.
But hasReservedCallFrame always returns true for VE, since VE
allocates the call frame all the time. That makes this canRealignStack
identical to TRI::canRealignStack. This patch removes VE's
canRealignStack and lets callers call TRI::canRealignStack directly.
Reviewed By: simoll
Differential Revision: https://reviews.llvm.org/D91929
This is part of the discussion on D91876 about trying to reduce custom lowering of MIN/MAX ops on older SSE targets - if we can improve generic vector expansion, we should be able to relax the limitations in SelectionDAGBuilder on when MIN/MAX ops may be generated, and avoid having to flag so many ops as 'custom'.
We generate two 4-byte loads or two stores as part of the expansion.
Previously the MemOperand was set the same for both, covering the
full 8 bytes. Now we set a separate 4-byte mem operand for each,
with a 4-byte offset for the high part.
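A minimal sketch of the shape of that change, assuming the MachineFunction::getMachineMemOperand overload that derives a new operand from an existing one (the surrounding variable names are illustrative, not from this patch):
```
// Hedged sketch: derive two 4-byte memory operands from the original
// 8-byte operand MMO, offsetting the high half by 4.
MachineMemOperand *LoMMO =
    MF.getMachineMemOperand(MMO, /*Offset=*/0, /*Size=*/4);
MachineMemOperand *HiMMO =
    MF.getMachineMemOperand(MMO, /*Offset=*/4, /*Size=*/4);
```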
Summary: We have the patterns to fold 2 RLWINMs before RA, but some RLWINMs are generated after RA, for example rGc4690b007743. If an RLWINM generated after RA is followed by another RLWINM, we expect to perform the optimization too.
Reviewed By: shchenz
Differential Revision: https://reviews.llvm.org/D89855
All these potential null pointer dereferences are reported by my static analyzer for null smart pointer dereferences, which has a different implementation from `alpha.cplusplus.SmartPtr`.
The checked pointers in this patch are initialized by Target::createXXX functions. When the creator function pointer is not correctly set, a null pointer will be returned, or the creator function may itself return a null pointer.
Some of them may not make sense, as they may be checked before entering the function, but I fixed them all in this patch. I submit this fix because 1) similar checks are found in some other places in the LLVM codebase for the same return value of the function; and 2) some of the pointers are dereferenced before they are checked, which would definitely trigger a null pointer dereference if the return value is nullptr.
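For reference, the guarded pattern looks roughly like this (a sketch, not code from this patch; the variable names are invented):
```
// Hedged sketch: Target::createTargetMachine returns null when no creator
// was registered, so the result must be checked before use.
TargetMachine *TM = TheTarget->createTargetMachine(
    TripleName, CPU, Features, Options, /*RM=*/None);
if (!TM)
  report_fatal_error("unable to create target machine");
```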
Reviewed By: tejohnson, MaskRay, jpienaar
Differential Revision: https://reviews.llvm.org/D91410
%rip was only included for 64-bit RIP-relative relocations, but needs to be included for 32-bit as well.
Reviewed By: MaskRay, RKSimon
Differential Revision: https://reviews.llvm.org/D91339
The getAdjustedFrameSize function may need to handle integers larger than
32 bits, so change int to uint64_t.
Reviewed By: simoll
Differential Revision: https://reviews.llvm.org/D91862
Also use DataLayout to get type size. Relying on the IR type size is
also pretty broken here, since this won't perfectly capture how types
are legalized.
Prior to this the DefaultMode was never selected, but RISCVGenDAGISel.inc, RISCVGenRegisterInfo.inc, and RISCVGenGlobalISel.inc all ended up with extra table entries for that mode.
This patch removes the RV32 HwMode and uses DefaultMode for RV32. This impressively reduces the size of my release+asserts llc binary by about 270K: about 15K from RISCVGenDAGISel.inc, 1-2K from RISCVGenRegisterInfo.inc, but the vast majority from RISCVGenGlobalISel.inc.
Differential Revision: https://reviews.llvm.org/D90973
Previously we required an sra to pattern match these properly in isel. If the consumer didn't need the result sign extended, we'd have an srl instead of an sra and fail to match.
This patch switches to custom legalizing to GREVIW using portions of D91259.
Differential Revision: https://reviews.llvm.org/D91457
This should result in better utilization of RORIW since we
don't need to look for a SIGN_EXTEND_INREG that may not exist.
Also remove rotl/rotr isel matching to GREVI and just prefer RORI.
This is to keep consistency so we don't have to match ROLW/RORW
to GREVIW as well. I imagine RORI/RORIW performance will be the
same as or better than GREVI's.
Differential Revision: https://reviews.llvm.org/D91449
Use the OR(CMP,ADD) / AND(CMP,SUB) patterns like we do on SSE targets.
Enable custom lowering for v8i32/v4i64 and generalize the 128-bit lowering code for any vector size - this also lets us use the slightly cheaper codegen for icmp_ugt instead of umin/umax.
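The expansion shape, as a generic sketch (illustrative SelectionDAG calls, not the literal X86 lowering; assumes all-ones boolean vector contents, as on X86):
```
// Hedged sketch of uaddsat(x, y): on unsigned overflow the compare
// produces an all-ones lane, which the OR saturates to -1.
SDValue Sum = DAG.getNode(ISD::ADD, DL, VT, X, Y);
SDValue Ovf = DAG.getSetCC(DL, VT, Sum, X, ISD::SETULT);
SDValue Res = DAG.getNode(ISD::OR, DL, VT, Sum, Ovf);
// usubsat(x, y) is the mirror image: AND the difference with the
// icmp_ugt(x, y) mask so lanes that would underflow clamp to 0.
```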
The default version only works if the returned node has a single
result. The X86 and PowerPC versions support multiple results
and allow a single result to be returned from a node with
multiple outputs, and allow a single result that is not result 0
of the node.
Also replace the Mips version since the new version should work
for it. The original version handled multiple results, but only
if the new node and original node had the same number of results.
Differential Revision: https://reviews.llvm.org/D91846
Use the OR(CMP,ADD) / AND(CMP,SUB) patterns like we do on pre-SSE4 targets.
We're still using X86ISD::BLENDV on some AVX targets as we don't do custom lowering for >= 256-bit vectors.
Really this (and combineVSelectWithAllOnesOrZeros) needs moving to DAGCombiner, but pre-SSE42 we see the vXi64 comparison type as a 2 x 32-bit result, so we can't just rely on ComputeNumSignBits to give us the 'all bits' result we need.
This will ensure that passes that add new global variables will create them
in address space 1 once the passes have been updated to no longer default
to the implicit address space zero.
This also changes AutoUpgrade.cpp to add -G1 to the DataLayout if it wasn't
already present, to ensure bitcode backwards compatibility.
Reviewed by: arsenm
Differential Revision: https://reviews.llvm.org/D84345
Just something I forgot when I added the R82. Need to have a look
at crypto and fusing, but will do that as a follow up.
Differential Revision: https://reviews.llvm.org/D91848
This checks to see if the loop will likely become a tail predicated loop
and disables wls loop generation if so, as the likelihood for reverting
is currently too high. These should be fairly rare situations anyway due
to the way iterations and element counts are used during lowering. Just
not trying can alter how SCEV's are materialized however, leading to
different codegen.
It also adds an option to disable all while low overhead loops, for
debugging.
Differential Revision: https://reviews.llvm.org/D91663
This patch implements the out-of-line atomics mechanism for LSE
deployment. Details on how it works can be found in llvm/docs/Atomics.rst.
Options -moutline-atomics and -mno-outline-atomics to enable and disable it
were added to the clang driver. This is the clang and llvm part of the
out-of-line atomics interface; the library part is already supported by
libgcc. Compiler-rt support is provided in a separate patch.
Differential Revision: https://reviews.llvm.org/D91157
Implement getMinimumJumpTableEntries() to specify the threshold for jump
table generation. We use 8 in PIC mode to relieve the impact of the PIC
calculation required to implement PIC-mode jump tables.
Also update the jump table regression test.
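A sketch of what such an override can look like (hedged; not necessarily the exact VE code):
```
// Hedged sketch: raise the jump-table threshold under PIC, where each
// table entry costs extra PIC address computation.
unsigned VETargetLowering::getMinimumJumpTableEntries() const {
  if (isJumpTableRelative())
    return 8;
  return TargetLowering::getMinimumJumpTableEntries();
}
```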
Reviewed By: simoll
Differential Revision: https://reviews.llvm.org/D91785
Extract the scratch offset from the scratch buffer descriptor that is
stored in the global table.
Differential Revision: https://reviews.llvm.org/D91701
For MASM syntax, the prefixes are not enclosed in braces.
The assembly code should look like:
"evex vcvtps2pd xmm0, xmm1"
Differential Revision: https://reviews.llvm.org/D90441
Clang generates a '%' prefix for some registers in CFI directives. E.g.
".cfi_register lr, r12" becomes ".cfi_register lr, %r12" after
processing.
Differential Revision: https://reviews.llvm.org/D91735
When constructing a MemoryLocation by hand, require that a
LocationSize is explicitly specified. D91649 will split up
LocationSize::unknown() into two different states, and callers
should make an explicit choice regarding the kind of MemoryLocation
they want to have.
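For illustration, a call site now spells the size out explicitly (a sketch; the variable names are invented):
```
// Hedged sketch: the LocationSize argument is now mandatory, so pick an
// explicit precise size (or an explicitly-unknown one) at each call site.
MemoryLocation Loc(Ptr, LocationSize::precise(DL.getTypeStoreSize(Ty)));
```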
This moves the recognition of GREVI and GORCI from TableGen patterns
into a DAGCombine. This is done primarily to match "deeper" patterns in
the future, like (grevi (grevi x, 1) 2) -> (grevi x, 3).
TableGen is not best suited to matching patterns such as these as the compile
time of the DAG matchers quickly gets out of hand due to the expansion of
commutative permutations.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D91259
This converts the intermediate VPR use assertion to a condition in the if-statement, to protect against assertion failures in case behaviour is changed.
This is a follow-up to https://reviews.llvm.org/D90935 and implements the post-approval comments.
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D91790
This was already something that was handled by one of the "else"
branches in maybeLoweredToCall, so this patch is an NFC but makes it
explicit and adds a test. We may in the future want to support this
under certain situations, but for the moment just don't try to create
low overhead loops with inline asm in them.
Differential Revision: https://reviews.llvm.org/D91257
D57663 allowed us to reuse broadcasts of the same scalar value by extracting low subvectors from the widest type.
Unfortunately we weren't ensuring the broadcasts were from the same SDValue, just the same SDNode - which failed on multiple-value nodes like ISD::SDIVREM.
FYI: I intend to request this be merged into the 11.x release branch.
Differential Revision: https://reviews.llvm.org/D91709
This defines the vec_broadcast SDNode along with lowering and isel code.
We also remove unused type mappings for the vector register classes (all vector MVTs that are not used in the ISA are dropped).
We will implement support for short vectors later by intercepting nodes with illegal vector EVTs before LLVM has had a chance to widen them.
Reviewed By: kaz7
Differential Revision: https://reviews.llvm.org/D91646
2c196bbc6b asserted that
`SmallVector::push_back` doesn't invalidate the parameter when it needs
to grow. Do the same for `resize`, `append`, `assign`, `insert`, and
`emplace_back`.
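The hazard these assertions catch looks like this (an illustrative sketch, not code from the patch):
```
// Hedged sketch: push_back takes a reference for non-trivial element
// types; if the grow reallocates first, V[0] dangles mid-call. The new
// assertions fire on exactly this kind of interior reference.
SmallVector<std::string, 2> V = {"a", "b"};
V.push_back(V[0]);
```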
Differential Revision: https://reviews.llvm.org/D91744
@tangxingxin1008 found a bug where vadd.vv v1, v3, a0 was regarded as a valid V
instruction. We should remove the VRegAsmOperand operand class and use
the VR register class directly.
Patched by: tangxingxin1008, Hsiangkai
Differential Revision: https://reviews.llvm.org/D91712
In some situations, the compiler may insert an accumulator prime instruction and
an accumulator unprime instruction with no use of that accumulator between the two.
That's for example the case when we store an accumulator after assembling it or
restoring it. This patch adds a peephole to remove these prime and unprime instructions.
Differential Revision: https://reviews.llvm.org/D91386
We have workarounds for two different cases where vccz can get out of
sync with the value in vcc. This fixes them in two ways:
1. Fix the case where the def of vcc was in a previous basic block, by
pessimistically assuming that vccz might be incorrect at a basic block
boundary.
2. Fix the handling of pre-existing waitcnt instructions by calling
generateWaitcntInstBefore before examining ScoreBrackets to determine
whether there's an outstanding smem read operation.
Differential Revision: https://reviews.llvm.org/D91636
The SystemZISD::IABS node is no longer needed since ISD::ABS can be used
instead.
Review: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D91697
We can use GF2P8AFFINEQB to reverse bits in a byte. Shuffles are needed to reverse the bytes in elements larger than i8. LegalizeVectorOps takes care of inserting the shuffle for the larger element size.
We already have Custom lowering for v16i8 with SSSE3, v32i8 with AVX, and v64i8 with AVX512BW.
I think we might be able to use this for scalars too by moving into a vector and back. But I'll save that for a follow up as it's a little more involved.
Reviewed By: RKSimon, pengfei
Differential Revision: https://reviews.llvm.org/D91515
Implement JumpTable lowering to make BRIND work on VE. Also update an
existing br_jt regression test.
Reviewed By: simoll
Differential Revision: https://reviews.llvm.org/D91582
This patch factors out the part of printInstruction that gets the
mnemonic string for a given MCInst. This is intended to be used
subsequently for the instruction-mix remarks to display the final
mnemonic (D90040).
Unfortunately making `getMnemonic` available to the AsmPrinter
seems to require making it virtual. Not sure if there's a way around
that with the current layering of the AsmPrinters.
Reviewed By: Paul-C-Anagnostopoulos
Differential Revision: https://reviews.llvm.org/D90039
When we see
```
xor = G_XOR xor_lhs, -1
select = G_SELECT cc, tval, xor
```
Fold this into
```
select = CSINV tval, xor_lhs, cc
```
Update select-select.mir to reflect the changes.
For now, only handle the case where the G_XOR is the false-value for the
G_SELECT. It may make more sense to handle the true-value case in post-legalizer
lowering.
Differential Revision: https://reviews.llvm.org/D90774
- In certain cases, a generic pointer could be assumed to be a pointer to
the global memory space or other spaces. With a dedicated target hook
to query that address space from a given value, the infer-address-space
pass can infer and propagate that to all its users.
Differential Revision: https://reviews.llvm.org/D91121
Add lvm/svm intrinsic instructions and a regression test. Change
RegisterInfo to specify that VM0/VMP0 are constant and reserved
registers; this modifies a vst regression test, so update it.
Also add pseudo instructions for the VM512 register classes
and a mechanism to expand them after register allocation.
Reviewed By: simoll
Differential Revision: https://reviews.llvm.org/D91541
The G_ZEXT in these cases seems to actually come from a combine that we do but
SelectionDAG doesn't. Looking through it allows us to match "uxtw #2" addressing
modes.
Differential Revision: https://reviews.llvm.org/D91475
We need to make sure the upper 32 bits are all ones to ensure the result is properly sign extended. Previously we only checked the lower 32 bits of the mask. I've also added a check that the shift amount is less than 32. Without that the original code asserts inside maskLeadingOnes if the SROI check is removed or the SROIW pattern is checked first. I've refactored the code to use early outs to reduce nesting.
I've also updated SLOIW matching with the same changes, but I couldn't find a broken test case with the existing code.
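The added check amounts to something like the following (an illustrative sketch with invented variable names, not the literal patch):
```
// Hedged sketch: for SROIW the result must be properly sign extended, so
// the upper 32 bits of the mask must all be ones, and the shift amount
// must be less than 32.
uint64_t Mask = C->getZExtValue();
unsigned ShAmt = ShC->getZExtValue();
if ((Mask & 0xffffffff00000000ULL) != 0xffffffff00000000ULL || ShAmt >= 32)
  return false; // not an SROIW pattern
```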
Differential Revision: https://reviews.llvm.org/D90961
This defines a 'fastcc' for the VE target and implements vreg-to-vreg
copy for parameter passing. The 'fastcc' extends the standard CC for
SX-Aurora with register passing of vector-typed parameters and return
values.
Reviewed By: kaz7
Differential Revision: https://reviews.llvm.org/D90842
This patch fixes the function isWideningInstruction for scalable vectors.
Now the cost model can check the widening pattern for SVE.
Differential Revision: https://reviews.llvm.org/D91260
This patch adds the SchedMachineModel for Cortex-M7. It
also adds test cases for the scheduling information.
Details of the pipeline and descriptions are in comments
in file ARMScheduleM7.td included in this patch.
Differential Revision: https://reviews.llvm.org/D91355
Similar to the X86 and AMDGPU targets, this uses a macro to cut down on
repetitive and error-prone code when converting RISCVISD node names to
strings in getTargetNodeName.
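The macro style in question is roughly this (a sketch; the node names shown are placeholders):
```
// Hedged sketch of a macro-based getTargetNodeName body.
#define NODE_NAME_CASE(NODE)                                                   \
  case RISCVISD::NODE:                                                         \
    return "RISCVISD::" #NODE;
switch ((RISCVISD::NodeType)Opcode) {
  NODE_NAME_CASE(RET_FLAG)
  NODE_NAME_CASE(SELECT_CC)
  // ... one line per node ...
}
#undef NODE_NAME_CASE
```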
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D91414
The VE backend represents vector instructions with an explicit 'i32'
vector length operand. In the VE ISA, the vector length is always read
from the VL hardware register. The LVLGen pass inserts 'lvl'
instructions as necessary to set VL to the right value before each
vector instruction.
Reviewed By: kaz7
Differential Revision: https://reviews.llvm.org/D91416
We unconditionally marked i64 as Custom, but did not install a
handler in ReplaceNodeResults for when i64 isn't a legal type. This
led to ReplaceNodeResults asserting.
We have two options to fix this. Only mark i64 as Custom on
64-bit targets and let it expand to two i32 bitreverses which
each need a VPPERM. Or the other option is to add the Custom
handling to ReplaceNodeResults. This is what I went with.
When the load value is folded into the sin/cos operation, the
AMDGPU library call simplifier could still mark the function
as unmodified. Instead, ensure that on an early return we
return whether the load was folded into the sin/cos call.
Authored by MJDSys
Differential Revision: https://reviews.llvm.org/D91401
I'm not sure why it was added to DAGToDAG originally, but it seems
to make sense alongside the non-TLS version, LowerGlobalAddress.
Differential Revision: https://reviews.llvm.org/D91432
When we see
```
%sub = G_SUB 0, %x
%select = G_SELECT %cc, %t, %sub
```
Fold away the G_SUB by producing
```
%select = CSNEG %t, %x, cc
```
Simple IR example: https://godbolt.org/z/K8TEnh
This is valid on both sides of the select, but for now, just handle one side.
It may make more sense to handle swapping sides during post-legalizer lowering.
Differential Revision: https://reviews.llvm.org/D90723
Reducing some code duplication.
We had a helper for checking if a predicate is unsigned. Remove that and use
the existing function in Instructions.cpp.
Differential Revision: https://reviews.llvm.org/D91288
It's fairly common to need matchers for a specific constant value, or for
common idioms like finding a negated register.
Add
- `m_SpecificICst`, which returns true when matching a specific value.
- `m_ZeroInt`, which returns true when an integer 0 is matched.
- `m_Neg`, which returns true when a register is negated.
Also update a few places which use idioms related to the new matchers.
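Typical usage looks like this (an illustrative sketch; the register names are invented):
```
// Hedged usage sketch of the new matchers with mi_match.
Register Src;
if (mi_match(Dst, MRI, m_Neg(m_Reg(Src)))) {
  // Dst is defined as 0 - Src.
}
if (mi_match(Dst, MRI, m_SpecificICst(42))) {
  // Dst is the constant 42.
}
```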
Differential Revision: https://reviews.llvm.org/D91397
These relocations represent offsets from the __tls_base symbol.
Previously we were just using normal MEMORY_ADDR relocations and relying
on the linker to select a segment offset rather than an absolute value in
Symbol::getVirtualAddress(). Using an explicit relocation type allows
us to clearly distinguish absolute from relative relocations based
on the relocation information alone.
One place this is useful is being able to reject absolute relocations in
the PIC case, but still accept TLS relocations.
Differential Revision: https://reviews.llvm.org/D91276
If the scatter store is able to perform the sign/zero extend of
its index, this is folded into the instruction with refineIndexType().
Additionally, refineUniformBase() will return the base pointer and index
from an add + splat_vector.
Reviewed By: sdesmalen
Differential Revision: https://reviews.llvm.org/D90942
No longer rely on an external tool to build the llvm component layout.
Instead, leverage the existing `add_llvm_componentlibrary` cmake function and
introduce `add_llvm_component_group` to accurately describe component behavior.
These functions store extra properties in the created targets. These properties
are processed once all components are defined to resolve library dependencies
and produce the header expected by llvm-config.
Differential Revision: https://reviews.llvm.org/D90848
This was a mistake introduced in D91294. I'm not sure how to
exercise this with the existing code, but I hit it while trying
some follow up experiments.
We can't store garbage in the unused bits. It's possible that something like a zextload from i1/i2/i4 is created to read the memory. Those zextloads would be legalized assuming the extra bits are 0.
I'm not sure that the code in lowerStore is executed for the v1i1/v2i1/v4i1 case. It looks like the DAG combine in combineStore may have converted them to v8i1 first. And I think we're missing some cases to avoid going to the stack in the first place. But I don't have time to investigate those things at the moment so I wanted to focus on the correctness issue.
Should fix PR48147.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D91294
Select the following:
- G_SELECT cc, 0, 1 -> CSINC zreg, zreg, cc
- G_SELECT cc, 0, -1 -> CSINV zreg, zreg, cc
- G_SELECT cc, 1, f -> CSINC f, zreg, inv_cc
- G_SELECT cc, -1, f -> CSINV f, zreg, inv_cc
- G_SELECT cc, t, 1 -> CSINC t, zreg, cc
- G_SELECT cc, t, -1 -> CSINV t, zreg, cc
(IR example: https://godbolt.org/z/YfPna9)
These correspond to a bunch of the AArch64csel patterns in AArch64InstrInfo.td.
Unfortunately, it doesn't seem like we can import patterns that use NZCV like
those ones do. E.g.
```
def : Pat<(AArch64csel GPR32:$tval, (i32 1), (i32 imm:$cc), NZCV),
(CSINCWr GPR32:$tval, WZR, (i32 imm:$cc))>;
```
So we have to manually select these for now.
This replaces `selectSelectOpc` with an `emitSelect` function, which performs
these optimizations.
Differential Revision: https://reviews.llvm.org/D90701
Also fix a similar issue in SIInsertWaitcnts, but I don't think that fix
has any effect in practice.
Differential Revision: https://reviews.llvm.org/D91290
Follow-up from a similar patch on RISCV: 637f19c36b
Nothing reads this Glue value that I could see. The SDNode def in
the td file does not have the SDNPOutGlue flag so I don't think
this glue would get properly propagated to MachineSDNodes if it
was used.
- Use MCRegister instead of Register in MC layer.
- Move some enums from RISCVInstrInfo.h to RISCVBaseInfo.h to be with other TSFlags bits.
Differential Revision: https://reviews.llvm.org/D91114
The fshl and fshr intrinsics are defined to modulo their shift amount by the bitwidth of one of their inputs. The FSR/FSL instructions read one extra bit from the shift amount. If that bit is set the inputs are swapped. In order to preserve the semantics of the llvm intrinsics we need to make sure that the extra bit isn't set. DAG combine or instcombine may have removed any mask that was originally present.
We could be smarter here and try to use computeKnownBits to check if the bit is known zero, but wanted to start with correctness.
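Conceptually the fix inserts an explicit mask on the shift amount (a sketch, not the literal code; the variable names are invented):
```
// Hedged sketch: mask the shift amount to BitWidth-1 so the extra bit
// FSR/FSL read is guaranteed clear, preserving the llvm.fshl/llvm.fshr
// modulo semantics.
SDValue ShAmt = Op.getOperand(2);
SDValue Mask = DAG.getConstant(BitWidth - 1, DL, VT);
ShAmt = DAG.getNode(ISD::AND, DL, VT, ShAmt, Mask);
```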
Differential Revision: https://reviews.llvm.org/D90905
Of course there was something missing, in this case a check that the def
of the count register we are adding to a t2DoLoopStartTP would dominate
the insertion point.
In the future, when we remove some of these COPY's in between, the
t2DoLoopStartTP will always become the last instruction in the block,
preventing this from happening. In the meantime we need to check they
are created in a sensible order.
Differential Revision: https://reviews.llvm.org/D91287
Change the default type of v64 register class from v512i32 to v256f64.
Add a regression test also.
Reviewed By: simoll
Differential Revision: https://reviews.llvm.org/D91301
When passing SVE types as arguments to function calls we can run
out of hardware SVE registers. This is normally fine, since we
switch to an indirect mode where we pass a pointer to a SVE stack
object in a GPR. However, if we switch over part-way through
processing a SVE tuple then part of it will be in registers and
the other part will be on the stack.
I've fixed this by ensuring that:
1. When we don't have enough registers to allocate the whole block
we mark any remaining SVE registers temporarily as allocated.
2. We temporarily remove the InConsecutiveRegs flags from the last
tuple part argument and reinvoke the autogenerated calling
convention handler. Doing this prevents the code from entering
an infinite recursion and, in combination with 1), ensures we
switch over to the Indirect mode.
3. After allocating a GPR register for the pointer to the tuple we
then deallocate any SVE registers we marked as allocated in 1).
We also set the InConsecutiveRegs flags back how they were before.
4. I've changed the AArch64ISelLowering LowerCALL and
LowerFormalArguments functions to detect the start of a tuple,
which involves allocating a single stack object and doing the
correct number of legal loads and stores.
Differential Revision: https://reviews.llvm.org/D90219
When there is full fp16 support, there is no reason to widen 16-bit
G_FCONSTANTs to 32 bits. Mark them as legal in this case.
Also, we currently import a pattern for materializing a 16-bit 0.0.
Add a testcase showing we select it.
(All other 16-bit G_FCONSTANTS are not yet selected.)
Differential Revision: https://reviews.llvm.org/D89164
We were creating RISCVISD::SELECT_CC nodes with Glue output that was never being used, and the tablegen SDNode had the SDNPInGlue flag instead of the SDNPOutGlue flag.
Since we don't seem to need the Glue just get rid of it from both places.
Differential Revision: https://reviews.llvm.org/D91199
The manual selection code for add/sub was not checking if it was possible to
fold in shifts + extends (the *rx opcode variants).
As a result, we could never select things like
```
cmp x1, w0, uxtw #2
```
because we don't import any patterns for compares.
This adds support for the arithmetic shifted register forms and updates tests
for instructions selected using `emitADD`, `emitADDS`, and `emitSUBS`.
This is a 0.1% geomean code size improvement on SPECINT2000 at -Os.
Differential Revision: https://reviews.llvm.org/D91207
Previously, we only handled negative arithmetic immediates in the imported
selector code.
Since we don't import code for, say, compares, we were missing opportunities
for things like
```
%cst:gpr(s64) = G_CONSTANT i64 -10
%cmp:gpr(s32) = G_ICMP intpred(eq), %reg0(s64), %cst
->
%adds = ADDSXri %reg0, 10, 0, implicit-def $nzcv
%cmp = CSINCWr $wzr, $wzr, 1, implicit $nzcv
```
Instead, we would have to materialize the constant and emit a SUBS.
This adds support for selection like above for SUB, SUBS, ADD, and ADDS.
This is a 0.1% geomean code size improvement on SPECINT2000 at -Os.
Differential Revision: https://reviews.llvm.org/D91108
We have a frequent pattern where we're merging two KnownBits to get the common/shared bits, and I just fell for the gotcha where I tried to use the & operator to merge them.
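The correct merge, for reference (a sketch):
```
// Hedged sketch: the intersection of what two KnownBits agree on.
// A bit is commonly known-zero only if it is known zero in both, and
// likewise for known-one; other ways of combining the Zero/One members
// compute something else entirely.
KnownBits Common;
Common.Zero = LHS.Zero & RHS.Zero;
Common.One = LHS.One & RHS.One;
```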
Lowers the llvm.masked.scatter intrinsics (scalar plus vector addressing mode only).
Changes included in this patch:
- Custom lowering for MSCATTER, which chooses the appropriate scatter store opcode to use.
Floating-point scatters are cast to integer, with patterns added to match FP reinterpret_casts.
- Added the getCanonicalIndexType function to convert redundant addressing
modes (e.g. scaling is redundant when accessing bytes)
- Tests with 32 & 64-bit scaled & unscaled offsets
Reviewed By: sdesmalen
Differential Revision: https://reviews.llvm.org/D90941
This patch adds the IsTruncatingStore flag to MaskedScatterSDNode, set by getMaskedScatter().
Updated SelectionDAGDumper::print_details for MaskedScatterSDNode to print
the details of masked scatters (is truncating, signed or scaled).
This is the first in a series of patches which add support for scalable masked scatters.
Reviewed By: sdesmalen
Differential Revision: https://reviews.llvm.org/D90939
These do things like turn a multiply of a pow-2+1 into a shift and an add,
which is a common pattern that pops up, and is universally better than expensive
madd instructions with a constant.
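For instance, with MachineIRBuilder the rewrite of a multiply by 9 looks roughly like this (a sketch; the register and type names are invented):
```
// Hedged sketch: x * 9 == (x << 3) + x, since 9 == (1 << 3) + 1.
LLT S64 = LLT::scalar(64);
auto ShlAmt = Builder.buildConstant(S64, 3);
auto Shl = Builder.buildShl(S64, X, ShlAmt);
Builder.buildAdd(Dst, Shl, X);
```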
I've added check lines to an existing codegen test since the code being ported
is almost identical; however, the mul by negative pow2 constant tests don't generate
the same code because we're missing some generic G_MUL combines still.
Differential Revision: https://reviews.llvm.org/D91125
Previously we used setRegClass to rgpr, which may expand the register
domain if the result was already in a constrained class (tcgpr in the
above PR).
Differential Revision: https://reviews.llvm.org/D91192
These are opsel opcodes with op_sel actually being ignored. As such,
op_sel_hi needs to be set to the default of 1 even though
these bits are ignored. This is a compatibility change.
Differential Revision: https://reviews.llvm.org/D91202
This introduces a new pseudo instruction, almost identical to a
t2DoLoopStart but taking 2 parameters - the original loop iteration
count needed for a low overhead loop, plus the VCTP element count needed
for a DLSTP instruction setting up a tail predicated loop. The idea is
that the instruction holds both values and the backend
ARMLowOverheadLoops pass can pick between the two, depending on whether
it creates a tail predicated loop or falls back to a low overhead loop.
To do that there needs to be something that converts a t2DoLoopStart to
a t2DoLoopStartTP, for which this patch repurposes the
MVEVPTOptimisationsPass as a "tail predication and vpt optimisation"
pass. The extra operand for the t2DoLoopStartTP is chosen based on the
operands of VCTP's in the loop, and the instruction is moved as late in
the block as possible to attempt to increase the likelihood of making
tail predicated loops.
Differential Revision: https://reviews.llvm.org/D90591
We already do not unroll loops with vector instructions under MVE, but
that does not include the remainder loops that the vectorizer produces.
These remainder loops will be rarely executed and are not worth
unrolling, as the trip count is likely to be low if they get executed at
all. Luckily they get llvm.loop.isvectorized to make recognizing them
simpler.
We have wanted to do this for a while but hit issues with low overhead
loops being reverted due to difficult register allocation. With recent
changes that seems to be less of an issue now.
Differential Revision: https://reviews.llvm.org/D90055
This hints the operand of a t2DoLoopStart towards using LR, which can
help make it more likely to become t2DLS lr, lr. This makes it easier to
move if needed (as the input is the same as the output), or potentially
remove entirely.
The hint is added after others (from COPY's etc) which still take
precedence. It needed to find a place to add the hint, which currently
uses the post isel custom inserter.
Differential Revision: https://reviews.llvm.org/D89883
This changes the definition of t2DoLoopStart from
t2DoLoopStart rGPR
to
GPRlr = t2DoLoopStart rGPR
This will hopefully mean that low overhead loops are more tied together,
and we can more reliably generate loops without reverting or being at
the whims of the register allocator.
This is a fairly simple change in itself, but leads to a number of other
required alterations.
- The hardware loop pass, if UsePhi is set, now generates loops of the
form:
%start = llvm.start.loop.iterations(%N)
loop:
%p = phi [%start], [%dec]
%dec = llvm.loop.decrement.reg(%p, 1)
%c = icmp ne %dec, 0
br %c, loop, exit
- For this a new llvm.start.loop.iterations intrinsic was added, identical
to llvm.set.loop.iterations but produces a value as seen above, gluing
the loop together more through def-use chains.
- This new intrinsic conceptually produces the same output as input,
which is taught to SCEV so that the checks in MVETailPredication are not
affected.
- Some minor changes are needed to the ARMLowOverheadLoop pass, but it has
been left mostly as before. We should now more reliably be able to tell
that the t2DoLoopStart is correct without having to prove it, but
t2WhileLoopStart and tail-predicated loops will remain the same.
- And all the tests have been updated. There are a lot of them!
This patch on its own might cause more trouble than it helps, with more
tail-predicated loops being reverted, but some additional patches can
hopefully improve upon that to get to something that is better overall.
Differential Revision: https://reviews.llvm.org/D89881
We used cast where we should have used dyn_cast, so change it this time.
The old code caused problems when I implemented the brind instruction and
compiled openmp using the new compiler.
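The distinction, for reference (illustrative node type):
```
// cast<> asserts when the operand is not the expected subclass;
// dyn_cast<> returns nullptr so the caller can fall through gracefully.
if (const auto *GA = dyn_cast<GlobalAddressSDNode>(Op)) {
  // ... safe to use GA here ...
}
// A non-GlobalAddress operand (e.g. the target of an indirect branch)
// simply skips the block instead of crashing.
```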
Reviewed By: simoll
Differential Revision: https://reviews.llvm.org/D91151
Some use cases (e.g. kernel devs) have strict requirements to only enable
features available with -march=armv8-a, e.g. no armv8.1-a. Enabling RAS 1.1 in
all AArch64 means they can consider supporting it.
Bear in mind that the first versions of the Armv8 architecture still do not
support RAS 1.1. This patch only lets devs write code with the user-friendly
register mnemonic instead of the ugly generic S<op0>_<op1>_<Cn>_<Cm>_<op2>.
They still need to place runtime checks to make sure that the CPU to run on
supports RAS 1.1.
Differential Revision: https://reviews.llvm.org/D90594
We can use KnownBitsAnalysis to cover cases when the mask is not trivial. It can
also help with cases when the mask is not constant but can still be folded into
one. Since 'and' is commutative we should treat both operands as possible
replacements.
Differential Revision: https://reviews.llvm.org/D90674
Summary: This patch tries to do the following transformation if the multiplier doesn't fit in an int16:
(mul X, c1 << c2) -> (rldicr (mulli X, c1) c2)
Reviewed By: jsji, steven.zhang
Differential Revision: https://reviews.llvm.org/D87384
If the source of S_MOV_{B32,B64}_term is an immediate then it
cannot be lowered to a COPY.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D90451
For example, if the sign extension is only used for TBZ, and the value is used elsewhere with a zero extension, this can eliminate the sign extension.
Reviewed By: samparker
Differential Revision: https://reviews.llvm.org/D90606