llvm-project

Commit Graph

Author	SHA1	Message	Date
Adhemerval Zanella	dad55c2218	[ARM] [ELF] Fix ARMMaterializeGV for Indirect calls Recent shouldAssumeDSOLocal changes (introduced by `961f31d8ad`) do not take into consideration the relocation model anymore. The ARM fast-isel pass uses the function return to set whether a global symbol is loaded indirectly or not, and without the expected information llvm now generates an extra load for the following code: ``` $ cat test.ll @__asan_option_detect_stack_use_after_return = external global i32 define dso_local i32 @main(i32 %argc, i8** %argv) #0 { entry: %0 = load i32, i32* @__asan_option_detect_stack_use_after_return, align 4 %1 = icmp ne i32 %0, 0 br i1 %1, label %2, label %3 2: ret i32 0 3: ret i32 1 } attributes #0 = { noinline optnone } $ llc test.ll -o - [...] main: .fnstart [...] movw r0, :lower16:__asan_option_detect_stack_use_after_return movt r0, :upper16:__asan_option_detect_stack_use_after_return ldr r0, [r0] ldr r0, [r0] cmp r0, #0 [...] ``` And without 'optnone' it produces: ``` [...] main: .fnstart [...] movw r0, :lower16:__asan_option_detect_stack_use_after_return movt r0, :upper16:__asan_option_detect_stack_use_after_return ldr r0, [r0] clz r0, r0 lsr r0, r0, #5 bx lr [...] ``` This triggered a lot of invalid memory accesses in sanitizers for arm-linux-gnueabihf. I checked this patch with both a stage1 build with gcc and a stage2 bootstrap, and it fixes all the Linux sanitizer issues. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D95379	2021-01-26 15:57:55 -03:00
Craig Topper	f9d7f77267	[RISCV] Have customLegalizeToWOp truncate to the original type instead of i32 now that we use it for i8/i16 as well. `239cfbccb0` add support for legalizing i8/i16 UDIV/UREM/SDIV to use *W instructions. So we need to truncate to i8/i16 if we're legalizing one of those.	2021-01-26 10:50:03 -08:00
Matt Arsenault	5f9707b796	AMDGPU: Fix redundant FP spilling/assert in some functions If a function has stack objects, and a call, we require an FP. If we did not initially have any stack objects, and only introduced them during PrologEpilogInserter for CSR VGPR spills, SILowerSGPRSpills would end up spilling the FP register as if it were a normal register. This would result in an assert in a debug build, or redundant handling of the FP register in a release build. Try to predict that we will have an FP later, although this is ugly.	2021-01-26 13:01:45 -05:00
Matt Arsenault	92d1195b5f	AMDGPU: Add assertion to determineCalleeSaves Make sure this isn't getting called multiple times. I was surprised we were modifying the function here, which I think is a bit questionable.	2021-01-26 13:01:45 -05:00
Simon Pilgrim	f82cff31d3	[AMDGPU] HSAMD::fromString - replace std::string arg with StringRef. NFCI. Removes an unnecessary chain of StringRef -> std::string -> StringRef conversions	2021-01-26 16:09:39 +00:00
Simon Pilgrim	ee3da8958a	[AMDGPU] Fix null-dereference static analysis warnings. NFCI. Avoid repeated calls to isZeroValue() and check for a null pointer before dereferencing a dyn_cast<>.	2021-01-26 15:43:59 +00:00
Matt Arsenault	551a69e418	AMDGPU: Clear IsSSA property in SIFormMemoryClauses Fixes verifier error when writing MIR testcases	2021-01-26 10:40:41 -05:00
Mirko Brkusanin	608ac62540	[AMDGPU] Fix use of HasModifiers in VopProfile HasModifiers should be true if at least one modifier is used. This should make the use of this field a bit more consistent. Differential Revision: https://reviews.llvm.org/D94795	2021-01-26 15:21:11 +01:00
Dmitry Preobrazhensky	745064e36b	[AMDGPU][MC] Refactored exp tgt handling Summary: - Separated tgt encoding from parsing; - Separated tgt decoding from printing; - Improved errors handling; - Disabled leading zeroes in index. The following code is no longer accepted: exp pos00 v3, v2, v1, v0 Reviewers: arsenm, rampitec, foad Differential Revision: https://reviews.llvm.org/D95216	2021-01-26 14:54:15 +03:00
Craig Topper	bfc60acd98	[RISCV] Adjust RISCVInstrInfoVSDPatterns.td for different pseudo instructions for different FPR. Move the Suffix string into the VTypeInfo class so we don't need a helper class to get to it. Adjust pseudo naming scheme for FPRs to put F16/F32/F64 in place of F in the pseudo instruction name rather than as a suffix. This avoids special cases like VFMERGE from the original patch. Differential Revision: https://reviews.llvm.org/D95404	2021-01-26 01:00:50 -08:00
Freddy Ye	b3b0acdc6f	[NFC] Refine some uninitialized used variables. These warnings are reported by the static code analysis tool Klocwork. Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D95421	2021-01-26 16:51:05 +08:00
Hsiangkai Wang	e72b22a40b	[RISCV] Define different pseudo instructions for different FPR. When spilling, the spill size will depend on the size of the register class. For .vf vector instructions, it may spill the floating point scalar argument. In order to use the correct load/store instructions for spilling, we need to provide the correct floating point register class for the .vf vector pseudo instructions. In this commit, we define the .vf pseudo instructions as three different kinds of pseudo instructions for half/float/double. For example, PseudoVFADD_M1 will become PseudoVFADD_F16_M1, PseudoVFADD_F32_M1, and PseudoVFADD_F64_M1. Differential Revision: https://reviews.llvm.org/D95234	2021-01-26 15:48:35 +08:00
Hsiangkai Wang	f19849a07b	[RISCV] Update V extension to v1.0-draft 08a0b464. Differential Revision: https://reviews.llvm.org/D94583	2021-01-26 12:02:43 +08:00
Hsiangkai Wang	b69932b550	[RISCV] Implement vlsegff intrinsics. Differential Revision: https://reviews.llvm.org/D95303	2021-01-26 12:02:43 +08:00
Kazu Hirata	c85b6bf33c	[AMDGPU] Forward-declare MachineIRBuilder (NFC) AMDGPULegalizerInfo.h needs MachineIRBuilder but relies on a forward declaration of MachineIRBuilder in LegalizerInfo.h. This patch adds a forward declaration right in AMDGPULegalizerInfo.h. While we are at it, this patch removes the one in LegalizerInfo.h, where it is unnecessary.	2021-01-25 19:24:01 -08:00
Nemanja Ivanovic	8018f731f0	[PowerPC] Do not emit HW loop with half precision operations If a loop has any operations on half precision values, there will be calls to library functions on Power8. Even on Power9, there is a small subset of instructions that are actually supported for the type. This patch disables HW loops whenever any operations on the type are found (other than the handful of supported ones when compiling for Power9). Fixes a few PRs opened by Julia: https://bugs.llvm.org/show_bug.cgi?id=48785 https://bugs.llvm.org/show_bug.cgi?id=48786 https://bugs.llvm.org/show_bug.cgi?id=48519 Differential revision: https://reviews.llvm.org/D94980	2021-01-25 20:55:56 -06:00
Craig Topper	15f66cf749	[RISCV] Add isel patterns to optimize slli.uw patterns without Zba extension. This pattern can occur when an unsigned is used to index an array on RV64. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D95290	2021-01-25 16:12:08 -08:00
Changpeng Fang	5b648df1a8	AMDGPU: Reduce the number of expensive calls in SIFormMemoryClause Summary: RPTracker::reset(MI) is a very expensive call when the number of virtual registers is huge. We observed a long compilation time issue when RPT::reset() is called once for each cluster. In this work, we call RPT.reset() only at the first seen cluster, and use advance() to get the register pressure for the later clusters in the same basic block. This could effectively reduce the number of the expensive calls and thus reduce the compile time. Reviewers: rampitec Fixes: SWDEV-239161 Differential Revision: https://reviews.llvm.org/D95273	2021-01-25 16:08:08 -08:00
Fraser Cormack	15141cd115	[RISCV] Add RVV insertelt/extractelt scalable-vector patterns Original patch by @rogfer01. This patch adds support for insertelt and extractelt operations on scalable vectors. Special care must be taken on RV32 when dealing with i64 vectors as there are no straightforward ways to insert a 64-bit element without a register of that size. To that end, both are custom-lowered to different sequences. Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com> Co-Authored-by: Fraser Cormack <fraser@codeplay.com> Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D94615	2021-01-25 22:03:52 +00:00
Richard Smith	925ae8c790	Revert "[ObjC][ARC] Annotate calls with attributes instead of emitting retainRV" This reverts commit `53176c1680`, which introduced a layering violation. LLVM's IR library can't include headers from Analysis.	2021-01-25 13:53:38 -08:00
Konstantin Zhuravlyov	2cdb34efda	Revert "[IndirectFunctions] Skip propagating attributes to address taken functions" This reverts commit `dd8ae42674`. This commit causes an infinite loop when compiling rocThrust and hipCUB. Differential Revision: https://reviews.llvm.org/D95389	2021-01-25 15:58:06 -05:00
Akira Hatanaka	53176c1680	[ObjC][ARC] Annotate calls with attributes instead of emitting retainRV or claimRV calls in the IR Background: This patch makes changes to the front-end and middle-end that are needed to fix a longstanding problem where llvm breaks ARC's autorelease optimization (see the link below) by separating calls from the marker instructions or retainRV/claimRV calls. The backend changes are in https://reviews.llvm.org/D92569. https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue What this patch does to fix the problem: - The front-end annotates calls with attribute "clang.arc.rv"="retain" or "clang.arc.rv"="claim", which indicates the call is implicitly followed by a marker instruction and a retainRV/claimRV call that consumes the call result. This is currently done only when the target is arm64 and the optimization level is higher than -O0. - ARC optimizer temporarily emits retainRV/claimRV calls after the annotated calls in the IR and removes the inserted calls after processing the function. - ARC contract pass emits retainRV/claimRV calls after the annotated calls. It doesn't remove the attribute on the call since the backend needs it to emit the marker instruction. The retainRV/claimRV calls are emitted late in the pipeline to prevent optimization passes from transforming the IR in a way that makes it harder for the ARC middle-end passes to figure out the def-use relationship between the call and the retainRV/claimRV calls (which is the cause of PR31925). - The function inliner removes the autoreleaseRV call in the callee that returns the result if nothing in the callee prevents it from being paired up with the calls annotated with "clang.arc.rv"="retain/claim" in the caller. If the call is annotated with "claim", a release call is inserted since autoreleaseRV+claimRV is equivalent to a release. 
If it cannot find an autoreleaseRV call, it tries to transfer the attributes to a function call in the callee. This is important since ARC optimizer can remove the autoreleaseRV call returning the callee result, which makes it impossible to pair it up with the retainRV or claimRV call in the caller. If that fails, it simply emits a retain call in the IR if the call is annotated with "retain" and does nothing if it's annotated with "claim". - This patch teaches dead argument elimination pass not to change the return type of a function if any of the calls to the function are annotated with attribute "clang.arc.rv". This is necessary since the pass can incorrectly determine nothing in the IR uses the function return, which can happen since the front-end no longer explicitly emits retainRV/claimRV calls in the IR, and change its return type to 'void'. Future work: - Use the attribute on x86-64. - Fix the auto upgrader to convert call+retainRV/claimRV pairs into calls annotated with the attributes. rdar://71443534 Differential Revision: https://reviews.llvm.org/D92808	2021-01-25 11:57:08 -08:00
Craig Topper	239cfbccb0	[RISCV] Custom type legalize i8/i16 UDIV/UREM/SDIV on RV64 so we can use divuw/remuw/divw. This makes our i8/i16 codegen more similar to the i32 codegen. I've also added computeKnownBits support for DIVUW/REMUW so that we can remove zero extending ANDs from the output. Without this we end up turning DIVUW/REMUW back into DIVU/REMU via some isel patterns. Reviewed By: frasercrmck, luismarques Differential Revision: https://reviews.llvm.org/D95322	2021-01-25 10:47:22 -08:00
Reid Kleckner	988a5334ed	[Win64] Ensure all stack frames are 8 byte aligned The unwind info format requires that all adjustments are 8 byte aligned, and the bottom three bits are masked out. Most Win64 calling conventions have 32 bytes of shadow stack space for spilling parameters, and I believe that constructing these fixed stack objects had the side effect of ensuring an alignment of 8. However, the Intel regcall convention does not have this shadow space, so when using that convention, it was possible to make a 4 byte stack frame, which was impossible to describe with unwind info. Fixes pr48867	2021-01-25 10:39:27 -08:00
Nemanja Ivanovic	1150bfa6bb	[PowerPC] Add missing negate for VPERMXOR on little endian subtargets This intrinsic is supposed to have the permute control vector complemented on little endian systems (as the ABI specifies and GCC implements). With the current code gen, the result vector is byte-reversed. Differential revision: https://reviews.llvm.org/D95004	2021-01-25 12:23:33 -06:00
Craig Topper	4eb4f8963f	[RISCV] Use sign extend for i32 arguments and returns in makeLibCall on RV64. As far as I know, 32-bit arguments and returns on RV64 are always sign extended to i64. So I think we should be taking this into account around libcalls. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D95285	2021-01-25 09:33:48 -08:00
Dmitry Preobrazhensky	558b3bbb5b	[AMDGPU][MC] Improved errors handling for SDWA operands Reviewers: rampitec Differential Revision: https://reviews.llvm.org/D95212	2021-01-25 19:02:53 +03:00
Simon Pilgrim	13f2aee783	[X86][AVX] Generalize vperm2f128/vperm2i128 patterns to support all legal 256-bit vector types Remove bitcasts to/from v4x64 types through vperm2f128/vperm2i128 ops to help improve shuffle combining and demanded vector elts folding.	2021-01-25 15:35:36 +00:00
Simon Pilgrim	821a51a9ca	[X86][AVX] combineX86ShuffleChainWithExtract - widen to at least original root size. NFCI. We're relying on the source inputs for shuffle combining having already been widened to the root size (otherwise the offset logic falls over) - we're going to be supporting different sized shuffle inputs soon, so we need to explicitly make the minimum widened width the original root size.	2021-01-25 13:45:37 +00:00
Simon Pilgrim	1b780cf32e	[X86][AVX] LowerTRUNCATE - avoid bitcasts around extract_subvectors. We allow extract_subvector lowering of all legal types, so pre-bitcast the source type to try and reduce bitcast pollution.	2021-01-25 12:10:36 +00:00
Simon Pilgrim	f461e35cba	[X86][AVX] combineX86ShuffleChain - avoid bitcasts around insert_subvector() shuffle patterns. We allow insert_subvector lowering of all legal types, so don't always cast to the vXi64/vXf64 shuffle types - this is only necessary for X86ISD::SHUF128/X86ISD::VPERM2X128 patterns later.	2021-01-25 11:35:45 +00:00
Fraser Cormack	fde2466171	[SelectionDAG] Support scalable-vector splats in more cases This patch adds support for scalable-vector splats in DAGCombiner's `isConstantOrConstantVector` and `ISD::matchUnaryPredicate` functions, which enable the SelectionDAG div/rem-by-constant optimizations for scalable vector types. It also fixes up one case where the UDIV optimization was generating a SETCC without first consulting the target for its preferred SETCC result type. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D94501	2021-01-25 10:58:15 +00:00
Sjoerd Meijer	815dd4b292	[AArch64] Add Cortex CPU subtarget features for instruction fusion. This adds subtarget features for AES, literal, and compare and branch instruction fusion for different Cortex CPUs. Patch by: Cassie Jones. Differential Revision: https://reviews.llvm.org/D94457	2021-01-25 09:11:29 +00:00
Simon Cook	a7c1239f37	[RISCV] Add attribute support for all supported extensions This adds support for ".attribute arch" for all extensions that are currently supported by the compiler. Differential Revision: https://reviews.llvm.org/D94931	2021-01-25 08:58:53 +00:00
Fangrui Song	d745b82de1	[XRay] Support DW_TAG_call_site and delete unneeded PATCHABLE_EVENT_CALL/PATCHABLE_TYPED_EVENT_CALL lowering	2021-01-25 00:49:18 -08:00
Andre Vieira	8fbc1437c6	[AArch64] Merge [US]MULL with half adds and subs into [US]ML[AS]L This patch adds patterns to teach the AArch64 backend to merge [US]MULL instructions and adds/subs of half the size into [US]ML[AS]L where we don't use the top half of the result. Differential Revision: https://reviews.llvm.org/D95218	2021-01-25 07:58:12 +00:00
QingShan Zhang	ffc3e800c6	[NFC] [DAGCombine] Correct the result for sqrt even if the iteration is zero For now, we correct the result for sqrt if iteration > 0. This doesn't make sense as the two are not strictly related. Reviewed By: dmgreen, spatel, RKSimon Differential Revision: https://reviews.llvm.org/D94480	2021-01-25 04:02:44 +00:00
Chen Zheng	0ed4cf4bf3	[PowerPC] support register pressure reduction in machine combiner. Reassociating some patterns to generate more fma instructions to reduce register pressure. Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D92071	2021-01-24 21:28:21 -05:00
Carl Ritson	a80ebd0179	[AMDGPU] Fix llvm.amdgcn.init.exec and frame materialization Frame-base materialization may insert vector instructions before EXEC is initialised. Fix this by moving lowering of llvm.amdgcn.init.exec later in backend. Also remove SI_INIT_EXEC_LO pseudo as this is not necessary. Reviewed By: ruiling Differential Revision: https://reviews.llvm.org/D94645	2021-01-25 08:31:17 +09:00
Craig Topper	12d0753aca	[RISCV] Use bitsLE instead of strict == MVT::i32 in assertsexti32 and assertzexti32. The patterns that use this really want to know if the operand has at least 32 sign/zero bits. This increases opportunities to use W instructions when the original source used i8/i16. Not sure how much this matters for performance, but it makes i8/i16 code more consistent with i32.	2021-01-24 13:58:14 -08:00
Simon Cook	f3f3c9c254	[RISCV] Fix name of Zba extension (NFC)	2021-01-24 21:02:34 +00:00
Kazu Hirata	16baad8f4e	[llvm] Use pop_back_val (NFC)	2021-01-24 12:18:57 -08:00
Kazu Hirata	054444177b	[Target] Use llvm::append_range (NFC)	2021-01-24 12:18:56 -08:00
Craig Topper	116177afcc	[RISCV] Use SRLIWPat in the PACKUW pattern. This makes the code more tolerant if we ever change SimplifyDemandedBits to not remove 1s from the lsbs of a contiguous mask.	2021-01-24 10:41:58 -08:00
Craig Topper	c50457f3e4	[RISCV] Make the code in MatchSLLIUW ignore the lower bits of the AND mask where the shift has guaranteed zeros. This avoids being dependent on SimplifyDemandedBits having cleared those bits. It could make sense to teach SimplifyDemandedBits to keep all lower bits 1 in an AND mask when possible. This could be implemented with slli+srli in the general case rather than needing to materialize the constant.	2021-01-24 00:34:45 -08:00
Ben Shi	2a4acf3ea8	[AVR] Optimize 8-bit int shift Reviewed By: dylanmckay Differential Revision: https://reviews.llvm.org/D90678	2021-01-24 11:04:37 +08:00
Craig Topper	c7d5d8fa33	[RISCV] Group some Zbs isel patterns together and remove a stale comment. NFC	2021-01-23 16:45:05 -08:00
Craig Topper	998057ec06	[RISCV] Add isel patterns to remove masks on SLO/SRO shift amounts.	2021-01-23 15:57:41 -08:00
Craig Topper	d2927f786e	[RISCV] Add isel patterns to remove (and X, 31) from sllw/srlw/sraw shift amounts. We try to do this during DAG combine with SimplifyDemandedBits, but it fails if there are multiple nodes using the AND. For example, multiple shifts using the same shift amount.	2021-01-23 15:08:18 -08:00
Kazu Hirata	e4847a7fcf	Revert "[Target] Use llvm::append_range (NFC)" This reverts commit `cc7a238286`. The X86WinEHState.cpp hunk seems to break certain builds.	2021-01-23 11:25:27 -08:00


61068 Commits