llvm-project

Commit Graph

Author	SHA1	Message	Date
Martin Storsjö	36c64af9d7	[CodeGen] [WinException] Only produce handler data at the end of the function if needed If we are going to write handler data (that is written as variable length data following after the unwind info in .xdata), we need to emit the handler data immediately, but for cases where no such info is going to be written, skip emitting it right away. (Unwind info for all remaining functions that hasn't gotten it emitted directly is emitted at the end.) This does slightly change the ordering of sections (triggering a bunch of updates to DebugInfo/COFF tests), but the change should be benign. This also matches GCC's assembly output, which doesn't output .seh_handlerdata unless it actually is needed. For ARM64, the unwind info can be packed into the runtime function entry itself (leaving no data in the .xdata section at all), but that can only be done if there's no follow-on data in the .xdata section. If emission of the unwind info is triggered via EmitWinEHHandlerData (or the .seh_handlerdata directive), which implicitly switches to the .xdata section, there's a chance of the caller wanting to pass further data there, so the packed format can't be used in that case. Differential Revision: https://reviews.llvm.org/D87448	2020-09-21 23:42:59 +03:00
Martin Storsjö	4d85444b31	[clang-cl] Always interpret the LIB env var as separated with semicolons When cross compiling with clang-cl, clang splits the INCLUDE env variable around semicolons (clang/lib/Driver/ToolChains/MSVC.cpp, MSVCToolChain::AddClangSystemIncludeArgs) and lld splits the LIB variable similarly (lld/COFF/Driver.cpp, LinkerDriver::addLibSearchPaths). Therefore, the consensus for cross compilation with clang-cl and lld-link seems to be to use semicolons, despite path lists normally being separated by colons on unix and EnvPathSeparator being set to that. Therefore, handle the LIB variable similarly in Clang, when handling lib file arguments when driving linking via Clang. This fixes commands like "clang-cl test.c -Fetest.exe kernel32.lib" in a cross compilation setting. Normally, most users call (lld-)link directly, but meson happens to use this command syntax for has_function() tests. Differential Revision: https://reviews.llvm.org/D88002	2020-09-21 23:42:59 +03:00
Sanjay Patel	7451bf0b0b	[SLP] use std::distance/find to reduce code; NFC We were already using this code pattern right after the loop, so this makes it consistent.	2020-09-21 16:22:55 -04:00
Matt Arsenault	6daddc213f	AMDGPU: Don't add frame register to frame pseudos We no longer treat the frame register like a function argument, so the problem this avoided is no longer relevant.	2020-09-21 16:18:47 -04:00
Matt Arsenault	55f9f87da2	Reapply Revert "RegAllocFast: Rewrite and improve" This reverts commit `dbd53a1f0c`. Needed lldb test updates	2020-09-21 15:45:27 -04:00
Zequan Wu	9caa3fbe03	[Coverage] Add empty line regions to SkippedRegions Differential Revision: https://reviews.llvm.org/D84988	2020-09-21 12:42:53 -07:00
Sanjay Patel	6bad3caeb0	[InstCombine] use unary shuffle creator to reduce code duplication; NFC	2020-09-21 15:34:24 -04:00
Sanjay Patel	be93505986	[LoopVectorize] use unary shuffle creator to reduce code duplication; NFC	2020-09-21 15:34:24 -04:00
Arthur Eubanks	f4f7df037e	[DIE] Remove DeadInstEliminationPass This pass is like DeadCodeEliminationPass, but only does one pass through a function instead of iterating on users of eliminated instructions. DeadCodeEliminationPass should be used in all cases. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D87933	2020-09-21 12:12:25 -07:00
Roman Lebedev	0ab99bb314	[NFC][SCEV] Cleanup lowering of @llvm.uadd.sat, (-1 - V) is just ~V	2020-09-21 22:10:59 +03:00
Arthur Eubanks	746a2c3775	[ObjCARC] Initialize return value Mistakenly removed initialization of `Changed` in https://reviews.llvm.org/D87806.	2020-09-21 11:03:44 -07:00
Sanjay Patel	a44238cb44	[SLP] use unary shuffle creator to reduce code duplication; NFC	2020-09-21 13:54:06 -04:00
Sanjay Patel	1e6b240d7d	[IRBuilder][VectorCombine] make and use a convenience function for unary shuffle; NFC This reduces code duplication for common construct. Follow-ups can use this in SLP, LoopVectorizer, and other passes.	2020-09-21 13:47:01 -04:00
Roman Lebedev	64e2cb7e96	[SCEV] Recognize @llvm.uadd.sat as `%y + umin(%x, (-1 - %y))` ---------------------------------------- define i32 @src(i32 %x, i32 %y) { %0: %r = uadd_sat i32 %x, %y ret i32 %r } => define i32 @tgt(i32 %x, i32 %y) { %0: %t0 = sub nsw nuw i32 4294967295, %y %t1 = umin i32 %x, %t0 %r = add nuw i32 %t1, %y ret i32 %r } Transformation seems to be correct! The alternative, naive, lowering could be the following, although i don't think it's better, thought it will likely be needed for sadd/ssub/*shl: ---------------------------------------- define i32 @src(i32 %x, i32 %y) { %0: %r = uadd_sat i32 %x, %y ret i32 %r } => define i32 @tgt(i32 %x, i32 %y) { %0: %t0 = zext i32 %x to i33 %t1 = zext i32 %y to i33 %t2 = add nuw i33 %t0, %t1 %t3 = zext i32 4294967295 to i33 %t4 = umin i33 %t2, %t3 %r = trunc i33 %t4 to i32 ret i32 %r } Transformation seems to be correct!	2020-09-21 20:25:54 +03:00
Roman Lebedev	fedc9549d5	[SCEV] Recognize @llvm.usub.sat as `%x - (umin %x, %y)` ---------------------------------------- define i32 @src(i32 %x, i32 %y) { %0: %r = usub_sat i32 %x, %y ret i32 %r } => define i32 @tgt(i32 %x, i32 %y) { %0: %t0 = umin i32 %x, %y %r = sub nuw i32 %x, %t0 ret i32 %r } Transformation seems to be correct!	2020-09-21 20:25:54 +03:00
Roman Lebedev	1bb7ab8c4a	[SCEV] Recognize @llvm.abs as smax(x, -x) As per alive2 (ignoring undef): ---------------------------------------- define i32 @src(i32 %x, i1 %y) { %0: %r = abs i32 %x, 0 ret i32 %r } => define i32 @tgt(i32 %x, i1 %y) { %0: %neg_x = mul i32 %x, 4294967295 %r = smax i32 %x, %neg_x ret i32 %r } Transformation seems to be correct! ---------------------------------------- define i32 @src(i32 %x, i1 %y) { %0: %r = abs i32 %x, 1 ret i32 %r } => define i32 @tgt(i32 %x, i1 %y) { %0: %neg_x = mul nsw i32 %x, 4294967295 %r = smax i32 %x, %neg_x ret i32 %r } Transformation seems to be correct!	2020-09-21 20:25:53 +03:00
Simon Pilgrim	005f826a05	[SLP] Use for-range loops across ValueLists. NFCI. Also rename some existing loops that used a 'j' iterator to consistently use 'V'.	2020-09-21 18:24:23 +01:00
Sanjay Patel	46075e0b78	[SLP] simplify interface for gather(); NFC The implementation of gather() should be reduced too, but this change by itself makes things a little clearer: we don't try to gather to a different type or number-of-values than whatever is passed in as the value list itself.	2020-09-21 12:57:28 -04:00
Simon Pilgrim	6a0ed57a22	ImplicitNullChecks.cpp - use auto const& iterators in for-range loops to avoid copies. NFCI.	2020-09-21 17:42:57 +01:00
Arthur Eubanks	024979b7b6	[ObjCARC][NewPM] Port objc-arc-contract to NPM Similar to https://reviews.llvm.org/D86178. This is a module pass instead of a function pass since ARCRuntimeEntryPoints can lazily add function declarations. Reviewed By: ahatanak Differential Revision: https://reviews.llvm.org/D87806	2020-09-21 09:40:14 -07:00
Momchil Velikov	742250bf62	[ARM][CMSE] Issue an error if passing arguments through memory across security boundary It was never supported and that part was accidentally omitted when upstreaming D76518. Differential Revision: https://reviews.llvm.org/D86478 Change-Id: If6ba9506eb0431c87a1d42a38aa60e47ce263039	2020-09-21 17:26:10 +01:00
Simon Pilgrim	3ae07b2a33	TargetPassConfig.cpp - use auto const& iterator in for-range loop to avoid copies. NFCI.	2020-09-21 17:17:11 +01:00
Simon Pilgrim	3ddecfd220	SLPVectorizer.cpp - fix include ordering. NFCI.	2020-09-21 17:17:11 +01:00
Simon Pilgrim	ce294ff8cd	MachineCSE.cpp - use auto const& iterator in for-range loop to avoid copies. NFCI.	2020-09-21 16:54:26 +01:00
Simon Pilgrim	53f1748c13	ProfileSummary.cpp - use auto const& iterator in for-range loop to avoid copies. NFCI.	2020-09-21 16:54:26 +01:00
David Sherwood	96e52c1364	[SVE][CodeGen] Mark ptrue/pfalse instructions as rematerializable	2020-09-21 16:44:32 +01:00
Baptiste Saleil	1372e23c7d	[PowerPC] Add vector pair load/store instructions and vector pair register class This patch adds support for the lxvp, lxvpx, plxvp, stxvp, stxvpx and pstxvp instructions in the PowerPC backend. These instructions allow loading and storing VSX register pairs. This patch also adds the VSRp register class definition needed for these instructions. Differential Revision: https://reviews.llvm.org/D84359	2020-09-21 10:27:47 -05:00
Arthur Eubanks	5249e6f248	[LoopSimplifyCFG][NewPM] Rename simplify-cfg -> loop-simplifycfg This matches the legacy PM name and makes all tests in Transforms/LoopSimplifyCFG pass under NPM. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D87948	2020-09-21 08:27:19 -07:00
Alexey Bataev	3ff07fcd54	[SLP] Allow reordering of vectorization trees with reused instructions. If some leaves have the same instructions to be vectorized, we may incorrectly evaluate the best order for the root node (it is built for the vector of instructions without repeated instructions and, thus, has less elements than the root node). In this case we just can not try to reorder the tree + we may calculate the wrong number of nodes that requre the same reordering. For example, if the root node is \<a+b, a+c, a+d, f+e\>, then the leaves are \<a, a, a, f\> and \<b, c, d, e\>. When we try to vectorize the first leaf, it will be shrink to \<a, b\>. If instructions in this leaf should be reordered, the best order will be \<1, 0\>. We need to extend this order for the root node. For the root node this order should look like \<3, 0, 1, 2\>. This patch allows extension of the orders of the nodes with the reused instructions. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D45263	2020-09-21 10:51:03 -04:00
Simon Pilgrim	2ef2abdec2	DWARFEmitter.cpp - use auto const& iterators in for-range loops to avoid copies. NFCI.	2020-09-21 15:33:09 +01:00
Paul C. Anagnostopoulos	bd55d5b2a1	Change comments about order of classes in superclass list.	2020-09-21 10:25:44 -04:00
Simon Pilgrim	82042a2c9b	DWARFYAML::emitDebugSections - remove unnecessary cantFail(success) call. NFCI. As mentioned on rG6bb912336804.	2020-09-21 14:07:11 +01:00
Denis Antrushin	ee86688b81	[Statepoints][ISEL] gc.relocate uniquification should be based on SDValue, not IR Value. When exporting statepoint results to virtual registers we try to avoid generating exports for duplicated inputs. But we erroneously use IR Value* to check if inputs are duplicated. Instead, we should use SDValue, because even different IR values can get lowered to the same SDValue. I'm adding a (degenerate) test case which emphasizes importance of this feature for invoke statepoints. If we fail to export only unique values we will end up with something like that: %0 = STATEPOINT %1 = COPY %0 landing_pad: <use of %1> And when exceptional path is taken, %1 is left uninitialized (COPY is never execute). Reviewed By: reames Differential Revision: https://reviews.llvm.org/D87695	2020-09-21 19:44:46 +07:00
Paul Walker	f3fa954b5b	[SVE] Change definition of reduction ISD nodes to have an SVE vector result type. The current nodes, AArch64::SMAXV_PRED for example, are defined to return a NEON vector result. This is incorrect because they modify the complete SVE register and are thus changed to represent such. This patch also adds nodes for UADDV_PRED and SADDV_PRED, which unifies the handling of all SVE reductions. NOTE: Floating-point reductions are already implemented correctly, so this patch is essentially making everything consistent with those. Differential Revision: https://reviews.llvm.org/D87843	2020-09-21 13:16:28 +01:00
Paul Walker	6457455248	[SVE] Use NEON for extract_vector_elt when the index is in range. Patch also adds missing patterns for unpacked vector types and extracts of element zero. Differential Revision: https://reviews.llvm.org/D87842	2020-09-21 13:12:28 +01:00
Alexander Belyaev	17dc729bd4	Revert "[NFC][ScheduleDAG] Remove unused EntrySU SUnit" This reverts commit `0345d88de6`. Google internal backend uses EntrySU, we are looking into removing dependency on it. Differential Revision: https://reviews.llvm.org/D88018	2020-09-21 13:33:05 +02:00
Florian Hahn	11dccf8d3a	Recommit "[SCEV] Look through single value PHIs." This commit was originally because it was suspected to cause a crash, but a reproducer did not surface. A crash that was exposed by this change was fixed in `1d8f2e5292`. This reverts the revert commit `0581c0b0ee`.	2020-09-21 11:59:50 +01:00
David Green	f4c5cadbcb	[ARM] Select f32 constants with vmov.f16 This adds lowering for f32 values using the vmov.f16, which zeroes the top bits whilst setting the lower bits to a pattern. This range of values does not often come up, except where a f16 constant value has been converted to a f32. Differential Revision: https://reviews.llvm.org/D87790	2020-09-21 11:10:47 +01:00
Sjoerd Meijer	4b8ade837e	[AArch64] Cortex-A55 scheduler model This is an initial commit adding the A55 model, but it isn't used/enabled yet. We will follow up on this to improve the model, then flip the switch. The optimisation guide describing Cortex-A55 micro-architecture in more detail can be found here: https://static.docs.arm.com/epm128372/20/arm_cortex_a55_software_optimization_guide_v2.pdf Original patch by Javed Absar. Differential Revision: https://reviews.llvm.org/D46884	2020-09-21 10:54:32 +01:00
Alex Richardson	8cf6778d30	[RISC-V] Implement RISCVInstrInfo::isCopyInstrImpl() This does not result in changes for any of the current tests, but it might improve debug information in some cases. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D86522	2020-09-21 10:21:11 +01:00
Lucas Prates	53d238a961	[CodeGen] Fixing inconsistent ABI mangling of vlaues in SelectionDAGBuilder SelectionDAGBuilder was inconsistently mangling values based on ABI Calling Conventions when getting them through copyFromRegs in SelectionDAGBuilder, causing duplicate value type convertions for function arguments. The checking for the mangling requirement was based on the value's originating instruction and was performed outside of, and inspite of, the regular Calling Convention Lowering. The issue could be observed in a scenario such as: ``` %arg1 = load half, half* %const, align 2 %arg2 = call fastcc half @someFunc() call fastcc void @otherFunc(half %arg1, half %arg2) ; Here, %arg2 was incorrectly mangled twice, as the CallConv data from ; the call to @someFunc() was taken into consideration for the check ; when getting the value for processing the call to @otherFunc(...), ; after the proper convertion had taken place when lowering the return ; value of the first call. ``` This patch fixes the issue by disregarding the Calling Convention information for such copyFromRegs, making sure the ABI mangling is properly contanined in the Calling Convention Lowering. This fixes Bugzilla #47454. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D87844	2020-09-21 10:05:34 +01:00
Florian Hahn	57ae9bb932	[LSR] Preserve MSSA when using SplitCriticalEdge. LSR claims to MemorySSA, but we also have to make sure it is preserved when splitting critical edges. This can be done by passing MSSAU to SplitCriticalEdge. Fixes PR47557.	2020-09-21 09:51:26 +01:00
Fangrui Song	dbc616e982	[EHStreamer] Fix a "Continue to action" -fverbose-asm comment when multi-byte LEB128 encoding is needed This only happens with more than 64 action records and it is difficult to construct a test.	2020-09-20 21:41:48 -07:00
Qiu Chaofan	1d782c2987	[PowerPC] Pass nofpexcept flag to custom lowered constrained ops This is a follow-up of D86605. For strict DAG FP node, if its FP exception behavior metadata is ignore, it should have nofpexcept flag. But during custom lowering, this flag isn't passed down. This is also seen on X86 target. Reviewed By: uweigand Differential Revision: https://reviews.llvm.org/D87390	2020-09-21 10:44:25 +08:00
Fangrui Song	d06485685d	[XRay] Change mips to use version 2 sled (PC-relative address) Follow-up to D78590. All targets use PC-relative addresses now. Reviewed By: atanasyan, dberris Differential Revision: https://reviews.llvm.org/D87977	2020-09-20 17:59:57 -07:00
Craig Topper	a74b1faba2	[X86] Make reduceMaskedLoadToScalarLoad/reduceMaskedStoreToScalarStore work for avx512 after type legalization. The scalar elements of the vXi1 build_vector will have been type legalized to i8 by padding with 0s. So we can't check for all ones. Instead we should just look at bit 0 of the constant. Differential Revision: https://reviews.llvm.org/D87863	2020-09-20 13:54:20 -07:00
Craig Topper	4e8c028158	[X86] Stop reduceMaskedLoadToScalarLoad/reduceMaskedStoreToScalarStore from creating scalar i64 load/stores in 32-bit mode If we emit a scalar i64 load/store it will get type legalized to two i32 load/stores. Differential Revision: https://reviews.llvm.org/D87862	2020-09-20 13:46:59 -07:00
David Green	29bd8ea110	[ARM] Constant fold VMOVrh This adds simple constant folding for VMOVrh, to constant fold fp16 constants to integer values. It can help especially with soft calling conventions, but some of the results are not optimal as we end up loading using a vldr. This will be improved in a follow up patch. Differential Revision: https://reviews.llvm.org/D87789	2020-09-20 21:32:51 +01:00
Nikita Popov	445db89b53	[LVI] Get value range from mask comparison InstCombine likes to canonicalize comparisons of the form X == C \|\| X == C+1 into (X & -2) == C'. Make sure LVI can still recover the value range from this. Can of course also be useful for proper mask comparisons. For the sake of clarity, the implementation goes through KnownBits to compute the range.	2020-09-20 21:13:57 +02:00
Nikita Popov	f94bbe19b6	[LVI] Refactor getValueFromICmpCondition (NFC) Rewrite this in a way where the core logic is in a separate function, that is invoked with swapped operands. This makes it easier to add handling for additional icmp patterns.	2020-09-20 21:13:57 +02:00
Simon Pilgrim	0bfeede669	[X86][SSE] Fold EXTEND_VECTOR_INREG(EXTRACT_SUBVECTOR(EXTEND(X),0)) -> EXTEND_VECTOR_INREG(X)	2020-09-20 18:39:12 +01:00
Simon Pilgrim	bb0078e591	[X86][SSE] Fold SIGN_EXTEND(SIGN_EXTEND_VECTOR_INREG(X)) -> SIGN_EXTEND_VECTOR_INREG(X) It should be possible to make this generic, but we're not great at checking legality of *_EXTEND_VECTOR_INREG ops so I'm conservatively putting this inside X86ISelLowering.cpp	2020-09-20 18:39:12 +01:00
Sanjay Patel	7903ae4720	[InstCombine] factorize left shifts of add/sub We do similar factorization folds in SimplifyUsingDistributiveLaws, but that drops no-wrap properties. Propagating those optimally may help solve: https://llvm.org/PR47430 The propagation is all-or-nothing for these patterns: when all 3 incoming ops have nsw or nuw, the 2 new ops should have the same no-wrap property: https://alive2.llvm.org/ce/z/Dv8wsU This also solves: https://llvm.org/PR47584	2020-09-20 12:55:24 -04:00
Sanjay Patel	cf75e83275	[InstCombine] replace zombie unreachable values with 'undef' before erasing The test (currently crashing) is reduced from the example provided in the post-commit discussion in D87149. Differential Revision: https://reviews.llvm.org/D87965	2020-09-20 12:25:08 -04:00
Simon Pilgrim	15c8306056	[X86][SSE] Fold EXTEND_VECTOR_INREG(EXTEND_VECTOR_INREG(X)) -> EXTEND_VECTOR_INREG(X) It should be possible to make this generic, but we're not great at checking legality of *_EXTEND_VECTOR_INREG ops so I'm conservatively putting this inside X86ISelLowering.cpp	2020-09-20 16:33:02 +01:00
Simon Pilgrim	a0c8793ce6	[X86][SSE] Enable ZERO_EXTEND_VECTOR_INREG shuffle combining on SSE41 targets. Allows ZERO_EXTEND_VECTOR_INREG to be shuffle combined on all targets where it is legal.	2020-09-20 16:05:10 +01:00
Simon Pilgrim	2b634a9d0e	[X86] Rename getExtendInVec to getEXTEND_VECTOR_INREG. NFCI. Make it easier to find the method by naming it after the ops it actually handles. We already do this for lowering/combining.	2020-09-20 15:19:39 +01:00
Simon Pilgrim	6bb9123368	DWARFYAML::emitDebugSections - fix use after std::move warnings. NFCI. We were using Err after it had been moved into cantFail - avoid this by calling cantFail with Error::success() directly.	2020-09-20 14:42:36 +01:00
Simon Pilgrim	91720ee561	[X86] combineX86ShufflesRecursively - fix use after move warning. NFCI. After moving WidenedMask is in an undefined state, so reduce scope of the variable so its reinitialized every iteration - we should still retain any memory allocation savings.	2020-09-20 14:06:50 +01:00
Dávid Bolvanský	2990518b03	[MemLoc] Support lllvm.memcpy.inline in MemoryLocation::getForArgument Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D87971	2020-09-20 14:01:48 +02:00
Simon Pilgrim	e17686ae60	[X86] Rename combineExtInVec to combineEXTEND_VECTOR_INREG. NFCI. Make it easier to find the method by naming it after the ops it actually handles. We already do this for lowering.	2020-09-20 12:16:00 +01:00
Fangrui Song	871d03a675	[FunctionAttrs] Inline setDoesNotRecurse() and delete it. NFC It always returns true, which may lead to confusion. Inline it because it is trivial and only called twice.	2020-09-19 22:24:52 -07:00
Fangrui Song	0526713aa8	[FunctionAttrs] Remove redundant check. NFC	2020-09-19 20:46:18 -07:00
Fangrui Song	6913812abc	Fix some clang-tidy bugprone-argument-comment issues	2020-09-19 20:41:25 -07:00
Nikita Popov	f4e5541809	[Local] Clean up enforceKnownAlignment() (NFC) I want to export this function, and the current API was a bit weird: It took an additional Alignment argument that didn't really have anything to do with what the function does. Drop it, and perform a max at the callsite. Also rename it to tryEnforceAlignment().	2020-09-19 22:29:40 +02:00
Florian Hahn	1d8f2e5292	[SCEVExpander] Support expanding nonintegral pointers with constant base. Currently SCEVExpander creates inttoptr for non-integral pointers if the base is a null constant for example. This results in invalid IR. This patch changes InsertNoopCastOfTo to emit a GEP & bitcast to convert to a non-integral pointer. First, a GEP of i8* null is generated and the integral value is used as index. The GEP is then bitcasted to the target type. This was exposed by D71539. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D87827	2020-09-19 17:19:53 +01:00
Dávid Bolvanský	d716f1608c	[MemLoc] Support bcmp in MemoryLocation::getForArgument Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D87964	2020-09-19 17:12:43 +02:00
Sanjay Patel	f74a334fe3	[ConstantFolding] add undef handling for fmin/fmax intrinsics The output here may not be optimal (yet), but it should be consistent for commuted operands (it was not before) and correct. We can do better by checking FMF and NaN if needed. Code in InstSimplify generally assumes that we have already folded code like this, so it was not handling 2 constant inputs by commuting consistently.	2020-09-19 10:31:01 -04:00
Paul C. Anagnostopoulos	04cebd900f	Change name of Record::TheInit to CorrespondingDefInit to make code clearer. Differential Revision: https://reviews.llvm.org/D87919	2020-09-19 09:18:44 -04:00
Joachim Meyer	f64903fd81	Add -Wno-error=unknown flag to clang-format. Currently newer clang-format options cannot be included in .clang-format files, if not all users can be forced to use an updated version. This patch tries to solve this by adding an option to clang-format, enabling to ignore unknown (newer) options. Differential Revision: https://reviews.llvm.org/D86137	2020-09-19 10:17:57 +02:00
Craig Topper	721d57f952	[X86] Return from SimplifyDemandedBitsForTargetNode after calculating known bits for VSHLI/VSRAI/VSRLI. We were breaking out of the switch which falls into the default implementation of SimplifyDemandedBitsForTargetNode which is a wrapper around computeKnownBits. So we end up doing the recursion and known bits calculation all over again. Instead we should return with the known bits we calculated in the switch.	2020-09-18 23:57:01 -07:00
Amara Emerson	5a50f8b39f	[AArch64][GlobalISel] Add legalization and selection support for <4 x s16> G_SHL.	2020-09-18 23:32:01 -07:00
Xun Li	11453740bc	[ASAN] Properly deal with musttail calls in ASAN When address sanitizing a function, stack unpinsoning code is inserted before each ret instruction. However if the ret instruciton is preceded by a musttail call, such transformation broke the musttail call contract and generates invalid IR. This patch fixes the issue by moving the insertion point prior to the musttail call if there is one. Differential Revision: https://reviews.llvm.org/D87777	2020-09-18 23:10:34 -07:00
Andrew Litteken	132aaec4f2	[IRSim] Adding ilist for IRInstructionData. The IRInstructionData structs are a different representation of the program. This list treats the program as if it was "flattened" and the only parent is this list. This lets us easily create ranges of instructions. Differential Revision: https://reviews.llvm.org/D86969	2020-09-19 00:18:39 -05:00
Craig Topper	58ecbbcdcd	[X86] Fix copy paste mistake in @ccnp flag. We were treating @ccp and @ccnp the same.	2020-09-18 21:28:01 -07:00
David Blaikie	ad68a8b952	DebugInfo: Cleanup RLE dumping, using a length-constrained DataExtractor rather than carrying the end offset separately	2020-09-18 19:32:38 -07:00
Eric Christopher	dbd53a1f0c	Temporarily Revert "RegAllocFast: Rewrite and improve" as it's breaking a few tests in the lldb test suite. Bot: http://lab.llvm.org:8011/builders/lldb-arm-ubuntu/builds/4226/steps/test/logs/stdio This reverts commit `c8757ff3aa`.	2020-09-18 18:11:21 -07:00
Fangrui Song	2ac06241d2	[LiveDebugValues] Add `#if !defined(NDEBUG) \|\| defined(LLVM_ENABLE_DUMP)` to suppress -Wunused-function	2020-09-18 17:25:37 -07:00
Amara Emerson	269bcc39ca	[AArch64][GlobalISel] Legalize arithmetic ops for <4 x s16>	2020-09-18 17:13:55 -07:00
Vitaly Buka	97bfac076a	[NFC][StackSafety] Replace auto with type Fixes static analyzer is warning.	2020-09-18 17:10:28 -07:00
Fangrui Song	76eec6c95b	[SCEV] Fix an unused variable in -DLLVM_ENABLE_ASSERTIONS=off build	2020-09-18 16:19:05 -07:00
Amara Emerson	5d34d7f1a0	[GlobalISel] Add lowering support for G_ABS and use for AArch64. Differential Revision: https://reviews.llvm.org/D87952	2020-09-18 16:17:18 -07:00
Amy Kwan	37e7673c21	[PowerPC] Implement Move to VSR Mask builtins in LLVM/Clang This patch implements the vec_gen[b\|h\|w\|d\|q]m function prototypes in altivec.h in order to utilize the move to VSR with mask instructions introduced in Power10. Differential Revision: https://reviews.llvm.org/D82725	2020-09-18 18:16:14 -05:00
Philip Reames	06f136f61e	[instcombine][x86] Converted pdep/pext with shifted mask to simple arithmetic If the mask of a pdep or pext instruction is a shift masked (i.e. one contiguous block of ones) we need at most one and and one shift to represent the operation without the intrinsic. One all platforms I know of, this is faster than the pdep/pext. The cost modelling for multiple contiguous blocks might be worth exploring in a follow up, but it's not relevant for my current use case. It would almost certainly be a win on AMDs where these are really really slow though. Differential Revision: https://reviews.llvm.org/D87861	2020-09-18 14:54:24 -07:00
Reid Kleckner	9932561b48	[COFF] Move per-global .drective emission from AsmPrinter to TLOFCOFF This changes the order of output sections and the output assembly, but is otherwise NFC. It simplifies the TLOF interface by removing two COFF-only methods.	2020-09-18 14:31:01 -07:00
Eric Christopher	ecfd8161bf	Temporarily Revert "[SLP] Allow reordering of vectorization trees with reused instructions." as it's infinite looping on occasion. This reverts commit `455ca0ebb6`.	2020-09-18 12:50:04 -07:00
Simon Pilgrim	4ebd30722a	[X86][AVX] lowerBuildVectorAsBroadcast - improve BROADCASTM lowering on non-VLX targets Broadcast to a ZMM type then extract the low subvector.	2020-09-18 19:52:02 +01:00
Huihui Zhang	9ad6049736	[InstCombine][SVE] Skip scalable type for InstCombiner::getFlippedStrictnessPredicateAndConstant. We cannot iterate on scalable vector, the number of elements is unknown at compile-time. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D87918	2020-09-18 11:26:36 -07:00
David Blaikie	51a505340d	DebugInfo: Simplify line table parsing to take all the units together, rather than CUs and TUs separately	2020-09-18 11:18:23 -07:00
James Y Knight	f7a53d82c0	PR47468: Fix findPHICopyInsertPoint, so that copies aren't incorrectly inserted after an INLINEASM_BR. findPHICopyInsertPoint special cases placement in a block with a callbr or invoke in it. In that case, we must ensure that the copy is placed before the INLINEASM_BR or call instruction, if the register is defined prior to that instruction, because it may jump out of the block. Previously, the code placed it immediately after the last def _or use_. This is wrong, if the use is the instruction which may jump. We could correctly place it immediately after the last def (ignoring uses), but that is non-optimal for register pressure. Instead, place the copy after the last def, or before the call/inlineasm_br, whichever is later. Differential Revision: https://reviews.llvm.org/D87865	2020-09-18 14:14:04 -04:00
Matt Arsenault	3105d0f84b	CodeGen: Move split block utility to MachineBasicBlock AMDGPU needs this in several places, so consolidate them here.	2020-09-18 14:05:18 -04:00
Matt Arsenault	c8757ff3aa	RegAllocFast: Rewrite and improve This rewrites big parts of the fast register allocator. The basic strategy of doing block-local allocation hasn't changed but I tweaked several details: Track register state on register units instead of physical registers. This simplifies and speeds up handling of register aliases. Process basic blocks in reverse order: Definitions are known to end register livetimes when walking backwards (contrary when walking forward then uses may or may not be a kill so we need heuristics). Check register mask operands (calls) instead of conservatively assuming everything is clobbered. Enhance heuristics to detect killing uses: In case of a small number of defs/uses check if they are all in the same basic block and if so the last one is a killing use. Enhance heuristic for copy-coalescing through hinting: We check the first k defs of a register for COPYs rather than relying on there just being a single definition. When testing this on the full llvm test-suite including SPEC externals I measured: average 5.1% reduction in code size for X86, 4.9% reduction in code on aarch64. (ranging between 0% and 20% depending on the test) 0.5% faster compiletime (some analysis suggests the pass is slightly slower than before, but we more than make up for it because later passes are faster with the reduced instruction count) Also adds a few testcases that were broken without this patch, in particular bug 47278. Patch mostly by Matthias Braun	2020-09-18 14:05:18 -04:00
Matt Arsenault	870fd53e4f	Reapply "RegAllocFast: Record internal state based on register units" The regressions this caused should be fixed when https://reviews.llvm.org/D52010 is applied. This reverts commit `a21387c654`.	2020-09-18 14:05:18 -04:00
Zequan Wu	91aed9bf97	[CodeGen] emit CG profile for COFF object file I forgot to add emission of CG profile for COFF object file, when adding the support (https://reviews.llvm.org/D81775) Differential Revision: https://reviews.llvm.org/D87811	2020-09-18 10:57:54 -07:00
David Blaikie	e0802fe016	DebugInfo: Tidy up initializing multi-section contributions in DWARFContext	2020-09-18 10:54:43 -07:00
Matt Arsenault	0576f436e5	AMDGPU: Don't sometimes allow instructions before lowered si_end_cf Since `6524a7a2b9`, this would sometimes not emit the or to exec at the beginning of the block, where it really has to be. If there is an instruction that defines one of the source operands, split the block and turn the si_end_cf into a terminator. This avoids regressions when regalloc fast is switched to inserting reloads at the beginning of the block, instead of spills at the end of the block. In a future change, this should always split the block.	2020-09-18 13:43:01 -04:00
Amara Emerson	615695de27	[AArch64][GlobalISel] Make <8 x s8> of G_BUILD_VECTOR legal.	2020-09-18 10:32:33 -07:00
Francis Visoiu Mistrih	0345d88de6	[NFC][ScheduleDAG] Remove unused EntrySU SUnit EntrySU doesn't seem to be used at all when building the ScheduleDAG. Differential Revision: https://reviews.llvm.org/D87867	2020-09-18 09:50:47 -07:00
Simon Pilgrim	ceadd98c2f	[X86][AVX] lowerBuildVectorAsBroadcast - improve i64 BROADCASTM lowering on 32-bit targets We already handle the the cases where we have a 'zero extended splat' build vector (a, 0, 0, 0, a, 0, 0, 0, ...) but were missing the case where the 'a' scalar was zero-extended as well - such as i64 -> vXi64 splat cases on 32-bit targets.	2020-09-18 16:59:57 +01:00
Simon Pilgrim	d967aaa8fa	[DAG] BuildVectorSDNode::getSplatValue - pull out repeated getNumOperands() calls. NFCI.	2020-09-18 16:10:23 +01:00
Sanjay Patel	3f100e64b4	[InstSimplify] fix fmin/fmax miscompile for partial undef vectors (PR47567) It would also be correct to return the variable operand in these cases, but eliminating a variable use is probably better for optimization.	2020-09-18 10:05:44 -04:00
Matt Arsenault	751a6c5760	IR: Move denormal mode parsing from MachineFunction to Function This was just inspecting the IR to begin with, and is useful to check in some places in the IR.	2020-09-18 09:55:47 -04:00
Matt Arsenault	27df165270	Revert "[amdgpu] Lower SGPR-to-VGPR copy in the final phase of ISel." This reverts commit `c3492a1aa1`. I think this is the wrong strategy and wrong place to do this transform anyway. Also reverts follow up commit `7d593d0d69`.	2020-09-18 09:48:33 -04:00
Alexey Bataev	455ca0ebb6	[SLP] Allow reordering of vectorization trees with reused instructions. If some leaves have the same instructions to be vectorized, we may incorrectly evaluate the best order for the root node (it is built for the vector of instructions without repeated instructions and, thus, has less elements than the root node). In this case we just can not try to reorder the tree + we may calculate the wrong number of nodes that requre the same reordering. For example, if the root node is \<a+b, a+c, a+d, f+e\>, then the leaves are \<a, a, a, f\> and \<b, c, d, e\>. When we try to vectorize the first leaf, it will be shrink to \<a, b\>. If instructions in this leaf should be reordered, the best order will be \<1, 0\>. We need to extend this order for the root node. For the root node this order should look like \<3, 0, 1, 2\>. This patch allows extension of the orders of the nodes with the reused instructions. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D45263	2020-09-18 09:34:59 -04:00
Mirko Brkusanin	ae36c02ad0	[AMDGPU] Set DS alignment requirements to be more strict Alignment requirements for ds_read/write_b96/b128 for gfx9 and onward are now the same as for other GCN subtargets. This way we can avoid any unintentional use of these instructions on systems that do not support dword alignment and instead require natural alignment. This also makes 'SH_MEM_CONFIG.alignment_mode == STRICT' the default. Differential Revision: https://reviews.llvm.org/D87821	2020-09-18 15:26:24 +02:00
Xing GUO	2d35092cd2	[DWARFYAML] Make the include_directories, file_names and opcodes fields of the line table optional. This patch makes the include_directories, file_names and opcodes fields of the line table optional. This helps us simplify some tests. Reviewed By: jhenderson Differential Revision: https://reviews.llvm.org/D87878	2020-09-18 20:21:11 +08:00
Florian Hahn	9d172c8e9c	Recommit "[DSE] Switch to MemorySSA-backed DSE by default." This switches to using DSE + MemorySSA by default again, after fixing the issues reported after the first commit. Notable fixes `fc82006331`, `a0017c2bc2`. This reverts commit `3a59628f3c`.	2020-09-18 11:05:00 +01:00
Florian Hahn	4635f6050b	[SCEV] Generalize SCEVParameterRewriter to accept SCEV expression as target. This patch extends SCEVParameterRewriter to support rewriting unknown epxressions to arbitrary SCEV expressions. It will be used by further patches. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D67176	2020-09-18 10:05:02 +01:00
Gabriel Hjort Åkerlund	c10200536f	[TableGen][GlobalISel] Fix handling of zero_reg When generating matching tables for GlobalISel, TableGen would output "::zero_reg" whenever encountering the zero_reg, which in turn would result in compilation error. This patch fixes that by instead outputting NoRegister (== 0), which is the same result that TableGen produces when generating matching tables for ISelDAG. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D86215	2020-09-18 11:01:11 +02:00
Tim Northover	2afe4becec	AArch64: make sure jump table entries can reach entire image This turns all jump table entries into deltas within the target function because in the small memory model all code & static data must be in a 4GB block somewhere in memory. When the entries were a delta between the table location and a basic block, the 32-bit signed entries are not enough to guarantee reachability. https://reviews.llvm.org/D87286	2020-09-18 09:50:40 +01:00
Nikita Popov	13e19d2e7c	Revert "[InstCombine] Canonicalize SPF_ABS to abs intrinc" This reverts commit `05d4c4ebc2`. mstorsjo reports a miscompile after this change in https://reviews.llvm.org/D87188#2281093. Reverting until I can investigate this.	2020-09-18 09:38:26 +02:00
Andrew Wei	992698cfbc	[AArch64] Emit zext move when the source of the zext is AssertZext or AssertSext When the source of the zext is AssertZext or AssertSext, it is hard to know any information about the upper 32 bits, so we should insert a zext move before emitting SUBREG_TO_REG to define the lower 32 bits. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D87771	2020-09-18 12:48:41 +08:00
Amara Emerson	f5898f8c2d	[AArch64][GlobalISel] Make G_STORE <8 x s8> legal.	2020-09-17 16:42:18 -07:00
Amara Emerson	196e2f97b7	[AArch64][GlobalISel] clang-format AArch64LegalizerInfo.cpp. NFC.	2020-09-17 16:41:10 -07:00
Amy Kwan	6f3c0991bf	[PowerPC] Add Set Boolean Condition Instruction Definitions and MC Tests This patch adds the instruction definitions and assembly/disassembly tests for the set boolean condition instructions. This also includes the negative, and reverse variants of the instruction. Differential Revision: https://reviews.llvm.org/D86252	2020-09-17 18:20:54 -05:00
Amy Kwan	2c3bc918db	[PowerPC] Implement Vector Count Mask Bits builtins in LLVM/Clang This patch implements the vec_cntm function prototypes in altivec.h in order to utilize the vector count mask bits instructions introduced in Power10. Differential Revision: https://reviews.llvm.org/D82726	2020-09-17 18:20:53 -05:00
Zhaoshi Zheng	1c466477ad	[RISCV] Support Shadow Call Stack Currenlty assume x18 is used as pointer to shadow call stack. User shall pass flags: "-fsanitize=shadow-call-stack -ffixed-x18" Runtime supported is needed to setup x18. If SCS is desired, all parts of the program should be built with -ffixed-x18 to maintain inter-operatability. There's no particuluar reason that we must use x18 as SCS pointer. Any register may be used, as long as it does not have designated purpose already, like RA or passing call arguments. Differential Revision: https://reviews.llvm.org/D84414	2020-09-17 16:02:35 -07:00
Philip Reames	b04c181ed7	[AArch64] Enable implicit null check transformation This change enables the generic implicit null transformation for the AArch64 target. As background for those unfamiliar with our implicit null check support: An implicit null check is the use of a signal handler to catch and redirect to a handler a null pointer. Specifically, it's replacing an explicit conditional branch with such a redirect. This is only done for very cold branches under frontend control w/appropriate metadata. FAULTING_OP is used to wrap the faulting instruction. It is modelled as being a conditional branch to reflect the fact it can transfer control in the CFG. FAULTING_OP does not need to be an analyzable branch to achieve it's purpose. (Or at least, that's the x86 model. I find this slightly questionable.) When lowering to MC, we convert the FAULTING_OP back into the actual instruction, record the labels, and lower the original instruction. As can be seen in the test changes, currently the AArch64 backend does not eliminate the unconditional branch to the fallthrough block. I've tried two approaches, neither of which worked. I plan to return to this in a separate change set once I've wrapped my head around the interactions a bit better. (X86 handles this via AllowModify on analyzeBranch, but adding the obvious code causing BranchFolding to crash. I haven't yet figured out if it's a latent bug in BranchFolding, or something I'm doing wrong.) Differential Revision: https://reviews.llvm.org/D87851	2020-09-17 16:00:19 -07:00
Quentin Colombet	99e865b618	[TargetRegisterInfo] Add a couple of target hooks for the greedy register allocator Before this patch, the last chance recoloring and deferred spilling techniques were solely controled by command line options. This patch adds target hooks for these two techniques so that it is easier for backend writers to override the default behavior. The default behavior of the hooks preserves the default values of the related command line options. NFC	2020-09-17 15:23:15 -07:00
Derek Schuff	0ff28fa6a7	Support dwarf fission for wasm object files Initial support for dwarf fission sections (-gsplit-dwarf) on wasm. The most interesting change is support for writing 2 files (.o and .dwo) in the wasm object writer. My approach moves object-writing logic into its own function and calls it twice, swapping out the endian::Writer (W) in between calls. It also splits the import-preparation step into its own function (and skips it when writing a dwo). Differential Revision: https://reviews.llvm.org/D85685	2020-09-17 14:42:41 -07:00
Nikita Popov	05d4c4ebc2	[InstCombine] Canonicalize SPF_ABS to abs intrinc Enable canonicalization of SPF_ABS and SPF_NABS to the abs intrinsic. To be conservative, the one-use check on the comparison is retained, this may be relaxed if all goes well. It's pretty likely that this will uncover places that missing handling for the abs() intrinsic. Please report any seen performance regressions. Differential Revision: https://reviews.llvm.org/D87188	2020-09-17 22:28:34 +02:00
Whitney Tsang	1cee33e9db	[LoopUnrollAndJam] Allow unroll and jam loops forced by user. Summary: Allow unroll and jam loops forced by user. LoopUnrollAndJamPass is still disabled by default in the NPM pipeline, and can be controlled by -enable-npm-unroll-and-jam. Reviewed By: Meinersbur, dmgreen Differential Revision: https://reviews.llvm.org/D87786	2020-09-17 19:40:14 +00:00
Nikita Popov	91ce8e121b	[GVN] Use that assume(!X) implies X==false (PR47496) We already use that assume(X) implies X==true, do the same for assume(!X) implying X==false. This fixes PR47496.	2020-09-17 21:34:44 +02:00
Victor Huang	a4bb71b1c0	Disable hoisting MI to hotter basic blocks when using pgo This is a follow up patch for https://reviews.llvm.org/D63676 to enable the feature when using pgo. Differential Revision: https://reviews.llvm.org/D85240	2020-09-17 14:17:00 -05:00
Jon Roelofs	c145a1ca25	AArch64::ArchKind's underlying type is uint64_t	2020-09-17 12:13:57 -07:00
Andrew Litteken	7e4c6fb854	[IRSim] Adding IR Instruction Mapper This introduces the IRInstructionMapper, and the associated wrapper for instructions, IRInstructionData, that maps IR level Instructions to unsigned integers. Mapping is done mainly by using the "isSameOperationAs" comparison between two instructions. If they return true, the opcode, result type, and operand types of the instruction are used to hash the instruction with an unsigned integer. The mapper accepts instruction ranges, and adds each resulting integer to a list, and each wrapped instruction to a separate list. At present, branches, phi nodes are not mapping and exception handling is illegal. Debug instructions are not considered. The different mapping schemes are tested in unittests/Analysis/IRSimilarityIdentifierTest.cpp Recommit of: `b04c1a9d31` Differential Revision: https://reviews.llvm.org/D86968	2020-09-17 14:06:16 -05:00
Cameron McInally	a35c7f3076	[SVE][WIP] Implement lowering for fixed length VSELECT to Scalable Map fixed length VSELECT to its Scalable equivalent. Differential Revision: https://reviews.llvm.org/D85364	2020-09-17 14:02:57 -05:00
Amara Emerson	7d5b103483	[AArch64][GlobalISel] Widen G_EXTRACT_VECTOR_ELT element types if < 8b. In order to not unnecessarily promote the source vector to greater than our native vector size of 128b, I've added some cascading rules to widen based on the number of elements.	2020-09-17 11:50:33 -07:00
Amara Emerson	bea7749d03	[AArch64][GlobalISel] Make <8 x s16> and <16 x s8> legal for shifts.	2020-09-17 11:50:32 -07:00
Sanjay Patel	48a23bccf3	[VectorCombine] limit load+insert transform to one-use As discussed in: https://llvm.org/PR47558 ...there are several potential fixes/follow-ups visible in the test case, but this is the quickest and safest fix of the perf regression.	2020-09-17 14:29:15 -04:00
Craig Topper	3783d3bc7b	[X86] Don't match x87 register inline asm constraints unless the VT is floating point or its a clobber The register class picked will be the RFP80 register class which has a f80 VT. The code in SelectionDAGBuilder that generates copies around inline assembly doesn't know how to handle an integer and floating point type of different bit widths. The test case is derived from this https://godbolt.org/z/sEa659 which gcc accepts but clang crashes on. This patch just gives a more graceful error. I'm not sure if the single element struct case is special in gcc. Adding another field to the struct makes gcc reject it. If we want to support this correctly I think we need a change in the frontend to give us the true element type. Right now the frontend just realizes the constraint can take a memory argument so creates an integer type of the same size and bitcasts. Differential Revision: https://reviews.llvm.org/D87485	2020-09-17 11:26:50 -07:00
Sanjay Patel	ddd9575d15	[VectorCombine] rearrange bailouts for load insert for efficiency; NFC	2020-09-17 13:50:37 -04:00
Bogdan Graur	7d593d0d69	[amdgpu] Compilation fix for Release Reviewed By: bkramer Differential Revision: https://reviews.llvm.org/D87838	2020-09-17 18:04:53 +02:00
David Green	7f7993e0da	[ARM] Expand distributing increments to also handle existing pre/post inc instructions. This extends the distributing postinc code in load/store optimizer to also handle the case where there is an existing pre/post inc instruction, where subsequent instructions can be modified to use the adjusted offset from the increment. This can save us having to keep the old register live past the increment instruction. Differential Revision: https://reviews.llvm.org/D83377	2020-09-17 16:58:35 +01:00
Amara Emerson	79b21fc187	[AArch64][GlobalISel] Fix bug in fewVectorElts action while legalizing oversize G_FPTRUNC vectors. For <8 x s32> = fptrunc <8 x s64> the fewerElementsVector action tries to break down the source vector into the final source vectors of <2 x s64> using unmerge. This fixes a crash due to using the wrong number of elements for the breakdown type. Also add some legalizer tests for explicitly G_FPTRUNC which we didn't have. Differential Revision: https://reviews.llvm.org/D87814	2020-09-17 08:56:26 -07:00
Simon Pilgrim	2a56a0ba08	ModuloSchedule.cpp - remove unnecessary includes. NFCI. Already included in ModuloSchedule.h	2020-09-17 16:47:48 +01:00
Sanne Wouda	d5fd3d9b90	[AArch64] Match pairwise add/fadd pattern D75689 turns the faddp pattern into a shuffle with vector add. Match this new pattern in target-specific DAG combine, rather than ISel, because legalization (for v2f32) turns it into a bit of a mess. - extended to cover f16, f32, f64 and i64	2020-09-17 16:27:01 +01:00
Xun Li	5b533d6cde	[Coroutine] Fix a bug where Coroutine incorrectly spills phi and invoke defs before CoroBegin When a spill definition is before CoroBegin, we cannot spill it to the frame immediately after the definition. We have to spill it after the frame is ready. The current implementation handles it properly for any other kinds of instructions except for PhINode and InvokeInst, which could also be defined before CoroBegin. This patch fixes it by moving the CoroBegin dominance check earlier, so that it covers all cases. Added a test. Differential Revision: https://reviews.llvm.org/D87810	2020-09-17 08:13:07 -07:00
Michael Liao	c3492a1aa1	[amdgpu] Lower SGPR-to-VGPR copy in the final phase of ISel. - Need to lower COPY from SGPR to VGPR to a real instruction as the standard COPY is used where the source and destination are from the same register bank so that we potentially coalesc them together and save one COPY. Considering that, backend optimizations, such as CSE, won't handle them. However, the copy from SGPR to VGPR always needs materializing to a native instruction, it should be lowered into a real one before other backend optimizations. Differential Revision: https://reviews.llvm.org/D87556	2020-09-17 11:04:17 -04:00
David Green	34b27b9441	[ARM] Sink splats to MVE intrinsics The predicated MVE intrinsics are generated as, for example, llvm.arm.mve.add.predicated(x, splat(y). p). We need to sink the splat value back into the loop, like we do for other instructions, so we can re-select qr variants. Differential Revision: https://reviews.llvm.org/D87693	2020-09-17 16:00:51 +01:00
alex-t	0efbb70b71	[AMDGPU] should expand ROTL i16 to shifts. Instruction combining pass turns library rotl implementation to llvm.fshl.i16. In the selection dag the intrinsic is turned to ISD::ROTL node that cannot be selected. Need to expand it to shifts again. Reviewed By: rampitec, arsenm Differential Revision: https://reviews.llvm.org/D87618	2020-09-17 17:34:33 +03:00
Simon Pilgrim	85ba2f1663	LiveDebugVariables.cpp - remove unnecessary Compiler.h include. NFCI. Already included in LiveDebugVariables.h	2020-09-17 15:06:02 +01:00
Simon Pilgrim	46e59062a0	DwarfExpression.cpp - remove unnecessary includes. NFCI. Already included in DwarfExpression.h	2020-09-17 15:06:02 +01:00
Simon Pilgrim	d566771779	ValueList.cpp - remove unnecessary includes. NFCI. Already included in ValueList.h	2020-09-17 15:06:01 +01:00
Simon Pilgrim	67ae46c820	SafeStackLayout.cpp - remove unnecessary StackLifetime.h include. NFCI. Already included in SafeStackLayout.h	2020-09-17 14:56:46 +01:00
Simon Pilgrim	f026812110	InstCombiner.h - remove unnecessary KnownBits.h include. NFCI. Move the include down to cpp files with an implicit dependency.	2020-09-17 14:28:42 +01:00
Kerry McLaughlin	f7185b271f	[SVE][CodeGen] Lower floating point -> integer conversions This patch adds new ISD nodes, FCVTZS_MERGE_PASSTHRU & FCVTZU_MERGE_PASSTHRU, which are used to lower scalable vector FP_TO_SINT/FP_TO_UINT operations and the following intrinsics: - llvm.aarch64.sve.fcvtzu - llvm.aarch64.sve.fcvtzs Reviewed By: efriedma, paulwalker-arm Differential Revision: https://reviews.llvm.org/D87232	2020-09-17 14:04:22 +01:00
Sanjay Patel	03783f19dc	[SLP] sort candidates to increase chance of optimal compare reduction This is one (small) part of improving PR41312: https://llvm.org/PR41312 As shown there and in the smaller tests here, if we have some member of the reduction values that does not match the others, we want to push it to the end (bring the matching members forward and together). In the regression tests, we have 5 candidates for the 4 slots of the reduction. If the one "wrong" compare is grouped with the others, it prevents forming the ideal v4i1 compare reduction. Differential Revision: https://reviews.llvm.org/D87772	2020-09-17 08:49:27 -04:00
Mikael Holmen	bb037c2a76	[ConstraintSystem] Remove local variable that is set but not read [NFC] gcc 7.4 warns about it.	2020-09-17 14:26:48 +02:00
Simon Pilgrim	abe0d8551d	MetadataLoader.cpp - remove unnecessary StringRef include. NFCI. Already included in MetadataLoader.h	2020-09-17 13:18:54 +01:00

1 2 3 4 5 ...

139349 Commits