llvm-project

Commit Graph

Author	SHA1	Message	Date
Benjamin Kramer	ce4acb01b3	Avoid unused variable warning in Release builds	2021-04-06 16:25:19 +02:00
Jay Foad	efc7bf27f5	[AMDGPU] SIFoldOperands: use MachineRegisterInfo::hasOneNonDBGUser NFC.	2021-04-06 15:23:58 +01:00
Jay Foad	005dcd196e	[AMDGPU] SIFoldOperands: use range-based loops and make_early_inc_range NFC.	2021-04-06 15:23:58 +01:00
Jay Foad	ce9cca6c3a	[AMDGPU] SIFoldOperands: rename tryFoldInst to tryFoldCndMask This follows the pattern of the other tryFold* functions. NFC.	2021-04-06 15:23:58 +01:00
Jay Foad	cf4f5292f6	[AMDGPU] SIFoldOperands: use getVRegDef instead of getUniqueVRegDef We are in SSA so getVRegDef is equivalent but simpler. NFC.	2021-04-06 15:23:58 +01:00
Jay Foad	e9608a84d8	[AMDGPU][SDag] Add IMG init also for image_gather4 instructions This fixes an oversight in D99747 which moved the IMG init code from SIAddIMGInit to AdjustInstrPostInstrSelection, but did not set the hasPostISelHook flag on gather4 instructions. Differential Revision: https://reviews.llvm.org/D99953	2021-04-06 14:47:20 +01:00
Kerry McLaughlin	7344f3d39a	[LoopVectorize] Add strict in-order reduction support for fixed-width vectorization Previously we could only vectorize FP reductions if fast math was enabled, as this allows us to reorder FP operations. However, it may still be beneficial to vectorize the loop by moving the reduction inside the vectorized loop and making sure that the scalar reduction value be an input to the horizontal reduction, e.g: %phi = phi float [ 0.0, %entry ], [ %reduction, %vector_body ] %load = load <8 x float> %reduction = call float @llvm.vector.reduce.fadd.v8f32(float %phi, <8 x float> %load) This patch adds a new flag (IsOrdered) to RecurrenceDescriptor and makes use of the changes added by D75069 as much as possible, which already teaches the vectorizer about in-loop reductions. For now in-order reduction support is off by default and controlled with the `-enable-strict-reductions` flag. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D98435	2021-04-06 14:45:34 +01:00
Simon Pilgrim	1dcb5b5e89	[X86] Improve optimizeCompareInstr for signed comparisons after ANDN instructions Extend D94856 to handle 'andn' instructions as well	2021-04-06 14:16:16 +01:00
Roman Lebedev	31d219d299	[InstCombine] Fold `((X - Y) - Z)` to `X - (Y + Z)` (PR49858) https://alive2.llvm.org/ce/z/67w-wQ We prefer `add`s over `sub`, and this particular xform allows further folds to happen: Fixes https://bugs.llvm.org/show_bug.cgi?id=49858	2021-04-06 15:58:14 +03:00
Simon Pilgrim	b8aba76a4e	LoopFlatten - CanWidenIV - Fix uninitialized variable warnings and use for-range loop. NFCI. Fix static analysis uninitialized variable warnings, and use for-range loop iteration across WideIVs array.	2021-04-06 12:24:20 +01:00
Abhina Sreeskantharajan	82b3e28e83	[SystemZ][z/OS][Windows] Add new OF_TextWithCRLF flag and use this flag instead of OF_Text Problem: On SystemZ we need to open text files in text mode. On Windows, files opened in text mode add a CRLF '\r\n' which may not be desirable. Solution: This patch adds two new flags - OF_CRLF which indicates that CRLF translation is used. - OF_TextWithCRLF = OF_Text | OF_CRLF indicates that the file is text and uses CRLF translation. Developers should now use either the OF_Text or OF_TextWithCRLF for text files and OF_None for binary files. If the developer doesn't want carriage returns on Windows, they should use OF_Text, if they do want carriage returns on Windows, they should use OF_TextWithCRLF. So this is the behaviour per platform with my patch: z/OS: OF_None: open in binary mode OF_Text : open in text mode OF_TextWithCRLF: open in text mode Windows: OF_None: open file with no carriage return OF_Text: open file with no carriage return OF_TextWithCRLF: open file with carriage return The major change is in llvm/lib/Support/Windows/Path.inc to only set text mode if the OF_CRLF is set. ``` if (Flags & OF_CRLF) CrtOpenFlags |= _O_TEXT; ``` The following files are the ones that still use OF_Text which I left unchanged. I modified all these except raw_ostream.cpp in recent patches so I know these were previously in Binary mode on Windows. ./llvm/lib/Support/raw_ostream.cpp ./llvm/lib/TableGen/Main.cpp ./llvm/tools/dsymutil/DwarfLinkerForBinary.cpp ./llvm/unittests/Support/Path.cpp ./clang/lib/StaticAnalyzer/Core/HTMLDiagnostics.cpp ./clang/lib/Frontend/CompilerInstance.cpp ./clang/lib/Driver/Driver.cpp ./clang/lib/Driver/ToolChains/Clang.cpp Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D99426	2021-04-06 07:23:31 -04:00
Kerry McLaughlin	857b8a73da	[LoopVectorize] Change the identity element for FAdd Changes getRecurrenceIdentity to always return a neutral value of -0.0 for FAdd. Reviewed By: dmgreen, spatel Differential Revision: https://reviews.llvm.org/D98963	2021-04-06 12:13:43 +01:00
Florian Hahn	a6b06b785c	[VPlan] Print VPValue operands for VPWidenPHI if possible. For VPWidenPHIRecipes that model all incoming values as VPValue operands, print those operands instead of printing the original PHI. D99294 updates recipes of reduction PHIs to use the VPValue for the incoming value from the loop backedge, making use of this new printing.	2021-04-06 12:11:21 +01:00
Dmitry Preobrazhensky	3eadcb86ab	[AMDGPU][MC][GFX9] Corrected SMEM decoding Corrected SMEM decoding when IMM=0 and OFFSET>127 Fixed bug 49819 (https://bugs.llvm.org/show_bug.cgi?id=49819) Differential Revision: https://reviews.llvm.org/D99804	2021-04-06 14:10:46 +03:00
Simon Pilgrim	201877d572	[CostModel][X86] Improve accuracy of vXi8 multiply reduction costs After rG47321c311bdbe0145b9bf45d822185c37b19fa50 we promote vXi8 reductions to vXi16 to create a much faster PMULLW mul reduction, followed by a (free) truncation. This avoids the high cost of repeated vXi8 multiplications (which extend+multiply+truncate to/from vXi16 types....). Fixes the missing vXi8 mul reduction vectorization in PR42674 (Comment #20) 'mul16' test case.	2021-04-06 11:53:22 +01:00
madhur13490	167ea67d76	[IR] Ignore bitcasts of function pointers which are only used as callees in callbase instruction This patch enhances hasAddressTaken() to ignore bitcasts as a callee in callbase instruction. Such bitcast usage doesn't really take the address in a useful meaningful way. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D98884	2021-04-06 09:23:46 +00:00
Simon Pilgrim	ddbb58736a	[KnownBits] Rename KnownBits::computeForMul to KnownBits::mul. NFCI. As promised in D98866	2021-04-06 10:11:41 +01:00
Sjoerd Meijer	d5f1131c81	[AArch64] Default to zero-cycle-zeroing FP registers It is generally beneficial to prefer "movi d0, #0" over "fmov s0, wzr" as this is most efficient across all cores; it is recognised as a zeroing idiom. For newer cores, fmov instructions can also be eliminated early and there is no difference with movi, but some implementations lack this, so that is not true for other/older cores. Thus this standardises on using movi, as this should always give the same or better performance than the fmov with wzr. Differential Revision: https://reviews.llvm.org/D99586	2021-04-06 09:47:50 +01:00
Sjoerd Meijer	ef05b08c61	[AArch64] Use 64-bit movi for zeroing halfs/floats This was using the .2d variant which zeros 128 bits, but using the .2s variant that zeros 64 bits is faster on some cores. This is a prep step for D99586 to always use movi for zeroing floats. Differential Revision: https://reviews.llvm.org/D99710	2021-04-06 08:42:13 +01:00
Yevgeny Rouban	98742e42fc	[NewPM] Fix unused lambda capture build error Fixes commit 39e3e3aa51d: Redesign of PreserveCFG Checker	2021-04-06 13:14:16 +07:00
Yevgeny Rouban	39e3e3aa51	[NewPM] Redesign of PreserveCFG Checker The reason for the NewPM redesign is described in the commit cba3e783389a: [NewPM] Disable PreservedCFGChecker ... The checker introduces an internal custom CFG analysis that tracks the current up-to-date CFG snapshot. The analysis is invalidated along with any other CFG related analysis (the key is CFGAnalyses). If the CFG analysis is not invalidated at a functional pass exit then the checker asserts that the CFG snapshot taken from this analysis is equal to a snapshot of the current CFG. Along the way: - the function CFG::printDiff() is simplified by removing function name calculation. The name is printed by the caller; - fixed CFG invalidated condition (see CFG::invalidate()); - StandardInstrumentations::registerCallbacks() gets additional optional parameter of type FunctionAnalysisManager*, which is needed by the checker to get the custom CFG analysis; - several PM related tests updated to explicitly set -verify-cfg-preserved=1 as they need. This patch is safe to land as the CFGChecker is left switched off (the option -verify-cfg-preserved is false by default). It will be switched on by a separate patch to minimize possible reverts. Reviewed By: skatkov, kuhar Differential Revision: https://reviews.llvm.org/D91327	2021-04-06 12:35:49 +07:00
Serguei Katkov	0057ec8034	[Statepoint] Factor-out utility function to get non-foldable area of STATEPOINT like instructions. NFC Reviewers: reames, dantrushin Reviewed By: reames Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D99875	2021-04-06 11:44:37 +07:00
Craig Topper	cb1028a0b9	[RISCV] When custom iseling masked stores, copy the mask into V0 instead of virtual register. I missed a few intrinsics in `3dd4aa7d09` when I did this for masked loads and masked segment loads/stores. Found while trying to share more code between these custom isel functions.	2021-04-05 21:28:32 -07:00
Philip Reames	58ccbd0d08	Comment adjustments for a rename	2021-04-05 21:07:42 -07:00
Arthur Eubanks	ea0e2ca1ac	[SROA] Allow SROA on pointers with invariant group intrinsic uses When we are able to SROA an alloca, we know all uses of it, meaning we don't have to preserve the invariant group intrinsics and metadata. It's possible that we could lose information regarding redundant loads/stores, but that's unlikely to have any real impact since right now the only user is Clang and vtables. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D99760	2021-04-05 19:53:40 -07:00
Philip Reames	13deb6aac7	Exact ashr/lshr don't lose any set bits and are thus trivially invertible Use that fact to improve isKnownNonEqual.	2021-04-05 19:22:36 -07:00
Philip Reames	dc8d864e3a	Address minor post commit feedback on 0e59dd	2021-04-05 18:22:17 -07:00
Stanislav Mekhanoshin	30b3aab329	Copy syncscope when expanding atomicrmw into cmpxchg loop Fixes: SWDEV-280070 Differential Revision: https://reviews.llvm.org/D99902	2021-04-05 17:29:38 -07:00
Sanjay Patel	e2a0f512ea	[InstSimplify] fix potential miscompile in select value equivalence This is the sibling fix to `c590a9880d` - as there, we can't substitute a vector value because the equality compare replacement that we are trying requires that the comparison is true for the entire value. Vector select can be partly true/false.	2021-04-05 16:52:34 -04:00
Craig Topper	780a47285a	[RISCV] Add SDTCisInt to the SDTRVVSlide1 since it is only used for vslide1up.vx/vslide1down.vx. The scalar type is already marked as XLenVT. The floating point version would need a different rule.	2021-04-05 13:03:39 -07:00
Craig Topper	af2837675a	[RISCV] Split RISCVISD::VMV_S_XF_VL into separate integer and FP. It's a bit silly, but it allows us to write stricter type constraints for isel. There's still some extra type checks in the generated table due to some type interference limitations around HWMode.	2021-04-05 12:57:35 -07:00
Philip Reames	b0e59dd6e1	Extract a helper for figuring out if an operator is invertible [nfc] For use in an upcoming patch. Left out the phi case (which could otherwise fit in this framework) as it would cause infinite recursion in said patch. We can probably also leverage this in instcombine to ensure we keep the two sets of related analysis and transforms in sync.	2021-04-05 12:14:21 -07:00
Craig Topper	7edda698c0	[RISCV] Move VSLIDE1UP_VX pattern out of a loop that includes FP types. FP would need VFSLIDE1UP_VF which uses an FP register.	2021-04-05 12:05:54 -07:00
Ricky Taylor	4db18d62af	[M68k] Add support for Motorola literal syntax to AsmParser These look like $00A0cf for hex and %001010101 for binary. They are used in Motorola assembly syntax. Differential Revision: https://reviews.llvm.org/D98519	2021-04-05 20:02:29 +01:00
Tom Stellard	982396ddd7	Revert "Fix build rules for LLVM_WITH_Z3 after D95727" This reverts commit `d66f9c4f1e`. This was a follow up fix for `43ceb74eb1`, which will be reverted.	2021-04-05 10:46:19 -07:00
Cyndy Ishida	0116d04d04	[TextAPI] move source code files out of subdirectory, NFC TextAPI/ELF has moved out into InterfaceStubs, so there's no longer a need to separate out TextAPI between formats. Reviewed By: ributzka, int3, #lld-macho Differential Revision: https://reviews.llvm.org/D99811	2021-04-05 10:24:42 -07:00
Ta-Wei Tu	6a82ace5f2	[LoopFusion] Bails out if only the second candidate is guarded (PR48060) If only the second candidate loop is guarded while the first one is not, fusing the two loops might not be valid but this check is currently missing. Fixes https://bugs.llvm.org/show_bug.cgi?id=48060 Reviewed By: sidbav Differential Revision: https://reviews.llvm.org/D99716	2021-04-06 01:08:56 +08:00
Fraser Cormack	af3a839c70	[RISCV] Add support for bitcasts between scalars and fixed-length vectors This patch supports bitcasts from scalar types to fixed-length vectors and vice versa. It custom-lowers and custom-legalizes them to EXTRACT_VECTOR_ELT/INSERT_VECTOR_ELT operations, using single-element vectors to hold the scalar where appropriate. Previously, some of these would fail to select, others would be expanded through stack loads and stores. Effort was made to ensure the codegen avoids the stack for both legal and illegal scalar types. Some of the codegen could be improved, but on first glance it looks like a general optimization of EXTRACT_VECTOR_ELT when extracting an i64 element on RV32. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D99667	2021-04-05 17:21:55 +01:00
Sanjay Patel	c590a9880d	[InstCombine] fix potential miscompile in select value equivalence As shown in the example based on: https://llvm.org/PR49832 ...and the existing test, we can't substitute a vector value because the equality compare replacement that we are attempting requires that the comparison is true for the entire value. Vector select can be partly true/false.	2021-04-05 12:25:40 -04:00
John Paul Adrian Glaubitz	62a94b725c	[M68k] Mark public functions with the LLVM_EXTERNAL_VISIBILITY macro In `0dbcb36394`, most target symbols were made hidden by default with the public ones marked with LLVM_EXTERNAL_VISIBILITY. When the M68k target was added, this particular change was forgotten so that external tools cannot make use of the public M68k target functions in libLLVM.so. Thus, add the missing LLVM_EXTERNAL_VISIBILITY macro to all public target functions in the M68k backend. Differential Revision: https://reviews.llvm.org/D99869	2021-04-05 09:24:30 -07:00
Fraser Cormack	3f0df4d7b0	[RISCV] Expand scalable-vector truncstores and extloads Caught in internal testing, these operations are assumed legal by default, even for scalable vector types. Expand them back into separate truncations and stores, or loads and extensions. Also add explicit fixed-length vector tests for these operations, even though they should have been correct already. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D99654	2021-04-05 17:03:45 +01:00
Alexey Bataev	00a84f9a7f	[SLP]Improve vectorization of the CmpInst instructions. During vectorization it is better to postpone the vectorization of the CmpInst instructions until the end of the basic block. Otherwise we may vectorize it too early and may miss some vectorization patterns, like reductions. Reworked part of D57059 Differential Revision: https://reviews.llvm.org/D99796	2021-04-05 06:22:51 -07:00
Alex Orlov	5f57793c4f	* NFC. Refactored DIPrinter for better support of new print styles. This patch introduces a DIPrinter interface to implement by different output style printer implementations. DIPrinterGNU and DIPrinterLLVM implement the GNU and LLVM output style printing respectively. No functional changes. This refactoring clarifies and simplifies the code, and makes a new output style addition easier. Reviewed By: jhenderson, dblaikie Differential Revision: https://reviews.llvm.org/D98994	2021-04-05 15:40:41 +04:00
Simon Pilgrim	36d4f6d7f8	[X86] Fold xor(zext(xor(x,c1)),c2) -> xor(zext(x),xor(zext(c1),c2)) Fixes PR47603 (second case) by extending rG89afec348dbd3e5078f176e978971ee2d3b5dec8	2021-04-05 11:40:37 +01:00
Craig Topper	4708a05da0	[RISCV] Use gorciw for i32 orc.b intrinsic when Zbp is enabled. The W version of orc.b does not exist in Zbp so we need to use gorci encoding. If we have Zbp, we can use gorciw which can avoid a sext.w in some cases.	2021-04-04 17:14:28 -07:00
Roman Lebedev	2760a808b9	[InstCombine] dropRedundantMaskingOfLeftShiftInput(): check that adding shift amounts doesn't overflow (PR49778) This is identical to `781d077afb`, but for the other function. For certain shift amount bit widths, we must first ensure that adding shift amounts is safe, that the sum won't have an unsigned overflow. Fixes https://bugs.llvm.org/show_bug.cgi?id=49778	2021-04-04 23:26:41 +03:00
Roman Lebedev	dceb3e5996	[NFC][InstCombine] Extract canTryToConstantAddTwoShiftAmounts() as helper	2021-04-04 23:26:41 +03:00
Craig Topper	98d5db3e3a	[RISCV] Lower orc.b intrinsic to RISCVISD::GORCI. This will allow us to share any future known bits, demanded bits, or sign bits improvements.	2021-04-04 12:31:41 -07:00
Sanjay Patel	c0645f1324	[InstCombine] fold popcount of exactly one bit to shift This is discussed in https://llvm.org/PR48999 , but it does not solve that request. The difference in the vector test shows that some other logic transform is limited to scalar types.	2021-04-04 11:43:49 -04:00
Nikita Popov	9bad7de9a3	[SimplifyCFG] Handle two equal cases in switch to select When converting a switch with two cases and a default into a select, also handle the degenerate case where two cases have the same value. Generate this case directly as %or = or i1 %cmp1, %cmp2 %res = select i1 %or, i32 %val, i32 %default rather than %sel1 = select i1 %cmp1, i32 %val, i32 %default %res = select i1 %cmp2, i32 %val, i32 %sel1 as InstCombine is going to canonicalize to the former anyway.	2021-04-04 17:27:28 +02:00
Nikita Popov	72e0846ef8	[LVI] Don't bail on overdefined value in select Even if one of the operands is overdefined, we may still produce a non-overdefined result, e.g. due to a min/max operation. This matches our handling elsewhere, e.g. for binary operators. The slot poisoning comment refers to a much older LVI cache implementation.	2021-04-04 11:11:01 +02:00
Craig Topper	a2ea003fcb	[RISCV] Don't convert fshr/fshl to target specific FSL/FSR node if shift amount is a constant. As long as it's a constant we can directly pattern match it without any problems. It's only when it isn't a constant that we need to add an AND. In theory this should allow more target independent optimizations to remain active.	2021-04-03 23:13:30 -07:00
Juneyoung Lee	5207cde5cb	[InstCombine] Conditionally fold select i1 into and/or This patch fixes llvm.org/pr49688 by conditionally folding select i1 into and/or: ``` select cond, cond2, false -> and cond, cond2 ``` This is not safe if cond2 is poison whereas cond isn’t. Unconditionally disabling this transformation affects later pipelines that depend on and/or i1s. To minimize its impact, this patch conservatively checks whether cond2 is an instruction that creates a poison or its operand creates a poison. This approach is similar to what InstSimplify's SimplifyWithOpReplaced is doing. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D99674	2021-04-04 14:11:28 +09:00
Mircea Trofin	b32e76c6d5	[mlgo] fix build rules This was prompted by D95727, which had the side-effect to break the 'release' mode build bot for ML-driven policies. The problem is that now the pre-compiled object files don't get transitively carried through as 'source' anymore; that being said, the previous way of consuming them was problematic, because it was only working for static builds; in dynamic builds, the whole tf_xla_runtime was linked, which is undesirable. The alternative is to treat tf_xla_runtime as an archive, which then leads to the desired effect. Differential Revision: https://reviews.llvm.org/D99829	2021-04-03 12:49:03 -07:00
Roman Lebedev	7727cc242d	[NFC][X86] Split VPMOV* AVX2 instructions into their own sched class At least on all three Zen's, all such instructions cleanly map into this new class with no overrides needed.	2021-04-03 22:39:07 +03:00
Nikita Popov	665065821e	[FastISel] Remove kill tracking This is a followup to D98145: As far as I know, tracking of kill flags in FastISel is just a compile-time optimization. However, I'm not actually seeing any compile-time regression when removing the tracking. This probably used to be more important in the past, before FastRA was switched to allocate instructions in reverse order, which means that it discovers kills as a matter of course. As such, the kill tracking doesn't really seem to serve a purpose anymore, and just adds additional complexity and potential for errors. This patch removes it entirely. The primary changes are dropping the hasTrivialKill() method and removing the kill arguments from the emitFast methods. The rest is mechanical fixup. Differential Revision: https://reviews.llvm.org/D98294	2021-04-03 15:50:13 +02:00
Simon Pilgrim	89afec348d	[X86] Fold xor(truncate(xor(x,c1)),c2) -> xor(truncate(x),xor(truncate(c1),c2)) Fixes PR47603 This should probably be transferable to DAGCombine - the main limitation with the existing trunc(logicop) DAG fold is we don't know if legalization has tried to promote truncated logicops already. We might be able to peek through extensions as well.	2021-04-03 12:43:05 +01:00
Simon Pilgrim	7c17f1ea84	[X86][SSE] isHorizontalBinOp - use getTargetShuffleInputs helper (REAPPLIED) Use the getTargetShuffleInputs helper for all shuffle decoding Reapplied (after reversion in rGfa0aff6d6960) with fix+test for subvector splitting - we weren't accounting for peeking through bitcasts changing the vector element count of the shuffle sources.	2021-04-03 11:59:19 +01:00
Bjorn Pettersson	d66f9c4f1e	Fix build rules for LLVM_WITH_Z3 after D95727 Started to see build errors like this ../lib/Support/Z3Solver.cpp:19:10: fatal error: 'z3.h' file not found #include <z3.h> ^~~~~~ 1 error generated. after commit `43ceb74eb1`. The -isystem path to the Z3_INCLUDE_DIR went missing in the compile commands. No idea why target_include_directories stopped working with that commit, but using include_directories seems to work better.	2021-04-03 12:25:37 +02:00
Nikita Popov	b552e16b0b	[Loads] Forward constant vector store to load of first element InstCombine performs simple forwarding from stores to loads, but currently only handles the case where the load and store have the same size. This extends it to also handle a store of a constant with a larger size followed by a load with a smaller size. This is implemented through ConstantFoldLoadThroughBitcast() which is fairly primitive (e.g. does not allow storing a large integer and then loading a small one), but at least can forward the first element of a vector store. Unfortunately it seems that we currently don't have a generic helper for "read a constant value as a different type", it's all tangled up with other logic in either ConstantFolding or VNCoercion. Differential Revision: https://reviews.llvm.org/D98114	2021-04-03 12:10:31 +02:00
Nikita Popov	9d20eaf9c0	[BasicAA] Don't store AATags in cache key (NFC) The AAMDNodes part of the MemoryLocation is not used by the BasicAA cache, so don't store it. This reduces the size of each cache entry from 112 bytes to 48 bytes.	2021-04-03 11:32:01 +02:00
Nikita Popov	17b4e5d456	[BasicAA] Don't pass through AA metadata (NFCI) BasicAA itself doesn't make use of AA metadata, but passes it through to recursive queries and makes it part of the cache key. Aliasing decisions that are based on AA metadata (i.e. TBAA and ScopedAA) are based only on AA metadata, so checking them with different pointer values or sizes is not useful, the result will always be the same. While this change is a mild compile-time improvement by itself, the actual goal here is to reduce the size of AA cache keys in a followup change. Differential Revision: https://reviews.llvm.org/D90098	2021-04-03 11:21:50 +02:00
Simon Pilgrim	4ea5475a3f	[KnownBits] Add KnownBits::haveNoCommonBitsSet helper. NFCI. Include exhaustive test coverage.	2021-04-02 21:44:33 +01:00
Eric Astor	0499a9d688	[ms] [llvm-ml] Accept /WX to signal that warnings should be fatal. Define -fatal-warnings to make warnings fatal, and accept /WX as an ML.EXE compatible alias for it. Also make sure that if Warning() returns true, we always treat it as an error. Reviewed By: thakis Differential Revision: https://reviews.llvm.org/D92504	2021-04-02 15:13:20 -04:00
Levy Hsu	f78d932cf2	[RISCV] Add IR intrinsics for Zbc extension Header files are included in a separate patch in case the name needs to be changed. RV32 / 64: clmul clmulh clmulr Differential Revision: https://reviews.llvm.org/D99711	2021-04-02 12:09:13 -07:00
Levy Hsu	944adbf285	Recommit "[RISCV] Add IR intrinsic for Zbb extension" Forgot to amend the Author. Original commit message: Header files are included in a separate patch in case the name needs to be changed. RV32 / 64: orc.b Differential Revision: https://reviews.llvm.org/D99320	2021-04-02 11:50:19 -07:00
Craig Topper	1f0b309f24	Revert "[RISCV] Add IR intrinsic for Zbb extension" This reverts commit `1808194590`. I forgot to change the author.	2021-04-02 11:47:02 -07:00
Cyndy Ishida	3a223cd4f3	[TextAPI] run clang-format on violating sections, NFC	2021-04-02 11:44:33 -07:00
Craig Topper	1808194590	[RISCV] Add IR intrinsic for Zbb extension Header files are included in a separate patch in case the name needs to be changed. RV32 / 64: orc.b	2021-04-02 11:23:57 -07:00
Fangrui Song	8e5f3d04f2	[SLPVectorizer] Fix divide-by-zero after D99719 Will add a test case later.	2021-04-02 11:13:51 -07:00
Eric Astor	15ec0ad77a	[ms] [llvm-ml] Fix case-sensitivity for variables and textmacros Make variables and text-macro references case-insensitive, to match ml.exe. Also improve error handling for text-macro expansion. Reviewed By: thakis Differential Revision: https://reviews.llvm.org/D92503	2021-04-02 14:08:02 -04:00
Levy Hsu	b001d574d7	[RISCV] Add IR intrinsic for Zbr extension Implementation for RISC-V Zbr extension intrinsic. Header files are included in separate patch in case the name needs to be changed RV32 / 64: crc32b crc32h crc32w crc32cb crc32ch crc32cw RV64 Only: crc32d crc32cd Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D99009	2021-04-02 10:58:45 -07:00
Craig Topper	d7ffa82a8e	[RISCV] Improve 64-bit integer constant materialization for more cases. For positive constants we try shifting left to remove leading zeros and fill the bottom bits with 1s. We then materialize that constant and shift it right. This patch adds a new strategy to try filling the bottom bits with zeros instead. This catches some additional cases.	2021-04-02 10:18:08 -07:00
Sanjay Patel	412fc74140	[InstCombine] fold not+or+neg ~((-X) | Y) --> (X - 1) & (~Y) We generally prefer 'add' over 'sub', this reduces the dependency chain, and this looks better for codegen on x86, ARM, and AArch64 targets. https://llvm.org/PR45755 https://alive2.llvm.org/ce/z/cxZDSp	2021-04-02 13:16:36 -04:00
Dimitry Andric	6abb92f210	[SCCP] Avoid modifying AdditionalUsers while iterating over it When run under valgrind, or with a malloc that poisons freed memory, this can lead to segfaults or other problems. To avoid modifying the AdditionalUsers DenseMap while still iterating, save the instructions to be notified in a separate SmallPtrSet, and use this to later call OperandChangedState on each instruction. Fixes PR49582. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D98602	2021-04-02 19:05:59 +02:00
Florian Hahn	8867fc69f0	[LV] Hoist mapping of IR operands to VPValues (NFC). This patch moves mapping of IR operands to VPValues out of tryToCreateWidenRecipe. This allows using existing VPValue operands when widening recipes directly, which will be introduced in future patches.	2021-04-02 17:57:20 +01:00
Philip Reames	2c4548e18e	[rs4gc] Use loops instead of straightline code for attribute stripping [nfc] Mostly because I'm about to add more attributes and the straightline copies get much uglier. What's currently there isn't too bad.	2021-04-02 09:25:15 -07:00
Philip Reames	a505801e2b	[rs4gc] Strip nofree and nosync attributes when lowering from abstract model The safepoints being inserted exist to free memory, or coordinate with another thread to do so. Thus, we must strip any inferred attributes and reinfer them after the lowering. I'm not aware of any active miscompiles caused by this, but since I'm working on strengthening inference of both and leveraging them in the optimization decisions, I figured a bit of future proofing was warranted.	2021-04-02 09:12:24 -07:00
Brendon Cahoon	09a88278cb	[GlobalISel] Allow different types for G_SBFX and G_UBFX operands Change the definition of G_SBFX and G_UBFX so that the lsb and width can have different types than the src and dst operands. Differential Revision: https://reviews.llvm.org/D99739	2021-04-02 11:11:06 -04:00
Nikita Popov	4a3e006830	[LVI] Use range metadata on intrinsics If we don't know how to handle an intrinsic, we should still make use of normal call range metadata.	2021-04-02 16:45:31 +02:00
Alexey Bataev	5fcb07a070	[SLP]Fix a bug in min/max reduction, number of condition uses. The ultimate reduction node may have multiple uses, but if the ultimate reduction is min/max reduction and based on SelectInstruction, the condition of this select instruction must have only a single use. Differential Revision: https://reviews.llvm.org/D99753	2021-04-02 07:09:44 -07:00
Nico Weber	fa0aff6d69	Revert "[X86][SSE] isHorizontalBinOp - use getTargetShuffleInputs helper" This reverts commit `500969f1d0`. Makes clang assert compiling avx2 code, see https://bugs.chromium.org/p/chromium/issues/detail?id=1195353#c4 for a standalone repro.	2021-04-02 09:55:55 -04:00
Jun Ma	274ac9d40e	[AArch64][SVE] Lowering sve.dot to DOT node Differential Revision: https://reviews.llvm.org/D99699	2021-04-02 20:05:17 +08:00
Jun Ma	ab3c5fb282	[NFC][SVE] Use SVE_4_Op_Imm_Pat for sve_intx_dot_by_indexed_elem	2021-04-02 20:05:17 +08:00
Jeroen Dobbelaere	b82b305cf9	[InstCombine] Fix out-of-bounds ashr(shl) optimization This fixes a crash found by the oss fuzzer and reported by @fhahn. The suggestion of @RKSimon seems to be the correct fix here. (See D91343). The oss fuzz report can be found here: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=32759 Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D99792	2021-04-02 13:45:11 +02:00
Simon Pilgrim	500969f1d0	[X86][SSE] isHorizontalBinOp - use getTargetShuffleInputs helper Use the getTargetShuffleInputs helper for all shuffle decoding	2021-04-02 11:50:18 +01:00
Sander de Smalen	0f7bbbc481	Always emit error for wrong interfaces to scalable vectors, unless cmdline flag is passed. In order to bring up scalable vector support in LLVM incrementally, we introduced behaviour to emit a warning, instead of an error, when asking the wrong question of a scalable vector, like asking for the fixed number of elements. This patch puts that behaviour under a flag. The default behaviour is that the compiler will always error, which means that all LLVM unit tests and regression tests will now fail when a code-path is taken that still uses the wrong interface. The behaviour to demote an error to a warning can be individually enabled for tools that want to support experimental use of scalable vectors. This patch enables that behaviour when driving compilation from Clang. This means that for users who want to try out scalable-vector support, fixed-width codegen support, or build user-code with scalable vector intrinsics, Clang will not crash and burn when the compiler encounters such a case. This allows us to do away with the following pattern in many of the SVE tests: RUN: .... 2>%t RUN: cat %t \| FileCheck --check-prefix=WARN WARN-NOT: warning: ... The behaviour to emit warnings is only temporary and we expect this flag to be removed in the future when scalable vector support is more stable. This patch also has fixes the following tests: unittests: ScalableVectorMVTsTest.SizeQueries SelectionDAGAddressAnalysisTest.unknownSizeFrameObjects AArch64SelectionDAGTest.computeKnownBitsSVE_ZERO_EXTEND_VECTOR_INREG regression tests: Transforms/InstCombine/vscale_gep.ll Reviewed By: paulwalker-arm, ctetreau Differential Revision: https://reviews.llvm.org/D98856	2021-04-02 10:55:22 +01:00
Florian Hahn	0f3230390b	[SLP] Better estimate cost of no-op extracts on target vectors. The motivation for this patch is to better estimate the cost of extractelement instructions in cases where they are going to be free, because the source vector can be used directly. A simple example is %v1.lane.0 = extractelement <2 x double> %v.1, i32 0 %v1.lane.1 = extractelement <2 x double> %v.1, i32 1 %a.lane.0 = fmul double %v1.lane.0, %x %a.lane.1 = fmul double %v1.lane.1, %y Currently we only consider the extracts free if there are no other users. In this particular case, on AArch64 which can fit <2 x double> in a vector register, the extracts should be free, independently of other users, because the source vector of the extracts will be in a vector register directly, so it should be free to use the vector directly. The SLP vectorized version of noop_extracts_9_lanes is 30%-50% faster on certain AArch64 CPUs. It looks like this does not impact any code in SPEC2000/SPEC2006/MultiSource both on X86 and AArch64 with -O3 -flto. This originally regressed after D80773, so if there's a better alternative to explore, I'd be more than happy to do that. Reviewed By: ABataev Differential Revision: https://reviews.llvm.org/D99719	2021-04-02 10:40:12 +01:00
Fraser Cormack	3b48d849d4	[RISCV] Optimize more redundant VSETVLIs D99717 introduced some test cases which showed that the output of one vsetvli into another would not be picked up by the RISCVCleanupVSETVLI pass. This patch teaches the optimization about such a pattern. The pattern is quite common when using the RVV vsetvli intrinsic to pass the VL onto other intrinsics. The second test case introduced by D99717 is left unoptimized by this patch. It is a rarer case and will require us to rewire any uses of the redundant vset[i]vli's output to the previous one's. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D99730	2021-04-02 10:04:07 +01:00
Evgeniy Brevnov	2388aae401	[NARY-REASSOCIATE] Support reassociation of min/max Support reassociation for min/max. With that we should be able to transform min(min(a, b), c) -> min(min(a, c), b) if min(a, c) is already available. Reviewed By: mkazantsev, lebedev.ri Differential Revision: https://reviews.llvm.org/D88287	2021-04-02 15:30:13 +07:00
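The rewrite described in 2388aae401 depends on min being associative and commutative. The following is a minimal sketch of that identity in plain C++, not the pass itself; the helper names `minOriginal` and `minReassociated` are hypothetical, and `minAC` stands in for a `min(a, c)` value that is assumed to be already available elsewhere.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Original association: min(min(a, b), c).
constexpr int32_t minOriginal(int32_t a, int32_t b, int32_t c) {
  return std::min(std::min(a, b), c);
}

// Reassociated form: min(min(a, c), b). The caller passes in `minAC`,
// modelling a min(a, c) that has already been computed, so only one
// further min is needed.
constexpr int32_t minReassociated(int32_t minAC, int32_t b) {
  return std::min(minAC, b);
}

// Compile-time spot check of the identity for one triple.
static_assert(minOriginal(3, 7, 5) == minReassociated(std::min(3, 5), 7),
              "both associations must agree");
```

When `min(a, c)` already exists, the reassociated form needs only one new min operation instead of two.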
Roman Lebedev	a26f1bf67e	[PassManager] Run additional LICM before LoopRotate

Loop rotation often has to perform code duplication from header into preheader, which introduces PHI nodes.

>>! In D99204, @thopre wrote:
> With loop peeling, it is important that unnecessary PHIs be avoided or
> it will lead to spurious peeling. One source of such PHIs is loop
> rotation which creates PHIs for invariant loads. Those PHIs are
> particularly problematic since loop peeling is now run as part of simple
> loop unrolling before GVN is run, and are thus a source of spurious
> peeling.
>
> Note that while some of the loads can be hoisted and eventually
> eliminated by instruction combine, this is not always possible due to
> alignment issues. In particular, the motivating example [1] was a load
> inside a class instance which cannot be hoisted because the `this'
> pointer has an alignment of 1.
>
> [1] http://lists.llvm.org/pipermail/llvm-dev/attachments/20210312/4ce73c47/attachment.cpp

Now, we could enhance LoopRotate to avoid duplicating code when not needed, but instead hoist loop-invariant code, but isn't that a code duplication? (sic) We have LICM, and in fact we already run it right after LoopRotation.

We could try to move it to before LoopRotation, that is basically free from compile-time perspective: https://llvm-compile-time-tracker.com/compare.php?from=6c93eb4477d88af046b915bc955c03693b2cbb58&to=a4bee6d07732b1184c436da489040b912f0dc271&stat=instructions

But, looking at stats, I think it isn't great that we would no longer do LICM after LoopRotation, in particular:

| statistic name | LoopRotate-LICM | LICM-LoopRotate | Δ | % | abs(%) |
| asm-printer.EmittedInsts | 9015930 | 9015799 | -131 | 0.00% | 0.00% |
| indvars.NumElimCmp | 3536 | 3544 | 8 | 0.23% | 0.23% |
| indvars.NumElimExt | 36725 | 36580 | -145 | -0.39% | 0.39% |
| indvars.NumElimIV | 1197 | 1187 | -10 | -0.84% | 0.84% |
| indvars.NumElimIdentity | 143 | 136 | -7 | -4.90% | 4.90% |
| indvars.NumElimRem | 4 | 5 | 1 | 25.00% | 25.00% |
| indvars.NumLFTR | 29842 | 29890 | 48 | 0.16% | 0.16% |
| indvars.NumReplaced | 2293 | 2227 | -66 | -2.88% | 2.88% |
| indvars.NumSimplifiedSDiv | 6 | 8 | 2 | 33.33% | 33.33% |
| indvars.NumWidened | 26438 | 26329 | -109 | -0.41% | 0.41% |
| instcount.TotalBlocks | 1178338 | 1173840 | -4498 | -0.38% | 0.38% |
| instcount.TotalFuncs | 111825 | 111829 | 4 | 0.00% | 0.00% |
| instcount.TotalInsts | 9905442 | 9896139 | -9303 | -0.09% | 0.09% |
| lcssa.NumLCSSA | 425871 | 423961 | -1910 | -0.45% | 0.45% |
| licm.NumHoisted | 378357 | 378753 | 396 | 0.10% | 0.10% |
| licm.NumMovedCalls | 2193 | 2208 | 15 | 0.68% | 0.68% |
| licm.NumMovedLoads | 35899 | 31821 | -4078 | -11.36% | 11.36% |
| licm.NumPromoted | 11178 | 11154 | -24 | -0.21% | 0.21% |
| licm.NumSunk | 13359 | 13587 | 228 | 1.71% | 1.71% |
| loop-delete.NumDeleted | 8547 | 8402 | -145 | -1.70% | 1.70% |
| loop-instsimplify.NumSimplified | 12876 | 11890 | -986 | -7.66% | 7.66% |
| loop-peel.NumPeeled | 1008 | 925 | -83 | -8.23% | 8.23% |
| loop-rotate.NumNotRotatedDueToHeaderSize | 368 | 365 | -3 | -0.82% | 0.82% |
| loop-rotate.NumRotated | 42015 | 42003 | -12 | -0.03% | 0.03% |
| loop-simplifycfg.NumLoopBlocksDeleted | 240 | 242 | 2 | 0.83% | 0.83% |
| loop-simplifycfg.NumLoopExitsDeleted | 497 | 20 | -477 | -95.98% | 95.98% |
| loop-simplifycfg.NumTerminatorsFolded | 618 | 336 | -282 | -45.63% | 45.63% |
| loop-unroll.NumCompletelyUnrolled | 11028 | 11032 | 4 | 0.04% | 0.04% |
| loop-unroll.NumUnrolled | 12608 | 12529 | -79 | -0.63% | 0.63% |
| mem2reg.NumDeadAlloca | 10222 | 10221 | -1 | -0.01% | 0.01% |
| mem2reg.NumPHIInsert | 192110 | 192106 | -4 | 0.00% | 0.00% |
| mem2reg.NumSingleStore | 637650 | 637643 | -7 | 0.00% | 0.00% |
| scalar-evolution.NumBruteForceTripCountsComputed | 814 | 812 | -2 | -0.25% | 0.25% |
| scalar-evolution.NumTripCountsComputed | 283108 | 282934 | -174 | -0.06% | 0.06% |
| scalar-evolution.NumTripCountsNotComputed | 106712 | 106718 | 6 | 0.01% | 0.01% |
| simple-loop-unswitch.NumBranches | 5178 | 4752 | -426 | -8.23% | 8.23% |
| simple-loop-unswitch.NumCostMultiplierSkipped | 914 | 503 | -411 | -44.97% | 44.97% |
| simple-loop-unswitch.NumSwitches | 20 | 18 | -2 | -10.00% | 10.00% |
| simple-loop-unswitch.NumTrivial | 183 | 95 | -88 | -48.09% | 48.09% |

... but that actually regresses LICM (-12% `licm.NumMovedLoads`), loop-simplifycfg (`NumLoopExitsDeleted`, `NumTerminatorsFolded`), simple-loop-unswitch (`NumTrivial`).

What if we instead have LICM both before and after LoopRotate?

| statistic name | LoopRotate-LICM | LICM-LoopRotate-LICM | Δ | % | abs(%) |
| asm-printer.EmittedInsts | 9015930 | 9014474 | -1456 | -0.02% | 0.02% |
| indvars.NumElimCmp | 3536 | 3546 | 10 | 0.28% | 0.28% |
| indvars.NumElimExt | 36725 | 36681 | -44 | -0.12% | 0.12% |
| indvars.NumElimIV | 1197 | 1185 | -12 | -1.00% | 1.00% |
| indvars.NumElimIdentity | 143 | 146 | 3 | 2.10% | 2.10% |
| indvars.NumElimRem | 4 | 5 | 1 | 25.00% | 25.00% |
| indvars.NumLFTR | 29842 | 29899 | 57 | 0.19% | 0.19% |
| indvars.NumReplaced | 2293 | 2299 | 6 | 0.26% | 0.26% |
| indvars.NumSimplifiedSDiv | 6 | 8 | 2 | 33.33% | 33.33% |
| indvars.NumWidened | 26438 | 26404 | -34 | -0.13% | 0.13% |
| instcount.TotalBlocks | 1178338 | 1173652 | -4686 | -0.40% | 0.40% |
| instcount.TotalFuncs | 111825 | 111829 | 4 | 0.00% | 0.00% |
| instcount.TotalInsts | 9905442 | 9895452 | -9990 | -0.10% | 0.10% |
| lcssa.NumLCSSA | 425871 | 425373 | -498 | -0.12% | 0.12% |
| licm.NumHoisted | 378357 | 383352 | 4995 | 1.32% | 1.32% |
| licm.NumMovedCalls | 2193 | 2204 | 11 | 0.50% | 0.50% |
| licm.NumMovedLoads | 35899 | 35755 | -144 | -0.40% | 0.40% |
| licm.NumPromoted | 11178 | 11163 | -15 | -0.13% | 0.13% |
| licm.NumSunk | 13359 | 14321 | 962 | 7.20% | 7.20% |
| loop-delete.NumDeleted | 8547 | 8538 | -9 | -0.11% | 0.11% |
| loop-instsimplify.NumSimplified | 12876 | 12041 | -835 | -6.48% | 6.48% |
| loop-peel.NumPeeled | 1008 | 924 | -84 | -8.33% | 8.33% |
| loop-rotate.NumNotRotatedDueToHeaderSize | 368 | 365 | -3 | -0.82% | 0.82% |
| loop-rotate.NumRotated | 42015 | 42005 | -10 | -0.02% | 0.02% |
| loop-simplifycfg.NumLoopBlocksDeleted | 240 | 241 | 1 | 0.42% | 0.42% |
| loop-simplifycfg.NumTerminatorsFolded | 618 | 619 | 1 | 0.16% | 0.16% |
| loop-unroll.NumCompletelyUnrolled | 11028 | 11029 | 1 | 0.01% | 0.01% |
| loop-unroll.NumUnrolled | 12608 | 12525 | -83 | -0.66% | 0.66% |
| mem2reg.NumPHIInsert | 192110 | 192073 | -37 | -0.02% | 0.02% |
| mem2reg.NumSingleStore | 637650 | 637652 | 2 | 0.00% | 0.00% |
| scalar-evolution.NumTripCountsComputed | 283108 | 282998 | -110 | -0.04% | 0.04% |
| scalar-evolution.NumTripCountsNotComputed | 106712 | 106691 | -21 | -0.02% | 0.02% |
| simple-loop-unswitch.NumBranches | 5178 | 5185 | 7 | 0.14% | 0.14% |
| simple-loop-unswitch.NumCostMultiplierSkipped | 914 | 925 | 11 | 1.20% | 1.20% |
| simple-loop-unswitch.NumTrivial | 183 | 179 | -4 | -2.19% | 2.19% |

I.e. we end up with fewer instructions, less peeling, more LICM activity, also note how none of those 4 regressions are here. Namely:

| statistic name | LICM-LoopRotate | LICM-LoopRotate-LICM | Δ | % | abs(%) |
| asm-printer.EmittedInsts | 9015799 | 9014474 | -1325 | -0.01% | 0.01% |
| indvars.NumElimCmp | 3544 | 3546 | 2 | 0.06% | 0.06% |
| indvars.NumElimExt | 36580 | 36681 | 101 | 0.28% | 0.28% |
| indvars.NumElimIV | 1187 | 1185 | -2 | -0.17% | 0.17% |
| indvars.NumElimIdentity | 136 | 146 | 10 | 7.35% | 7.35% |
| indvars.NumLFTR | 29890 | 29899 | 9 | 0.03% | 0.03% |
| indvars.NumReplaced | 2227 | 2299 | 72 | 3.23% | 3.23% |
| indvars.NumWidened | 26329 | 26404 | 75 | 0.28% | 0.28% |
| instcount.TotalBlocks | 1173840 | 1173652 | -188 | -0.02% | 0.02% |
| instcount.TotalInsts | 9896139 | 9895452 | -687 | -0.01% | 0.01% |
| lcssa.NumLCSSA | 423961 | 425373 | 1412 | 0.33% | 0.33% |
| licm.NumHoisted | 378753 | 383352 | 4599 | 1.21% | 1.21% |
| licm.NumMovedCalls | 2208 | 2204 | -4 | -0.18% | 0.18% |
| licm.NumMovedLoads | 31821 | 35755 | 3934 | 12.36% | 12.36% |
| licm.NumPromoted | 11154 | 11163 | 9 | 0.08% | 0.08% |
| licm.NumSunk | 13587 | 14321 | 734 | 5.40% | 5.40% |
| loop-delete.NumDeleted | 8402 | 8538 | 136 | 1.62% | 1.62% |
| loop-instsimplify.NumSimplified | 11890 | 12041 | 151 | 1.27% | 1.27% |
| loop-peel.NumPeeled | 925 | 924 | -1 | -0.11% | 0.11% |
| loop-rotate.NumRotated | 42003 | 42005 | 2 | 0.00% | 0.00% |
| loop-simplifycfg.NumLoopBlocksDeleted | 242 | 241 | -1 | -0.41% | 0.41% |
| loop-simplifycfg.NumLoopExitsDeleted | 20 | 497 | 477 | 2385.00% | 2385.00% |
| loop-simplifycfg.NumTerminatorsFolded | 336 | 619 | 283 | 84.23% | 84.23% |
| loop-unroll.NumCompletelyUnrolled | 11032 | 11029 | -3 | -0.03% | 0.03% |
| loop-unroll.NumUnrolled | 12529 | 12525 | -4 | -0.03% | 0.03% |
| mem2reg.NumDeadAlloca | 10221 | 10222 | 1 | 0.01% | 0.01% |
| mem2reg.NumPHIInsert | 192106 | 192073 | -33 | -0.02% | 0.02% |
| mem2reg.NumSingleStore | 637643 | 637652 | 9 | 0.00% | 0.00% |
| scalar-evolution.NumBruteForceTripCountsComputed | 812 | 814 | 2 | 0.25% | 0.25% |
| scalar-evolution.NumTripCountsComputed | 282934 | 282998 | 64 | 0.02% | 0.02% |
| scalar-evolution.NumTripCountsNotComputed | 106718 | 106691 | -27 | -0.03% | 0.03% |
| simple-loop-unswitch.NumBranches | 4752 | 5185 | 433 | 9.11% | 9.11% |
| simple-loop-unswitch.NumCostMultiplierSkipped | 503 | 925 | 422 | 83.90% | 83.90% |
| simple-loop-unswitch.NumSwitches | 18 | 20 | 2 | 11.11% | 11.11% |
| simple-loop-unswitch.NumTrivial | 95 | 179 | 84 | 88.42% | 88.42% |

{F15983613} {F15983615} {F15983616} (this is vanilla llvm testsuite + rawspeed + darktable)

As an example of the code where early LICM only is bad, see: https://godbolt.org/z/GzEbacs4K

This does have an observable compile-time regression of +~0.5% geomean https://llvm-compile-time-tracker.com/compare.php?from=7c5222e4d1a3a14f029e5f614c9aefd0fa505f1e&to=5d81826c3411982ca26e46b9d0aff34c80577664&stat=instructions but I think that's basically nothing, and there's potential that it might be avoidable in the future by fixing clang to produce alignment information on function arguments, thus making the second run unneeded.

Differential Revision: https://reviews.llvm.org/D99249	2021-04-02 11:11:42 +03:00
Wenlei He	c5605857bb	[CSSPGO] Skip dangling probe value when computing profile summary Recently we switched to use InvalidProbeCount = UINT64_MAX (instead of 0) to represent dangling probe, but UINT64_MAX is not excluded when computing profile summary. This caused profile summary to produce incorrect hot/cold threshold. The change fixed it by excluding UINT64_MAX from summary builder. Differential Revision: https://reviews.llvm.org/D99788	2021-04-01 22:49:11 -07:00
Juneyoung Lee	c664769330	[AssumeBundles] offset should be added to correctly calculate align This is a patch to fix the bug in alignment calculation (see https://reviews.llvm.org/D90529#2619492). Consider this code: ``` call void @llvm.assume(i1 true) ["align"(i32* %a, i32 32, i32 28)] %arrayidx = getelementptr inbounds i32, i32* %a, i64 -1 ; alignment of %arrayidx? ``` The llvm.assume guarantees that `%a - 28` is 32-byte aligned, meaning that `%a` is 32k + 28 for some k. Therefore `a - 4` cannot be 32-byte aligned, but the existing code was calculating the pointer as 32-byte aligned. The reason why this happened is as follows. `DiffSCEV` stores `%arrayidx - %a` which is -4. `OffSCEV` stores the offset value of “align”, which is 28. `DiffSCEV` + `OffSCEV` = 24 should be used for `a - 4`'s offset from 32k, but `DiffSCEV` - `OffSCEV` = -32 was being used instead. Reviewed By: Tyker Differential Revision: https://reviews.llvm.org/D98759	2021-04-02 12:32:05 +09:00
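The arithmetic in c664769330 can be checked with a small stand-alone sketch. This is not the code from the patch: `provableAlign` is a hypothetical helper that models only the integer reasoning, assuming `align` is a power of two and that the "align" bundle means `p - offset` is `align`-byte aligned.

```cpp
#include <cassert>
#include <cstdint>

// Largest power-of-two alignment provable for the address p + diff, given
// that (p - offset) is `align`-byte aligned. Hypothetical helper, integer
// arithmetic only; `align` is assumed to be a power of two.
constexpr int64_t provableAlign(int64_t align, int64_t offset, int64_t diff) {
  // p = align*k + offset, so p + diff = align*k + (offset + diff).
  int64_t rem = (offset + diff) % align;
  if (rem < 0)
    rem += align;          // normalise the remainder into [0, align)
  if (rem == 0)
    return align;          // the address is a multiple of `align`
  return rem & -rem;       // lowest set bit of the remainder
}

// Correct computation from the commit message: -4 + 28 = 24, so %arrayidx
// is only provably 8-byte aligned.
static_assert(provableAlign(32, 28, -4) == 8, "Diff + Off = 24");

// Subtracting the offset instead (-4 - 28 = -32) wrongly yields 32 bytes.
static_assert(provableAlign(32, -28, -4) == 32, "Diff - Off = -32");
```

The second assertion reproduces the old behaviour by negating the offset, which shows how the sign error turned a remainder of 24 into a remainder of 0.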
Yang Fan	bc6001ce1e	[X86] Fix -Wunused-function warning (NFC) GCC warning: ``` /llvm-project/llvm/lib/Target/X86/X86ISelLowering.cpp:9212:13: warning: ‘bool isHorizOp(unsigned int)’ defined but not used [-Wunused-function] 9212 | static bool isHorizOp(unsigned Opcode) { | ^~~~~~~~~ ```	2021-04-02 09:38:12 +08:00
Philip Reames	91790c6785	[indvars] Fix pr49802 by checking for SCEVCouldNotCompute The code is assuming that having an exact exit count for the loop implies that exit counts for every exit are known. This used to be true, but when we added handling for dead exits we broke this invariant. The new invariant is that an exact loop count implies that any exits that are not trivially dead have exit counts. We could have fixed this by either a) explicitly checking for a dead exit, or b) just testing for SCEVCouldNotCompute. I chose the second as it was simpler. (Debugging this took longer than it should have since I'd mistyped the original assert and it wasn't checking what it was meant to...) p.s. Sorry for the lack of test case. Getting things into a state to actually hit this is difficult and fragile. The original repro involves loop-deletion leaving SCEV in a slightly imprecise state which lets us bypass other transforms in IndVarSimplify on the way to this one. All of my attempts to separate it into a standalone test failed.	2021-04-01 17:53:44 -07:00
Philip Reames	b23a314146	[funcattrs] Respect nofree attribute on callsites (not just callee)	2021-04-01 14:45:49 -07:00
Craig Topper	766d27dc85	[RISCV] Add isel patterns to handle vrsub intrinsic with 2 vector operands. This occurs when we type legalize an i64 scalar input on RV32. We need to manually splat, which requires a vector input. Rather than special case this in lowering just pattern match it.	2021-04-01 14:10:21 -07:00
David Green	da98177cda	[ARM] Allow v6m runtime loop unrolling This removes the restriction that only Thumb2 targets enable runtime loop unrolling, allowing it for Thumb1 only cores as well. The existing T2 heuristics are used (for the time being) to control when and how unrolling is performed. Differential Revision: https://reviews.llvm.org/D99588	2021-04-01 21:21:40 +01:00
Craig Topper	dbbc95e3e5	[RISCV] Use softPromoteHalf legalization for fp16 without Zfh rather than PromoteFloat. The default legalization strategy is PromoteFloat which keeps half in single precision format through multiple floating point operations. Conversion to/from float is done at loads, stores, bitcasts, and other places that care about the exact size being 16 bits. This patch switches to the alternative method softPromoteHalf. This aims to keep the type in 16-bit format between every operation. So we promote to float and immediately round for any arithmetic operation. This should be closer to the IR semantics since we are rounding after each operation and not accumulating extra precision across multiple operations. X86 is the only other target that enables this today. See https://reviews.llvm.org/D73749 I had to update getRegisterTypeForCallingConv to force f16 to use f32 when the F extension is enabled. This way we can still pass it in the lower bits of an FPR for ilp32f and lp64f ABIs. The softPromoteHalf would otherwise always give i16 as the argument type. Reviewed By: asb, frasercrmck Differential Revision: https://reviews.llvm.org/D99148	2021-04-01 12:41:57 -07:00
Philip Reames	1e69a5af92	[Attributor] Cleanup detection of non-relaxed atomics in nosync inference The code was checking for cases which are disallowed by the verifier. Delete dead code and adjust style.	2021-04-01 12:01:29 -07:00
Philip Reames	8e596f7e27	[Attributor] Cleanup intrinsic handling in nosync inference [mostly NFC] Mostly stylistic adjustment, but the old code didn't handle the memcpy.inline intrinsic. By using the matcher class, we now do.	2021-04-01 11:49:59 -07:00
Philip Reames	6ef4505298	[funcattrs] Infer nosync from readnone and non-convergent This implements the most basic possible nosync inference. The choice of inference rule is taken from the comments in attributor and the discussion on the review of the change which introduced the nosync attribute (`0626367202`). This is deliberately minimal. As noted in code comments, I do plan to add a more robust inference which actually scans the function IR directly, but a) I need to do some refactoring of the attributor code to use common interfaces, and b) I wanted to get something in. I also wanted to minimize the "interesting" analysis discussion since that's time intensive. Context: This combines with existing nofree attribute inference to help prove dereferenceability in the ongoing deref-at-point semantics work. Differential Revision: https://reviews.llvm.org/D99749	2021-04-01 11:37:34 -07:00
Philip Reames	db357891f0	Infer dereferenceability from malloc and friends Hookup TLI when inferring object size from allocation calls. This allows the analysis to prove dereferenceability for known allocation functions (such as malloc/new/etc) in addition to those marked explicitly with the allocsize attribute. This is a follow up to `0129cd5` now that the bug fixed by `e2c6621e6` is resolved. As noted in the test, this relies on being able to prove that there is no free between allocation and context (e.g. hoist location). At the moment, this is handled conservatively. I'm working on strengthening our ability to reason about no-free regions separately. Differential Revision: https://reviews.llvm.org/D99737	2021-04-01 11:33:35 -07:00
Martin Storsjö	4391d764e1	[ARM] Remove an unused parameter in ARMWinCOFFObjectWriter. NFC. This writer only ever operates on 32 bit arm code. Differential Revision: https://reviews.llvm.org/D99575	2021-04-01 21:25:41 +03:00
Philip Reames	ffa15e9463	Extract isVolatile helper on Instruction [NFCI] We have this logic duplicated in several cases, none of which were exhaustive. Consolidate it in one place. I don't believe this actually impacts behavior of the callers. I think they all filter their inputs such that their partial implementations were correct. If not, this might be fixing a cornercase bug.	2021-04-01 11:24:02 -07:00
Nick Desaulniers	52338af569	[MC][ARM] add .w suffixes for RSB/RSBS T1 See also: F5.1.167 RSB, RSBS (register) T1 shift or rotate by value variant of the Arm ARM. Link: https://github.com/ClangBuiltLinux/linux/issues/1309 Reviewed By: DavidSpickett Differential Revision: https://reviews.llvm.org/D99542	2021-04-01 10:45:37 -07:00
Philip Reames	6b05d753e0	Mark unordered memset/memmove/memcpy as nosync Mostly a means to remove a bit of code from attributor in advance of implementing a FuncAttr inference for nosync.	2021-04-01 10:38:54 -07:00
Craig Topper	d157e3f387	[RISCV] Fix handling of nxvXi64 vmsgt(u).vx intrinsics on RV32. We need to splat the scalar separately and use .vv, but there is no vmsgt(u).vv. So add isel patterns to select vmslt(u).vv with swapped operands. We also need to get VT to use for the splat from an operand rather than the result since the result VT is nxvXi1. Reviewed By: HsiangKai Differential Revision: https://reviews.llvm.org/D99704	2021-04-01 10:38:05 -07:00
Nick Desaulniers	1addc231cd	[MC][ARM] add .w suffixes for ORN/ORNS T1 See also: F5.1.128 ORN, ORNS (register) T1 shift or rotate by value variant of the Arm ARM. Link: https://github.com/ClangBuiltLinux/linux/issues/1309 Reviewed By: DavidSpickett Differential Revision: https://reviews.llvm.org/D99538	2021-04-01 10:27:09 -07:00
Craig Topper	b7c2e577cc	[RISCV] Add custom type legalization to form MULHSU when possible. There's no target independent ISD opcode for MULHSU, so custom legalize 2*XLen multiplies ourselves. We have to be a little careful to prefer MULHU or MULHSU. I thought about doing this in isel by pattern matching the (add (mul X, (srai Y, XLen-1)), (mulhu X, Y)) pattern. I decided against this because the add might become part of a chain of adds. I don't trust DAG combine not to reassociate with other adds making it difficult to find both pieces again. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D99479	2021-04-01 10:15:55 -07:00
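The `(add (mul X, (srai Y, XLen-1)), (mulhu X, Y))` pattern mentioned in b7c2e577cc computes the high half of a signed-by-unsigned multiply. Below is a rough sketch of that identity for XLen = 32, checked against a 64-bit reference; all function names are hypothetical stand-ins for the instructions, not LLVM APIs.

```cpp
#include <cassert>
#include <cstdint>

// High 32 bits of the unsigned 32x32 product (models MULHU).
constexpr uint32_t mulhu(uint32_t x, uint32_t y) {
  return static_cast<uint32_t>((static_cast<uint64_t>(x) * y) >> 32);
}

// Reference for the high 32 bits of signed(y) * unsigned(x), computed with
// 64-bit arithmetic. The product magnitude stays below 2^63, so it cannot
// overflow int64_t.
constexpr uint32_t mulhsuReference(int32_t y, uint32_t x) {
  int64_t product = static_cast<int64_t>(y) * static_cast<int64_t>(x);
  return static_cast<uint32_t>(static_cast<uint64_t>(product) >> 32);
}

// The pattern from the commit message: srai Y, 31 is all ones when Y is
// negative and zero otherwise, so mul X, (srai Y, 31) equals -X or 0
// modulo 2^32.
constexpr uint32_t mulhsuViaPattern(int32_t y, uint32_t x) {
  uint32_t signMask = y < 0 ? 0xFFFFFFFFu : 0u;      // models srai Y, 31
  return x * signMask + mulhu(x, static_cast<uint32_t>(y));
}

static_assert(mulhsuViaPattern(-1, 0xFFFFFFFFu) ==
                  mulhsuReference(-1, 0xFFFFFFFFu),
              "pattern matches the reference");
```

The identity follows because the signed value of Y equals its unsigned value minus 2^32 when Y is negative, so the high word of the product differs from `mulhu(X, Y)` by exactly X in that case.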
Jay Foad	fdc4f19e2f	[AMDGPU] Remove SIAddIMGInit pass which is now unused Differential Revision: https://reviews.llvm.org/D99748	2021-04-01 18:13:17 +01:00
Jay Foad	3d07a6d891	[AMDGPU][GlobalISel] Add IMG init in selectImageIntrinsic Doing this during instruction selection avoids the cost of running SIAddIMGInit which is yet another pass over the MIR. Differential Revision: https://reviews.llvm.org/D99670	2021-04-01 18:13:17 +01:00
Jay Foad	4af6251cea	[AMDGPU][SDag] Add IMG init in AdjustInstrPostInstrSelection Doing this in a post-isel hook avoids the cost of running SIAddIMGInit which is yet another pass over the MIR. Differential Revision: https://reviews.llvm.org/D99747	2021-04-01 18:13:17 +01:00
Craig Topper	d61b40ed27	[RISCV] Improve 64-bit integer materialization for some cases. This adds a new integer materialization strategy mainly targeted at 64-bit constants like 0xffffffff where there are 32 or more trailing ones with leading zeros. We can materialize these by using an addi -1 and srli to restore the leading zeros. This matches what gcc does. I haven't limited to just these cases though. The implementation here takes the constant, shifts out all the leading zeros and shifts ones into the LSBs, creates the new sequence, adds an srli, and checks if this is shorter than our original strategy. I've separated the recursive portion into a standalone function so I could append the new strategy outside of the recursion. Since external users are no longer using the recursive function, I've cleaned up the external interface to return the sequence instead of taking a vector by reference. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D98821	2021-04-01 09:12:52 -07:00
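The simplest case covered by d61b40ed27, a constant such as 0xffffffff whose set bits are all trailing ones, can be sketched as follows. This is an illustration rather than the patch's implementation: the names `countLeadingZeros64`, `matchAddiSrli` and `emulateAddiSrli` are hypothetical, and the sketch only recognises constants that become all ones after the fill step, whereas the commit applies the strategy more generally and compares sequence lengths.

```cpp
#include <cassert>
#include <cstdint>

// Number of leading zero bits in a 64-bit value (64 for zero).
constexpr unsigned countLeadingZeros64(uint64_t v) {
  unsigned n = 0;
  for (uint64_t bit = 1ULL << 63; bit != 0 && (v & bit) == 0; bit >>= 1)
    ++n;
  return n;
}

struct AddiSrli {
  bool ok;         // true if `addi rd, zero, -1; srli rd, rd, shift` works
  unsigned shift;  // srli shift amount when ok is true
};

// Shift out the leading zeros and shift ones into the LSBs, as the commit
// message describes. If the result is all ones, the constant can be built
// from -1 followed by a logical right shift.
constexpr AddiSrli matchAddiSrli(uint64_t v) {
  unsigned lz = countLeadingZeros64(v);
  if (lz == 0 || lz == 64)
    return {false, 0};
  uint64_t filled = (v << lz) | ((1ULL << lz) - 1);
  return {filled == ~0ULL, lz};
}

// Models the two-instruction sequence on a 64-bit register.
constexpr uint64_t emulateAddiSrli(unsigned shift) {
  uint64_t reg = ~0ULL;   // addi rd, zero, -1 sign-extends to all ones
  return reg >> shift;    // srli restores the leading zeros
}

static_assert(matchAddiSrli(0xffffffffULL).ok, "0xffffffff qualifies");
static_assert(emulateAddiSrli(32) == 0xffffffffULL, "sequence rebuilds it");
```

For 0xffffffff the match reports a shift of 32, so the two-instruction sequence reproduces the constant.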
Jay Foad	b1fbfd9e4c	[AMDGPU] Small cleanup to constructRetValue and its caller. NFC.	2021-04-01 16:36:16 +01:00
Philip Reames	e2c6621e63	[deref-at-point] restrict inference of dereferenceability based on allocsize attribute Support deriving dereferenceability facts from allocation sites with known object sizes while correctly accounting for any possible frees between allocation and use site. (At the moment, we're conservative and only allowing it in functions where we know we can't free.) This is part of the work on deref-at-point semantics. I'm making the change unconditional as the miscompile in this case is way too easy to trip by accident, and the optimization was only recently added (by me). There will be a follow up patch wiring through TLI since that should now be doable without introducing widespread miscompiles. Differential Revision: https://reviews.llvm.org/D95815	2021-04-01 08:34:40 -07:00
Mircea Trofin	ce61def529	[regalloc] Ensure Query::collectInterferringVregs is called before interval iteration The main part of the patch is the change in RegAllocGreedy.cpp: Q.collectInterferringVregs() needs to be called before iterating the interfering live ranges. The rest of the patch helps ensure that this is the case: instead of clearing the query's InterferingVRegs field, we invalidate it. The clearing happens when the live reg matrix is invalidated (existing triggering mechanism). Without the change in RegAllocGreedy.cpp, the compiler ICEs. This patch should make it more easily discoverable by developers that collectInterferringVregs needs to be called before iterating. I will follow up with a subsequent patch to improve the usability and maintainability of Query. Differential Revision: https://reviews.llvm.org/D98232	2021-04-01 08:33:28 -07:00
Anirudh Prasad	7b921a6747	[AsmParser][SystemZ][z/OS] Add in support to accept "#" as part of an Identifier token - This patch adds in support to accept the "#" character as part of an Identifier. - This support is needed especially for the HLASM dialect since "#" is treated as part of the valid "Alphabet" range - The way this is done is by making use of the previous precedent set by the `AllowAtInIdentifier` field in `MCAsmLexer.h`. A new field called `AllowHashInIdentifier` is introduced. - The static function `IsIdentifierChar` is also updated to accept the `#` character if the `AllowHashInIdentifier` field is set to true. Note: The field introduced in `MCAsmLexer.h` could very well be moved to `MCAsmInfo.h`. I'm not opposed to it. I decided to put it in `MCAsmLexer` since there seems to be some sort of precedent already with `AllowAtInIdentifier`. Reviewed By: abhina.sreeskantharajan, nickdesaulniers, MaskRay Differential Revision: https://reviews.llvm.org/D99277	2021-04-01 11:24:43 -04:00
Bradley Smith	2f45e632c0	[AArch64][SVE] Improve codegen for select nodes with fixed types Additionally, move the existing fixed vselect tests to *-vselect.ll. Differential Revision: https://reviews.llvm.org/D99418	2021-04-01 15:54:37 +01:00
Bradley Smith	0934fa4f5d	[AArch64][SVE] SVE functions should use the SVE calling convention for fast calls When an SVE function calls another SVE function using the C calling convention we use the more efficient SVE VectorCall PCS. However, for the Fast calling convention we're incorrectly falling back to the generic AArch64 PCS. This patch adds the same "can use SVE vector calling convention" detection used by CallingConv::C to CallingConv::Fast. Co-authored-by: Paul Walker <paul.walker@arm.com> Differential Revision: https://reviews.llvm.org/D99657	2021-04-01 15:52:08 +01:00
Brendon Cahoon	65c8bfb509	[AMDGPU] Enable output modifiers for double precision instructions Update SIFoldOperands pass to recognize v_add_f64 and v_mul_f64 instructions for folding output modifiers. Differential Revision: https://reviews.llvm.org/D99505	2021-04-01 10:08:17 -04:00
Alexey Bataev	c03696da5e	[SLP]Improve and fix getVectorElementSize. 1. Need to cleanup InstrElementSize map for each new tree, otherwise might use sizes from the previous run of the vectorization attempt. 2. No need to include into analysis the instructions from the different basic blocks to save compile time. Differential Revision: https://reviews.llvm.org/D99677	2021-04-01 06:51:26 -07:00
Simon Pilgrim	77d625f8d8	[DAG] MergeInnerShuffle with BinOps - sometimes accept undef mask elements If the inner shuffle already contains undef elements, then accept them in the merged shuffle as well. This helps some X86 HADD/SUB patterns where slow targets were ending up with HADD/SUB because the (un)merged shuffles were stuck either side of the ADD/SUB - meaning we ended up with a total cost much higher than the "2*shuffle+add" that a slow target usually expands a HADD/SUB to.	2021-04-01 14:33:00 +01:00
Alexey Bataev	ce98a0556a	[SLP]Remove `else` after `return`, NFC.	2021-04-01 05:33:01 -07:00
Dmitry Preobrazhensky	cd953434f2	[AMDGPU][MC][GFX10][GFX90A] Corrected _e32/_e64 suffixes Fixed bugs https://bugs.llvm.org//show_bug.cgi?id=49643, https://bugs.llvm.org//show_bug.cgi?id=49644, https://bugs.llvm.org//show_bug.cgi?id=49645. Differential Revision: https://reviews.llvm.org/D99413	2021-04-01 14:21:00 +03:00
Simon Pilgrim	abbe80fa52	[X86][SSE] Fold HOP(HOP(X,X),HOP(Y,Y)) -> HOP(PERMUTE(HOP(X,Y)),PERMUTE(HOP(X,Y)) For slow-hop targets, attempt to merge HADD/SUB pairs used in chains.	2021-04-01 11:54:10 +01:00
Simon Pilgrim	301319840e	[X86][SSE] Enable (F)HADD/SUB handling to SimplifyMultipleUseDemandedVectorElts Attempt to bypass unused horiz-op operands. This is very similar to the PACKSS/PACKUS handling - we should try to merge these.	2021-04-01 11:54:09 +01:00
Simon Pilgrim	f7aeaced65	[X86][SSE] Add isHorizOp helper function. NFCI.	2021-04-01 11:54:09 +01:00
Dmitry Preobrazhensky	0f5ebbcc7f	[AMDGPU][MC] Added flag to identify VOP instructions which have a single variant By convention, VOP1/2/C instructions which can be promoted to VOP3 have _e32 suffix while promoted instructions have _e64 suffix. Instructions which have a single variant should have no _e32/_e64 suffix. Unfortunately there was no simple way to identify single variant instructions - it was implemented by a hack. See bug https://bugs.llvm.org/show_bug.cgi?id=39086. This fix simplifies handling of single VOP instructions by adding a dedicated flag. Differential Revision: https://reviews.llvm.org/D99408	2021-04-01 13:53:12 +03:00
Yevgeny Rouban	1ed53d44d8	[LoopFlatten] Do not report CFG analyses as up-to-date Removes CFGAnalyses from the preserved analyses set returned by LoopFlattenPass::run(). Reviewed By: Dave Green, Ta-Wei Tu Differential Revision: https://reviews.llvm.org/D99700	2021-04-01 15:52:36 +07:00
Sam Parker	92e7771483	[WebAssembly] Invert branch condition on xor input A frequent pattern for floating point conditional branches uses an xor to invert the input for the branch. Instead we can fold away the xor by swapping the branch targets. Differential Revision: https://reviews.llvm.org/D99171	2021-04-01 09:23:28 +01:00
Max Kazantsev	a1d83776bf	[NFC] Undo some erroneous renamings Some vars renamed by mistake during auto-replacements. Undoing them.	2021-04-01 13:10:10 +07:00
Max Kazantsev	630818a850	[NFC] Disambiguate LI in GVN GVN uses the name 'LI' for two different unrelated things: LoadInst and LoopInfo. This patch renames the variables with the former meaning to 'Load' to disambiguate the code.	2021-04-01 12:40:35 +07:00
KAWASHIMA Takahiro	5fac7c6046	[GVN] Propagate llvm.access.group metadata of loads Before this change, the `llvm.access.group` metadata was dropped when moving a load instruction in GVN. This prevents vectorizing a C/C++ loop with `#pragma clang loop vectorize(assume_safety)`. This change propagates the metadata as well as other metadata if it is safe (the move-destination basic block and source basic block belong to the same loop). Differential Revision: https://reviews.llvm.org/D93503	2021-04-01 10:00:48 +09:00
qixingxue	62b74f7564	[GVN][NFC] Refactor analyzeLoadFromClobberingWrite This commit adjusts the order of two swappable if statements to make code cleaner. Reviewed By: lattner, nikic Differential Revision: https://reviews.llvm.org/D99648	2021-04-01 08:35:35 +08:00
Philip Reames	4af4828a6e	[ValueTracking] Handle non-zero ashr/lshr recurrences If we know we don't shift out bits (e.g. exact), all we need to know is that input is non-zero.	2021-03-31 16:48:32 -07:00
Philip Reames	115a42ad1e	Add debug printers for KnownBits [nfc]	2021-03-31 15:36:07 -07:00
Simonas Kazlauskas	777a58e05b	Support {S,U}REMEqFold before legalization This allows these optimisations to apply to e.g. `urem i16` directly before `urem` is promoted to i32 on architectures where i16 operations are not intrinsically legal (such as on Aarch64). The legalization then later can happen more directly and generated code gets a chance to avoid wasting time on computing results in types wider than necessary, in the end. Seems like mostly an improvement in terms of results at least as far as x86_64 and aarch64 are concerned, with a few regressions here and there. It also helps in preventing regressions in changes like {D87976}. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D88785	2021-04-01 01:35:41 +03:00
Craig Topper	c88ee1a094	[RISCV] Add UnsupportedSchedZfh multiclass to reduce duplicate lines from RISCVSchedRocket.td and RISCVSchedSiFive7.td. NFC	2021-03-31 15:06:14 -07:00
YangKeao	1c268a8ff4	[X86] add dwarf annotation for inline stack probe While probing the stack, the stack register is moved without dwarf information, which could cause a panic when unwinding the backtrace. This commit only adds annotations for the inline stack probe case. Dwarf information for the loop case should be done in another patch and needs further discussion. Reviewed By: nagisa Differential Revision: https://reviews.llvm.org/D99579	2021-04-01 00:32:50 +03:00
Roman Lebedev	43ded90094	[NFC][LoopRotation] Count the number of instructions hoisted/cloned into preheader	2021-03-31 23:27:36 +03:00
Craig Topper	9e00b6660d	[SelectionDAG] Remove unneeded vector resize from the end of FoldConstantArithmetic. NFC There's an assert right before that makes sure the size already matches. Earlier in this function's life, scalars and vectors shared more code.	2021-03-31 12:33:10 -07:00
George Mitenkov	807b019ca2	[ConstantFolding] Fixing addo/subo with undef When folding addo/subo with undef, the current convention is to use { -1, false } for addo and { 0, false } for subo. This was fixed for InstSimplify in https://reviews.llvm.org/rGf094d65beaa492e845b03561eddd75b5be653a01, but not in ConstantFolding. Reviewed By: nikic, lebedev.ri Differential Revision: https://reviews.llvm.org/D99564	2021-03-31 21:47:29 +03:00
Huihui Zhang	fe5c4a06a4	[LoopVectorize] Use SetVector to track uniform uses to prevent non-determinism. Use SetVector instead of SmallPtrSet to track values with uniform use. Doing this can help avoid non-determinism caused by iterating over unordered containers. This bug was found with reverse iteration turned on, --extra-llvm-cmake-variables="-DLLVM_REVERSE_ITERATION=ON". Failing LLVM test: consecutive-ptr-uniforms.ll. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D99549	2021-03-31 11:21:07 -07:00
Thomas Lively	45783d0e8a	[WebAssembly] Implement i64x2 comparisons Removes the prototype builtin and intrinsic for i64x2.eq and implements that instruction as well as the other i64x2 comparison instructions in the final SIMD spec. Unsigned comparisons were not included in the final spec, so they still need to be scalarized via a custom lowering. Differential Revision: https://reviews.llvm.org/D99623	2021-03-31 10:46:17 -07:00
Juneyoung Lee	df0b97dab0	[ValueTracking] Add with.overflow intrinsics to poison analysis functions This is a patch teaching ValueTracking that `s/u*.with.overflow` intrinsics do not create undef/poison and they propagate poison. I couldn't write a nice example like the one with ctpop; ValueTrackingTest.cpp was simply updated to check these instead. This patch helps reduce regressions while fixing https://llvm.org/pr49688 . Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D99671	2021-04-01 02:41:38 +09:00
Philip Reames	ae7b1e8823	[SCEV] Handle unreachable binop when matching shift recurrence This fixes an issue introduced with my change d4648e, and reported in pr49768. The root problem is that dominance collapses in unreachable code, and that LoopInfo explicitly only models reachable code. Since the recurrence matcher doesn't filter by reachability (and can't easily because not all consumers have domtree), we need to bailout before assuming that finding a recurrence implies we found a loop.	2021-03-31 10:33:34 -07:00
Craig Topper	437958d9fd	[X86] Improve SMULO/UMULO codegen for vXi8 vectors. The default expansion creates a MUL and either a MULHS/MULHU. Each of those separately expand to sequences that use one or more PMULLW instructions as well as additional instructions to extend the types to vXi16. The MULHS/MULHU expansion computes the whole 16-bit product, but only keeps the high part. We can improve the lowering of SMULO/UMULO for some cases by using the MULHS/MULHU expansion, but keep both the high and low parts. And we can use those parts to calculate the overflow. For AVX512 we might have vXi1 overflow outputs. We can improve those by using vpcmpeqw to produce a k register if AVX512BW is enabled. This is a little better than truncating the high result to use vpcmpeqb. If we don't have avx512bw we can extend up to v16i32 to use vpcmpeqd to produce a k register. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D97624	2021-03-31 10:13:50 -07:00
Shimin Cui	00c0c8c87d	[PowerPC] [MLICM] Enable hoisting of caller preserved registers on AIX On ppc64 linux, MachineLICM will hoist caller preserved registers, including TOC loads of the global variable address, out of loops. This is to enable this on AIX for both ppc64 and ppc32. Differential Revision: https://reviews.llvm.org/D99076	2021-03-31 12:46:25 -04:00
Craig Topper	50b8634a99	[X86] Improve optimizeCompareInstr for signed comparisons after BMI/TBM instructions We previously couldn't optimize out a TEST if the branch/setcc/cmov used the overflow flag. This patch allows the TEST to be removed if the flag producing instruction is known to clear the OF flag. That's what the TEST instruction would have done so that should be equivalent. Need to add test cases. I'll try to get back to this if I have bandwidth. Fixes PR48768. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D94856	2021-03-31 09:45:29 -07:00
Wael Yehia	563cdeaafd	[LTO][Legacy] Decouple option parsing from LTOCodeGenerator in this patch we add a new libLTO API to specify debug options independent of an lto_code_gen_t. This allows clients to pass codegen flags (through libLTO) which otherwise today are ignored. Reviewed By: steven_wu Differential Revision: https://reviews.llvm.org/D92611	2021-03-31 16:43:26 +00:00
Craig Topper	2a8b7cab6a	[RISCV] Add RISCVISD opcodes for CLZW and CTZW. Our CLZW isel pattern is quite easily broken by surrounding code preventing it from matching sometimes. This usually results in failing to remove the and X, 0xffffffff inserted by type legalization. The add with -32 that type legalization also inserts will often get combined into other add/sub nodes. That doesn't usually result in extra code when we don't use clzw. CTTZ seems to be less fragile, but I wanted to keep it consistent with CTLZ. Reviewed By: asb, HsiangKai Differential Revision: https://reviews.llvm.org/D99317	2021-03-31 09:40:07 -07:00
Craig Topper	04f10ab367	[RISCV] Add isel patterns to select vsub_vx intrinsic to vadd.vi if it uses a small enough immediate Also modify the simm5_plus1 check because Imm-1 is UB if Imm happens to be INT64_MIN. I don't think the compiler would optimize based on that in this usage, but it could fail UBSan or -ftrapv. Reviewed By: HsiangKai, frasercrmck Differential Revision: https://reviews.llvm.org/D99637	2021-03-31 09:26:41 -07:00
Sanjay Patel	1462bdf1b9	[InstCombine] fold abs(srem X, 2) This is a missing optimization based on an example in: https://llvm.org/PR49763 As noted there and the test here, we could add a more general fold if that is shown useful. https://alive2.llvm.org/ce/z/xEHdTv https://alive2.llvm.org/ce/z/97dcY5	2021-03-31 11:29:20 -04:00
Sander de Smalen	7108b2dec1	[SVE] Fix LoopVectorizer test scalable-call.ll This marks FSIN and other operations to EXPAND for scalable vectors, so that they are not assumed to be legal by the cost-model. Depends on D97470 Reviewed By: dmgreen, paulwalker-arm Differential Revision: https://reviews.llvm.org/D97471	2021-03-31 14:52:49 +01:00
Sander de Smalen	2f6f249a49	NFC: Change getIntrinsicInstrCost to return InstructionCost This patch migrates the TTI cost interfaces to return an InstructionCost. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Depends on D97468 Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D97469	2021-03-31 14:04:41 +01:00
Sander de Smalen	2f56e1c6b1	NFC: Change getTypeBasedIntrinsicCost to return InstructionCost This patch migrates the TTI cost interfaces to return an InstructionCost. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Depends on D97466 Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D97468	2021-03-31 14:04:41 +01:00
Liqiang Tao	d2d6720a93	[InlineCost] Remove TODO comment that considers other forms of savings in the cost-benefit analysis Attempts to compute savings more accurately cannot impact the set of critically important call sites. Reviewed By: kazu Differential Revision: https://reviews.llvm.org/D98577	2021-03-31 20:11:32 +08:00
Roman Lebedev	ce548aa236	[X86] AMD Zen 3 has macro fusion This is an improvement over Zen 2, where only branch fusion is supported, as per Agner, 21.4 Instruction fusion. AMD SOG 17h has no mention of fusion. AMD SOG 19h, 2.9.3 Branch Fusion The following flag writing instructions support branch fusion with their reg/reg, reg/imm and reg/mem forms * CMP * TEST * SUB * ADD * INC (no fusion with branches dependent on CF) * DEC (no fusion with branches dependent on CF) * OR * AND * XOR Agner, 22.4 Instruction fusion <...> This applies to CMP, TEST, ADD, SUB, AND, OR, XOR, INC, DEC and all conditional jumps, except if the arithmetic or logic instruction has a rip-relative address or both an address displacement and an immediate operand.	2021-03-31 14:31:50 +03:00
Fraser Cormack	10fc6e4358	[RISCV] Add support for the stepvector intrinsic This adds almost everything required for supporting the new stepvector intrinsic on RVV. It is lowered to the existing VID_VL SDNode. The only exception is a limitation that RV32 cannot yet lower the intrinsic on i64 vectors. This is because the step operand is (currently) required to be at least as large as the vector element type. I will look into patching that out and loosening the requirement to only an integer pointer type. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D99594	2021-03-31 11:41:17 +01:00
Jay Foad	5d0e9ddfa5	[AMDGPU][GlobalISel] Add support for global atomicrmw fadd This includes gfx908 which only has a no-return version of the global_atomic_add_f32 instruction, using the same hack that was previously implemented for selecting from the llvm.amdgcn.global.atomic.fadd intrinsic. Differential Revision: https://reviews.llvm.org/D97767	2021-03-31 11:13:00 +01:00
Florian Hahn	52e015081a	[AArch64] Avoid SCALAR_TO_VECTOR for single FP constant vector. Currently the code only checks for integer constants (ConstantSDNode) and triggers an infinite cycle for single-element floating point vector constants. We need to check for both FP and integer constants. Reviewed By: t.p.northover Differential Revision: https://reviews.llvm.org/D99384	2021-03-31 10:17:36 +01:00
Sander de Smalen	3ccbd4f3c7	NFC: Change getUserCost to return InstructionCost This patch migrates the TTI cost interfaces to return an InstructionCost. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Depends on D97382 Reviewed By: ctetreau, paulwalker-arm Differential Revision: https://reviews.llvm.org/D97466	2021-03-31 10:13:09 +01:00
Lang Hames	0269a407f3	[JITLink] Switch from StringRef to ArrayRef<char>, add some generic x86-64 utils Adds utilities for creating anonymous pointers and jump stubs to x86_64.h. These are used by the GOT and Stubs builder, but may also be used by pass writers who want to create pointer stubs for indirection. This patch also switches the underlying type for LinkGraph content from StringRef to ArrayRef<char>. This avoids any confusion when working with buffers that contain null bytes in the middle like, for example, a newly added null pointer content array. ;)	2021-03-30 21:07:24 -07:00
Chuanqi Xu	eb51dd719f	[Coroutine] [Debug] Insert dbg.declare to entry.resume to print alloca in the coroutine frame under O2 Summary: Try to insert dbg.declare to entry.resume basic block in resume function. In this way, we could print alloca such as __promise in gdb/lldb under O2, which would be beneficial to debug coroutine program. Test Plan: check-llvm Reviewed by: aprantl Differential Revision: https://reviews.llvm.org/D96938	2021-03-31 10:37:06 +08:00
Fangrui Song	3e5ee194c0	[SimpleLoopUnswitch] Fix -Wunused-variable in -DLLVM_ENABLE_ASSERTIONS=off builds after `431a40e1e2`	2021-03-30 19:27:10 -07:00
Craig Topper	5db19cc010	[RISCV] simm12_plus1 should not inherit from Operand. NFC We only use this in Pat patterns, so it just needs to be an ImmLeaf. If we did need it as an instruction operand, the ParserMatchClass, EncoderMethod, and DecoderMethod were probably wrong.	2021-03-30 19:02:11 -07:00
Yang Fan	0d7fd9f0d0	[GlobalISel] Fix Wint-in-bool-context warning (NFC) GCC warning: ``` /llvm-project/llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp: In member function ‘bool llvm::CombinerHelper::matchFunnelShiftToRotate(llvm::MachineInstr&)’: /llvm-project/llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp:3882:35: warning: ?: using integer constants in boolean context, the expression will always evaluate to ‘true’ [-Wint-in-bool-context] 3882 \| Opc == TargetOpcode::G_FSHL ? TargetOpcode::G_ROTL : TargetOpcode::G_ROTR; \| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ```	2021-03-31 09:59:43 +08:00
Craig Topper	05998701b9	[RISCV] Remove some unused ImmLeafs. NFC These got left behind when we switched RV32 to use selectImm to match RV64.	2021-03-30 18:54:11 -07:00
Juneyoung Lee	431a40e1e2	[LoopUnswitch] Assert that branch condition is either and/or but not both as suggested at https://reviews.llvm.org/rG5bb38e84d3d0#986321	2021-03-31 10:35:22 +09:00
Craig Topper	f59ba0849f	[StructLayout] Use TrailingObjects to allocate space for MemberOffsets. MemberOffsets are stored at the end of StructLayout. The class contains a single entry array to mark the start of the member offsets. getStructLayout calculates the additional space needed for additional elements before allocating memory. This patch converts this to use TrailingObjects. This simplifies the size computation in getStructLayout and gets rid of the single entry array. This is prep work, but to use TypeSize instead of uint64_t for D98169. The single entry array doesn't work with TypeSize because TypeSize doesn't have a default constructor. We thought this change was an improvement by itself so we've separated it out. Reviewed By: mehdi_amini Differential Revision: https://reviews.llvm.org/D99608	2021-03-30 17:36:50 -07:00
Heejin Ahn	144ec1c38e	[WebAssembly] Encode numbers in ULEB128 in event section The number of events and the type index should be encoded in ULEB128, but they were incorrectly encoded in LEB128. The smallest number whose LEB128 and ULEB128 encodings differ is 64. There's no way we can generate 64 events in the C++ toolchain implementation so we can't test that, but the attached test tests when the type index is 64. Reviewed By: dschuff Differential Revision: https://reviews.llvm.org/D99627	2021-03-30 16:21:58 -07:00
Wei Mi	d535a05ca1	[ThinLTO] During module importing, close one source module before open another one for distributed mode. Currently during module importing, ThinLTO opens all the source modules, collects functions to be imported and appends them to the destination module, then leaves all the modules open throughout the lto backend pipeline. This patch refactors it so that one source module will be closed before another source module is opened. All the source modules will be closed after the importing phase is done. It will save some amount of memory when there are many source modules to be imported. Note that this patch only changes the distributed thinlto mode. For in process thinlto mode, one source module is shared across different thinlto backend threads so it is not changed in this patch. Differential Revision: https://reviews.llvm.org/D99554	2021-03-30 14:37:29 -07:00
David Green	3a6365a439	[ARM] Add FeatureHasNoBranchPredictor for Thumb1 cores Mark v6m/v8m-baseline cores as having no branch predictors. This should not alter very much on its own, but is more correct as the cores do not have branch predictors and can help in the future.	2021-03-30 21:45:26 +01:00
Fangrui Song	73adc05ced	[GlobalISel] Fix -Wunused-variable in -DLLVM_ENABLE_ASSERTIONS=off builds after D99463	2021-03-30 12:52:56 -07:00
Sanjay Patel	c2ebad8d55	[InstCombine] add fold for demand of low bit of abs() This is one problem shown in https://llvm.org/PR49763 https://alive2.llvm.org/ce/z/cV6-4K https://alive2.llvm.org/ce/z/9_3g-L	2021-03-30 15:14:37 -04:00
Huihui Zhang	d857a81437	[VPlan] Use SetVector for VPExternalDefs to prevent non-determinism. Use SetVector instead of SmallPtrSet for external definitions created for VPlan. Doing this can help avoid non-determinism caused by iterating over unordered containers. This bug was found with reverse iteration turned on, --extra-llvm-cmake-variables="-DLLVM_REVERSE_ITERATION=ON". Failing LLVM-Unit test: VPRecipeTest.dump. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D99544	2021-03-30 12:10:56 -07:00
Amara Emerson	a35c2c7942	[GlobalISel] Implement fewerElements legalization for vector reductions. This patch adds 3 methods, one for power-of-2 vectors which use tree reductions using vector ops, before a final reduction op. For non-pow-2 types it generates multiple narrow reductions and combines the values with scalar ops. Differential Revision: https://reviews.llvm.org/D97163	2021-03-30 11:19:21 -07:00
Amara Emerson	1bc90847ee	[AArch64][GlobalISel] Define some legalization rules for G_ROTR and G_ROTL. For imported pattern purposes, we have a custom rule that promotes the rotate amount to 64b as well. Differential Revision: https://reviews.llvm.org/D99463	2021-03-30 11:11:19 -07:00
Amara Emerson	91887cd4ec	[AArch64][GlobalISel] Combine funnel shifts to rotates. Differential Revision: https://reviews.llvm.org/D99388	2021-03-30 11:00:36 -07:00
Sourabh Singh Tomar	f13f050551	[DebugInfo] Support for signed constants inside DIExpression Negative numbers are represented using DW_OP_consts along with signed representation of the number as the argument. Test case IR is generated using Fortran front-end. Reviewed By: aprantl Differential Revision: https://reviews.llvm.org/D99273	2021-03-30 23:20:38 +05:30
spupyrev	22998738e8	[SamplePGO] Keeping prof metadata for IndirectBrInst Currently prof metadata with branch counts is added only for BranchInst and SwitchInst, but not for IndirectBrInst. As a result, BPI/BFI make incorrect inferences for indirect branches, which can be very hot. This diff adds metadata for IndirectBrInst, in addition to BranchInst and SwitchInst. Reviewed By: wmi, wenlei Differential Revision: https://reviews.llvm.org/D99550	2021-03-30 10:44:48 -07:00
Hongtao Yu	3e3fc431df	[CSSPGO] Top-down processing order based on full profile. Use profiled call edges to augment the top-down order. There are cases where the top-down order computed based on the static call graph doesn't reflect real execution order. For example: 1. Incomplete static call graph due to unknown indirect call targets. Adjusting the order by considering indirect call edges from the profile can enable the inlining of indirect call targets by allowing the caller to be processed before them. 2. Mutual call edges in an SCC. The static processing order computed for an SCC may not reflect the call contexts in the context-sensitive profile, thus may cause potential inlining to be overlooked. The function order in one SCC is being adjusted to a top-down order based on the profile to favor more inlining. 3. Transitive indirect call edges due to inlining. When a callee function is inlined into a caller function in LTO prelink, every call edge originated from the callee will be transferred to the caller. If any of the transferred edges is indirect, the original profiled indirect edge, even if considered, would not enforce a top-down order from the caller to the potential indirect call target in LTO postlink since the inlined callee is gone from the static call graph. 4. #3 can happen even for direct call targets, due to functions defined in header files. Header functions, when included into source files, are defined multiple times but only one definition survives due to ODR. Therefore, the LTO prelink inlining done on those dropped definitions can be useless based on a local file scope. More importantly, the inlinee, once fully inlined to a to-be-dropped inliner, will have no profile to consume when its outlined version is compiled. This can lead to a profile-less prelink compilation for the outlined version of the inlinee function which may be called from external modules. While this isn't easy to fix, we rely on the postlink AutoFDO pipeline to optimize the inlinee. Since the survived copy of the inliner (defined in headers) can be inlined in its local scope in prelink, it may not exist in the merged IR in postlink, and we'll need the profiled call edges to enforce a top-down order for the rest of the functions. Considering those cases, a profiled call graph completely independent of the static call graph is constructed based on profile data, where function objects are not even needed to handle case #3 and case #4. I'm seeing an average 0.4% perf win out of SPEC2017. For certain benchmarks such as Xalancbmk and GCC, the win is bigger, above 2%. The change is an enhancement to https://reviews.llvm.org/D95988. Reviewed By: wmi, wenlei Differential Revision: https://reviews.llvm.org/D99351	2021-03-30 10:42:22 -07:00
Jessica Paquette	700431128e	[GlobalISel][AArch64] Combine G_SEXT_INREG + right shift -> G_SBFX Basically a port of isBitfieldExtractOpFromSExtInReg in AArch64ISelDAGToDAG. This is only done post-legalization for now. Once the legalizer knows how to decompose these back into shifts, this requirement can probably be removed. Differential Revision: https://reviews.llvm.org/D99230	2021-03-30 10:14:30 -07:00
Craig Topper	a33fcafaf0	[RISCV] Pass 'half' in the lower 16 bits of an f32 value when F extension is enabled, but Zfh is not. Without Zfh the half type isn't legal, but it could still be used as an argument/return in IR. Clang will not generate this today. Previously we promoted the half value to float for arguments and returns if the F extension is enabled but Zfh isn't. Then depending on which ABI is enabled we would pass it in either an FPR or a GPR in float format. If the F extension isn't enabled, it would get passed in the lower 16 bits of a GPR in half format. With this patch the value will always be in half format and will be in the lower bits of a GPR or FPR. This should be consistent with where the bits are located when Zfh is enabled. I've based this implementation off of how this is done on ARM. I've manually nan-boxed the value to 32 bits using integer ops. It looks like flw, fsw, fmv.s, fmv.w.x, fmv.x.w won't canonicalize nans so should leave the value alone. I think those are the instructions that could get used on this value. Reviewed By: kito-cheng Differential Revision: https://reviews.llvm.org/D98670	2021-03-30 09:47:54 -07:00
Amara Emerson	f5e9be6fdb	[GlobalISel] Implement lowering for G_ROTR and G_ROTL. This is a straightforward port. Differential Revision: https://reviews.llvm.org/D99449	2021-03-30 09:44:41 -07:00
Tomas Matheson	a9968c0a33	[NFC][CodeGen] Tidy up TargetRegisterInfo stack realignment functions Currently needsStackRealignment returns false if canRealignStack returns false. This means that the behavior of needsStackRealignment does not correspond to its name and description; a function might need stack realignment, but if it is not possible then this function returns false. Furthermore, needsStackRealignment is not virtual and therefore some backends have made use of canRealignStack to indicate whether a function needs stack realignment. This patch attempts to clarify the situation by separating them and introducing new names: - shouldRealignStack - true if there is any reason the stack should be realigned - canRealignStack - true if we are still able to realign the stack (e.g. we can still reserve/have reserved a frame pointer) - hasStackRealignment = shouldRealignStack && canRealignStack (not target customisable) Targets can now override shouldRealignStack to indicate that stack realignment is required. This change will make it easier in a future change to handle the case where we need to realign the stack but can't do so (for example when the register allocator creates an aligned spill after the frame pointer has been eliminated). Differential Revision: https://reviews.llvm.org/D98716 Change-Id: Ib9a4d21728bf9d08a545b4365418d3ffe1af4d87	2021-03-30 17:31:39 +01:00
Craig Topper	f069000b43	[RISCV] Remove floating point condition code legalization from lowerFixedLengthVectorSetccToRVV. After D98939, this is done by LegalizeVectorOps making this code dead. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D99519	2021-03-30 09:11:56 -07:00
Sebastian Neubauer	1c3b74f0ab	[AMDGPU] Remove outdated TODOs. NFC spillSGPRToVGPR is already respected in these places since D95768. Differential Revision: https://reviews.llvm.org/D99570	2021-03-30 15:18:49 +02:00
Krasimir Georgiev	c51e91e046	Revert "[Passes] Add relative lookup table converter pass" This reverts commit `5178ffc7cf`. Compiling `llvm-profdata` with a compiler built from this produces a crashing binary.	2021-03-30 14:13:37 +02:00
Sanjay Patel	e694e19a79	[x86] enhance matching of pmaddwd This was crashing with the example from: https://llvm.org/PR49716 ...and that was avoided with `a283d72583` , but as we can see from the SSE vs. AVX test code diff, we can try harder to match the pattern. This matcher code was adapted from another pmadd pattern match in D49636, but it needs different ops to deal with size mismatches. Differential Revision: https://reviews.llvm.org/D99531	2021-03-30 07:28:33 -04:00
Juneyoung Lee	6b4b1dc6ec	[LoopUnswitch] Simplify branch condition if it is select with constant operands This fixes the miscompilation reported in https://reviews.llvm.org/rG5bb38e84d3d0#986154 . `select _, true, false` matches both m_LogicalAnd and m_LogicalOr, making later transformations confused. Simplify the branch condition to not have the form.	2021-03-30 20:09:42 +09:00
Sander de Smalen	f71ed5dfe2	NFC: Migrate PartialInlining to work on InstructionCost This patch migrates cost values and arithmetic to work on InstructionCost. When the interfaces to TargetTransformInfo are changed, any InstructionCost state will propagate naturally. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D97382	2021-03-30 11:59:45 +01:00
David Green	d4b3380dfe	[ARM] Handle Splats in MVE lane interleaving As another addition to MVE lane interleaving, this handles Splat shuffle vectors, as the shuffle of a splat is a splat. Differential Revision: https://reviews.llvm.org/D97291	2021-03-30 11:19:16 +01:00
David Sherwood	a08c7736a7	[LoopVectorize] Add support for scalable vectorization of induction variables This patch adds support for the vectorization of induction variables when using scalable vectors, which required the following changes: 1. Removed assert from InnerLoopVectorizer::getStepVector. 2. Modified InnerLoopVectorizer::createVectorIntOrFpInductionPHI to use a runtime determined value for VF and removed an assert. 3. Modified InnerLoopVectorizer::buildScalarSteps to work for scalable vectors. I did this by calculating the full vector value for each Part of the unroll factor (UF) and caching this in the VP state. This means that we are always able to extract an arbitrary element from the vector if necessary. In addition to this, I also permitted the caching of the individual lane values themselves for the known minimum number of elements in the same way we do for fixed width vectors. This is a further optimisation that improves the code quality since it avoids unnecessary extractelement operations when extracting the first lane. 4. Added an assert to InnerLoopVectorizer::widenPHIInstruction, since while testing some code paths I noticed this is currently broken for scalable vectors. Various tests to support different cases have been added here: Transforms/LoopVectorize/AArch64/sve-inductions.ll Differential Revision: https://reviews.llvm.org/D98715	2021-03-30 11:13:31 +01:00
Krasimir Georgiev	8e7df996e3	Revert "[loop-idiom] Hoist loop memcpys to loop preheader" This reverts commit `92ddd3c1b6`. Causes multistage clang crashes, e.g.: https://lab.llvm.org/buildbot/#/builders/36/builds/6678	2021-03-30 11:47:12 +02:00
Joe Ellis	a7dde4c5f7	[AArch64][SVE] Lower fixed length INSERT_VECTOR_ELT Differential Revision: https://reviews.llvm.org/D98496	2021-03-30 09:37:11 +00:00
Joe Ellis	c4d39f64d0	[AArch64][SVE] Lower fixed length EXTRACT_VECTOR_ELT Differential Revision: https://reviews.llvm.org/D98625	2021-03-30 09:35:44 +00:00
Bing1 Yu	0c63b862c4	Revert "[X86] Pass to transform tdpbsud&tdpbusd&tdpbuud intrinsics to scalar operation" This reverts commit `275df61f04`.	2021-03-30 16:33:07 +08:00
Sander de Smalen	4ca860742d	[InstructionCost] Don't conflate Invalid costs with Unknown costs. We previously made a change to getUserCost to return an Invalid cost when one of the TTI costs returned '-1' (meaning 'unknown' or 'infinitely expensive'). It makes no sense to say that: shufflevector <2 x i8> %x, <2 x i8> %y, <4 x i32> <i32 0, i32 1, i32 2, i32 3> has an invalid cost. Perhaps the cost is not known, but the IR is valid and can be code-generated. Invalid should only be used for IR that cannot possibly be code-generated and where a cost is nonsensical. With more passes now asserting that the cost must be valid, it is possible that those assertions will fail for perfectly valid IR. An incomplete cost-model probably shouldn't be a reason for the compiler to break. It's better to consider these costs as 'very expensive' and ignore them for other reasons. At some point, we should consider replacing -1 with some other mechanism. Reviewed By: paulwalker-arm, dmgreen Differential Revision: https://reviews.llvm.org/D99502	2021-03-30 09:29:42 +01:00
Bing1 Yu	275df61f04	[X86] Pass to transform tdpbsud&tdpbusd&tdpbuud intrinsics to scalar operation Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D99244	2021-03-30 16:21:10 +08:00
Stefan Gränitz	c352a2b829	[lli] Add option -lljit-platform=Inactive to disable platform support explicitly This option tells LLJIT to disable platform support explicitly: JITDylibs aren't scanned for special init/deinit symbols and no runtime API interposes are injected. It's useful in two cases: for platforms that don't have such requirements and platforms for which we have no explicit support yet and that don't work well with the generic IR platform. Reviewed By: lhames Differential Revision: https://reviews.llvm.org/D99416	2021-03-30 09:29:45 +02:00
Han Zhu	92ddd3c1b6	[loop-idiom] Hoist loop memcpys to loop preheader For a simple loop like: ``` struct S { int x; int y; char b; }; unsigned foo(S* __restrict__ a, S* b, int n) { for (int i = 0; i < n; i++) a[i] = b[i]; return sizeof(a[0]); } ``` We could eliminate the loop and convert it to a large memcpy of 12n bytes. Currently this is not handled. Output of `opt -loop-idiom -S < memcpy_before.ll` ``` %struct.S = type { i32, i32, i8 } define dso_local i32 @_Z3fooP1SS0_i(%struct.S* noalias nocapture %a, %struct.S* nocapture readonly %b, i32 %n) local_unnamed_addr { entry: %cmp7 = icmp sgt i32 %n, 0 br i1 %cmp7, label %for.body.preheader, label %for.cond.cleanup for.body.preheader: ; preds = %entry br label %for.body for.cond.cleanup.loopexit: ; preds = %for.body br label %for.cond.cleanup for.cond.cleanup: ; preds = %for.cond.cleanup.loopexit, %entry ret i32 12 for.body: ; preds = %for.body, %for.body.preheader %i.08 = phi i32 [ %inc, %for.body ], [ 0, %for.body.preheader ] %idxprom = zext i32 %i.08 to i64 %arrayidx = getelementptr inbounds %struct.S, %struct.S* %b, i64 %idxprom %arrayidx2 = getelementptr inbounds %struct.S, %struct.S* %a, i64 %idxprom %0 = bitcast %struct.S* %arrayidx2 to i8* %1 = bitcast %struct.S* %arrayidx to i8* call void @llvm.memcpy.p0i8.p0i8.i64(i8* nonnull align 4 dereferenceable(12) %0, i8* nonnull align 4 dereferenceable(12) %1, i64 12, i1 false) %inc = add nuw nsw i32 %i.08, 1 %cmp = icmp slt i32 %inc, %n br i1 %cmp, label %for.body, label %for.cond.cleanup.loopexit } ; Function Attrs: argmemonly nofree nosync nounwind willreturn declare void @llvm.memcpy.p0i8.p0i8.i64(i8* noalias nocapture writeonly, i8* noalias nocapture readonly, i64, i1 immarg) #0 attributes #0 = { argmemonly nofree nosync nounwind willreturn } ``` The loop idiom pass currently only handles load and store instructions. Since struct S is too big to fit in a register, the loop body contains a memcpy intrinsic.
With this change, re-run `opt -loop-idiom -S < memcpy_before.ll`. The loop memcpy is promoted to loop preheader. For this trivial case, the loop is dead and will be removed by another pass. ``` %struct.S = type { i32, i32, i8 } define dso_local i32 @_Z3fooP1SS0_i(%struct.S* noalias nocapture %a, %struct.S* nocapture readonly %b, i32 %n) local_unnamed_addr { entry: %a1 = bitcast %struct.S* %a to i8* %b2 = bitcast %struct.S* %b to i8* %cmp7 = icmp sgt i32 %n, 0 br i1 %cmp7, label %for.body.preheader, label %for.cond.cleanup for.body.preheader: ; preds = %entry %0 = zext i32 %n to i64 %1 = mul nuw nsw i64 %0, 12 call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 4 %a1, i8* align 4 %b2, i64 %1, i1 false) br label %for.body for.cond.cleanup.loopexit: ; preds = %for.body br label %for.cond.cleanup for.cond.cleanup: ; preds = %for.cond.cleanup.loopexit, %entry ret i32 12 for.body: ; preds = %for.body, %for.body.preheader %i.08 = phi i32 [ %inc, %for.body ], [ 0, %for.body.preheader ] %idxprom = zext i32 %i.08 to i64 %arrayidx = getelementptr inbounds %struct.S, %struct.S* %b, i64 %idxprom %arrayidx2 = getelementptr inbounds %struct.S, %struct.S* %a, i64 %idxprom %2 = bitcast %struct.S* %arrayidx2 to i8* %3 = bitcast %struct.S* %arrayidx to i8* %inc = add nuw nsw i32 %i.08, 1 %cmp = icmp slt i32 %inc, %n br i1 %cmp, label %for.body, label %for.cond.cleanup.loopexit } ; Function Attrs: argmemonly nofree nosync nounwind willreturn declare void @llvm.memcpy.p0i8.p0i8.i64(i8* noalias nocapture writeonly, i8* noalias nocapture readonly, i64, i1 immarg) #0 attributes #0 = { argmemonly nofree nosync nounwind willreturn } ``` Reviewed By: zino Differential Revision: https://reviews.llvm.org/D97667	2021-03-29 23:36:26 -07:00
Han Zhu	2bd4049ceb	Revert "[loop-idiom] Hoist loop memcpys to loop preheader" This reverts commit `deb5095833`. Bad commit message.	2021-03-29 23:35:35 -07:00
Han Zhu	deb5095833	[loop-idiom] Hoist loop memcpys to loop preheader Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: Blame Revision: Differential Revision: https://phabricator.intern.facebook.com/D26380397	2021-03-29 23:14:42 -07:00
Alok Kumar Sharma	9fb0025f70	[DebugInfo] Upgrade DISubrange::count to accept DIExpression also This is needed for Fortran assumed shape arrays whose dimensions are defined as, - 'count' is taken from array descriptor passed as parameter by caller, access from descriptor is defined by type DIExpression. - 'lowerBound' is defined by callee. The current alternative uses upperBound in place of count, where upperBound is calculated in the callee in a temporary variable using lowerBound and count. Representation with count (DIExpression) is not only clearer than upperBound (DIVariable), but also has the advantage that count, being a parameter, has a better chance of surviving at higher optimization levels than upperBound, which is a local variable. Reviewed By: aprantl Differential Revision: https://reviews.llvm.org/D99335	2021-03-30 09:16:55 +05:30
Rahman Lavaee	90c401cab6	[Propeller] Do not generate the BB address map for empty functions. Empty functions (functions with no real code) are irrelevant for propeller optimizations and their addresses sometimes conflict with other functions which obfuscates the analysis. This simple change skips the BB address map emission for such functions. Reviewed By: tmsriram Differential Revision: https://reviews.llvm.org/D99395	2021-03-29 20:15:01 -07:00
Jun Ma	65462a08bf	[NFC][SVE] Remove redundant pattern	2021-03-30 10:35:08 +08:00
Jun Ma	1af373c673	[AArch64][SVE] Codegen dup_lane for dup(vector_extract) Differential Revision: https://reviews.llvm.org/D99324	2021-03-30 10:35:08 +08:00
Jun Ma	b0db2dbc29	[AArch64][SVEIntrinsicOpts] Optimize tbl+dup into dup+extractelement Differential Revision: https://reviews.llvm.org/D99412	2021-03-30 10:35:08 +08:00
Evandro Menezes	fd94cfeeb5	[RISCV] Move scheduling resources for B into a separate file (NFC) Differential Revision: https://reviews.llvm.org/D99557	2021-03-29 20:37:22 -05:00
Adrian Prantl	8573c28a51	Add debug support for set types This commit adds debugging support for set types defined in languages such as Pascal and Modula-2. Patch by Peter McKinna! Differential Revision: https://reviews.llvm.org/D76115	2021-03-29 18:04:48 -07:00
Thomas Lively	a1b8b0739a	[WebAssembly] Fix i8x16.popcnt opcode When I updated the SIMD opcodes in `f5764a8654`, I accidentally missed updating i8x16.popcnt. This patch fixes the omission. Differential Revision: https://reviews.llvm.org/D99536	2021-03-29 17:23:15 -07:00
Huihui Zhang	ca721042f1	[IPO][SampleContextTracker] Use SmallVector to track context profiles to prevent non-determinism. Use SmallVector instead of SmallSet to track the context profiles mapped. Doing this can help avoid non-determinism caused by iterating over unordered containers. This bug was found with reverse iteration turned on, --extra-llvm-cmake-variables="-DLLVM_REVERSE_ITERATION=ON". Failing LLVM test: profile-context-tracker-debug.ll. Reviewed By: MaskRay, wenlei Differential Revision: https://reviews.llvm.org/D99547	2021-03-29 16:37:10 -07:00
Jonas Devlieghere	e0577b3130	[dsymutil] Relocate DW_TAG_label dsymutil is not relocating the DW_AT_low_pc for a DW_TAG_label. This patch fixes that and adds a test. Differential revision: https://reviews.llvm.org/D99534	2021-03-29 15:45:48 -07:00
Gulfem Savrun Yeniceri	5178ffc7cf	[Passes] Add relative lookup table converter pass Lookup tables generate non PIC-friendly code, which requires dynamic relocation as described in: https://bugs.llvm.org/show_bug.cgi?id=45244 This patch adds a new pass that converts lookup tables to relative lookup tables to make them PIC-friendly. Differential Revision: https://reviews.llvm.org/D94355	2021-03-29 21:53:32 +00:00
Florian Hahn	482283042f	[AArch64] Remove custom zext/sext legalization code. Currently performExtendCombine assumes that the src-element bitwidth * 2 is a valid MVT. But this is not the case for i1 and it causes a crash on the v64i1 test cases added in this patch. It turns out that this code appears to not be needed; the same patterns are handled by other code and we end up with the same results, even without the custom lowering. I also added additional test cases in `a50037aaa6`. Let's just remove the unneeded code. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D99437	2021-03-29 22:22:05 +01:00
Nikita Popov	7669455df4	[X86][FastISel] Fix with.overflow eflags clobber (PR49587) If the successor block has a phi node, then additional moves may be inserted into predecessors, which may clobber eflags. Don't try to fold the with.overflow result into the branch in that case. This is done by explicitly checking for any phis in successor blocks, not sure if there's some more principled way to address this. Other fused compare and branch patterns avoid the issue by emitting the comparison when handling the branch, so that no instructions may be inserted in between. In this case, the with.overflow call is emitted separately (and I don't think this is avoidable, as it will generally have at least two users). Fixes https://bugs.llvm.org/show_bug.cgi?id=49587. Differential Revision: https://reviews.llvm.org/D98600	2021-03-29 23:08:47 +02:00
Stanislav Mekhanoshin	619b88849e	[AMDGPU] Fix "Sequence" spelling. NFC.	2021-03-29 12:11:36 -07:00
Joe Nash	45fd7c02af	Revert "[AMDGPU] Mark additional VOP3 as commutable" This reverts commit `d35d8da7d6`.	2021-03-29 14:48:11 -04:00
Joe Nash	d35d8da7d6	[AMDGPU] Mark additional VOP3 as commutable Note, only src0 and src1 will be commuted if the isCommutable flag is set. This patch does not change that, it just makes it possible to commute src0 and src1 of more instructions. Reviewed By: foad, rampitec Differential Revision: https://reviews.llvm.org/D99376 Change-Id: I61e20490962d95ea429beb355c55f55c024dafdc	2021-03-29 14:22:20 -04:00
Roger Ferrer Ibanez	489ca73ac4	[PrologEpilogInserter][AMDGPU] Only adjust offset for emergency spill slots if the stack grows down D89239 adjusts the stack offset of emergency spill slots for overaligned stacks. However the adjustment is not valid for targets whose stack grows up (such as AMDGPU). This change makes the adjustment conditional only to those targets whose stack grows down. Fixes https://bugs.llvm.org/show_bug.cgi?id=49686 Differential Revision: https://reviews.llvm.org/D99504	2021-03-29 17:26:58 +00:00
Craig Topper	3dd4aa7d09	[RISCV] When custom iseling masked loads/stores, copy the mask into V0 instead of virtual register. This matches what we do in our isel patterns. In our internal testing we've found this is needed to make the fast register allocator happy at -O0. Otherwise it may assign V0 to an earlier operand and find itself with no registers left when it reaches the mask operand. By using V0 explicitly, the fast register allocator will see it when it checks for phys register usages before it starts allocating vregs. I'll try to update this with a test case. Unfortunately, this does appear to prevent some instruction reordering by the pre-RA scheduler which leads to the increased spills seen in some tests. I suspect that problem could already occur for other instructions that already used V0 directly. There's a lot of repeated code here that could do with some wrapper functions. Not sure if that should be at the level of the new code that deals with V0. That would require multiple output parameters to pass the glue, chain and register back. Maybe it should be at a higher level over the entire set of push_backs. Reviewed By: frasercrmck, HsiangKai Differential Revision: https://reviews.llvm.org/D99367	2021-03-29 10:20:43 -07:00
Craig Topper	54bacaf311	[X86] Always use rip-relative addressing on 64-bit when rematerializing all zeros/ones registers using a folded load. Previously we only used RIP relative when PIC was enabled. But we know we're in small/kernel code model here so we should be able to always use RIP-relative which will give a smaller encoding. Here's a godbolt link that demonstrates the current codegen https://godbolt.org/z/j3158o Note in the non-PIC version the load from .LCPI0_0 doesn't use RIP-relative addressing, but if you change the constant in the source from 0.0 to 1.0 it will become RIP-relative. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D97208	2021-03-29 10:06:17 -07:00
Roger Ferrer Ibanez	ef76a333fa	[RISCV] Fix offset computation for RVV In D97111 we changed the RVV frame layout when using sp or bp to address the stack slots so we could address the emergency stack slot. The idea is to put the RVV objects as far as possible (in offset terms) from the frame reference register (sp / fp / bp). When using fp this happens naturally because the RVV objects are already the top of the stack and due to the constraints of RVV (VLENB being a power of two >= 128) the stack remains aligned. The rest of this summary does not apply to this case. When using sp / bp we need to skip the non-RVV stack slots. The size of the non-RVV objects is computed by subtracting the callee saved register size (whose computation is added in D97111 itself) from the total size of the stack (which does not account for RVV stack slots). However, when doing so we round to 16 bytes when computing that size and we end up emitting a smaller offset that may belong to a scalar stack slot (see D98801). So this change removes that rounding. Also, because we want the RVV objects to be between the non-RVV stack slots and the callee-saved register slots, we need to make sure the RVV objects are properly aligned to 8 bytes. Adding a padding of 8 would render the stack unaligned. So when allocating space for RVV (only when we don't use fp) we need to have extra padding that preserves the stack alignment. This way we can round to 8 bytes the offset that skips the non-RVV objects and we do not misalign the whole stack in the way. In some circumstances this means that the RVV objects may have padding before (=lower offsets from sp/bp) and after (before the CSR stack slots). Differential Revision: https://reviews.llvm.org/D98802	2021-03-29 17:03:49 +00:00
Wenlei He	30b0232336	[CSSPGO][llvm-profgen] Context-sensitive global pre-inliner This change sets up a framework in llvm-profgen to estimate inline decisions and adjust context-sensitive profile based on that. We call it a global pre-inliner in llvm-profgen. It will serve two purposes: 1) Since context profile for not inlined context will be merged into base profile, if we estimate a context will not be inlined, we can merge the context profile in the output to save profile size. 2) For thinLTO, when a context involving functions from different modules is not inlined, we can't merge function profiles across modules, leading to suboptimal post-inline count quality. By estimating some inline decisions, we would be able to adjust/merge context profiles beforehand as a mitigation. Compiler inline heuristic uses inline cost which is not available in llvm-profgen. But since inline cost is closely related to size, we could get an estimate through function size from debug info. Because the size we have in llvm-profgen is the final size, it could also be more accurate than the inline cost estimation in the compiler. This change only has the framework, with a few TODOs left for follow up patches for a complete implementation: 1) We need to retrieve size for function/inlinee from debug info for inlining estimation. Currently we use number of samples in a profile as a placeholder for size estimation. 2) Currently the thresholds are using the values used by sample loader inliner. But they need to be tuned since the size here is fully optimized machine code size, instead of inline cost based on not yet fully optimized IR. Differential Revision: https://reviews.llvm.org/D99146	2021-03-29 09:46:14 -07:00
Wei Mi	3cbf44190b	[SampleFDO] Do not scale the magic number NOMORE_ICP_MAGICNUM in value profile during profile update. When we inline a function and update the profile, the value profiles of the indirect call in the inliner and inlinee will be scaled. In https://reviews.llvm.org/D96806 and https://reviews.llvm.org/D97350, we start using the magic number NOMORE_ICP_MAGICNUM (-1) to mark targets which have been promoted. The magic number shouldn't be scaled during the profile update. Although the problem has been suppressed by https://reviews.llvm.org/D98187 for SampleFDO, which stops profile update for inlining in sampleFDO, the patch is still wanted since it will be more consistent to handle the magic number properly in profile update. Differential Revision: https://reviews.llvm.org/D99394	2021-03-29 09:34:37 -07:00
Florian Hahn	c773d0f973	Recommit "[LV] Move runtime pointer size check to LVP::plan()." Re-apply `25fbe803d4`, with a small update to emit the right remark class. Original message: [LV] Move runtime pointer size check to LVP::plan(). This removes the need for the remaining doesNotMeet check and instead directly checks if there are too many runtime checks for vectorization in the planner. A subsequent patch will adjust the logic used to decide whether to vectorize with runtime checks to consider their cost more accurately. Reviewed By: lebedev.ri	2021-03-29 16:14:27 +01:00
Bradley Smith	9745dce8c3	[SelectionDAG][AArch64][SVE] Perform SETCC condition legalization in LegalizeVectorOps This is currently performed in SelectionDAGLegalize, here we make it also happen in LegalizeVectorOps, allowing a target to lower the SETCC condition codes first in LegalizeVectorOps and then lower to a custom node afterwards, without having to duplicate all of the SETCC condition legalization in the target specific lowering. As a result of this, fixed length floating point SETCC nodes can now be properly lowered for SVE. Differential Revision: https://reviews.llvm.org/D98939	2021-03-29 15:32:25 +01:00
Florian Hahn	485c8ce733	Revert "[LV] Move runtime pointer size check to LVP::plan()." This reverts commit `25fbe803d4`. This breaks a clang test which filters for the wrong remark type.	2021-03-29 14:41:53 +01:00
Sanjay Patel	da381cf7ce	[SLP] allow matching integer min/max intrinsics as reduction ops This is a 2nd try of: `3c8473ba53` which was reverted at: `a26312f9d4` because of crashing. This version includes extra code and tests to avoid the known crashing examples as discussed in PR49730. Original commit message: As noted in D98152, we need to patch SLP to avoid regressions when we start canonicalizing to integer min/max intrinsics. Most of the real work to make this possible was in: `7202f47508` Differential Revision: https://reviews.llvm.org/D98981	2021-03-29 09:38:18 -04:00
Paul C. Anagnostopoulos	5f473a04af	[TableGen] Add support for the 'assert' statement in class definitions. Differential Revision: https://reviews.llvm.org/D99275	2021-03-29 09:20:29 -04:00
Florian Hahn	25fbe803d4	[LV] Move runtime pointer size check to LVP::plan(). This removes the need for the remaining doesNotMeet check and instead directly checks if there are too many runtime checks for vectorization in the planner. A subsequent patch will adjust the logic used to decide whether to vectorize with runtime checks to consider their cost more accurately. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D98634	2021-03-29 14:12:29 +01:00
Matt Arsenault	9a0c9402fa	Reapply "OpaquePtr: Turn inalloca into a type attribute" This reverts commit `07e46367ba`.	2021-03-29 08:55:30 -04:00
Jingu Kang	e4abb64100	[LoopUnswitch] Use reference variables instead of pointer one Differential Revision: https://reviews.llvm.org/D99496	2021-03-29 13:08:46 +01:00
Hans Wennborg	c6e5c4654b	Don't use $ as suffix for symbol names in ThinLTOBitcodeWriter and other places Using $ breaks demangling of the symbols. For example, $ c++filt _Z3foov\$123 _Z3foov$123 This causes problems for developers who would like to see nice stack traces etc., but also for automatic crash tracking systems which try to organize crashes based on the stack traces. Instead, use the period as suffix separator, since Itanium demanglers normally ignore such suffixes: $ c++filt _Z3foov.123 foo() [clone .123] This is already done in some places; try to do it everywhere. Differential revision: https://reviews.llvm.org/D97484	2021-03-29 13:03:52 +02:00
Oliver Stannard	07e46367ba	Revert "Reapply "OpaquePtr: Turn inalloca into a type attribute"" Reverting because test 'Bindings/Go/go.test' is failing on most buildbots. This reverts commit `fc9df30991`.	2021-03-29 11:32:22 +01:00
Simon Pilgrim	805148eaf2	[X86][SSE] combineHorizOpWithShuffle - consistently use getTargetShuffleInputs to decode shuffles Minor cleanup before I start trying to merge the unary/binary shuffle combining paths.	2021-03-29 11:31:19 +01:00
Nashe Mncube	19601a4c6c	[SVE][Analysis]Instruction costs for ops on scalable-vec The following operations have no associated cost for them when applied to scalable vectors, and as a consequence can trigger a crash when a call is made to AArch64TTIImpl::getCastInstrCost(): - fptrunc - trunc - fpext - fpto(u,s)i This patch adds costs for these operations and relevant regression tests. Differential Revision: https://reviews.llvm.org/D98934	2021-03-29 11:15:50 +01:00
Jingu Kang	cfe87d4edd	[NFC][LoopUnswitch] Move hasPartialIVCondition to LoopUtils Differential revision: https://reviews.llvm.org/D99490	2021-03-29 10:29:45 +01:00
David Green	3a68c6d26c	[ARM] Extend MVE lane interleaving to handle other non-instruction leaves This extends the recent MVE lane interleaving pass to handle other non-instruction leaves, for which a new shuffle is added. This helps especially for constants and potentially for arguments. Differential Revision: https://reviews.llvm.org/D97289	2021-03-29 09:05:45 +01:00
Lang Hames	666df2e2cb	[ORC][C-bindings] Fix some ORC C bindings function names and signatures. LLVMOrcDisposeObjectLayer and LLVMOrcExecutionSessionGetJITDylibByName did not have matching signatures between the C-API header and binding implementations. Fixes http://llvm.org/PR49745. Patch by Mats Larsen. Thanks Mats! Reviewed by: lhames Differential Revision: https://reviews.llvm.org/D99478	2021-03-28 16:30:47 -07:00
David Green	6c88ffeda3	[ARM] Fix the Changed value in the MVE lane interleaving pass.	2021-03-28 23:47:53 +01:00
Nikita Popov	ce066da81c	[BasicAA] Make sure types match in constant offset heuristic This can only happen if offset types that are larger than the pointer size are involved. The previous implementation did not assert in this case because it initialized the APInts to the width of one of the variables -- though I strongly suspect it did not compute correct results in this case. Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=32621 reported by fhahn.	2021-03-28 21:38:09 +02:00
Craig Topper	69bdf35dc7	[X86] Optimize vXi8 MULHS on targets where we can't sign_extend to the next register size. For these cases we need to extract the upper or lower elements, multiply them using 16-bit multiplies and repack them. Previously we used punpcklbw/punpckhbw+psraw or pmovsxbw+pshufd to extract and sign extend so we could use pmullw to compute the 16-bit product and then shift down the high bits. We can avoid the need to sign extend if we unpack the bytes into the high byte of each word and fill the lower byte with 0 using pxor. This puts the sign bit of each byte into the sign bit of each word. Since the LHS and RHS have 8 trailing zeros, the full 32-bit product of those 16-bit values will have 16 trailing zeros. This means the 16-bit product of the original bytes is in the upper 16 bits which we can calculate using pmulhw. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D98587	2021-03-28 11:41:29 -07:00
David Green	7b6f760fcd	[ARM] MVE vector lane interleaving MVE does not have a single sext/zext or trunc instruction that takes the bottom half of a vector and extends to a full width, like NEON has with MOVL. Instead it is expected that this happens through top/bottom instructions. So the MVE equivalent VMOVLT/B instructions take either the even or odd elements of the input and extend them to the larger type, producing a vector with half the number of elements each of double the bitwidth. As there is no simple instruction for a normal extend, we often have to expand sext/zext/trunc into a series of lane moves (or stack loads/stores, which we do not do yet). This pass takes vector code that starts at truncs, looks for interconnected blobs of operations that end with sext/zext and transforms them by adding shuffles so that the lanes are interleaved and the MVE VMOVL/VMOVN instructions can be used. This is done pre-ISel so that it can work across basic blocks. This initial version of the pass just handles a limited set of instructions, not handling constants or splats or FP, which can all come as extensions to this base. Differential Revision: https://reviews.llvm.org/D95804	2021-03-28 19:34:58 +01:00
Matt Arsenault	fc9df30991	Reapply "OpaquePtr: Turn inalloca into a type attribute" This reverts commit `20d5c42e0e`.	2021-03-28 13:35:21 -04:00
Sanjay Patel	01ae6e5ead	[InstCombine] sink min/max intrinsics with common op after select This is another step towards parity with cmp+select min/max idioms. See D98152.	2021-03-28 13:13:04 -04:00
Nico Weber	20d5c42e0e	Revert "OpaquePtr: Turn inalloca into a type attribute" This reverts commit `4fefed6563`. Broke check-clang everywhere.	2021-03-28 13:02:52 -04:00
Matt Arsenault	4fefed6563	OpaquePtr: Turn inalloca into a type attribute I think byval/sret and the others are close to being able to rip out the code to support the missing type case. A lot of this code is shared with inalloca, so catch this up to the others so that can happen.	2021-03-28 11:12:23 -04:00
Florian Hahn	8c6c357897	[LV] Mark a few more cost-model members as const (NFC).	2021-03-28 14:59:48 +01:00
Nikita Popov	3df3f3df45	[BasicAA] Handle gep with unknown sizes earlier (NFCI) If the sizes of both memory locations are unknown, we can only perform a check on the underlying objects. There's no point in going through GEP decomposition in this case.	2021-03-28 15:48:49 +02:00
Florian Hahn	eb3d9f2eb6	[SelDag] Add isIntOrFPConstant helper function. This patch adds a new isIntOrFPConstant helper function to check if a SDValue is an integer or FP constant. This pattern is used in various places. There also are places that incorrectly just check for integer constants, e.g. D99384, so hopefully this helper will help people avoid that issue. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D99428	2021-03-28 12:48:58 +01:00
Hsiangkai Wang	bc82e9bf25	[RISCV] Add vfabs.v pseudo instruction. Differential Revision: https://reviews.llvm.org/D99454	2021-03-28 10:24:05 +08:00
Craig Topper	5692fc38e0	[RISCV] Add a pattern for (sext_inreg (mul (and X, 0xffffffff), (and Y, 0xffffffff)), i32) to suppress MULW formation We have a special pattern for (mul (and X, 0xffffffff), (and Y, 0xffffffff)), to optimize the ANDs to shift. But if a sext_inreg comes first, we'll form a MULW and limit the effectiveness of the special match. So this patch adds a larger pattern to suppress the MULW formation by emitting a sext.w and then the same output we use for the (mul (and X, 0xffffffff), (and Y, 0xffffffff)). This should all get CSEd. This is the issue I was trying to fix with D99029, but that affected many more tests.	2021-03-27 15:37:18 -07:00
Nikita Popov	9075864b73	[BasicAA] Refactor linear expression decomposition The current linear expression decomposition handles zext/sext by decomposing the casted operand, and then checking NUW/NSW flags to determine whether the extension can be distributed. This has some disadvantages: First, it is not possible to perform a partial decomposition. If we have zext((x + C1) +<nuw> C2) then we will fail to decompose the expression entirely, even though it would be safe and profitable to decompose it to zext(x + C1) +<nuw> zext(C2) Second, we may end up performing unnecessary decompositions, which will later be discarded because they lack nowrap flags necessary for extensions. Third, correctness of the code is not entirely obvious: At a high level, we encounter zext(x -<nuw> C) in the form of a zext on the linear expression x + (-C) with nuw flag set. Notably, this case must be treated as zext(x) + -zext(C) rather than zext(x) + zext(-C). The code handles this correctly by speculatively zexting constants to the final bitwidth, and performing additional fixup if the actual extension turns out to be an sext. This was not immediately obvious to me. This patch inverts the approach: An ExtendedValue represents a zext(sext(V)), and linear expression decomposition will try to decompose V further, either by absorbing another sext/zext into the ExtendedValue, or by distributing zext(sext(x op C)) over a binary operator with appropriate nsw/nuw flags. At each step we can determine whether distribution is legal and abort with a partial decomposition if not. We also know which extensions we need to apply to constants, and don't need to speculate or fixup.	2021-03-27 23:31:58 +01:00
Florian Hahn	d2855eba81	[LV] Fix formatting from `2f9d68c3f1`.	2021-03-27 21:29:56 +00:00
Florian Hahn	2f9d68c3f1	[LV] Mark some methods as const (NFC). Mark a few methods as const, as they do not modify any state.	2021-03-27 21:27:53 +00:00
Simon Pilgrim	2a0d5da917	[X86][SSE] foldShuffleOfHorizOp - remove broadcast handling. Remove VBROADCAST/MOVDDUP/splat-shuffle handling from foldShuffleOfHorizOp This can all be handled by canonicalizeShuffleMaskWithHorizOp as long as we check that the HADD/SUB are only used once (to prevent infinite loops on slow-horizop targets which will try to reuse the nodes again followed by a post-hop shuffle).	2021-03-27 15:09:23 +00:00
Nikita Popov	b981bc30bf	[BasicAA] Correctly handle implicit sext in decomposition While explicit sext instructions were handled correctly, the implicit sext that occurs if the offset is smaller than the pointer size blindly assumed that sext(X * Scale + Offset) is the same as sext(X) * Scale + Offset, which is obviously not correct. Fix this by extracting the code that handles linear expression extension and reusing it for the implicit sext as well.	2021-03-27 15:15:47 +01:00
Nikita Popov	60f3e8fbe4	[BasicAA] Clarify entry values of GetLinearExpression() (NFC) A number of variables need to be correctly initialized on entry to GetLinearExpression() for the implementation to behave reasonably. The fact that SExtBits can currently be non-zero on entry is a bug, as demonstrated by the added test: For implicit sexts by the GEP, we do currently skip legality checks.	2021-03-27 14:50:09 +01:00
Nikita Popov	ad9dad93ff	[BasicAA] Bail out earlier for invalid shift amount Currently, we'd produce an incorrect decomposition, because we already recursively called GetLinearExpression(), so the Scale=1, Offset=0 will not necessarily be relative to the shl itself. Now, this doesn't actually matter for functional correctness, because such a shift is poison anyway, so it's okay to return an incorrect decomposition. It's still unnecessarily confusing though, and we can easily avoid this by checking the bitwidth earlier.	2021-03-27 12:41:16 +01:00
Nikita Popov	5a5a8088cc	[BasicAA] Retain shl nowrap flags in GetLinearExpression() Nowrap flags between mul and shl differ in that mul nsw allows multiplication of 1 * INT_MIN, while shl nsw does not. This means that it is always fine to transfer shl nowrap flags to muls, but not necessarily the other way around. In this case the NUW/NSW results refer to mul/add operations, so it's fine to retain the flags from the shl.	2021-03-27 12:26:22 +01:00
Simon Pilgrim	41146bfe82	[X86][SSE] combineX86ShuffleChain - attempt to recognise 'hidden' identity shuffles See if the combined shuffle mask is equivalent to an identity shuffle, typically this is due to repeated LHS/RHS ops in horiz-ops, but isTargetShuffleEquivalent might see other patterns as well. This is another small step towards getting rid of foldShuffleOfHorizOp and relying on canonicalizeShuffleMaskWithHorizOp and generic shuffle combining.	2021-03-27 11:09:30 +00:00
Juneyoung Lee	05884d3b52	Make FoldBranchToCommonDest poison-safe by default This is a small patch to make FoldBranchToCommonDest poison-safe by default. After `fc3f0c9c`, only two syntactic changes are needed to fix unit tests. This does not cause any assembly difference in testsuite as well (-O3, X86-64 Manjaro). Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D99452	2021-03-27 19:05:12 +09:00
Sanjay Patel	a283d72583	[x86] prevent crashing while matching pmaddwd This could crash in 2 ways: either one or both of the input vectors could be a different size than the math ops. https://llvm.org/PR49716	2021-03-27 05:27:14 -04:00
Juneyoung Lee	fc3f0c9cc0	[IRCE] Use m_LogicalAnd This is a minor fix to use m_LogicalAnd. This allows IRCE to recognize select form of and conditions as well.	2021-03-27 15:23:18 +09:00
Craig Topper	4d5ee71b52	[RISCV] Merge FMulAdd and FMulSub scheduler classes to a single FMA scheduler class. NFC It's unlikely that FMADD and FMSUB would have different scheduling information so merge them. Reviewed By: HsiangKai Differential Revision: https://reviews.llvm.org/D99140	2021-03-26 16:37:20 -07:00
Hongtao Yu	12ac0403b1	[CSSPGO][NFC] Fix a debug dump issue. During context promotion, intermediate nodes that are on a call path but do not come with a profile can be promoted together with their parent nodes. Do not print sample context string for such nodes since they do not have profile. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D99441	2021-03-26 16:06:56 -07:00
Chris Lattner	62c41cfba1	Add a missing file header comment, NFC.	2021-03-26 15:34:04 -07:00
Craig Topper	c41f2f6492	[RISCV] Add scheduler classes for the Zba and Zbb extensions. I've used IALU for the simplest operations from Zbb: min, minu, max, maxu, sext.b, sext.h, zext.h, andn, orn, xnor I've put add.uw in IALU32 and slli.uw in ShiftImm32. Remaining instructions have received new classes. All 3 shadd are grouped together. shadd.uw are grouped together. Rotate left and right are together. Everything else got their own class containing one instruction. I think what I have here is the minimum granularity we need. I could be convinced that we need more classes. Reviewed By: evandro Differential Revision: https://reviews.llvm.org/D99040	2021-03-26 14:15:29 -07:00
Nikita Popov	4622648a06	Revert "[ArgPromotion] Copy additional metadata for loads." This reverts commit `166620a4f0`. A miscompile has been reported in https://reviews.llvm.org/D93927#2653480 and following.	2021-03-26 21:34:54 +01:00
Florian Hahn	4858e081d7	[ConstraintElimination] Only strip casts preserving the representation. Things like addrspacecast may not be no-ops, so we should not look through them.	2021-03-26 20:07:41 +00:00
Nikita Popov	fd7df0cf38	[ValueTracking] Handle shl pair in isKnownNonEqual() Handle (x << s) != (y << s) where x != y and the shifts are non-wrapping. Once again, this establishes parity with the corresponding mul fold that already exists. The shift case is more powerful because we don't need to guard against multiplies by zero.	2021-03-26 20:21:05 +01:00
Nikita Popov	9666e89d57	[ValueTracking] Handle shl in isKnownNonEqual() This handles the pattern X != X << C for non-zero X and C and a non-overflowing shift. This establishes parity with the corresponding fold for multiplies.	2021-03-26 20:21:05 +01:00
Sanjay Patel	b0797e0c12	[SLP] use dyn_cast instead of isa + cast; NFC	2021-03-26 13:52:31 -04:00
Nikita Popov	caf92a8a92	[ValueTracking] Handle non-zero shl recurrence In this case we don't care about the step at all, and only require that the starting value is non-zero.	2021-03-26 18:39:06 +01:00
Nikita Popov	938d05b814	[ValueTracking] Handle non-zero add/mul recurrences more precisely This is mainly for clarity: It doesn't make sense to do any negative/positive checks when dealing with a nuw add/mul. These only make sense for nsw add/mul.	2021-03-26 18:30:07 +01:00
Simon Pilgrim	c769ba9514	[X86][AVX] combineHorizOpWithShuffle - improve SHUFFLE(HOP(LOSUBVECTOR(X),HISUBVECTOR(X))) folding Peek through bitcasts to find subvector splits and use getTargetShuffleInputs to decode target shuffles as well as ShuffleVectorSDNode	2021-03-26 17:23:54 +00:00
Jay Foad	9d08f276d7	[AMDGPU] Use reductions instead of scans in the atomic optimizer If the result of an atomic operation is not used then it can be more efficient to build a reduction across all lanes instead of a scan. Do this for GFX10, where the permlanex16 instruction makes it viable. For wave64 this saves a couple of dpp operations. For wave32 it saves one readlane (which are generally bad for performance) and one dpp operation. Differential Revision: https://reviews.llvm.org/D98953	2021-03-26 15:38:14 +00:00
Zakk Chen	9049cf77e3	[RISCV] Add constraint for RVV indexed loads. Add the constraint when the destination EEW does not equal the source EEW, for correctness. The RVV spec has three register overlap rules and I implement the first, stricter constraint because the others are difficult to enforce. Reviewed By: frasercrmck, craig.topper Differential Revision: https://reviews.llvm.org/D98920	2021-03-26 07:23:24 -07:00
Sanjay Patel	a26312f9d4	Revert "[SLP] allow matching integer min/max intrinsics as reduction ops" This reverts commit `3c8473ba53` and includes test diffs to maintain testing status. There's at least 1 place that was not updated with `7202f47508` , so we can crash mismatching select and intrinsics as shown in PR49730.	2021-03-26 09:59:14 -04:00
David Sherwood	c39460cc4f	Revert "[LoopVectorize] Simplify scalar cost calculation in getInstructionCost" This reverts commit `240aa96cf2`.	2021-03-26 11:36:53 +00:00
David Sherwood	240aa96cf2	[LoopVectorize] Simplify scalar cost calculation in getInstructionCost This patch simplifies the calculation of certain costs in getInstructionCost when isScalarAfterVectorization() returns a true value. There are a few places where we multiply a cost by a number N, i.e. unsigned N = isScalarAfterVectorization(I, VF) ? VF.getKnownMinValue() : 1; return N * TTI.getArithmeticInstrCost(... After some investigation it seems that there are only these cases that occur in practice: 1. VF is a scalar, in which case N = 1. 2. VF is a vector. We can only get here if: a) the instruction is a GEP/bitcast with scalar uses, or b) this is an update to an induction variable that remains scalar. I have changed the code so that N is assumed to always be 1. For GEPs the cost is always 0, since this is calculated later on as part of the load/store cost. For all other cases I have added an assert that none of the users needs scalarising, which didn't fire in any unit tests. Only one test required fixing and I believe the original cost for the scalar add instruction to have been wrong, since only one copy remains after vectorisation. Differential Revision: https://reviews.llvm.org/D98512	2021-03-26 11:27:12 +00:00
Abhina Sreeskantharajan	bc5d4bcc2d	[Windows] Turn off text mode in TableGen and Rewriter to stop CRLF translation This patch should fix the errors shown on the Windows bots by turning off text mode. I plan to investigate a better fix but this should unblock the buildbots for now. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D99363	2021-03-26 07:12:46 -04:00
Jay Foad	d92b4956d6	[AMDGPU] Inline FSHRPattern into its only use. NFC.	2021-03-26 09:32:02 +00:00
Craig Topper	8f62a80328	[RISCV] Optimize (and (shl GPR:, uimm5:), 0xffffffff) to use 2 shifts instead of 3. The and would normally become SLLI+SRLI, giving us 2 SLLI+SRLI. We can detect this and combine the 2 SLLIs into 1.	2021-03-25 23:31:01 -07:00
Craig Topper	5a18c576c4	[RISCV] Don't call CheckAndMask from selectZExti32. Now that targetShrinkDemandedConstant preserves 0xffffffff masks we shouldn't need to call computeKnownBits here.	2021-03-25 22:07:41 -07:00
Kazu Hirata	9d375a40c3	Reapply [InlineCost] Enable the cost benefit analysis on FDO This patch enables the cost-benefit-analysis-based inliner by default if we have instrumentation profile. - SPEC CPU 2017 shows a 0.4% improvement. - An internal large benchmark shows a 0.9% reduction in the cycle count along with 14.6% reduction in the number of call instructions executed. Differential Revision: https://reviews.llvm.org/D98213	2021-03-25 21:51:38 -07:00
Kazu Hirata	3c775d93a1	[InlineCost] Reject a zero entry count This patch teaches the cost-benefit-analysis-based inliner to reject a zero entry count so that we don't trigger a divide-by-zero.	2021-03-25 21:51:36 -07:00
Wenlei He	5f59f407f5	[CSSPGO] Minor tweak for inline candidate priority tie breaker When prioritizing call sites to consider for inlining in the sample loader, use the number of samples as a first tie breaker before using name/guid comparison. This would favor smaller functions when hotness is the same (from the same block). We could try to retrieve accurate function size if this turns out to be more important. Differential Revision: https://reviews.llvm.org/D99370	2021-03-25 21:15:36 -07:00
Lang Hames	19e402d2b3	[JITLink][MachO] Use full <segment>,<section> names for MachO jitlink::Sections. JITLink now requires section names to be unique. In MachO section names are only guaranteed to be unique within their containing segment (e.g. a '__const' section in the '__DATA' segment does not clash with a '__const' section in the '__TEXT' segment), so we need to use the fully qualified <segment>,<section> section names (e.g. '__DATA,__const' or '__TEXT,__const') when constructing jitlink::Sections for MachO objects.	2021-03-25 18:31:18 -07:00
Amara Emerson	55533203d7	[GlobalISel] Add G_ROTR and G_ROTL opcodes for rotates. Differential Revision: https://reviews.llvm.org/D99383	2021-03-25 17:23:30 -07:00
Jessica Paquette	23f657c165	[AArch64][GlobalISel] Emit bzero on Darwin Darwin platforms for both AArch64 and X86 can provide optimized `bzero()` routines. In this case, it may be preferable to use `bzero` in place of a memset of 0. This adds a G_BZERO generic opcode, similar to G_MEMSET et al. This opcode can be generated by platforms which may want to use bzero. To emit the G_BZERO, this adds a pre-legalize combine for AArch64. The conditions for this are largely a port of the bzero case in `AArch64SelectionDAGInfo::EmitTargetCodeForMemset`. The only difference in comparison to the SelectionDAG code is that, when compiling for minsize, this will fire for all memsets of 0. The original code notes that it's not beneficial to do this for small memsets; however, using bzero here will save a mov from wzr. For minsize, I think that it's preferable to prioritise omitting the mov. This also fixes a bug in the libcall legalization code which would delete instructions which could not be legalized. It also adds a check to make sure that we actually get a libcall name. Code size improvements (Darwin): - CTMark -Os: -0.0% geomean (-0.1% on pairlocalalign) - CTMark -Oz: -0.2% geomean (-0.5% on bullet) Differential Revision: https://reviews.llvm.org/D99358	2021-03-25 17:14:25 -07:00
Richard Smith	040c60d9b6	Fix a miscompile introduced by `99203f2`. getPointersDiff would previously round down the difference between two pointers to a multiple of the element size of the pointee, which could result in a pointer value being decreased a little. Alexey Bataev has graciously agreed to add a testcase for this; submitting the bugfix now to unblock.	2021-03-25 16:53:58 -07:00
Fangrui Song	ed956554f9	[Triple][Driver] Add muslx32 environment and use /lib/ld-musl-x32.so.1 for -dynamic-linker Differential Revision: https://reviews.llvm.org/D99308	2021-03-25 16:25:47 -07:00
Yonghong Song	886f9ff531	BPF: add extern func to data sections if specified This permits an extern function (BTF_KIND_FUNC) to be added to BTF_KIND_DATASEC if a section name is specified. For example, -bash-4.4$ cat t.c void foo(int) __attribute__((section(".kernel.funcs"))); int test(void) { foo(5); return 0; } The extern function foo (BTF_KIND_FUNC) will be put into BTF_KIND_DATASEC with name ".kernel.funcs". This will help to differentiate two kinds of external functions: functions in the kernel and functions defined in other bpf programs. Differential Revision: https://reviews.llvm.org/D93563	2021-03-25 16:03:29 -07:00
Jingu Kang	3fd64cc7a3	[ValueTracking] Handle two PHIs in isKnownNonEqual() loop: %cmp.0 = phi i32 [ 3, %entry ], [ %inc, %loop ] %pos.0 = phi i32 [ 1, %entry ], [ %cmp.0, %loop ] ... %inc = add i32 %cmp.0, 1 br label %loop In the above example, %pos.0 uses the previous iteration's %cmp.0 via the backedge, according to the PHI instruction's definition. If %inc is not the same among iterations, we can say the two PHIs are not the same. Differential Revision: https://reviews.llvm.org/D98422	2021-03-25 22:56:05 +00:00
Leonard Chan	36eaeaf728	[llvm][hwasan] Add Fuchsia shadow mapping configuration Ensure that Fuchsia shadow memory starts at zero. Differential Revision: https://reviews.llvm.org/D99380	2021-03-25 15:28:59 -07:00
Guozhi Wei	3240910f00	[DAE] Adjust param/arg attributes when changing parameter to undef In the DeadArgumentElimination pass, if a function's argument is never used, the corresponding caller's parameter can be changed to undef. If the param/arg has the noundef attribute or other related attributes, the LLVM LangRef (https://llvm.org/docs/LangRef.html#parameter-attributes) says its behavior is undefined. SimplifyCFG (D97244) takes advantage of this behavior and does a bad transformation on valid code. To avoid this undefined behavior when changing the caller's parameter to undef, this patch removes the noundef attribute and other attributes that imply noundef on the param/arg. Differential Revision: https://reviews.llvm.org/D98899	2021-03-25 14:53:22 -07:00
Philip Reames	e7ebb87222	[deref] Handle byval/byref/sret/inalloc/preallocated arguments for deref-at-point semantics All of these are scoped allocations which remain dereferenceable during the lifetime of the callee. Differential Revision: https://reviews.llvm.org/D99310	2021-03-25 14:47:31 -07:00
Craig Topper	5797feaa55	[RISCV] Reorder checks in RISCVTTIImpl::getGatherScatterOpCost to avoid calling getMinRVVVectorSizeInBits() when V extension is not enabled. getMinRVVVectorSizeInBits() asserts if the V extension isn't enabled. So check that gather/scatter is legal first since it already contains a check for V extension being enabled. It also already checks getMinRVVVectorSizeInBits for fixed length vectors so we don't need a check in getGatherScatterOpCost.	2021-03-25 14:20:47 -07:00
Andrew Savonichev	bba25a9cd8	[MCA] Support carry-over instructions for in-order processors Instructions that have more uops than the processor's IssueWidth are issued in multiple cycles. The patch fixes PR49712. Differential Revision: https://reviews.llvm.org/D99339	2021-03-26 00:06:19 +03:00
Nico Weber	a60ffee3f4	Revert "[InlineCost] Enable the cost benefit analysis on FDO" This reverts commit `ef69aa961d`. Makes clang assert in PGO builds, see repro tgz in https://bugs.chromium.org/p/chromium/issues/detail?id=1192783#c6	2021-03-25 16:42:19 -04:00
Roman Lebedev	1c55dcbca7	[NFCI][SimplifyCFG] Don't pay for a Small{Map,Set}Vector when plain SmallSet will suffice This only changes the cases where we really don't care about the iteration order of the underlying container, namely when we will use the values from it to form DTU updates.	2021-03-25 23:25:40 +03:00
Nikita Popov	93a636d9f6	[IR] Lift attribute handling for assume bundles into CallBase Rather than special-casing assume in BasicAA getModRefBehavior(), do this one level higher, in the attribute handling of CallBase. For assumes with operand bundles, the inaccessiblememonly attribute applies regardless of operand bundles.	2021-03-25 21:15:39 +01:00
Markus Böck	c6047101ad	[Support][Windows] Make sure only executables are found by sys::findProgramByName The function utilizes Windows' SearchPathW function, which as I found out today, may also return directories. After looking at the Unix implementation of the file I found that it contains a check whether the found path is also executable. While fixing the Windows implementation, I also learned that sys::fs::access returns successfully when querying whether directories are executable, which the Unix version does not. This patch makes both of these functions equivalent to their Unix implementation and ensures that any path returned by sys::findProgramByName on Windows may only be executable, just like the Unix implementation. The equivalent additions I have made to the Windows implementation, in the Unix implementation are here: sys::findProgramByName: `39ecfe6143/llvm/lib/Support/Unix/Program.inc (L90)` sys::fs::access: `c2a84771bb/llvm/lib/Support/Unix/Path.inc (L608)` I encountered this issue when running the LLVM testsuite. Commands of the form not test ... would fail to correctly execute test.exe, which is part of GnuWin32, as it actually tried to execute a folder called test, which happened to be in a directory on my PATH. Differential Revision: https://reviews.llvm.org/D99357	2021-03-25 20:29:43 +01:00
Mircea Trofin	20ad206b60	[NFC] Module::getInstructionCount() is const	2021-03-25 12:29:19 -07:00
Krzysztof Parzyszek	a5b7d38c57	[Hexagon] Limit virtual register reuse range in FI elimination	2021-03-25 13:59:36 -05:00
Lang Hames	7d1c503080	[JITLink][MachO/x86-64] Remove stale commented-out code. This commented-out code was accidentally left in during the transition from MachO-specific to generic x86-64 edge kinds (`ecf6466f01`).	2021-03-25 11:47:24 -07:00
David Green	d97189600e	[ARM] Revert WhileLoopStartLR to DoLoopStart If a WhileLoopStartLR is reverted due to calls in the preheader, we may still be able to instead create a DoLoopStart, preserving the low overhead loop. This adds code for that, only reverting the WhileLoopStartLR to a Br/Cmp, leaving the rest of the low overhead loop in place. Differential Revision: https://reviews.llvm.org/D98413	2021-03-25 16:44:15 +00:00
Craig Topper	c40cea6f08	[RISCV] Teach targetShrinkDemandedConstant to preserve (and X, 0xffffffff). We look for this pattern frequently in isel patterns so it's a good idea to try to preserve it. This also lets us remove our special isel handling for srliw and use a direct pattern match of (srl (and X, 0xffffffff), C) since no bits will be removed from the and mask. Differential Revision: https://reviews.llvm.org/D99042	2021-03-25 09:03:25 -07:00
Abhina Sreeskantharajan	f5349922c0	Fix: Reordering parameters in getFile and getFileOrSTDIN There was a new getFileOrSTDIN call added recently which was not included in my patch. https://reviews.llvm.org/D99110 I reordered the args to match the new order. Reviewed By: tunz Differential Revision: https://reviews.llvm.org/D99349	2021-03-25 11:55:57 -04:00
Yevgeny Rouban	f7ef26ef0b	[SLP] Fix crash in reduction for integer min/max The SCEV commit `b46c085d2b` [NFCI] SCEVExpander: emit intrinsics for integral {u,s}{min,max} SCEV expressions seems to reveal a new crash in SLPVectorizer. SLP crashes expecting a SelectInst as an externally used value but umin() call is found. The patch relaxes the assumption to make the IR flag propagation safe. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D99328	2021-03-25 21:44:21 +07:00
Jamie Schmeiser	7f2ae3d55f	add print-change diff modes that do not use colour Summary: The colour characters currently added to the output of -print-changed=diff and -print-changed=diff-quiet cause difficulties when capturing the output and examining it in an editor. Change the function to not have the colour characters and add 2 new choices (-print-changed=cdiff and -print-changed=cdiff-quiet) to retain the existing functionality of adding the colour characters. Author: Jamie Schmeiser <schmeise@ca.ibm.com> Reviewed By: aeubanks (Arthur Eubanks) yrouban (Yevgeny Rouban) Differential Revision: https://reviews.llvm.org/D97398	2021-03-25 10:35:27 -04:00
Matt Morehouse	96a4167b4c	[HWASan] Use page aliasing on x86_64. Userspace page aliasing allows us to use middle pointer bits for tags without untagging them before syscalls or accesses. This should enable easier experimentation with HWASan on x86_64 platforms. Currently stack, global, and secondary heap tagging are unsupported. Only primary heap allocations get tagged. Note that aliasing mode will not work properly in the presence of fork(), since heap memory will be shared between the parent and child processes. This mode is non-ideal; we expect Intel LAM to enable full HWASan support on x86_64 in the future. Reviewed By: vitalybuka, eugenis Differential Revision: https://reviews.llvm.org/D98875	2021-03-25 07:04:14 -07:00
Abhina Sreeskantharajan	c83cd8feef	[NFC] Reordering parameters in getFile and getFileOrSTDIN In future patches I will be setting the IsText parameter frequently so I will refactor the args to be in the following order. I have removed the FileSize parameter because it is never used. ``` static ErrorOr<std::unique_ptr<MemoryBuffer>> getFile(const Twine &Filename, bool IsText = false, bool RequiresNullTerminator = true, bool IsVolatile = false); static ErrorOr<std::unique_ptr<MemoryBuffer>> getFileOrSTDIN(const Twine &Filename, bool IsText = false, bool RequiresNullTerminator = true); static ErrorOr<std::unique_ptr<MB>> getFileAux(const Twine &Filename, uint64_t MapSize, uint64_t Offset, bool IsText, bool RequiresNullTerminator, bool IsVolatile); static ErrorOr<std::unique_ptr<WritableMemoryBuffer>> getFile(const Twine &Filename, bool IsVolatile = false); ``` Reviewed By: jhenderson Differential Revision: https://reviews.llvm.org/D99182	2021-03-25 09:47:49 -04:00
Alexey Bataev	568c874117	[SLP]Improve and simplify extendSchedulingRegion. We do not need to scan further if the upper end or lower end of the basic block is reached already and the instruction is not found. It means that the instruction is definitely in the lower part of basic block or in the upper block relatively. This should improve compile time for the very big basic blocks. Differential Revision: https://reviews.llvm.org/D99266	2021-03-25 05:31:58 -07:00
Fraser Cormack	99211352c1	[RISCV] Optimize select-like vector shuffles This patch adds a small optimization for vector shuffle lowering, detecting shuffles which can be re-expressed as vector selects. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D99270	2021-03-25 11:39:57 +00:00
Sameer Sahasrabuddhe	b92c8c22b9	[NewPM] Disable non-trivial loop-unswitch on targets with divergence Unswitching a loop on a non-trivial divergent branch is expensive since it serializes the execution of both versions of the loop. But identifying a divergent branch needs divergence analysis, which is a function level analysis. The legacy pass manager handles this dependency by isolating such a loop transform and rerunning the required function analyses. This functionality is currently missing in the new pass manager, and there is no safe way for the SimpleLoopUnswitch pass to depend on DivergenceAnalysis. So we conservatively assume that all non-trivial branches are divergent if the target has divergence. Reviewed By: tra Differential Revision: https://reviews.llvm.org/D98958	2021-03-25 11:27:10 +00:00
Fraser Cormack	321a71a772	[RISCV] Optimize BUILD_VECTOR sequences that reveal hidden splats This patch adds further optimization techniques to RVV BUILD_VECTOR lowering. It teaches the compiler to find splats of larger vector element types "hidden" in smaller ones. For example, a v4i8 build_vector (0x1, 0x2, 0x1, 0x2) could be splat as v2i16 0x0201. This is generally more optimal than the dominant-element BUILD_VECTORs and so takes priority. This optimization is currently limited to all-constant-or-undef BUILD_VECTORs as those were found to be the most common. There's no reason this couldn't be extended to other BUILD_VECTORs, but the additional bit-manipulation instructions may require more sophisticated heuristics. There are some cases where the materialization of the larger constant takes more scalar instructions than it does to build the vector with vector instructions. We could add heuristics to try and catch this. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D99195	2021-03-25 10:35:31 +00:00
Simon Pilgrim	36e3c6c841	[X86][AVX] Truncate vectors with PACKSS/PACKUS on AVX2 targets Until AVX512 we don't have any vector truncation instructions, and always lower using shuffles instead. combineVectorTruncation performs this earlier than lowering as it makes it easier to use any sign/zero-extended bits in the truncated bits with PACKSS/PACKUS to perform the shuffle. We currently don't attempt to use combineVectorTruncation on AVX2 targets as in the past 256-bit PACKSS/PACKUS tended to cause 128-bit lane shuffle regressions - but these should now be all resolved with combineHorizOpWithShuffle and in all cases we now reduce the amount of cross-lane shuffling and variable shuffle mask usage. Differential Revision: https://reviews.llvm.org/D96609	2021-03-25 10:34:34 +00:00
Simon Pilgrim	9fde88c3e2	[X86][AVX] splitIntVSETCC - handle separate (canonicalized) SETCC operands LowerVSETCC calls splitIntVSETCC after canonicalizing certain patterns, in particular (X & CPow2 != 0) -> (X & CPow2 == CPow2). Unfortunately if we're splitting for AVX1/non-AVX512BW cases, we lose these canonicalizations as we call the split with the original SetCC node, and when the split nodes are later lowered in LowerVSETCC the patterns are lost behind extract_subvector etc. But if we pass the canonicalized operands for splitting we retain the optimizations. Differential Revision: https://reviews.llvm.org/D99256	2021-03-25 10:18:44 +00:00
Amara Emerson	0d2c4db637	[GlobalISel] Fix crash in RBS with a non-generic IMPLICIT_DEF. This may occur when swifterror codegen in the translator generates these, but we shouldn't try to handle them since they should have regclasses anyway. rdar://75784009 Differential Revision: https://reviews.llvm.org/D99287	2021-03-24 23:08:51 -07:00
Serge Pavlov	ddb0bcbdff	Add missing cases in RISCVMCExpr::getVariantKindName Differential Revision: https://reviews.llvm.org/D98929	2021-03-25 12:57:05 +07:00
Craig Topper	0f99c6c56e	[RISCV] Remove duplicate DebugLoc variables from cases in ReplaceNodeResults. NFC We already created a DebugLoc at the top of the function. We can just use that one.	2021-03-24 20:23:03 -07:00
Philip Reames	9a82f42d12	Plumb TLI through isSafeToExecuteUnconditionally [NFC] Split from D95815 to reduce patch size. Isn't (yet) used for anything, only the client side is wired up.	2021-03-24 17:52:04 -07:00
Philip Reames	4054b8322f	[deref] Implement initial set of inference rules for deref-at-point This implements a subset of the initial set of inference rules proposed in the llvm-dev thread "RFC: Decomposing deref(N) into deref(N) + nofree". The noalias one got moved to a separate review as there were some concerns raised which require further discussion. Differential Revision: https://reviews.llvm.org/D99135	2021-03-24 16:20:41 -07:00
Matt Morehouse	c8ef98e5de	Revert "[HWASan] Use page aliasing on x86_64." This reverts commit `63f73c3eb9` due to breakage on aarch64 without TBI.	2021-03-24 16:18:29 -07:00
Wenlei He	6869e6c1e7	[InlineCost] Make cost-benefit decision explicit With cost-benefit analysis for inlining, we bypass the cost-threshold by returning inline result from call analyzer early. However the cost and threshold are still available from call analyzer, and when cost is actually higher than threshold, we incorrectly set the reason. The change makes the decision from cost-benefit analysis explicit. It's mostly NFC, except that it allows the priority-based sample loader inliner used by CSSPGO to use cost-benefit heuristic. Differential Revision: https://reviews.llvm.org/D99302	2021-03-24 16:10:58 -07:00
Kazu Hirata	ef69aa961d	[InlineCost] Enable the cost benefit analysis on FDO This patch enables the cost-benefit-analysis-based inliner by default if we have instrumentation profile. - SPEC CPU 2017 shows a 0.4% improvement. - An internal large benchmark shows a 0.9% reduction in the cycle count along with 14.6% reduction in the number of call instructions executed. Differential Revision: https://reviews.llvm.org/D98213	2021-03-24 15:36:49 -07:00
Sanjay Patel	adf42dff42	[ValueTracking] peek through min/max to find isKnownToBeAPowerOfTwo This is similar to the select logic just ahead of the new code. Min/max choose exactly one value from the inputs, so if both of those are a power-of-2, then the result must be a power-of-2. This might help with D98152, but we likely still need other pieces of the puzzle to avoid regressions. The change in PatternMatch.h is needed to build with clang. It's possible there is a better way to deal with the 'const' incompatibilities. Differential Revision: https://reviews.llvm.org/D99276	2021-03-24 17:54:38 -04:00
Roman Lebedev	2070fe7144	[NFCI][SimplifyCFG] Don't form DTU updates if we aren't going to apply them I think we may want to have a thin wrapper over a vector to deduplicate those `if(DTU)` predicates, and instead do them in the `insert()` itself.	2021-03-25 00:02:37 +03:00
Jessica Paquette	56e6eb7975	[AArch64][GlobalISel] Make G_UBFX/G_SBFX legalization check for constants The original rule just checked the type, but this is actually only legal if it has a constant. Differential Revision: https://reviews.llvm.org/D99298	2021-03-24 13:58:27 -07:00
Nikita Popov	a7efed5a20	[SCEV] Improve handling of not expressions in isImpliedCond() SCEV currently tries to prove implications of x pred y by also trying to imply ~y pred ~x. This is expensive in terms of compile-time (in fact, the majority of isImpliedCond compile-time is spent here) and generally not fruitful. The issue is that this also swaps the operands and thus breaks canonical ordering. If originally we were trying to prove an implication like X > C1 -> Y > C2, then we'll now try to prove X > C1 -> C3 > ~Y, which will not work. The only real case where we can get some use out of this transform is if the original conditions were in the form X > C1 -> Y < C2, were then swapped to X > C1 -> C2 > Y and are then swapped again here to X > C1 -> ~Y > C3. As such, handle this at a higher level, where we are doing the swapping in the first place. There's four different ways that we can line up a predicate and a swapped predicate, so we use some heuristics to pick some profitable way. Because we now try this transform at a higher level (isImpliedCondOperands rather than isImpliedCondOperandsHelper), we can also prove additional facts. Of the added tests, one was proven previously while the other wasn't. Differential Revision: https://reviews.llvm.org/D90926	2021-03-24 21:53:02 +01:00
Albion Fung	e29bb074c6	[PowerPC] Exploit xxsplti32dx (constant materialization) for scalars This patch exploits the xxsplti32dx instruction available on Power10 in place of constant pool loads where xxspltidp would not be able to, usually because the immediate cannot fit into 32 bits. Differential Revision: https://reviews.llvm.org/D95458	2021-03-24 15:59:59 -04:00
Congzhe Cao	829c1b6443	[LoopInterchange] fix tightlyNested() in LoopInterchange legality This is yet another attempt to fix tightlyNested(). Add checks in tightlyNested() for the inner loop exit block, such that 1) if there is control-flow divergence in between the inner loop exit block and the outer loop latch, or 2) if the inner loop exit block contains unsafe instructions, tightlyNested() returns false. The reasoning is that after interchange, the original inner loop exit block, which was part of the outer loop, would be put into the new inner loop, and would be executed a different number of times before and after interchange. Thus it should be dealt with appropriately. Reviewed By: Whitney Differential Revision: https://reviews.llvm.org/D98263	2021-03-24 15:49:25 -04:00
Nick Lewycky	80f6c99a78	Verify that MDNodes belong to the same context as the Module. Differential Revision: https://reviews.llvm.org/D99289	2021-03-24 12:38:05 -07:00
Florian Hahn	9d45579279	[LV] Factor out phi type access to variable (NFC). A slight simplification of the code to reduce future diffs.	2021-03-24 19:25:22 +00:00
Florian Hahn	8d1342f79d	[LV] Remove redundant access to Legal::getReductionVars() (NFC). The reduction descriptor is retrieved earlier and stored in a variable RdxDesc already.	2021-03-24 19:15:14 +00:00
Gulfem Savrun Yeniceri	5fbe1fdf17	Revert "[Passes] Add relative lookup table converter pass" This reverts commit `5fd001a5ff` because it broke clang-with-thin-lto-ubuntu bot.	2021-03-24 18:59:33 +00:00
Thomas Preud'homme	058455ffbe	[FileCheck] Fix PR49531: invalid use of string var FileCheck string substitution block parsing code only reports an invalid variable name in a string variable use if it starts with a forbidden character. It does not report anything if there are unparsed characters after the variable name, i.e. [[X-Y]] is parsed as [[X]] and no error is returned. This commit fixes that. Reviewed By: jdenny, jhenderson Differential Revision: https://reviews.llvm.org/D98691	2021-03-24 18:49:58 +00:00
Matt Morehouse	63f73c3eb9	[HWASan] Use page aliasing on x86_64. Userspace page aliasing allows us to use middle pointer bits for tags without untagging them before syscalls or accesses. This should enable easier experimentation with HWASan on x86_64 platforms. Currently stack, global, and secondary heap tagging are unsupported. Only primary heap allocations get tagged. Note that aliasing mode will not work properly in the presence of fork(), since heap memory will be shared between the parent and child processes. This mode is non-ideal; we expect Intel LAM to enable full HWASan support on x86_64 in the future. Reviewed By: vitalybuka, eugenis Differential Revision: https://reviews.llvm.org/D98875	2021-03-24 11:43:41 -07:00
Roland McGrath	3cb2346982	[AArch64] Support .arch_extension pan This makes the behavior consistent with the GNU assembler. Reviewed By: phosek Differential Revision: https://reviews.llvm.org/D99209	2021-03-24 11:29:22 -07:00
Jessica Paquette	a141c7d06b	[AArch64][GlobalISel] Select G_SBFX and G_UBFX Add selection support for G_SBFX and G_UBFX and add a test. These must always have a constant LSB and width. Differential Revision: https://reviews.llvm.org/D99224	2021-03-24 11:15:57 -07:00
Craig Topper	512bae81cc	[RISCV] Add basic cost modelling for fixed vector gather/scatter. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D99142	2021-03-24 11:14:14 -07:00
Jessica Paquette	1818dc394f	[AArch64][GlobalISel] Mark G_SBFX/G_UBFX as legal for s32 and s64 This isn't perfect, since we should also verify that these only use constants. Differential Revision: https://reviews.llvm.org/D99219	2021-03-24 11:08:41 -07:00
Craig Topper	f24f09d256	[RISCV] Add TTI support for cpop with Zbb This will tell loop idiom recognize that it can make popcount loops countable using the ctpop intrinsic. I didn't bother checking for illegal types. Type legalization knows how to split a ctpop into multiple ctpops added together. Assuming we only receive reasonable integer bit widths, a few cpop instructions added together is probably better than the loop. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D99203	2021-03-24 10:58:42 -07:00
Gulfem Savrun Yeniceri	5fd001a5ff	[Passes] Add relative lookup table converter pass Lookup tables generate non PIC-friendly code, which requires dynamic relocation as described in: https://bugs.llvm.org/show_bug.cgi?id=45244 This patch adds a new pass that converts lookup tables to relative lookup tables to make them PIC-friendly. Differential Revision: https://reviews.llvm.org/D94355	2021-03-24 17:31:18 +00:00
Nikita Popov	8a168d2d70	[LICM] Fix NumSunk statistic (NFC) LICM can sink instructions that have uses inside the loop, as long as these uses are considered "free". However, if there were only free uses inside the loop, and no uses outside the loop at all, the instruction would still count towards the NumSunk statistic. This resulted in a wild inflation of the NumSunk metric. After this patch it drops down from 1141787 to 5852 on test-suite O3.	2021-03-24 18:28:19 +01:00
Thomas Preud'homme	3b52c04e82	Make FindAvailableLoadedValue TBAA aware FindAvailableLoadedValue() relies on FindAvailablePtrLoadStore() to run the alias analysis when searching for an equivalent value. However, FindAvailablePtrLoadStore() calls the alias analysis framework with a memory location for the load constructed from an address and a size, which thus lacks TBAA metadata info. This commit modifies FindAvailablePtrLoadStore() to accept an optional memory location as parameter to allow FindAvailableLoadedValue() to create it based on the load instruction, which would then have TBAA metadata info attached. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D99206	2021-03-24 17:20:26 +00:00
Alexandre Ganea	64ab2b6825	[Support] Fix 'keeping' temporary files on Windows 7 As reported here: https://bugs.llvm.org/show_bug.cgi?id=48378#c0 and here: https://github.com/rust-lang/rust/issues/81051 since `79657e2339`, some programs such as llvm-ar don't work properly on Windows 7. The issue is shown in the snippet by Oleksandr Prodan: https://pastebin.com/v51m3uBU In essence, once the 'DeleteFile' flag has been set on FILE_DISPOSITION_INFO, the file path can't be queried anymore with GetFinalPathNameByHandleW. This however works on Windows 10, where GetFinalPathNameByHandleW would return successfully. To work around the issue, we simply reset the 'DeleteFile' flag before even checking if we're dealing with a network file. Tested with `llvm-ar r empty.a a.obj` run on a network mount. At the moment, we cannot specifically add test coverage for this, since it requires mounting a network drive.	2021-03-24 12:47:08 -04:00
David Green	14b2ec934e	[ARM] Enable UpperBound unrolling for all loops This UpperBound unrolling was already enabled so long as a series of conditions in ARMTTIImpl::getUnrollingPreferences pass. This just always enables it as it can help fully unroll loops that would not otherwise pass those tests. Differential Revision: https://reviews.llvm.org/D99174	2021-03-24 16:39:21 +00:00
Roman Lebedev	fe36b834db	[NFCI][SimplifyCFG] Fold branch to common dest: don't check cost if no qualified preds	2021-03-24 19:01:47 +03:00
Konstantin Zhuravlyov	f4ace63737	AMDGPU: Add target id and code object v4 support - Add target id support (https://clang.llvm.org/docs/ClangOffloadBundler.html#target-id) - Add code object v4 support (https://llvm.org/docs/AMDGPUUsage.html#elf-code-object) - Add kernarg_size to kernel descriptor - Change trap handler ABI to no longer move queue pointer into s[0:1] - Cleanup ELF definitions - Add V2, V3, V4 suffixes to make a clear distinction for code object version - Consolidate note names Differential Revision: https://reviews.llvm.org/D95638	2021-03-24 11:54:05 -04:00
Sander de Smalen	55d18b3cc2	[TTI] Return a TypeSize from getRegisterBitWidth. This patch changes the interface to take a RegisterKind, to indicate whether the register bitwidth of a scalar register, fixed-width vector register, or scalable vector register must be returned. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D98874	2021-03-24 14:45:13 +00:00
Nashe Mncube	ac2a1e9596	[SVE] Suppress vselect warning from incorrect interface call The VSelectCombine handler within AArch64ISelLowering, uses an interface call which only expects fixed vectors. This generates a warning when the call is made on a scalable vector. This warning has been suppressed with this change, by using the ElementCount interface, which supports both fixed and scalable vectors. I have also added a regression test which recreates the warning. Differential Revision: https://reviews.llvm.org/D98249	2021-03-24 14:34:34 +00:00
Anirudh Prasad	301d9261b7	[AsmParser][SystemZ][z/OS] Re-introduce HLASM comment syntax - https://reviews.llvm.org/rGb605cfb336989705f391d255b7628062d3dfe9c3 was reverted due to sanitizer bugs in the introduced unit-test (specifically in the Address sanitizer https://lab.llvm.org/buildbot/#/builders/5/builds/5697) - This patch attempts to rectify that, as well as re-factor parts of the test - The issue was that, within the `setupCallToAsmParser` function in the unit-test, `SrcMgr` was declared as a local variable. `SrcMgr` owns a unique pointer. Since the variable goes out of scope at the end of the function, the unique pointer is released. - This patch moves the declaration of the `SrcMgr` variable to a class field, since the scope will remain until the class's destructor is invoked (which in this case is at the end of the unit test) - Furthermore, this patch also moves the `MCContext Ctx` declaration from a local variable instance inside a function, to a unique pointer class field. This ensures the instantiation of the MCContext remains until the tear down of the test. Reviewed By: abhina.sreeskantharajan Differential Revision: https://reviews.llvm.org/D99004	2021-03-24 10:17:00 -04:00
Simon Pilgrim	7920527796	[X86][AVX] combineBitcastvxi1 - improve handling of vectors truncated to vXi1 If we're truncating to vXi1 from a wider type, then prefer the original wider vector as it simplifies folding the separate truncations + extensions. On AVX1 this is only worth it for v8i1 cases, not v4i1 where we're always better off truncating down to v4i32 for movmsk. Helps with some regressions encountered in D96609	2021-03-24 14:05:59 +00:00
Stefan Pintilie	91f4c11133	[PowerPC] Add mprivileged option Add an option to tell the compiler that it can use privileged instructions. This patch only adds the option. Backend implementation will be added in a future patch. Reviewed By: lei, amyk Differential Revision: https://reviews.llvm.org/D99193	2021-03-24 08:33:22 -05:00
Joseph Huber	8140d0ec4a	[OpenMP] Change OMPIRBuilder to append function attributes Summary: Currently the OMPIRBuilder overwrites the function's existing attributes when it assigns the ones defined in OMPKinds.def. This changes the behaviour to append the current function's attributes with them instead. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D98740	2021-03-24 09:08:29 -04:00
Alexey Bataev	2f1b439089	[LoopAnalysis][NFC]Remove redundant code. Removed redundant code for IsConsecutive variable.	2021-03-24 05:37:19 -07:00
Simon Pilgrim	e9015bd595	[X86][AVX] lowerShuffleAsBroadcast - MOVDDUP(SCALAR_TO_VECTOR(X)) -> BROADCAST(X) Prefer broadcast from scalar on AVX targets as this makes it easier for later folds to strip away bitcasts etc. This helps a lot with the AVX1 poor codegen from PR49658. There's a trivial regression in bitcast-int-to-vector-bool-*ext.ll tests due to SimplifyDemandedBits not being able to see a multi-use case, but there's bigger existing codegen issues to be addressed first in those tests (unnecessary NOTs).	2021-03-24 11:31:56 +00:00
Andrea Di Biagio	97a00b7b20	[MCA] Fix for uninitialised member in constructor. NFC	2021-03-24 11:21:59 +00:00
Simon Pilgrim	c1ef642ad8	[X86] Remove unused 'OneUse' option from IsNOT helper. NFCI.	2021-03-24 11:14:38 +00:00
alex-t	dccf83acf9	[AMDGPU] SIOptimizeExecMaskingPreRA should check constant bus constraint when folding EXEC copy Folding EXEC copy into its single use may lead to constant bus constraint violation as it adds one more SGPR operand. This change makes it validate the user instruction with the new SGPR operand and only fold it if it is legal. Reviewed By: rampitec, arsenm Differential Revision: https://reviews.llvm.org/D98888	2021-03-24 14:14:13 +03:00
Florian Hahn	cd0c00c9fe	[LV] Move exact FP math check out of Requirements. We know if the loop contains FP instructions preventing vectorization after we are done with legality checks. This patch updates the code to check for un-vectorizable FP operations earlier, to avoid unnecessarily running the cost model and picking a vectorization factor. It also makes the code more direct and moves the check to a position where similar checks are done. I might be missing something, but I don't see any reason to handle this check differently to other, similar checks. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D98633	2021-03-24 11:01:44 +00:00
Andrew Savonichev	292da93d59	[MCA] Disable RCU for InOrderIssueStage This is a follow-up for: D98604 [MCA] Ensure that writes occur in-order When instructions are aligned by the order of writes, they retire in-order naturally. There is no need for an RCU, so it is disabled. Differential Revision: https://reviews.llvm.org/D98628	2021-03-24 13:54:04 +03:00
Stefan Pintilie	0e4f5f3ea6	[PowerPC] Change option to mrop-protect In order to have the same option on PowerPC LLVM and PowerPC GCC, the option will be changed from -mrop-protection to -mrop-protect. The feature will be off by default and turned on when the option is used. Reviewed By: lei, amyk Differential Revision: https://reviews.llvm.org/D99185	2021-03-24 05:51:35 -05:00
Ta-Wei Tu	4d9d736875	[NFC] Improve debug message and test description in `4c1f74a`	2021-03-24 18:21:13 +08:00
Ta-Wei Tu	4c1f74a76c	[LoopFlatten] Fix invalid assertion (PR49571) The `InductionPHI` is not necessarily the increment instruction, as demonstrated in pr49571.ll. This patch removes the assertion and instead bails out from the `LoopFlatten` pass if that happens. This fixes https://bugs.llvm.org/show_bug.cgi?id=49571 Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D99252	2021-03-24 18:08:27 +08:00
Ta-Wei Tu	8fde25b3c3	[NFC] Remove redundant `struct` prefix Reviewed By: SjoerdMeijer, fhahn Differential Revision: https://reviews.llvm.org/D99251	2021-03-24 17:58:33 +08:00
Andy Wingo	c9801db2eb	[WebAssembly][MC] Record limit constraints for table sizes This commit adds a full WasmTableType to MCSymbolWasm, differing from the current situation (just an ElemType) in that it additionally records a WasmLimits. We add support for specifying the limits in .S files also, via the following syntax variations: .tabletype SYM, ELEMTYPE .tabletype SYM, ELEMTYPE, MINSIZE .tabletype SYM, ELEMTYPE, MINSIZE, MAXSIZE Depends on D99186. Differential Revision: https://reviews.llvm.org/D99191	2021-03-24 09:44:22 +01:00
Andy Wingo	9ac5620cb8	[WebAssembly] Rename WasmLimits::Initial to ::Minimum. NFC. This patch renames the "Initial" member of WasmLimits to the name used in the spec, "Minimum". In the core WebAssembly specification, the Limits data type has one required "min" member and one optional "max" member, indicating the minimum required size of the corresponding table or memory, and the maximum size, if any. Although the WebAssembly spec does instantiate locally-defined tables and memories with the initial size being equal to the minimum size, it can't impose such a requirement for imports. It doesn't make sense to require an initial size for a memory import, for example. The compiler can only sensibly express the minimum and maximum sizes. See https://github.com/WebAssembly/js-types/blob/master/proposals/js-types/Overview.md#naming-of-size-limits for a related discussion that agrees that the right name of "initial" is "minimum" when querying the type of a table or memory from JavaScript. (Of course it still makes sense for JS to speak in terms of an initial size when it explicitly instantiates memories and tables.) Differential Revision: https://reviews.llvm.org/D99186	2021-03-24 09:10:11 +01:00
Jim Lin	503f1d845f	[RISCV] Add HasStdExtD predicate to copysign from double and to double patterns The copysign from double and to double patterns lack the HasStdExtD predicate. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D99234	2021-03-24 14:29:23 +08:00
Serguei Katkov	311d81ce97	[RegAlloc] Fix "ran out of regs" with uses in statepoint Statepoint instruction is known to have a variable and big number of operands. It is possible that Register Allocator will split live intervals in the way that all physical registers are occupied by "zero-length" live intervals which are marked as not-spillable. While intervals are marked as not-spillable in the moment of creation when they are really zero-length, it is possible that in future as part of re-materialization there will be a need for a physical register between def and use of such tiny interval (the use is not related to this interval at all). As all physical registers are assigned to not-spillable intervals there are no available registers and RA reports an error. The idea of the fix is to avoid marking as not-spillable tiny live intervals where there is a use in statepoint instruction in var args section. Such interval may be perfectly spilled and folded to operand of statepoint. Reviewers: reames, dantrushin, qcolombet, dsanders, dmgreen Reviewed By: reames Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D98766	2021-03-24 10:25:34 +07:00
Craig Topper	6204ac4536	[X86] Bail out of X86FastISel::X86SelectCmp for vectors. None of the code in this function was written to handle vectors. Most of the cases already fail for vectors for one reason or another. The exception is an optimization that detects identical operands. This can be triggered by vectors, but the code always creates a 0 or 1 constant in a scalar register which is incorrect for vectors. Fixes PR49706.	2021-03-23 20:16:04 -07:00
Yang Fan	279d74ffd1	[InstSimplify] Fix unused variable warning (NFC) GCC warning: ``` /llvm-project/llvm/lib/Analysis/InstructionSimplify.cpp: In function ‘llvm::Value* SimplifyWithOpReplaced(llvm::Value, llvm::Value, llvm::Value, const llvm::SimplifyQuery&, bool, unsigned int)’: /llvm-project/llvm/lib/Analysis/InstructionSimplify.cpp:3993:15: warning: unused variable ‘SI’ [-Wunused-variable] 3993 \| if (auto SI = dyn_cast<SelectInst>(I)) \| ^~ ```	2021-03-24 09:56:36 +08:00
Choongwoo Han	772e1dd1dd	[Coverage] Load records immediately The current implementation keeps buffers generated for each object file until it completes loading of all files. This approach requires a lot of memory if there are a lot of huge object files. Thus, make it load coverage records immediately rather than waiting for other binaries to be loaded. This reduces memory usage of llvm-cov from >128GB to 5GB when loading Chromium binaries on Windows. Additional testing: check-profile, check-llvm Differential Revision: https://reviews.llvm.org/D99110	2021-03-23 16:25:20 -07:00
Amara Emerson	7bddf00581	[AArch64][GlobalISel] Lower G_FSHL and G_FSHR. Codegen isn't as good as we need it, but that'll be done later.	2021-03-23 16:09:19 -07:00
Jingu Kang	2e2740b859	[ValueTracking] Handle increasing mul recurrence in isKnownNonZero() Differential Revision: https://reviews.llvm.org/D99069	2021-03-23 23:04:41 +00:00
Matteo Favaro	a4fb88669c	[MSSA] Extending IsGuaranteedLoopInvariant to support an instruction defined in the entry block As mentioned in [[ https://reviews.llvm.org/D96979 \| D96979 ]], I'm extending the IsGuaranteedLoopInvariant check also to the `MemorySSA.cpp` file. @fhahn For now I didn't unify the function into `MemorySSA.h` because, as you mentioned, it's not directly MSSA related. I'm open to suggestions to find a better place so we can improve the unification process. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D97155	2021-03-23 21:50:56 +00:00
Alexey Bataev	99203f2004	[Analysis]Add getPointersDiff function to improve compile time. Added getPointersDiff function to LoopAccessAnalysis and used it instead of direct calculation of the distance between pointers and/or isConsecutiveAccess function in SLP vectorizer to improve compile time and detection of stores consecutive chains. Part of D57059 Differential Revision: https://reviews.llvm.org/D98967	2021-03-23 14:25:36 -07:00
Nikita Popov	931b6066ac	[BasicAA] Handle assumes with operand bundles This fixes a regression reported on D99022: If a call has operand bundles, then the inaccessiblememonly attribute on the function will be ignored, as operand bundles can affect modref behavior in the general case. However, for assume operand bundles in particular this is not the case. Adjust getModRefBehavior() to always report inaccessiblememonly for assumes, regardless of presence of operand bundles.	2021-03-23 21:21:19 +01:00
Alexey Bataev	f1b47ad278	Revert "[Analysis]Add getPointersDiff function to improve compile time." This reverts commit `065a14a12d` to investigate and fix crash in SLP vectorizer.	2021-03-23 13:17:54 -07:00
Alexey Bataev	065a14a12d	[Analysis]Add getPointersDiff function to improve compile time. Added getPointersDiff function to LoopAccessAnalysis and used it instead of direct calculation of the distance between pointers and/or isConsecutiveAccess function in SLP vectorizer to improve compile time and detection of stores consecutive chains. Part of D57059 Differential Revision: https://reviews.llvm.org/D98967	2021-03-23 12:58:42 -07:00
Amara Emerson	75b6a47bd0	[AArch64][GlobalISel] Lower G_CTLZ_ZERO_UNDEF. This adds some missing legalizer tests, which uncovered a v2s64 selection test that wasn't working since there's no legalization or instruction for that.	2021-03-23 12:49:10 -07:00
Craig Topper	4c38c35c8d	[ValueTracking] Teach canCreateUndefOrPoison that ctpop does not create undef or poison. This select of ctpop with 0 pattern can get left behind after loop idiom recognize converts a loop to ctpop. LLVM 10 was able to optimize this, but LLVM 11 and later is not. The difference seems to be that some select transforms are now limited based on canCreateUndefOrPoison. Teaching canCreateUndefOrPoison about ctpop restores the LLVM 10 codegen. Differential Revision: https://reviews.llvm.org/D99207	2021-03-23 12:42:18 -07:00
Jay Foad	fd142e6c18	[AMDGPU] Simplify AMDGPUAnnotateUniformValues::visitBranchInst. NFC. A BranchInst is always the terminator of its containing BasicBlock.	2021-03-23 16:54:43 +00:00
Joe Nash	538bda0b80	[AMDGPU] Refactor DPPCombine NFC. Extract IsShrinkable into a helper function, and make Subtarget a member variable. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D99099 Change-Id: If4bc97a88a9ae4eb1df47e717345d46a6ed515bf	2021-03-23 11:53:53 -04:00
Craig Topper	839a46d88f	[RISCV] Use selectImm for RV32. NFC Previously we used selectImm for RV64 and isel patterns for RV32. This should be NFC, but will allow RV32 and RV64 to share improvements in the future. For example, it might be useful to use BSETI from Zbs to make single bit constants. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D98877	2021-03-23 08:57:15 -07:00
Jay Foad	fc7e3e7dd9	[AMDGPU] Set SchedRW on real instructions Copy SchedRW from pseudos to real instructions so that llvm-mca has access to it. This is NFC for normal compiler codegen, which schedules pseudos not real instructions. Add an llvm-mca test for some high latency double-precision instructions as a smoke test. Differential Revision: https://reviews.llvm.org/D99187	2021-03-23 15:38:11 +00:00
Roman Lebedev	b5822026dd	[SimplifyCFG] 'Fold branch to common dest': don't overestimate the cost `FoldBranchToCommonDest()` has a certain budget (`-bonus-inst-threshold=`) for bonus instruction duplication. And currently it calculates the cost as-if it will actually duplicate into each predecessor. But ignoring the budget, it won't always duplicate into each predecessor, there are some correctness and profitability checks. So when calculating the cost, we should first check into which blocks will we actually duplicate, and only then use that block count to do budgeting.	2021-03-23 18:30:26 +03:00
Andrea Di Biagio	f5bdc88e4d	[MCA] Improved handling of negative read-advance cycles. Before this patch, register writes were always invalidated by the RegisterFile at instruction commit stage. So, the RegisterFile was often losing the knowledge about the `execute cycle` of writes already committed. While this was not problematic for non-delayed reads, this was sometimes leading to inaccurate read latency computations in the presence of negative read-advance cycles. This patch fixes the issue by changing how the RegisterFile component internally keeps track of the `execute cycle` information of each write. On every instruction executed, the RegisterFile gets notified by the RetireStage, so that it can internally record the execute cycle of each executed write. The `execute cycle` information is stored within WriteRef itself, and it is not invalidated when the write is committed.	2021-03-23 14:47:23 +00:00
Roman Lebedev	514bc01ca3	[SimplifyCFG] FoldBranchToCommonDest(): properly handle same-block external uses (PR49510/PR49689) We clone bonus instructions to the end of the predecessor block, and then use `SSAUpdater::RewriteUseAfterInsertions()`. But that only deals with the cases where the uses-to-be-rewritten are either in a different block from the def, or come after the def. But in some loop cases, the external use may be in the beginning of the predecessor block, before the newly cloned bonus instruction. `SSAUpdater::RewriteUseAfterInsertions()` does not deal with that. Notably, the external use can't happen to be both in the same block and after the newly-cloned instruction, because of the fold preconditions. To properly handle these cases, when the use is in the same block, we should instead use `SSAUpdater::RewriteUse()`. TBN, they do the same thing for PHI users. Fixes https://bugs.llvm.org/show_bug.cgi?id=49510 Likely Fixes https://bugs.llvm.org/show_bug.cgi?id=49689	2021-03-23 17:37:28 +03:00
Stefan Gränitz	0ef51db5a4	Revert "[Orc] Allow OrcGenericABI variant of LazyCallThroughManager" This reverts commit 61974268269f96b672a50eac40a5a8eeb4acd6d3.	2021-03-23 15:23:33 +01:00
Fraser Cormack	feff66a082	[RISCV] Further optimize BUILD_VECTORs with repeated elements This patch builds upon the initial BUILD_VECTOR work introduced in D98700. It further optimizes the lowering of BUILD_VECTOR by using VSELECT operations to effectively insert repeated elements into the vector with relatively few instructions. This allows us to optimize more BUILD_VECTORs without significantly increasing the size of the generated code. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D98969	2021-03-23 14:14:48 +00:00
Sanjay Patel	1bf8f9e228	[SimplifyCFG] use profile metadata to refine merging branch conditions 2nd try (original: `27ae17a6b0`) with fix/test for crash. We must make sure that TTI is available before trying to use it because it is not required (might be another bug). Original commit message: This is one step towards solving: https://llvm.org/PR49336 In that example, we disregard the recommended usage of builtin_expect, so an expensive (unpredictable) branch is folded into another branch that is guarding it. Here, we read the profile metadata to see if the 1st (predecessor) condition is likely to cause execution to bypass the 2nd (successor) condition before merging conditions by using logic ops. Differential Revision: https://reviews.llvm.org/D98898	2021-03-23 10:19:37 -04:00
Jamie Schmeiser	64336d3421	Revert "A new option -print-on-crash that prints the IR as it was upon entering the last pass when there is a crash." This reverts commit `9544a32287`.	2021-03-23 10:09:27 -04:00
Jamie Schmeiser	9544a32287	A new option -print-on-crash that prints the IR as it was upon entering the last pass when there is a crash. Summary: The IR is saved in its print form before each pass is started and a signal handler is registered. If the compilation crashes, the signal handler will print the saved IR to dbgs(). This option can be modified using -print-module-scope to get the IR for the complete module. Note that this option only works with the new pass manager. Author: Jamie Schmeiser <schmeise@ca.ibm.com> Reviewed By: aeubanks (Arthur Eubanks) yrouban (Yevgeny Rouban) Differential Revision: https://reviews.llvm.org/D86657	2021-03-23 09:29:17 -04:00
serge-sans-paille	e19884cd74	Introduce a generic operator to apply complex operations to BitVector This avoids temporary and memcpy call when computing large expressions. It's basically some kind of poor man's expression template, but it seems easier to maintain to have a single generic `apply` call instead of the whole expression template machinery here. Differential Revision: https://reviews.llvm.org/D98176	2021-03-23 14:23:26 +01:00
Valentin Clement	d709dcc090	[openacc][openmp] Reduce number of generated files and prefer inclusion of .inc Follow up from D92955 and D83636. This patch makes the base cpp files OMP.cpp and ACC.cpp normal files and they now include the XXX.inc file generated by tablegen. This reduces the number of files generated by the DirectiveEmitter backend and makes it closer to the proposal in D83636. Reviewed By: Meinersbur Differential Revision: https://reviews.llvm.org/D93560	2021-03-23 09:16:53 -04:00
Matt Arsenault	b24436ac96	GlobalISel: Lower funnel shifts	2021-03-23 09:11:17 -04:00
Stefan Gränitz	5949bd9125	[Orc] Allow OrcGenericABI variant of LazyCallThroughManager Apply the way createLocalIndirectStubsManagerBuilder() deals with unsupported architectures to createLocalLazyCallThroughManager(). The returned call-through manager is dysfunctional: It runs into an unreachable as soon as a lazy JIT attempts to use it. However, this results in broader platform support for lli in default (greedy) ORC mode where no lazy materialization is required.	2021-03-23 14:08:53 +01:00
Sanjay Patel	3c8473ba53	[SLP] allow matching integer min/max intrinsics as reduction ops As noted in D98152, we need to patch SLP to avoid regressions when we start canonicalizing to integer min/max intrinsics. Most of the real work to make this possible was in: `7202f47508` Differential Revision: https://reviews.llvm.org/D98981	2021-03-23 08:56:44 -04:00
Luke Drummond	520f70e94d	[NFC] clang-format llvm/lib/Transforms/Utils/CloneFunction.cpp Differential Revision: https://reviews.llvm.org/D98957	2021-03-23 12:53:28 +00:00
Luke Drummond	ab44ec1b22	[NFC] Minor refactor - Give unwieldy repeated expression a name - Use a ranged `for` basic block iterator Reviewed by: nikic, dexonsmith Differential Revision: https://reviews.llvm.org/D98957	2021-03-23 12:53:28 +00:00
Luke Drummond	0448ddd169	[NFCI] cleanup CloneFunctionInto Hoist early return for decl-only clones to before DIFinder calculation. Also fix an out of date assert message after invariants changed in `22a52dfddc`. Reviewed by: nikic, dexonsmith Differential Revision: https://reviews.llvm.org/D98957	2021-03-23 12:53:27 +00:00
Benjamin Kramer	39e36fff3d	[AArch64] Fix unused variable warning	2021-03-23 13:42:14 +01:00
Nashe Mncube	5d929794a8	[llvm-opt] Bug fix within combining FP vectors A bug was found within InstCombineCasts where a function call is only implemented to work with FixedVectors. This caused a crash when a ScalableVector was passed to this function. This commit introduces a regression test which recreates the failure and a bug fix. Differential Revision: https://reviews.llvm.org/D98351	2021-03-23 12:13:41 +00:00
Florian Hahn	e43e8e9138	[AnnotationRemarks] Use subprogram location for summary remarks. The summary remarks are generated on a per-function basis. Using the first instruction's location is sub-optimal for 2 reasons: 1. Sometimes the first instruction is missing !dbg 2. The location of the first instruction may be misleading. Instead, just use the location of the function directly.	2021-03-23 12:05:41 +00:00
Victor Campos	f22b4c7122	[ARM] Handle debug instrs in ARM Low Overhead Loop pass In function ConvertVPTBlocks(), it is assumed that every instruction within a vector-predicated block is predicated. This is false for debug instructions, used by LLVM. Because of this, an assertion failure is reached when an input contains debug instructions inside VPT blocks. In non-assert builds, an out of bounds memory access took place. The present patch properly covers the case of debug instructions. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D99075	2021-03-23 11:49:06 +00:00
David Sherwood	d70251163f	[LoopVectorize][NFC] Refactor code to use IRBuilder::CreateStepVector In places where we create a ConstantVector whose elements are a linear sequence of the form <start, start + 1, start + 2, ...> I've changed the code to make use of CreateStepVector, which creates a vector with the sequence <0, 1, 2, ...>, and a vector addition operation. This patch is a non-functional change, since the output from the vectoriser remains unchanged for fixed length vectors and there are existing asserts that still fire when attempting to use scalable vectors for vectorising induction variables. In a later patch we will enable support for scalable vectors in InnerLoopVectorizer::getStepVector(), which relies upon the new stepvector intrinsic in IRBuilder::CreateStepVector. Differential Revision: https://reviews.llvm.org/D97861	2021-03-23 11:29:05 +00:00
Abhina Sreeskantharajan	a234d03198	[NFC] Formatting changes This patch addresses some formatting changes from the comments in https://reviews.llvm.org/D97785. Reviewed By: anirudhp Differential Revision: https://reviews.llvm.org/D99072	2021-03-23 07:17:54 -04:00
David Sherwood	748ae5281d	[IR][SVE] Add new llvm.experimental.stepvector intrinsic This patch adds a new llvm.experimental.stepvector intrinsic, which takes no arguments and returns a linear integer sequence of values of the form <0, 1, ...>. It is primarily intended for scalable vectors, although it will work for fixed width vectors too. It is intended that later patches will make use of this new intrinsic when vectorising induction variables, currently only supported for fixed width. I've added a new CreateStepVector method to the IRBuilder, which will generate a call to this intrinsic for scalable vectors and fall back on creating a ConstantVector for fixed width. For scalable vectors this intrinsic is lowered to a new ISD node called STEP_VECTOR, which takes a single constant integer argument as the step. During lowering this argument is set to a value of 1. The reason for this additional argument at the codegen level is because in future patches we will introduce various generic DAG combines such as mul step_vector(1), 2 -> step_vector(2) add step_vector(1), step_vector(1) -> step_vector(2) shl step_vector(1), 1 -> step_vector(2) etc. that encourage a canonical format for all targets. This hopefully means all other targets supporting scalable vectors can benefit from this too. I've added cost model tests for both fixed width and scalable vectors: llvm/test/Analysis/CostModel/AArch64/neon-stepvector.ll llvm/test/Analysis/CostModel/AArch64/sve-stepvector.ll as well as codegen lowering tests for fixed width and scalable vectors: llvm/test/CodeGen/AArch64/neon-stepvector.ll llvm/test/CodeGen/AArch64/sve-stepvector.ll See this thread for discussion of the intrinsic: https://lists.llvm.org/pipermail/llvm-dev/2021-January/147943.html	2021-03-23 10:43:35 +00:00
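The DAG combines mentioned in `748ae5281d` can be checked with a small model. This is only a sketch: `step_vector` below is a hypothetical Python stand-in for the STEP_VECTOR node, it uses a fixed element count `n` (a scalable vector's length is only known at run time), and it ignores element-width wraparound.

```python
def step_vector(step, n):
    """Model of <0, step, 2*step, ...> with n elements (n is an assumption)."""
    return [i * step for i in range(n)]


def combines_agree(n=8):
    """Check that each pattern from the commit message equals step_vector(2)."""
    one = step_vector(1, n)
    two = step_vector(2, n)
    mul = [v * 2 for v in one]               # mul step_vector(1), 2
    add = [a + b for a, b in zip(one, one)]  # add step_vector(1), step_vector(1)
    shl = [v << 1 for v in one]              # shl step_vector(1), 1
    return mul == two and add == two and shl == two


if __name__ == "__main__":
    print(step_vector(1, 4))
    print(combines_agree())
```

All three forms produce the same sequence, which is why a single canonical `step_vector(2)` is enough.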
Fraser Cormack	5bfbd9d938	[RISCV] Optimize all-constant mask BUILD_VECTORs This patch adds an optimization for mask-vector BUILD_VECTOR nodes whose elements are all constants or undef. It lowers such operations by building up the vector via a series of integer operations, in which multiple mask elements are inserted into a vector at a time via i8/i16/i32/i64 element types. The final result is then bitcast from that integer vector. We restrict this optimization in certain circumstances when optimizing for size. If we are required to use more than one integer insert operation, then it will likely increase code size compared with using a load from a constant pool. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D98860	2021-03-23 10:11:19 +00:00
Florian Hahn	f759d512c8	[VPlan] Include name when printing after `93a9d2de8f`. The name is included when printing in DOT mode. Also print it in non-DOT mode after `93a9d2de8f`. This will become more important to distinguish different plans once VPlans are gradually refined.	2021-03-23 09:50:14 +00:00
Simon Pilgrim	080cb83e52	[X86][AVX] Narrow VPBROADCASTQ->VPBROADCASTD if we don't need the upper bits. Helps fix cases where we've splatted smaller types to a wider vector element type without needing the upper bits. Avoid this on AVX512 targets as that can affect broadcast folding.	2021-03-23 09:41:02 +00:00
Juneyoung Lee	960a767368	Reland "[InstCombine] Add simplification of two logical and/ors" This relands `07c3b97e18` (D96945) which was reverted by commit `f49354838e`. The two-stage compilation tests pass successfully on my machine.	2021-03-23 16:24:50 +09:00
Fangrui Song	3c81822ec5	[SanitizerCoverage] Use External on Windows This should fix https://reviews.llvm.org/D98903#2643589 though it is not clear to me why ExternalWeak does not work.	2021-03-22 23:05:36 -07:00
Serguei Katkov	9fec382601	[RS4GC] Fix hang on infinite loop The meetBDVState utility may set the base pointer for the conflict state. At this point the base for the conflict state does not have any meaning, but it is used when comparing BDV states. This comparison is used as an indicator of progress made in an iteration, and the RS4GC pass uses an infinite loop to reach a fixed point. As a result, for the added test, on each iteration the state for some phi nodes is updated with another base value for the conflict state, and this is reported as progress even though no further progress is possible for a conflict state. In reality the base value is transferred from one state to another and the pass detects progress on these states. The test is very fragile. The traversal order of states and of phi node operands plays an important role. Reviewers: reames, dantrushin Reviewed By: reames Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D99058	2021-03-23 12:54:51 +07:00
Pushpinder Singh	d0e5422eb8	[GlobalISel][AMDGPU] Lower G_UMULO/G_SMULO Reviewed By: foad Differential Revision: https://reviews.llvm.org/D93963	2021-03-23 05:45:43 +00:00
Max Kazantsev	105dc0f9de	[NFC] Fix typo longre -> longer	2021-03-23 12:13:52 +07:00
Rahman Lavaee	949abf7d6a	[llvm-readelf, propeller] Add fallthrough bit to basic block metadata in BB-Address-Map section. This patch adds a fallthrough bit to basic block metadata, indicating whether the basic block can fallthrough without taking any branches. The bit will help us avoid an Intel LBR bug which results in occasional duplicate entries at the beginning of the LBR stack. This patch uses `MachineBasicBlock::canFallThrough()` to set the bit. This is not a const method because it eventually calls `TargetInstrInfo::analyzeBranch`, but it calls this function with the default `AllowModify=false`. So we can either make the argument to the `getBBAddrMapMetadata` non-const, or we can use `const_cast` when calling `canFallThrough`. I decided to go with the latter since this is purely due to legacy code, and in general we should not allow the BasicBlock to be mutable during `getBBAddrMapMetadata`. Reviewed By: tmsriram Differential Revision: https://reviews.llvm.org/D96918	2021-03-22 21:38:05 -07:00
Craig Topper	d7b0c19823	[RISCV] Add scheduler classes to Zfh instructions. Reviewed By: HsiangKai Differential Revision: https://reviews.llvm.org/D99053	2021-03-22 20:30:09 -07:00
Craig Topper	8db4804da7	[RISCV] Remove unused SchedWrites WriteFConv32/WriteFConv64/WriteFMov32/WriteFMov64. It doesn't look like any instructions have ever been assigned to these classes. Reviewed By: HsiangKai Differential Revision: https://reviews.llvm.org/D99050	2021-03-22 20:29:18 -07:00
Carl Ritson	64db6b8d37	[AMDGPU] Only unbundle memory accesses in SIMemoryLegalizer This restores previous behaviour and is a step toward removing unbundling entirely. Reviewed By: foad, rampitec Differential Revision: https://reviews.llvm.org/D99061	2021-03-23 11:30:36 +09:00
Philip Reames	013449299c	Minor format tweak to deref analysis printer	2021-03-22 18:44:18 -07:00
Gulfem Savrun Yeniceri	e3a6d70c68	Revert "[Passes] Add relative lookup table converter pass" This reverts commit `78a65cd945` which caused buildbot failures.	2021-03-23 00:43:16 +00:00
Juneyoung Lee	5c2e50b5d2	Reland "[SimplifyCFG] Update FoldBranchToCommonDest to be poison-safe" This relands commit `99108c791d` (D95026) which was reverted by `8d5a981a13` because the underlying problem (https://llvm.org/pr49495) is fixed.	2021-03-23 09:19:53 +09:00
Gulfem Savrun Yeniceri	78a65cd945	[Passes] Add relative lookup table converter pass Lookup tables generate non PIC-friendly code, which requires dynamic relocation as described in: https://bugs.llvm.org/show_bug.cgi?id=45244 This patch adds a new pass that converts lookup tables to relative lookup tables to make them PIC-friendly. Differential Revision: https://reviews.llvm.org/D94355	2021-03-22 22:09:02 +00:00
Roman Lebedev	d37fe26a2b	[NFC][IR] Type: add getWithNewType() method Sometimes you want to get a type with same vector element count as the current type, but different element type, but there's no QOL wrapper to do that. Add one.	2021-03-23 00:50:58 +03:00
Sanjay Patel	95f7f7c21b	Revert "[SimplifyCFG] use profile metadata to refine merging branch conditions" This reverts commit `27ae17a6b0`. There are bot failures that end with: #4 0x00007fff7ae3c9b8 CrashRecoverySignalHandler(int) CrashRecoveryContext.cpp:0:0 #5 0x00007fff84e504d8 (linux-vdso64.so.1+0x4d8) #6 0x00007fff7c419a5c llvm::TargetTransformInfo::getPredictableBranchThreshold() const (/home/buildbots/ppc64le-clang-multistage-test/clang-ppc64le-multistage/stage1.install/bin/../lib/libLLVMAnalysis.so.13git+0x479a5c) ...but not sure how to trigger that yet.	2021-03-22 17:48:06 -04:00
Nikita Popov	7e18cd887c	[InstCombine] Whitelist non-refining folds in SimplifyWithOpReplaced This is an alternative to D98391/D98585, playing things more conservatively. If AllowRefinement == false, then we don't use InstSimplify methods at all, and instead explicitly implement a small number of non-refining folds. Most cases are handled by constant folding, and I only had to add three folds to cover our unit tests / test-suite. While this may lose some optimization power, I think it is safer to approach from this direction, given how many issues this code has already caused. Differential Revision: https://reviews.llvm.org/D99027	2021-03-22 22:12:56 +01:00
Nikita Popov	ca28e32359	[IR] Mark assume/annotation as InaccessibleMemOnly These intrinsics don't need to be marked as arbitrary writing, it's sufficient to write inaccessible memory (aka "side effect") to preserve control dependencies. This means less special-casing in BasicAA. This is intended as an alternative to D98925. Differential Revision: https://reviews.llvm.org/D99022	2021-03-22 22:01:03 +01:00
Juneyoung Lee	b00209ed10	[SCEV] Use logical and/or matcher This is a minor patch that updates ScalarEvolution::isImpliedCond to use logical and/or matcher.	2021-03-23 06:00:54 +09:00
Sanjay Patel	27ae17a6b0	[SimplifyCFG] use profile metadata to refine merging branch conditions This is one step towards solving: https://llvm.org/PR49336 In that example, we disregard the recommended usage of builtin_expect, so an expensive (unpredictable) branch is folded into another branch that is guarding it. Here, we read the profile metadata to see if the 1st (predecessor) condition is likely to cause execution to bypass the 2nd (successor) condition before merging conditions by using logic ops. Differential Revision: https://reviews.llvm.org/D98898	2021-03-22 16:49:21 -04:00
Sanjay Patel	664d0c052c	[TargetTransformInfo] move branch probability query from TargetLoweringInfo This is no-functional-change intended (NFC), but needed to allow optimizer passes to use the API. See D98898 for a proposed usage by SimplifyCFG. I'm simplifying the code by removing the cl::opt. That was added back with the original commit in D19488, but I don't see any evidence in regression tests that it was used. Target-specific overrides can use the usual patterns to adjust as necessary. We could also restore that cl::opt, but it was not clear to me exactly how to do it in the convoluted TTI class structure.	2021-03-22 15:55:34 -04:00
Matt Arsenault	9fdfd8dd52	GlobalISel: Add utility function to constant fold FP ops	2021-03-22 14:38:17 -04:00
Matt Arsenault	c34819afe3	GlobalISel: Handle G_BUILD_VECTOR in isKnownToBeAPowerOfTwo	2021-03-22 14:20:35 -04:00
serge-sans-paille	e617cf9576	[NFC] Restore original SmallString size for X86TargetMachine::getSubtargetImpl lookup Better safe than sorry here, quoting Craig Topper: > Clang passes a pretty lengthy feature string.	2021-03-22 19:19:46 +01:00
Lang Hames	cc4ad2c540	[JITLink][ELF/x86-64] Add support for GOTOFF64 relocation.	2021-03-22 10:40:50 -07:00
Philip Reames	93ce855d4b	2nd attempt at a speculative fix for windows builders after `d4648eea`	2021-03-22 10:32:57 -07:00
Craig Topper	2f13e63f9e	[LegalizeDAG] Add asserts to verify the types of custom legalized operation matches the original node. We've messed this up a few times recently on RISCV. Experiments with these asserts found a couple issues on other targets as well. They've all been cleaned up now so we can put in these asserts to catch future issues. I had to waive Glue because ADDC/ADDE/etc legalization replaces Glue with i32 on at least AArch64. X86 used to do the same before we switched to ADDCARRY. So I guess that's just how that works. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D98979	2021-03-22 10:28:51 -07:00
Philip Reames	6ba73c4743	Speculative fix for windows builders after `d4648eea`	2021-03-22 10:22:01 -07:00
Craig Topper	294efcd6f7	[RISCV] Add support for fixed vector masked gather/scatter. I've split the gather/scatter custom handler to avoid complicating it with even more differences between gather/scatter. Tests are the scalable vector tests with the vscale removed and dropped the tests that used vector.insert. We're probably not as thorough on the splitting cases since we use 128 for VLEN here but scalable vector use a known min size of 64. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D98991	2021-03-22 10:17:30 -07:00
Stefan Gränitz	cbcc1c9f87	[Orc] Make usage of ResourceKeys thread-safe in DebugObjectManagerPlugin Don't leak ResourceKeys from MaterializationResponsibility::withResourceKeyDo() in notifyEmitted(). Also make some improvements in the overall implementation. Differential Revision: https://reviews.llvm.org/D98863	2021-03-22 17:47:33 +01:00
Stefan Gränitz	c154cddabd	[Orc] Fix tracking of pending debug objects in DebugObjectManagerPlugin There can be multiple MaterializationResponsibilitys in-flight for a single ResourceKey. Hence, pending debug objects must be tracked by MaterializationResponsibility and not by ResourceKey. Differential Revision: https://reviews.llvm.org/D98785	2021-03-22 17:47:32 +01:00
Philip Reames	d4648eeaa2	[SCEV] Use trip count information to improve shift recurrence ranges This patch exploits the knowledge that we may be running many fewer than bitwidth iterations of the loop, and may be able to disallow the overflow case. This patch specifically implements only the shl case, but this can be generalized to ashr and lshr without difficulty. Differential Revision: https://reviews.llvm.org/D98222	2021-03-22 09:38:43 -07:00
Bjorn Pettersson	688cdddafb	[SLP] Honor min/max regsize and min/max VF in vectorizeStores Make sure we use PowerOf2Floor instead of PowerOf2Ceil when calculating max number of elements that fits inside a vector register (otherwise we could end up creating vectors larger than the maximum vector register size). Also make sure we honor the min/max VF (as given by TTI or cmd line parameters) when doing vectorizeStores. Reviewed By: anton-afanasyev Differential Revision: https://reviews.llvm.org/D97691	2021-03-22 17:29:35 +01:00
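The rounding issue fixed in `688cdddafb` can be seen with a little arithmetic. The helpers below are hypothetical Python stand-ins named after LLVM's `PowerOf2Floor` and `PowerOf2Ceil`; the 128-bit register and 24-bit element sizes are made-up numbers chosen only to show the difference, not values taken from the patch.

```python
def power_of_2_floor(n):
    """Largest power of two <= n (0 for n == 0)."""
    return 0 if n == 0 else 1 << (n.bit_length() - 1)


def power_of_2_ceil(n):
    """Smallest power of two >= n (0 for n == 0)."""
    return 0 if n == 0 else 1 << (n - 1).bit_length()


def vector_bits(reg_bits, elem_bits, round_fn):
    """Total bits of a vector whose element count is rounded with round_fn."""
    return round_fn(reg_bits // elem_bits) * elem_bits


if __name__ == "__main__":
    # 128 // 24 == 5 elements fit; rounding up gives 8 elements (192 bits),
    # which exceeds the register, while rounding down gives 4 (96 bits).
    print(vector_bits(128, 24, power_of_2_ceil))
    print(vector_bits(128, 24, power_of_2_floor))
```

When the element count is already a power of two, both roundings agree, so the difference only shows up for awkward element sizes.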
Wenlei He	ce6bfe9411	[CSSPGO][llvm-profgen] Use profile summary based threshold for context trimming and merging Switch to use cold threshold from profile summary for cold context merging and trimming, instead of relying on hard coded values. Minor refactoring included for switch names, etc. Differential Revision: https://reviews.llvm.org/D98921	2021-03-22 08:56:59 -07:00
Matt Morehouse	772851ca4e	[HWASan] Disable stack, globals and force callbacks for x86_64. Subsequent patches will implement page-aliasing mode for x86_64, which will initially only work for the primary heap allocator. We force callback instrumentation to simplify the initial aliasing implementation. Reviewed By: vitalybuka, eugenis Differential Revision: https://reviews.llvm.org/D98069	2021-03-22 08:02:27 -07:00
Matt Arsenault	1dd23c6d53	AMDGPU: Allow tail calls for amdgpu_gfx functions	2021-03-22 10:55:19 -04:00
Stefan Pintilie	b8f3c6d011	[PowerPC][NFC] Do not enter prefix selection if it cannot do better. Do not try to materialize a constant using prefix instructions if the selection using non prefix instructions was able to do it using a single non prefix instruction. Reviewed By: nemanjai, #powerpc Differential Revision: https://reviews.llvm.org/D98791	2021-03-22 09:17:52 -05:00
Alexey Lapshin	972b6a3a34	[llvm-objcopy][Support] move writeToOutput helper function to Support. writeToOutput function is useful when it is necessary to create different kinds of streams(based on stream name) and when we need to use a temporary file while writing(which would be renamed into the resulting file in a success case). This patch moves the writeToStream helper into the Support library. Differential Revision: https://reviews.llvm.org/D98426	2021-03-22 15:41:10 +03:00
Bradley Smith	48f5a392cb	[IR] Add vscale_range IR function attribute This attribute represents the minimum and maximum values vscale can take. For now this attribute is not hooked up to anything during codegen, this will be added in the future when such codegen is considered stable. Additionally hook up the -msve-vector-bits=<x> clang option to emit this attribute. Differential Revision: https://reviews.llvm.org/D98030	2021-03-22 12:05:06 +00:00
Sjoerd Meijer	7515e81e8c	[AArch64] Add some float -> int -> float conversion patterns This adds some conversion match patterns for which we want to keep the int values in FP registers using the corresponding NEON instructions (not the FP instructions) to avoid more costly int <-> fp register transfers. Differential Revision: https://reviews.llvm.org/D98956	2021-03-22 11:06:08 +00:00
serge-sans-paille	b2f7ce91a6	[NFC] Simpler and faster key computation for getSubtargetImpl memoization There's no use in computing a large key that's only used for a memoization optimization.	2021-03-22 10:02:51 +01:00
Qiu Chaofan	52f33f7953	[PowerPC] Enable redundant TOC save removal on AIX Reviewed By: shchenz Differential Revision: https://reviews.llvm.org/D97039	2021-03-22 14:29:22 +08:00
Bing1 Yu	113f077f80	[X86] Pass to transform tdpbf16ps intrinsics to scalar operation. In previous patch https://reviews.llvm.org/D93594, we only scalarize tilezero, tileload, tilestore and tiledpbssd. In this patch we scalarize tdpbf16ps intrinsic. Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D96110	2021-03-22 13:00:40 +08:00
Max Kazantsev	8fab9f824f	[IndVars] Sharpen context in eliminateIVComparison When eliminating comparisons, we can use common dominator of all its users as context. This gives better results when ICMP is not computed right before the branch that uses it. Differential Revision: https://reviews.llvm.org/D98924 Reviewed By: lebedev.ri	2021-03-22 11:55:57 +07:00
Lang Hames	fc36a511c6	[JITLink][ELF/x86-64] Add support for R_X86_64_GOTPC64 and R_X86_64_GOT64. Start adding support for ELF x86-64 large code model, PIC relocations.	2021-03-21 21:52:54 -07:00
Lang Hames	0a74ec3299	[JITLink] Start laying the groundwork for ELF x86-64 large code model support. Introduces DefineExternalSectionStartAndEndSymbols.h, which defines a template for a JITLink pass that transforms external symbols meeting a user-supplied predicate into defined symbols pointing at the start and end of a Section identified by the predicate. JITLink.h is updated with a new makeAbsolute function to support this pass. Also renames BasicGOTAndStubsBuilder to PerGraphGOTAndPLTStubsBuilder -- the new name better describes the intent of this GOT and PLT stubs builder, and will help to distinguish it from future GOT and PLT stub builders that build entries that may be shared between multiple graphs.	2021-03-21 20:56:47 -07:00
Lang Hames	209ceed745	[JITLink][ELF/x86-64] Add Delta32, NegDelta32, NegDelta64 support. These were missing, but are used in eh-frame section support.	2021-03-21 20:15:40 -07:00
Roman Lebedev	e3a4701627	[clang][CodeGen] Lower Likelihood attributes to @llvm.expect intrin instead of branch weights `08196e0b2e` exposed LowerExpectIntrinsic's internal implementation detail in the form of LikelyBranchWeight/UnlikelyBranchWeight options to the outside. While this isn't incorrect from the results viewpoint, this is suboptimal from the layering viewpoint, and causes confusion - should transforms also use those weights, or should they use something else, D98898? So go back to status quo by making LikelyBranchWeight/UnlikelyBranchWeight internal again, and fixing all the code that used it directly, which currently is only clang codegen, thankfully, to emit proper @llvm.expect intrinsics instead.	2021-03-21 22:50:21 +03:00
Roman Lebedev	37d6be9052	Revert "[BranchProbability] move options for 'likely' and 'unlikely'" Upon reviewing D98898 i've come to realization that these are implementation detail of LowerExpectIntrinsicPass, and they should not be exposed to outside of it. This reverts commit `ee8b53815d`.	2021-03-21 22:50:21 +03:00
Craig Topper	30080b003e	[DAGCombiner] Minor compile time improvement to (sext_in_reg (sign_extend_vector_inreg x)) optimization. Don't bother calling ComputeNumSignBits if N00Bits < ExtVTBits. No matter what answer we get back this will be true: (N00Bits - DAG.ComputeNumSignBits(N00, DemandedSrcElts)) < ExtVTBits) So we might as well save the computation. This makes the code more consistent with the similar (sext_in_reg (sext x)) handling above.	2021-03-21 11:16:41 -07:00
Nikita Popov	d11d5d1c5f	[ValueTracking] Improve mul handling in isKnownNonEqual() X != X * C is true if: * C is not 0 or 1 * X is not 0 * mul is nsw or nuw Proof: https://alive2.llvm.org/ce/z/uwF29z This is motivated by one of the cases in D98422.	2021-03-21 18:41:35 +01:00
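The rule in `d11d5d1c5f` can be brute-forced at a small bit width. This is a sketch that models 8-bit integers in Python (the width and the helper names are assumptions for illustration); the commit's own proof is the Alive2 link above.

```python
BITS = 8
MASK = (1 << BITS) - 1
SMIN, SMAX = -(1 << (BITS - 1)), (1 << (BITS - 1)) - 1


def to_signed(v):
    """Reinterpret an unsigned BITS-wide value as two's complement."""
    return v - (1 << BITS) if v & (1 << (BITS - 1)) else v


def rule_holds():
    """X != X * C for all X != 0, C not in {0, 1}, when the mul is nuw or nsw."""
    for x in range(1, 1 << BITS):
        for c in range(2, 1 << BITS):
            nuw = x * c <= MASK
            nsw = SMIN <= to_signed(x) * to_signed(c) <= SMAX
            if (nuw or nsw) and ((x * c) & MASK) == x:
                return False
    return True


def wrapping_counterexample():
    """Find some (x, c) where a wrapping multiply gives X == X * C."""
    for x in range(1, 1 << BITS):
        for c in range(2, 1 << BITS):
            if ((x * c) & MASK) == x:
                return x, c
    return None


if __name__ == "__main__":
    print(rule_holds())
    print(wrapping_counterexample())
```

The second function shows why the nsw/nuw condition matters: without it, a wrapping multiply can land back on X.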
Matt Arsenault	20a24af01d	MIR: Fix missing serialization for HasTailCall	2021-03-21 13:14:04 -04:00
Matt Arsenault	a0f5aad6d7	AMDGPU: Fix allowing immediates for tail call pseudo. The pseudo was using SSrc_b64, so it allowed folding immediates into the destination operand for a tail call to null. However, this is not a valid operand for the s_setpc_b64 this will be lowered to. Avoids printing the operand as an invalid immediate. Avoids a regression when tail calls are enabled in GlobalISel (somehow tail calls to null get deleted in the DAG).	2021-03-21 13:14:04 -04:00
Nikita Popov	9f864d2025	Reapply [ConstantFold] Handle vectors in ConstantFoldLoadThroughBitcast() There seems to be an impedance mismatch between what the type system considers an aggregate (structs and arrays) and what constants consider an aggregate (structs, arrays and vectors). Adjust the type check to consider vectors as well. The previous version of the patch dropped the type check entirely, but it turns out that getAggregateElement() does require the constant to be an aggregate in some edge cases: For Poison/Undef the getNumElements() API is called, without checking in advance that we're dealing with an aggregate. Possibly the implementation should avoid doing that, but for now I'm adding an assert so the next person doesn't fall into this trap.	2021-03-21 17:48:21 +01:00
Nikita Popov	daae927f9c	[InstSimplify] Clean up SimplifyReplacedWithOp implementation (NFCI) Replace Op with RepOp up-front, and then always work with the new operands, rather than checking for replacement in various places.	2021-03-21 15:30:30 +01:00
Matt Arsenault	1098acd46d	GlobalISel: Avoid unnecessary truncation to i64 We can just directly pass through the APInt to create a new constant.	2021-03-21 10:07:41 -04:00
Matt Arsenault	6314a72730	AMDGPU/GlobalISel: Enable CSE in pre-legalizer combiner	2021-03-21 10:07:37 -04:00
Simon Pilgrim	64c2641c89	[DAG] Limit (sext_in_reg (zero_extend_vector_inreg x)) to exact sign extension As commented by @craig.topper on rG1ba5c550d418, we can't guarantee that we'll be extending zero bits, just sign bit. So, revert to the old code for zero_extend_vector_inreg cases.	2021-03-21 14:01:37 +00:00
Simon Pilgrim	3179588947	[X86][AVX] ComputeNumSignBitsForTargetNode - add X86ISD::VBROADCAST handling for scalar sources The target shuffle code handles vector sources, but X86ISD::VBROADCAST can also accept a scalar source for splatting. Added as an extension to PR49658	2021-03-21 12:22:51 +00:00
David Green	6d9d2049c8	[ARM] VINS f16 pattern This adds an extra pattern for inserting an f16 into an odd vector lane via a VINS. If the dual-insert-lane pattern does not happen to apply, this can help with some simple cases. Differential Revision: https://reviews.llvm.org/D95471	2021-03-21 12:00:06 +00:00
luxufan	02ffbac844	[RISCV] remove redundant instruction when eliminate frame index The reason for generating the mv a0, a0 instruction is that the stack object offset is larger than int<12>. To deal with this situation, the eliminateFrameIndex function creates a virtual register, which needs the register scavenger to scavenge it. If the machine instruction that contains the stack object has the opcode ADDI (the addi was generated by frameindexNode), and this instruction's destination register is the same as the register produced by the register scavenger, then mv a0, a0 is generated. So, to eliminate this instruction, the eliminateFrameIndex function does not create the virtual register if the instruction opcode is ADDI. Differential Revision: https://reviews.llvm.org/D92479	2021-03-21 18:54:00 +08:00
Simon Pilgrim	297b9bc3fa	[X86][AVX] computeKnownBitsForTargetNode - add X86ISD::VBROADCAST handling for scalar sources The target shuffle code handles vector sources, but X86ISD::VBROADCAST can also accept a scalar source for splatting. Suggested by @craig.topper on PR49658	2021-03-21 10:40:57 +00:00
Simon Pilgrim	54a05f2ec8	[X86] computeKnownBitsForTargetNode - add X86ISD::PMULUDQ handling Reuse the existing KnownBits multiplication code to handle what is effectively an ISD::UMUL_LOHI variant	2021-03-21 09:57:20 +00:00
Jessica Clarke	b2bb003774	[RISCV] Update comment in RISCVInstrInfoM.td Missed in `07ed62b7d5`.	2021-03-20 22:35:40 +00:00
Craig Topper	07ed62b7d5	[RISCV] Disable (mul (and X, 0xffffffff), (and Y, 0xffffffff)) optimization when Zba is enabled. This optimization is trying to save SRLI instructions needed to implement the ANDs. If we have zext.w we won't save anything. Because we don't check that the multiply is the only user of the AND we might even increase instruction count.	2021-03-20 15:31:45 -07:00
Craig Topper	b0d8823a8a	[RISCV] Add isel pattern to optimize (mul (and X, 0xffffffff), (and Y, 0xffffffff)) on RV64 This pattern computes the full 64 bit product of a 32x32 unsigned multiply. This requires two pairs of SLLI+SRLI to zero the upper 32 bits of the inputs. We can do better than this by using two SLLI to move the lower bits to the upper bits then use MULHU to compute the product. This is the high half of a full 64x64 product. Since we put 32 0s in the lower bits of the inputs we know the 128-bit product will have zeros in the lower 64 bits. So the upper 64 bits, which MULHU computes, will contain the original 64 bit product we were after. The same trick would work for (mul (sext_inreg X, i32), (sext_inreg Y, i32)) using MULHS, but sext_inreg is sext.w which is already one instruction so we wouldn't save anything. Differential Revision: https://reviews.llvm.org/D99026	2021-03-20 14:55:46 -07:00
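The arithmetic behind the SLLI+MULHU trick in `b0d8823a8a` can be checked with Python's arbitrary-precision integers. This is a sketch only: `mulhu` and the other helper names are hypothetical models of the 64-bit operations, not LLVM or RISC-V toolchain APIs.

```python
M64 = (1 << 64) - 1


def mulhu(a, b):
    """Model of MULHU: upper 64 bits of the unsigned 128-bit product."""
    return ((a & M64) * (b & M64)) >> 64


def mul_via_and(x, y):
    """Reference form: (mul (and X, 0xffffffff), (and Y, 0xffffffff))."""
    return ((x & 0xFFFFFFFF) * (y & 0xFFFFFFFF)) & M64


def mul_via_slli_mulhu(x, y):
    """Shift the low 32 bits of each input into the upper half, then MULHU."""
    a = (x << 32) & M64  # SLLI x, 32
    b = (y << 32) & M64  # SLLI y, 32
    return mulhu(a, b)


if __name__ == "__main__":
    x, y = 0x123456789ABCDEF0, 0xFEDCBA9876543210
    print(hex(mul_via_and(x, y)))
    print(hex(mul_via_slli_mulhu(x, y)))
```

Both shifted inputs are multiples of 2^32, so their product is a multiple of 2^64 and the high half is exactly the 32x32 product.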
Sanjay Patel	ee8b53815d	[BranchProbability] move options for 'likely' and 'unlikely' This makes the settings available for use in other passes by housing them within the Support lib, but NFC otherwise. See D98898 for the proposed usage in SimplifyCFG (where this change was originally included). Differential Revision: https://reviews.llvm.org/D98945	2021-03-20 14:46:46 -04:00
Fangrui Song	879760c245	[VE] Fix types of multiclass template arguments in TableGen files These were not properly checked before `[TableGen] Improve handling of template arguments`.	2021-03-20 10:36:51 -07:00
Jeroen Dobbelaere	77080a1eb6	Revert of D49126 [PredicateInfo] Use custom mangling to support ssa_copy with unnamed types. Now that intrinsic name mangling can cope with unnamed types, the custom name mangling in PredicateInfo (introduced by D49126) can be removed. (See D91250, D48541) Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D91661	2021-03-20 11:37:09 +01:00
Wang, Pengfei	2327513b85	[X86] Fix a bug when calculating the ldtilecfg insertion points. The BB we initialized the ldtilecfg is special. We don't need to check if its predecessor BBs need to insert ldtilecfg for calls. We reused the flag HasCallBeforeAMX, so that the predecessors won't be added to CfgNeedInsert. This case happens only when the entry BB is in a loop. We need to hoist the first tile config point out of the loop in future. Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D98845	2021-03-20 17:48:59 +08:00
Juneyoung Lee	319d093b87	[CFLGraph] Fix a crash due to missing handling of freeze https://reviews.llvm.org/D85534#2636321	2021-03-21 02:14:13 +09:00
Carl Ritson	6c9cac5da1	[AMDGPU] Add MDT update missing from D98915	2021-03-20 13:38:58 +09:00
Nemanja Ivanovic	ea48bf8649	[PowerPC][NFC] Do not produce i64 constants in 32-bit mode There are some instances where we produce constants of type MVT::i64 unconditionally in the target DAG combines. This is not actually valid in 32-bit mode.	2021-03-19 22:54:47 -05:00
Craig Topper	d5c1d305b3	[RISCV] Rename WriteShift/ReadShift scheduler classes to WriteShiftImm/ReadShiftImm. Move variable shifts from WriteIALU/ReadIALU to new WriteShiftReg/ReadShiftReg. Previously only immediate shifts were in WriteShift. Register shifts were grouped with IALU. Seems likely that immediate shifts would be as fast or faster than register shifts. And that immediate shifts wouldn't be any faster than IALU. So if any deserved to be in their own group it should be register shifts not immediate shifts. Rather than try to flip them let's just add more granularity and give each kind their own class. I've used new names for both to make them unambiguous and to force any downstream implementations to put correct information in their scheduler models. Reviewed By: evandro Differential Revision: https://reviews.llvm.org/D98911	2021-03-19 20:39:49 -07:00
Carl Ritson	fe5f4c397f	[AMDGPU] Rename SIInsertSkips Pass Pass no longer handles skips. Pass now removes unnecessary unconditional branches and lowers early termination branches. Hence rename to SILateBranchLowering. Move code to handle returns to epilog from SIPreEmitPeephole into SILateBranchLowering. This means SIPreEmitPeephole only contains optional optimisations, and all required transforms are in SILateBranchLowering. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D98915	2021-03-20 11:48:04 +09:00
Carl Ritson	5df2af8b0e	[AMDGPU] Merge SIRemoveShortExecBranches into SIPreEmitPeephole SIRemoveShortExecBranches is an optimisation so fits well in the context of SIPreEmitPeephole. Test changes relate to early termination from kills which have now been lowered prior to considering branches for removal. As these use s_cbranch the execz skips are now retained instead. Currently either behaviour is valid as kill with EXEC=0 is a nop; however, if early termination is used differently in future then the new behaviour is the correct one. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D98917	2021-03-20 11:26:42 +09:00
Carl Ritson	b76c09023d	[AMDGPU] Allow index optimisation in SIPreEmitPeephole for bundles Add code so duplication index register changes can be removed from inside bundles. Reviewed By: rampitec, foad Differential Revision: https://reviews.llvm.org/D98940	2021-03-20 10:26:23 +09:00
Anshil Gandhi	697f90ebfa	[NFC] [PowerPC] Determine Endianness in PPCTargetMachine The TargetMachine uses the triple to determine endianness. Just use that logic rather than replicating it in PPCSubtarget. Differential revision: https://reviews.llvm.org/D98674	2021-03-19 20:22:16 -05:00
Lang Hames	602e19ed79	[JITLink] Don't issue lookups for empty symbol sets. Issuing a lookup for an empty symbol set is legal, but can actually result in unrelated work being done if there was a work queue left over from the previous lookup. We can avoid doing this unrelated work (reducing stack depth and interleaving of debugging output) by not issuing these no-op lookups in the first place.	2021-03-19 16:10:47 -07:00
Christoffer Lernö	528f6f7d61	Add type attributes to LLVM C API The LLVM C API is missing type attributes as is needed by attributes such as sret and byval. This patch adds three missing wrapper functions. Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=48249 https://reviews.llvm.org/D97763	2021-03-19 19:07:04 -04:00
Arthur Eubanks	a17394dc88	[NewPM] Verify LoopAnalysisResults after a loop pass All loop passes should preserve all analyses in LoopAnalysisResults. Add checks for those when the checks are enabled (which is by default with expensive checks on). Note that due to PR44815, we don't check LAR's ScalarEvolution. Apparently calling SE.verify() can change its results. This is a reland of https://reviews.llvm.org/D98820 which was reverted due to unacceptably large compile time regressions in normal debug builds.	2021-03-19 14:56:37 -07:00
Jessica Paquette	4773dd5ba9	[GlobalISel] Add G_SBFX + G_UBFX (bitfield extraction opcodes) There is a bunch of similar bitfield extraction code throughout *ISelDAGToDAG. E.g, ARMISelDAGToDAG, AArch64ISelDAGToDAG, and AMDGPUISelDAGToDAG all contain code that matches a bitfield extract from an and + right shift. Rather than duplicating code in the same way, this adds two opcodes: - G_UBFX (unsigned bitfield extract) - G_SBFX (signed bitfield extract) They work like this ``` %x = G_UBFX %y, %lsb, %width ``` Where `lsb` and `width` are - The least-significant bit of the extraction - The width of the extraction This will extract `width` bits from `%y`, starting at `lsb`. G_UBFX zero-extends the result, while G_SBFX sign-extends the result. This should allow us to use the combiner to match the bitfield extraction patterns rather than duplicating pattern-matching code in each target. Differential Revision: https://reviews.llvm.org/D98464	2021-03-19 14:37:19 -07:00
Arthur Eubanks	a1ab5627f0	Revert "[NewPM] Verify LoopAnalysisResults after a loop pass" This reverts commit `94c269baf5`. Still causes too large of compile time regression in normal debug builds. Will put under expensive checks instead.	2021-03-19 14:31:08 -07:00
Arthur Eubanks	94c269baf5	[NewPM] Verify LoopAnalysisResults after a loop pass All loop passes should preserve all analyses in LoopAnalysisResults. Add checks for those. Note that due to PR44815, we don't check LAR's ScalarEvolution. Apparently calling SE.verify() can change its results. Only verify MSSA when VerifyMemorySSA, normally it's very expensive. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D98820	2021-03-19 13:26:45 -07:00
Craig Topper	1066dcb550	[AArch64] Fix LowerMGATHER to return the chain result for floating point gathers. Found by adding asserts to LegalizeDAG to make sure custom legalized results had the right types. Reviewed By: kmclaughlin Differential Revision: https://reviews.llvm.org/D98968	2021-03-19 11:53:46 -07:00
David Green	a2e0312cda	[ARM] Tone down the MVE scalarization overhead The scalarization overhead was set deliberately high for MVE, whilst the codegen was new. It helps protect us against the negative ramifications of mixing scalar and vector instructions. This decreases that, especially for floating point where the cost of extracting/inserting lane elements can be low. For integer the cost is still fairly high due to the cross-register-bank copy, but is no longer n^2 in the length of the vector. In general, this will decrease the cost of scalarizing floats and long integer vectors. i64 increase in cost, having a high cost before and after this patch. For floats this allows us to start doing things like vectorizing fdiv instructions, even if they are scalarized. Differential Revision: https://reviews.llvm.org/D98245	2021-03-19 18:30:11 +00:00
Philip Reames	5698537f81	Update basic deref API to account for possibility of free [NFC] This patch is plumbing to support work towards the goal outlined in the recent llvm-dev post "[llvm-dev] RFC: Decomposing deref(N) into deref(N) + nofree". The point of this change is purely to simplify iteration on other pieces on the way to making the switch. Rebuilding with a change to Value.h is slow and painful, so I want to get the API change landed. Once that's done, I plan to more closely audit each caller, add the inference rules in their own patch, then post a patch with the langref changes and test diffs. The value of the command line flag is that we can exercise the inference logic in standalone patches without needing the whole switch ready to go just yet. Differential Revision: https://reviews.llvm.org/D98908	2021-03-19 11:17:19 -07:00
Alexey Bataev	14ae0cf0f5	[Cost]Canonicalize the cost for logical or/and reductions. The generic cost of logical or/and reductions should be cost of bitcast <ReduxWidth x i1> to iReduxWidth + cmp eq|ne iReduxWidth. Differential Revision: https://reviews.llvm.org/D97961	2021-03-19 11:01:58 -07:00
Craig Topper	5d315691c4	[RISCV] Add missing bitcasts to the results of lowerINSERT_SUBVECTOR and lowerEXTRACT_SUBVECTOR when handling mask vectors. Found by adding asserts to LegalizeDAG to catch incorrect result types being returned. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D98964	2021-03-19 10:54:33 -07:00
Craig Topper	95998b898c	[Hexagon] Return an i64 for result 0 from LowerREADCYCLECOUNTER instead of an i32. As far as I can tell, the node coming in has an i64 result so the return should have the same type. The HexagonISD node used for this has a type profile that says the result is i64. Found while trying to add asserts to LegalizeDAG to catch result type mismatches. Reviewed By: kparzysz Differential Revision: https://reviews.llvm.org/D98962	2021-03-19 10:54:33 -07:00
Andrei Elovikov	92205cb27f	[NFC][VPlan] Guard print routines with "#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)" Reviewed By: mehdi_amini Differential Revision: https://reviews.llvm.org/D98897	2021-03-19 10:50:12 -07:00
Andrei Elovikov	93a9d2de8f	[VPlan] Add plain text (not DOT's digraph) dumps I foresee two uses for this: 1) It's easier to use those in debugger. 2) Once we start implementing more VPlan-to-VPlan transformations (especially inner loop massaging stuff), using the vectorized LLVM IR as CHECK targets in LIT test would become too obscure. I can imagine that we'd want to CHECK against VPlan dumps after multiple transformations instead. That would be easier with plain text dumps than with DOT format. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D96628	2021-03-19 10:50:12 -07:00
Craig Topper	85f3f6b3cc	[RISCV] Lower scalable vector masked loads to intrinsics to match fixed vectors and reduce isel patterns. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D98840	2021-03-19 10:39:35 -07:00
Fraser Cormack	d399b82e2a	[RISCV] Maintain fixed-length info when optimizing BUILD_VECTORs I'm not sure how I failed to notice this before, but when optimizing dominant-element BUILD_VECTORs we would lower via the scalable container type, which lost us the information about the fixed length of the vector types. By lowering via the fixed-length type we can preserve that information and eliminate redundant vsetvli instructions. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D98938	2021-03-19 17:21:06 +00:00
Philip Reames	00d0315a7c	[SCEV] Factor out a lambda for strict condition splitting [NFC]	2021-03-19 10:07:12 -07:00
Fraser Cormack	550292ecb1	[RISCV] Fix missing scalable->fixed-length vector conversion Returning the scalable-vector container type would present problems when the fixed-length INSERT_VECTOR_ELT was used by later operations. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D98776	2021-03-19 16:49:47 +00:00
Simon Pilgrim	9d2df96407	[DAG] computeKnownBits - add ISD::MULHS/MULHU/SMUL_LOHI/UMUL_LOHI handling Reuse the existing KnownBits multiplication code to handle the 'extend + multiply + extract high bits' pattern for multiply-high ops. Noticed while looking at the codegen for D88785 / D98587 - the patch helps division-by-constant expansion code in particular, which suggests that we might have some further KnownBits div/rem cases we could handle - but this was far easier to implement. Differential Revision: https://reviews.llvm.org/D98857	2021-03-19 16:02:31 +00:00
Stanislav Mekhanoshin	57effe2205	[AMDGPU] Remove dead glc1 handling in asm parser. NFC.	2021-03-19 08:37:47 -07:00
Simon Pilgrim	ffb2887103	[DAG] Fold shuffle(bop(shuffle(x,y),shuffle(z,w)),undef) -> bop(shuffle'(x,y),shuffle'(z,w)) Followup to D96345, handle unary shuffles of binops (as well as binary shuffles) if we can merge the shuffle with inner operand shuffles. Differential Revision: https://reviews.llvm.org/D98646	2021-03-19 14:14:56 +00:00
Paul C. Anagnostopoulos	a9fc44c557	[TableGen] Improve handling of template arguments This requires changes to TableGen files and some C++ files due to incompatible multiclass template arguments that slipped through before the improved handling.	2021-03-19 09:57:53 -04:00
Ricky Taylor	028d6250ea	[M68k] Replace unknown operand with explicit type Replace the unknown operand used for immediate operands for DIV/MUL with a fixed 16-bit immediate. This is required since the assembly parser generator requires that all operands are typed. Differential Revision: https://reviews.llvm.org/D98819	2021-03-19 13:44:46 +00:00
Jeroen Dobbelaere	04790d9cfb	Support intrinsic overloading on unnamed types This patch adds support for intrinsic overloading on unnamed types. This fixes PR38117 and PR48340 and will also be needed for the Full Restrict Patches (D68484). The main problem is that the intrinsic overloading name mangling is using 's_s' for unnamed types. This can result in identical intrinsic mangled names for different function prototypes. This patch changes this by adding a '.XXXXX' to the intrinsic mangled name when at least one of the types is based on an unnamed type, ensuring that we get a unique name. Implementation details: - The mapping is created on demand and kept in Module. - It also checks for existing clashes and recycles potentially existing prototypes and declarations. - Because of extra data in Module, Intrinsic::getName needs an extra Module* argument and, for speed, an optional FunctionType* argument. - I still kept the original two-argument 'Intrinsic::getName' around which keeps the original behavior (providing the base name). -- Main reason is that I did not want to change the LLVMIntrinsicGetName version, as I don't know how acceptable such a change is -- The current situation already has a limitation. So that should not get worse with this patch. - Intrinsic::getDeclaration and the verifier are now using the new version. Other notes: - As far as I see, this should not suffer from stability issues. The count is only added for prototypes depending on at least one anonymous struct - The initial count starts from 0 for each intrinsic mangled name. - In case of name clashes, existing prototypes are remembered and reused when that makes sense. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D91250	2021-03-19 14:34:25 +01:00
Nemanja Ivanovic	a8697c57fa	[PowerPC] Fix the check for 16-bit signed field in peephole When a D-Form instruction is fed by an add-immediate, we attempt to merge the two immediates to form a single displacement so we can remove the add-immediate. However, we don't check whether the new displacement fits into a 16-bit signed immediate field early enough. Namely, we do a sign-extend from 16 bits first which will discard high bits and then we check whether the result is a 16-bit signed immediate. It of course will always be. Move the check prior to the sign extend to ensure we are checking the correct value. Fixes https://bugs.llvm.org/show_bug.cgi?id=49640	2021-03-19 07:15:53 -05:00
Abhina Sreeskantharajan	4f750f6ebc	[SystemZ][z/OS] Distinguish between text and binary files on z/OS This patch consists of the initial changes to help distinguish between text and binary content correctly on z/OS. I would like to get feedback from Windows users on setting OF_None for all ToolOutputFiles. This seems to have been done as an optimization to prevent CRLF translation on Windows in the past. Reviewed By: zibi Differential Revision: https://reviews.llvm.org/D97785	2021-03-19 08:09:57 -04:00
Ricky Taylor	cd442157cf	[M68k] Convert register Aliases to AltNames This makes it simpler to determine when two registers are actually the same vs just partially aliasing. The only real caveat is that it becomes impossible to know which name was used for the register previously. (i.e. parsing assembly and then disassembling it can result in the register name changing.) Differential Revision: https://reviews.llvm.org/D98536	2021-03-19 11:44:53 +00:00
Ricky Taylor	51884c6bef	[M68k] Introduce DReg bead This is required in order to determine during disassembly whether a Reg bead without associated DA bead is referring to a data register. Differential Revision: https://reviews.llvm.org/D98534	2021-03-19 11:44:53 +00:00
Jay Foad	5a5a531214	[AMDGPU] Remove some redundant code. NFC. This is redundant because we have already checked that we can't handle divergent 64-bit atomic operands.	2021-03-19 11:36:15 +00:00
Jay Foad	5dd5ddcb41	[AMDGPU] Skip building some IR if it won't be used. NFC.	2021-03-19 11:36:14 +00:00
Jay Foad	c96dfe0d8b	[AMDGPU] Sink Intrinsic::getDeclaration calls to where they are used. NFC.	2021-03-19 11:36:14 +00:00
Simon Pilgrim	a96897219d	[KnownBits] Add knownbits analysis for mulhs/mulhu 'multiply high' instructions Split off from D98857 https://reviews.llvm.org/D98866	2021-03-19 08:56:06 +00:00
Mikael Holmen	6d22ba48ea	[NVPTX] Fix warning, remove extra ";" [NFC] gcc complained with ../lib/Target/NVPTX/NVPTXLowerArgs.cpp:203:2: warning: extra ';' [-Wpedantic] 203 | }; | ^	2021-03-19 09:26:14 +01:00
Max Kazantsev	8eefa07fcf	[NFC] Move function up in code	2021-03-19 14:03:31 +07:00
Max Kazantsev	8bb952b57f	[NFC] Factor out utility function for finding common dom of user set	2021-03-19 13:49:29 +07:00
Fangrui Song	c241659d15	[X86] Fix -Wunused-function in -DLLVM_ENABLE_ASSERTIONS=off builds	2021-03-18 23:22:58 -07:00
Max Kazantsev	16370e02a7	[IndVars] Provide eliminateIVComparison with context We can prove more predicates when we have a context when eliminating ICmp. As first (and very obvious) approximation we can use the ICmp instruction itself, though in the future we are going to use a common dominator of all its users. Need some refactoring before that. Observed ~0.5% negative compile time impact. Differential Revision: https://reviews.llvm.org/D98697 Reviewed By: lebedev.ri	2021-03-19 12:28:22 +07:00
Wenlei He	1410db70b9	[CSSPGO] Add attribute metadata for context profile This change adds an attribute field for metadata of context profile. Currently we have an inline attribute that indicates whether the leaf frame corresponding to a context profile was inlined in previous build. This will be used to help estimating inlining and be taken into account when trimming context. Changes for that in llvm-profgen will follow. It will also help tuning. Differential Revision: https://reviews.llvm.org/D98823	2021-03-18 22:00:56 -07:00
Max Kazantsev	fff1363ba0	[SCEV] Add false->any implication By definition of Implication operator, `false -> true` and `false -> false`. It means that `false` implies any predicate, no matter true or false. We don't need to go any further trying to prove the statement we need and just always say that `false` implies it in this case. In practice it means that we are trying to prove something guarded by `false` condition, which means that this code is unreachable, and we can safely prove any fact or perform any transform in this code. Differential Revision: https://reviews.llvm.org/D98706 Reviewed By: lebedev.ri	2021-03-19 11:29:48 +07:00
Philip Reames	fa26da0582	Add a couple of missing attribute query methods [NFC]	2021-03-18 17:33:20 -07:00
Hsiangkai Wang	aa8d33a6d6	[RISCV] Spilling for Zvlsseg registers. For Zvlsseg, we create several tuple register classes. When spilling for these tuple register classes, we need to iterate NF times to load/store these tuple registers. Differential Revision: https://reviews.llvm.org/D98629	2021-03-19 07:46:16 +08:00
Fangrui Song	9558456b53	[SanitizerCoverage] Make __start_/__stop_ symbols extern_weak On ELF, we place the metadata sections (`__sancov_guards`, `__sancov_cntrs`, `__sancov_bools`, `__sancov_pcs`) in section groups (either `comdat any` or `comdat noduplicates`). With `--gc-sections`, LLD since D96753 and GNU ld `-z start-stop-gc` may garbage collect such sections. If all `__sancov_bools` are discarded, LLD will error `error: undefined hidden symbol: __start___sancov_cntrs` (other sections are similar). ``` % cat a.c void discarded() {} % clang -fsanitize-coverage=func,trace-pc-guard -fpic -fvisibility=hidden a.c -shared -fuse-ld=lld -Wl,--gc-sections ... ld.lld: error: undefined hidden symbol: __start___sancov_guards >>> referenced by a.c >>> /tmp/a-456662.o:(sancov.module_ctor_trace_pc_guard) ``` Use the `extern_weak` linkage (lowered to undefined weak symbols) to avoid the undefined error. Differential Revision: https://reviews.llvm.org/D98903	2021-03-18 16:46:04 -07:00
Craig Topper	c9861f722e	[RISCV] Correct the output chain in lowerFixedLengthVectorMaskedLoadToRVV We returned the input chain instead of the output chain from the new load. This bypasses the load in the chain. I haven't found a good way to test this yet. IR order prevents my initial attempts at causing reordering.	2021-03-18 16:34:35 -07:00
George Balatsouras	d10f173f34	[dfsan] Add -dfsan-fast-8-labels flag This is only adding support to the dfsan instrumentation pass but not to the runtime. Added more RUN lines for testing: for each instrumentation test that had a -dfsan-fast-16-labels invocation, a new invocation was added using fast8. Reviewed By: stephan.yichao.zhao Differential Revision: https://reviews.llvm.org/D98734	2021-03-18 16:28:42 -07:00
Jessica Paquette	0ca83730cc	Recommit "[AArch64][GlobalISel] Fold constants into G_GLOBAL_VALUE" This reverts commit `962b73dd0f`. This commit was reverted because of some internal SPEC test failures. It turns out that this wasn't actually relevant to anything in open source, so it's safe to recommit this.	2021-03-18 16:01:02 -07:00
Craig Topper	182b831aeb	[DAGCombiner][RISCV] Teach visitMGATHER/MSCATTER to remove gather/scatters with all zeros masks that use SPLAT_VECTOR. Previously only all zeros BUILD_VECTOR was recognized.	2021-03-18 15:34:14 -07:00
Yuanfang Chen	b4a8c0ebb6	[LTO][MC] Discard non-prevailing defined symbols in module-level assembly This is the alternative approach to D96931. In LTO, for each module with inlineasm block, prepend directive ".lto_discard <sym>, <sym>*" to the beginning of the inline asm. ".lto_discard" is both a module inlineasm block marker and (optionally) provides a list of symbols to be discarded. In MC while emitting for inlineasm, discard symbol binding & symbol definitions according to ".lto_discard". Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D98762	2021-03-18 15:33:42 -07:00
Stanislav Mekhanoshin	edd6da10d2	[AMDGPU] Remove cpol, tfe, and swz from MUBUF patterns These are always selected as 0 anyway. Differential Revision: https://reviews.llvm.org/D98663	2021-03-18 14:36:04 -07:00
Mehdi Amini	3614df3537	Revert "[VPlan] Add plain text (not DOT's digraph) dumps" This reverts commit `6b053c9867`. The build is broken: ld.lld: error: undefined symbol: llvm::VPlan::printDOT(llvm::raw_ostream&) const >>> referenced by LoopVectorize.cpp >>> LoopVectorize.cpp.o:(llvm::LoopVectorizationPlanner::printPlans(llvm::raw_ostream&)) in archive lib/libLLVMVectorize.a	2021-03-18 19:20:39 +00:00
Andrei Elovikov	6b053c9867	[VPlan] Add plain text (not DOT's digraph) dumps I foresee two uses for this: 1) It's easier to use those in debugger. 2) Once we start implementing more VPlan-to-VPlan transformations (especially inner loop massaging stuff), using the vectorized LLVM IR as CHECK targets in LIT test would become too obscure. I can imagine that we'd want to CHECK against VPlan dumps after multiple transformations instead. That would be easier with plain text dumps than with DOT format. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D96628	2021-03-18 11:33:39 -07:00
Thomas Lively	f5764a8654	[WebAssembly] Finalize SIMD names and opcodes Updates the names (e.g. widen => extend, saturate => sat) and opcodes of all SIMD instructions to match the finalized SIMD spec. Deliberately does not change the public interface in wasm_simd128.h yet; that will require more care. Depends on D98466. Differential Revision: https://reviews.llvm.org/D98676	2021-03-18 11:21:25 -07:00
Thomas Lively	2f2ae08da9	[WebAssembly] Remove experimental SIMD instructions Removes the instruction definitions, intrinsics, and builtins for qfma/qfms, signselect, and prefetch instructions, which were not included in the final WebAssembly SIMD spec. Depends on D98457. Differential Revision: https://reviews.llvm.org/D98466	2021-03-18 11:21:24 -07:00
Thomas Lively	8638c897f4	[WebAssembly] Remove unimplemented-simd target feature Now that the WebAssembly SIMD specification is finalized and engines are generally up-to-date, there is no need for a separate target feature for gating SIMD instructions that engines have not implemented. With this change, v128.const is now enabled by default with the simd128 target feature. Differential Revision: https://reviews.llvm.org/D98457	2021-03-18 10:23:12 -07:00
Peter Waller	0d6482a76a	[llvm][AArch64][SVE] Lower fixed length vector fabs Seemingly straightforward. Differential Revision: https://reviews.llvm.org/D98434	2021-03-18 17:20:08 +00:00
Stanislav Mekhanoshin	961e4384f4	[AMDGPU] Support SCC on buffer atomics Differential Revision: https://reviews.llvm.org/D98731	2021-03-18 09:56:14 -07:00
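
Note on 4773dd5ba9 (G_SBFX + G_UBFX): the commit message describes the opcodes as extracting `width` bits from `%y` starting at `lsb`, with G_UBFX zero-extending and G_SBFX sign-extending the result. A minimal Python sketch of that arithmetic follows. It is only a model of the described semantics, not LLVM code; the function names `ubfx` and `sbfx` are hypothetical, and it assumes `width` is at least 1 and `lsb` is non-negative.

```python
def ubfx(value: int, lsb: int, width: int) -> int:
    """Model of an unsigned bitfield extract: shift right, then mask.

    This is the "and + right shift" pattern the commit message mentions.
    """
    if lsb < 0 or width < 1:
        raise ValueError("lsb must be >= 0 and width must be >= 1")
    mask = (1 << width) - 1
    return (value >> lsb) & mask


def sbfx(value: int, lsb: int, width: int) -> int:
    """Model of a signed bitfield extract: same field, sign-extended."""
    field = ubfx(value, lsb, width)
    sign_bit = 1 << (width - 1)
    # XOR-then-subtract sign-extends a `width`-bit field to a Python int.
    return (field ^ sign_bit) - sign_bit


if __name__ == "__main__":
    y = 0b10110110
    print(ubfx(y, 2, 4))  # bits [5:2] of y read as unsigned
    print(sbfx(y, 2, 4))  # the same bits read as a 4-bit signed value
```

The two functions differ only in how the top bit of the extracted field is interpreted, which mirrors the zero-extend versus sign-extend distinction in the commit message.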

146169 Commits
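
Note on a8697c57fa (PowerPC 16-bit signed field check): the commit message explains that sign-extending from 16 bits before testing the range makes the test always succeed. The ordering problem can be illustrated with a small Python model. This is a sketch under assumptions, not the peephole's actual code; `sign_extend16`, `is_int16`, `merge_check_after_extend`, and `merge_check_before_extend` are hypothetical names introduced for illustration.

```python
from typing import Optional


def sign_extend16(x: int) -> int:
    """Keep the low 16 bits of x and interpret them as a signed value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def is_int16(x: int) -> bool:
    """True if x fits in a 16-bit signed immediate field."""
    return -0x8000 <= x <= 0x7FFF


def merge_check_after_extend(disp: int, add_imm: int) -> Optional[int]:
    """Order described as buggy: extend first, then check (always passes)."""
    combined = sign_extend16(disp + add_imm)
    return combined if is_int16(combined) else None


def merge_check_before_extend(disp: int, add_imm: int) -> Optional[int]:
    """Order described as the fix: check the real sum before truncating."""
    combined = disp + add_imm
    return sign_extend16(combined) if is_int16(combined) else None


if __name__ == "__main__":
    # 32767 + 1 does not fit in 16 signed bits.
    print(merge_check_after_extend(32767, 1))   # accepted with a wrapped value
    print(merge_check_before_extend(32767, 1))  # rejected
```

In this model the first variant accepts an out-of-range sum because the high bits were already discarded, while the second rejects it, which is the behavior the commit message says the fix restores.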