This code creates 3 setccs that need to be expanded. It was
creating a sign bit test as setge X, 0, which is non-canonical;
the canonical form would be setgt X, -1. The non-canonical form
misses the special case in IntegerExpandSetCCOperands for sign
bit tests, which assumes the canonical form. If we don't hit this
special case we end up with a multipart setcc instead of just
checking the sign of the high part.
To fix this I've reversed the polarity of all of the setccs to
setlt X, 0, which is canonical. The rest of the logic should
still work. This seems to produce better code on RISCV, which
lacks a setgt instruction.
This probably still isn't the best code sequence we could use here.
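For illustration, a hypothetical example (not from the patch) of the kind of
wide sign-bit test this affects; with the canonical setlt X, 0 form, the
expansion only has to look at the high half:
```
// Hypothetical illustration: a sign-bit test on a type that must be
// expanded into two halves (__int128 is a Clang/GCC extension). The
// canonical form (setlt X, 0) lets the expansion just test the sign of
// the high part instead of building a multipart setcc.
bool isNegative(__int128 X) { // expanded into two i64 halves on 64-bit targets
  return X < 0;               // canonical sign-bit test: setlt X, 0
}
```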
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D97181
We don't yet have working codegen for the resulting unmerges, and if
we did it would probably be horrible.
Differential Revision: https://reviews.llvm.org/D97035
This renames variable and method names in `WasmEHFuncInfo` class to be
simpler and clearer. For example, unwind destinations are EH pads by
definition so it doesn't necessarily need to be included in every method
name. Also I am planning to add the reverse mapping in a later CL,
something like `UnwindDestToSrc`, so this renaming will make meanings
clearer.
Reviewed By: dschuff
Differential Revision: https://reviews.llvm.org/D97173
This also removes a pattern from RISCV that is no longer needed
since the sexti32 on the LHS of the srem in the pattern implies
the result is sign extended, so the sign_extend_inreg should be
removed in DAG combine now.
Reviewed By: luismarques, RKSimon
Differential Revision: https://reviews.llvm.org/D97133
This patch handles usubsat patterns hidden through zext/trunc and uses the getTruncatedUSUBSAT helper to determine if the USUBSAT can be correctly performed in the truncated form:
zext(x) >= y ? x - trunc(y) : 0 --> usubsat(x,trunc(umin(y,SatLimit)))
zext(x) > y ? x - trunc(y) : 0 --> usubsat(x,trunc(umin(y,SatLimit)))
Based on original examples:
void foo(unsigned short *p, int max, int n) {
  int i;
  unsigned m;
  for (i = 0; i < n; i++) {
    m = *--p;
    *p = (unsigned short)(m >= max ? m - max : 0);
  }
}
Differential Revision: https://reviews.llvm.org/D25987
This can reduce the binary size because counters will no longer occupy
space in the binary; instead they will be allocated by the dynamic linker.
Differential Revision: https://reviews.llvm.org/D97110
If the text section name has a prefix, it already ends with a trailing
dot, so don't add another dot when concatenating the text section name
and the symbol name.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D96327
Previously we would use the extended implementation, but
the extended implementation requires the vector type to be extended
so that we can access the LLVMContext. In theory we could
detect this case and use the context from the element type instead,
but since I know of no cases hitting this in practice today
I've done the simplest thing.
Also add asserts to several extended EVT functions that assume
LLVMTy is non-null.
Follows from the discussion in D97036.
Reviewed By: pengfei
Differential Revision: https://reviews.llvm.org/D97070
VirtRegAuxInfo is an extensibility point, so the register allocator's
decision on which implementation to use should be communicated to the
other users - namely, LiveRangeEdit.
Differential Revision: https://reviews.llvm.org/D96898
This patch provides two major changes:
1. Add getRelocationInfo to check if a constant will have static, dynamic, or
no relocations. (Also rename the original needsRelocation to needsDynamicRelocation.)
2. Only allow a constant with no relocations (static or dynamic) to be placed
in a mergeable section.
This will allow unused symbols that contain static relocations and happen to
fit in mergeable constant sections (.rodata.cstN) to instead be placed in
unique-named sections if -fdata-sections is used and subsequently garbage
collected by --gc-sections.
See https://lists.llvm.org/pipermail/llvm-dev/2021-February/148281.html.
Differential Revision: https://reviews.llvm.org/D95960
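A hedged illustration (not from the patch) of a constant whose initializer
needs static relocations, the kind of global this change keeps out of
mergeable sections:
```
// Hypothetical example: the entries need relocations against other
// globals, so even if the object is small enough for a mergeable
// .rodata.cstN section, it is better placed in a unique-named section
// under -fdata-sections so --gc-sections can drop it when unused.
extern const int A;
extern const int B;
const int *const PtrTable[2] = {&A, &B};
```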
AMDGPU currently has a lot of pre-processing code to pre-split
argument types into 32-bit pieces before passing them to the generic
code in handleAssignments. This is a bit sloppy and also requires some
overly fancy iterator work when building the calls. It's better if all
argument marshalling code is handled directly in
handleAssignments. This handles more situations like decomposing large
element vectors into sub-element sized pieces.
This should mostly be NFC, but does change the generated code by
shifting where the initial argument packing instructions are placed. I
think this is nicer looking, since it now emits the packing code
directly after the relevant copies, rather than after the copies for
the remaining arguments.
This doubles down on gfx6/gfx7 using the gfx8+ ABI for 16-bit
types. This is ultimately the better option, but incompatible with the
DAG. Fixing this requires more work, especially for f16.
If extload is legal, the following transform
(zext (select c, load1, load2)) -> (select c, zextload1, zextload2)
can save one ext instruction.
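A hypothetical source pattern of this shape (an illustrative sketch, not
taken from the patch):
```
#include <cstdint>

// (zext (select c, load1, load2)): the select picks between two narrow
// loads and the result is zero-extended. With the combine, each load
// becomes a zero-extending load and the select operates on the already
// extended value, saving the separate extend.
uint32_t pick(bool C, const uint8_t *P, const uint8_t *Q) {
  uint8_t V = C ? *P : *Q;
  return V;
}
```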
Differential Revision: https://reviews.llvm.org/D95086
I've now hit several cases where a mistake in the regalloc main loop caused corrupt live intervals that didn't get caught until either the next verify or during post-optimization. The latter case is rather confusing and tends to lead one down false trails, so let's catch corruption before that.
Adjust generateFMAsInMachineCombiner to return false if SVE is present
in order to combine fmul+fadd into fma. Also add new pseudo instructions
so as to select the most appropriate of FMLA/FMAD depending on register
allocation.
Depends on D96599
Differential Revision: https://reviews.llvm.org/D96424
CheckInteger uses an int64_t encoded using a variable width encoding
that is optimized for encoding a number with a lot of leading zeros.
Negative numbers have no leading zeros, so they use the largest encoding,
requiring 9 bytes.
I believe it's most likely that we want to check for positive and negative
numbers near 0. -1 is quite common due to its use in the 'not'
idiom.
To optimize for this, we can borrow an idea from the bitcode format
and move the sign bit to bit 0 with the magnitude stored in the
upper bits. This will drastically increase the number of leading
zeros for small magnitudes. Then we can run this value through
VBR encoding.
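A small sketch of the described encoding (assuming it mirrors the
bitcode-style signed scheme; the helper names are illustrative, not LLVM's):
```
#include <cstdint>

// Sign goes to bit 0, magnitude to the upper bits, so small-magnitude
// values (e.g. -1 -> 3) keep many leading zeros and encode compactly
// with VBR. (INT64_MIN would need special-casing in a real implementation.)
uint64_t encodeSigned(int64_t V) {
  return V >= 0 ? uint64_t(V) << 1 : ((-uint64_t(V)) << 1) | 1;
}

int64_t decodeSigned(uint64_t E) {
  uint64_t Mag = E >> 1;
  return (E & 1) ? -int64_t(Mag) : int64_t(Mag);
}
```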
This gives a small reduction in the table size on all in-tree
targets except VE, where the size increased by about 300 bytes due
to intrinsic IDs now requiring 3 bytes instead of 2. Since the
intrinsic enum space is shared by all targets, this is an unfortunate
consequence of where VE is currently located in the range.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D96317
GCC warning:
```
/llvm-project/llvm/lib/CodeGen/TargetLoweringObjectFileImpl.cpp: In member function ‘virtual llvm::MCSection* llvm::TargetLoweringObjectFileELF::getSectionForLSDA(const llvm::Function&, const llvm::MCSymbol&, const llvm::TargetMachine&) const’:
/llvm-project/llvm/lib/CodeGen/TargetLoweringObjectFileImpl.cpp:871:8: warning: variable ‘IsComdat’ set but not used [-Wunused-but-set-variable]
871 | bool IsComdat = false;
| ^~~~~~~~
```
We are going to support debug sections for XCOFF. So the csect
properties are not necessary. This patch makes these properties
optional.
Reviewed By: hubert.reinterpretcast
Differential Revision: https://reviews.llvm.org/D95931
Same implementation as G_SEXT_INREG.
Add a testcase to combine-sext-inreg for a concrete example, and a testcase
to KnownBitsTest.
Differential Revision: https://reviews.llvm.org/D96897
This adds a G_ASSERT_SEXT opcode, similar to G_ASSERT_ZEXT. This instruction
signifies that an operation was already sign extended from a smaller type.
This is useful for functions with sign-extended parameters.
E.g.
```
define void @foo(i16 signext %x) {
...
}
```
This adds verifier, regbankselect, and instruction selection support for
G_ASSERT_SEXT equivalent to G_ASSERT_ZEXT.
Differential Revision: https://reviews.llvm.org/D96890
D94835 added support for WinEH to export public symbols pointing to
basic blocks which are catchret targets for use with Windows CET.
Wasm currently doesn't support public symbols to non-function code
addresses (they get treated like new functions in asm but then don't
lower to object files correctly).
It created them unconditionally for all catchret targets.
This change disables those symbols unless the exceptionHandlingType
is WinEH (since they aren't used with ExceptionHandling::Wasm)
Differential Revision: https://reviews.llvm.org/D96824
The initiation interval (II) value was incremented before exiting the loop,
and therefore when used in the optimization remarks and debug dumps it did
not reflect the initiation interval actually used in the Schedule.
Differential Revision: https://reviews.llvm.org/D95692
Fold shuffle(bop(shuffle(x,y),shuffle(z,w)),bop(shuffle(a,b),shuffle(c,d))) -> bop(shuffle'(x,y,z,w),shuffle'(a,b,c,d))
Attempt to fold from a shuffle of a pair of binops to a binop of shuffles, as long as one/both of the binop sources are also shuffles that can be merged with the outer shuffle. This should guarantee that we remove one binop without introducing any additional shuffles.
Technically there's potential for a merged shuffle's lowering to be poorer than the original shuffle, but it could also be better, and I'm not seeing any regressions as long as we keep the 'don't merge splats' rule already present in MergeInnerShuffle.
This expands and generalizes an existing X86 combine and attempts to merge either of each binop's sources (with an on-the-fly commutation of the shuffle mask) - we couldn't do that in the x86 version as it had to stay in a form that DAGCombine's MergeInnerShuffle would still recognise.
Fixes issue raised by @saugustine in rG5aa8f4c0843a where we were failing to replace null shuffle operands from MergeInnerShuffle to UNDEFs.
Differential Revision: https://reviews.llvm.org/D96345
This allows the option to affect the LTO output. Module::Max helps to
generate debug info for all modules in the same format.
Differential Revision: https://reviews.llvm.org/D96597
To make sure compile-times don't regress, add an option to restrict the number
of instructions considered for sinking as alias analysis can be expensive and
for the same reason also skip large blocks.
Differential Revision: https://reviews.llvm.org/D96485
Basic block sections enables function sections implicitly; this is not needed
and is inefficient with the "=list" option.
We had basic block sections enable function sections implicitly in clang. This
is particularly inefficient with "=list" option as it places functions that do
not have any basic block sections in separate sections. This causes unnecessary
object file overhead for large applications.
This patch disables this implicit behavior. It only creates function sections
for those functions that require basic block sections.
Further, there was inconsistent behavior with llc, as llc was not turning on
function sections by default. This patch makes llc and clang consistent, and
tests are added to check the new behavior.
This is the first of two patches and this adds functionality in LLVM to
create a new section for the entry block if function sections is not
enabled.
Differential Revision: https://reviews.llvm.org/D93876
This change introduces support for zero flag ELF section groups to LLVM.
LLVM already supports COMDAT sections, which in ELF are a special type
of ELF section groups. These are generally useful to enable linker GC
where you want a group of sections to always travel together, that is to
be either retained or discarded as a whole, but without the COMDAT
semantics. Other ELF assemblers already support zero flag ELF section
groups and this change helps us reach feature parity.
Differential Revision: https://reviews.llvm.org/D95851
This reverts commit 5dfba562dd.
That commit causes an assertion failure with the following repro:
typedef long b __attribute__((__vector_size__(16)));
b *d;
b e;
b __attribute__((__always_inline__)) c(b h, b i) {
  return (__attribute__((__vector_size__(8 * sizeof(short)))) short)h + i;
}
j() {
  b k, l, m, n, o[6], p, q;
  m = d[5];
  b r = m;
  b s = f(r, 8);
  q = s;
  l = d[1];
  p = l;
  t(q);
  n = c(m, l);
  o[1] = c(s, f(p, 8));
  k = __builtin_shufflevector(n, o[1], 0, 2);
  e = __builtin_ia32_psrlwi128(k, j);
}
./bin/clang -cc1 -triple x86_64-grtev4-linux-gnu -emit-obj -O1 -std=c99 test.c
Similar to D96622, we're better off just promoting uaddsat(x,y) -> umin(add(x,y),c) instead of trying to perform a shifted uaddsat.
I initially tried to just use shifted promotion in cases where we didn't have a legal/custom umin - but we don't appear to have any targets that have uaddsat but not umin, so imo we're better off always using the umin and avoiding an untested shifted uaddsat code path.
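A hedged sketch of what the promoted form computes for an 8-bit uaddsat
(illustrative only, not code from the patch):
```
#include <algorithm>
#include <cstdint>

// uaddsat(x, y) performed in a wider type as umin(add(x, y), SatLimit)
// instead of a shifted uaddsat: add in the promoted type, then clamp.
uint8_t uaddsatPromoted(uint8_t X, uint8_t Y) {
  uint32_t Sum = uint32_t(X) + uint32_t(Y);      // add(x, y) in the wide type
  return uint8_t(std::min<uint32_t>(Sum, 255));  // umin with the i8 saturation limit
}
```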
Differential Revision: https://reviews.llvm.org/D96767
fde2466171 added support for
scalable vectors to matchUnaryPredicate by handling SPLAT_VECTOR in
addition to BUILD_VECTOR. This was used to enable UDIV/SDIV/UREM/SREM
by-constant expansion in BuildUDIV/BuildSDIV in TargetLowering.cpp.
The caller there expects to call getBuildVector from the matched factors.
This currently leads to a crash if there is a SPLAT_VECTOR of a
fixed-width vector type, since the number of operands won't match the
number of elements.
To fix this, this patch updates the callers to check the opcode
instead of whether the type is fixed or scalable. This assumes
that only 3 opcodes are handled by matchUnaryPredicate so
I've added an assertion to the final else to check that opcode.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D96174
Fold shuffle(bop(shuffle(x,y),shuffle(z,w)),bop(shuffle(a,b),shuffle(c,d))) -> bop(shuffle'(x,y,z,w),shuffle'(a,b,c,d))
Attempt to fold from a shuffle of a pair of binops to a binop of shuffles, as long as one/both of the binop sources are also shuffles that can be merged with the outer shuffle. This should guarantee that we remove one binop without introducing any additional shuffles.
Technically there's potential for a merged shuffle's lowering to be poorer than the original shuffle, but it could also be better, and I'm not seeing any regressions as long as we keep the 'don't merge splats' rule already present in MergeInnerShuffle.
This expands and generalizes an existing X86 combine and attempts to merge either of each binop's sources (with an on-the-fly commutation of the shuffle mask) - we couldn't do that in the x86 version as it had to stay in a form that DAGCombine's MergeInnerShuffle would still recognise.
Differential Revision: https://reviews.llvm.org/D96345
The API is a bit awkward since you need to index into an array in the
passed struct. I guess an alternative would be to pass all of the
individual fields.
Return the best covering index, and any additional indexes needed to complete
the mask. This logically belongs in TargetRegisterInfo, although I ended
up not needing it for the reason I originally split this out.
If we're going to end up expanding anyway, we should do it early
so we don't create extra operations to handle the bytes added by
promotion.
Similar was done for BSWAP previously.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D96681
This patch enables AsmPrinter support for complex expressions with
entry values. It shouldn't be AsmPrinter's call whether these are safe or
not, but rather the pass that introduces the DW_OP_LLVM_entry_value. This patch
on its own has no effect on clang.
Differential Revision: https://reviews.llvm.org/D96559
This patch adds a new intrinsic experimental.vector.reverse that takes a single
vector and returns a vector of matching type but with the original lane order
reversed. For example:
```
vector.reverse(<A,B,C,D>) ==> <D,C,B,A>
```
The new intrinsic supports fixed and scalable vector types.
The fixed-width case relies on shufflevector to maintain existing behaviour.
The scalable-vector case uses the new ISD node VECTOR_REVERSE.
This new intrinsic is one of the named shufflevector intrinsics proposed on the
mailing-list in the RFC at [1].
Patch by Paul Walker (@paulwalker-arm).
[1] https://lists.llvm.org/pipermail/llvm-dev/2020-November/146864.html
Differential Revision: https://reviews.llvm.org/D94883
In the future Windows will enable Control-flow Enforcement Technology (CET aka shadow stacks). To protect the path where the context is updated during exception handling, the binary is required to enumerate valid unwind entrypoints in a dedicated section which is validated when the context is being set during exception handling.
This change allows llvm to generate the section that contains the appropriate symbol references in the form expected by the msvc linker.
This feature is enabled through a new module flag, ehcontguard, which was modelled on the cfguard flag.
The change includes a test that when the module flag is enabled the section is correctly generated.
The set of exception continuation information includes returns from exceptional control flow (catchret in llvm).
In order to collect catchret we:
1) Include an additional flag on machine basic blocks to indicate that the given block is the target of a catchret operation,
2) Introduce a new machine function pass to insert and collect symbols at the start of each block, and
3) Combine these targets with the other EHCont targets that were already being collected.
Change originally authored by Daniel Frampton <dframpto@microsoft.com>
For more details, see MSVC documentation for `/guard:ehcont`
https://docs.microsoft.com/en-us/cpp/build/reference/guard-enable-eh-continuation-metadata
Reviewed By: pengfei
Differential Revision: https://reviews.llvm.org/D94835
The implementation for vectors is broken and doesn't seem to be used by
anything. Explicitly remove support for them, they can be added again
later when they're properly implemented.
Reviewed By: aemerson
Differential Revision: https://reviews.llvm.org/D95699
As discussed on D96413, as long as the promoted bits of the args are zero we can use the basic ISD::USUBSAT pattern directly, without the shifting like we do for other ops.
I think something similar should be possible for ISD::UADDSAT as well, which I'll look at later.
Also, create a ISD::USUBSAT node directly - this will be expanded back by the legalizer later on if necessary.
Differential Revision: https://reviews.llvm.org/D96622
We lost this in D56387/rG69bc0990a9181e6eb86228276d2f59435a7fae67 - where I got the src/dst bitwidths mixed up and assumed getValidShiftAmountConstant would catch it.
Patch by @craig.topper - confirmed by @Carrot that it fixes PR49162
This patch hides the logic for setting the location kind of an entry
value inside the begin/finalize/cancel functions. This way we get rid
of the strange workaround that is currently in setLocation().
In the future, this will allow us to set the location kind of the
entry value independently from the location kind of the main
expression.
Differential Revision: https://reviews.llvm.org/D96554
There's no need to call verifyVectorElementMatch since we already know
that the source and destination types are identical.
Differential Revision: https://reviews.llvm.org/D96589
This combine tries to do inter-block hoisting of extends of G_PHIs, into the
originating blocks of the phi's incoming value. The idea is to expose further
optimization opportunities that are normally obscured by the PHI.
Some basic heuristics, and a target hook for AArch64 is added, to allow tuning.
E.g. if the extend is used by a G_PTR_ADD, it doesn't perform this combine
since it may be folded into the addressing mode during selection.
There are very minor code size improvements on AArch64 -Os, but the real benefit
is that it unlocks optimizations like AArch64 conditional compares on some
benchmarks.
Differential Revision: https://reviews.llvm.org/D95703
Begin transitioning the X86 vector code to recognise sub(umax(a,b),b) or sub(a,umin(a,b)) USUBSAT patterns to make it more generic and available to all targets.
This initial patch just moves the basic umin/umax patterns to DAG, removing some vector-only checks on the way - these are some of the patterns that the legalizer will try to expand back to, so we can be reasonably relaxed about matching these pre-legalization.
We can handle the trunc(sub(..)) variants as well, which helps with patterns where we were promoting to a wider type to detect overflow/saturation.
The remaining x86 code requires some cleanup first - some of it isn't actually tested etc. I also need to resurrect D25987.
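For reference, the two source-level idioms behind these patterns (a scalar
sketch for illustration; the vector forms are analogous):
```
#include <algorithm>
#include <cstdint>

// sub(umax(a,b), b): max(a,b) - b is a - b when a > b and 0 otherwise.
uint32_t usubsatViaUMax(uint32_t A, uint32_t B) { return std::max(A, B) - B; }

// sub(a, umin(a,b)): a - min(a,b) is a - b when a > b and 0 otherwise.
uint32_t usubsatViaUMin(uint32_t A, uint32_t B) { return A - std::min(A, B); }
```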
Differential Revision: https://reviews.llvm.org/D96413
This patch adds a pass to replace calls to vector intrinsics (i.e., LLVM
intrinsics operating on vector operands) with calls to a vector library.
Currently, calls to LLVM intrinsics are only replaced with calls to vector
libraries when scalar calls to intrinsics are vectorized by the Loop- or
SLP-Vectorizer.
With this pass, it is now possible to replace calls to LLVM intrinsics
already operating on vector operands, e.g., if such code was generated
by MLIR. For the replacement, information from the TargetLibraryInfo,
e.g., as specified via -vector-library is used.
This is a re-try of the original commit 2303e93e66 that was reverted
due to pass manager problems. Other minor changes have also been made.
Differential Revision: https://reviews.llvm.org/D95373
explicitly emitting retainRV or claimRV calls in the IR
Background:
This fixes a longstanding problem where llvm breaks ARC's autorelease
optimization (see the link below) by separating calls from the marker
instructions or retainRV/claimRV calls. The backend changes are in
https://reviews.llvm.org/D92569.
https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue
What this patch does to fix the problem:
- The front-end adds operand bundle "clang.arc.attachedcall" to calls,
which indicates the call is implicitly followed by a marker
instruction and an implicit retainRV/claimRV call that consumes the
call result. In addition, it emits a call to
@llvm.objc.clang.arc.noop.use, which consumes the call result, to
prevent the middle-end passes from changing the return type of the
called function. This is currently done only when the target is arm64
and the optimization level is higher than -O0.
- ARC optimizer temporarily emits retainRV/claimRV calls after the calls
with the operand bundle in the IR and removes the inserted calls after
processing the function.
- ARC contract pass emits retainRV/claimRV calls after the call with the
operand bundle. It doesn't remove the operand bundle on the call since
the backend needs it to emit the marker instruction. The retainRV and
claimRV calls are emitted late in the pipeline to prevent optimization
passes from transforming the IR in a way that makes it harder for the
ARC middle-end passes to figure out the def-use relationship between
the call and the retainRV/claimRV calls (which is the cause of
PR31925).
- The function inliner removes an autoreleaseRV call in the callee if
nothing in the callee prevents it from being paired up with the
retainRV/claimRV call in the caller. It then inserts a release call if
claimRV is attached to the call since autoreleaseRV+claimRV is
equivalent to a release. If it cannot find an autoreleaseRV call, it
tries to transfer the operand bundle to a function call in the callee.
This is important since the ARC optimizer can remove the autoreleaseRV
returning the callee result, which makes it impossible to pair it up
with the retainRV/claimRV call in the caller. If that fails, it simply
emits a retain call in the IR if retainRV is attached to the call and
does nothing if claimRV is attached to it.
- SCCP refrains from replacing the return value of a call with a
constant value if the call has the operand bundle. This ensures the
call always has at least one user (the call to
@llvm.objc.clang.arc.noop.use).
- This patch also fixes a bug in replaceUsesOfNonProtoConstant where
multiple operand bundles of the same kind were being added to a call.
Future work:
- Use the operand bundle on x86-64.
- Fix the auto upgrader to convert call+retainRV/claimRV pairs into
calls with the operand bundles.
rdar://71443534
Differential Revision: https://reviews.llvm.org/D92808
Implements the same logic as in SelectionDAG.
G_FMINNUM_IEEE and G_FMAXNUM_IEEE are never SNaN by definition and
never NaN when one operand is known non-NaN and other known non-SNaN.
G_FMINNUM and G_FMAXNUM are never NaN/SNaN when one of the operands
is known non-NaN/SNaN.
Differential Revision: https://reviews.llvm.org/D91716
The builder was using the extend user as the insertion point, which meant that
we were incorrectly "moving" the load from its original position, and therefore
could violate memory operation ordering.
Support for splitting exception handling pads was added in D73739. This
change updates the code to split out exception handling pads if profile
information indicates that they are cold. For a given function with
multiple landing pads, if one of them is hot they are all retained as
part of the hot code section.
Differential Revision: https://reviews.llvm.org/D96372
The use of basic block sections should take precedence over the machine
function splitting pass. Since they use the same underlying mechanism
they are kept exclusive. Updated the tests to check that split machine
functions is overridden by all flavours of basic block sections.
Differential Revision: https://reviews.llvm.org/D96392
If we wait until the type is legalized, we'll lose information
about the original type and need to use larger magic constants.
This gets especially bad on RISCV64 where i64 is the only legal
type.
I've limited this to simple scalar types so it only works for
i8/i16/i32 which are most likely to occur. For more odd types
we might want to do a small promotion to a type where MULH is legal
instead.
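A hypothetical example (not from the patch) of the kind of division this
affects:
```
#include <cstdint>

// An unsigned 16-bit division by a constant. Expanding the divide while
// the type is still i16 keeps the magic-multiply constants narrow; once
// i16 has been promoted to i64 (e.g. on RISCV64), the expansion needs
// much larger magic constants.
uint16_t divBy7(uint16_t X) { return X / 7; }
```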
Unfortunately, this does prevent some urem/srem+seteq matching since
that still require legal types.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D96210
As for SETCC, use a less expensive condition code when generating
STRICT_FSETCC if the node is known not to have NaN.
Reviewed By: SjoerdMeijer
Differential Revision: https://reviews.llvm.org/D91972
This commit moves a line in SelectionDAGBuilder::handleDebugValue to
avoid implicitly casting a TypeSize object to an unsigned earlier than
necessary. It was possible that we bail out of the loop before the value
is ever used, which means we could create a superfluous TypeSize
warning.
Reviewed By: DavidTruby
Differential Revision: https://reviews.llvm.org/D96423
The patch did not account for one corner case where cmp does not dominate
the loop latch. This patch adds this check, hopefully it's cheap because
the CFG does not change during the transform, so DT queries should be
executed quickly.
If you see compile time slowness from this, please revert.
Differential Revision: https://reviews.llvm.org/D96119
Function `replaceMathCmpWithIntrinsic` artificially limits the scope
of the optimization, requiring that the two instructions be in
the same block, due to two reasons:
- usage of DT for more general check is costly in terms of compile time;
- risk of creating a new value that lives through multiple blocks.
Because of this, two semantically equivalent tests may or may not be the
subject of this opt depending on where the binary operation is located.
See `test/CodeGen/X86/usub_inc_iv.ll` for motivation.
There is one important particular case where this limitation is too strict:
it is when the binary operation is the increment of the induction variable.
As a result, the application of this opt becomes fragile and highly reliant on
where other passes decide to place IV increment. In most cases, they place
it in the end of the latch block, killing the opt opportunity (when in fact it
does not matter where to insert the actual instruction).
This patch handles this particular case separately.
- The detector does not use dom tree and has constant cost;
- The value of IV or IV.next lives throughout the loop in any case, so this should not
create a new unexpected long-living value.
As a result, the transform becomes more robust. It also seems to lead to
better code generation in some cases (see `test/CodeGen/X86/lsr-loop-exit-cond.ll`).
Differential Revision: https://reviews.llvm.org/D96119
Reviewed By: spatel, reames
The IR/MIR pseudo probe intrinsics don't get materialized into real machine instructions and therefore they don't incur runtime cost directly. However, they come with indirect cost by blocking certain optimizations. Some of the blocking is intentional (such as blocking code merge) for better counts quality while the rest is accidental. This change unblocks perf-critical optimizations that do not affect counts quality. They include:
1. IR InstCombine, sinking load operation to shorten lifetimes.
2. MIR LiveRangeShrink, similar to #1
3. MIR TwoAddressInstructionPass, i.e., the opeq transform
4. MIR function argument copy elision
5. IR stack protection. (though not perf-critical but nice to have).
Reviewed By: wmi
Differential Revision: https://reviews.llvm.org/D95982
Originally landed in ddc2f1e3fb and reverted in d32deaab4d because of
a Generic test objecting. That was fixed up in 013613964f. Original
landing commit message follows:
[DWARF] Location-less inlined variables should not have DW_TAG_variable
Discussed in this thread:
https://lists.llvm.org/pipermail/llvm-dev/2021-January/148139.html
DwarfDebug::collectEntityInfo accidentally distinguishes between variable
locations that never have a location specified, and variable locations that
have an empty location specified. The latter leads to the creation of an
empty variable referring to the abstract origin.
Fix this by seeking a non-empty location before producing a concrete
entity, to guarantee a DW_AT_location will be produced. Other loops in
collectEntityInfo and endFunctionImpl take care of examining the
retainedNodes collection and ensuring optimised-out variables are created.
Differential Revision: https://reviews.llvm.org/D95617
Avoid doing the following combine for vector types:
```
copysign(x, fp_extend(y)) -> copysign(x, y)
copysign(x, fp_round(y)) -> copysign(x, y)
```
That combine seemed to impede the selection of vector instructions and cause
a mess in some circumstances.
Differential Revision: https://reviews.llvm.org/D96037
This was taking the calling convention from the parent function,
instead of the callee. Avoids regressions in a future patch when the
caller and callee have different type breakdowns.
For some reason AArch64's lowerFormalArguments seems to intentionally
ignore the parent isVarArg.
As of commit 284f2bffc9, the DAG Combiner gets rid of the masking of the
input to this node if the mask only keeps the bottom 16 bits. This is because
the underlying library function does not use the high order bits. However, on
PowerPC's ELFv2 ABI, it is the caller that is responsible for clearing the bits
from the register. Therefore, the library implementation of __gnu_h2f_ieee will
return an incorrect result if the bits aren't cleared.
This combine is desired for ARM (and possibly other targets) so this patch adds
a query to Target Lowering to check if this zeroing needs to be kept.
Fixes: https://bugs.llvm.org/show_bug.cgi?id=49092
Differential revision: https://reviews.llvm.org/D96283
As for SETCC, use a less expensive condition code when generating
STRICT_FSETCC if the node is known not to have NaN.
Reviewed By: SjoerdMeijer
Differential Revision: https://reviews.llvm.org/D91972
On AArch64 (which seems to be the only target that supports it), this
attribute allows codegen to avoid saving/restoring the value in x0
across a call.
Gives a 0.1% geomean -Os code size improvement on CTMark.
Differential Revision: https://reviews.llvm.org/D96099
Different targets might handle branch performance differently, so this patch allows for
targets to specify the TailDuplicateSize threshold. Said threshold defines how small a branch
can be and still be duplicated to generate straight-line code instead.
This patch also specifies said override values for the AArch64 subtarget.
Differential Revision: https://reviews.llvm.org/D95631
Maskray has reported a fault with .debug_gnu_pubnames in the comments on
D94976, caused by this patch, reverting to investigate.
This reverts commit 8998f58435.
Backing out this workaround to focus on fixing whatever's wrong with
.debug_gnu_pubnames, I'll revert the cause, (8998f584) in the next commit.
This reverts commit 56fa34ae35.
GNU ld>=2.36 supports mixed SHF_LINK_ORDER and non-SHF_LINK_ORDER sections in an
output section, so we can set SHF_LINK_ORDER if -fbinutils-version=2.36 or above.
If -fno-function-sections or older binutils, drop unique ID for -fno-unique-section-names.
The users can just specify -fbinutils-version=2.36 or above to allow GC with both GNU ld and LLD.
(LLD does not support garbage collection of non-group non-SHF_LINK_ORDER .gcc_except_table sections.)
This matches GCC behavior when the configure-time binutils is new. GNU ld<2.36
did not support mixed SHF_LINK_ORDER and non-SHF_LINK_ORDER sections in an
output section, so we conservatively disable SHF_LINK_ORDER for <2.36.
This patch adds a pass to replace calls to vector intrinsics
(i.e., LLVM intrinsics operating on vector operands) with
calls to a vector library.
Currently, calls to LLVM intrinsics are only replaced with
calls to vector libraries when scalar calls to intrinsics are
vectorized by the Loop- or SLP-Vectorizer.
With this pass, it is now possible to replace calls to LLVM
intrinsics already operating on vector operands, e.g., if
such code was generated by MLIR. For the replacement,
information from the TargetLibraryInfo, e.g., as specified
via -vector-library is used.
Differential Revision: https://reviews.llvm.org/D95373
Make sure scalable property is preserved by using getVectorElementCount().
Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D95967
Previously the code split the string at the first '<', which
incorrectly truncated names like `operator<`.
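A minimal sketch of the idea (assuming std::string handling; the function
name and logic are illustrative, not the actual LLVM code):
```
#include <string>

// Splitting at the first '<' breaks names such as "operator<"; skip any
// '<' (or '=') characters that belong to the operator token itself before
// looking for the template-argument list.
std::string baseName(const std::string &Name) {
  size_t Start = 0;
  if (Name.rfind("operator<", 0) == 0)       // "operator<", "operator<<", "operator<="
    Start = Name.find_first_not_of("<=", 9); // skip the operator's own characters
  size_t Pos = Name.find('<', Start);
  return Pos == std::string::npos ? Name : Name.substr(0, Pos);
}
```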
Differential Revision: https://reviews.llvm.org/D95893
emitting retainRV or claimRV calls in the IR
This reapplies 3fe3946d9a without the
changes made to lib/IR/AutoUpgrade.cpp, which was violating layering.
Original commit message:
Background:
This patch makes changes to the front-end and middle-end that are
needed to fix a longstanding problem where llvm breaks ARC's autorelease
optimization (see the link below) by separating calls from the marker
instructions or retainRV/claimRV calls. The backend changes are in
https://reviews.llvm.org/D92569.
https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue
What this patch does to fix the problem:
- The front-end adds operand bundle "clang.arc.rv" to calls, which
indicates the call is implicitly followed by a marker instruction and
an implicit retainRV/claimRV call that consumes the call result. In
addition, it emits a call to @llvm.objc.clang.arc.noop.use, which
consumes the call result, to prevent the middle-end passes from changing
the return type of the called function. This is currently done only when
the target is arm64 and the optimization level is higher than -O0.
- ARC optimizer temporarily emits retainRV/claimRV calls after the calls
with the operand bundle in the IR and removes the inserted calls after
processing the function.
- ARC contract pass emits retainRV/claimRV calls after the call with the
operand bundle. It doesn't remove the operand bundle on the call since
the backend needs it to emit the marker instruction. The retainRV and
claimRV calls are emitted late in the pipeline to prevent optimization
passes from transforming the IR in a way that makes it harder for the
ARC middle-end passes to figure out the def-use relationship between
the call and the retainRV/claimRV calls (which is the cause of
PR31925).
- The function inliner removes an autoreleaseRV call in the callee if
nothing in the callee prevents it from being paired up with the
retainRV/claimRV call in the caller. It then inserts a release call if
the call is annotated with claimRV since autoreleaseRV+claimRV is
equivalent to a release. If it cannot find an autoreleaseRV call, it
tries to transfer the operand bundle to a function call in the callee.
This is important since ARC optimizer can remove the autoreleaseRV
returning the callee result, which makes it impossible to pair it up
with the retainRV/claimRV call in the caller. If that fails, it simply
emits a retain call in the IR if the implicit call is a call to
retainRV and does nothing if it's a call to claimRV.
Future work:
- Use the operand bundle on x86-64.
- Fix the auto upgrader to convert call+retainRV/claimRV pairs into
calls annotated with the operand bundles.
rdar://71443534
Differential Revision: https://reviews.llvm.org/D92808
emitting retainRV or claimRV calls in the IR
Background:
This patch makes changes to the front-end and middle-end that are
needed to fix a longstanding problem where llvm breaks ARC's autorelease
optimization (see the link below) by separating calls from the marker
instructions or retainRV/claimRV calls. The backend changes are in
https://reviews.llvm.org/D92569.
https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue
What this patch does to fix the problem:
- The front-end adds operand bundle "clang.arc.rv" to calls, which
indicates the call is implicitly followed by a marker instruction and
an implicit retainRV/claimRV call that consumes the call result. In
addition, it emits a call to @llvm.objc.clang.arc.noop.use, which
consumes the call result, to prevent the middle-end passes from changing
the return type of the called function. This is currently done only when
the target is arm64 and the optimization level is higher than -O0.
- ARC optimizer temporarily emits retainRV/claimRV calls after the calls
with the operand bundle in the IR and removes the inserted calls after
processing the function.
- ARC contract pass emits retainRV/claimRV calls after the call with the
operand bundle. It doesn't remove the operand bundle on the call since
the backend needs it to emit the marker instruction. The retainRV and
claimRV calls are emitted late in the pipeline to prevent optimization
passes from transforming the IR in a way that makes it harder for the
ARC middle-end passes to figure out the def-use relationship between
the call and the retainRV/claimRV calls (which is the cause of
PR31925).
- The function inliner removes an autoreleaseRV call in the callee if
nothing in the callee prevents it from being paired up with the
retainRV/claimRV call in the caller. It then inserts a release call if
the call is annotated with claimRV since autoreleaseRV+claimRV is
equivalent to a release. If it cannot find an autoreleaseRV call, it
tries to transfer the operand bundle to a function call in the callee.
This is important since ARC optimizer can remove the autoreleaseRV
returning the callee result, which makes it impossible to pair it up
with the retainRV/claimRV call in the caller. If that fails, it simply
emits a retain call in the IR if the implicit call is a call to
retainRV and does nothing if it's a call to claimRV.
Future work:
- Use the operand bundle on x86-64.
- Fix the auto upgrader to convert call+retainRV/claimRV pairs into
calls annotated with the operand bundles.
rdar://71443534
Differential Revision: https://reviews.llvm.org/D92808
`-flto -gsplit-dwarf -g -O[123]` may create .debug_gnu_pubnames with 0 DIE
offset entries. llvm-dwarfdump -debug-gnu-pubnames/ld.lld --gdb-index errors for that.
```
.section .debug_gnu_pubnames,"",@progbits
.long .LpubNames_end2-.LpubNames_begin2 # Length of Public Names Info
.LpubNames_begin2:
.short 2 # DWARF Version
.long .Lcu_begin2 # Offset of Compilation Unit Info
.long 57 # Compilation Unit Length
.long 0 # DIE offset
.byte 16 # Attributes: TYPE, EXTERNAL
.asciz "absl" # External Name
.long 0 # DIE offset
.byte 16 # Attributes: TYPE, EXTERNAL
.asciz "absl::base_internal" # External Name
.long 0 # End Mark
```
The upstream callers (the vectorizers) were fixed with:
bbed5f2f8a (D95690)
77adbe6a8c
We should remove this pass entirely now that reduction
legalization/lowering is expected to work just as well,
but we need to confirm that the shuffle ops do not
regress (for x86 in particular).
This should be the last step needed to close:
https://llvm.org/PR23116
This modified patch avoids redirecting the unit in which a subprogram is
created if type units are enabled -- DIEs were getting children allocated
from different units memory pools. Original commit message:
[DWARF] Create subprogram's DIE in DISubprogram's unit
This is a fix for PR48790. Over in D70350, subprogram DIEs were permitted
to be shared between CUs. However, the creation of a subprogram DIE can be
triggered early, from other CUs. The subprogram definition is then created
in one CU, and when the function is actually emitted children are attached
to the subprogram that expect to be in another CU. This breaks internal CU
references in the children.
Fix this by redirecting the creation of subprogram DIEs in
getOrCreateContextDIE to the CU specified by its DISubprogram definition.
This ensures that the subprogram DIE is always created in the correct CU.
Differential Revision: https://reviews.llvm.org/D94976
These two cases have identical implementations other than an
unreachable part of `G_ADD` that checks if the scalar we're narrowing
is a vector. Combining them to avoid unnecessary divergence.
This was only adding undef to the use if the copy itself had a
subregister index. It did not consider the subrange liveness if the
use had a subreg index to begin with.
If we had a pair of copies inside a loop which introduced new liveness
to a subregister which was undef before the loop, we would have a
dummy phi-only segment remaining across the loop body. Later, this
false segment would confuse RenameIndependentSubregs causing it to
introduce IMPLICIT_DEFs with broken value numbering.
It seems always adding the lanes to ShrinkMask is OK, so any
conditions should be purely a compile time filter.
If sext_inreg is supported, we will turn this into sext_inreg. That
will then remove it if there are enough sign bits. But if sext_inreg
isn't supported, we can still remove the shift pair based on sign
bits.
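A hedged illustration of the shift pair in question (hypothetical example,
not taken from the patch):
```
#include <cstdint>

// The shl/ashr pair below implements sext_inreg from i8. If X is already
// known to be sign-extended from i8, the pair (or the sext_inreg it would
// become) is redundant and can be removed.
int32_t redundantSextInReg(int32_t X) {
  return int32_t(uint32_t(X) << 24) >> 24;
}
```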
Split from D95890.
Discussed in this thread:
https://lists.llvm.org/pipermail/llvm-dev/2021-January/148139.html
DwarfDebug::collectEntityInfo accidentally distinguishes between variable
locations that never have a location specified, and variable locations that
have an empty location specified. The latter leads to the creation of an
empty variable referring to the abstract origin.
Fix this by seeking a non-empty location before producing a concrete
entity, to guarantee a DW_AT_location will be produced. Other loops in
collectEntityInfo and endFunctionImpl take care of examining the
retainedNodes collection and ensuring optimised-out variables are created.
Differential Revision: https://reviews.llvm.org/D95617
The FixupStatepoints pass does not take into account that an undef use
it skips may have a tied def. So when defs are handled, the pass
considers that the tied use should be spilled and triggers an assert.
FixupStatepoints should skip the undef def as well.
Reviewers: reames, dantrushin
Reviewed By: dantrushin
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D95858
If the G_BR + G_BRCOND in this combine use the same MBB, then it will infinite
loop. Don't allow that to happen.
Differential Revision: https://reviews.llvm.org/D95895
When replacing the dst reg with the src reg, we need to make sure that we
propagate the dst reg's register class through to the src.
Otherwise, we aren't meeting the requirements for G_ASSERT_ZEXT, and so the
verifier will fail.
Differential Revision: https://reviews.llvm.org/D95708
The AArch64 DAG combine added by D90945 & D91433 extends the index
of a scalable masked gather or scatter to i32 if necessary.
This patch removes the combine and instead adds shouldExtendGSIndex, which
is used by visitMaskedGather/Scatter in SelectionDAGBuilder to query whether
the index should be extended before calling getMaskedGather/Scatter.
Reviewed By: david-arm
Differential Revision: https://reviews.llvm.org/D94525
To set a non-default rounding mode, user code usually calls the function
'fesetround' from the standard C library. This way has some disadvantages.
* It creates an unnecessary dependency on libc. On the other hand, setting
the rounding mode requires few instructions and could be done by the compiler.
Sometimes the standard C library is not even available, as in the case of
GPUs or AI cores that execute small kernels.
* The compiler could generate more efficient code if it knows that a particular
call just sets the rounding mode.
This change introduces new IR intrinsic, namely 'llvm.set.rounding', which
sets current rounding mode, similar to 'fesetround'. It however differs
from the latter, because it is a lower level facility:
* 'llvm.set.rounding' does not return any value, whereas 'fesetround'
returns a non-zero value in the case of failure. In glibc 'fesetround'
reports failure if its argument is invalid or unsupported or if floating
point operations are unavailable on the hardware. The compiler usually knows
what core it generates code for and it can validate arguments in many
cases.
* Rounding mode is specified in 'fesetround' using constants like
'FE_TONEAREST', which are target dependent. It is inconvenient to work
with such constants at IR level.
C standard provides a target-independent way to specify rounding mode, it
is used in FLT_ROUNDS, however it does not define standard way to set
rounding mode using this encoding.
This change implements only the IR intrinsic. Lowering it to machine code is
target-specific and will be implemented later. Mapping of 'fesetround'
to 'llvm.set.rounding' is also not implemented here.
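For context, a minimal sketch of the libc-based approach the intrinsic is
meant to subsume (standard <cfenv> calls where the target defines the
macros; nothing here is from the patch):
```
#include <cfenv>
#include <cstdio>

int main() {
  // fesetround takes target-dependent macros (FE_UPWARD, FE_TONEAREST, ...)
  // and reports failure with a non-zero return, which the compiler cannot
  // easily reason about; llvm.set.rounding avoids the libc dependency.
  if (std::fesetround(FE_UPWARD) != 0)
    std::puts("rounding mode not supported");
  // ... code relying on round-toward-+infinity ...
  std::fesetround(FE_TONEAREST);
  return 0;
}
```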
Differential Revision: https://reviews.llvm.org/D74729
If we're going to end up expanding anyway, we should do it early
so we don't create extra operations to handle the bytes added by
promotion.
This is helpful on RISCV where we might have to promote i16 all
the way to i64.
Differential Revision: https://reviews.llvm.org/D95756
With a context instruction, this would produce a context
error. However, it would continue on and do an out of bounds access of
the empty allocation order array.
Source Drift happens when the sources are updated after profiling the binary
but before building the final optimized binary. If the source has changed since
the profiles were obtained, optimizing basic blocks might be sub-optimal. This
only applies to BasicBlockSection::List as it creates clusters of basic blocks
using basic block ids. Source drift can invalidate these groupings leading to
sub-optimal code generation with regards to performance.
PGO source drift for a particular function can be detected using function
metadata added in D95495.
When source drift is detected, disable basic block clusters by default;
this can be re-enabled with the -mllvm option
bbsections-detect-source-drift=false.
Differential Revision: https://reviews.llvm.org/D95593
I think every target will want to remove these in the same way. Rather than
making them all implement the same code, let's just put this in
InstructionSelect.
Differential Revision: https://reviews.llvm.org/D95652
Remove the call to setFlags in favour of creating the instruction with
the correct flags in the first place, so we don't have to explicitly
notify the observer.
Differential Revision: https://reviews.llvm.org/D95681
splitCodeGen does not need to take ownership of the module, as it
currently clones the original module for each split operation.
There is an ~4 year old fixme to change that, but until this is
addressed, the function can just take a reference to the module.
This makes the transition of LTOCodeGenerator to use LTOBackend a bit
easier, because under some circumstances, LTOCodeGenerator needs to
write the original module back after codegen.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D95222
Avoid iterating the same PHI/LABEL/Debug instructions repeatedly.
We ran into a compile timeout problem when building a target after its
SampleFDO profile was updated. It is caused by some very large blocks with
a bunch of PHIs at the beginning. LiveDebugVariables::emitDebugValues,
called during the VirtRegRewriter phase, repeatedly searches for the
insertion point in those large BBs via SkipPHIsLabelsAndDebug, and each time
SkipPHIsLabelsAndDebug needs to go through the same set of PHIs before it
can find the first non-PHI/Label/Debug instruction. This patch adds a cache
to save the last position for the sequence which has been checked in the
previous call of SkipPHIsLabelsAndDebug.
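A rough sketch of the caching idea (hypothetical data structures; not the
actual LLVM interface):
```
#include <list>
#include <unordered_map>

struct Instr {
  bool IsPHIOrLabelOrDebug;
};
using Block = std::list<Instr>;

// Remember, per block, the last position already known to follow all
// PHI/LABEL/debug instructions so repeated queries don't rescan the prefix.
std::unordered_map<const Block *, Block::const_iterator> SkipCache;

Block::const_iterator skipPHIsLabelsAndDebug(const Block &BB) {
  auto Cached = SkipCache.find(&BB);
  auto It = Cached != SkipCache.end() ? Cached->second : BB.begin();
  while (It != BB.end() && It->IsPHIOrLabelOrDebug)
    ++It;
  SkipCache[&BB] = It; // resume from here on the next query
  return It;
}
```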
Differential Revision: https://reviews.llvm.org/D94981
This patch allows targets to define multiple cost
values for each register so that the cost model
can be more flexible and better used during the
register allocation as per the target requirements.
For AMDGPU the VGPR allocation will be more efficient
if the register cost can be associated dynamically
based on the calling convention.
Reviewed By: qcolombet
Differential Revision: https://reviews.llvm.org/D86836
This adds generic regbankselect support for G_ASSERT_ZEXT.
It inherits whatever register bank the source was given, always, on all targets.
I think that at the point where we run into these, the source register bank
should be decided.
This also adds some AArch64-specific code which makes sure we can handle
G_ASSERT_ZEXT when deciding on register banks for G_STORE, G_PHI, ... etc.
Differential Revision: https://reviews.llvm.org/D95649
It's the same as the ZEXT/TRUNC case, except SrcBitWidth is given by the
immediate operand.
Update KnownBitsTest.cpp and a MIR test for a concrete example.
Differential Revision: https://reviews.llvm.org/D95566
Treat hint instructions like G_ASSERT_ZEXT like COPY instructions in helpers
which walk through copies.
This ensures that instructions like G_ASSERT_ZEXT won't impact any optimizations
that rely on these helpers.
Differential Revision: https://reviews.llvm.org/D95577
These are widened to a wider UADDE/USUBE, with the overflow value
unused, and with the same synthesis of a new overflow value as for the
O operations.
Reviewed By: paquette
Differential Revision: https://reviews.llvm.org/D95326
This adds a generic opcode which communicates that a type has already been
zero-extended from a narrower type.
This is intended to be similar to AssertZext in SelectionDAG.
For example,
```
%x_was_extended:_(s64) = G_ASSERT_ZEXT %x, 16
```
Signifies that the top 48 bits of %x are known to be 0.
This is useful in cases like this:
```
define i1 @zeroext_param(i8 zeroext %x) {
%cmp = icmp ult i8 %x, -20
ret i1 %cmp
}
```
In AArch64, `%x` must use a 32-bit register, which is then truncated to an 8-bit
value.
If we know that `%x` is already zero-ed out in the relevant high bits, we can
avoid the truncate.
Currently, in GISel, this looks like this:
```
_zeroext_param:
and w8, w0, #0xff ; We don't actually need this!
cmp w8, #236
cset w0, lo
ret
```
While SDAG does not produce the truncation, since it knows that it's
unnecessary:
```
_zeroext_param:
cmp w0, #236
cset w0, lo
ret
```
This patch
- Adds G_ASSERT_ZEXT
- Adds MIRBuilder support for it
- Adds MachineVerifier support for it
- Documents it
It also puts G_ASSERT_ZEXT into its own class of "hint instruction." (There
should be a G_ASSERT_SEXT in the future, maybe a G_ASSERT_ALIGN as well.)
This allows us to skip over hints in the legalizer etc. These can then later
be selected like COPY instructions or removed.
Differential Revision: https://reviews.llvm.org/D95564
This reverts commit ef0dcb5063.
This change is causing a lot of compiler crashes internally; sorry, I don't have a
small repro/stacktrace with symbols to share right now.
Differential Revision: https://reviews.llvm.org/D95622
https://bugs.llvm.org/show_bug.cgi?id=48232
When PrologEpilogInserter writes callee-saved registers to the stack, LR is not reloaded but is instead loaded directly into PC.
This was not taken into account when determining if each callee-saved register was liveout for the block.
When frame elimination inserts virtual registers, and the register scavenger tries to scavenge LR, it considers it liveout and tries to spill again.
However there is no emergency spill slot to use, and it fails with an error:
fatal error: error in backend: Error while trying to spill LR from class GPR: Cannot scavenge register without an emergency spill slot!
This patch prevents any callee-saved registers which are not reloaded (including LR) from being marked liveout.
They are therefore available to scavenge without requiring an extra spill.
This fully de-pessimizes the common case of no indirectbr's,
(where we don't actually need to do anything to preserve domtree)
and avoids domtree recomputation in the case there were indirectbr's.
Note that two indirectbr's could have a common successor, and not all
successors of an indirectbr are meant to survive the expansion.
Though, the code assumes that an indirectbr doesn't have
duplicate successors; those *should* have been deduplicated
by simplifycfg or something already.
Experimental, using non-existent DWARF support to use an expr for the
location involving an addr_index (to compute address + offset so
addresses can be reused in more places).
The global variable debug info had to be deferred until the end of the
module (so bss variables would all be emitted first - so their labels
would have the relevant section). Non-bss variables seemed to not have
their label assigned to a section even at the end of the module, so I
didn't know what to do there.
Also, the hashing code is broken - doesn't know how to hash these
expressions (& isn't hashing anything inside subprograms, which seems
problematic), so for test purposes this change just skips the hash
computation. (GCC's actually overly sensitive in its hash function, it
seems - I'm forgetting the specific case right now - anyway, we might
want to just use the frontend-known file hash and give up on optimistic
.dwo/.dwp reuse)
FaultMapParser lived in CodeGen and was forcing llvm-objdump to
link CodeGen and everything CodeGen depends on.
This was previously attempted in r240364 to fix a link failure.
The CodeGen dependency was independently added to fix the same
link failure, and that ended up being kept.
Removing the dependency seems like the correct layering for
llvm-objdump.
Reviewed By: MaskRay, jhenderson
Differential Revision: https://reviews.llvm.org/D95414
While this is mostly NFC right now, because only ARM happens
to run this pass with DomTree available before it,
and required after it, more backends will be affected once
the SimplifyCFG's switch for domtree preservation is flipped,
and DwarfEHPrepare also preserves the domtree.
This is a fix for PR48790. Over in D70350, subprogram DIEs were permitted
to be shared between CUs. However, the creation of a subprogram DIE can be
triggered early, from other CUs. The subprogram definition is then created
in one CU, and when the function is actually emitted children are attached
to the subprogram that expect to be in another CU. This breaks internal CU
references in the children.
Fix this by redirecting the creation of subprogram DIEs in
getOrCreateContextDIE to the CU specified by its DISubprogram definition.
This ensures that the subprogram DIE is always created in the correct CU.
Differential Revision: https://reviews.llvm.org/D94976
This moves SinkIntoLoop from MachineLICM to MachineSink. The motivation for
this work is that hoisting is a canonicalisation transformation, but we do not
really have a good story to sink instructions back if that is better, e.g. to
reduce live-ranges, register pressure and spilling. This has been discussed a
few times on the list, the latest thread is:
https://lists.llvm.org/pipermail/llvm-dev/2020-December/147184.html
There it was pointed out that we have the LoopSink IR pass, but that works on
IR, lacks register pressure information, and is focused on profile guided
optimisations, and then we have MachineLICM and MachineSink that both perform
sinking. MachineLICM is more about hoisting and CSE'ing of hoisted
instructions. It also contained a very incomplete and disabled-by-default
SinkIntoLoop feature, which we now move to MachineSink.
Getting loop-sinking to do something useful is going to be at least a 3-step
approach:
1) This is just moving the code and is almost an NFC, but contains a bug fix.
This uses helper function `isLoopInvariant` that was factored out in D94082 and
added to MachineLoop.
2) A first functional change to make loop-sink a little bit less restrictive,
which it really is at the moment, is the change in D94308. This lets it do
more (alias) analysis using functions in MachineSink, making it a bit more
powerful. Nothing changes much: still off by default. But it shows that
MachineSink is a better home for this, and it starts using its functionality
like `hasStoreBetween`, and in the next step we can use `isProfitableToSinkTo`.
3) This is going to be the interesting step: deciding when and how
many instructions to sink. This will be driven by the register pressure, and
deciding if reducing live-ranges and loop sinking will help in better
performance.
4) Once we are happy with 3), this should be enabled by default; that should be
the end goal of this exercise.
Differential Revision: https://reviews.llvm.org/D93694
Just use the existing `Known.sextInReg` implementation.
- Update KnownBitsTest.cpp.
- Update combine-redundant-and.mir for a more concrete example.
Differential Revision: https://reviews.llvm.org/D95484
There are two use cases.
Assembler
We have accrued some code gated on MCAsmInfo::useIntegratedAssembler(). Some
features are supported by the latest GNU as, but we have to use
MCAsmInfo::useIntegratedAs() because the newer versions have not been widely
adopted (e.g. SHF_LINK_ORDER 'o' and 'unique' linkage in 2.35, --compress-debug-sections= in 2.26).
Linker
We want to use features supported only by LLD or very new GNU ld, or don't want
to work around older GNU ld. We currently can't represent that "we don't care
about old GNU ld". You can find such workarounds in a few other places, e.g.
Mips/MipsAsmprinter.cpp PowerPC/PPCTOCRegDeps.cpp X86/X86MCInstrLower.cpp
AArch64 TLS workaround for R_AARCH64_TLSLD_MOVW_DTPREL_* (PR ld/18276),
R_AARCH64_TLSLE_LDST8_TPREL_LO12 (https://bugs.llvm.org/show_bug.cgi?id=36727, https://sourceware.org/bugzilla/show_bug.cgi?id=22969)
Mixed SHF_LINK_ORDER and non-SHF_LINK_ORDER components (supported by LLD in D84001;
GNU ld feature request https://sourceware.org/bugzilla/show_bug.cgi?id=16833 may take a while before available).
This feature allows garbage collecting some unused sections (e.g. a fragmented .gcc_except_table).
This patch adds `-fbinutils-version=` to clang and `-binutils-version` to llc.
It changes one codegen place in SHF_MERGE to demonstrate its usage.
`-fbinutils-version=2.35` means the produced object file does not care about GNU
ld<2.35 compatibility. When `-fno-integrated-as` is specified, the produced
assembly can be consumed by GNU as>=2.35, but older versions may not work.
`-fbinutils-version=none` means that we can use all ELF features, regardless of
GNU as/ld support.
Both clang and llc need `parseBinutilsVersion`. Such command line parsing is
usually implemented in `llvm/lib/CodeGen/CommandFlags.cpp` (LLVMCodeGen),
however, ClangCodeGen does not depend on LLVMCodeGen. So I add
`parseBinutilsVersion` to `llvm/lib/Target/TargetMachine.cpp` (LLVMTarget).
Differential Revision: https://reviews.llvm.org/D85474
The current algorithm just tries to localize defs as far as they can go, and in
the case of G_PHI operands, it clones the def into the predecessor block for
each incoming edge. When multiple edges have the same register value, this can
cause unnecessary code bloat, and inhibit later optimizations.
This change checks if a given phi operand is unique in the phi; if not, the
def of that register is not localized to the predecessor.
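A minimal sketch of such a uniqueness check (the helper name and exact shape are assumptions, not the actual Localizer code):
```cpp
#include "llvm/CodeGen/MachineInstr.h"
using namespace llvm;

// Hypothetical helper: return true if no other incoming edge of this G_PHI
// uses the same register as the operand at OpIdx. G_PHI value operands sit
// at the odd indices (1, 3, 5, ...), each followed by its predecessor block.
static bool isUniquePhiUse(const MachineInstr &PHI, unsigned OpIdx) {
  Register Reg = PHI.getOperand(OpIdx).getReg();
  for (unsigned I = 1, E = PHI.getNumOperands(); I < E; I += 2)
    if (I != OpIdx && PHI.getOperand(I).getReg() == Reg)
      return false;
  return true;
}
```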
Differential Revision: https://reviews.llvm.org/D95406
RISCV has to use 2 shifts for (i64 (zext_inreg X, i32)), but we
can use addiw rd, rs1, x0 for sext_inreg. We already understood this
when type legalizing i32 seteq/ne on rv64. But this transform in
SimplifySetCC would sometimes undo it.
Reviewed By: luismarques
Differential Revision: https://reviews.llvm.org/D95289
Implement widening for G_SADDO and G_SSUBO.
Add legalize-add/sub tests for narrow overflowing add/sub on AArch64.
Differential Revision: https://reviews.llvm.org/D95034
This patch adds support for scalable-vector splats in DAGCombiner's
`isConstantOrConstantVector` and `ISD::matchUnaryPredicate` functions,
which enable the SelectionDAG div/rem-by-constant optimizations for
scalable vector types.
It also fixes up one case where the UDIV optimization was generating a
SETCC without first consulting the target for its preferred SETCC result
type.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D94501
For now, we correct the result for sqrt only if iteration > 0. This doesn't
make sense, as the two are not strictly related.
Reviewed By: dmgreen, spatel, RKSimon
Differential Revision: https://reviews.llvm.org/D94480
Reassociating some patterns to generate more fma instructions to
reduce register pressure.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D92071
InstrEmitter.h needs TargetMachine but relies on a forward declaration
of TargetMachine in MachineOperand.h. This patch adds a forward
declaration right in InstrEmitter.h.
While we are at it, this patch removes the one in MachineOperand.h,
where it is unnecessary.
In RISC-V there is a single addressing mode of the form imm(reg) where
imm is a signed integer of 12-bit with a range of [-2048..2047] bytes
from reg.
The test MultiSource/UnitTests/C++11/frame_layout of the LLVM test-suite
exercises several scenarios with the stack, including function calls
where the stack will need to be realigned due to a local variable having
a large alignment of 4096 bytes.
In situations of large stacks, the RISC-V backend (in
RISCVFrameLowering) reserves an extra emergency spill slot which can be
used (if no free register is found) by the register scavenger after the
frame indexes have been eliminated. PrologEpilogInserter already takes
care of keeping the emergency spill slots as close as possible to the
stack pointer or frame pointer (depending on what the function will
use). However there is a final alignment step to honour the maximum
alignment of the stack that, when using the stack pointer to access the
emergency spill slots, has the side effect of setting them farther from
the stack pointer.
In the case of the frame_layout testcase, the net result is that we do
have an emergency spill slot but it is so far from the stack pointer
(more than 2048 bytes due to the extra alignment of a variable to 4096
bytes) that it becomes unreachable via any immediate offset.
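An illustrative C++ reduction of that scenario (not the actual test-suite source; the names and sizes are made up):
```cpp
// Illustrative only: a large, over-aligned local forces stack realignment,
// and the frame is big enough that an emergency spill slot placed after the
// final alignment step ends up more than 2047 bytes away from the stack
// pointer, i.e. outside the range of RISC-V's 12-bit immediate offsets.
void consume(char *);

void frame_layout_like() {
  alignas(4096) char big[4096]; // requires realignment to 4096 bytes
  char filler[3000];            // pushes the frame size past 2047 bytes
  consume(big);
  consume(filler);
}
```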
During elimination of the frame index, many (regular) offsets of the
stack may be immediately unreachable already. Their address needs to be
computed using a register. A virtual register is created and later
RegisterScavenger should be able to find an unused (physical) register.
However if no register is available, RegisterScavenger will pick a
physical register and spill it onto an emergency stack slot, while we
compute the offset (restoring the chosen register after all this). This
assumes that the emergency stack slot is easily reachable (this is,
without requiring another register!).
This is the assumption we seem to break when we perform the extra
alignment in PrologEpilogInserter.
We can "float" the emergency spill slots by increasing (in absolute
value) their offsets from the incoming stack pointer. This way the
emergency spill slots will remain close to the stack pointer (once the
function has allocated storage for the stack, including the needed
realignment). The new size computed in PrologEpilogInserter is padding
so it should be OK to move the emergency spill slots there. Also because
we're increasing the alignment, the new location should stay aligned for
the purpose of the emergency spill slots.
Note that this change also impacts other backends as shown by the tests.
Changes are minor adjustments to the emergency stack slot offset.
Differential Revision: https://reviews.llvm.org/D89239
The only caller of this function is in LocalStackSlotAllocation,
and it creates a base register of the class returned by the target's
getPointerRegClass(). AMDGPU wants to use a different reg class
here, so let materializeFrameBaseRegister just create and return
whatever it wants.
Differential Revision: https://reviews.llvm.org/D95268
The widenScalar implementations for signed and unsigned overflowing
operations were very similar: both are checked by truncating the result
and then re-sign/zero-extending it and checking that it matches the
computed operation.
Using a truncate + zero-extend for the unsigned case instead of manually
producing the AND instruction like before leads to an extra copy
instruction during legalization, but this should be harmless.
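A simplified sketch of that shared scheme (the MachineIRBuilder calls are the real GISel builder API, but the helper name and surrounding logic are simplified assumptions rather than the exact LegalizerHelper code):
```cpp
#include "llvm/CodeGen/GlobalISel/MachineIRBuilder.h"
#include "llvm/IR/InstrTypes.h"
using namespace llvm;

// Perform the add at the wide type, then detect overflow by truncating the
// wide result and re-sign/zero-extending it: overflow occurred iff the
// re-extended value no longer matches the wide result.
static void buildWidenedAddWithOverflow(MachineIRBuilder &B, LLT WideTy,
                                        LLT NarrowTy, Register WideLHS,
                                        Register WideRHS, bool IsSigned) {
  auto WideRes = B.buildAdd(WideTy, WideLHS, WideRHS);
  auto Trunc = B.buildTrunc(NarrowTy, WideRes);
  auto ReExt = IsSigned ? B.buildSExt(WideTy, Trunc)
                        : B.buildZExt(WideTy, Trunc);
  // The overflow flag: wide result differs from the re-extended narrow value.
  B.buildICmp(CmpInst::ICMP_NE, LLT::scalar(1), WideRes, ReExt);
}
```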
Differential Revision: https://reviews.llvm.org/D95035
Noticed while I was touching other nearby code. I don't have a
test where this matters because the targets I work on
use zero or one boolean contents. And the test cases I've seen
this fire on happen before type legalization where the result type
is MVT::i1 so the distinction doesn't matter.
There was code to handle the first operand being different than
the result type. And code to handle first operand having the
same type as the type to extend from. This should never happen
for a correctly formed SIGN_EXTEND_INREG. I've replaced the
code with asserts.
I also noticed we created the same APInt twice so I've reused it.
Implement widening for G_SADDO and G_SSUBO. Previously it was only
implemented for G_UADDO and G_USUBO. Also add legalize-add/sub tests for
narrow overflowing add/sub on AArch64.
Differential Revision: https://reviews.llvm.org/D95034
Make this look more like the DAG handling and move to common code.
I also noticed AArch64 seems to not be properly adding the
physreg:virtreg mapping to the function live ins.
Add DemandedElts support inside the TRUNCATE analysis.
REAPPLIED - this was reverted by @hans at rGa51226057fc3 due to an issue with vector shift amount types, which was fixed in rG935bacd3a724 and an additional test case added at rG0ca81b90d19d
Differential Revision: https://reviews.llvm.org/D56387
As noticed on D56387, for vectors we must always correctly adjust the shift amount type during truncation (not just after legalization). We were getting away with it as we currently only accepted scalars via the dyn_cast<ConstantSDNode>.
The previous code built the model that the tile config register is the user of
each AMX instruction. There is a problem with this for tile config register
spills: when spanning a function call, an ldtilecfg instruction may be inserted
before each AMX instruction that uses the tile config register, which clobbers
all tile data registers.
To fix this issue, we remove the model of the tile config register. We
analyze the regmask of each call instruction and insert an ldtilecfg if there is
any tile data register live across the call. Inserting an sttilecfg
before the call is unnecessary, because the tile config doesn't change
and we can just reload the config.
Besides this, we also need to check tile config register interference. Since we
don't model the config register, we should check interference from the
ldtilecfg to each tile data register def.
          ldtilecfg
          /       \
        BB1       BB2
        /           \
      call          BB3
      /               \
  %1=tileload    %2=tilezero
We can start from the def instruction of each tile register and walk backward
to the ldtilecfg. If there is any call instruction on the way and the tile data
register is not preserved across it, we should insert an ldtilecfg after the
call instruction.
Differential Revision: https://reviews.llvm.org/D94155
It caused "Vector shift amounts must be in the same as their first arg"
asserts in Chromium builds. See the code review for repro instructions.
> Add DemandedElts support inside the TRUNCATE analysis.
>
> Differential Revision: https://reviews.llvm.org/D56387
This reverts commit cad4275d69.
Add the aarch64[_be]-*-gnu_ilp32 targets to support the GNU ILP32 ABI for AArch64.
The needed codegen changes were mostly already implemented in D61259, which added support for the watchOS ILP32 ABI. The main changes are:
- Wiring up the new target to enable ILP32 codegen and MC.
- ILP32 va_list support.
- ILP32 TLSDESC relocation support.
There was existing MC support for ELF ILP32 relocations from D25159 which could be enabled by passing "-target-abi ilp32" to llvm-mc. This was changed to check for "gnu_ilp32" in the target triple instead. This shouldn't cause any issues since the existing support was slightly broken: it was generating ELF64 objects instead of the ELF32 object files expected by the GNU ILP32 toolchain.
This target has been tested by running the full rustc testsuite on a big-endian ILP32 system based on the GCC ILP32 toolchain.
Reviewed By: kristof.beyls
Differential Revision: https://reviews.llvm.org/D94143
If constants are hidden behind G_ANYEXT we can treat them the same way as G_SEXT.
For that purpose we extend getConstantVRegValWithLookThrough with an option
to handle G_ANYEXT the same way as G_SEXT.
Differential Revision: https://reviews.llvm.org/D92219
When constraining an operand register using constrainOperandRegClass(),
the function may emit a COPY in case the provided register class does
not match the current operand register class. However, the operand
itself is not updated to make use of the COPY, thereby resulting in
incorrect code. This patch fixes that bug by updating the machine
operand accordingly.
Reviewed By: dsanders
Differential Revision: https://reviews.llvm.org/D91244
function-instrument=xray-never wasn't actually honored before. We were
getting lucky that it worked because CodeGenFunction would omit the
other xray attributes when a function was annotated with
xray_never_instrument. This patch adds proper support.
Differential Revision: https://reviews.llvm.org/D89441
Just like llvm.assume, there are a lot of cases where we can just ignore llvm.experimental.noalias.scope.decl.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D93042
This is a restricted version of the combine in `DAGCombiner::MatchLoadCombine`.
(See D27861)
This tries to recognize patterns like below (assuming a little-endian target):
```
s8 *a = ...
s32 val = a[0] | (a[1] << 8) | (a[2] << 16) | (a[3] << 24)
->
s32 val = *((s32 *)a)

s8 *a = ...
s32 val = a[3] | (a[2] << 8) | (a[1] << 16) | (a[0] << 24)
->
s32 val = BSWAP(*((s32 *)a))
```
(This patch also handles the big-endian target case as well, in which the first
example above has a BSWAP, and the second example above does not.)
To recognize the pattern, this searches from the last G_OR in the expression
tree.
E.g.
```
 Reg   Reg
   \   /
    OR_1   Reg
      \    /
       OR_2
         \    Reg
          ..  /
          Root
```
Each non-OR register in the tree is put in a list. Each register in the list is
then checked to see if it's an appropriate load + shift logic.
If every register is a load + potentially a shift, the combine checks if those
loads + shifts, when OR'd together, are equivalent to a wide load (possibly with
a BSWAP.)
To simplify things, this patch
(1) Only handles G_ZEXTLOADs (which appear to be the common case)
(2) Only works in a single MachineBasicBlock
(3) Only handles G_SHL as the bit twiddling to stick the small load into a
specific location
An IR example of this is here: https://godbolt.org/z/4sP9Pj (lifted from
test/CodeGen/AArch64/load-combine.ll)
At -Os on AArch64, this is a 0.5% code size improvement for CTMark/sqlite3,
and a 0.4% improvement for CTMark/7zip-benchmark.
Also fix a bug in `isPredecessor` which caused it to fail whenever `DefMI` was
the first instruction in the block.
Differential Revision: https://reviews.llvm.org/D94350
This is an additional bug fix for c5be0e0cc0. The distance for
the spill instructions was wrong in the previous patch.
Differential Revision: https://reviews.llvm.org/D94772
This recommits 2c51bef76c.
I've fixed the broken check line from when I renamed the test function.
Original commit message:
This builds on D94142 where scalable vectors are allowed in structs.
I did have to fix one scalable vector issue in the vector type
creation for these intrinsics where we used getVectorNumElements
instead of ElementCount.
This builds on D94142 where scalable vectors are allowed in structs.
I did have to fix one scalable vector issue in the vector type
creation for these intrinsics where we used getVectorNumElements
instead of ElementCount.
Differential Revision: https://reviews.llvm.org/D94149
Currently, when spilling statepoint register operands in FixupStatepoints,
we do not pay attention to the fact that an operand might be `undef`. We just
generate a spill, which may lead to a verifier error because we have a use
without a def.
To handle it, let FixupStatepoints ignore `undef` register operands
completely and change them to some constant value when generating
stack map. Use same value as used by ISel for this purpose (0xFEFEFEFE).
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D94703
Use the KnownBits icmp comparisons to determine when an ISD::UMIN/UMAX op is unnecessary, should either operand be known to be ULT/ULE or UGT/UGE with respect to the other.
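A sketch of the underlying range check (KnownBits::getMinValue/getMaxValue are real helpers; the function name and the surrounding combine shape are assumptions):
```cpp
#include "llvm/Support/KnownBits.h"
using namespace llvm;

// If the unsigned ranges implied by the known bits already order the two
// operands, umin just returns one of them.
// Returns 0 to pick the LHS, 1 to pick the RHS, -1 if nothing is provable.
static int simplifyUMinByKnownBits(const KnownBits &LHS, const KnownBits &RHS) {
  if (LHS.getMaxValue().ule(RHS.getMinValue()))
    return 0; // LHS is always <= RHS, so umin(LHS, RHS) == LHS
  if (RHS.getMaxValue().ule(LHS.getMinValue()))
    return 1; // RHS is always <= LHS, so umin(LHS, RHS) == RHS
  return -1;  // ranges overlap, nothing to fold
}
```
The umax case is the mirror image, picking the operand whose minimum is already known to be at least the other's maximum.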
Differential Revision: https://reviews.llvm.org/D94532
RISC-V would like to use a struct of scalable vectors to return multiple
values from intrinsics. This would also be needed for target independent
intrinsics like llvm.sadd.with.overflow.
This patch removes the existing restriction for this. I've modified
StructType::isSized to consider a struct containing scalable vectors
as unsized so the verifier won't allow loads/stores/allocas of these
structs.
Reviewed By: sdesmalen
Differential Revision: https://reviews.llvm.org/D94142
Reassociating some patterns to generate more fma instructions to
reduce register pressure.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D92071
This patch promotes the result integer type of FP_TO_XINT when expanding it,
so the crash in the conversion from ppc_fp128 to i1 is fixed.
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D92473
Add a one-use check to lookThruCopyLike.
The root node is only safe to delete if we are sure that every
definition in the copy chain has a single use.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D92069
This is a follow-up fix to commit 03c8d6a0c4.
Seems like we now end up with NeedInvert being set in the result
from LegalizeSetCCCondCode more often than in the past, so we
need to handle NeedInvert when expanding BR_CC.
Not sure how to deal with the "Tmp4.getNode()" case properly,
but the current assumption is that that code path isn't impacted
by the changes in 03c8d6a0c4 so we can simply move
the old assert into the if-branch and only handle NeedInvert in the
else-branch.
I think that the test case added here, for PowerPC, might have
failed also before commit 03c8d6a0c4. But we started
to hit the assert more often downstream after merging that
commit.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D94762
The ``llvm.experimental.noalias.scope.decl`` intrinsic identifies where a noalias
scope is declared. When the intrinsic is duplicated, a decision must
also be made about the scope: depending on the reason of the duplication,
the scope might need to be duplicated as well.
Reviewed By: nikic, jdoerfert
Differential Revision: https://reviews.llvm.org/D93039
This 'FIXME' popped up in the development of an out-of-tree backend.
Quick fix, but since this is my first llvm upstream patch I do not have commit rights, so if approved please commit it for me.
- A test is not included, as this came up in an out-of-tree backend (if required, please hint at how to test this).
Patch by simveg (Simon)
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D93219
These methods are recursive, so a little costly.
We only look at the result in one place in this function, and it's
conditional. We also only need the second call if the first
returned enough sign bits.
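A generic sketch of that kind of short-circuiting (assuming the SelectionDAG variant of ComputeNumSignBits; the helper name and threshold are made up):
```cpp
#include "llvm/CodeGen/SelectionDAG.h"
using namespace llvm;

// Only pay for the second recursive ComputeNumSignBits walk when the first
// one already returned enough sign bits for the fold to be possible.
static bool bothHaveSignBits(SelectionDAG &DAG, SDValue A, SDValue B,
                             unsigned Needed) {
  if (DAG.ComputeNumSignBits(A) < Needed)
    return false; // skip the second recursive walk entirely
  return DAG.ComputeNumSignBits(B) >= Needed;
}
```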
MergeInnerShuffle currently attempts to merge shuffle(shuffle(x,y),z) patterns into a single shuffle, using 1 or 2 of the x,y,z ops.
However, if we already match 2 ops we might be able to handle the third op if it's also a shuffle that references one of the previous ops, allowing us to handle some cases like the following (a standalone mask-composition sketch follows these examples):
shuffle(shuffle(x,y),shuffle(x,y))
shuffle(shuffle(shuffle(x,z),y),z)
shuffle(shuffle(x,shuffle(x,y)),z)
etc.
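As referenced above, here is a plain C++ illustration of the mask composition involved when both inner shuffles take the same (x, y) operands; this is only an illustration, not the DAGCombiner code:
```cpp
#include <cstddef>
#include <vector>

// Compose shuffle(shuffle(x, y, M1), shuffle(x, y, M2), M0) into a single
// shuffle of x and y. Outer-mask indices < N pick lanes from the first inner
// shuffle, indices >= N from the second; -1 denotes an undef lane.
std::vector<int> mergeShuffleMasks(const std::vector<int> &M0,
                                   const std::vector<int> &M1,
                                   const std::vector<int> &M2) {
  const int N = static_cast<int>(M1.size());
  std::vector<int> Merged(M0.size());
  for (std::size_t I = 0; I != M0.size(); ++I) {
    int Idx = M0[I];
    if (Idx < 0)
      Merged[I] = -1;                              // undef stays undef
    else
      Merged[I] = Idx < N ? M1[Idx] : M2[Idx - N]; // map back onto x/y
  }
  return Merged;
}
```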
This isn't an exhaustive match and is dependent on the order the candidate ops are encountered - if one of the matched ops was a shuffle that was peek-able we don't go back and try to split that; I haven't found much need for that amount of analysis yet.
This is a preliminary patch that will allow us to later improve x86 HADD/HSUB matching - but needs to be reviewed separately as its in generic code and affects existing Thumb2 tests.
Differential Revision: https://reviews.llvm.org/D94671
Even if we know nothing about LHS, it can still be useful to know that
smax(LHS, RHS) >= RHS and smin(LHS, RHS) <= RHS.
Differential Revision: https://reviews.llvm.org/D87145
I'm hoping to reuse MergeInnerShuffle in some other folds - so ensure the candidate ops/mask are reset at the start of each run.
Also, move the second op matching before bailing to make it simpler to try to match other things afterward.
static_cast from uint64_t to unsigned gives an MSVC build warning
on Windows:
warning C4309: 'static_cast': truncation of constant value
Use an explicit cast instead.
Change-Id: I692d335b4913070686a102780c1fb05b893a2f69
Differential Revision: https://reviews.llvm.org/D94592
This fixes double printing of insertion debug messages in the
legalizer.
Try to clean up the usage of observers. Currently the use of observers is
pretty hard to follow and it's not clear what is responsible for
them. Observers are referenced in 3 places:
1. In the MachineFunction
2. In the MachineIRBuilder
3. In the LegalizerHelper
The observers in the MachineFunction and MachineIRBuilder are both
called only on insertions, and are redundant with each other. The
source of the double printing was the same observer was added to both
the MachineFunction, and the MachineIRBuilder. One of these references
needs to be removed. Arguably observers in general should be fully
removed from one or the other, but it may be useful to have a local
observer in the MachineIRBuilder that is not added to the function's
observers. Alternatively, the wrapper observer could manage a local
observer in one place.
The LegalizerHelper only ever calls the observer on changing/changed
instructions, and never insertions. Logically these are two different
types of observers, for changes and for insertions.
Additionally, some places used the GISelObserverWrapper when they only
needed a single observer they could use directly.
Setting the observer in the LegalizerHelper constructor is not
flexible enough if the LegalizerHelper is constructed anywhere outside
the one used by the legalizer. AMDGPU calls the LegalizerHelper in
RegBankSelect, and needs to use a local observer to apply the regbank
to newly created instructions. Currently it accomplishes this by
constructing a local MachineIRBuilder. I'm trying to move the
MachineIRBuilder to be owned/maintained by the RegBankSelect pass
itself, but the locally constructed LegalizerHelper would reset the
observer.
Mips also has a special case use of the LegalizationArtifactCombiner
in applyMappingImpl; I think we do need to run the artifact combiner
during RegBankSelect, but in a more consistent way outside of
applyMappingImpl.
Also, old MIR tests are updated to match the latest changes in the STATEPOINT format.
Reviewers: reames, dantrushin
Reviewed By: reames, dantrushin
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D94482
The default value is not changed, so it is actually NFC.
The option allows using gc values in registers in landing pads.
Reviewers: reames, dantrushin
Reviewed By: reames, dantrushin
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D94469
InlineSpiller::foldMemoryOperand unties registers before an attempt to fold and
does not restore tied-ness in case of failure.
I do not have a particular test demonstrating the invalid behavior.
This is something of a clean-up.
It is better to keep the behavior correct in case it happens at some point in the future.
Reviewers: reames, dantrushin
Reviewed By: dantrushin, reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D94389
This patch resolves the suboptimal codegen described in http://llvm.org/pr47873 .
When CodeGenPrepare lowers select into a conditional branch, a freeze instruction is inserted.
It is then translated to `BRCOND(FREEZE(SETCC))` in SelDag.
However, the `FREEZE` between the `SETCC` and the `BRCOND` was causing suboptimal code generation.
This patch adds `BRCOND(FREEZE(cond))` -> `BRCOND(cond)` fold to DAGCombiner to remove the `FREEZE`.
To make this optimization sound, `BRCOND(UNDEF)` should simply either take the branch or not, nondeterministically, rather than raising UB.
However, according to the comments in ISDOpcodes.h it wasn't clear what happens when the condition is undef.
I updated the comments of `BRCOND` to make it explicit (as well as `BR_CC`, which is also a conditional branch instruction).
Note that it diverges from the semantics of `br` instruction in IR, which is explicitly UB.
Since the UB semantics was necessary to explain optimizations that use branching conditions, and SelDag doesn't seem to have such optimization, I think this divergence is okay.
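A sketch of the added fold (simplified; the function name is made up, the real combine likely has additional one-use/legality checks, and ISD::BRCOND operands are chain, condition, destination block):
```cpp
#include "llvm/CodeGen/SelectionDAG.h"
using namespace llvm;

// brcond (freeze cond), dest --> brcond cond, dest
static SDValue foldBrcondFreeze(SelectionDAG &DAG, SDNode *N) {
  SDValue Chain = N->getOperand(0);
  SDValue Cond = N->getOperand(1);
  SDValue Dest = N->getOperand(2);
  if (Cond.getOpcode() != ISD::FREEZE || !Cond.hasOneUse())
    return SDValue(); // nothing to do
  return DAG.getNode(ISD::BRCOND, SDLoc(N), MVT::Other, Chain,
                     Cond.getOperand(0), Dest);
}
```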
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D92015
If SETO/SETUO aren't legal, they'll be expanded and we'll end up
with 3 comparisons.
SETONE is equivalent to (SETOGT || SETOLT)
so if one of those operations is supported use that expansion. We
don't need both since we can commute the operands to make the other.
SETUEQ can be implemented with !(SETOGT || SETOLT) or (SETULE && SETUGE).
I've only implemented the first because it didn't look like most of the
affected targets had legal SETULE/SETUGE.
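A simplified sketch of the SETONE case (the getSetCC/getNode calls are the real SelectionDAG API; the function name and structure are an assumption, not the exact legalizer code):
```cpp
#include "llvm/CodeGen/SelectionDAG.h"
using namespace llvm;

// SETONE(x, y) == SETOGT(x, y) || SETOLT(x, y). Only one of OGT/OLT needs to
// be legal, since the other can be formed by commuting the operands.
static SDValue expandSetONE(SelectionDAG &DAG, const SDLoc &DL, EVT ResVT,
                            SDValue LHS, SDValue RHS) {
  SDValue OGT = DAG.getSetCC(DL, ResVT, LHS, RHS, ISD::SETOGT);
  SDValue OLT = DAG.getSetCC(DL, ResVT, LHS, RHS, ISD::SETOLT);
  return DAG.getNode(ISD::OR, DL, ResVT, OGT, OLT);
}
```
SETUEQ is then the logical negation of the same OR.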
Reviewed By: frasercrmck, tlively, nemanjai
Differential Revision: https://reviews.llvm.org/D94450
Remove the InsertionPoint argument from SlotIndexes::insertMBBInMaps
because it was confusing: what does it mean to insert a new block
between two instructions, in the middle of an existing block?
Instead, support the case that MachineBasicBlock::splitAt really needs,
where the new block contains some instructions that are already in the
maps because they have been moved there from the tail of the previous
block.
In all other use cases the new block is empty.
Based on work by Carl Ritson!
Differential Revision: https://reviews.llvm.org/D94311
The issue was introduced in commit rG84a1120943a651184bae507fed5d648fee381ae4
and would cause a VarLoc's StackOffset to be compared with itself, instead of with
the StackOffset from the other VarLoc. This patch fixes that.
Memory operands store a base alignment that does not factor in
the effect of the offset on the alignment.
Previously the printing code only printed the base alignment if
it was different than the size. If there is an offset, the reader
would need to figure out the effective alignment themselves. This
has confused me before and someone else was recently confused on
IRC.
This patch prints the possibly offset adjusted alignment if it is
different than the size. And prints the base alignment if it is
different than the alignment. The MIR parser has been updated to
read basealign in addition to align.
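For reference, the relationship between the two printed values can be expressed with the llvm/Support/Alignment.h helpers (illustrative only, not the printer code):
```cpp
#include "llvm/Support/Alignment.h"
using namespace llvm;

// The effective alignment of (base + offset) is the base alignment clamped by
// the largest power of two dividing the offset, e.g. basealign 16 with an
// offset of 4 gives an effective alignment of 4.
Align effectiveAlign(Align BaseAlign, uint64_t Offset) {
  return commonAlignment(BaseAlign, Offset);
}
```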
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D94344
The change "[x86] Fix tile register spill issue" was causing problems for our build
using gcc-5.4.1.
The problem was caused by this line:
for (const MachineInstr &MI : make_range(MIS.begin(), MI))
where MI was previously defined as a MachineBasicBlock iterator.
Differential Revision: https://reviews.llvm.org/D94415
Now that we flush the local value map for every instruction, we don't
need any extra flushes for specific cases. Also, LastFlushPoint is
not used for anything. Follow-ups to #c161665 (D91734).
This reapplies #3fd39d3.
Differential Revision: https://reviews.llvm.org/D92338
Local values are constants or addresses that can't be folded into
the instruction that uses them. FastISel materializes these in a
"local value" area that always dominates the current insertion
point, to try to avoid materializing these values more than once
(per block).
https://reviews.llvm.org/D43093 added code to sink these local
value instructions to their first use, which has two beneficial
effects. One, it is likely to avoid some unnecessary spills and
reloads; two, it allows us to attach the debug location of the
user to the local value instruction. The latter effect can
improve the debugging experience for debuggers with a "set next
statement" feature, such as the Visual Studio debugger and PS4
debugger, because instructions to set up constants for a given
statement will be associated with the appropriate source line.
There are also some constants (primarily addresses) that could be
produced by no-op casts or GEP instructions; the main difference
from "local value" instructions is that these are values from
separate IR instructions, and therefore could have multiple users
across multiple basic blocks. D43093 avoided sinking these, even
though they were emitted to the same "local value" area as the
other instructions. The patch comment for D43093 states:
Local values may also be used by no-op casts, which adds the
register to the RegFixups table. Without reversing the RegFixups
map direction, we don't have enough information to sink these
instructions.
This patch undoes most of D43093, and instead flushes the local
value map after(*) every IR instruction, using that instruction's
debug location. This avoids sometimes incorrect locations used
previously, and emits instructions in a more natural order.
In addition, constants materialized due to PHI instructions are
not assigned a debug location immediately; instead, when the
local value map is flushed, if the first local value instruction
has no debug location, it is given the same location as the
first non-local-value-map instruction. This prevents PHIs
from introducing unattributed instructions, which would either
be implicitly attributed to the location for the preceding IR
instruction, or given line 0 if they are at the beginning of
a machine basic block. Neither of those consequences is good
for debugging.
This does mean materialized values are not re-used across IR
instruction boundaries; however, only about 5% of those values
were reused in an experimental self-build of clang.
(*) Actually, just prior to the next instruction. It seems like
it would be cleaner the other way, but I was having trouble
getting that to work.
This reapplies commits cf1c774d and dc35368c, and adds the
modification to PHI handling, which should avoid problems
with debugging under gdb.
Differential Revision: https://reviews.llvm.org/D91734
The tile register spill needs 2 instructions.
%46:gr64_nosp = MOV64ri 64
TILESTORED %stack.2, 1, killed %46:gr64_nosp, 0, $noreg, %43:tile
The first instruction loads the stride into a GPR, and the second
instruction stores the tile register to the stack slot. The optimization
that merges spill instructions is done after register allocation. Since
spilling a tile register needs to create a new virtual register for the
stride, we can't hoist the tile spill instruction in postOptimization()
of register allocation. We can't hoist TILESTORED alone, and we can't
hoist the 2 instructions together because MOV64ri will clobber some GPR.
This patch disables the spill merge for any spill which needs 2 instructions.
Differential Revision: https://reviews.llvm.org/D93898
The size of spill/reload may be unknown for scalable vector types.
When the size is unknown, print it as "Unknown-size" instead of a very
large number.
Differential Revision: https://reviews.llvm.org/D94299
We are checking unsafe-fp-math for sqrt but not for fpow, which is inconsistent.
As the direction is to remove this global option, we need to remove the unsafe-fp-math
check for sqrt and update the test with afn fast-math flags.
Reviewed By: Spatel
Differential Revision: https://reviews.llvm.org/D93891
This patch introduces a helper class SubsequentDelim to simplify loops
that generate comma-separated lists.
For example, consider the following loop, taken from
llvm/lib/CodeGen/MachineBasicBlock.cpp:
for (auto I = pred_begin(), E = pred_end(); I != E; ++I) {
  if (I != pred_begin())
    OS << ", ";
  OS << printMBBReference(**I);
}
The new class allows us to rewrite the loop as:
SubsequentDelim SD;
for (auto I = pred_begin(), E = pred_end(); I != E; ++I)
  OS << SD << printMBBReference(**I);
where SD evaluates to the empty string for the first time and ", " for
subsequent iterations.
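A minimal sketch of how such a helper could be implemented (an assumed shape, not necessarily the exact upstream class):
```cpp
#include "llvm/Support/raw_ostream.h"

// Prints nothing the first time it is streamed and the delimiter on every
// subsequent use, so callers don't need a "not the first element" check.
class SubsequentDelim {
  bool First = true;
  const char *Delim;

public:
  SubsequentDelim(const char *Delim = ", ") : Delim(Delim) {}

  friend llvm::raw_ostream &operator<<(llvm::raw_ostream &OS,
                                       SubsequentDelim &SD) {
    if (!SD.First)
      OS << SD.Delim;
    SD.First = false;
    return OS;
  }
};
```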
Unlike interleaveComma, defined in llvm/include/llvm/ADT/STLExtras.h,
SubsequentDelim can accommodate a wider variety of loops, including:
- those that conditionally skip certain items,
- those that need iterators to call getSuccProbability(I), and
- those that iterate over integer ranges.
As an example, this patch cleans up MachineBasicBlock::print.
Differential Revision: https://reviews.llvm.org/D94377
This patch is a part of D93817 and makes transformations in CodeGen use poison for shufflevector/insertelem's initial vector element.
The change in CodeGenPrepare.cpp is fine because the mask of the shufflevector should always be zero.
It doesn't touch the second element (which is poison).
The change in InterleavedAccessPass.cpp is also fine because the mask is of the form <a, a+m, a+2m, .., a+km> where a+km is smaller than
the size of the first vector operand.
This is guaranteed by the caller of replaceBinOpShuffles, which is lowerInterleavedLoad.
It calls isDeInterleaveMask and isDeInterleaveMaskOfFactor to check the mask is the desirable form.
isDeInterleaveMask has the check that a+km is smaller than the vector size.
To check my understanding, I added an assertion and a test to show that this optimization doesn't fire in such a case.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D94056
This improves llvm::isConstOrConstSplat by allowing it to analyze
ISD::SPLAT_VECTOR nodes, in order to allow more constant-folding of
operations using scalable vector types.
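The essence of the new case (an assumed sketch, not the exact upstream code):
```cpp
#include "llvm/CodeGen/SelectionDAGNodes.h"
using namespace llvm;

// A SPLAT_VECTOR whose scalar operand is a ConstantSDNode also counts as a
// constant splat, in addition to the existing BUILD_VECTOR-based matching.
static ConstantSDNode *getScalableSplatConstant(SDValue N) {
  if (N.getOpcode() == ISD::SPLAT_VECTOR)
    return dyn_cast<ConstantSDNode>(N.getOperand(0));
  return nullptr;
}
```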
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D94168
The TableGen immAllOnesV and immAllZerosV helpers implicitly wrapped the
ISD::isBuildVectorAll(Ones|Zeros) helper functions. This was inhibiting
their use for targets such as RISC-V which use ISD::SPLAT_VECTOR. In
particular, RISC-V had to define its own 'vnot' fragment.
In order to extend the scope of these nodes to include support for
ISD::SPLAT_VECTOR, two new ISD predicate functions have been introduced:
ISD::isConstantSplatVectorAll(Ones|Zeros). These effectively supersede
the older "isBuildVector" predicates, which are now simple wrappers for
the new functions. They pass a defaulted boolean toggle which preserves
the old behaviour. It is hoped that in time all call-sites can be ported
to the "isConstantSplatVector" functions.
While the use of ISD::isBuildVectorAll(Ones|Zeros) has not changed, the
behaviour of the TableGen immAll(Ones|Zeros)V **has**. To test the new
functionality, the custom RISC-V TableGen fragment has been removed and
replaced with the built-in 'vnot'. To test their use as pattern-roots, two
splat patterns have been updated accordingly.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D94223
This removes `exnref` type and `br_on_exn` instruction. This is
effectively NFC because most uses of these were already removed in the
previous CLs.
Reviewed By: dschuff, tlively
Differential Revision: https://reviews.llvm.org/D94041
This implements basic instructions for the new spec.
- Adds new versions of instructions: `catch`, `catch_all`, and `rethrow`
- Adds support for instruction selection for the new instructions
- `catch` needs a custom routine for the same reason `throw` needs one,
to encode the `__cpp_exception` tag symbol.
- Updates the `WebAssembly::isCatch` utility function to include `catch_all`,
and changes code that compares an instruction's opcode with `catch` to
use that function.
- LateEHPrepare
- Previously in LateEHPrepare we added a `catch` instruction to both
`catchpad`s (for user catches) and `cleanuppad`s (for destructors).
In the new version `catch` is generated from the `llvm.catch` intrinsic
in the instruction selection phase, so we only need to add `catch_all`
to the beginning of cleanup pads.
- `catch` is generated from instruction selection, but we need to
hoist the `catch` instruction to the beginning of every EH pad,
because `catch` can be in the middle of the EH pad or even in a
split BB from it after various code transformations.
- Removes `addExceptionExtraction` function, which was used to
generate `br_on_exn` before.
- CFGStackify: Deletes the `fixUnwindMismatches` function. Running this
function on the new instruction causes crashes, and the new version
will be added in a later CL, whose contents will be completely
different. So deleting the whole function will make the diff easier to
read.
- Reenables all disabled tests in exception.ll and eh-lsda.ll and a
single basic test in cfg-stackify-eh.ll.
- Updates existing tests to use the new assembly format. And deletes
`br_on_exn` instructions from the tests and FileCheck lines.
Reviewed By: dschuff, tlively
Differential Revision: https://reviews.llvm.org/D94040
Clang generates `wasm.get.exception` and `wasm.get.ehselector`
intrinsics, which respectively return a caught exception value (a
pointer to some C++ exception struct) and a selector (an integer value
that tells which C++ `catch` clause the current exception matches, or
does not match any).
WasmEHPrepare is a pass that does some IR-level preparation before
instruction selection. Previously, one of the things we did in this pass was
to convert `wasm.get.exception` intrinsic calls to
`wasm.extract.exception` intrinsics. Their semantics were the same
except `wasm.extract.exception` did not have a token argument. We
maintained these two separate intrinsics with the same semantics because
instruction selection couldn't handle token arguments. This
`wasm.extract.exception` intrinsic was later converted to an
`extract_exception` instruction in instruction selection, which was a
pseudo instruction to implement `br_on_exn`. `br_on_exn` pushed
an extracted value onto the value stack after the `end` instruction of a
`block`, but LLVM does not have a way of modeling that kind of behavior,
so this pseudo instruction was used to pull an extracted value out of
thin air, like this:
```
block $l0
  ...
  br_on_exn $cpp_exception $l0
  ...
end
extract_exception ;; pushes values onto the stack
```
In the new spec, we don't need this pseudo instruction anymore because
`catch` itself returns a value and we don't have `br_on_exn` anymore. In
the spec `catch` returns multiple values (like `br_on_exn`), but here we
assume it only returns a single i32, which is sufficient to support C++.
So this renames `wasm.get.exception` intrinsic to `wasm.catch`. Because
this CL does not yet contain instruction selection for `wasm.catch`
intrinsic, all `RUN` lines in exception.ll, eh-lsda.ll, and
cfg-stackify-eh.ll, and a single `RUN` line in wasm-eh.cpp (which is an
end-to-end test from C++ source to assembly) fail. So this CL
temporarily disables those `RUN` lines, and for those test files without
any valid remaining `RUN` lines, adds a dummy `RUN` line to make them
pass. These tests will be reenabled in later CLs.
Reviewed By: dschuff, tlively
Differential Revision: https://reviews.llvm.org/D94039
The `wasm_rethrow_in_catch` intrinsic and builtin are used to
rethrow an exception when the exception is caught but there is no
matching clause within the current `catch`. For example,
```
try {
  foo();
} catch (int n) {
  ...
}
```
If the caught exception does not correspond to C++ `int` type, it should
be rethrown. This intrinsic and builtin were named `rethrow_in_catch`
because at the time I thought there would be another intrinsic for C++'s
`throw` keyword, which rethrows an exception. It turned out that `throw`
keyword doesn't require wasm's `rethrow` instruction, so we rename
`rethrow_in_catch` to just `rethrow` here.
Reviewed By: dschuff, tlively
Differential Revision: https://reviews.llvm.org/D94038
This implements vp_add, vp_and for the VE target by lowering them to the
VVP_* layer. We also add helper functions for VP SDNodes (isVPSDNode,
getVPMaskIdx, getVPExplicitVectorLengthIdx).
Reviewed By: kaz7
Differential Revision: https://reviews.llvm.org/D93766
This factors out code from MachineLICM that determines whether an instruction
is loop-invariant, which is a generally useful function. This allows that
helper to be used elsewhere too.
Differential Revision: https://reviews.llvm.org/D94082