llvm-project

Commit Graph

Author	SHA1	Message	Date
Nick Desaulniers	649b11ef8b	git-clang-format HEAD~	2022-01-10 18:34:30 -08:00
Nick Desaulniers	301e911740	[TargetLowering] precommit refactor from D115688 NFC Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>	2022-01-10 18:32:13 -08:00
Mircea Trofin	b191c1f0f9	[NFC][regalloc] Pull out some AllocationOrder/CostPerUseLimit eviction logic We are reusing that logic in the ML implementation. Differential Revision: https://reviews.llvm.org/D116075	2022-01-10 15:47:31 -08:00
Nadav Rotem	e2cc091a7d	Fix a missed opportunity to merge stores. This commit fixes a missed opportunity in merging consecutive stores. The code that searches for stores skipped the case of stores that directly connect to the root. The comment above the implementation lists this case but the code did not handle it. I found this pattern when looking into the shared_ptr destructor. GCC generates the right sequence. Here is a small repo: int foo(int* buff) { buff[0] = 0; int x = buff[1]; buff[1] = 0; return x; } Differential Revision: https://reviews.llvm.org/D116895	2022-01-10 13:49:02 -08:00
Mircea Trofin	e121269131	[NFC][regalloc] Pass RAGreedy to eviction adviser This patch simplifies the interface between RAGreedy and the eviction adviser by passing the allocator to the adviser, which allows the latter to extract needed information as needed, rather than requiring it be passed piecemeal at construction time (which would also complicate later evolution). Part of this, the patch also moves ExtraRegInfo back to RAGreedy. We keep the encapsulation of ExtraRegInfo because it has benefits (e.g. improved readability by abstracting access to the cascade info) and also simpler re-initialization at regalloc pass re-entry time (we just flush the Optional). Differential Revision: https://reviews.llvm.org/D116669	2022-01-10 11:55:16 -08:00
Matt Arsenault	0ba4e4b500	GlobalISel: Pass DebugLoc to getFunctionLiveInPhysReg Fixes crash in assertion about dropping debug info.	2022-01-10 13:50:52 -05:00
Serge Guelton	d2cc6c2d0c	Use a sorted array instead of a map to store AttrBuilder string attributes Using and std::map<SmallString, SmallString> for target dependent attributes is inefficient: it makes its constructor slightly heavier, and involves extra allocation for each new string attribute. Storing the attribute key/value as strings implies extra allocation/copy step. Use a sorted vector instead. Given the low number of attributes generally involved, this is cheaper, as showcased by https://llvm-compile-time-tracker.com/compare.php?from=5de322295f4ade692dc4f1823ae4450ad3c48af2&to=05bc480bf641a9e3b466619af43a2d123ee3f71d&stat=instructions Differential Revision: https://reviews.llvm.org/D116599	2022-01-10 14:49:53 +01:00
Chen Zheng	2c46ca96e2	[PowerPC] fast isel can lower intrinsics call on AIX. Reviewed By: qiucf Differential Revision: https://reviews.llvm.org/D114778	2022-01-10 02:30:05 +00:00
Craig Topper	a500f7f48f	[SelectionDAG] Add FP_TO_UINT_SAT/FP_TO_SINT_SAT to computeKnownBits/computeNumSignBits. These nodes should saturate to their saturating VT. We can use this information to know the bits past the VT are all zeros or all sign bits. I think we might only have test coverage for the unsigned case. I'll verify and add tests. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D116870	2022-01-09 17:48:05 -08:00
Alexander Shaposhnikov	22430ede7e	[CodeGen] Rename emitCalleeSavedFrameMoves This diff renames emitCalleeSavedFrameMoves to avoid conflicts with non-virtual methods of derived classes having the same name but different semantics. E.g. the class AArch64FrameLowering used to have (non-virtual) "emitCalleeSavedFrameMoves" but it started to override TargetFrameLowering::emitCalleeSavedFrameMoves after https://github.com/llvm/llvm-project/commit/c3e6555616 though its usage and semantics didn't change. P.S. for x86 there was no conflict because the signature of non-virtual X86FrameLowering::emitCalleeSavedFrameMoves is different Test plan: make check-all Differential revision: https://reviews.llvm.org/D114140	2022-01-10 01:33:04 +00:00
Jay Foad	50fb44eebb	[GlobalISel] Use getPreferredShiftAmountTy in one more G_UBFX combine Change CombinerHelper::matchBitfieldExtractFromShrAnd to use getPreferredShiftAmountTy for the shift-amount-like operands of G_UBFX just like all the other G_[SU]BFX combines do. This better matches the AMDGPU legality rules for these instructions. Differential Revision: https://reviews.llvm.org/D116803	2022-01-08 09:20:44 +00:00
Jay Foad	ff971873b3	[GlobalISel] Fix legality checks for G_UBFX combines 1. Fix CombinerHelper::matchBitfieldExtractFromAnd to check legality with the correct types for the G_UBFX that it builds. 2. Fix AMDGPUTargetLowering::isConstantUnsignedBitfieldExtractLegal to match the legality rules: result and first operand can be s32 or s64 but the "shift amount" operands are always s32. 3. Add AMDGPU tests where the post-legalizer combiner would create illegal MIR without the above fixes. Differential Revision: https://reviews.llvm.org/D116802	2022-01-08 09:20:44 +00:00
Kazu Hirata	4e2ec7e38d	[llvm] Remove unused forward declarations (NFC)	2022-01-07 20:00:34 -08:00
Kazu Hirata	b932bdf59f	[llvm] Remove redundant member initialization (NFC) Identified with readability-redundant-member-init.	2022-01-07 17:45:09 -08:00
Jay Foad	3f3fe4a5cf	[GlobalISel] Fix typo Extact to Extract in function name. NFC.	2022-01-07 11:13:35 +00:00
Nikita Popov	0312fe2901	[CodeGen] Support opaque pointers for inline asm This is the last part of D116531. Fetch the type of the indirect inline asm operand from the elementtype attribute, rather than the pointer element type. Fixes https://github.com/llvm/llvm-project/issues/52928.	2022-01-07 10:57:38 +01:00
Nikita Popov	e4d1779990	[IR] Add ConstraintInfo::hasArg() helper (NFC) Checking whether a constraint corresponds to an argument is a recurring pattern.	2022-01-07 10:44:38 +01:00
Victor Perez	38efa68b08	[LegalizeTypes][VP] Add splitting support for vp.select Split vp.select in a similar way as vselect, splitting also the length parameter. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D116651	2022-01-07 08:46:01 +00:00
Kazu Hirata	2aed08131d	[llvm] Use true/false instead of 1/0 (NFC) Identified with modernize-use-bool-literals.	2022-01-07 00:39:14 -08:00
Kazu Hirata	410480e32b	Ensure newlines at the end of files (NFC)	2022-01-06 23:44:02 -08:00
Arlo Siemsen	3d10997e42	Add Rust to CodeView SourceLanguage (CV_CFL_LANG) enum Microsoft has added several new entries to the CV_CFL_LANG enum, including Rust: https://docs.microsoft.com/en-us/visualstudio/debugger/debug-interface-access/cv-cfl-lang This change adds Rust to the corresponding LLVM enum and translates `dwarf::DW_LANG_Rust` to `SourceLanguage::Rust` in the CodeView AsmPrinter. This means that Rust will no longer emit as Masm. Differential Revision: https://reviews.llvm.org/D115300	2022-01-06 14:27:08 -08:00
Mircea Trofin	68ac7b1701	[NFC][mlgo] Add feature declarations for the ML regalloc advisor This just adds feature declarations and some boilerplate. Differential Revision: https://reviews.llvm.org/D116076	2022-01-05 11:54:01 -08:00
David Green	fffd663c87	[CodeGen] Initialize MaxBytesForAlignment in TargetLoweringBase::TargetLoweringBase. This appears to be missing from D114590, causing sanitizer errors.	2022-01-05 19:34:27 +00:00
Luís Ferreira	34435fd105	[llvm] Add support for DW_TAG_immutable_type Added documentation about DW_TAG_immutable_type too. Reviewed By: probinson Differential Revision: https://reviews.llvm.org/D113633	2022-01-05 19:17:08 +00:00
Craig Topper	88ecdd30f6	[LegalizeTypes] Remove IsVP argument from type legalization methods. NFC We can either check the opcode or number of operands or use ISD::isVPOpcode inside the methods. In some places I've used number of operands figuring that it is cheaper than isVPOpcode. I've included isVPOpcode in an assert to verify. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D116578	2022-01-05 09:00:48 -08:00
Nicholas Guy	73d92faa2f	[CodeGen] Emit alignment "Max Skip" operand The current AsmPrinter has support to emit the "Max Skip" operand (the 3rd of .p2align), however has no support for it to actually be specified. Adding MaxBytesForAlignment to MachineBasicBlock provides this capability on a per-block basis. Leaving the value as default (0) causes no observable differences in behaviour. Differential Revision: https://reviews.llvm.org/D114590	2022-01-05 12:54:30 +00:00
Victor Perez	96e220e688	[LegalizeTypes][VP] Add integer promotion support for vp.select Promote select, vselect and vp.select in a similar way. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D116400	2022-01-05 11:01:52 +00:00
Victor Perez	df5226dfb3	[LegalizeTypes][VP] Add widening support for vp.select Widen vp.select the same way as select and vselect. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D116407	2022-01-05 09:21:11 +00:00
Craig Topper	a04b532505	[LegalizeIntegerTypes][RISCV] Teach PromoteSetCCOperands to check sign bits of unsigned compares. Unsigned compares work with either zero extended or sign extended inputs just like equality comparisons. I didn't allow this when I refactored the code in D116421 due to lack of tests. But I've since found a simple C test case that demonstrates when this can be useful. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D116617	2022-01-04 12:38:47 -08:00
Jack Andersen	5b1337184b	[DebugInfo] Avoid triggering global location assert for 2-byte pointer sizes. D111404 moved a 4/8 byte check assert into a block taken by 2-byte platforms. Since these platforms do not take the branches where the pointer size is used, sink the assert accordingly. Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D116480	2022-01-04 15:16:36 -05:00
Michael Liao	56ec762a76	[regalloc] Fix GCC warning `-Wattributes`. NFC. - Mark it with LLVM_LIBRARY_VISIBILITY to preserve the legacy visibility.	2022-01-04 12:05:57 -05:00
Mircea Trofin	64e56f8356	[NFC] Expose isRematerializable and copyHint from CalcSpillWeights We need to reuse them for the ML regalloc eviction advisor, as we 'explode' the weight calculation into sub-features. Differential Revision: https://reviews.llvm.org/D116074	2022-01-04 08:11:49 -08:00
Mircea Trofin	c41610778b	[NFC][regalloc] Introduce RegAllocGreedy.h This was suggested in D114831. It should simplify the relation between eviction advisor and the allocator, and simplify ingesting more features tied to the internals of the allocator, in the future. This change simply pulls out RAGreedy, places it in the llvm namespace, and cleans up a bit the includes in the new header file. Differential Revision: https://reviews.llvm.org/D116114	2022-01-04 08:04:55 -08:00
Simon Moll	4c2aba999e	[VP][ISel] use LEGALPOS for legalization action Use the VPIntrinsics.def's LEGALPOS that is specified with every VP SDNode to determine which return or operand value type shall be used to infer the legalization action. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D116594	2022-01-04 14:50:49 +01:00
Simon Pilgrim	882c083889	[DAG] TargetLowering::SimplifySetCC - use APInt::getMinSignedBits() helper. NFC.	2022-01-04 13:48:36 +00:00
Nikita Popov	4ef560ec60	[ELF] Handle .init_array prefix consistently Currently, the code in TargetLoweringObjectFile only assigns @init_array section type to plain .init_array sections, but not prioritized sections like .init_array.00001. This is inconsistent with the interpretation in the AsmParser (see `791523bae6/llvm/lib/MC/MCParser/ELFAsmParser.cpp (L621-L632)`) and upcoming expectations in LLD (see https://github.com/rust-lang/rust/issues/92181 for context). This patch assigns @init_array section type to all sections with an .init_array prefix. The same is done for .fini_array and .preinit_array as well. With that, the logic matches the AsmParser. Differential Revision: https://reviews.llvm.org/D116528	2022-01-04 09:42:58 +01:00
Craig Topper	cbcbbd6ac8	[ValueTracking][SelectionDAG] Rename ComputeMinSignedBits->ComputeMaxSignificantBits. NFC This function returns an upper bound on the number of bits needed to represent the signed value. Use "Max" to match similar functions in KnownBits like countMaxActiveBits. Rename APInt::getMinSignedBits->getSignificantBits. Keeping the old name around to keep this patch size down. Will do a bulk rename as follow up. Rename KnownBits::countMaxSignedBits->countMaxSignificantBits. Reviewed By: lebedev.ri, RKSimon, spatel Differential Revision: https://reviews.llvm.org/D116522	2022-01-03 11:33:30 -08:00
Kazu Hirata	e5947760c2	Revert "[llvm] Remove redundant member initialization (NFC)" This reverts commit `fd4808887e`. This patch causes gcc to issue a lot of warnings like: warning: base class ‘class llvm::MCParsedAsmOperand’ should be explicitly initialized in the copy constructor [-Wextra]	2022-01-03 11:28:47 -08:00
Victor Perez	5527139302	[RISCV][VP] Add RVV codegen for [nX]vXi1 vp.select Expand [nX]vXi1 vp.select the same way as [nX]vXi1 vselect. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D115546	2022-01-02 23:12:32 -08:00
Kazu Hirata	7e163afd9e	Remove redundant void arguments (NFC) Identified by modernize-redundant-void-arg.	2022-01-02 10:20:19 -08:00
Kazu Hirata	fd4808887e	[llvm] Remove redundant member initialization (NFC) Identified with readability-redundant-member-init.	2022-01-01 16:18:18 -08:00
Kazu Hirata	69ccc96162	[llvm] Use the default constructor for SDValue (NFC)	2022-01-01 10:36:59 -08:00
Craig Topper	243b7aaf51	[SelectionDAG] Use KnownBits::countMinSignBits() to simplify the end of ComputeNumSignBits. This matches what is done in ValueTracking.cpp Reviewed By: RKSimon, foad Differential Revision: https://reviews.llvm.org/D116423	2021-12-31 17:29:57 -08:00
Craig Topper	d00e438cfe	[RISCV][LegalizeIntegerTypes] Teach PromoteSetCCOperands not to sext i32 comparisons for RV64 if the promoted values are already zero extended. This is similar to what is done for targets that prefer zero extend where we avoid using a zero extend if the promoted values are sign extended. We'll also check for zero extended operands for ugt, ult, uge, and ule when the target prefers sign extend. This is different than preferring zero extend, where we only check for sign bits on equality comparisons. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D116421	2021-12-31 17:15:20 -08:00
Craig Topper	7d659c6ac7	[LegalizeIntegerTypes] Rename NewLHS/NewRHS arguments to DAGTypeLegalizer::PromoteSetCCOperands. NFC The 'New' only makes sense in the context of these being output arguments, but they are also used as inputs first. Drop the 'New' and just call them LHS/RHS. Factored out of D116421.	2021-12-30 15:31:43 -08:00
Craig Topper	15787ccd45	[RISCV] Add support for STRICT_LRINT/LLRINT/LROUND/LLROUND. Tests for other strict intrinsics. This patch adds isel support for STRICT_LRINT/LLRINT/LROUND/LLROUND. It also adds test cases for f32 and f64 constrained intrinsics that correspond to the intrinsics in float-intrinsics.ll and double-intrinsics.ll. Support for promoting the integer argument of STRICT_FPOWI was added. I've skipped adding tests for f16 intrinsics, since we don't have libcalls for them and we have inconsistent support for promoting them in LegalizeDAG. This will need to be examined more closely. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D116323	2021-12-30 11:54:32 -08:00
modimo	ba51d26ec4	[CodeView] Clamp Frontend version D43002 introduced a test debug-info-objname.cpp that outputted the current compiler version into CodeView. Internally we appended a date to the patch version and overflowed the 16-bits allocated to that space. This change clamps the Frontend version outputted values to 16-bits like rGd1185fc081ead71a8bf239ff1814f5ff73084c15 did for the Backend version. Testing: ninja check-all newly added tests correctly clamps and no longer asserts when trying to output the field Reviewed By: aganea Differential Revision: https://reviews.llvm.org/D116243	2021-12-28 15:22:18 -08:00
Craig Topper	1c6b740d4b	[TargetLowering] Remove workaround for old behavior of getShiftAmountTy. NFC getShiftAmountTy used to directly return the shift amount type from the target which could be too small for large illegal types. For example, X86 always returns i8. The code here detected this and used i32 instead if it won't fit. This behavior was added to getShiftAmountTy in D112469 so we no longer need this workaround.	2021-12-28 14:08:25 -08:00
Kazu Hirata	5a667c0e74	[llvm] Use nullptr instead of 0 (NFC) Identified with modernize-use-nullptr.	2021-12-28 08:52:25 -08:00
Kazu Hirata	d09a284dfb	[CodeGen] Drop unnecessary const from return types (NFC) Identified with readability-const-return-type.	2021-12-28 00:38:11 -08:00
Petar Avramovic	508e39afe0	GlobalISel: remove redundant line added in D114198. NFC	2021-12-27 12:14:13 +01:00
David Blaikie	2bddab25db	DebugInfo: Don't hash DIE offsets before they're computed Instead of hashing DIE offsets, hash DIE references the same as they would be when used outside of a loclist - that is, deep hash the type on first use, and hash the numbering on subsequent uses. This does produce different hashes for different type references, where it did not before (because we were hashing zero all the time - so it didn't matter what type was referenced, the hash would be identical). This also allows us to enforce that the DIE offset (& size) is not queried before it is used (which came up while investigating another bug recently).	2021-12-25 16:09:12 -08:00
Kazu Hirata	2d303e6781	Remove redundant return and continue statements (NFC) Identified with readability-redundant-control-flow.	2021-12-24 23:17:54 -08:00
Fangrui Song	ea2d4c5881	[GlobalISel] Fix -Wunused-function in -DLLVM_ENABLE_ASSERTIONS=off builds after D114198	2021-12-24 00:55:54 -08:00
David Blaikie	b05df0287b	Revert "[DWARF] Fix PR51087 Extraneous enum record in DWARF with type units" Causes invalid debug_gnu_pubnames (& I think non-gnu pubnames too) - visible as 0 values for the offset in gnu pubnames. More details on the original review in D115325. This reverts commit `78d15a112c`. This reverts commit `54586582d3`.	2021-12-23 20:50:30 -08:00
Kristina Bessonova	81378f7e56	Revert "[DwarfDebug] Support emitting function-local declaration for a lexical block" & dependent patches Try to revert D113741 once again. This also reverts `0ac75e82ff` (D114705) as it causes LLDB's lldb-api.lang/cpp/nsimport.TestCppNsImport.py test failure w/o D113741. This reverts commit `f9607d45f3`. Differential Revision: https://reviews.llvm.org/D116225	2021-12-24 00:47:04 +02:00
Simon Pilgrim	71fc4bbdd2	[X86][SSE] Add ISD::ROTR support Fix issue in TargetLowering::expandROT where we only attempt to flip a rotation if the other direction has better support - this matches TargetLowering::expandFunnelShift This allows us to enable ISD::ROTR lowering on SSE targets, which particularly simplifies/improves codegen for splat amount and AVX2 per-element shifts.	2021-12-23 15:07:30 +00:00
Petar Avramovic	29f88b93fd	[GlobalISel] Rework more/fewer elements for vectors Artifact combiner is not able to access individual elements after using LCMTy style merge/unmerge, extract and insert to change vector number of elements (pad with undef or split to sub-vector instructions). Use unmerge to individual elements instead and then merge elements into requested types. Change argument lowering for vectors and moreElementsVector to use buildPadVectorWithUndefElements and buildDeleteTrailingVectorElements. FewerElementsVector had a few helpers that had different behavior, introduce new helper for most of the opcodes. FewerElementsVector helper is more flexible since it can create leftover instruction smaller then requested type (useful in case target wants to avoid pad with undef and use fewer registers). If target does not want leftover of different type it should call more elements first. Some helpers were performing more elements first to have split without leftover. Opcodes that used this helper use clampMaxNumElementsStrict (does more elements first) in LegalizerInfo to avoid test changes. Fixes failures caused by failing to combine artifacts created during more/fewer elements vector. Differential Revision: https://reviews.llvm.org/D114198	2021-12-23 14:30:02 +01:00
Muhammad Omair Javaid	f9607d45f3	Revert "Revert "[DwarfDebug] Support emitting function-local declaration for a lexical block" & dependent patches" This has broke following LLDB buildbots: https://lab.llvm.org/buildbot/#/builders/17/builds/14984 https://lab.llvm.org/buildbot/#/builders/96/builds/15928 https://lab.llvm.org/buildbot/#/builders/68/builds/23600 This reverts commit `62a6b9e9ab`.	2021-12-23 14:09:48 +05:00
Shivam Gupta	0489e89119	[DAGCombiner] Avoid combining adjacent stores at -O0 to improve debug experience When the source has a series of assignments, users reasonably want to have the debugger step through each one individually. Turn off the combine for adjacent stores so we get this behavior at -O0. Similar to D7181. Reviewed By: spatel, xgupta Differential Revision: https://reviews.llvm.org/D115808	2021-12-23 10:48:28 +05:30
David Blaikie	62a6b9e9ab	Revert "[DwarfDebug] Support emitting function-local declaration for a lexical block" & dependent patches This patch causes invalid DWARF to be generated in some cases of LTO + Split DWARF - follow-up on the original review thread (D113741) contains further detail and test cases. This reverts commit `75b622a795`. This reverts commit `b6ccca217c`. This reverts commit `514d374419`.	2021-12-22 15:27:09 -08:00
Simon Pilgrim	4639461531	[DAG][X86] Add TargetLowering::isSplatValueForTargetNode override Add callback to enable us to test target nodes if they are splat vectors Added some basic X86ISD::VBROADCAST + X86ISD::VBROADCAST_LOAD handling	2021-12-22 16:57:44 +00:00
Alexandre Ganea	a282ea4898	Reland - [CodeView] Emit S_OBJNAME record Reland integrates build fixes & further review suggestions. Thanks to @zturner for the initial S_OBJNAME patch! Differential Revision: https://reviews.llvm.org/D43002	2021-12-21 19:02:14 -05:00
Alexandre Ganea	5bb5142e80	Revert [CodeView] Emit S_OBJNAME record Also revert all subsequent fixes: - `abd1cbf5e5` [Clang] Disable debug-info-objname.cpp test on Unix until I sort out the issue. - `00ec441253` [Clang] debug-info-objname.cpp test: explictly encode a x86 target when using %clang_cl to avoid falling back to a native CPU triple. - `cd407f6e52` [Clang] Fix build by restricting debug-info-objname.cpp test to x86.	2021-12-21 19:02:14 -05:00
Alexandre Ganea	f44e3fbadd	[CodeView] Emit S_OBJNAME record Thanks to @zturner for the initial patch! Differential Revision: https://reviews.llvm.org/D43002	2021-12-21 09:26:36 -05:00
Jay Foad	17006033f9	[GlobalISel] Verify operand types for G_SHL, G_LSHR, G_ASHR Differential Revision: https://reviews.llvm.org/D115868	2021-12-21 11:59:33 +00:00
Simon Pilgrim	592e89e636	[DAG] Constify SelectionDAG::isSplatValue() This doesn't generate any nodes so should be usable by methods with const SelectionDAG &.	2021-12-21 11:19:23 +00:00
Kazu Hirata	500c4b68dc	[llvm] Construct SmallVector with iterator ranges (NFC)	2021-12-20 23:43:24 -08:00
Mircea Trofin	07622368a8	[NFC] Fix clang-tidy issues in CalcSpillWeights.cpp	2021-12-20 19:24:44 -08:00
Sami Tolvanen	5dc8aaac39	[llvm][IR] Add no_cfi constant With Control-Flow Integrity (CFI), the LowerTypeTests pass replaces function references with CFI jump table references, which is a problem for low-level code that needs the address of the actual function body. For example, in the Linux kernel, the code that sets up interrupt handlers needs to take the address of the interrupt handler function instead of the CFI jump table, as the jump table may not even be mapped into memory when an interrupt is triggered. This change adds the no_cfi constant type, which wraps function references in a value that LowerTypeTestsModule::replaceCfiUses does not replace. Link: https://github.com/ClangBuiltLinux/linux/issues/1353 Reviewed By: nickdesaulniers, pcc Differential Revision: https://reviews.llvm.org/D108478	2021-12-20 12:55:32 -08:00
Shivam Gupta	eb66f0662a	Revert "[DAGCombiner] Avoid combining adjacent stores at -O0 to improve debug experience" This reverts commit `731bde1ed3`.	2021-12-20 21:43:40 +05:30
Shivam Gupta	731bde1ed3	[DAGCombiner] Avoid combining adjacent stores at -O0 to improve debug experience When the source has a series of assignments, users reasonably want to have the debugger step through each one individually. Turn off the combine for adjacent stores so we get this behavior at -O0. Similar to D7181. Differential Revision: https://reviews.llvm.org/D115808	2021-12-19 20:58:49 +05:30
Simon Pilgrim	efec3a26b4	[DAG] visitADDSAT/visitSUBSAT - merge scalar/vector canonicalization and constant folding. Match order of most of the other integer opcode combines	2021-12-19 13:19:40 +00:00
Simon Pilgrim	c1340b9e78	[DAG] Improve FMINNUM/FMAXNUM/FMINIMUM/FMAXIMUM constant folding Merge the node combines into a common DAGCombiner::visitFMinMax (like we do for IMINMAX). Move the constant folding into SelectionDAG::foldConstantFPMath. This allows us to fold the vecreduce-propagate-sd-flags.ll test as it reduces constants - so I've refactored it to take variables instead. Differential Revision: https://reviews.llvm.org/D115952	2021-12-19 11:45:51 +00:00
Kazu Hirata	fee57711fe	Use DenseMap::lookup (NFC)	2021-12-17 18:19:25 -08:00
Kazu Hirata	26bd534a79	[llvm] Use none_of instead of \!any_of (NFC)	2021-12-17 13:48:57 -08:00
Sanjay Patel	79932211f9	[SDAG] remove FP-to-int cast attribute check in fold to FTRUNC We were using a function attribute to indicate a non-standard FP mode, but now we can use intrinsics for that job as shown in the new tests. Presumably the x86 asm could be improved for that IR with intrinsics, but I have not worked out exactly how to do that. Note that the transform to FTRUNC still requires a hacky check for "nsz" (because FMF are not applied to FP casts). This is a cleanup based on the clang change in D115804 / `8c7f2a4f87` . This is effectively a revert of `5a90285bd9` + D46237 . Differential Revision: https://reviews.llvm.org/D115885	2021-12-17 16:01:37 -05:00
Kazu Hirata	90bd4873d6	[CodeGen] Fix an unused variable warning This patch fixes: llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:22617:11: error: unused variable 'Ops' [-Werror,-Wunused-variable]	2021-12-17 09:43:42 -08:00
Simon Pilgrim	35c7b1aeae	[DAG] SimplifyVBinOp - remove FoldConstantArithmetic call. Constant folding (scalar/vector) is now consistently handled before the SimplifyVBinOp calls.	2021-12-17 17:22:23 +00:00
Simon Pilgrim	f602723bfa	[DAG] Constant fold + canonicalize fp binops before SimplifyVBinOp call Replace custom constant scalar/splat folding with FoldConstantArithmetic call and canonicalize commutative constant ops to the RHS before the SimplifyVBinOp call	2021-12-17 17:02:54 +00:00
Simon Pilgrim	9d2994311a	[DAG] Move foldConstantFPMath() inside FoldConstantArithmetic Further merging of integer and fp constant folding paths. This allows us to handle undef vector arguments the same as scalar cases.	2021-12-17 16:06:41 +00:00
Simon Pilgrim	512ab9968d	[DAG] foldConstantFPMath - fold vector splats as well as scalar constants	2021-12-17 15:19:26 +00:00
Simon Pilgrim	52611702ea	Revert rG22dbc7a48bf7a3942a7e5ff57977ef828d240bd3 "[DAG] foldConstantFPMath - fold vector splats as well as scalar constants" A followup patch uncovered an issue with allowing undef elements in the splat - I will reapply this with a fixed implementation.	2021-12-17 15:19:25 +00:00
David Truby	5c9684704d	[DAG][sve] Lowering for VLS masked truncating stores This extends the custom lowering for truncating stores on fixed length vectors in SVE to support masked truncating stores. It also adds a DAG combine for truncates followed by masked stores. Reviewed By: peterwaller-arm, paulwalker-arm Differential Revision: https://reviews.llvm.org/D108115	2021-12-17 15:04:45 +00:00
Simon Pilgrim	22dbc7a48b	[DAG] foldConstantFPMath - fold vector splats as well as scalar constants	2021-12-17 14:24:36 +00:00
Simon Pilgrim	d91b5b0f57	[DAG] foldConstantFPMath - use APFloat& for read-only constant fold arg. NFC. We just need to copy the 1st arg (which we use for the constant fold result) - use a cheaper const reference for the 2nd arg.	2021-12-17 12:34:03 +00:00
Simon Pilgrim	42f00106b7	[DAG] Constant fold + canonicalize integer binops before SimplifyVBinOp call SimplifyVBinOp still has a FoldConstantArithmetic call, which now it isn't vector specific we should be able to remove (once fp binops are tidied up); but we can at least clean up the integer opcodes to perform the basic constant/undef handling in common code first.	2021-12-17 12:02:27 +00:00
OCHyams	78d15a112c	[DWARF] Fix PR51087 Extraneous enum record in DWARF with type units Fixes https://llvm.org/PR51087: Extraneous enum record in DWARF with type units. As explained in PR51087 we sometimes get skeleton DIEs for enums in a Dwarf Compile Unit (CU) that are not referenced from any CU and are already described by a type unit. Types for enums are emitted whether used or not, all together before most types in the CU. Mechanically, the extraneous CU records are generated because the enum types are generated with a call to CU->getOrCreateTypeDIE. This function will recursively get-or-create the parent DIE (in the CU) and the type unit for each. We don't need the CU-side DIEs if the type units are sucesfully emitted. Fix by only emitting the type units for enums if possible, falling back to a call to getOrCreateTypeDIE if not. Do the same for retained types. Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D115325	2021-12-17 10:10:55 +00:00
Mircea Trofin	09103807e7	[NFC][regalloc] Introduce the RegAllocEvictionAdvisorAnalysis This patch introduces the eviction analysis and the eviction advisor, the default implementation, and the scaffolding for introducing the other implementations of the advisor. Differential Revision: https://reviews.llvm.org/D115707	2021-12-16 17:56:46 -08:00
Ellis Hoag	58d9c1aec8	[Try2][InstrProf] Attach debug info to counters Add the llvm flag `-debug-info-correlate` to attach debug info to instrumentation counters so we can correlate raw profile data to their functions. Raw profiles are dumped as `.proflite` files. The next diff enables `llvm-profdata` to consume `.proflite` and debug info files to produce a normal `.profdata` profile. Part of the "lightweight instrumentation" work: https://groups.google.com/g/llvm-dev/c/r03Z6JoN7d4 The original diff https://reviews.llvm.org/D114565 was reverted because of the `Instrumentation/InstrProfiling/debug-info-correlate.ll` test, which is fixed in this commit. Reviewed By: kyulee Differential Revision: https://reviews.llvm.org/D115693	2021-12-16 14:20:30 -08:00
Nathan Sidwell	dd073e08ae	Avoid by-value copies of referenced objects These were detected by the new -Wauto-by-value-copy (D114989) warning, these by-value constant copies need only be references. Reviewed By: aaron.ballman Differential Revision: https://reviews.llvm.org/D114990	2021-12-16 07:22:46 -08:00
Jay Foad	cce93b3397	[MachineVerifier] Undef subreg operands do not require subranges D112556 added verification that the live interval for a subreg operand must have subranges. This patch fixes a corner case, where if all subreg operands for a particular register are undef uses then no subranges are required. This matches how LiveIntervalCalc would build the live intervals in the first place, since an undef use is not considered to read the register. Before this patch, CodeGen/AMDGPU/no-remat-indirect-mov.mir would fail with -early-live-intervals: # After Live Interval Analysis ... * Bad machine code: Live interval for subreg operand has no subranges * - function: index_vgpr_waterfall_loop - basic block: %bb.1 (0x6a9a968) [352B;496B) - instruction: 432B %24:vgpr_32 = V_MOV_B32_e32 undef %18.sub0:vreg_512, implicit $exec, implicit %18:vreg_512, implicit $m0 - operand 1: undef %18.sub0:vreg_512 Differential Revision: https://reviews.llvm.org/D115360	2021-12-16 09:49:27 +00:00
Arthur Eubanks	eba7b26815	[SafeStack] Use Align instead of uint64_t It is better typed, and the calls to getAlignment() are deprecated. Differential Revision: https://reviews.llvm.org/D115466	2021-12-15 14:40:56 -08:00
Arnold Schwaighofer	d87e617048	Teach the backend to make references to swift_async_extendedFramePointerFlags weak if it emits it When references to the symbol `swift_async_extendedFramePointerFlags` are emitted they have to be weak. References to the symbol `swift_async_extendedFramePointerFlags` get emitted only by frame lowering code. Therefore, the backend needs to track references to the symbol and mark them weak. Differential Revision: https://reviews.llvm.org/D115672	2021-12-15 10:02:06 -08:00
Simon Pilgrim	b88f4f271b	[DAG] SelectionDAG::isSplatValue - add *_EXTEND_VECTOR_INREG handling Fixes #52719	2021-12-15 12:26:39 +00:00
Esme-Yi	c0529efc95	[DebugInfo][DWARF] emit DW_AT_accessibility attribute for class/struct/union types. Summary: This patch emits the DW_AT_accessibility attribute for class/struct/union types in the LLVM part. Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D115606	2021-12-15 07:38:12 +00:00
John Brawn	dc9f65be45	[AArch64][SVE] Fix handling of stack protection with SVE Fix a couple of things that were causing stack protection to not work correctly in functions that have scalable vectors on the stack: * Use TypeSize when determining if accesses to a variable are considered out-of-bounds so that the behaviour is correct for scalable vectors. * When stack protection is enabled move the stack protector location to the top of the SVE locals, so that any overflow in them (or the other locals which are below that) will be detected. Fixes: https://github.com/llvm/llvm-project/issues/51137 Differential Revision: https://reviews.llvm.org/D111631	2021-12-14 11:30:48 +00:00
Chen Zheng	062d9b7d43	[LegalizeVectorOps] code refactor for LegalizeOp; NFC Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D115636	2021-12-14 03:45:53 +00:00
Ellis Hoag	c809da7d9c	Revert "[InstrProf] Attach debug info to counters" This reverts commit `800bf8ed29`. The `Instrumentation/InstrProfiling/debug-info-correlate.ll` test was failing because I forgot the `llc` commands are architecture specific. I'll follow up with a fix. Differential Revision: https://reviews.llvm.org/D115689	2021-12-13 18:15:17 -08:00
Ellis Hoag	800bf8ed29	[InstrProf] Attach debug info to counters Add the llvm flag `-debug-info-correlate` to attach debug info to instrumentation counters so we can correlate raw profile data to their functions. Raw profiles are dumped as `.proflite` files. The next diff enables `llvm-profdata` to consume `.proflite` and debug info files to produce a normal `.profdata` profile. Part of the "lightweight instrumentation" work: https://groups.google.com/g/llvm-dev/c/r03Z6JoN7d4 Reviewed By: kyulee Differential Revision: https://reviews.llvm.org/D114565	2021-12-13 17:51:22 -08:00
Chen Zheng	8c107bee70	[LegalizeVectorOps] fix a typo Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D115637	2021-12-14 00:22:58 +00:00
Fangrui Song	a6a07a514b	[MachineOutliner] Don't outline functions starting with PATCHABLE_FUNCTION_ENTER/FENTRL_CALL MachineOutliner may outline a "patchable-function-entry" function whose body has a TargetOpcode::PATCHABLE_FUNCTION_ENTER MachineInstr. This is incorrect because the special code sequence must stay unchanged to be used at run-time. Avoid outlining PATCHABLE_FUNCTION_ENTER. While here, avoid outlining FENTRY_CALL too (which doesn't reproduce currently) to allow phase ordering flexibility. Fixes #52635 Reviewed By: paquette Differential Revision: https://reviews.llvm.org/D115614	2021-12-13 13:24:29 -08:00
Stanislav Mekhanoshin	c4aef9c281	Check subrange liveness at rematerialization LiveRangeEdit::allUsesAvailableAt checks that VNI at use is the same as at the original use slot. However, the VNI can be the same while a specific subrange needed for use can be dead at the new index. This patch adds subrange liveness check if there is a subreg use. Fixes: SWDEV-312810 Differential Revision: https://reviews.llvm.org/D115278	2021-12-13 11:11:55 -08:00
Mircea Trofin	657adcb077	[NFC][regalloc] Move ExtraRegInfo and related to LiveRangeStageManager This would allow sharing the LiveRangeStageManager between different RegAllocEvictionAdvisors. One scenario is for ML training, where we want to capture what the default advisor would do, for bootstrapping (speeds up training). Differential Revision: https://reviews.llvm.org/D114831	2021-12-13 10:10:57 -08:00
Roman Lebedev	c1a36ba002	[DAGCombine][X86][ARM] EXTRACT_SUBVECTOR(VECTOR_SHUFFLE(?,?,Mask)) -> VECTOR_SHUFFLE(EXTRACT_SUBVECTOR(?, ?), EXTRACT_SUBVECTOR(?, ?), Mask') In most test changes this allows us to drop some broadcasts/shuffles. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D104156	2021-12-13 20:03:44 +03:00
Peter Waller	921e89c59a	[SVE] Only combine (fneg (fma)) => FNMLA with nsz -(Za + Zm * Zn) != (-Za + Zm * (-Zn)) when the FMA produces a zero output (e.g. all zero inputs can produce -0 output) Add a PatFrag to check presence of nsz on the fneg, add tests which ensure the combine does not fire in the absense of nsz. See https://reviews.llvm.org/D90901 for a similar discussion on X86. Differential Revision: https://reviews.llvm.org/D109525	2021-12-13 11:33:07 +00:00
Fraser Cormack	b0319ab79b	[PR52475] Ensure a correct chain in copies to/from hidden sret parameter This patch fixes an issue during SelectionDAG construction. When the target is unable to lower the function's return value, a hidden sret parameter is created. It is initialized and copied to a stored variable (DemoteRegister) with CopyToReg and is later fetched with CopyFromReg. The bug is that the chains used for each copy are inconsistent, and thus in rare cases the scheduler may issue them out of order. The fix is to ensure that the CopyFromReg uses the DAG root which is set as the chain corresponding to the initial CopyToReg. Fixes https://llvm.org/PR52475 Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D114795	2021-12-13 10:46:32 +00:00
Kazu Hirata	bb6447a78c	[llvm] Use llvm::reverse (NFC)	2021-12-12 16:13:49 -08:00
Kazu Hirata	67aeae0138	[llvm] Use range-based for loops (NFC)	2021-12-11 22:34:07 -08:00
Adrian Prantl	c7c84b9087	[DwarfDebug] Refuse to emit DW_OP_LLVM_arg values wider than 64 bits DwarfExpression::addUnsignedConstant(const APInt &Value) only supports wider-than-64-bit values when it is used to emit a top-level DWARF expression representing the location of a variable. Before this change, it was possible to call addUnsignedConstant on >64 bit values within a subexpression when substituting DW_OP_LLVM_arg values. This can trigger an assertion failure (e.g. PR52584, PR52333) when it happens in a fragment (DW_OP_LLVM_fragment) expression, as addUnsignedConstant on >64 bit values splits the constant into separate DW_OP_pieces, which modifies DwarfExpression::OffsetInBits. This change papers over the assertion errors by bailing on overly wide DW_OP_LLVM_arg values. A more comprehensive fix might be to be to split wide values into pointer-sized fragments. [0] https://github.com/llvm/llvm-project/blob/e71fa03/llvm/lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp#L799-L805 Patch by Ricky Zhou! Differential Revision: https://reviews.llvm.org/D115343	2021-12-10 09:33:27 -08:00
David Sherwood	652faed353	[CodeGen] Improve SelectionDAGBuilder lowering code for get.active.lane.mask intrinsic Previously we were using UADDO to generate a two-result value with the unsigned addition and the overflow mask. We then combined the overflow mask with the trip count comparison to get a result. However, we don't need to do this - we can simply use a UADDSAT saturating add node to add the vector index splat and the stepvector together. Then we can just compare this to a splat of the trip count. This results in overall better code quality for both Thumb2 and AArch64. Differential Revision: https://reviews.llvm.org/D115354	2021-12-10 13:39:38 +00:00
Brian Cain	1e68c79987	Reapply [xray] add support for hexagon Adds x-ray support for hexagon to llvm codegen, clang driver, compiler-rt libs. Differential Revision: https://reviews.llvm.org/D113638 Reapplying this after `543a9ad7c4`, which fixes the leak introduced there.	2021-12-10 05:32:28 -08:00
Sameer Sahasrabuddhe	1d0244aed7	Reapply CycleInfo: Introduce cycles as a generalization of loops Reverts `02940d6d22`. Fixes breakage in the modules build. LLVM loops cannot represent irreducible structures in the CFG. This change introduce the concept of cycles as a generalization of loops, along with a CycleInfo analysis that discovers a nested hierarchy of such cycles. This is based on Havlak (1997), Nesting of Reducible and Irreducible Loops. The cycle analysis is implemented as a generic template and then instatiated for LLVM IR and Machine IR. The template relies on a new GenericSSAContext template which must be specialized when used for each IR. This review is a restart of an older review request: https://reviews.llvm.org/D83094 Original implementation by Nicolai Hähnle <nicolai.haehnle@amd.com>, with recent refactoring by Sameer Sahasrabuddhe <sameer.sahasrabuddhe@amd.com> Differential Revision: https://reviews.llvm.org/D112696	2021-12-10 14:36:43 +05:30
Konstantin Schwarz	a344653725	[GlobalISel] Fix IRTranslator for constexpr fcmp The existing code assumed fcmp to always be an Instruction, but it can also be a ConstExpr. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D115450	2021-12-10 08:49:12 +01:00
Kazu Hirata	f829630d2e	[llvm] Use llvm::count (NFC)	2021-12-09 20:50:38 -08:00
Arthur Eubanks	1172712f46	[NFC] Replace some deprecated getAlignment() calls with getAlign() Reviewed By: gchatelet Differential Revision: https://reviews.llvm.org/D115370	2021-12-09 08:43:19 -08:00
Brian Cain	ab28cb1c5c	Revert "[xray] add support for hexagon" This reverts commit `543a9ad7c4`.	2021-12-09 07:30:40 -08:00
Brian Cain	543a9ad7c4	[xray] add support for hexagon Adds x-ray support for hexagon to llvm codegen, clang driver, compiler-rt libs. Differential Revision: https://reviews.llvm.org/D113638	2021-12-09 05:47:53 -08:00
Mircea Trofin	4afae6f7c7	[NFC] Rename MachineFunction::cloneMachineInstrBundle (coding style)	2021-12-08 21:12:54 -08:00
Mircea Trofin	b012742405	[NFC] Rename MachineFunction::deleteMachineInstr (coding style)	2021-12-08 20:36:13 -08:00
Kazu Hirata	c23ebf1714	[llvm] Use range-based for loops (NFC)	2021-12-08 20:35:39 -08:00
Mircea Trofin	91a0da0142	[NFC] Rename MachineFunction::DeleteMachineBasicBlock Renamed to conform to coding style	2021-12-08 18:12:51 -08:00
David Sherwood	3257f63bbd	[NFC][CodeGen] Remove rarely used DL variable from SelectionDAGBuilder There is a pointer to the DataLayout in SelectionDAGBuilder called 'DL' that is hardly ever used. In most cases the code seems to just use `DAG.getDataLayout()` instead. Given that DL is also often used as a shadowed variable for the debug location it seems sensible to just kill off the few remaining uses and be consistent with the rest of the code. Differential Revision: https://reviews.llvm.org/D114451	2021-12-08 17:05:46 +00:00
David Green	5d7efd4758	[SDAG] Refine MMO size when converting masked load/store to normal load/store After D113888 / `32b6c17b29` the MMO size of a masked loads/store is unknown. When we are converting back to a standard load/store because the mask is known all ones, we can refine that to the correct size from the size of the vector being loaded/stored. Differential Revision: https://reviews.llvm.org/D114582	2021-12-08 10:13:25 +00:00
Alex Lorenz	0756aa3978	[macho] add support for emitting macho files with two build version load commands This patch extends LLVM IR to add metadata that can be used to emit macho files with two build version load commands. It utilizes "darwin.target_variant.triple" and "darwin.target_variant.SDK Version" metadata names for that, which will be set by a future patch in clang. MachO uses two build version load commands to represent an object file / binary that is targeting both the macOS target, and the Mac Catalyst target. At runtime, a dynamic library that supports both targets can be loaded from either a native macOS or a Mac Catalyst app on a macOS system. We want to add support to this to upstream to LLVM to be able to build compiler-rt for both targets, to finish the complete support for the Mac Catalyst platform, which is right now targetable by upstream clang, but the compiler-rt bits aren't supported because of the lack of this multiple build version support. Differential Revision: https://reviews.llvm.org/D112189	2021-12-07 18:17:47 -08:00
Jonas Devlieghere	02940d6d22	Revert "CycleInfo: Introduce cycles as a generalization of loops" This reverts commit `0fe61ecc2c` because it breaks the modules build. https://green.lab.llvm.org/green/job/clang-stage2-rthinlto/4858/ https://green.lab.llvm.org/green/view/LLDB/job/lldb-cmake/39112/	2021-12-07 13:06:34 -08:00
Chih-Ping Chen	b5c42ef3da	[NFC][CodeView] Use one unified access to the module in beginModule. Differential Revision: https://reviews.llvm.org/D115257	2021-12-07 13:45:48 -05:00
Simon Pilgrim	d298c32407	Remove unused variable. NFC.	2021-12-07 18:37:07 +00:00
Simon Pilgrim	52d2f35323	[DAG] Update expandFunnelShift/expandROT to return the expansion directly. NFCI. Don't return a bool to indicate if the expansion was successful, just return the SDValue result directly, like we do for most other basic expansions.	2021-12-07 18:09:43 +00:00
Kazu Hirata	630c847b1b	[llvm] Use range-based for loops (NFC)	2021-12-07 09:17:03 -08:00
Mircea Trofin	fa99cb64ff	[mlgo][regalloc] Add score calculation for training Add the calculation of a score, which will be used during ML training. The score qualifies the quality of a regalloc policy, and is independent of what we train (currently, just eviction), or the regalloc algo itself. We can then use scores to guide training (which happens offline), by formulating a reward based on score variation - the goal being lowering scores (currently, that reward is percentage reduction relative to Greedy's heuristic) Currently, we compute the score by factoring different instruction counts (loads, stores, etc) with the machine basic block frequency, regardless of the instructions' provenance - i.e. they could be due to the regalloc policy or be introduced previously. This is different from RAGreedy::reportStats, which accummulates the effects of the allocator alone. We explored this alternative but found (at least currently) that the more naive alternative introduced here produces better policies. We do intend to consolidate the two, however, as we are actively investigating improvements to our reward function, and will likely want to re-explore scoring just the effects of the allocator. In either case, we want to decouple score calculation from allocation algorighm, as we currently evaluate it after a few more passes after allocation (also, because score calculation should be reusable regardless of allocation algorithm). We intentionally accummulate counts independently because it facilitates per-block reporting, which we found useful for debugging - for instance, we can easily report the counts indepdently, and then cross-reference with perf counter measurements. Differential Revision: https://reviews.llvm.org/D115195	2021-12-07 09:00:27 -08:00
spupyrev	f573f6866e	ext-tsp basic block layout A new basic block ordering improving existing MachineBlockPlacement. The algorithm tries to find a layout of nodes (basic blocks) of a given CFG optimizing jump locality and thus processor I-cache utilization. This is achieved via increasing the number of fall-through jumps and co-locating frequently executed nodes together. The name follows the underlying optimization problem, Extended-TSP, which is a generalization of classical (maximum) Traveling Salesmen Problem. The algorithm is a greedy heuristic that works with chains (ordered lists) of basic blocks. Initially all chains are isolated basic blocks. On every iteration, we pick a pair of chains whose merging yields the biggest increase in the ExtTSP value, which models how i-cache "friendly" a specific chain is. A pair of chains giving the maximum gain is merged into a new chain. The procedure stops when there is only one chain left, or when merging does not increase ExtTSP. In the latter case, the remaining chains are sorted by density in decreasing order. An important aspect is the way two chains are merged. Unlike earlier algorithms (e.g., based on the approach of Pettis-Hansen), two chains, X and Y, are first split into three, X1, X2, and Y. Then we consider all possible ways of gluing the three chains (e.g., X1YX2, X1X2Y, X2X1Y, X2YX1, YX1X2, YX2X1) and choose the one producing the largest score. This improves the quality of the final result (the search space is larger) while keeping the implementation sufficiently fast. Differential Revision: https://reviews.llvm.org/D113424	2021-12-07 07:31:10 -08:00
Paulo Matos	2fd634a5e3	[WebAssembly] Implement table instruction intrinsics This change implements intrinsics for table.grow, table.fill, table.size, and table.copy. Differential Revision: https://reviews.llvm.org/D113420	2021-12-07 13:25:59 +01:00
Fraser Cormack	40d51de5cb	[SelectionDAG] Use UnknownSize for VP memory ops In the style of D113888, this patch updates the various VP memory operations (load, store, gather, scatter) to use UnknownSize. This is for the same reason as for masked loads and stores: the number of elements accessed is not generally known at compile time. This is somewhat pessimistic in the sense that we may still find un-canonicalized intrinsics featuring both an all-true mask and an EVL equal to the vector size. Arguably those should be canonicalized before the SelectionDAG, so those have been left for future work. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D115036	2021-12-07 10:51:02 +00:00
Fraser Cormack	3460cc2585	[VP] Propagate align parameter attr on VP load/store to ISel This patch fixes a case where the 'align' parameter attribute on the pointer operands to llvm.vp.load and llvm.vp.store was being dropped during the conversion to the SelectionDAG. The default alignment equal to the ABI type alignment of the vector type was kept. It also updates the documentation to reflect the fact that the parameter attribute is now properly supported. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D114422	2021-12-07 10:16:16 +00:00
Sameer Sahasrabuddhe	0fe61ecc2c	CycleInfo: Introduce cycles as a generalization of loops LLVM loops cannot represent irreducible structures in the CFG. This change introduce the concept of cycles as a generalization of loops, along with a CycleInfo analysis that discovers a nested hierarchy of such cycles. This is based on Havlak (1997), Nesting of Reducible and Irreducible Loops. The cycle analysis is implemented as a generic template and then instatiated for LLVM IR and Machine IR. The template relies on a new GenericSSAContext template which must be specialized when used for each IR. This review is a restart of an older review request: https://reviews.llvm.org/D83094 Original implementation by Nicolai Hähnle <nicolai.haehnle@amd.com>, with recent refactoring by Sameer Sahasrabuddhe <sameer.sahasrabuddhe@amd.com> Differential Revision: https://reviews.llvm.org/D112696	2021-12-07 12:02:34 +05:30
Mircea Trofin	2bd7384d3a	[NFC][MachineInstr] Pass-by-value DebugLoc in CreateMachineInstr DebugLoc is cheap to move, passing it by-val rather than const ref to take advantage of the fact that it is consumed that way by the MachineInstr ctor, which creates some optimization oportunities. Differential Revision: https://reviews.llvm.org/D115208	2021-12-06 19:42:19 -08:00
Kai Luo	b206ee6906	[MachineVerifier] Make TiedOpsRewritten computable in MIRParser This patch is to address post-commit comment https://reviews.llvm.org/D80538#anchor-inline-1091625, which make the constraint stronger based on what https://reviews.llvm.org/D80538 does, i.e., "TiedOpsRewritten is set iff leave-ssa and all tied operands share the same register". Reviewed By: MatzeB Differential Revision: https://reviews.llvm.org/D114573	2021-12-07 02:25:15 +00:00
Mircea Trofin	615e374252	[NFC][MachineInstr] Rename some vars to conform to coding style	2021-12-06 17:19:11 -08:00
Nico Weber	3678326d28	Revert "ext-tsp basic block layout" This reverts commit `c68f71eb37`. Breaks tests on arm hosts, see comments on https://reviews.llvm.org/D113424	2021-12-06 19:08:20 -05:00
James Nagurne	cc3bb85580	[llvm][Hexagon] Generalize VLIWResourceModel, VLIWMachineScheduler, and ConvergingVLIWScheduler The Pre-RA VLIWMachineScheduler used by Hexagon is a relatively generic implementation that would make sense to use on other VLIW targets. This commit lifts those classes into their own header/source file with the root VLIWMachineScheduler. I chose this path rather than adding the strategy et al. into MachineScheduler to avoid bloating the file with other implementations. Target-specific behaviors have been captured and replicated through function overloads. - Added an overloadable DFAPacketizer creation member function. This is mainly done for our downstream, which has the capability to override the DFAPacketizer with custom implementations. This is an upstreamable TODO on our end. Currently, it always returns the result of TargetInstrInfo::CreateTargetScheduleState - Added an extra helper which returns the number of instructions in the current packet. This is used in our downstream, and may be useful elsewhere. - Placed the priority heuristic values into the ConvergingVLIWscheduler class instead of defining them as local statics in the implementation - Added a overridable helper in ConvergingVLIWScheduler so that targets can create their own VLIWResourceModel Differential Revision: https://reviews.llvm.org/D113150	2021-12-06 16:23:48 -06:00
spupyrev	c68f71eb37	ext-tsp basic block layout A new basic block ordering improving existing MachineBlockPlacement. The algorithm tries to find a layout of nodes (basic blocks) of a given CFG optimizing jump locality and thus processor I-cache utilization. This is achieved via increasing the number of fall-through jumps and co-locating frequently executed nodes together. The name follows the underlying optimization problem, Extended-TSP, which is a generalization of classical (maximum) Traveling Salesmen Problem. The algorithm is a greedy heuristic that works with chains (ordered lists) of basic blocks. Initially all chains are isolated basic blocks. On every iteration, we pick a pair of chains whose merging yields the biggest increase in the ExtTSP value, which models how i-cache "friendly" a specific chain is. A pair of chains giving the maximum gain is merged into a new chain. The procedure stops when there is only one chain left, or when merging does not increase ExtTSP. In the latter case, the remaining chains are sorted by density in decreasing order. An important aspect is the way two chains are merged. Unlike earlier algorithms (e.g., based on the approach of Pettis-Hansen), two chains, X and Y, are first split into three, X1, X2, and Y. Then we consider all possible ways of gluing the three chains (e.g., X1YX2, X1X2Y, X2X1Y, X2YX1, YX1X2, YX2X1) and choose the one producing the largest score. This improves the quality of the final result (the search space is larger) while keeping the implementation sufficiently fast. Differential Revision: https://reviews.llvm.org/D113424	2021-12-06 08:56:39 -08:00
Kazu Hirata	c4a8928b51	[CodeGen] Use range-based for loops (NFC)	2021-12-06 08:49:10 -08:00
Jack Andersen	f108c7f59d	[GlobalISel] Allow DBG_VALUE to use undefined vregs before LiveDebugValues. Expanding on D109750. Since `DBG_VALUE` instructions have final register validity determined in `LDVImpl::handleDebugValue`, there is no apparent reason to immediately prune unused register operands as their defs are erased. Consequently, this renders `MachineInstr::eraseFromParentAndMarkDBGValuesForRemoval` moot; gaining a substantial performance improvement. The only necessary changes involve making relevant passes consider invalid DBG_VALUE vregs uses as valid. Reviewed By: MatzeB Differential Revision: https://reviews.llvm.org/D112852	2021-12-05 15:55:59 -05:00
Michael Liao	b6ccca217c	Fix `-Wunused-variable` warning. NFC.	2021-12-05 13:40:35 -05:00
Kazu Hirata	1457e78352	[llvm] Use range-based for loops (NFC)	2021-12-05 08:33:02 -08:00
Kristina Bessonova	75b622a795	Reland [DwarfDebug] Support emitting function-local declaration for a lexical block This is another attempt to make function-local declarations (like static variables, structs/classes and other) be correctly emitted within a lexical (bracketed) block. Fixes https://bugs.llvm.org/show_bug.cgi?id=19238. Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D113741	2021-12-05 13:56:45 +02:00
Kristina Bessonova	0ac75e82ff	Reland [DwarfDebug] Move emission of global vars, types and imports to endModule() This patch proposes to move emission of global variables, types, imported entities, etc from DwarfDebug::beginModule() to DwarfDebug::endModule(). Effectively, this changes nothing but the order of debug entities which will be as follows: * subprograms (including related context, local variables/labels, local imported entities; related types can be created as a part of the emission of local entities of an abstract subprogram); * global variables (including related context and types); * retained types and enums; * non-local-scoped imported entities; * basic types; * other types left (as a part of local variables attributes emission). Note that the order of emitted compile units may also be changed as now we emit units that contain subprograms first and then all other non-empty units. The motivation behind this change is the following: (1) DwarfDebug::beginModule() is run at the very beginning of backend's pipeline, from this time IR can be significantly changed by target-specific passes. If it happens for debug metadata of global entities, those changes will not be reflected in the emitted DWARF. (2) imported subprogram names should refer to an abstract subprogram if it exists, but it isn't known in DwarfDebug::beginModule() (it's possible to make some guesses based on location info, but it's not quite reliable); (3) aforementioned entities if they are scoped within a bracketed block (subject of D113741) couldn't be emitted in DwarfDebug::beginModule() (they need parent emitted first). Another problem is if to try to gather some information about local entities and defer their emission (till subprogram's processing or DwarfDebug::endModule()) all the gathered details might be irrelevant / invalid by the time the entities are being emitted (because of (1)). Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D114705	2021-12-05 13:56:45 +02:00
David Green	57ff805a6d	[DAG] Create fptoui.sat from clamped fptosi As an extension to D111976, this converts clamp fptosi, clamped between 0 and (2^n)-1 to a fptoui.sat. This can greatly help on targets with conversions that naturally saturate, such as Arm. X86 disables the transform as some of the test cases increases in size. A fptoui.sat necessitates a fp clamp without native support, so there is little use in converting if the instruction is just going to be expanded. Differential Revision: https://reviews.llvm.org/D112428	2021-12-05 09:25:52 +00:00
Kazu Hirata	ca2f53897a	[CodeGen] Use range-based for loops (NFC)	2021-12-04 08:48:05 -08:00
Kristina Bessonova	a961604819	Revert "[DwarfDebug] Support emitting function-local declaration for a lexical block" This reverts commits * `ee691970a9` (D113741), * `79d3132998` (D114705) due to lldb and dexter test failures.	2021-12-04 18:06:57 +02:00
Kristina Bessonova	ee691970a9	[DwarfDebug] Support emitting function-local declaration for a lexical block This is another attempt to make function-local declarations (like static variables, structs/classes and other) be correctly emitted within a lexical (bracketed) block. Fixes https://bugs.llvm.org/show_bug.cgi?id=19238. Differential Revision: https://reviews.llvm.org/D113741	2021-12-04 17:12:47 +02:00
Kristina Bessonova	79d3132998	[DwarfDebug] Move emission of global vars, types and imports to endModule() This patch proposes to move emission of global variables, types, imported entities, etc from DwarfDebug::beginModule() to DwarfDebug::endModule(). Effectively, this changes nothing but the order of debug entities which will be as follows: * subprograms (including related context, local variables/labels, local imported entities; related types can be created as a part of the emission of local entities of an abstract subprogram); * global variables (including related context and types); * retained types and enums; * non-local-scoped imported entities; * basic types; * other types left (as a part of local variables attributes emission). Note that the order of emitted compile units may also be changed as now we emit units that contain subprograms first and then all other non-empty units. The motivation behind this change is the following: (1) DwarfDebug::beginModule() is run at the very beginning of backend's pipeline, from this time IR can be significantly changed by target-specific passes. If it happens for debug metadata of global entities, those changes will not be reflected in the emitted DWARF. (2) imported subprogram names should refer to an abstract subprogram if it exists, but it isn't known in DwarfDebug::beginModule() (it's possible to make some guesses based on location info, but it's not quite reliable); (3) aforementioned entities if they are scoped within a bracketed block (subject of D113741) couldn't be emitted in DwarfDebug::beginModule() (they need parent emitted first). Another problem is if to try to gather some information about local entities and defer their emission (till subprogram's processing or DwarfDebug::endModule()) all the gathered details might be irrelevant / invalid by the time the entities are being emitted (because of (1)). Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D114705	2021-12-04 14:10:01 +02:00
Kazu Hirata	3aed282257	[CodeGen] Use range-based for loops (NFC)	2021-12-03 20:45:59 -08:00
Simon Pilgrim	ebf5271918	[DAG] PromoteIntRes_FunnelShift - rename shift Amount variable to Amt to prevent line overflow. NFC.	2021-12-03 17:24:45 +00:00
Stephen Tozer	98a021fcbf	[DebugInfo] Attempt to preserve more information during tail duplication Prior to this patch, tail duplication handled debug info poorly - specifically, debug instructions would be dropped instead of being set undef, potentially extending the lifetimes of prior debug values that should be killed. The pass was also very aggressive with dropping debug info, dropping debug info even when the SSA value it referred to was still present. This patch attempts to handle debug info more carefully, checking to see whether each affected debug value can still be live, setting it undef if not. Reviewed By: jmorse Differential Revision: https://reviews.llvm.org/D106875	2021-12-03 15:30:05 +00:00
David Green	255ad73424	[ARM] Make MVE v2i1 predicates legal MVE can treat v16i1, v8i1, v4i1 and v2i1 as different views onto the same 16bit VPR.P0 register, with v2i1 holding two 8 bit values for the two halves. This was never treated as a legal type in llvm in the past as there are not many 64bit instructions and no 64bit compares. There are a few instructions that could use it though, notably a VSELECT (as it can handle any size using the underlying v16i8 VPSEL), AND/OR/XOR for similar reasons, some gathers/scatter and long multiplies and VCTP64 instructions. This patch goes through and makes v2i1 a legal type, handling all the cases that fall out of that. It also makes VSELECT legal for v2i64 as a side benefit. A lot of the codegen changes as a result - usually in way that is a little better or a little worse, but still expensive. Costs can change a little too in the process, again in a way that expensive things remain expensive. A lot of the tests that changed are mainly to ensure correctness - the code can hopefully be improved in the future where it comes up in practice. The intrinsics currently remain using the v4i1 they previously did to emulate a v2i1. This will be changed in a followup patch but this one was already large enough. Differential Revision: https://reviews.llvm.org/D114449	2021-12-03 14:05:41 +00:00
Jay Foad	d133a21b71	[SelectionDAG] Add newline to a debug message	2021-12-03 13:33:32 +00:00
Simon Pilgrim	6803d08c38	[DAG][PowerPC] Enable initial ISD::BITCAST SimplifyDemandedBits/SimplifyMultipleUseDemandedBits big-endian handling This patch begins extending handling for peeking through bitcast nodes to big-endian targets as well as the existing little-endian case. Differential Revision: https://reviews.llvm.org/D114676	2021-12-02 11:47:53 +00:00
Omer Aviram	617ad14060	[SelectionDAG] Add pattern to haveNoCommonBitsSet Correctly identify the following pattern, which has no common bits: (X & ~M) op (Y & M). Differential Revision: https://reviews.llvm.org/D113970	2021-12-01 12:04:04 -05:00
Simon Pilgrim	19d34f6e95	[X86] combinePMULH - recognise 'cheap' trunctions via PACKS/PACKUS as well as SEXT/ZEXT combinePMULH currently only truncates vXi32/vXi64 multiplies to PMULHW/PMULUW if the source operands are SEXT/ZEXT instructions for a 'free' truncation. But we can generalize this to any source operand with sufficient leading sign/zero bits that would allow PACKS/PACKUS to be used as a 'cheap' truncation. This helps us avoid the wider multiplies, in exchange for truncation on both source operands instead of the result. Differential Revision: https://reviews.llvm.org/D113371	2021-12-01 16:37:49 +00:00
Ties Stuij	f5f28d5b0c	[ARM] Implement BTI placement pass for PACBTI-M This patch implements a new MachineFunction in the ARM backend for placing BTI instructions. It is similar to the existing AArch64 aarch64-branch-targets pass. BTI instructions are inserted into basic blocks that: - Have their address taken - Are the entry block of a function, if the function has external linkage or has its address taken - Are mentioned in jump tables - Are exception/cleanup landing pads Each BTI instructions is placed in the beginning of a BB after the so-called meta instructions (e.g. exception handler labels). Each outlining candidate and the outlined function need to be in agreement about whether BTI placement is enabled or not. If branch target enforcement is disabled for a function, the outliner should not covertly enable it by emitting a call to an outlined function, which begins with BTI. The cost mode of the outliner is adjusted to account for the extra BTI instructions in the outlined function. The ARM Constant Islands pass will maintain the count of the jump tables, which reference a block. A `BTI` instruction is removed from a block only if the reference count reaches zero. PAC instructions in entry blocks are replaced with PACBTI instructions (tests for this case will be added in a later patch because the compiler currently does not generate PAC instructions). The ARM Constant Island pass is adjusted to handle BTI instructions correctly. Functions with static linkage that don't have their address taken can still be called indirectly by linker-generated veneers and thus their entry points need be marked with BTI or PACBTI. The changes are tested using "LLVM IR -> assembly" tests, jump tables also have a MIR test. Unfortunately it is not possible add MIR tests for exception handling and computed gotos because of MIR parser limitations. This patch is part of a series that adds support for the PACBTI-M extension of the Armv8.1-M architecture, as detailed here: https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/armv8-1-m-pointer-authentication-and-branch-target-identification-extension The PACBTI-M specification can be found in the Armv8-M Architecture Reference Manual: https://developer.arm.com/documentation/ddi0553/latest The following people contributed to this patch: - Mikhail Maltsev - Momchil Velikov - Ties Stuij Reviewed By: ostannard Differential Revision: https://reviews.llvm.org/D112426	2021-12-01 12:54:05 +00:00
Bradley Smith	0eb1efb92c	[DAGCombiner] When combining REM ensure optimized div nodes are unique The REM DAG combine uses the visitDivLike functions to try and get an optimized DIV node to provide better codegen, however in some cases this visitDivLike call ends up in the BuildSDIVPow2 target hook, which in turn sometimes will return the same node passed in to indicate not to change it. The REM DAG combine does not anticipate this and creates a cycle in the DAG because of it. Fix this by ensuring any such optimized div node returned is distinct from the node being combined. Differential Revision: https://reviews.llvm.org/D114716	2021-12-01 11:24:26 +00:00
Simon Pilgrim	9981dd142f	[DAG] Apply clang-format to visitMSTORE + visitMLOAD. NFC. Reduce diff in D114582	2021-12-01 11:23:47 +00:00
Qiu Chaofan	15826eb437	[Legalizer] Avoid expansion to BR_CC if illegal Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D110616	2021-12-01 12:22:21 +08:00
Mircea Trofin	a503cb00d1	[NFC][regalloc] Factor accesses to ExtraRegInfo We'll move ExtraRegInfo to the RegAllocEvictionAdvisor subsequently. This change prepares for that by factoring all accesses. RFC: https://lists.llvm.org/pipermail/llvm-dev/2021-November/153639.html Differential Revision: https://reviews.llvm.org/D114759	2021-11-30 15:10:49 -08:00
David Green	9e8a71caf0	[DAG] Create fptosi.sat from clamped fptosi This adds a fold in DAGCombine to create fptosi_sat from sequences for smin(smax(fptosi(x))) nodes, where the min/max saturate the output of the fp convert to a specific bitwidth (say INT_MIN and INT_MAX). Because it is dealing with smin(/smax) in DAG they may currently be ISD::SMIN, ISD::SETCC/ISD::SELECT, ISD::VSELECT or ISD::SELECT_CC nodes which need to be handled similarly. A shouldConvertFpToSat method was added to control when converting may be profitable. The original fptosi will have a less strict semantics than the fptosisat, with less values that need to produce defined behaviour. This especially helps on ARM/AArch64 where the vcvt instructions naturally saturate the result. Differential Revision: https://reviews.llvm.org/D111976	2021-11-30 15:29:14 +00:00
Hans Wennborg	a87782c34d	Revert "[DAG] Create fptosi.sat from clamped fptosi" It causes builds to fail with this assert: llvm/include/llvm/ADT/APInt.h:990: bool llvm::APInt::operator==(const llvm::APInt &) const: Assertion `BitWidth == RHS.BitWidth && "Comparison requires equal bit widths"' failed. See comment on the code review. > This adds a fold in DAGCombine to create fptosi_sat from sequences for > smin(smax(fptosi(x))) nodes, where the min/max saturate the output of > the fp convert to a specific bitwidth (say INT_MIN and INT_MAX). Because > it is dealing with smin(/smax) in DAG they may currently be ISD::SMIN, > ISD::SETCC/ISD::SELECT, ISD::VSELECT or ISD::SELECT_CC nodes which need > to be handled similarly. > > A shouldConvertFpToSat method was added to control when converting may > be profitable. The original fptosi will have a less strict semantics > than the fptosisat, with less values that need to produce defined > behaviour. > > This especially helps on ARM/AArch64 where the vcvt instructions > naturally saturate the result. > > Differential Revision: https://reviews.llvm.org/D111976 This reverts commit `52ff3b0093`.	2021-11-30 15:36:56 +01:00
Jeremy Morse	3c04507088	[DebugInfo] Turn instruction referencing on by default for x86 This patch is designed to be reverted -- it activates a reasonably large block of new-ish code, so some turbulence is likely. Instruction referencing is best summarised, and it being on-by-default, is discussed here: https://lists.llvm.org/pipermail/llvm-dev/2021-November/153653.html Differential Revision: https://reviews.llvm.org/D114631	2021-11-30 13:44:07 +00:00
Jeremy Morse	651122fc4a	[DebugInfo][InstrRef] Pre-land on-by-default-for-x86 changes Over in D114631 and [0] there's a plan for turning instruction referencing on by default for x86. This patch adds / removes all the relevant bits of code, with the aim that the final patch is extremely small, for an easy revert. It should just be a condition in CommandFlags.cpp and removing the XFail on instr-ref-flag.ll. [0] https://lists.llvm.org/pipermail/llvm-dev/2021-November/153653.html	2021-11-30 12:40:59 +00:00
Jeremy Morse	8dda516b83	[DebugInfo][InstrRef] Avoid dropping fragment info during PHI elimination InstrRefBasedLDV used to crash on the added test -- the exit block is not in scope for the variable being propagated, but is still considered because it contains an assignment. The failure-mode was vlocJoin ignoring assign-only blocks and not updating DIExpressions, but pickVPHILoc would still find a variable location for it. That led to DBG_VALUEs created with the wrong fragment information. Fix this by removing a filter inherited from VarLocBasedLDV: vlocJoin will now consider assign-only blocks and will update their expressions. Differential Revision: https://reviews.llvm.org/D114727	2021-11-30 11:32:31 +00:00
David Green	52ff3b0093	[DAG] Create fptosi.sat from clamped fptosi This adds a fold in DAGCombine to create fptosi_sat from sequences for smin(smax(fptosi(x))) nodes, where the min/max saturate the output of the fp convert to a specific bitwidth (say INT_MIN and INT_MAX). Because it is dealing with smin(/smax) in DAG they may currently be ISD::SMIN, ISD::SETCC/ISD::SELECT, ISD::VSELECT or ISD::SELECT_CC nodes which need to be handled similarly. A shouldConvertFpToSat method was added to control when converting may be profitable. The original fptosi will have a less strict semantics than the fptosisat, with less values that need to produce defined behaviour. This especially helps on ARM/AArch64 where the vcvt instructions naturally saturate the result. Differential Revision: https://reviews.llvm.org/D111976	2021-11-30 11:05:32 +00:00
Abinav Puthan Purayil	bc5dbb0bae	[GlobalISel] Add matchers for constant splat. This change exposes isBuildVectorConstantSplat() to the llvm namespace and uses it to implement the constant splat versions of m_SpecificICst(). CombinerHelper::matchOrShiftToFunnelShift() can now work with vector types and CombinerHelper::matchMulOBy2()'s match for a constant splat is simplified. Differential Revision: https://reviews.llvm.org/D114625	2021-11-30 15:18:50 +05:30
Guozhi Wei	f1d8345a2a	[TwoAddressInstructionPass] Create register mapping for registers with multiple uses in the current MBB Currently we create register mappings for registers used only once in current MBB. For registers with multiple uses, when all the uses are in the current MBB, we can also create mappings for them similarly according to the last use. For example %reg101 = ... = ... reg101 %reg103 = ADD %reg101, %reg102 We can create mapping between %reg101 and %reg103. Differential Revision: https://reviews.llvm.org/D113193	2021-11-29 19:01:59 -08:00
Mircea Trofin	e8b8304d76	[NFC][Regalloc] Split canEvictInterference into hint and general There are 2 eviction queries. One is made by tryAssign, when it attempts to free an interference occupying the hint of the candidate. The other is during 'regular' interference resolution, where we scan over all physical registers and try to see if we can evict live ranges in favor of the candidate. We currently use the same logic in both cases, just that the former never passes the cost to any subsequent query. Technically, the 2 decisions could be implemented with different policies. This patch splits the 2. RFC: https://lists.llvm.org/pipermail/llvm-dev/2021-November/153639.html Differential Revision: https://reviews.llvm.org/D114019	2021-11-29 16:04:03 -08:00
Jeremy Morse	0eee844539	[DebugInfo][InstrRef] Terminate overlapping variable fragments If we have a variable where its fragments are split into overlapping segments: DBG_VALUE $ax, $noreg, !123, !DIExpression(DW_OP_LLVM_fragment_0, 16) ... DBG_VALUE $eax, $noreg, !123, !DIExpression(DW_OP_LLVM_fragment_0, 32) we should only propagate the most recently assigned fragment out of a block. LiveDebugValues only deals with live-in variable locations, as overlaps within blocks is DbgEntityHistoryCalculators domain. InstrRefBasedLDV has kept the accumulateFragmentMap method from VarLocBasedLDV, we just need it to recognise DBG_INSTR_REFs. Once it's produced a mapping of variable / fragments to the overlapped variable / fragments, VLocTracker uses it to identify when a debug instruction needs to terminate the other parts it overlaps with. The test is updated for some standard "InstrRef picks different registers" variation, and the order of some unrelated DBG_VALUEs changes. Differential Revision: https://reviews.llvm.org/D114603	2021-11-29 23:37:20 +00:00
Jeremy Morse	a20987adf4	[DebugInfo][InstrRef] Add indirection from dbg.declare in SelectionDAG Usually dbg.declares get translated into either entries in an MF side-table, or a DBG_VALUE on entry to the function with IsIndirect set (including in instruction referencing mode). Much rarer is a dbg.declare attached to a non-argument value, such as in the test added in this patch where there's a variable-length-array. Such dbg.declares become SDDbgValue nodes with InIndirect=true. As it happens, we weren't correctly emitting DBG_INSTR_REFs with the additional indirection. This patch adds the extra indirection, encoded as adding an additional DW_OP_deref to the expression. Differential Revision: https://reviews.llvm.org/D114440	2021-11-29 22:24:19 +00:00
Jeremy Morse	9cf31b8d39	[DebugInfo][InstrRef] Preserve properties of restored variables InstrRefBasedLDV observes when variable locations are clobbered, scans what values are available in the machine, and re-issues a DBG_VALUE for the variable if it can find another location. Unfortunately, I hadn't joined up the Indirectness flag, so if it did this to an Indirect Value, the indirectness would be dropped. Fix this, and add a test that if we clobber a variable value (on the stack in this case), then the recovered variable location keeps the Indirect flag. Differential Revision: https://reviews.llvm.org/D114378	2021-11-29 21:57:24 +00:00
Kazu Hirata	f240e528ce	[llvm] Use range-based for loops (NFC)	2021-11-29 09:04:44 -08:00
Mirko Brkusanin	0dd570ff56	[AMDGPU][GlobalISel] Transform (fsub (fpext (fneg (fmul x, y))), z) -> (fneg (fma (fpext x), (fpext y), z)) Patch by: Mateja Marjanovic Differential Revision: https://reviews.llvm.org/D98050	2021-11-29 16:27:22 +01:00
Mirko Brkusanin	37c2a2201d	[AMDGPU][GlobalISel] Transform (fsub (fpext (fmul x, y)), z) -> (fma (fpext x), (fpext y), (fneg z)) Patch by: Mateja Marjanovic Differential Revision: https://reviews.llvm.org/D98049	2021-11-29 16:27:22 +01:00
Mirko Brkusanin	5fe7fcd28e	[AMDGPU][GlobalISel] Transform (fsub (fneg (fmul, x, y)), z) -> (fma (fneg x), y, (fneg z)) Patch by: Mateja Marjanovic Differential Revision: https://reviews.llvm.org/D98048	2021-11-29 16:27:22 +01:00
Mirko Brkusanin	a782169270	[AMDGPU][GlobalISel] Transform (fsub (fmul x, y), z) -> (fma x, y, -z) Patch by: Mateja Marjanovic Differential Revision: https://reviews.llvm.org/D96614	2021-11-29 16:27:22 +01:00
Mirko Brkusanin	e5e49a08f1	[AMDGPU][GlobalISel] Transform (fadd (fma x, y, (fpext (fmul u, v))), z) -> (fma x, y, (fma (fpext u), (fpext v), z)) Patch by: Mateja Marjanovic Differential Revision: https://reviews.llvm.org/D98047	2021-11-29 16:27:21 +01:00
Mirko Brkusanin	f732292536	[AMDGPU][GlobalISel] Transform (fadd (fma x, y, (fmul u, v)), z) -> (fma x, y, (fma u, v, z)) Patch by: Mateja Marjanovic Differential Revision: https://reviews.llvm.org/D97938	2021-11-29 16:27:21 +01:00
Mirko Brkusanin	8951136216	[AMDGPU][GlobalISel] Transform (fadd (fpext (fmul x, y)), z) -> (fma (fpext x), (fpext y), z) Patch by: Mateja Marjanovic Differential Revision: https://reviews.llvm.org/D97937	2021-11-29 16:27:21 +01:00
Mirko Brkusanin	881840fc26	[AMDGPU][GlobalISel] Transform (fadd (fmul x, y), z) -> (fma x, y, z) Patch by: Mateja Marjanovic Differential Revision: https://reviews.llvm.org/D93305	2021-11-29 16:27:21 +01:00
Bjorn Pettersson	297fb66484	Use a deterministic order when updating the DominatorTree This solves a problem with non-deterministic output from opt due to not performing dominator tree updates in a deterministic order. The problem that was analysed indicated that JumpThreading was using the DomTreeUpdater via llvm::MergeBasicBlockIntoOnlyPred. When preparing the list of updates to send to DomTreeUpdater::applyUpdates we iterated over a SmallPtrSet, which didn't give a well-defined order of updates to perform. The added domtree-updates.ll test case is an example that would result in non-deterministic printouts of the domtree. Semantically those domtree:s are equivalent, but it show the fact that when we use the domtree iterator the order in which nodes are visited depend on the order in which dominator tree updates are performed. Since some passes (at least EarlyCSE) are iterating over nodes in the dominator tree in a similar fashion as the domtree printer, then the order in which transforms are applied by such passes, transitively, also depend on the order in which dominator tree updates are performed. And taking EarlyCSE as an example the end result could be different depending on in which order the transforms are applied. Reviewed By: nikic, kuhar Differential Revision: https://reviews.llvm.org/D110292	2021-11-29 13:14:50 +01:00
Bradley Smith	6180806632	[AArch64][SVE] Mark fixed-type FP extending/truncating loads/stores as custom This allows the generic DAG combine to fold fp_extend/fp_trunc into loads/stores which we can then lower into a integer extending load/truncating store plus an FP_EXTEND/FP_ROUND. The nuance here is that fixed-type FP_EXTEND/FP_ROUND require unpacked types hence lowering them introduces an unpack/zip. By allowing these nodes to be combined with loads/store we make it much easier to have this unpack/zip combined into the load/store by our custom lowering. Differential Revision: https://reviews.llvm.org/D114580	2021-11-29 11:56:07 +00:00
David Sherwood	a31f4bdfe8	[CodeGen][SVE] Use whilelo instruction when lowering @llvm.get.active.lane.mask In most common cases the @llvm.get.active.lane.mask intrinsic maps directly to the SVE whilelo instruction, which already takes overflow into account. However, currently in SelectionDAGBuilder::visitIntrinsicCall we always lower this immediately to a generic sequence of instructions that explicitly take overflow into account. This makes it very difficult to then later transform back into a single whilelo instruction. Therefore, this patch introduces a new TLI function called shouldExpandGetActiveLaneMask that asks if we should lower/expand this to a sequence of generic ISD nodes, or instead just leave it as an intrinsic for the target to lower. You can see the significant improvement in code quality for some of the tests in this file: CodeGen/AArch64/active_lane_mask.ll Differential Revision: https://reviews.llvm.org/D114542	2021-11-29 08:08:17 +00:00
Kazu Hirata	fd7d40640d	[llvm] Use range-based for loops (NFC)	2021-11-28 18:14:49 -08:00
Kazu Hirata	c73fc74ce0	[llvm] Use range-based for loops (NFC)	2021-11-28 10:04:54 -08:00
Kristina Bessonova	9043289326	[DwarfCompileUnit] Set parent DIE right after creating a local entity No functional changes intended. Before this patch DwarfCompileUnit::createScopeChildrenDIE() and DwarfCompileUnit::createAndAddScopeChildrenDIE() used to emit child subtrees and then when all the children get created, attach them to a parent scope DIE. However, when a DIE doesn't have a parent, all the requests for its unit DIE fail. Currently, this is not a big issue since it isn't usually needed to know unit DIE for a local (function-scoped) entity. But once we introduce lexical blocks as a valid scope for global variables (static locals) and type DIEs, any requests for a unit DIE need to be guarded against local scope due to the potential absence of the DIE's parent. To avoid the aforementioned issue, this patch refactors a few DwarfCompileUnit methods to support the idea of attaching a DIE to its parent as close to the creation of this DIE as possible. Reviewed By: ellis Differential Revision: https://reviews.llvm.org/D114350	2021-11-27 17:59:07 +02:00
Kazu Hirata	387927bbaf	[Target] Use range-based for loops (NFC)	2021-11-26 21:21:17 -08:00
Nikita Popov	bfa91f38a9	[DAG] Restore dropped condition This was dropped in `fcee33bd5a`, presumably accidentally.	2021-11-26 21:18:54 +01:00
Simon Pilgrim	fcee33bd5a	[DAG] Pull out repeated isLittleEndian() calls. NFC.	2021-11-26 18:41:56 +00:00
Abinav Puthan Purayil	4af45f10cc	[GlobalISel] Fold or of shifts to funnel shift. This change folds a basic funnel shift idiom: - (or (shl x, amt), (lshr y, sub(bw, amt))) -> fshl(x, y, amt) - (or (shl x, sub(bw, amt)), (lshr y, amt)) -> fshr(x, y, amt) This also helps in folding to rotate shift if x and y are equal since we already have a funnel shift to rotate combine. Differential Revision: https://reviews.llvm.org/D114499	2021-11-26 17:05:29 +05:30
Simon Pilgrim	2778f9a9f6	[DAG] SimplifyDemandedVectorElts - attempt to handle ADD(x,x) as single use If the ADD node is the only user of the repeated operand, then treat this as single use - allows us to peek through shl(x,1) patterns.	2021-11-26 10:32:10 +00:00
David Sherwood	86137fb722	[CodeGen] Add scalable vector support for lowering of llvm.get.active.lane.mask Currently the generic lowering of llvm.get.active.lane.mask is done in SelectionDAGBuilder::visitIntrinsicCall and currently assumes only fixed-width vectors are used. This patch changes the code to be more generic and support scalable vectors too. I have added tests for SVE here: CodeGen/AArch64/active_lane_mask.ll although the code quality leaves a lot to be desired. The code will be improved significantly in a later patch that makes use of the SVE whilelo instruction. Differential Revision: https://reviews.llvm.org/D114541	2021-11-26 08:17:55 +00:00
Kazu Hirata	259cd6f893	[llvm] Use range-based for loops (NFC)	2021-11-25 22:17:10 -08:00
Jeremy Morse	536b9eb31e	[DebugInfo][InstrRef] Add extra indirection for NRVO tests In some scenarios, usually involving NRVO, we can issue indirect DBG_VALUEs after SelectionDAG, even in instruction referencing mode (if the variable is an argument). If the corresponding argument value is spilt to the stack, then we have: * Indirection from it being on the stack, * Indirection from it being a dbg.declare or a dbg.addr. However InstrRefBasedLDV only emits one level of indirection. This patch adds the second, by adding an extra DW_OP_deref if necessary. The two tests modified fail otherwise -- they feature some NRVO, and require two levels of indirection to be correct. Differential Revision: https://reviews.llvm.org/D114364	2021-11-25 21:43:38 +00:00
Jeremy Morse	3107081e94	[DebugInfo][InstrRef] Avoid some quadratic behaviour in LiveDebugVariables This is a performance patch -- LiveDebugVariables can behave quadratically if a lot of debug instructions are inserted back into the same place, and we have to repeatedly step-over hte ones we've already inserted. To get around it, whenever we insert a debug instruction at a slot index, check whether there are more debug instructions to insert at this point, and insert them too. That avoids the repeated lookup and stepping through. It relies on the container for unlinked debug instructions being recorded in-order, which is how LiveDebugVariables currently does it. Differential Revision: https://reviews.llvm.org/D114587	2021-11-25 20:31:00 +00:00
Kazu Hirata	bfd5dd1568	[llvm] Use range-based for loops (NFC)	2021-11-25 08:55:16 -08:00
Jeremy Morse	102d2a8a99	[DebugInfo][InstrRef] Track variable assignments in out-of-scope blocks DBG_INSTR_REF's and DBG_VALUE's can end up in blocks that aren't in the lexical scope of their variable. It's arguable as to what we should do about this, however VarLocBasedLDV permits such variable locations to be propagated, so let's allow it in InstrRefBasedLDV. It's necessary for the modified test to work. Differential Revision: https://reviews.llvm.org/D114578	2021-11-25 14:52:11 +00:00
Simon Pilgrim	63b1e58f07	[DAG] SimplifyDemandedBits - simplify rotl/rotr to shl/srl (REAPPLIED) If we only demand bits from one half of a rotation pattern, see if we can simplify to a logical shift. For the ARM/AArch64 rev16/32 patterns, I had to drop a fold to prevent srl(bswap()) -> rotr(bswap) -> srl(bswap) infinite loops. I've replaced this with an isel PatFrag which should do the same task. Reapplied with fix for AArch64 rev patterns to matching the ARM fix. https://alive2.llvm.org/ce/z/iroxki (rol -> shl by amt iff demanded bits has at least as many trailing zeros as the shift amount) https://alive2.llvm.org/ce/z/4ez_U- (ror -> shl by revamt iff demanded bits has at least as many trailing zeros as the reverse shift amount) https://alive2.llvm.org/ce/z/cD7dR- (ror -> lshr by amt iff demanded bits has at least as many leading zeros as the shift amount) https://alive2.llvm.org/ce/z/_XGHtQ (rol -> lshr by revamt iff demanded bits has at least as many leading zeros as the reverse shift amount) Differential Revision: https://reviews.llvm.org/D114354	2021-11-25 11:14:15 +00:00
David Green	3a700cabdc	[SDAG] Allow Unknown sizes when refining MMO alignments. NFC The changes in D113888 / `32b6c17b29` altered the memory size of a masked store, as it will store an unknown number of bytes not the full vector size. We can have situations where the masked stores is legalized and then turned to a normal store, as the mask is known to be all ones. This creates a store with an unknown size MMO that was hitting this assert. The store created can be given a better size in a followup patch. This currently adjusts the assert to handle unknown sizes.	2021-11-25 10:19:29 +00:00
Jameson Nash	0332d105b9	GlobalISel: remove assert that memcpy Src and Dst addrspace must be identical The LangRef does not require these arguments to have the same type. Differential Revision: https://reviews.llvm.org/D93154	2021-11-24 20:23:05 -05:00
Zarko Todorovski	95875d246a	[LLVM][NFC]Inclusive language: remove occurances of sanity check/test from llvm Part of work to use more inclusive language in clang/llvm. Rewording some comments and change function and variable names.	2021-11-24 17:29:55 -05:00
Jeremy Morse	bfadc5dcbf	[DebugInfo][InstrRef] Cope with win32 calls changing SP in LiveDebugValues Almost all of the time, call instructions don't actually lead to SP being different after they return. An exception is win32's _chkstk, which which implements stack probes. We need to recognise that as modifying SP, so that copies of the value are tracked as distinct vla pointers. This patch adds a target frame-lowering hook to see whether stack probe functions will modify the stack pointer, store that in an internal flag, and if it's true then scan CALL instructions to see whether they're a stack probe. If they are, recognise them as defining a new stack-pointer value. The added test exercises this behaviour: two calls to _chkstk should be considered as producing two different values. Differential Revision: https://reviews.llvm.org/D114443	2021-11-24 19:56:21 +00:00
Jeremy Morse	133e25f946	[DebugInfo][InstrRef] Ignore SP clobbers on call instructions even more Avoid un-necessarily recreating DBG_VALUEs on call instructions. In LiveDebugvalues we choose to ignore any clobbers of SP by call instructions, as they're irrelevant to our model of the machine. We currently do so for tracking register values (MTracker); do the same for tracking variable locations (TTracker). Test modified to endure that a duplicate DBG_VALUE is not created after the call in struction in this test. Differential Revision: https://reviews.llvm.org/D114365	2021-11-24 17:25:48 +00:00
Benjamin Kramer	d32787230d	Revert "[DAG] SimplifyDemandedBits - simplify rotl/rotr to shl/srl" This reverts commit `3cf4a2c620`. It makes llc hang on the following test case. ``` target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128" target triple = "aarch64-unknown-linux-gnu" define dso_local void @_PyUnicode_EncodeUTF16() local_unnamed_addr #0 { entry: br label %while.body117.i while.body117.i: ; preds = %cleanup149.i, %entry %out.6269.i = phi i16* [ undef, %cleanup149.i ], [ undef, %entry ] %0 = load i16, i16* undef, align 2 %1 = icmp eq i16 undef, -10240 br i1 %1, label %fail.i, label %cleanup149.i cleanup149.i: ; preds = %while.body117.i %or130.i = call i16 @llvm.bswap.i16(i16 %0) #2 store i16 %or130.i, i16* %out.6269.i, align 2 br label %while.body117.i fail.i: ; preds = %while.body117.i ret void } ; Function Attrs: nofree nosync nounwind readnone speculatable willreturn declare i16 @llvm.bswap.i16(i16) #1 attributes #0 = { "target-features"="+neon,+v8a" } attributes #1 = { nofree nosync nounwind readnone speculatable willreturn } attributes #2 = { mustprogress nofree norecurse nosync nounwind readnone uwtable willreturn "frame-pointer"="non-leaf" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="generic" "target-features"="+neon,+v8a" } ```	2021-11-24 14:42:54 +01:00
Simon Pilgrim	3cf4a2c620	[DAG] SimplifyDemandedBits - simplify rotl/rotr to shl/srl If we only demand bits from one half of a rotation pattern, see if we can simplify to a logical shift. For the ARM rev16 patterns, I had to drop a fold to prevent srl(bswap()) -> rotr(bswap) -> srl(bswap) infinite loops. I've replaced this with an isel PatFrag which should do the same task. https://alive2.llvm.org/ce/z/iroxki (rol -> shl by amt iff demanded bits has at least as many trailing zeros as the shift amount) https://alive2.llvm.org/ce/z/4ez_U- (ror -> shl by revamt iff demanded bits has at least as many trailing zeros as the reverse shift amount) https://alive2.llvm.org/ce/z/cD7dR- (ror -> lshr by amt iff demanded bits has at least as many leading zeros as the shift amount) https://alive2.llvm.org/ce/z/_XGHtQ (rol -> lshr by revamt iff demanded bits has at least as many leading zeros as the reverse shift amount) Differential Revision: https://reviews.llvm.org/D114354	2021-11-24 11:28:35 +00:00
David Sherwood	cf40ca026f	[NFC] Tidy up SelectionDAGBuilder::visitIntrinsicCall to use existing sdl debug loc In quite a few places we were calling getCurSDLoc() to get the debug location, but this is already a local variable `sdl`. Differential Revision: https://reviews.llvm.org/D114447	2021-11-24 10:35:49 +00:00
Jeremy Morse	b8f68ad9cd	[DebugInfo][InstrRef] Avoid crash when values optimised out late in sdag It appears that we can emit all the instructions for a function, including debug instructions, and then optimise some of the values out late. Specifically, in the attached test case, an argument gets optimised out after DBG_VALUE / DBG_INSTR_REFs are created. This confuses MachineFunction::finalizeDebugInstrRefs, which expects to be able to find a defining instruction, and crashes instead. Fix this by identifying when there's no defining instruction, and translating that instead into a DBG_VALUE $noreg. Differential Revision: https://reviews.llvm.org/D114476	2021-11-24 10:34:48 +00:00
Jun Ma	17eb6b61de	Revert "[Taildup] Don't tail-duplicate loop header with multiple successors as its latches" This reverts commit `1f9fa54984`.	2021-11-24 10:26:37 +08:00
Matt Arsenault	273a0c8bc9	PrologEpilogInserter: Use explicit control for scavenge slot placement AMDGPU is unusual in that the both stack is indexed in the same direction as stack growth (up). We therefore always need the emergency stack slots placed as low as possible to ensure they are in range of load/store instruction immediate offsets. The existing logic is mostly OK, but failed if we required stack realignment. I don't understand what the existing control isFPCloseToIncomingSP is supposed to mean, but can only be used to stop placing the scavenge slots earlier. Make this explicit so that targets can opt-in rather than opt-out only.	2021-11-23 18:01:12 -05:00
Rong Xu	bf1138491a	[SampleFDO] Recompute BFI if the sample loader changes BPI The MIR sample loader changes the branch probability but not BFI. Here we force a recompute of BFI if the branch probabilities are changed. Also register the MIR FSAFDO passes properly. Differential Revision: https://reviews.llvm.org/D114400	2021-11-23 13:24:31 -08:00
Quinn Pham	1345bc5e16	[NFC][llvm] Inclusive language: remove instance of master in LiveRangeUtils.h [NFC] As part of using inclusive language within the llvm project, this patch replaces master with primary in `LiveRangeUtils.h`. Reviewed By: MatzeB Differential Revision: https://reviews.llvm.org/D114191	2021-11-23 13:07:42 -06:00
Kazu Hirata	d45cb1d7ea	[llvm] Use range-based for loops (NFC)	2021-11-23 08:54:48 -08:00
Simon Moll	1e65b93f3a	[VP] Canonicalize macros of VPIntrinsics.def Usage and naming of macros in VPIntrinsics.def has been inconsistent. Rename all property macros to VP_PROPERTY_<name>. Use BEGIN/END scope macros to attach properties to vp intrinsics and SDNodes (instead of specifying either directly with the property macro). A follow-up patch has documentation on how the macros are (intended) to be used. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D114144	2021-11-23 16:51:11 +01:00
David Green	32b6c17b29	[SDAG] Use UnknownSize for masked load/store MMO size A masked load or store will load a potentially unknown number of bytes from a memory location - that is not generally known at compile time. They do not necessarily load/store the entire vector width, and treating them as such can lead to incorrect aliasing information (for example, if the underlying object is smaller than the size of the vector). This makes sure that the MMO is given an unknown size to represent this. which is less accurate that "may load/store from up to 16 bytes", but less incorrect that "will load/store from 16 bytes". Differential Revision: https://reviews.llvm.org/D113888	2021-11-23 09:47:56 +00:00
Kazu Hirata	d5b73a70a0	[llvm] Use range-based for loops (NFC)	2021-11-22 20:33:28 -08:00
Nico Weber	2fb3c05b34	[asm] Merge EmitMSInlineAsmStr() and EmitGCCInlineAsmStr() This basically reverts `1778831a3d`, which split them. Since they were split 9 years ago, EmitGCCInlineAsmStr() grew a bunch of features that usually weren't added to EmitMSInlineAsmStr(), and that was usually a mistake. D71677, D113932, D114167 are all examples of where things were backported to EmitMSInlineAsmStr(). The names were also not great. EmitMSInlineAsmStr() used to be called for `asm inteldialect`, which clang produces for Microsoft-style __asm { ... } blocks as well for GCC-style __asm__ / asm statements with -masm=intel. On the other hand, EmitGCCInlineAsmStr() used to be called for `asm`, whic clang produces for GCC-style __asm__ / asm statements with -masm=att (the default). It's also less code (23 insertions, 188 deletions). No behavior change. Differential Revision: https://reviews.llvm.org/D114330	2021-11-22 11:49:57 -05:00
Nico Weber	7c2d51474a	[asm] Allow labels as operands in intel asm syntax This makes a line in llvm/test/CodeGen/X86/asm-block-labels.ll pass with `asm inteldialect` too. I don't know if this is something one can hit in practice with inline asm. The test is from 2007 (`4646aa3e33`) but in 2009 blockaddr was introduced and e.g. `__asm__ __volatile__("brl %0" :: "X"(&&foo) : "memory");` compiles to call void asm sideeffect "brl $0", "X,..."(i8* blockaddress(@func, %1)) nowadays (thanks to jrtc27 for that example!). (`6c4d255bf3` switched clang to blockaddress on an opt-in basis, `e4801f7844` added docs for it, `31b132c0b7` added IR support.) I half-heartedly tried to build clang 2.8 locally, but it didn't just build. And 2.8 didn't have a prebuilt clang binary yet. The motivation is to make EmitGCCInlineAsmStr() and EmitMSInlineAsmStr() more alike, and maybe we should delete this code form EmitGCCInlineAsmStr() instead. But since it's just 3 lines and it's reachable from LLVM IR, let's do the safer thing for now. Differential Revision: https://reviews.llvm.org/D114329	2021-11-22 11:49:29 -05:00
Kazu Hirata	c133fb321f	[CodeGen] Use llvm::is_contained (NFC)	2021-11-21 10:36:20 -08:00
Kazu Hirata	fc981cedea	[llvm] Use range-based for loops (NFC)	2021-11-21 10:36:18 -08:00
Kazu Hirata	f6bce30cf9	[llvm] Use range-based for loops (NFC)	2021-11-20 18:42:10 -08:00
Nico Weber	8b76d33c59	[asm] Allow block address operands in `asm inteldialect` This makes the following program build with -masm=intel: int foo(int count) { asm goto ("dec %0; jb %l[stop]" : "+r" (count) : : : stop); return count; stop: return 0; } It's also is another step towards merging EmitGCCInlineAsmStr() and EmitMSInlineAsmStr(). Differential Revision: https://reviews.llvm.org/D114167	2021-11-19 09:27:30 -05:00
Nico Weber	4f9a5c2a14	[asm] Remove explicit branch for modifier 'l' No intended behavior change. EmitGCCInlineAsmStr() used to explicitly check for modifier 'l' after handling block address and machine basic block operands. This prevented passing a MachineOperand with 'l' modifier to PrintAsmMemoryOperand(). Conceptually that seems kind of nice, but in practice the overrides of PrintAsmMemoryOperand() in all () AsmPrinter subclasses already reject modifiers they don't know about, and none of them don't know about 'l'. So removing this doesn't have a behavior difference, is less code, and it makes EmitGCCInlineAsmStr() and EmitMSInlineAsmStr() more similar, to prepare for merging them later. (Why not _add_ the branch to EmitMSInlineAsmStr() instead? Because that always works with X86AsmPrinter I think, and X86AsmPrinter::PrintAsmMemoryOperand() very decisively rejects the 'l' modifier, so it's hard to motivate adding that branch.) : The one exception was AVRAsmPrinter, which had an llvm_unreachable instead of returning true. So this commit changes that, so that the AVR target keeps emitting an error instead of crashing when passing a mem operand with a :l modifier to it. All the other targets already don't crash on this. Differential Revision: https://reviews.llvm.org/D114216	2021-11-19 09:19:53 -05:00
Simon Pilgrim	812e64ef0c	[DAG] MatchRotate - support rotate-by-constant of illegal types Patch to fix some of the regressions in D77804. By folding to rotate/funnel-shift by constant amounts for illegal types, we prevent SimplifyDemandedBits from destroying the patterns prematurely, allowing us to use the rotate/funnel-shift legalization that was added in D112443. Differential Revision: https://reviews.llvm.org/D113192	2021-11-19 11:12:04 +00:00
Kazu Hirata	7ca14f6044	[llvm] Use range-based for loops (NFC)	2021-11-18 09:09:52 -08:00
Eric Tang	9fe6b9e802	[TargetLowering][RISCV] Fixed a scalable vector issue when lowering [s\|u]mul.overflow intrinsics Fixed the vector type issue that where we used getVectorNumElements() should be replaced by getVectorElementCount() when lowering these intrinsics. This is similar to D94149 Signed-off-by: Eric Tang <tangxingxin1008@gmail.com> Reviewed By: craig.topper, frasercrmck Differential Revision: https://reviews.llvm.org/D109809	2021-11-18 10:16:08 +00:00
Craig Topper	d78fdf111d	[LegalizeTypes] Further limit expansion of CTTZ during type promotion. Don't expand CTTZ if CTPOP or CTLZ is supported on the promoted type. We have special handling for CTTZ expansion to use those ops with a small conversion. The setup for that doesn't generate extra code or large constants so we don't gain anything from expanding early and we make CTTZ_ZERO_UNDEF codegen worse. Follow up from post commit feedback on D112268. We don't seem to have any in tree tests that care about this.	2021-11-17 15:27:29 -08:00
Nico Weber	bf834b2629	[x86/asm] Let EmitMSInlineAsmStr() handle variants too This is preparation for D113707, where I want to make `-masm=intel` emit `asm inteldialect` instructions. `{movq %rbx, %rax\|mov rax, rbx}` is supposed to evaluate to the bit between { and \| for att and to the bit between \| and } for intel. Since intel will become `asm inteldialect`, which alls EmitMSInlineAsmStr(), EmitMSInlineAsmStr() has to support variants as well. (clang translates `{...\|...}` to `$(...$\|...$)`. I'm not sure why it doesn't just send along only the first `...` or the second `...` to LLVM, but given the notes in PR23933 let's not do a big reorganization in this codepath.) Differential Revision: https://reviews.llvm.org/D113932	2021-11-17 13:31:59 -05:00
Craig Topper	0274be28d7	[RISCV] Lower vector CTLZ_ZERO_UNDEF/CTTZ_ZERO_UNDEF by converting to FP and extracting the exponent. If we have a large enough floating point type that can exactly represent the integer value, we can convert the value to FP and use the exponent to calculate the leading/trailing zeros. The exponent will contain log2 of the value plus the exponent bias. We can then remove the bias and convert from log2 to leading/trailing zeros. This doesn't work for zero since the exponent of zero is zero so we can only do this for CTLZ_ZERO_UNDEF/CTTZ_ZERO_UNDEF. If we need a value for zero we can use a vmseq and a vmerge to handle it. We need to be careful to make sure the floating point type is legal. If it isn't we'll continue using the integer expansion. We could split the vector and concatenate the results but that needs some additional work and evaluation. Differential Revision: https://reviews.llvm.org/D111904	2021-11-17 10:29:41 -08:00
Nico Weber	103cc914d6	[x86/asm] Make variants work when converting at&t inline asm input to intel asm output `asm` always has AT&T-style input (`asm inteldialect` has Intel-style asm input), so EmitGCCInlineAsmStr() always has to pick the same variant since it cares about the input asm string, not the output asm string. For PowerPC, that default variant is 1. For other targets, it's 0. Without this, the included test case errors out with error: unknown use of instruction mnemonic without a size suffix mov rax, rbx since it picks the intel branch and then tries to interpret it as AT&T when selecting intel-style output with `-x86-asm-syntax=intel`. Differential Revision: https://reviews.llvm.org/D113894	2021-11-17 13:23:18 -05:00
DianQK	1e9fa0b12a	Fix the side effect of outlined function when the register is implicit use and implicit-def in the same instruction. This is the diff associated with {D95267}, and we need to mark $x0 as live whether or not $x0 is dead. The compiler also needs to mark register $x0 as live in for the following case. ``` $x1 = ADDXri $sp, 16, 0 BL @spam, csr_darwin_aarch64_aapcs, implicit-def dead $lr, implicit $sp, implicit $x0, implicit killed $x1, implicit-def $sp, implicit-def $x0 ``` This change fixes an issue where the wrong registers were used when -machine-outliner-reruns>0. As an example: ``` lang=c typedef struct { double v1; double v2; } D16; typedef struct { D16 v1; D16 v2; } D32; typedef long long LL8; typedef struct { long long v1; long long v2; } LL16; typedef struct { LL16 v1; LL16 v2; } LL32; typedef struct { LL32 v1; LL32 v2; } LL64; LL8 needx0(LL8 v0, LL8 v1); void bar(LL64 v1, LL32 v2, LL16 v3, LL32 v4, LL8 v5, D16 v6, D16 v7, D16 v8); LL8 foo(LL8 v0, LL64 v1, LL32 v2, LL16 v3, LL32 v4, LL8 v5, D16 v6, D16 v7, D16 v8) { LL8 result = needx0(v0, 0); bar(v1, v2, v3, v4, v5, v6, v7, v8); return result + 1; } ``` As you can see from the `foo` function, we should not modify the value of `x0` until we call `needx0`. This code is compiled to give the following instruction MIR code. ``` $sp = frame-setup SUBXri $sp, 256, 0 frame-setup STPDi killed $d13, killed $d12, $sp, 16 frame-setup STPDi killed $d11, killed $d10, $sp, 18 frame-setup STPDi killed $d9, killed $d8, $sp, 20 frame-setup STPXi killed $x26, killed $x25, $sp, 22 frame-setup STPXi killed $x24, killed $x23, $sp, 24 frame-setup STPXi killed $x22, killed $x21, $sp, 26 frame-setup STPXi killed $x20, killed $x19, $sp, 28 ... $x1 = MOVZXi 0, 0 BL @needx0, csr_darwin_aarch64_aapcs, implicit-def dead $lr, implicit $sp, implicit $x0, implicit $x1, implicit-def $sp, implicit-def $x0 ... ``` Since there are some other instruction sequences that duplicate `foo`, after the first execution of Machine Outliner you will get: ``` $sp = frame-setup SUBXri $sp, 256, 0 frame-setup STPDi killed $d13, killed $d12, $sp, 16 frame-setup STPDi killed $d11, killed $d10, $sp, 18 frame-setup STPDi killed $d9, killed $d8, $sp, 20 $x7 = ORRXrs $xzr, $lr, 0 BL @OUTLINED_FUNCTION_0, implicit-def $lr, implicit $sp, implicit-def $lr, implicit $sp, implicit $xzr, implicit $x7, implicit $x19, implicit $x20, implicit $x21, implicit $x22, implicit $x23, implicit $x24, implicit $x25, implicit $x26 $lr = ORRXrs $xzr, $x7, 0 ... BL @OUTLINED_FUNCTION_1, implicit-def $lr, implicit $sp, implicit-def $lr, implicit-def $sp, implicit-def $x0, implicit-def $x1, implicit $sp ... ``` For the first time we outlined the following sequence: ``` frame-setup STPXi killed $x26, killed $x25, $sp, 22 frame-setup STPXi killed $x24, killed $x23, $sp, 24 frame-setup STPXi killed $x22, killed $x21, $sp, 26 frame-setup STPXi killed $x20, killed $x19, $sp, 28 ``` and ``` $x1 = MOVZXi 0, 0 BL @needx0, csr_darwin_aarch64_aapcs, implicit-def dead $lr, implicit $sp, implicit $x0, implicit $x1, implicit-def $sp, implicit-def $x0 ``` When we execute the outline again, we will get: ``` $x0 = ORRXrs $xzr, $lr, 0 <---- here BL @OUTLINED_FUNCTION_2_0, implicit-def $lr, implicit $sp, implicit-def $sp, implicit-def $lr, implicit $sp, implicit $xzr, implicit $d8, implicit $d9, implicit $d10, implicit $d11, implicit $d12, implicit $d13, implicit $x0 $lr = ORRXrs $xzr, $x0, 0 $x7 = ORRXrs $xzr, $lr, 0 BL @OUTLINED_FUNCTION_0, implicit-def $lr, implicit $sp, implicit-def $lr, implicit $sp, implicit $xzr, implicit $x7, implicit $x19, implicit $x20, implicit $x21, implicit $x22, implicit $x23, implicit $x24, implicit $x25, implicit $x26 $lr = ORRXrs $xzr, $x7, 0 ... BL @OUTLINED_FUNCTION_1, implicit-def $lr, implicit $sp, implicit-def $lr, implicit-def $sp, implicit-def $x0, implicit-def $x1, implicit $sp ``` When calling `OUTLINED_FUNCTION_2_0`, we used `x0` to save the `lr` register. The reason for the above error appears to be that: ``` BL @OUTLINED_FUNCTION_1, implicit-def $lr, implicit $sp, implicit-def $lr, implicit-def $sp, implicit-def $x0, implicit-def $x1, implicit $sp ``` should be: ``` BL @OUTLINED_FUNCTION_1, implicit-def $lr, implicit $sp, implicit-def $lr, implicit-def $sp, implicit-def $x0, implicit-def $x1, implicit $sp, implicit $x0 ``` When processing the same instruction with both `implicit-def $x0` and `implicit $x0` we should keep `implicit $x0`. A reproducible demo is available at: [https://github.com/DianQK/reproduce_outlined_function_use_live_x0](https://github.com/DianQK/reproduce_outlined_function_use_live_x0). Reviewed By: jinlin Differential Revision: https://reviews.llvm.org/D112911	2021-11-17 09:44:10 -08:00
Mirko Brkusanin	db6bc2ab51	[AMDGPU][GlobalISel] Fold G_FNEG above when users cannot fold mods If possible fold fneg into instruction above if users cannot fold mods and we know it will decrease instruction count. Follows same logic as SDAG combiner in choosing opportunities to combine. Differential Revision: https://reviews.llvm.org/D112827	2021-11-17 14:25:13 +01:00
David Sherwood	8d77555b12	[Analysis] Ensure getTypeLegalizationCost returns a simple VT for TypeScalarizeScalableVector When getTypeConversion returns TypeScalarizeScalableVector we were sometimes returning a non-simple type from getTypeLegalizationCost. However, many callers depend upon this being a simple type and will crash if not. This patch changes getTypeLegalizationCost to ensure that we always a return sensible simple VT. If the vector type contains unusual integer types, e.g. <vscale x 2 x i3>, then we just set the type to MVT::i64 as a reasonable default. A test has been added here that demonstrates the vectoriser can correctly calculate the cost of vectorising a "zext i3 to i64" instruction with a VF=vscale x 1: Transforms/LoopVectorize/AArch64/sve-inductions-unusual-types.ll Differential Revision: https://reviews.llvm.org/D113777	2021-11-17 13:11:58 +00:00
Simon Pilgrim	5fedbd5b18	[DAG] SimplifyDemandedVectorElts - zero_extend_vector_inreg(and(x,c)) -> and(x,c') If we've only demanded the 0'th element, and it comes from a (one-use) AND, try to convert the zero_extend_vector_inreg into a mask and constant fold it with the AND.	2021-11-17 12:41:48 +00:00
Jay Foad	3264e95938	[CodeGen] Update LiveIntervals in TargetInstrInfo::convertToThreeAddress Delegate updating of LiveIntervals to each target's convertToThreeAddress implementation, instead of repairing LiveIntervals after the fact in TwoAddressInstruction::convertInstTo3Addr. Differential Revision: https://reviews.llvm.org/D113493	2021-11-17 10:16:47 +00:00
Eric Tang	f7eb061a5f	[SelectionDAG] Make WidenVecRes_SELECT work for scalable vectors This change make WidenVecRes_SELECT work for scalable vectors. This patch is split from [D110319](https://reviews.llvm.org/D110319) Signed-off-by: Eric Tang <tangxingxin1008@gmail.com> Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D110388	2021-11-17 08:55:11 +00:00
Aaron Puchert	b20da5117f	Don't add irrelevant items to queue in DwarfCompileUnit::createScopeChildrenDIE (NFC) Instead of popping them and then immediately throwing them away, we can just filter out globals and items in different scopes before adding them to WorkList. Shouldn't change anything but keep the queue smaller. Reviewed By: aprantl Differential Revision: https://reviews.llvm.org/D113864	2021-11-17 00:01:20 +01:00
Aaron Puchert	86b3100cde	[DebugInfo] Use DbgEntityKind in DbgEntity interface (NFC) It was being used occasionally already, and using it on the constructor and getDbgEntityID has obvious type safety benefits. Also use llvm_unreachable in the switch as usual, but since only these two values are used in constructor calls I think it's still NFC. Reviewed By: probinson Differential Revision: https://reviews.llvm.org/D113862	2021-11-17 00:01:20 +01:00
Mircea Trofin	c6b9b702a0	[NFC][Regalloc] Factor out eviction decision from eviction attempt This splits tryEvict into a const tryFindEvictionCandidate, which attempts to find a candidate, and the actual eviction (should the former be successful) The newly introduced tryFindEvictionCandidate will move subsequently into the RegAllocEvictionAdvisor. RFC: https://lists.llvm.org/pipermail/llvm-dev/2021-November/153639.html Differential Revision: https://reviews.llvm.org/D113941	2021-11-16 10:50:23 -08:00
Kazu Hirata	ee0133dc6d	[llvm] Use range-for loops (NFC)	2021-11-16 09:01:56 -08:00
Frederik Gossen	3f3d4e8a15	Fix unused variable warning in LoadStoreOpt.cpp with (void)	2021-11-16 12:03:59 +01:00
Frederik Gossen	2bceb7c8da	Revert "Fix unused variable in llvm/lib/CodeGen/GlobalISel/LoadStoreOpt.cpp" This reverts commit `40a609aebe`.	2021-11-16 12:00:17 +01:00
Frederik Gossen	ecfe7a3404	Revert "Fix unused variable warning." This reverts commit `a062e2a8ca`.	2021-11-16 11:59:34 +01:00
Frederik Gossen	9a6817b7ed	Revert "Fix another unused variable error." This reverts commit `5b84ae7c48`.	2021-11-16 11:58:02 +01:00

... 3 4 5 6 7 ...

31928 Commits