llvm-project

Commit Graph

Author	SHA1	Message	Date
Kai Nacke	d897a14c2e	[SystemZ] Fix check for zero size when lowering memcmp. During lowering of memcmp/bcmp, the check for a size of 0 is done in 2 different ways. In rare cases this can lead to a crash in SystemZSelectionDAGInfo::EmitTargetCodeForMemcmp(). The root cause is that SelectionDAGBuilder::visitMemCmpBCmpCall() checks for a constant int value which is not yet evaluated. When the value is turned into a SDValue, then the evaluation is done and results in a ConstantSDNode. But EmitTargetCodeForMemcmp() expects the special case of 0 length to be handled, which results in an assertion. The fix is to turn the value into a SDValue, so that both functions use the same check. Reviewed By: uweigand Differential Revision: https://reviews.llvm.org/D126900	2022-06-08 14:52:13 -04:00
Simon Pilgrim	b84c10d4bc	[DAG] visitVSELECT - don't wait for truncation of sub before attempting to match with getTruncatedUSUBSAT Fixes some X86 PSUBUS regressions encountered in D127115 where the truncate was being replaced with a PACKSS/PACKUS before the fold got called again	2022-06-08 16:16:35 +01:00
Joseph Huber	9e0dbd2a2a	[Target] Remove `startswith` for adding `SHF_EXCLUDE` to offload section Summary: We use the special section name `.llvm.offloading` to store device images in the host object file. We want these to be stripped by the linker as they are not used after linking so we use the `SHF_EXCLUDE` flag to instruct the linker to drop them. We used to do this for all sections that started with `.llvm.offloading` when we encoded metadata in the section name itself. Now that we embed a special binary containing the metadata, we should only add the flag on this name specifically.	2022-06-08 09:56:51 -04:00
Paul Walker	d88354213c	[SelectionDAG] Remove invalid TypeSize conversion from PromoteIntRes_BITCAST. Extend the TypeWidenVector case of PromoteIntRes_BITCAST to work with TypeSize directly rather than silently casting to unsigned. To accomplish this I've extended TypeSize with an interface that essentially allows TypeSize division when both operands have the same number of dimensions. There still exist combinations of scalable vector bitcasts that cause compiler crashes. I call these out by adding "is missing" entries to sve-bitcast. Depends on D126957. Fixes: #55114 Differential Revision: https://reviews.llvm.org/D127126	2022-06-08 10:30:07 +01:00
Paul Walker	a1121c31d8	[SVE] Fix incorrect code generation for bitcasts of unpacked vector types. Bitcasting between unpacked scalable vector types of different element counts is not a NOP because the live elements are laid out differently, e.g. with positions 01234567: nxv2i32 = XX??XX??, nxv4f16 = X?X?X?X?. Differential Revision: https://reviews.llvm.org/D126957	2022-06-08 10:30:07 +01:00
Chuanqi Xu	0e10f12844	[NFC] Remove commented cerr debugging loggings There are some unused, commented-out cerr debug logging statements in the code. It is odd to keep such commented-out debug helpers in the product.	2022-06-08 15:58:06 +08:00
Kito Cheng	7207373e1e	Revert "[SplitKit] Handle early clobber + tied to def correctly" Reverted because of failures with LLVM_ENABLE_EXPENSIVE_CHECKS. This reverts commit `e14d04909d`.	2022-06-08 13:05:35 +08:00
Kito Cheng	e14d04909d	[SplitKit] Handle early clobber + tied to def correctly The splitter tries to extend a live range into the `r` slot for a use operand. That works in most situations, but not when the operand is tied to a def and the def operand is early-clobber. An example showing what goes wrong: 0 %0 = ... ; 16 early-clobber %0 = Op %0 (tied-def 0), ... ; 32 ... = Op %0. Before extending: %0 = [0r, 0d) [16e, 32d). In this case we want to extend from 0d to 16e, not 16r; if we use 16r here we extend nothing, because that point is already contained in [16e, 32d). This patch adds a check to detect such cases and adjusts the extension point. Detailed explanation of the test case: https://reviews.llvm.org/D126047 Reviewed By: MatzeB Differential Revision: https://reviews.llvm.org/D126048	2022-06-08 11:33:05 +08:00
David Penry	907aedbb3d	[NFC] Fix spelling/newlines in comments/debug messages Just a few spelling mistakes and missing newlines Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D127162	2022-06-07 09:38:53 -07:00
Simon Pilgrim	a083f3caa1	[DAG] combineShuffleOfSplatVal - fold shuffle(splat,undef) -> splat, iff the splat contains no UNDEF elements As noticed on D127115 - we were missing this fold, instead just having the shuffle(shuffle(x,undef,splatmask),undef) fold. We should be able to merge these into one using SelectionDAG::isSplatValue, but we'll need to match the shuffle's undef handling first. This also exposed an issue in SelectionDAG::isSplatValue which was incorrectly propagating the undef mask across a bitcast (it was trying to just bail with a APInt::isSubsetOf if it found any undefs but that was actually the wrong way around so didn't fire for partial undef cases).	2022-06-07 16:42:24 +01:00
Matt Arsenault	56303223ac	llvm-reduce: Don't assert on functions which don't track liveness Use the query that doesn't assert if TracksLiveness isn't set, which needs to always be available. We also need to start printing liveins regardless of TracksLiveness.	2022-06-07 10:00:25 -04:00
Guillaume Chatelet	0788186182	[Alignment][NFC] Remove usage of MemSDNode::getAlignment I can't remove the function just yet as it is used in the generated .inc files. I would also like to provide a way to compare alignment with TypeSize since it came up a few times. Differential Revision: https://reviews.llvm.org/D126910	2022-06-07 13:52:20 +00:00
Nikita Popov	5a64bc207e	[DAGCombiner] Remove overzealous assertion when folding assert+trunc+assert (PR55846) These assert that there are no "useless" assertzext/assertsext nodes (that assert a wider width than a following trunc), but I don't think there is anything preventing such nodes from reaching this code. I don't think the assertion is relevant for correctness of this transform either -- if such an assert is present, then the other one will always be to a smaller width, and we'll pick that one. The assertion dates back to D37017. Fixes https://github.com/llvm/llvm-project/issues/55846. Differential Revision: https://reviews.llvm.org/D126952	2022-06-07 09:50:26 +02:00
Fangrui Song	15d82c62dc	[MC] De-capitalize MCStreamer functions Follow-up to `c031378ce0` . The class is mostly consistent now.	2022-06-07 00:31:02 -07:00
Hendrik Greving	a43d25734a	[ModuloSchedule] Fix terminator update when peeling. Fixes a bug of us not correctly updating the terminator of the loop's preheader, if multiple terminating branch instructions are present. This is tested through existing tests. The bug itself is hard or not possible to get exposed with the upstream Hexagon backend, because the machine pipeliner checks for an existing preheader, which is defined as a block with only 1 edge into the header. The condition of this bug is a block into the loop with more than 1 edge, and not every downstream target checks for an existing preheader. Differential Revision: https://reviews.llvm.org/D126386	2022-06-06 19:52:28 +00:00
Michael Kitzan	b7fcf6632f	[GISel] Add new combines for G_ADD Patch adds new GICombineRules for G_ADD: G_ADD(x, G_SUB(y, x)) -> y and G_ADD(G_SUB(y, x), x) -> y. Patch additionally adds new combine tests for AArch64 target for these new rules. Reviewed by: paquette Differential Revision: https://reviews.llvm.org/D87936	2022-06-06 11:19:45 -07:00
Craig Topper	be398100ea	[SelectionDAG] Further improve computeKnownBits for (smax X, C) where C is non-negative. Move the code that was added for D126896 after the normal recursive calls to computeKnownBits. This allows us to calculate trailing zeros. Previously we would break out of the switch before the recursive calls.	2022-06-06 09:59:23 -07:00
Kazu Hirata	5c06f7168f	[CodeGen] Remove splitCanCauseEvictionChain and its helpers (NFC) The last use was removed on Mar 7, 2022 in commit `294eca35a0`.	2022-06-05 20:22:47 -07:00
Kazu Hirata	43d4585e64	[GlobalISel] Remove widenWithUnmerge (NFC) The last use was removed on Dec 23, 2021 in commit `29f88b93fd`.	2022-06-05 19:58:18 -07:00
Kazu Hirata	61abcb0b37	[GlobalISel] Remove valueIsSplit (NFC) The last use was removed on Jun 27, 2019 in commit `8138996128`.	2022-06-05 19:51:03 -07:00
Lian Wang	20cf77f776	[LegalizeTypes][VP] Add widen and split support for vp.fptrunc and vp.fpext Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D126439	2022-06-06 02:28:01 +00:00
Kazu Hirata	3b9707dbc0	[llvm] Convert for_each to range-based for loops (NFC)	2022-06-05 12:07:14 -07:00
Alexey Lapshin	501d5b24db	[Debuginfo][DWARF][NFC] Refactor DwarfStringPoolEntryRef - remove isIndexed(). This patch is extraction from the https://reviews.llvm.org/D126883. It removes DwarfStringPoolEntryRef::isIndexed() and isIndexed bit since they are not used. Differential Revision: https://reviews.llvm.org/D126958	2022-06-05 21:18:31 +03:00
Fangrui Song	95a134254a	Remove unneeded cl::ZeroOrMore for cl::opt/cl::list options	2022-06-05 01:07:51 -07:00
Fangrui Song	d86a206f06	Remove unneeded cl::ZeroOrMore for cl::opt/cl::list options	2022-06-05 00:31:44 -07:00
Kazu Hirata	bcf4fa458a	[CodeGen] Use a range-based for loop (NFC)	2022-06-04 22:26:55 -07:00
Kazu Hirata	4969a6924d	Use llvm::less_first (NFC)	2022-06-04 21:23:18 -07:00
Kazu Hirata	32ce076d78	[CodeGen] Use StringRef::contains (NFC)	2022-06-04 20:58:58 -07:00
Fangrui Song	36c7d79dc4	Remove unneeded cl::ZeroOrMore for cl::opt options Similar to `557efc9a8b`. This commit handles options where cl::ZeroOrMore is more than one line below cl::opt.	2022-06-04 00:10:42 -07:00
Fangrui Song	557efc9a8b	[llvm] Remove unneeded cl::ZeroOrMore for cl::opt options. NFC Some cl::ZeroOrMore were added to avoid the `may only occur zero or one times!` error. More were added due to cargo cult. Since the error has been removed, cl::ZeroOrMore is unneeded. Also remove cl::init(false) while touching the lines.	2022-06-03 21:59:05 -07:00
Benjamin Kramer	e8e4b741dd	[DAGCombiner] Add bf16 to the matrix of types that we don't promote to integer stores Remove a few stray semicolons while there.	2022-06-03 13:28:34 +02:00
Nikita Popov	ad742cf85d	[DAGCombine] Handle promotion of shift with both operands the same When promoting a shift, make sure we only fetch the second operand after promoting the first. Load promotion may replace users of the old load, and we don't want to be left with a dangling reference to the old load instruction. The crashing test case is from https://reviews.llvm.org/D126689#3553212. Differential Revision: https://reviews.llvm.org/D126886	2022-06-03 10:00:44 +02:00
Craig Topper	fa20bf1636	[DAGCombiner][RISCV] Improve computeKnownBits for (smax X, C) where C is non-negative. If C is non-negative, the result of the smax must also be non-negative, so all sign bits of the result are 0. This allows DAGCombiner to remove a zext_inreg in the modified test. This zext_inreg started as a sext that became zext before type legalization then was promoted to a zext_inreg. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D126896	2022-06-02 12:34:24 -07:00
jacquesguan	5482ae6328	[LegalizeTypes][VP] Add widen and split support for VP FP integer casting op. This patch adds widen and split support for VP_FPTOSI, VP_FPTOUI, VP_SITOFP and VP_UITOFP. Differential Revision: https://reviews.llvm.org/D126847	2022-06-02 09:05:27 +00:00
jacquesguan	058791d8f2	[LegalizeTypes][VP] Add widen and split support for VP_SIGN_EXTEND and VP_ZERO_EXTEND. Differential Revision: https://reviews.llvm.org/D126442	2022-06-02 02:21:22 +00:00
Matt Arsenault	4cb722acbc	BranchFolder: Require NoPHIs The pass doesn't handle SSA and breaks any phis.	2022-06-01 21:14:49 -04:00
Hendrik Greving	a92ed167f2	[ValueTypes] Define MVTs for v128i2/v64i4 as well as i2 and i4. Adds MVT::v128i2, MVT::v64i4, and implied MVT::i2, MVT::i4. Keeps MVT::i2, MVT::i4 lowering actions as expand, which should be removed once targets set this explicitly. Adjusts 11 lit tests to reflect slightly different behavior during DAG combine. Differential Revision: https://reviews.llvm.org/D125247	2022-06-02 00:49:11 +00:00
Quentin Colombet	1a155ee7de	[RegisterClassInfo] Invalidate cached information if ignoreCSRForAllocationOrder changes Even if CSR list is same between functions, we could have had a different allocation order if ignoreCSRForAllocationOrder is evaluated differently. Hence invalidate cached register class information if ignoreCSRForAllocationOrder changes. Patch by Srividya Karumuri <srividya_karumuri@apple.com> Differential Revision: https://reviews.llvm.org/D126565	2022-06-01 17:15:51 -07:00
Hendrik Greving	e9d05cc7d8	Revert "[ValueTypes] Define MVTs for v128i2/v64i4 as well as i2 and i4." This reverts commit `430ac5c302`. Due to failures in Clang tests. Differential Revision: https://reviews.llvm.org/D125247	2022-06-01 13:27:49 -07:00
Hendrik Greving	430ac5c302	[ValueTypes] Define MVTs for v128i2/v64i4 as well as i2 and i4. Adds MVT::v128i2, MVT::v64i4, and implied MVT::i2, MVT::i4. Keeps MVT::i2, MVT::i4 lowering actions as `expand`, which should be removed once targets set this explicitly. Adjusts 11 lit tests to reflect slightly different behavior during DAG combine. Differential Revision: https://reviews.llvm.org/D125247	2022-06-01 12:48:01 -07:00
Denis Antrushin	7047d79fde	[TwoAddressInstructionPass] Relax assert in statepoint processing. D124631 added special processing for STATEPOINT instructions. It appears that assertion added there is too strong. We can get two tied operands with the same register tied to different defs. If we hit such case, do not process it in statepoint-specific code and delegate it to common case.	2022-06-01 21:34:52 +07:00
Matt Arsenault	0e1c71e4a4	CodeGen: Move getAddressSpaceForPseudoSourceKind into TargetMachine Avoid the dependency on TargetInstrInfo, which depends on the subtarget and therefore the individual function. Currently AMDGPU is constructing PseudoSourceValue instances in MachineFunctionInfo. In order to facilitate copying MachineFunctionInfo, we need to stop allocating these there. Alternatively we could allow targets to subclass PseudoSourceValueManager, and allocate them similarly to MachineFunctionInfo.	2022-06-01 09:45:40 -04:00
Martin Storsjö	6b75a3523f	[ARM] [MC] Add support for writing ARM WinEH unwind info This includes .seh_* directives for generating it from assembly. It is designed fairly similarly to the ARM64 handling. For .seh_handler directives, such as ".seh_handler __C_specific_handler, @except" (which is supported on x86_64 and aarch64 so far), the "@except" bit doesn't work in ARM assembly, as '@' is used as a comment character (on all current platforms). Allow using '%' instead of '@' for this purpose. This convention is used by GAS in similar contexts already, e.g. [1]: Note on targets where the @ character is the start of a comment (eg ARM) then another character is used instead. For example the ARM port uses the % character. In practice, this unfortunately means that all such .seh_handler directives will need ifdefs for ARM. Contrary to ARM64, on ARM, it's quite common that we can't evaluate e.g. the function length at this point, due to instructions whose length is finalized later. (Also, inline jump tables end with a ".p2align 1".) If unable to evaluate the function length immediately, emit it as an MCExpr instead. If we'd implement splitting the unwind info for a function (which isn't implemented for ARM64 yet either), we wouldn't know whether we need to split it though. Avoid calling getFrameIndexOffset() on an unset FuncInfo.UnwindHelpFrameIdx, to avoid triggering asserts in the preexisting testcase CodeGen/ARM/Windows/wineh-basic.ll. (Once MSVC exception handling is fully implemented, those changes can be reverted.) [1] https://sourceware.org/binutils/docs/as/Section.html#Section Differential Revision: https://reviews.llvm.org/D125645	2022-06-01 11:25:48 +03:00
Ping Deng	ae8ae45e2a	[DAGCombine][NFC] Add braces to 'else' to match braced 'if' Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D126624	2022-06-01 07:54:05 +00:00
Bjorn Pettersson	86caa03718	Revert "Round up zero-sized symbols to 1 byte in `.debug_aranges`." This reverts commit `256a52d9aa` (and also the follow-up commit `38eb4fe74b` that moved a test case to a different directory). As discussed in https://reviews.llvm.org/D126257 there is a suspicion that something was wrong with this commit as text section range was shortened to 1 byte rather than rounded up as shown in the llvm/test/DebugInfo/X86/dwarf-aranges.ll test case.	2022-05-31 11:03:44 +02:00
Denis Antrushin	85322e82be	[TwoAddressInstructionPass] Special processing of STATEPOINT instruction. STATEPOINT is a special pseudo instruction which represents moving GC semantics to LLVM. Every tied def/use VReg pair in STATEPOINT represents the same physical register, which can 'magically' change during the call wrapped by the statepoint. (By construction, a tied use operand is not live across STATEPOINT.) This means that when converting into two-address form, there is no need to insert a COPY instruction before the statepoint, which is what the TwoAddressInstruction pass does for 'regular' instructions. Reviewed By: MatzeB Differential Revision: https://reviews.llvm.org/D124631	2022-05-30 19:07:30 +03:00
Simon Moll	18c1ee04de	Re-land "[VP] vp intrinsics are not speculatable" with test fix Update the llvmir-intrinsics.mlir test to account for the modified attribute sets. This reverts commit `2e2a8a2d90`.	2022-05-30 14:41:15 +02:00
Mehdi Amini	2e2a8a2d90	Revert "[VP] vp intrinsics are not speculatable" This reverts commit `78a18d2b54`. Break MLIR bot: https://lab.llvm.org/buildbot/#/builders/61/builds/27127	2022-05-30 12:26:16 +00:00
Simon Moll	78a18d2b54	[VP] vp intrinsics are not speculatable VP intrinsics show UB if the %evl parameter is out of bounds - they must not carry the speculatable attribute. The out-of-bounds UB disappears when the %evl parameter is expanded into the mask or expansion replaces the entire VP intrinsic with non-VP code. This patch (1) removes the speculatable attribute on all VP intrinsics and (2) generalizes the isSafeToSpeculativelyExecute function to let VP expansion know whether the VP intrinsic replacement will be speculatable. VP expansion may only discard %evl where this is the case. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D125296	2022-05-30 12:20:05 +02:00
Ping Deng	88af539c0e	[RISCV] Support VP_REDUCE_MUL mask operation Reviewed By: reames Differential Revision: https://reviews.llvm.org/D126520	2022-05-30 03:05:39 +00:00
Ping Deng	083798e270	[LegalizeTypes][VP] Add integer promotion support for vp.fptosi/vp.fptoui Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D125760	2022-05-30 03:05:39 +00:00
Serge Pavlov	bdd0093f4d	[GlobalISel] Add G_IS_FPCLASS Add a generic opcode to represent `llvm.is_fpclass` intrinsic. Differential Revision: https://reviews.llvm.org/D121454	2022-05-27 13:49:47 +07:00
Ping Deng	121689a62e	[SelectionDAG][NFC] Simplify integer promotion in setcc/vp.setcc Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D126516	2022-05-27 05:50:19 +00:00
Rahman Lavaee	08cc058518	Reland "[Propeller] Promote functions with propeller profiles to .text.hot." This relands commit `4d8d2580c5`. The major change here is using `addUsedIfAvailable<BasicBlockSectionsProfileReader>()` to make sure we don't change the pipeline tests. Differential Revision: https://reviews.llvm.org/D126518	2022-05-26 19:53:14 -07:00
Rahman Lavaee	3aa249329f	Revert "[Propeller] Promote functions with propeller profiles to .text.hot." This reverts commit `4d8d2580c5`.	2022-05-26 18:45:40 -07:00
Rahman Lavaee	4d8d2580c5	[Propeller] Promote functions with propeller profiles to .text.hot. Today, text section prefixes (none, .unlikely, .hot, and .unknown) are determined based on PGO profile. However, Propeller may deem a function hot when PGO doesn't. Besides, when `-Wl,-keep-text-section-prefix=true` is used, Propeller cannot enforce a global section ordering as the linker can only reorder sections within each output section (.text, .text.hot, .text.unlikely). This patch promotes all functions with Propeller profiles (functions listed in the basic-block-sections profile) to .text.hot. The feature is hidden behind the flag `--bbsections-guided-section-prefix` which defaults to `true`. The new implementation refactors the parsing of basic block sections profile into a new `BasicBlockSectionsProfileReader` analysis pass. This allows us to use the information earlier in `CodeGenPrepare` in order to set the function's text prefix. `BasicBlockSectionsProfileReader` will be used both by `BasicBlockSections` pass and `CodeGenPrepare`. Differential Revision: https://reviews.llvm.org/D122930	2022-05-26 16:23:21 -07:00
Craig Topper	460781feef	[LegalizeTypes] Fix bug in expensive checks verification With a fix for an expensive checks build failure exposed by new RISC-V tests. Something about expanding two rotates in type legalization caused a change in the remapping tables that the expensive checks verifying wasn't expecting. See comment in the code for how it was fixed. Tests came from this commit that exposed the bug [RISCV] Add test cases showing failure to remove mask on rotate amounts. If the masking AND has multiple users we fail to remove it. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D126036	2022-05-26 13:13:32 -07:00
Adrian Tong	7c13ae6490	Give option to use isCopyInstr to determine which MI is treated as Copy instruction in MCP. This is then used in AArch64 to remove copy instructions after taildup ran in machine block placement Differential Revision: https://reviews.llvm.org/D125335	2022-05-26 18:43:16 +00:00
Simon Pilgrim	f366acdbf6	[DAG] Generalize (sra (trunc (sra x, c1)), c2) -> (trunc (sra x, c1 + c2)) constant folding Remove local (uniform) constant folding and rely on getNode() to perform it Minor cleanup step toward adding non-uniform shift amount support	2022-05-26 14:05:09 +01:00
Simon Pilgrim	7b617eef80	[DAG] Cleanup "and/or of cmp with single bit diff" fold to use ISD::matchBinaryPredicate Prep work as I'm investigating some cases where TLI::convertSetCCLogicToBitwiseLogic should accept vectors.	2022-05-26 12:34:09 +01:00
Chen Zheng	d79275238f	[MachineSink] replace MachineLoop with MachineCycle Reapply `62a9b36fcf` and fix the module build failure: 1: remove MachineCycleInfoWrapperPass from MachinePassRegistry.def; MachineCycleInfoWrapperPass is an analysis pass and should not be there. 2: move the definition for MachineCycleInfoPrinterPass to the cpp file. Otherwise, there are module conflicts for MachineCycleInfoWrapperPass in MachinePassRegistry.def and MachineCycleAnalysis.h after `62a9b36fcf`. MachineCycle can handle irreducible loop. Natural loop analysis (MachineLoop) can not return correct loop depth if the loop is irreducible loop. And MachineSink is sensitive to the loop depth, see MachineSinking::isProfitableToSinkTo(). This patch tries to use MachineCycle so that we can handle irreducible loop better. Reviewed By: sameerds, MatzeB Differential Revision: https://reviews.llvm.org/D123995	2022-05-26 06:45:23 -04:00
Fangrui Song	9ee15bba47	[MC] Lower case the first letter of EmitCOFF* EmitWin* EmitCV*. NFC	2022-05-26 00:14:08 -07:00
serge-sans-paille	fb67d683db	[iwyu] Handle regressions in libLLVM header include Running iwyu-diff on LLVM codebase since `7030654296` detected a few regressions, fixing them. Differential Revision: https://reviews.llvm.org/D126417	2022-05-26 08:12:34 +02:00
Lian Wang	8aa6b05deb	[LegalizeTypes][VP] Add widen and split support for VP_TRUNCATE Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D125950	2022-05-26 02:03:27 +00:00
Patrick Walton	256a52d9aa	Round up zero-sized symbols to 1 byte in `.debug_aranges`. This commit modifies the AsmPrinter to avoid emitting any zero-sized symbols to the .debug_aranges table, by rounding their size up to 1. Entries with zero length violate the DWARF 5 spec, which states: "Each descriptor is a triple consisting of a segment selector, the beginning address within that segment of a range of text or data covered by some entry owned by the corresponding compilation unit, followed by the non-zero length of that range." In practice, these zero-sized entries produce annoying warnings in lld and cause GNU binutils to truncate the table when parsing it. Other parts of LLVM, such as DWARFDebugARanges in the DebugInfo module (specifically the appendRange method), already avoid emitting zero-sized symbols to .debug_aranges, but not comprehensively in the AsmPrinter. In fact, the AsmPrinter does try to avoid emitting such zero-sized symbols when labels aren't involved, but doesn't when the symbol to be emitted is a difference of two labels; this patch extends that logic to handle the case in which the symbol is defined via labels. Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D126257	2022-05-25 13:31:36 -07:00
Takafumi Arakaki	18e6b8234a	Allow pointer types for atomicrmw xchg This adds support for pointer types for `atomic xchg` and lets us write instructions such as `atomicrmw xchg i64** %0, i64* %1 seq_cst`. This is similar to the patch for allowing atomicrmw xchg on floating point types: https://reviews.llvm.org/D52416. Differential Revision: https://reviews.llvm.org/D124728	2022-05-25 16:20:26 +00:00
Simon Moll	6e12711081	[VP][fix] Don't discard masks in reductions When expanding VP reductions to non VP-code, the reduction pass was ignoring the mask before. Fix this by keeping the mask and selecting neutral elements where the mask is zero. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D126362	2022-05-25 15:54:45 +02:00
Chen Zheng	80c4910f3d	Revert "[MachineSink] replace MachineLoop with MachineCycle" This reverts commit `62a9b36fcf`. Cause build failure on lldb incremental buildbot: https://green.lab.llvm.org/green/view/LLDB/job/lldb-cmake/43994/changes	2022-05-24 22:43:37 -04:00
Paul Walker	6f215ca680	[SelectionDAG] Add support to widen ISD::STEP_VECTOR operations. Fixes: #55165 Differential Revision: https://reviews.llvm.org/D126168	2022-05-24 22:42:37 +01:00
Sotiris Apostolakis	67be40df6e	Recommit "[SelectOpti][5/5] Optimize select-to-branch transformation" Use container::size_type directly to avoid type mismatch causing build failures in Windows. Original commit message: This patch optimizes the transformation of selects to a branch when the heuristics deemed it profitable. It aggressively sinks eligible instructions to the newly created true/false blocks to prevent their execution on the common path and interleaves dependence slices to maximize ILP. Depends on D120232 Reviewed By: davidxl Differential Revision: https://reviews.llvm.org/D120233	2022-05-24 14:08:09 -04:00
Serge Pavlov	6fc0bc5b0f	Fix behavior of is_fp_class on empty class set The second argument to is_fp_class specifies the set of floating-point classes to test against. It can be zero; in this case the intrinsic is expected to return a zero value. Differential Revision: https://reviews.llvm.org/D112025	2022-05-24 21:50:18 +07:00
Simon Pilgrim	11455e4758	[DAG] Unroll vectorized FPOW instructions before widening that will scalarize to libcalls anyway Followup to D125988 - FPOW is similar to FREM and will most likely scalarize to libcalls, so unroll before widening to prevent us making additional libcalls with UNDEF args.	2022-05-24 15:44:53 +01:00
Sam Parker	e0fe9785d3	[TypePromotion] Avoid unnecessary trunc zext pairs Any zext 'sink' should already have an operand that is in the legal value, so avoid using a trunc and just use the trunc operand instead. Differential Revision: https://reviews.llvm.org/D118905	2022-05-24 15:34:36 +01:00
Nabeel Omer	8b5d9cbbfe	[x86][DAG] Unroll vectorized FREMs that will become libcalls Currently, two element vectors produced as the result of a binary op are widened to four element vectors on x86 by DAGTypeLegalizer::WidenVecRes_BinaryCanTrap. If the op still isn't legal after widening it is unrolled into 4 scalar ops in SelectionDAG before being converted into a libcall. This way we end up with 4 libcalls (two of them on known undef elements) instead of the original two libcalls. This patch modifies DAGTypeLegalizer::WidenVectorResult to ensure that if it is known that a binary op will be turned into a libcall, it is unrolled instead of being widened. This prevents the creation of the extra scalar instructions on known undef elements and (eventually) libcalls with known undef parameters which would otherwise be created when the op gets expanded post widening. Differential Revision: https://reviews.llvm.org/D125988	2022-05-24 13:34:51 +01:00
Fraser Cormack	7f7ef0ed61	[LegalizeTypes][NFC] Fix node name in assertion message This was probably copy/pasted from the MSCATTER widening.	2022-05-24 09:16:18 +01:00
Lian Wang	be84f91f87	[LegalizeTypes][VP] Fix OpNo in WidenVecOp_VP_SCATTER Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D126276	2022-05-24 07:14:46 +00:00
Chen Zheng	62a9b36fcf	[MachineSink] replace MachineLoop with MachineCycle MachineCycle can handle irreducible loop. Natural loop analysis (MachineLoop) can not return correct loop depth if the loop is irreducible loop. And MachineSink is sensitive to the loop depth, see MachineSinking::isProfitableToSinkTo(). This patch tries to use MachineCycle so that we can handle irreducible loop better. Reviewed By: sameerds, MatzeB Differential Revision: https://reviews.llvm.org/D123995	2022-05-24 01:16:19 -04:00
Sotiris Apostolakis	1786e70bd8	Revert "[SelectOpti][5/5] Optimize select-to-branch transformation" This reverts commit `a111fb9601`.	2022-05-24 00:02:00 -04:00
Sotiris Apostolakis	a111fb9601	[SelectOpti][5/5] Optimize select-to-branch transformation This patch optimizes the transformation of selects to a branch when the heuristics deemed it profitable. It aggressively sinks eligible instructions to the newly created true/false blocks to prevent their execution on the common path and interleaves dependence slices to maximize ILP. Depends on D120232 Reviewed By: davidxl Differential Revision: https://reviews.llvm.org/D120233	2022-05-23 23:31:27 -04:00
Sotiris Apostolakis	d7ebb74611	[SelectOpti][4/5] Loop Heuristics This patch adds the loop-level heuristics for determining whether branches are more profitable than conditional moves. These heuristics apply to only inner-most loops. Depends on D120231 Reviewed By: davidxl Differential Revision: https://reviews.llvm.org/D120232	2022-05-23 22:05:41 -04:00
Sotiris Apostolakis	8b42bc5662	[SelectOpti][3/5] Base Heuristics This patch adds the base heuristics for determining whether branches are more profitable than conditional moves. Base heuristics apply to all code apart from inner-most loops. Depends on D122259 Reviewed By: davidxl Differential Revision: https://reviews.llvm.org/D120231	2022-05-23 22:01:12 -04:00
Sotiris Apostolakis	97c3ef5c8a	[SelectOpti][2/5] Select-to-branch base transformation This patch implements the actual transformation of selects to branches. It includes only the base transformation without any sinking. Depends on D120230 Reviewed By: davidxl Differential Revision: https://reviews.llvm.org/D122259	2022-05-23 16:11:40 -04:00
Qunyan Mangus	12bae5f3e2	Remove duplicate fields in RAGreedy RAGreedy has two fields of RegisterClassInfo, one called RCI and another RegClassInfo from its base class. RCI is initialized without freezeReservedRegs first, while RegClassInfo does. Therefore, if reserved registers information is changed between last time freezeReservedRegs is called and RAGreedy, it's not picked up by RCI. Instead of having both fields in RAGreedy, remove RCI and use RegClassInfo instead. Also removed is the TRI field which is present in its base class. Reviewed By: MatzeB Differential Revision: https://reviews.llvm.org/D125926	2022-05-23 13:08:25 -07:00
Craig Topper	569d8945f3	[DAGCombiner][AArch64] Don't fold (smulo x, 2) -> (saddo x, x) if VT is i2. If the VT is i2, then 2 is really -2. Test has not been committed yet, but diff shows the change. Fixes PR55644. Differential Revision: https://reviews.llvm.org/D126213	2022-05-23 11:13:57 -07:00
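The i2 corner case in the entry above can be checked by hand. The following is a minimal sketch that models a 2-bit signed integer in plain C++; the names `asI2`, `smuloI2`, `saddoI2`, and `checkI2Fold` are hypothetical helpers for illustration, not LLVM APIs.

```cpp
#include <cassert>

// Hypothetical model of an i2 value: two bits, signed range [-2, 1].
struct I2Result {
  int value;     // wrapped 2-bit result, sign-extended to int
  bool overflow; // true if the exact result does not fit in i2
};

// Sign-extend the low two bits of v.
inline int asI2(int v) {
  v &= 3;
  return (v & 2) ? v - 4 : v;
}

inline I2Result makeI2(int exact) {
  return {asI2(exact), exact < -2 || exact > 1};
}

// Signed multiply/add with overflow on i2 operands.
inline I2Result smuloI2(int a, int b) { return makeI2(asI2(a) * asI2(b)); }
inline I2Result saddoI2(int a, int b) { return makeI2(asI2(a) + asI2(b)); }

// The constant 2 has bit pattern 0b10, which i2 reads as -2, so the two
// forms disagree on the overflow bit for x = 1 and x = -1.
inline void checkI2Fold() {
  assert(asI2(2) == -2);
  assert(!smuloI2(1, 2).overflow && saddoI2(1, 1).overflow);
  assert(smuloI2(-1, 2).overflow && !saddoI2(-1, -1).overflow);
}
```

Calling `checkI2Fold()` exercises both disagreeing inputs. The wrapped values happen to match in each case; only the overflow flag differs.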
Nikita Popov	5126c38012	[CGP] Freeze condition when despeculating ctlz/cttz Freeze the condition of the newly introduced conditional branch, to avoid immediate undefined behavior if the input to ctlz/cttz was originally poison. Differential Revision: https://reviews.llvm.org/D125887	2022-05-23 11:01:18 +02:00
Craig Topper	c11051a400	[SelectionDAG] Add a freeze to ISD::ABS expansion. I had initially assumed this was the problem with https://github.com/llvm/llvm-project/issues/55271#issuecomment-1133426243 But it turns out that was a simpler issue. This patch is still more correct than what we were doing before so figured I'd submit it anyway. No test case because I'm not sure how to get an undef around until expansion. Looking at the test deltas I wonder if it would be valid to combine (sext_inreg (freeze (aextload X))) -> (freeze (sextload X)). Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D126175	2022-05-22 14:29:58 -07:00
Craig Topper	768a1ca5ec	[SelectionDAG] Fold abs(undef) to 0 instead of undef. abs should only produce a positive value or the signed minimum value. This means we can't fold abs(undef) to undef as that would allow more values. Fold to 0 instead to match InstSimplify. Fixes test mentioned in comment on pr55271. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D126174	2022-05-22 12:47:32 -07:00
Paul Walker	258dac43d6	[SVE] Enable use of 32bit gather/scatter indices for fixed length vectors Differential Revision: https://reviews.llvm.org/D125193	2022-05-22 12:32:30 +01:00
Ping Deng	0e8ac3a797	[LegalizeTypes][VP] Add integer promotion support for vp.sitofp/vp.uitofp Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D125960	2022-05-22 02:13:45 +00:00
Craig Topper	4638766794	[TypePromotion] Refine fix sext/zext for promoted constant from D125294. Reviewing the code again, I believe the sext is needed on the LHS or RHS for ICmp and only on the RHS for Add. Add an opcode check before checking the operand number. Fixes PR55627. Differential Revision: https://reviews.llvm.org/D125654	2022-05-21 14:08:15 -07:00
Craig Topper	003b95acf2	[LegalizeTypes] Remove double map lookup in DAGTypeLegalizer::PerformExpensiveChecks. NFC Remove repeated checks for ResId being 0.	2022-05-21 00:06:59 -07:00
Craig Topper	66875dbcc0	[LegalizeTypes] Use SmallDenseMap::count instead of SmallDenseMap::find. NFC It's more readable and more efficient.	2022-05-21 00:06:55 -07:00
Shilei Tian	ff60a0a364	[LLVM] Add a check if should cast atomic operations to integer type Currently for atomic load, store, and rmw instructions, as long as the operand is a floating-point value, it is cast to integer. Nowadays many targets can actually support part of atomic operations with floating-point operands. For example, NVPTX supports atomic load and store of floating-point values. This patch adds a series of interface functions `shouldCastAtomicXXXInIR`, and the default implementations are the same as what we currently do. Later, targets can provide their own specializations. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D125652	2022-05-20 17:23:53 -04:00
Zequan Wu	9886046289	[CodeView] Combine variable def ranges that are continuous. It saves about 1.13% size for chrome.dll.pdb on chrome official build. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D125721	2022-05-20 12:12:14 -07:00
Craig Topper	8d3894f67e	[TypePromotion] Fix another case for sext vs zext in promoted constant. If the SafeWrap operation is a subtract, we negated the constant to treat the subtract as an addition. The sext was based on the operation being addition. So we really need to do (neg (sext (neg C))) when promoting the constant. This is equivalent to (sext C) for every value of C except the min signed value. For min signed value we need to do (zext C) instead. Fixes PR55490. Differential Revision: https://reviews.llvm.org/D125653	2022-05-20 09:30:07 -07:00
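The claim in the entry above, that (neg (sext (neg C))) equals (sext C) for every C except the minimum signed value, where it equals (zext C), can be verified exhaustively for a small width. This sketch assumes an i8 constant promoted to i32; all helper names are hypothetical and only model the arithmetic, not the TypePromotion pass itself.

```cpp
#include <cassert>
#include <cstdint>

// Extensions of an 8-bit constant to 32 bits.
inline int32_t sext8(uint8_t c) { return static_cast<int8_t>(c); }
inline int32_t zext8(uint8_t c) { return c; }

// Wrapping negation at 8 and 32 bits (done on unsigned types to avoid
// signed-overflow undefined behaviour).
inline uint8_t neg8(uint8_t c) { return static_cast<uint8_t>(0u - c); }
inline int32_t neg32(int32_t v) {
  return static_cast<int32_t>(0u - static_cast<uint32_t>(v));
}

// (neg (sext (neg C))) as described in the commit message.
inline int32_t negSextNeg(uint8_t c) { return neg32(sext8(neg8(c))); }

// Check all 256 constants: only 0x80 (the minimum signed i8) needs zext.
inline void checkPromotedConstant() {
  for (unsigned c = 0; c < 256; ++c) {
    uint8_t byte = static_cast<uint8_t>(c);
    if (byte == 0x80) {
      assert(negSextNeg(byte) == zext8(byte));
    } else {
      assert(negSextNeg(byte) == sext8(byte));
    }
  }
}
```

The exception arises because negating the minimum signed value wraps back to itself at 8 bits, so the outer negation at 32 bits yields +128 rather than -128.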
Ivan Kosarev	86803008ea	[MIR] Provide location of extra instruction operand when diagnosing it. Also resolves misspelled FileCheck directives caught with D125604. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D125965	2022-05-20 05:56:25 +01:00
Sotiris Apostolakis	ca7c307d18	[SelectOpti][1/5] Setup new select-optimize pass This is the first commit for the cmov-vs-branch optimization pass. The goal is to develop a new profile-guided and target-independent cost/benefit analysis for selecting conditional moves over branches when optimizing for performance. Initially, this new pass is expected to be enabled only for instrumentation-based PGO. RFC: https://discourse.llvm.org/t/rfc-cmov-vs-branch-optimization/6040 Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D120230	2022-05-19 16:31:10 +00:00
Jay Foad	6bec3e9303	[APInt] Remove all uses of zextOrSelf, sextOrSelf and truncOrSelf Most clients only used these methods because they wanted to be able to extend or truncate to the same bit width (which is a no-op). Now that the standard zext, sext and trunc allow this, there is no reason to use the OrSelf versions. The OrSelf versions additionally have the strange behaviour of allowing extending to a smaller width, or truncating to a larger width, which are also treated as no-ops. A small amount of client code relied on this (ConstantRange::castOp and MicrosoftCXXNameMangler::mangleNumber) and needed rewriting. Differential Revision: https://reviews.llvm.org/D125557	2022-05-19 11:23:13 +01:00
Lian Wang	530bab1f93	[RISCV][SelectionDAG] Support VECREDUCE_ADD mask operation Re-landed D125206 Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D125206	2022-05-19 09:53:33 +00:00
Lian Wang	f035068bb3	[LegalizeVectorTypes][VP] Add widen and split support for VP_SETCC Reviewed By: craig.topper, frasercrmck Differential Revision: https://reviews.llvm.org/D125446	2022-05-19 07:42:39 +00:00
Lian Wang	bbc6834e26	[LegalizeTypes][VP] Add integer promotions support for VP_TRUNCATE Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D125739	2022-05-19 07:36:10 +00:00
Lian Wang	993070d11f	[LegalizeTypes][VP][NFC] Use an if and two returns instead of ?: operator Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D125858	2022-05-19 07:18:24 +00:00
Jon Roelofs	d699e54ca2	Fix an or+and miscompile w/ GlobalISel Fixes #55284	2022-05-18 19:09:47 -07:00
Matthias Braun	8d03c49f49	Extend switch condition in optimizeSwitchPhiConst when free In a case like: switch((i32)x) { case 42: phi((i64)42, ...); } replace `(i64)42` with `zext(x)` when we can do so for free. This fixes a part of https://github.com/llvm/llvm-project/issues/55153 Differential Revision: https://reviews.llvm.org/D124897	2022-05-18 16:23:53 -07:00
Mitch Phillips	7aa1fa0a0a	Reland "[dwarf] Emit a DIGlobalVariable for constant strings." An upcoming patch will extend llvm-symbolizer to provide the source line information for global variables. The goal is to move AddressSanitizer off of internal debug info for symbolization onto the DWARF standard (and doing a clean-up in the process). Currently, ASan reports the line information for constant strings if a memory safety bug happens around them. We want to keep this behaviour, so we need to emit debuginfo for these variables as well. Reviewed By: dblaikie, rnk, aprantl Differential Revision: https://reviews.llvm.org/D123534	2022-05-18 13:56:45 -07:00
Michael Kitzan	29bebb0237	[GISel] Add new combines for G_FMINNUM/MAXNUM and G_FMINIMUM/MAXIMUM I noticed https://reviews.llvm.org/D87415 added SDAG combines to fold FMIN/MAX instrs with NaNs. The patch implements the same NaN combines for GISel GMIR FMIN/MAX opcodes: G_FMINNUM(X, NaN) -> X G_FMAXNUM(X, NaN) -> X G_FMINIMUM(X, NaN) -> NaN G_FMAXIMUM(X, NaN) -> NaN The patch adds AArch64 tests for these combines as well. Reviewed by: arsenm Differential revision: https://reviews.llvm.org/D125819	2022-05-18 12:08:53 -07:00
Yusra Syeda	5ac411aea8	[SystemZ][z/OS] Add the PPA1 to SystemZAsmPrinter Differential Revision: https://reviews.llvm.org/D125725	2022-05-18 14:13:17 -04:00
Craig Topper	46eef76876	[DAGCombiner] Fix bug in MatchBSwapHWordLow. This function tries to match (a >> 8) | (a << 8) as (bswap a) >> 16. If the SRL isn't masked and the high bits aren't demanded, we still need to ensure that bits 23:16 are zero. After the right shift they will be in bits 15:8 which is where the important bits from the SHL end up. It's only a bswap if the OR on bits 15:8 only takes the bits from the SHL. Fixes PR55484. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D125641	2022-05-18 09:23:18 -07:00
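The bits 23:16 condition in the entry above can be illustrated with concrete values. This is a sketch on 32-bit integers where only the low 16 bits of the result are considered demanded; `bswap32`, `shiftPairLow16`, and `bswapLow16` are hypothetical helpers, not the DAGCombiner code.

```cpp
#include <cassert>
#include <cstdint>

// Portable 32-bit byte swap.
inline uint32_t bswap32(uint32_t a) {
  return (a >> 24) | ((a >> 8) & 0x0000FF00u) |
         ((a << 8) & 0x00FF0000u) | (a << 24);
}

// Low 16 bits of the unmasked shift pair (a >> 8) | (a << 8).
inline uint16_t shiftPairLow16(uint32_t a) {
  return static_cast<uint16_t>((a >> 8) | (a << 8));
}

// Low 16 bits of (bswap a) >> 16.
inline uint16_t bswapLow16(uint32_t a) {
  return static_cast<uint16_t>(bswap32(a) >> 16);
}

inline void checkBSwapHWordLow() {
  // Bits 23:16 are zero: both forms agree.
  assert(shiftPairLow16(0x00001234u) == 0x3412);
  assert(bswapLow16(0x00001234u) == 0x3412);
  // Bits 23:16 hold 0xAB: the right shift moves them into bits 15:8,
  // where they are ORed with the byte coming from the left shift.
  assert(shiftPairLow16(0x00AB1234u) == 0xBF12);
  assert(bswapLow16(0x00AB1234u) == 0x3412);
}
```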
Yeting Kuo	00999fb6e1	[SelectionDAGBuilder] Pass fast math flags to most of VP SDNodes. The patch does not pass math flags to float VPCmpIntrinsics because LLParser could not identify float VPCmpIntrinsics as FPMathOperators. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D125600	2022-05-18 16:15:47 +08:00
Simon Pilgrim	d40b7f0d5a	[DAG] Fold (shl (srl x, c), c) -> and(x, m) even if srl has other uses If we're using shift pairs to mask, then relax the one use limit if the shift amounts are equal - we'll only be generating a single AND node. AArch64 has a couple of regressions due to this, so I've enforced the existing one use limit inside a AArch64TargetLowering::shouldFoldConstantShiftPairToMask callback. Part of the work to fix the regressions in D77804 Differential Revision: https://reviews.llvm.org/D125607	2022-05-17 13:40:11 +01:00
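The fold in the entry above relies on a shift pair with equal amounts acting as a mask that clears the low c bits. A minimal sketch on 32-bit unsigned values, with hypothetical helper names:

```cpp
#include <cassert>
#include <cstdint>

// (shl (srl x, c), c): shift right then left by the same amount.
inline uint32_t shiftPair(uint32_t x, unsigned c) { return (x >> c) << c; }

// and(x, m) with m = all-ones shifted left by c.
inline uint32_t maskForm(uint32_t x, unsigned c) {
  return x & (UINT32_C(0xFFFFFFFF) << c);
}

// Compare both forms for every legal shift amount of a 32-bit value.
inline void checkShiftPairToMask(uint32_t x) {
  for (unsigned c = 0; c < 32; ++c)
    assert(shiftPair(x, c) == maskForm(x, c));
}
```

Because the mask form reads x directly, it does not depend on the intermediate srl result, which is why the srl having other uses does not add work beyond the single AND.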
Jay Foad	77480556c4	[RegAllocGreedy] New hook regClassPriorityTrumpsGlobalness Add a new TargetRegisterInfo hook to allow targets to tweak the priority of live ranges, so that AllocationPriority of the register class will be treated as more important than whether the range is local to a basic block or global. This is determined per-MachineFunction. Differential Revision: https://reviews.llvm.org/D125102	2022-05-17 12:35:21 +01:00
jacquesguan	26593e7314	[SelectionDAG] Support more VP reduction mask operation. This patch uses VP_REDUCE_AND and VP_REDUCE_OR to replace VP_REDUCE_SMAX,VP_REDUCE_SMIN,VP_REDUCE_UMAX and VP_REDUCE_UMIN for mask vector type. Differential Revision: https://reviews.llvm.org/D125002	2022-05-17 09:14:21 +00:00
Fraser Cormack	599ff247de	[StackColoring] Don't merge slots with differing StackIDs The documentation for this specifically mentions that this should not happen. We could think about adding target hooks to permit it (and how to merge IDs) in the future if that is desirable. This specific test case was merging a scalable-vector slot into a non-scalable one and dropping the notion of scalability, meaning we failed to allocate enough stack space for the object. Reviewed By: arsenm, MaskRay, sdesmalen Differential Revision: https://reviews.llvm.org/D125699	2022-05-17 08:28:49 +01:00
Mitch Phillips	ed2c3218f5	Revert "[dwarf] Emit a DIGlobalVariable for constant strings." This reverts commit `4680982b36`. Broke a fuchsia windows bot. More details in the review: https://reviews.llvm.org/D123534	2022-05-16 19:07:38 -07:00
Mitch Phillips	4680982b36	[dwarf] Emit a DIGlobalVariable for constant strings. An upcoming patch will extend llvm-symbolizer to provide the source line information for global variables. The goal is to move AddressSanitizer off of internal debug info for symbolization onto the DWARF standard (and doing a clean-up in the process). Currently, ASan reports the line information for constant strings if a memory safety bug happens around them. We want to keep this behaviour, so we need to emit debuginfo for these variables as well. Reviewed By: dblaikie, rnk, aprantl Differential Revision: https://reviews.llvm.org/D123534	2022-05-16 16:52:16 -07:00
Philip Reames	7dbf2e7b57	Teach PeepholeOpt to eliminate redundant copy from constant physreg (e.g VLENB on RISCV) The existing redundant copy elimination required a virtual register source, but the same logic works for any physreg where we don't have to worry about clobbers. On RISCV, this helps eliminate redundant CSR reads from VLENB. Differential Revision: https://reviews.llvm.org/D125564	2022-05-16 16:38:30 -07:00
Paul Walker	7dd05ba9ed	[SelectionDAG] Remove duplicate "is scaled" information from gather/scatter SDNodes. During early gather/scatter enablement two different approaches were taken to represent scaled indices: * A Scale operand whereby byte_offsets = Index * Scale * An IndexType whereby byte_offsets = Index * sizeof(MemVT.ElementType) Having multiple representations is bad as shown by this patch which fixes instances where the two are out of sync. The dedicated scale operand is more flexible and pervasive so this patch removes the UNSCALED values from IndexType. This means all indices are scaled but the scale can be one, hence unscaled. SDNodes now use the scale operand to answer the "isScaledIndex" question. I toyed with the idea of keeping the UNSCALED enums and helper functions but because they will have no uses and force SDNodes to validate the set of supported values I figured it's best to remove them. We can re-add them if there's a real need. For similar reasons I've kept the IndexType enum when a bool could be used as I think being explicit looks better. Depends On D123347 Differential Revision: https://reviews.llvm.org/D123381	2022-05-16 20:47:52 +01:00
Craig Topper	1c4880a2d3	[TargetLowering] Expand the last stage of i16 popcnt using shift+add+and instead of mul+shift. If we use multiply it would be with 0x0101 which is 1 more than a power of 2. On some targets we would expand this to shl+add. By avoiding the multiply earlier, we can generate better code. Note, PowerPC doesn't do the shl+add expansion of multiply so one of the tests increased in instruction count. Limiting to scalars because it almost always increased the number of instructions in vector tests. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D125638	2022-05-16 09:27:44 -07:00
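For a 16-bit population count, the final stage only has to add the two per-byte counts, so the multiply by 0x0101 can be replaced by a shift, an add, and a mask. The sketch below shows one such formulation and is not necessarily the exact node sequence LLVM emits; `byteSums16`, `popcnt16Mul`, `popcnt16Add`, and `popcnt16Naive` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Classic bit-parallel reduction: afterwards each byte of the 16-bit value
// holds the number of set bits that were in that byte (0..8).
inline uint32_t byteSums16(uint32_t v) {
  v &= 0xFFFFu;
  v = v - ((v >> 1) & 0x5555u);
  v = (v & 0x3333u) + ((v >> 2) & 0x3333u);
  v = (v + (v >> 4)) & 0x0F0Fu;
  return v;
}

// Last stage using mul+shift: multiply by 0x0101 (truncated to 16 bits)
// accumulates both byte counts in the high byte.
inline uint32_t popcnt16Mul(uint32_t v) {
  return ((byteSums16(v) * 0x0101u) & 0xFFFFu) >> 8;
}

// Last stage using shift+add+and: add the high byte onto the low byte.
inline uint32_t popcnt16Add(uint32_t v) {
  uint32_t s = byteSums16(v);
  return (s + (s >> 8)) & 0xFFu;
}

// Reference implementation: count bits one at a time.
inline uint32_t popcnt16Naive(uint32_t v) {
  uint32_t n = 0;
  for (v &= 0xFFFFu; v != 0; v >>= 1)
    n += v & 1u;
  return n;
}

// Exhaustively compare all three over every 16-bit input.
inline void checkPopcnt16() {
  for (uint32_t v = 0; v <= 0xFFFFu; ++v) {
    assert(popcnt16Mul(v) == popcnt16Naive(v));
    assert(popcnt16Add(v) == popcnt16Naive(v));
  }
}
```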
Craig Topper	e6fc8454be	[DAGCombiner] Fix incorrect indentation. NFC	2022-05-16 09:27:15 -07:00
Philip Reames	55e2df7285	[LiveIntervals] Add range accessors for value numbers [nfc]	2022-05-16 08:23:12 -07:00
Bradley Smith	7ff5148d64	[DAGCombine] Support splat_vector nodes in (and (extload)) dagcombine Differential Revision: https://reviews.llvm.org/D125367	2022-05-16 11:25:20 +00:00
Abinav Puthan Purayil	485dd0b752	[GlobalISel] Handle constant splat in funnel shift combine This change adds the constant splat versions of m_ICst() (by using getBuildVectorConstantSplat()) and uses it in matchOrShiftToFunnelShift(). The getBuildVectorConstantSplat() name is shortened to getIConstantSplatVal() so that the *SExtVal() version would have a more compact name. Differential Revision: https://reviews.llvm.org/D125516	2022-05-16 16:03:30 +05:30
Yeting Kuo	26a61ab678	[SelectionDAG] Make getNode which uses single element SDVTList pass SDNodeFlags. With this patch, users no longer need to know that getNode with an SDNodeFlags argument may not pass its flags. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D125659	2022-05-16 18:19:46 +08:00
Denis Antrushin	8903dbef8f	[StatepointLowering] Properly handle local and non-local relocates of the same value. FunctionLoweringInfo::StatepointRelocationMaps map is used to pass GC pointer lowering information from statepoint to gc.relocate which may appear in a different block. D124444 introduced different lowering for local and non-local relocates. Local relocates use SDValue and non-local relocates use value exported to VReg. But I overlooked the fact that StatepointRelocationMap is indexed not by GCRelocate instruction, but by derived pointer. This works incorrectly when we have two relocates (one local and another non-local) of the same value, because they need different relocation records. This patch fixes the problem by recording relocation information per relocate instruction, not per derived pointer. This way, each gc.relocate can be lowered differently. Reviewed By: skatkov Differential Revision: https://reviews.llvm.org/D125538	2022-05-16 17:02:34 +07:00
Nikita Popov	05c3fe075d	[FastISel] Fix load folding for registers with fixups FastISel tries to fold loads into the single using instruction. However, if the register has fixups, then there may be additional uses through an alias of the register. In particular, this fixes the problem reported at https://reviews.llvm.org/D119432#3507087. The load register is (at the time of load folding) only used in a single call instruction. However, selection of the bitcast has added a fixup between the load register and the cross-BB register of the bitcast result. After fixups are applied, there would now be two uses of the load register, so load folding is not legal. Differential Revision: https://reviews.llvm.org/D125459	2022-05-16 10:25:25 +02:00
Craig Topper	b4ad450953	[TargetLowering] expandCTPOP don't create an unused constant mask for i8 ctpop. NFC Use early out for the i8 case. I'm looking at avoiding MUL on targets that use libcalls for MUL. So doing a little pre-refactoring.	2022-05-14 20:35:38 -07:00
Simon Pilgrim	f4eac6e5f6	[DAG] visitOR - merge isa/cast<ShuffleVectorSDNode> into dyn_cast<ShuffleVectorSDNode>. NFC. Also, initialize entire mask to -1 to simplify undefined cases.	2022-05-14 20:49:26 +01:00
Simon Pilgrim	95cdd63b87	[DAG] visitADDLike - use SelectionDAG::FoldConstantArithmetic directly to match constant operands SelectionDAG::FoldConstantArithmetic determines if operands are foldable constants, so we don't need to bother with isConstantOrConstantVector / Opaque tests before calling it directly.	2022-05-14 18:39:41 +01:00
Simon Pilgrim	8db72d9d04	[DAG] visitMUL - pull out repeated SDLoc() calls. NFC.	2022-05-14 14:28:39 +01:00
Simon Pilgrim	8d4d4988e4	[DAG] Use SelectionDAG::FoldConstantArithmetic directly to match constant operands SelectionDAG::FoldConstantArithmetic determines if operands are foldable constants, so we don't need to bother with isConstantOrConstantVector / Opaque tests before calling it directly.	2022-05-14 14:19:12 +01:00
Simon Pilgrim	1ecc3d86ae	[DAG] Enable ISD::SHL SimplifyMultipleUseDemandedBits handling inside SimplifyDemandedBits Pulled out of D77804 as it's going to be easier to address the regressions individually. This patch allows SimplifyDemandedBits to call SimplifyMultipleUseDemandedBits in cases where the source operand has other uses, enabling us to peek through the shifted value if we don't demand all the bits/elts. The lost RISCV gorc2 fold shouldn't be a problem - instcombine would have already destroyed that pattern - see https://github.com/llvm/llvm-project/issues/50553 Differential Revision: https://reviews.llvm.org/D124839	2022-05-14 09:50:01 +01:00
Eli Friedman	96c2a0c9ff	[GlobalIsel] Fix fallback if stack protector isn't supported. When GlobalISel fails, we need to report the error, and we need to set the FailedISel property. We skipped those steps if stack protector insertion failed, which led to a very strange miscompile. Differential Revision: https://reviews.llvm.org/D125584	2022-05-13 14:17:27 -07:00
Simon Pilgrim	3fc33ced10	DAGCombiner.cpp - break if-else chains that always return (style)	2022-05-13 18:31:39 +01:00
Sanjay Patel	e52e1dab2a	[SDAG] freeze operand when expanding urem This is a potential miscompile as discussed in issue #55291. The related IR transform was patched with: `d428f09b2c`	2022-05-13 10:55:14 -04:00
Nikita Popov	ed1cb01baf	[IRBuilder] Add IsInBounds parameter to CreateGEP() We commonly want to create either an inbounds or non-inbounds GEP based on a boolean value, e.g. when preserving inbounds from existing GEPs. Directly accept such a boolean in the API, rather than requiring a ternary between CreateGEP and CreateInBoundsGEP. This change is not entirely NFC, because we now preserve an inbounds flag in a constant expression edge-case in InstCombine.	2022-05-13 14:30:55 +02:00
Sam Parker	6d53d35efd	[TypePromotion] Avoid some unnecessary truncs Recommit. Check for legal zext 'sinks' before inserting a trunc. Differential Revision: https://reviews.llvm.org/D115451	2022-05-13 09:45:20 +01:00
Jay Foad	26e1ebd3ea	[GlobalISel] Change ConstantFoldVectorBinop to return vector of APInt Previously it built MIR for the results and returned a Register. This avoids building constants for earlier elements of the vector if later elements will fail to fold, and allows CSEMIRBuilder::buildInstr to avoid unconditionally building a copy from the result. Use a new helper function MachineIRBuilder::buildBuildVectorConstant to build a G_BUILD_VECTOR of G_CONSTANTs. Differential Revision: https://reviews.llvm.org/D117758	2022-05-13 09:33:07 +01:00
Lian Wang	693758b282	[LegalizeTypes][VP] Add integer promotion support for vp.setcc Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D125453	2022-05-13 06:25:13 +00:00
Lian Wang	8050ba6678	[LegalizeTypes][VP] Add integer promotion support for vp.merge Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D125452	2022-05-13 03:28:29 +00:00
Craig Topper	cec249c60d	[TypePromotion] Promote undef by converting to 0. If we're promoting an undef I think that means that we expect the upper bits are zero. undef doesn't guarantee that. This patch replaces undef with 0 to ensure this. This matches how a zext or sext of undef would be folded by InstCombine/InstSimplify. I haven't found a failure from this; I was just thinking through the code. Differential Revision: https://reviews.llvm.org/D123174	2022-05-12 09:09:24 -07:00
Fraser Cormack	1106bc208c	[CodeGen][NFC] Move some comments from the end of lines to above them This avoids wrapping the line itself awkwardly when it exceeds 80 chars. It also better matches our style most other places.	2022-05-12 15:45:04 +01:00
Jeremy Morse	a975472fa6	[DebugInfo][InstrRef] Describe value sizes when spilt to stack This is a re-apply of D123599, which was reverted in `4fe2ab5279`, now with a more appropriate assertion. Original commit message follows: InstrRefBasedLDV can track and describe variable values that are spilt to the stack -- however it does not currently describe the size of the value on the stack. This can cause uninitialized bytes to be read from the stack if a small register is spilt for a larger variable, or theoretically on big-endian machines if a large value on the stack is used for a small variable. Fix this by using DW_OP_deref_size to specify the amount of data to load from the stack, if there's any possibility for ambiguity. There are a few scenarios where this can be omitted (such as when using DW_OP_piece and a non-DW_OP_stack_value location), see deref-spills-with-size.mir for an explicit table of inputs flavours and output expressions. Differential Revision: https://reviews.llvm.org/D123599	2022-05-12 15:52:55 +01:00
Nikita Popov	50f846d634	[FastISel] Add some debug output (NFC) Print a debug message when aborting isel (next to the ORE report) and when folding a load.	2022-05-12 12:25:20 +02:00
Lian Wang	9176096c86	[LegalizeVectorTypes] Enable WidenVecRes_SETCC work for scalable vector. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D125359	2022-05-12 02:52:43 +00:00
Craig Topper	edbf390d10	[CodeGenPrepare] Use const reference to avoid unnecessary APInt copy. NFC Spotted while looking at Matthias' patches. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D124985	2022-05-11 12:06:45 -07:00
Matthias Braun	de9ad98d2d	Fix endless loop in optimizePhiConst with integer constant switch condition Avoid endless loop in degenerate case with an integer constant as switch condition as reported in https://reviews.llvm.org/D124552	2022-05-11 08:49:01 -07:00
David Green	5feeceddb2	[TypePromotion] Fix sext vs zext in promoted constant As pointed out in #55342, given non-canonical IR with multiple constants, we check the second operand in isSafeWrap, but can promote both with sext. Fix that as suggested by @craig.topper by ensuring we only extend the second constant if multiple are present. Fixes #55342 Differential Revision: https://reviews.llvm.org/D125294	2022-05-11 10:47:44 +01:00
David Green	764a7f4864	[TypePromotion] Format Type Promotion. NFC This clang-formats the TypePromotion code, with the only meaningful change being the removal of a verifyFunction call inside a LLVM_DEBUG, and the printing of the entire function which can be better handled via -print-after-all.	2022-05-11 08:18:58 +01:00
Xiang1 Zhang	2ea8f203cd	[CodeGen] Fix ConvertNodeToLibcall for STRICT_FPOWI Reviewed By: PengfeiWang Differential Revision: https://reviews.llvm.org/D125159	2022-05-11 08:58:06 +08:00
Matthias Braun	f0ea9c9cec	CodeGenPrepare: Replace constant PHI arguments with switch condition value We often see code like the following after running SCCP: switch (x) { case 42: phi(42, ...); } This tends to produce bad code as we currently materialize the constant phi-argument in the switch-block. This increases register pressure and if the pattern repeats for `n` case statements, we end up generating `n` constant values. This changes CodeGenPrepare to catch this pattern and revert it back to: switch (x) { case 42: phi(x, ...); } Differential Revision: https://reviews.llvm.org/D124552	2022-05-10 10:00:10 -07:00
Matthias Braun	cd19af74c0	Avoid 8 and 16bit switch conditions on x86 This adds a `TargetLoweringBase::getSwitchConditionType` callback to give targets a chance to control the type used in `CodeGenPrepare::optimizeSwitchInst`. Implement callback for X86 to avoid i8 and i16 types where possible as they often incur extra zero-extensions. This is NFC for non-X86 targets. Differential Revision: https://reviews.llvm.org/D124894	2022-05-10 10:00:10 -07:00
Lian Wang	f14a1f26ad	Revert "[RISCV][SelectionDAG] Support VECREDUCE_ADD mask operation" This patch makes CodeGen/test/AArch64/vecreduce-add-legalization.ll fail. This reverts commit `17a8a1bb71`.	2022-05-10 09:25:25 +00:00
Lian Wang	17a8a1bb71	[RISCV][SelectionDAG] Support VECREDUCE_ADD mask operation Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D125206	2022-05-10 08:52:48 +00:00
Mircea Trofin	c35ad9ee4f	[mlgo] Support exposing more features than those supported by models This allows the compiler to support more features than those supported by a model. The only requirement (development mode only) is that the new features must be appended at the end of the list of features requested from the model. The support is transparent to compiler code: for unsupported features, we provide a valid buffer to copy their values; it's just that this buffer is disconnected from the model, so insofar as the model is concerned (AOT or development mode), these features don't exist. The buffers are allocated at setup - meaning, at steady state, there is no extra allocation (maintaining the current invariant). These buffers have 2 roles: first, keep the compiler code simple. Second, allow logging their values in development mode. The latter allows retraining a model supporting the larger feature set starting from traces produced with the old model. For release mode (AOT-ed models), this decouples compiler evolution from model evolution, which we want in scenarios where the toolchain is frequently rebuilt and redeployed: we can first deploy the new features, and continue working with the older model, until a new model is made available, which can then be picked up the next time the compiler is built. Differential Revision: https://reviews.llvm.org/D124565	2022-05-09 18:01:21 -07:00
David Green	2cfb243bcd	[DAG] Use isAnyConstantBuildVector. NFC As suggested from `02f8519502`, this uses the isAnyConstantBuildVector method in lieu of separate isBuildVectorOfConstantSDNodes calls. It should otherwise be an NFC.	2022-05-09 14:13:03 +01:00
David Green	02f8519502	[DAG] Prevent infinite loop combining bitcast shuffle This prevents an infinite loop from D123801, where code trying to reduce the total number of bitcasts, but also handling constants, could create the opposite transform. Prevent the transform in these cases to let the bitcast of a constant transform naturally. Fixes #55345	2022-05-09 09:36:22 +01:00
Simon Pilgrim	800d36cf32	[DAG] Only perform the fold (A-B)+(C-D) --> (A+C)-(B+D) when both inner subs have one use Fixes #51381	2022-05-08 13:51:58 +01:00
Craig Topper	b81bf7bb2f	[LegalizeTypes] Make use of SelectionDAG::getShiftAmountConstant. NFC Instead of calling getShiftAmountTy and getConstant separately.	2022-05-07 12:16:53 -07:00
Craig Topper	00bfaba997	[LegalizeTypes] Don't assume fshl/fshr shift amount type matches the other operands. Like other shifts, the type isn't required to match. We shouldn't assume we can call ZExtPromotedInteger. I tested the PromoteIntOp_FunnelShift locally by removing the promotion of the shift amount from PromoteIntRes_FunnelShift. But with the final version of this patch it is never executed on any tests. Differential Revision: https://reviews.llvm.org/D125106	2022-05-07 11:44:07 -07:00
Amaury Séchet	06fad8bc05	[DAGCombine] Add node in the worklist in topological order in CombineTo This is part of an ongoing effort toward making DAGCombine process the nodes in topological order. This is able to discover a couple of new optimizations, but also causes a couple of regressions. I nevertheless chose to submit this patch for review so as to start the discussion with people working on the backend so we can find a good way forward. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D124743	2022-05-07 16:24:31 +00:00
Paul Walker	702c4ade22	[ISD::IndexType] Helper functions for common queries. Add helper functions to query the signed and scaled properties of ISD::IndexType along with functions to change them. Remove setIndexType from MaskedGatherSDNode because it only has one usage and typically should only be changed alongside its index operand. Minimise the direct use of the enum values to lay the groundwork for more refactoring. Differential Revision: https://reviews.llvm.org/D123347	2022-05-07 11:23:42 +01:00
David Green	5930691ee1	Revert "[DAGCombine] Make combineShuffleOfBitcast LittleEndian specific" This reverts commit `891c3cf99e` as it turns out that the error was not caused by this commit, the error coming from D124526 instead.	2022-05-06 21:03:22 +01:00
David Green	891c3cf99e	[DAGCombine] Make combineShuffleOfBitcast LittleEndian specific Something is going wrong with the BigEndian PowerPC bot. It is hard to tell what is wrong from here, but attempt to fix it by disabling the combineShuffleOfBitcast combine for bigendian.	2022-05-06 18:42:44 +01:00
Craig Topper	76f90a9d71	[SelectionDAG] Clear promoted bits before UREM on shift amount in PromoteIntRes_FunnelShift. Otherwise we have garbage in the upper bits that can affect the results of the UREM. Fixes PR55296. Differential Revision: https://reviews.llvm.org/D125076	2022-05-06 09:26:30 -07:00
Simon Pilgrim	c0bebc12f0	[DAG] visitREM - merge buildOptimizedSREM into if(). NFCI.	2022-05-06 15:39:17 +01:00
David Green	115c188807	[DAG][PowerPC] Combine shuffle(bitcast(X), Mask) to bitcast(shuffle(X, Mask')) If the mask is made up of elements that form a mask in the higher type we can convert shuffle(bitcast into the bitcast type, simplifying the instruction sequence. A v4i32 2,3,0,1 for example can be treated as a 1,0 v2i64 shuffle. This helps clean up some of the AArch64 concat load combines, along with helping simplify a number of other tests. The PowerPC combine for v16i8 splat vector loads needed some fixes to keep it working for v16i8 vectors. This improves the handling of v2i64 shuffles to match too, hopefully improving them in general. Differential Revision: https://reviews.llvm.org/D123801	2022-05-06 10:50:31 +01:00
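The mask condition in the entry above ("a v4i32 2,3,0,1 ... can be treated as a 1,0 v2i64 shuffle") can be sketched as a check that each adjacent pair of narrow indices selects both halves of one wide element, in order. This is a simplified illustration that widens by a factor of two only and does not model undef (-1) mask elements or endianness; `widenShuffleMask` is a hypothetical name, not the LLVM helper.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Try to rewrite a shuffle mask over N narrow elements as a mask over N/2
// elements of twice the width. Succeeds only if every adjacent pair is
// (2k, 2k+1) for some k. On failure, `wide` is left empty.
inline bool widenShuffleMask(const std::vector<int> &narrow,
                             std::vector<int> &wide) {
  wide.clear();
  if (narrow.size() % 2 != 0)
    return false;
  for (std::size_t i = 0; i < narrow.size(); i += 2) {
    int lo = narrow[i], hi = narrow[i + 1];
    if (lo < 0 || lo % 2 != 0 || hi != lo + 1) {
      wide.clear();
      return false;
    }
    wide.push_back(lo / 2);
  }
  return true;
}
```

With the mask `{2, 3, 0, 1}` the function succeeds and produces `{1, 0}`; a mask such as `{1, 2, 3, 0}` straddles wide elements and is rejected.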
Lian Wang	fb0d636f28	[RISCV][SelectionDAG] Support VP_REDUCE_ADD mask operation. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D124986	2022-05-06 01:49:21 +00:00
Craig Topper	5140e0d219	[SelectionDAGISel] Add back a comment to MergeInputChains handling. NFC This comment used to exist, but was lost in a refactor over 10 years ago, but still seems relevant and improves readability.	2022-05-05 12:59:21 -07:00
Craig Topper	084f967370	[SelectionDAG] Constant fold (sext_inreg undef, VT) to 0 instead of undef. The result of sign_extend_inreg needs to have as many sign bits as requested by the VT argument. The easiest way to guarantee this is to fold it to 0. SystemZ test was modified to avoid using undef. Fixes https://github.com/llvm/llvm-project/issues/55178 Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D124696	2022-05-05 09:45:35 -07:00
Craig Topper	4e2d1a6c18	[DAGCombiner] Fold (sext/zext undef) -> 0 and aext(undef) -> undef. Differential Revision: https://reviews.llvm.org/D124988	2022-05-05 09:34:18 -07:00
Craig Topper	fd13192aa5	[DAGCombiner] Fold (max/min X, X) -> X. Differential Revision: https://reviews.llvm.org/D124951	2022-05-05 09:34:17 -07:00
Brian Tracy	87a55137e2	Fix "the the" typo in documentation and user facing strings There are many more instances of this pattern, but I chose to limit this change to .rst files (docs), anything in libcxx/include, and string literals. These have the highest chance of being seen by end users. Reviewed By: #libc, Mordante, martong, ldionne Differential Revision: https://reviews.llvm.org/D124708	2022-05-05 17:52:08 +02:00
Thomas Preud'homme	68dee83923	[MachinePipeliner] Fix unscheduled instruction Prior to ordering instructions to be scheduled, the machine pipeliner updates recurrence node sets in groupRemainingNodes() by adding to a given node set any node on the dependency path from a node set with higher priority to the given node set. The function computePath() that determines what constitutes a path follows artificial dependencies. However, when ordering the nodes in the resulting node sets, computeNodeOrder() calls ignoreDependence() when looking at dependencies, which ignores artificial dependencies. This can cause a node not to be scheduled, which then causes wrong code generation and, in the case of a debug build, will lead to an assert failure in generatePhis() in ModuloScheduler.cpp. This commit adds calls to ignoreDependence() in computePath() to not add any node in groupRemainingNodes() that would not be ordered by computeNodeOrder(). Reviewed By: sgundapa Differential Revision: https://reviews.llvm.org/D124267	2022-05-05 16:01:41 +01:00
Xing Xue	e5926906eb	[XCOFF][AIX] Use unique section names for LSDA and EH info sections with -ffunction-sections Summary: When -ffunction-sections is on, this patch makes the compiler generate unique LSDA and EH info sections for functions on AIX by appending the function name to the section name as a suffix. This will allow the AIX linker to garbage-collect unused functions. Reviewed by: MaskRay, hubert.reinterpretcast Differential Revision: https://reviews.llvm.org/D124855	2022-05-05 09:01:36 -04:00
Jay Foad	9ebbe25034	RegAllocGreedy: Common up part of the priority calculation. NFC.	2022-05-05 10:35:33 +01:00
Nikita Popov	9678936f18	[DAGCombine] Fold (X & ~Y) \| Y with truncated not This extends the (X & ~Y) \| Y to X \| Y fold to also work if ~Y is a truncated not (when taking into account the mask X). This is done by exporting the infrastructure added in D124856 and reusing it here. I've retained the old value of AllowUndefs=false, though probably this can be switched to true with extra test coverage. Differential Revision: https://reviews.llvm.org/D124930	2022-05-05 11:10:11 +02:00
Craig Topper	572dfef1db	[SelectionDAG] Use llvm::any_of to simplify a loop. NFC	2022-05-04 19:09:06 -07:00
Nikita Popov	451bc723ae	[SDAG] Handle truncated not in haveNoCommonBitsSet() Demanded bits analysis may replace a full-width not with an any_extend (not (truncate X)) pattern. This patch looks through this kind of pattern in haveNoCommonBitsSet(). Of course, we can only do this if we only need negated bits in the non-extended part, as the other bits may now be arbitrary. For example, if we have haveNoCommonBitsSet(~X & Y, X) then ~X only needs to actually negate bits set in Y. This is only a partial solution to the problem in that it allows add -> or conversion, but the resulting or doesn't get folded yet. (I guess that will involve exposing getBitwiseNotOperand() as a more general helper and using that in the relevant transform.) Differential Revision: https://reviews.llvm.org/D124856	2022-05-04 15:30:44 +02:00
serge-sans-paille	7030654296	[iwyu] Handle regressions in libLLVM header include Running iwyu-diff on LLVM codebase since `fa5a4e1b95` detected a few regressions, fixing them. Differential Revision: https://reviews.llvm.org/D124847	2022-05-04 08:32:38 +02:00
Luo, Yuanke	764676b737	[fastregalloc] Fix bug when undef value is tied to def. If the tied use is undef value, fastregalloc should free the def register. There is no reload needed for the undef value. Reviewed By: MatzeB Differential Revision: https://reviews.llvm.org/D124834	2022-05-04 12:12:55 +08:00
Jon Roelofs	e1c808b36e	Fix zero-width bitfield extracts to emit 0 Fixes #55129	2022-05-03 14:46:42 -07:00
Simon Pilgrim	faa35fc873	[DAG] Fix issue with rot(rot(x,c1),c2) -> rot(x,c1+c2) fold with unnormalized rotation amounts Don't assume the rotation amounts have been correctly normalized - do it as part of the constant folding. Also, the normalization should be performed with UREM not SREM.	2022-05-03 17:16:26 +01:00
Nikita Popov	2171a896ed	[SDAG] Handle A and B&~A in haveNoCommonBitsSet() This is the DAG variant of D124763. The code already handles the general pattern, but not this degenerate case. This allows folding A + (B&~A) to A \| (B&~A) which further folds to A \| B. Handling on the SDAG level is needed because in the motivating case the add is actually a getelementptr, which only gets converted into an add on the SDAG level. However, this patch is not quite sufficient to handle the getelementptr case yet, because of an interfering demanded bits simplification. Differential Revision: https://reviews.llvm.org/D124772	2022-05-03 15:47:02 +02:00
Nikita Popov	e0892614b1	[SDAG] Extract commutative helper from haveNoCommonBitsSet() (NFC) To make it easier to add additional patterns, which will generally want to handle commuted top-level operands.	2022-05-03 12:28:35 +02:00
Jeremy Morse	1d712c3818	[DebugInfo][InstrRef] Don't generate redundant DBG_PHIs In SelectionDAG, DBG_PHI instructions are created to "read" physreg values and give them an instruction number, when they can't be traced back to a defining instruction. The most common scenario is arguments to a function. Unfortunately, if you have 100 inlined methods, each of which has the same "this" pointer, then the 100 dbg.value instructions become 100 DBG_INSTR_REFs plus 100 DBG_PHIs, where only one DBG_PHI would suffice. This patch adds a vreg cache for MachineFunction::salvageCopySSA: if we've already traced a value back to the start of a block and created a DBG_PHI then it allows us to re-use the DBG_PHI, as well as reducing work. Differential Revision: https://reviews.llvm.org/D124517	2022-05-03 09:56:12 +01:00
David Green	6f81903e89	[LV][SLP] Mark fptosi_sat as vectorizable This adds fptosi_sat and fptoui_sat to the list of trivially vectorizable functions, mainly so that the loop vectorizer can vectorize the instruction. Marking them as trivially vectorizable also allows them to be SLP vectorized, and Scalarized. The signature of a fptosi_sat requires two type overrides (@llvm.fptosi.sat.v2i32.v2f32), unlike other intrinsics that often only take a single one. This patch alters hasVectorInstrinsicOverloadedScalarOpd to isVectorIntrinsicWithOverloadTypeAtArg, so that it can mark the first operand of the intrinsic as an overloaded (but not scalar) operand. Differential Revision: https://reviews.llvm.org/D124358	2022-05-03 09:32:34 +01:00
Hsiangkai Wang	eaaa31ff2c	[RISCV][TargetLowering] Special case overflow expansion for (uaddo X, C). Follow-up to D122933. Differential Revision: https://reviews.llvm.org/D124374	2022-05-03 03:51:36 +00:00
Craig Topper	5f057eaa0d	[DAGCombiner] reassociationCanBreakAddressingModePattern should check uses of the outer add. When looking for memory uses, reassociationCanBreakAddressingModePattern should check uses of the outer ADD rather than the inner ADD. We want to know if the two ops we're reassociating are used by a load/store. In practice, the existing check usually works because CodeGenPrepare will make one of the load/stores have an offset of 0 relative to split GEP. That will make the inner add have a memory use. To test this, I've manually split the GEPs so there is no 0 offset store. This issue was recently discussed in the original review D60294. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D124644	2022-05-02 16:38:53 -07:00
Sanjay Patel	747c6a0c73	[SDAG] fix miscompile when casting int->FP->int This is the codegen equivalent of D124692. As shown in https://github.com/llvm/llvm-project/issues/55150 - the existing fold may be wrong when converting to a signed value. This is a quick fix to avoid the miscompile. https://alive2.llvm.org/ce/z/KtaDmd Differential Revision: https://reviews.llvm.org/D124771	2022-05-02 14:57:27 -04:00
Simon Pilgrim	ae8b10e543	[DAG] (style) Break apart if-else chain as they all return	2022-05-01 17:56:59 +01:00
Paul Walker	f10a8f6752	[LegalizeDAG] Fix TypeSize conversion error when expanding SIGN_EXTEND_INREG SIGN_EXTEND_INREG expansion can trigger a TypeSize error because "VT.getSizeInBits() == 1" is used to detect a boolean without first verifying VT is a scalar.	2022-04-30 19:21:48 +01:00
Craig Topper	6affe87bda	[DAGCombiner] When matching a disguised rotate by constant don't forget to apply LHSMask/RHSMask. We try to match as a disguised rotate by constant of these forms (shl (X \| Y), C1) \| (srl X, C2) --> (rotl X, C1) \| (shl Y, C1) (shl X, C1) \| (srl (X \| Y), C2) --> (rotl X, C1) \| (srl Y, C2) We may have also looked through an AND to find the shift. If we did, we need to apply a mask to the result. I'll add an AArch64 test and pre-commit it and the RISC-V test tomorrow. Fixes PR55201. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D124711	2022-04-30 11:02:30 -07:00
David Penry	dcb77643e3	Reapply [CodeGen][ARM] Enable Swing Module Scheduling for ARM Fixed "private field is not used" warning when compiled with clang. original commit: `28d09bbbc3` reverted in: `fa49021c68` ------ This patch permits Swing Modulo Scheduling for ARM targets and turns it on by default for the Cortex-M7. The t2Bcc instruction is recognized as a loop-ending branch. MachinePipeliner is extended by adding support for "unpipelineable" instructions. These instructions are those which contribute to the loop exit test; in the SMS papers they are removed before creating the dependence graph and then inserted into the final schedule of the kernel and prologues. Support for these instructions was not previously necessary because current targets supporting SMS have only supported it for hardware loop branches, which have no loop-exit-contributing instructions in the loop body. The current structure of the MachinePipeliner makes it difficult to remove/exclude these instructions from the dependence graph. Therefore, this patch leaves them in the graph, but adds a "normalization" method which moves them in the schedule to stage 0, which causes them to appear properly in kernel and prologues. It was also necessary to be more careful about boundary nodes when iterating across successors in the dependence graph because the loop exit branch is now a non-artificial successor to instructions in the graph. In addition, schedules with physical use/def pairs in the same cycle should be treated as creating an invalid schedule because the scheduling logic doesn't respect physical register dependence once scheduled to the same cycle. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D122672	2022-04-29 10:54:39 -07:00
Paul Walker	23c509754d	[DAGCombiner] Stop invalid sign conversion in refineIndexType. When looking through extends of gather/scatter indices it's safe to convert a known positive signed index to unsigned, but unsigned indices must remain unsigned. Depends On D123318 Differential Revision: https://reviews.llvm.org/D123326	2022-04-29 14:20:13 +01:00
Nikita Popov	027c728f29	[SelectionDAGBuilder] Don't create MGATHER/MSCATTER with Scale != ElemSize This is an alternative to D124530. In getUniformBase() only create scales that match the gather/scatter element size. If targets also support other scales, then they can produce those scales in target DAG combines. This is what X86 already does (as long as the resulting scale would be 1, 2, 4 or 8). This essentially restores the pre-opaque-pointer state of things. Fixes https://github.com/llvm/llvm-project/issues/55021. Differential Revision: https://reviews.llvm.org/D124605	2022-04-29 14:57:53 +02:00
Paul Walker	7a0b897e86	[DAGCombiner][SVE] Ensure MGATHER/MSCATTER addressing mode combines preserve index scaling refineUniformBase and selectGatherScatterAddrMode both attempt the transformation: base(0) + index(A+splat(B)) => base(B) + index(A) However, this is only safe when index is not implicitly scaled. Differential Revision: https://reviews.llvm.org/D123222	2022-04-29 12:35:16 +01:00
Serge Pavlov	9fc58f1820	[PowerPC] Support of ppc_fp128 in lowering of llvm.is_fpclass PowerPC supports `ppc_fp128`, which is not an IEEE floating point type. The generic lowering of llvm.is_fpclass could not handle it properly. This change extends the generic lowering code to support `ppc_fp128`. The change was tested on emulator using runtime tests from https://reviews.llvm.org/D112933 and the patch for clang https://reviews.llvm.org/D112932. Differential Revision: https://reviews.llvm.org/D113908	2022-04-29 11:10:47 +07:00
Zequan Wu	4fe2ab5279	Revert "[DebugInfo][InstrRef] Describe value sizes when spilt to stack" This reverts commit `a15b66e76d`. This causes linker to crash at assertion: `Assertion failed: !Expr->isComplex(), file C:\b\s\w\ir\cache\builder\src\third_party\llvm\llvm\lib\CodeGen\LiveDebugValues\InstrRefBasedImpl.cpp, line 907`.	2022-04-28 16:18:16 -07:00
David Penry	fa49021c68	Revert "[CodeGen][ARM] Enable Swing Module Scheduling for ARM" This reverts commit `28d09bbbc3` while I investigate a buildbot failure.	2022-04-28 13:29:27 -07:00
David Penry	28d09bbbc3	[CodeGen][ARM] Enable Swing Module Scheduling for ARM This patch permits Swing Modulo Scheduling for ARM targets and turns it on by default for the Cortex-M7. The t2Bcc instruction is recognized as a loop-ending branch. MachinePipeliner is extended by adding support for "unpipelineable" instructions. These instructions are those which contribute to the loop exit test; in the SMS papers they are removed before creating the dependence graph and then inserted into the final schedule of the kernel and prologues. Support for these instructions was not previously necessary because current targets supporting SMS have only supported it for hardware loop branches, which have no loop-exit-contributing instructions in the loop body. The current structure of the MachinePipeliner makes it difficult to remove/exclude these instructions from the dependence graph. Therefore, this patch leaves them in the graph, but adds a "normalization" method which moves them in the schedule to stage 0, which causes them to appear properly in kernel and prologues. It was also necessary to be more careful about boundary nodes when iterating across successors in the dependence graph because the loop exit branch is now a non-artificial successor to instructions in the graph. In addition, schedules with physical use/def pairs in the same cycle should be treated as creating an invalid schedule because the scheduling logic doesn't respect physical register dependence once scheduled to the same cycle. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D122672	2022-04-28 13:01:18 -07:00
Alexey Bataev	75e1cf4a6a	[COST]Improve cost model for shuffles in SLP. Introduced masks where they are not added and improved target dependent cost models to avoid returning of the incorrect cost results after adding masks. Differential Revision: https://reviews.llvm.org/D100486	2022-04-28 10:04:41 -07:00
Bjorn Pettersson	3a39bb96ca	[SelectionDAG] Use correct boolean representation in FoldConstantArithmetic The description of SETCC says /// SetCC operator - This evaluates to a true value iff the condition is /// true. If the result value type is not i1 then the high bits conform /// to getBooleanContents. Without this patch, we sign extended the i1 to the used larger type regardless of getBooleanContents. This resulted in miscompiles, as shown in the attached testcase that ended up returning -1 instead of 1 when using -mattr=+v. Fixes https://github.com/llvm/llvm-project/issues/55168 Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D124618	2022-04-28 18:42:16 +02:00
Alexey Bataev	9861ca0c23	Revert "[COST]Improve cost model for shuffles in SLP." This reverts commit `29a470e380` to fix a crash reported in https://reviews.llvm.org/D100486#3479989.	2022-04-28 08:11:56 -07:00
Matt Arsenault	7762a3ce18	Revert "BranchFolder: Assert on SSA functions" This reverts commit `6ff91d17d6`.	2022-04-27 19:02:15 -04:00
Matt Arsenault	6ff91d17d6	BranchFolder: Assert on SSA functions We probably should have the opposite of getRequiredProperties for this	2022-04-27 18:51:37 -04:00
Bill Wendling	8f2ec974d1	[X86] Move target-generic code into CodeGen [NFC] This code is the same for all platforms. Differential Revision: https://reviews.llvm.org/D124566	2022-04-27 15:37:28 -07:00
Matt Arsenault	7c2db66632	llvm-reduce: Support multiple MachineFunctions The current testcase I'm trying to reduce only reproduces with IPRA enabled and requires handling multiple functions. The only real difference vs. the IR is the extra indirection to look for the underlying MachineFunction, so treat the ReduceWorkItem as the module instead of the function. The ugliest piece of this is really the ugliness of MachineModuleInfo. It not only tracks actual module state, but has a number of transient fields used for isel and/or the asm printer. These shouldn't do any harm for the use here, though they should be separated out.	2022-04-27 18:11:59 -04:00
Alexey Bataev	29a470e380	[COST]Improve cost model for shuffles in SLP. Introduced masks where they are not added and improved target dependent cost models to avoid returning of the incorrect cost results after adding masks. Differential Revision: https://reviews.llvm.org/D100486	2022-04-27 10:56:26 -07:00
Denis Antrushin	4059770af5	[StatepointLowering] Only export STATEPOINT results if used in nonlocal blocks. Currently we always export STATEPOINT results (GC pointers lowered via VRegs) to virtual registers. When processing gc.relocate instructions we have to generate a CopyFromRegs node and then export it to a VReg again if gc.relocate is used in other basic blocks. This results in generation of an extra COPY MIR instruction if the statepoint and its gc.relocate are in the same BB, but the gc.relocate result is used in other blocks. This patch changes this behavior to export statepoint results only if used in other basic blocks. For local uses the StatepointLoweringState.(get\|set)Location() API is used to communicate the appropriate statepoint result from `LowerStatepoint()` to `visitGCRelocate()`. This is NFC and is purely a compile-time optimization. On big methods it can improve codegen compile time by up to 10%. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D124444	2022-04-27 15:53:24 +03:00
Jeremy Morse	a15b66e76d	[DebugInfo][InstrRef] Describe value sizes when spilt to stack InstrRefBasedLDV can track and describe variable values that are spilt to the stack -- however it does not currently describe the size of the value on the stack. This can cause uninitialized bytes to be read from the stack if a small register is spilt for a larger variable, or theoretically on big-endian machines if a large value on the stack is used for a small variable. Fix this by using DW_OP_deref_size to specify the amount of data to load from the stack, if there's any possibility for ambiguity. There are a few scenarios where this can be omitted (such as when using DW_OP_piece and a non-DW_OP_stack_value location), see deref-spills-with-size.mir for an explicit table of input flavours and output expressions. Differential Revision: https://reviews.llvm.org/D123599	2022-04-27 09:54:50 +01:00
Andrew Savonichev	0a27622a1d	[NVPTX] Disable DWARF .file directory for PTX Default behavior for .file directory was changed in D105856, but ptxas (CUDA 11.5 release) refuses to parse it: $ llc -march=nvptx64 llvm/test/DebugInfo/NVPTX/debug-file-loc.ll $ ptxas debug-file-loc.s ptxas debug-file-loc.s, line 42; fatal : Parsing error near '"foo.h"': syntax error Added a new field to MCAsmInfo to control default value of UseDwarfDirectory. This value is used if -dwarf-directory command line option is not specified. Differential Revision: https://reviews.llvm.org/D121299	2022-04-26 21:40:36 +03:00
Jeremy Morse	65d5beca13	Reapply D124184, [DebugInfo][InstrRef] Add a size operand to DBG_PHI This was reverted twice, in `987cd7c3ed` and `13815e8cbf`. The latter stemmed from not accounting for rare register classes in a pre-allocated array, and the former from an array not being completely initialized, leading to asan complaining.	2022-04-26 15:49:22 +01:00
Serge Pavlov	170a903144	Intrinsic for checking floating point class This change introduces a new intrinsic, `llvm.is.fpclass`, which checks if the provided floating-point number belongs to any of the specified value classes. The intrinsic implements the checks made by C standard library functions `isnan`, `isinf`, `isfinite`, `isnormal`, `issubnormal`, `issignaling` and corresponding IEEE-754 operations. The primary motivation for this intrinsic is the support of strict FP mode. In this mode using compare instructions or other FP operations is not possible, because if the value is a signaling NaN, floating-point exception `Invalid` is raised, but the aforementioned functions must never raise exceptions. Currently there are two solutions for this problem, both are implemented partially. One of them is using integer operations to implement the check. It was implemented in https://reviews.llvm.org/D95948 for `isnan`. It solves the problem of exceptions, but offers one solution for all targets, although some can do the check in a more efficient way. The other, implemented in https://reviews.llvm.org/D96568, introduced a hook 'clang::TargetCodeGenInfo::testFPKind', which injects target-specific code into IR to implement `isnan` and some other functions. It is convenient for targets that have a dedicated instruction to determine FP data class. However, using a target-specific intrinsic complicates analysis and can prevent some optimizations. A special intrinsic for value class checks allows representing data class tests with enough flexibility. During IR transformations it represents the check in a target-independent way and saves it from undesired transformations. In the instruction selector it allows efficient lowering depending on the used target and mode. This implementation is an extended variant of `llvm.isnan` introduced in https://reviews.llvm.org/D104854. It is limited to minimal intrinsic support. Target-specific treatment will be implemented in separate patches. Differential Revision: https://reviews.llvm.org/D112025	2022-04-26 13:09:16 +07:00
Lian Wang	9980148305	[RISCV][SelectionDAG] Support VP_ADD/VP_MUL/VP_SUB mask operations Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D124144	2022-04-26 02:30:22 +00:00
Jeremy Morse	987cd7c3ed	Revert "Reapply D124184, [DebugInfo][InstrRef] Add a size operand to DBG_PHI" This reverts commit `5db9250231`. Further to the early revert, the sanitizers have found something wrong with this.	2022-04-25 23:30:15 +01:00
Matt Arsenault	7714e03175	RegAllocGreedy: Allow last chance recolor to retry overlapping tuples Last chance recoloring didn't try recoloring a done register with the same class since it believed there was no point. This doesn't necessarily apply if the members in that class overlap. Allow the recoloring to proceed if the assigned interfering physical register overlaps with the candidate register. This avoids an allocation failure with overlapping tuples. This testcase could be handled better, and I don't believe should reach last chance recoloring. The failure only manifests with the mutually unsatisfiable register hints to overlapping tuples. The earlier assignment decisions probably should have figured out that using these hints was a bad idea.	2022-04-25 17:07:17 -04:00
David Green	9727c77d58	[NFC] Rename Instrinsic to Intrinsic	2022-04-25 18:13:23 +01:00
Jeremy Morse	5db9250231	Reapply D124184, [DebugInfo][InstrRef] Add a size operand to DBG_PHI This was applied in `fda4305e53`, reverted in `13815e8cbf`, the problem was that fp80 X86 registers that were spilt to the stack aren't expected by LiveDebugValues. It pre-allocates a position number for all register sizes that can be spilt, and 80 bits isn't exactly common. The solution is to scan the register classes to find any unrecognised register sizes, and pre-allocate those position numbers, avoiding a later assertion.	2022-04-25 15:50:15 +01:00
Jeremy Morse	13815e8cbf	Revert "[DebugInfo][InstrRef] Add a size operand to DBG_PHI" This reverts commit `fda4305e53`. Green dragon has spotted a problem -- it's understood, but might be fiddly to fix, reverting in the meantime.	2022-04-25 14:06:12 +01:00
Jeremy Morse	fda4305e53	[DebugInfo][InstrRef] Add a size operand to DBG_PHI DBG_PHI instructions can refer to stack slots, to indicate that multiple values merge together on control flow joins in that slot. This is fine -- however the slot might be merged at a later date with a slot of a different size. In doing so, we lose information about the size of the eliminated PHI. Later analysis passes have to guess. Improve this by attaching an optional "bit size" operand to DBG_PHI, which only gets added for stack slots, to let us know how large the value on the stack is. Differential Revision: https://reviews.llvm.org/D124184	2022-04-25 13:41:34 +01:00
Matt Arsenault	6fa1d12b3c	ProcessImplicitDefs: Use required properties instead of isSSA assert	2022-04-22 18:28:45 -04:00
Simon Pilgrim	34e7243464	[DAG] Fold freeze(bitcast(x)) -> bitcast(freeze(x)) This is a very specific fold to fix an upstream poor codegen issue. InstCombine has the much more flexible pushFreezeToPreventPoisonFromPropagating but I don't think we're quite there with DAG/TLI handling for canCreateUndefOrPoison/isGuaranteedNotToBeUndefOrPoison value tracking yet. Fixes #54911 Differential Revision: https://reviews.llvm.org/D124185	2022-04-22 16:39:25 +01:00
Matt Arsenault	9c122537cd	MIR: Serialize FunctionContextIdx in MachineFrameInfo	2022-04-22 11:07:41 -04:00
Matt Arsenault	40bc9112c0	GlobalISel: Relax handling of G_ASSERT_* with source register classes The most common situation where G_ASSERT_ZEXT appears for AMDGPU is a copy from a physical register, which happens to set the actual register class on the virtual register. After copy coalescing, the assert's source operand had a vreg with a set class. The verifier was strictly rejecting cases where the set class/bank weren't an exact match. Additionally, RegBankSelect was also expecting a register bank to be set on the register, not a class. This is much stricter than regular copies so relax this behavior. This now allows these 2 cases: 1. Source register has either class or bank, and the result does not 2. Source register has a register class, and the result is a register with a matching bank. This should avoid needing some kind of special handling to avoid violating this constraint when folding copies.	2022-04-22 10:49:50 -04:00
Vitaly Buka	9be90748f1	Revert "[asan] Emit .size directive for global object size before redzone" Revert "[docs] Fix underline" Breaks a lot of asan tests in google. This reverts commit `365c3e85bc`. This reverts commit `78a784bea4`.	2022-04-21 16:21:17 -07:00
Alex Brachet	78a784bea4	[asan] Emit .size directive for global object size before redzone This emits an `st_size` that represents the actual useable size of an object before the redzone is added. Reviewed By: vitalybuka, MaskRay, hctim Differential Revision: https://reviews.llvm.org/D123010	2022-04-21 20:46:38 +00:00
Paul Kirth	61e36e87df	[safestack] Support safestack in stack size diagnostics Current stack size diagnostics ignore the size of the unsafe stack. This patch attaches the size of the static portion of the unsafe stack to the function as metadata, which can be used by the backend to emit diagnostics regarding stack usage. Reviewed By: phosek, mcgrathr Differential Revision: https://reviews.llvm.org/D119996	2022-04-20 18:29:40 +00:00
Alexey Bataev	2cca53c815	[DAG]Introduce llvm::processShuffleMasks and use it for shuffles in DAG Type Legalizer. We can process the long shuffles (working across several actual vector registers) in the best way if we take the actual register representation into account. We can build a more correct representation of register shuffles and improve the number of recognised buildvector sequences. Also, the same function can be used to improve the cost model for the shuffles in future patches. Part of D100486 Differential Revision: https://reviews.llvm.org/D115653	2022-04-20 09:37:16 -07:00
Matt Arsenault	9209a51918	MachineModuleInfo: Move AddrLabelSymbols to AsmPrinter This was tracking global state only used by the AsmPrinter, which can store its own module global state.	2022-04-20 11:21:40 -04:00
Matt Arsenault	3659780d58	MachineModuleInfo: Remove UsesMorestackAddr This is x86 specific, and adds statefulness to MachineModuleInfo. Instead of explicitly tracking this, infer if we need to declare the symbol based on the reference previously inserted. This produces a small change in the output due to the move from AsmPrinter::doFinalization to X86's emitEndOfAsmFile. This will now be moved relative to other end of file fields, which I'm assuming doesn't matter (e.g. the __morestack_addr declaration is now after the .note.GNU-split-stack part) This also produces another small change in code if the module happened to define/declare __morestack_addr, but I assume that's invalid and doesn't really matter.	2022-04-20 11:10:20 -04:00
Matt Arsenault	d7938b1a81	MachineModuleInfo: Move HasSplitStack handling to AsmPrinter This is used to emit one field in doFinalization for the module. We can accumulate this when emitting all individual functions directly in the AsmPrinter, rather than accumulating additional state in MachineModuleInfo. Move the special case behavior predicate into MachineFrameInfo to share it. This now promotes it to generic behavior. I'm assuming this is fine because no other target implements adjustForSegmentedStacks, or has tests using the split-stack attribute.	2022-04-20 10:54:29 -04:00
Alexey Bataev	5f7ac15912	Revert "[DAG]Introduce llvm::processShuffleMasks and use it for shuffles in DAG Type Legalizer." This reverts commit `2f49163b33` to fix a buildbot failure. Reported in https://lab.llvm.org/buildbot#builders/105/builds/24284	2022-04-20 06:35:55 -07:00
Matt Arsenault	26d575eb08	LocalStackSlotAllocation: Combine debug printing statements	2022-04-20 09:31:14 -04:00
Matt Arsenault	4575f35ea1	LocalStackSlotAllocation: Stop creating unused virtual register	2022-04-20 09:31:14 -04:00
Alexey Bataev	2f49163b33	[DAG]Introduce llvm::processShuffleMasks and use it for shuffles in DAG Type Legalizer. We can process the long shuffles (working across several actual vector registers) in the best way if we take the actual register representation into account. We can build a more correct representation of register shuffles and improve the number of recognised buildvector sequences. Also, the same function can be used to improve the cost model for the shuffles in future patches. Part of D100486 Differential Revision: https://reviews.llvm.org/D115653	2022-04-20 05:32:56 -07:00
Matt Arsenault	9592e88f59	MachineModuleInfo: Don't allow dynamically setting DbgInfoAvailable This can be set up front, and used only as a cache. This avoids a field that looks like it requires MIR serialization. I believe this fixes 2 bugs for CodeView. First, this addresses a FIXME that the flag -disable-debug-info-print only works with DWARF. Second, it fixes emitting debug info with emissionKind NoDebug.	2022-04-19 21:08:37 -04:00
Matt Arsenault	8591328e15	Intrinsics: Mark llvm.eh.sjlj.callsite argument as immarg The assert in SelectionDAG implies that it is	2022-04-19 21:04:33 -04:00
Matt Arsenault	507259820a	GlobalISel: Add LegalizeMutations to help use More/FewerElements	2022-04-19 21:04:32 -04:00
Vitaly Buka	33c5d8f939	[msan] Disable assert with msan The assert uses data from just destroyed BasicBlock.	2022-04-19 16:42:05 -07:00
chenglin.bi	222adf338a	[Arch64][SelectionDAG] Add target-specific implementation of srem 1. Expanding X%C to the equivalent X-X/C*C is not always the fastest path if there is no paired SDIV. So first check whether the target has a faster lowering for srem alone. 2. Add AArch64 faster path for SREM only pow2 case. Fix https://github.com/llvm/llvm-project/issues/54649 Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D122968	2022-04-19 02:49:42 +08:00
chenglin.bi	acfc025a72	Revert "[Arch64][SelectionDAG] Add target-specific implementation of srem" This reverts commit `9d9eddd3dd`.	2022-04-18 10:35:09 +08:00
chenglin.bi	9d9eddd3dd	[Arch64][SelectionDAG] Add target-specific implementation of srem Expanding X%C to the equivalent X-X/C*C is not always the fastest path if there is no paired SDIV. So first check whether the target has a faster lowering for srem alone. Add AArch64 faster path for SREM only pow2 case. Fix https://github.com/llvm/llvm-project/issues/54649 Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D122968	2022-04-16 12:29:11 +08:00
Matt Arsenault	b8033de063	MIR: Serialize a few bool function fields	2022-04-15 20:31:07 -04:00
Craig Topper	c6dc229a6d	[DAGCombiner] Move call to hasOneUse after opcode checks. NFC Checking the opcode is cheap, counting the number of uses is not.	2022-04-15 17:02:16 -07:00
Craig Topper	a7b9d75e7a	[DAGCombiner] Move or/xor/and opcode check in ReduceLoadOpStoreWidth before hasOneUse check. hasOneUse is not cheap on nodes with chain results that might have many uses. By checking the opcode first, we can avoid a costly walk of the use list on nodes we aren't interested in. Found by investigating calls to hasNUsesOfValue from the example provided in D123857.	2022-04-15 16:38:27 -07:00
Chih-Ping Chen	eab6e94f91	[DebugInfo] Add a TargetFuncName field in DISubprogram for specifying DW_AT_trampoline as a string. Also update the signature of DIBuilder::createFunction to reflect this addition. Differential Revision: https://reviews.llvm.org/D123697	2022-04-15 16:38:23 -04:00
Johannes Doerfert	3f7a6ce0de	[DWARF][FIX] Handle the use of multiple registers gracefully Certain applications crashed for us with the AMDGPU backend. While this is not a proper fix it allows us to compile the code for now. I left a TODO for someone that understands DWARF. Differential Revision: https://reviews.llvm.org/D123717	2022-04-15 13:43:50 -05:00
Clement Courbet	46a13a0ef8	[ExpandMemCmp] Properly expand `bcmp` to an equality pattern. Before that change, constant-size `bcmp` would miss an opportunity to generate a more efficient equality pattern and would generate a -1/0/1 pattern instead. Differential Revision: https://reviews.llvm.org/D123849	2022-04-15 11:26:24 +02:00
Matt Arsenault	9196f5dab7	MachineCSE: Report this requires SSA	2022-04-14 20:21:21 -04:00
John Brawn	12c1022679	[AArch64] Lowering and legalization of strict FP16 Making strict FP16 work correctly needs some changes in lowering and legalization: * SelectionDAGLegalize::PromoteNode was missing handling for some strict fp opcodes. * Some of the custom lowering of strict fp operations needed to be adjusted to work with FP16. * Custom lowering needed to be added for round-to-int operations. With this, and the previous patches for the rest of the strict fp isel, we can set IsStrictFPEnabled = true. Differential Revision: https://reviews.llvm.org/D115620	2022-04-14 16:51:22 +01:00
Joseph Huber	11f47b791f	[OpenMP] Make offloading sections have the SHF_EXCLUDE flag Offloading sections can be embedded in the host during codegen via a section. This section was originally marked as metadata to prevent it from being loaded, but these sections are completely unused at runtime so the linker should automatically drop them from the final executable or shared library. This flag adds support for the SHF_EXCLUDE flag in target lowering and uses it. Reviewed By: JonChesterfield, MaskRay Differential Revision: https://reviews.llvm.org/D122987	2022-04-14 10:50:49 -04:00
Paul Walker	0c44115e51	[SVE] Add support for non-element-type sized scaling when lowering MGATHER/MSCATTER. The lowering code did not use the scale operand of MGATHER/MSCATTER nodes, but instead assumed scaled indices were always scaled based on the element type of the memory type. This patch adds the missing support by rewriting the nodes as unscaled variants. Differential Revision: https://reviews.llvm.org/D123670	2022-04-14 11:54:46 +01:00
Matt Arsenault	1732242bee	RegAlloc: Fix remaining virtual registers after allocation failure This testcase fails register allocation, but at the failure point there were also new split virtual registers. Previously this was assigning the failing register and not enqueueing the newly created split virtual registers. These would then never be allocated and assert in VirtRegRewriter.	2022-04-13 16:25:30 -04:00
Matt Arsenault	681b9466c9	RegAllocGreedy: Remove redundant check for virtual registers The set of interfering virtual registers obviously only includes virtual registers.	2022-04-13 15:00:18 -04:00
serge-sans-paille	fa5a4e1b95	[iwyu] Handle regressions in libLLVM header include Running iwyu-diff on LLVM codebase since `a96638e50e` detected a few regressions, fixing them.	2022-04-13 20:53:19 +02:00
Simon Pilgrim	fef221bf1f	[DAG] Enable SimplifyVBinOp folds on add/sub sat intrinsics	2022-04-13 12:53:23 +01:00
Jonas Paulsson	46f83caebc	[InlineAsm] Add support for address operands ("p"). This patch adds support for inline assembly address operands using the "p" constraint on X86 and SystemZ. This was in fact broken on X86 (see example at https://reviews.llvm.org/D110267, Nov 23). These operands should probably be treated the same as memory operands by CodeGenPrepare, which have been commented with "TODO" there. Review: Xiang Zhang and Ulrich Weigand Differential Revision: https://reviews.llvm.org/D122220	2022-04-13 12:50:21 +02:00
Simon Pilgrim	cfb3ee2185	[DAG] Add non-uniform vector support to (shl (srl x, c1), c2) -> (and (shift x, c3)) Another part of D77804 yak shaving Differential Revision: https://reviews.llvm.org/D123523	2022-04-13 11:37:33 +01:00
Matt Arsenault	d4b1be20f6	RegAllocGreedy: Fix illegal eviction assert for urgent evictions The condition in canEvictInterferenceBasedOnCost is slightly different from the assertion in evictInterference. canEvictInterferenceBasedOnCost uses a <= check for the cascade number for legality, but the assert was checking for <. For equal cascade numbers for an urgent eviction, canEvictInterferenceBasedOnCost could return success. The actual eviction would then hit this assert. Avoid ever returning true for equivalent cascade numbers. The resulting failed allocation seems a bit off to me. e.g. in illegal-eviction-assert.mir, I would assume %0 gets allocated starting at $vgpr0. That was its initial allocation choice, but was later evicted. In this example no evictions can help improve anything.	2022-04-12 19:16:56 -04:00
Matt Arsenault	eefed1dbf0	RegAllocGreedy: Roll back successful recolorings on failure This is a replacement for the original fix attempted in `c46aab01c0`. This fixes "overlapping insert" assertion failures when trying to unwind an unsuccessful recoloring attempt. The problem would occur when there are multiple recoloring candidates which recursively required recoloring. If one recoloring candidate was successfully recolored at one level, and the next recoloring candidate was unsuccessful, we would not roll back the first candidates successful recoloring. The forgotten successful recoloring may have been assigned to something that conflicts with a register that needs to be restored in a parent recoloring attempt. See the testcase added in issue48473 for a more concrete example with explanation.	2022-04-12 19:02:48 -04:00
Matt Arsenault	3754f60112	GlobalISel: Implement MoreElements for select of vector conditions	2022-04-12 16:54:04 -04:00
Matt Arsenault	3f2cc7cc2b	GlobalISel: Fix lowerSelect handling of boolean high bits This was making several invalid assumptions about the incoming select. First, it was assuming the incoming condition was either s1 or already sign extended, not accounting for different boolean high bits behavior between scalar and vector conditions. We only had a vector boolean due to the intermediate step vector select, which is now avoided. Second, it was assuming it can use the result vector type as a boolean mask. These types don't have anything to do with each other, and the assumption only makes sense in the context of the expansion to bit operations. Since these logically are part of the same lowering, do the complete expansion in a single step. The added select_v4s1_s1 test does fail to legalize, since it seems AArch64's vector legalization support is pretty incomplete.	2022-04-12 16:54:03 -04:00
Matt Arsenault	0e489926be	GlobalISel: Handle widening addo/subo booleans This will be tested in a future patch	2022-04-12 16:54:03 -04:00
Matt Arsenault	95c2bcbf8b	GlobalISel: Handle widening umulo/smulo condition outputs	2022-04-12 16:54:03 -04:00
Matt Arsenault	abe171df06	GlobalISel: Update mutationIsSane assert for scalable vectors	2022-04-12 16:54:03 -04:00
Shao-Ce SUN	e90110e696	[NFC][CodeGen] Use ArrayRef in TargetLowering functions This patch is similar to D122557, adding an `ArrayRef` version for `setOperationAction`, `setLoadExtAction`, `setCondCodeAction`, `setLibcallName`. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D123467	2022-04-13 00:46:05 +08:00
Simon Pilgrim	bc32a1dd76	[DAG] Add non-uniform vector support to (shl (sr[la] exact X, C1), C2) folds	2022-04-12 12:57:56 +01:00
Craig Topper	35be4a7af3	[SelectionDAG] Remove unnecessary null check after call to getNode. NFC As far as I know getNode will never return a null SDValue. I'm guessing this was modeled after the FoldConstantArithmetic call earlier. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D123550	2022-04-11 18:03:44 -07:00
Matt Arsenault	5a5034d508	GlobalISel: Verify atomic load/store ordering restriction Reject acquire stores and release loads. This matches the restriction imposed by the LLParser and IR verifier.	2022-04-11 20:12:22 -04:00
Matt Arsenault	d1f97a3419	GlobalISel: Add memSizeNotByteSizePow2 legality helper This is really a replacement for memSizeInBytesNotPow2 that actually does what most every target wants. In particular, since s1 rounds to 1 byte, it wasn't lowered by this predicate. This results in targets needing to think harder and add more matchers to catch all the degenerate cases. Also small bug fix that prevented the correct insertion of G_ASSERT_ZEXT in the AArch64 use case.	2022-04-11 19:43:37 -04:00
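The s1 case mentioned in the entry above can be illustrated with a minimal standalone sketch. It models memory sizes as plain bit counts and is not LLVM's implementation: the helper names `is_pow2`, `size_in_bytes_rounded_up`, `mem_size_in_bytes_not_pow2`, and `mem_size_not_byte_size_pow2` are hypothetical stand-ins, and the assumed behavior is that the old predicate rounds the size up to whole bytes before testing for a power of two, while the new one also rejects sizes that are not a whole number of bytes.

```cpp
#include <cassert>

// Hypothetical helper: true if n is a non-zero power of two.
constexpr bool is_pow2(unsigned n) { return n != 0 && (n & (n - 1)) == 0; }

// Hypothetical helper: size in bytes, rounding a partial byte up.
constexpr unsigned size_in_bytes_rounded_up(unsigned bits) {
  return (bits + 7) / 8;
}

// Model of the older check (assumption): only looks at the rounded byte count.
constexpr bool mem_size_in_bytes_not_pow2(unsigned bits) {
  return !is_pow2(size_in_bytes_rounded_up(bits));
}

// Model of the newer check (assumption): also fires for non-byte-sized types.
constexpr bool mem_size_not_byte_size_pow2(unsigned bits) {
  return bits % 8 != 0 || !is_pow2(bits / 8);
}

// s1 rounds up to 1 byte, so the older model does not flag it.
static_assert(!mem_size_in_bytes_not_pow2(1), "s1 looks like a pow2 byte size");
// The newer model flags s1 because it is not byte sized.
static_assert(mem_size_not_byte_size_pow2(1), "s1 is not byte sized");

// Runtime spot check on ordinary sizes where both models agree.
inline void check_mem_size_models() {
  assert(mem_size_in_bytes_not_pow2(24) && mem_size_not_byte_size_pow2(24));
  assert(!mem_size_in_bytes_not_pow2(32) && !mem_size_not_byte_size_pow2(32));
}
```

Under these assumptions the two models differ only on sizes that are not a multiple of 8 bits, which is the degenerate case the commit message describes.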
Matt Arsenault	1416744f84	GlobalISel: Implement computeKnownBits for overflow bool results	2022-04-11 19:43:37 -04:00
Craig Topper	2ce2562876	[RISCV][SelectionDAG] Add a hook to sign extend i32 ConstantInt operands of phis on RV64. Materializing constants on RISCV is simpler if the constant is sign extended from i32. By default i32 constant operands of phis are zero extended. This patch adds a hook to allow RISCV to override this for i32. We have an existing isSExtCheaperThanZExt, but it operates on EVT which we don't have at these places in the code. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D122951	2022-04-11 14:38:39 -07:00
Craig Topper	28cb508195	[TargetLowering][RISCV] Allow truncation when checking if the arguments of a setcc are splats. We're just trying to canonicalize here and won't be using the constant value returned. The attached test changes are because we were previously commuting a seteq X, (splat_vector 0) because we also have (sub 0, X). The 0 is larger than the element type so we don't detect it as a splat without the AllowTruncation flag. By preventing the commute we are able to match it to the vmseq.vx instruction during isel. We only look for constants on the RHS in isel. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D123256	2022-04-11 09:49:36 -07:00
Momchil Velikov	b4ad28da19	[CodeGen] Async unwind - add a pass to fix CFI information This pass inserts the necessary CFI instructions to compensate for the inconsistency of the call-frame information caused by linear (non-CFG aware) nature of the unwind tables. Unlike the `CFIInstrInserter` pass, this one almost always emits only `.cfi_remember_state`/`.cfi_restore_state`, which results in smaller unwind tables and also transparently handles custom unwind info extensions like CFA offset adjustment and save locations of SVE registers. This pass takes advantage of the constraints that LLVM imposes on the placement of save/restore points (cf. `ShrinkWrap.cpp`): * there is a single basic block, containing the function prologue * possibly multiple epilogue blocks, where each epilogue block is complete and self-contained, i.e. CSR restore instructions (and the corresponding CFI instructions) are not split across two or more blocks. * prologue and epilogue blocks are outside of any loops Thus, during execution, at the beginning and at the end of each basic block the function can be in one of two states: - "has a call frame", if the function has executed the prologue, or has not executed any epilogue - "does not have a call frame", if the function has not executed the prologue, or has executed an epilogue These properties can be computed for each basic block by a single RPO traversal. From the point of view of the unwind tables, the "has/does not have call frame" state at beginning of each block is determined by the state at the end of the previous block, in layout order. Where these states differ, we insert compensating CFI instructions, which come in two flavours: - CFI instructions, which reset the unwind table state to the initial one. This is done by a target specific hook and is expected to be trivial to implement, for example it could be: ``` .cfi_def_cfa <sp>, 0 .cfi_same_value <rN> .cfi_same_value <rN-1> ... ``` where `<rN>` are the callee-saved registers. 
- CFI instructions, which reset the unwind table state to the one created by the function prologue. These are the sequence: ``` .cfi_restore_state .cfi_remember_state ``` In this case we also insert a `.cfi_remember_state` after the last CFI instruction in the function prologue. Reviewed By: MaskRay, danielkiss, chill Differential Revision: https://reviews.llvm.org/D114545	2022-04-11 13:27:26 +01:00
Sanjay Patel	2ed15984b4	[SDAG] try to reduce compare of funnel shift equal 0 fshl (or X, Y), X, C ==/!= 0 --> or (shl Y, C), X ==/!= 0 fshl X, (or X, Y), C ==/!= 0 --> or (srl Y, BW-C), X ==/!= 0 This is similar to an existing setcc-of-rotate fold, but the matching requires more checks for the more general funnel op: https://alive2.llvm.org/ce/z/Ab2jDd We are effectively decomposing the funnel shift into logical shifts, reassociating, and removing a shift. This should get us the final improvements for x86-64 that were originally shown in D111530 ( https://github.com/llvm/llvm-project/issues/49541 ); x86-32 still shows some SHLD/SHRD, so the pattern is not matching there yet. Differential Revision: https://reviews.llvm.org/D122919	2022-04-11 07:44:58 -04:00
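The two folds in the entry above can be checked exhaustively at a small bit width. The sketch below is a standalone model, not code from the patch: `fshl8` is a hypothetical 8-bit stand-in for the funnel shift left, and the check assumes a shift amount C with 0 < C < 8 so that BW-C is a valid shift.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 8-bit funnel shift left: concatenate a:b, shift left by c,
// keep the high 8 bits. The shift amount is taken modulo the bit width.
constexpr std::uint8_t fshl8(std::uint8_t a, std::uint8_t b, unsigned c) {
  return (c % 8u) == 0u
             ? a
             : static_cast<std::uint8_t>((a << (c % 8u)) |
                                         (b >> (8u - (c % 8u))));
}

// Checks both folds for every 8-bit X and Y and every C in [1, 7]:
//   fshl (or X, Y), X, C == 0  <=>  or (shl Y, C), X == 0
//   fshl X, (or X, Y), C == 0  <=>  or (srl Y, 8-C), X == 0
inline void check_fshl_setcc_folds() {
  for (unsigned xv = 0; xv < 256; ++xv) {
    for (unsigned yv = 0; yv < 256; ++yv) {
      for (unsigned c = 1; c < 8; ++c) {
        const auto x = static_cast<std::uint8_t>(xv);
        const auto y = static_cast<std::uint8_t>(yv);
        const auto x_or_y = static_cast<std::uint8_t>(x | y);

        const bool before1 = fshl8(x_or_y, x, c) == 0;
        const bool after1 = static_cast<std::uint8_t>((y << c) | x) == 0;
        assert(before1 == after1);

        const bool before2 = fshl8(x, x_or_y, c) == 0;
        const bool after2 = static_cast<std::uint8_t>((y >> (8u - c)) | x) == 0;
        assert(before2 == after2);
      }
    }
  }
}
```

Both sides are zero exactly when X is zero and the surviving bits of Y are zero, which is why one shift can be dropped.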
Tim Northover	6c85668d28	Tail calls: look through AssertZExt to find register copy. arm64_32 guarantees the high 32 bits of pointer parameters are passed as 0, and this is modelled in the IR by inserting an AssertZExt after the CopyFromReg. The function deciding whether registers that need to be preserved actually are wasn't expecting this so it banned perfectly legitimate tail calls.	2022-04-11 12:24:47 +01:00
Matt Arsenault	9fdd25848a	Transforms: Fix code duplication between LowerAtomic and AtomicExpand	2022-04-08 19:06:36 -04:00
Fraser Cormack	18106b99f0	[VP] Explicitly map from VP intrinsic to ISD opcode This patch aims to overcome an issue in these mappings where, when an ISD node was registered with BEGIN_REGISTER_VP_SDNODE but outside the scope of a pair of BEGIN_REGISTER_VP_INTRINSIC/END_REGISTER_VP_INTRINSIC macros, the switch cases fell apart. This in particular happened with VP_SETCC, where we'd end up with something along the lines of: case Intrinsic::vp_fcmp: break; case Intrinsic::vp_icmp: break; ResOpc = ISD::VP_SETCC; case Intrinsic::vp_store: ... To remedy this, we introduce a special-purpose mapping macro which can map any number of VP intrinsic opcodes to an ISD opcode. As a result, we no longer need to special-case the mapping from vp.icmp and vp.fcmp to VP_SETCC, as the new helper macro does it for us. Thanks to @craig.topper for noticing this and to @rogfer01 for the idea. Reviewed By: rogfer01 Differential Revision: https://reviews.llvm.org/D123324	2022-04-08 12:30:22 +01:00
Nikita Popov	a5a272a491	[SafeStack] Don't create SCEV min between pointer and integer (PR54784) Rather than rewriting the alloca pointer to zero, use removePointerBase() to drop the base pointer. This will simply bail if the base pointer is not the alloca. We could try doing something more fancy here (like dropping the sources not based on the alloca on the premise that they aren't SafeStack-relevant), but I don't think that's worthwhile. Fixes https://github.com/llvm/llvm-project/issues/54784. Differential Revision: https://reviews.llvm.org/D123309	2022-04-08 09:44:00 +02:00
Chih-Ping Chen	c226a5c4d7	[DebugInfo] Use DW_ATE_signed encoding when creating a Fortran array index type.	2022-04-07 07:00:56 -04:00
Fraser Cormack	8216255c9f	[RISCV][VP] Add basic RVV codegen for vp.fcmp This patch adds the necessary infrastructure to lower vp.fcmp via ISD::VP_SETCC to RVV instructions. Most notably this patch adds cond-code legalization for VP_SETCC, reusing the existing TargetLowering::LegalizeSetCCCondCode by passing in additional SDValue parameters for the Mask and EVL. This method then uses VP operations to legalize the condcode. There is still a general lack of canonicalization on VP_SETCC as opposed to SETCC which results in worse code than is theoretically possible. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D123051	2022-04-07 09:16:07 +01:00
Matt Arsenault	7f14a1d46b	AtomicExpand: Add NotAtomic lowering strategy Currently LowerAtomics exists as a separate pass which blindly replaces all atomics. Add a new lowering strategy option to eliminate the atomics which the target can control on a per-instruction level.	2022-04-06 22:34:35 -04:00
Matt Arsenault	c4ea925f50	AtomicExpand: Change return type for shouldExpandAtomicStoreInIR Use the same enum as the other atomic instructions for consistency, in preparation for addition of another strategy. Introduce a new "Expand" option, since the store expansion does not use cmpxchg. Alternatively, the existing CmpXChg strategy could be renamed to Expand.	2022-04-06 22:34:04 -04:00
Craig Topper	bdb1ab9804	[LegalizeTypes][VP] Use LoVT/HiVT when splitting VP operations in SplitVecRes_UnaryOp. The VP path was using the split source VTs instead of the split destination VTs. This may not be a problem today because the VP nodes going through this have the same source and dest VTs. It will be a problem when we start using this function for legalizing VP cast operations.	2022-04-06 10:51:49 -07:00
Daniil Kovalev	62a983ebc5	Revert "[CodeGen] Place SDNode debug ID declaration under appropriate #if" This reverts commit `83a798d4b0`. As discussed in D120714 with @thakis, the patch added unneeded complexity without noticeable benefits.	2022-04-06 20:32:53 +03:00
Craig Topper	8fc19185e3	[LegalizeTypes] Move SplitVecRes_VECTOR_REVERSE/VECTOR_SPLICE near other SplitVecRes methods. NFC This file is divided into sections for different legalization actions. We should keep similar methods together.	2022-04-06 10:29:32 -07:00
Craig Topper	1ad36487e9	[LegalizeDAG] Use SelectionDAG::getBoolConstant to simplify some code. NFC	2022-04-06 10:08:11 -07:00
Craig Topper	5b5f59428c	[DAGCombiner] Replace call getSExtOrTrunc with a truncate. NFC The extend case should never occur. The sign extend would be an arbitrary choice, remove it to avoid confusion.	2022-04-06 09:59:45 -07:00
Paul Walker	7d3af9ef0f	[DAGCombine] insert_subvector undef, (splat X), N2 -> splat X Differential Revision: https://reviews.llvm.org/D120328	2022-04-06 17:15:38 +01:00
Fraser Cormack	6be5e875be	[RISCV][VP] Add basic RVV codegen for vp.icmp This patch adds the minimum required to successfully lower vp.icmp via the new ISD::VP_SETCC node to RVV instructions. Regular ISD::SETCC goes through a lot of canonicalization which targets may rely on which has not hereto been ported to VP_SETCC. It also supports expansion of individual condition codes and a non-boolean return type. Support for all of that will follow in later patches. In the case of RVV this largely isn't a problem as the vector integer comparison instructions are plentiful enough that it can lower all VP_SETCC nodes on legal integer vectors except for boolean vectors, which regular SETCC folds away immediately into logical operations. Floating-point VP_SETCC operations aren't as well supported in RVV and the backend relies on condition code expansion, so support for those operations will come in later patches. Portions of this code were taken from the VP reference patches. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D122743	2022-04-06 16:51:22 +01:00
Paul Walker	1c307b9794	[NFC] Remove redundant IndexType canonicalisation from DAGTypeLegalizer::PromoteIntOp_MSCATTER Promotion does not affect the base element type and so the original index type will remain unchanged. This reflects the behaviour of DAGTypeLegalizer::PromoteIntOp_MGATHER with no tests affected.	2022-04-06 15:30:29 +01:00
zhongyunde	19e5235147	[AArch64][InstCombine] Fold MLOAD and zero extensions into MLOAD According to the discussion in D122281, we are missing an ISD::AND combine for MLOAD because it relies on BuildVectorSDNode, which fails for scalable vectors. This patch is intended to handle that, so we can circle back to the type MVT::nxv2i32 Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D122703	2022-04-06 20:50:42 +08:00
Roman Lebedev	34ce9fd864	[TLI] `TargetLowering::SimplifyDemandedVectorElts()`: narrowing bitcast: fill known zero elts from known src bits E.g. in ``` %i0 = zext <2 x i8> to <2 x i16> %i1 = bitcast <2 x i16> to <4 x i8> ``` the `%i0`'s zero bits are known to be `0xFF00` (upper half of every element is known zero), but no elements are known to be zero, and for `%i1`, we don't know anything about zero bits, but the elements under `0b1010` mask are known to be zero (i.e. the odd elements). But, we didn't perform such a propagation. Noticed while investigating more aggressive `vpmaddwd` formation. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D123163	2022-04-06 14:19:31 +03:00
Daniil Kovalev	83a798d4b0	[CodeGen] Place SDNode debug ID declaration under appropriate #if Place PersistentId declaration under #if LLVM_ENABLE_ABI_BREAKING_CHECKS to reduce memory usage when it is not needed. Differential Revision: https://reviews.llvm.org/D120714	2022-04-06 14:09:32 +03:00
Jeremy Morse	fb6596f1ec	[DebugInfo][InstrRef] Avoid a crash from mixed variable location modes Variable locations now come in two modes, instruction referencing and DBG_VALUE. At -O0 we pick DBG_VALUE to allow fast construction of variable information. Unfortunately, SelectionDAG edits the optimisation level in the presence of opt-bisect-limit, meaning different passes have different views of what variable location mode we should use. That causes assertions when they're mixed. This patch plumbs through a boolean in SelectionDAG from start to instruction emission, so that we don't rely on the current optimisation level for correctness. Differential Revision: https://reviews.llvm.org/D123033	2022-04-06 11:55:38 +01:00
Simon Pilgrim	3369e474bb	[DAG] Allow XOR(X,MIN_SIGNED_VALUE) to perform AddLike folds As raised on PR52267, XOR(X,MIN_SIGNED_VALUE) can be treated as ADD(X,MIN_SIGNED_VALUE), so let these cases use the 'AddLike' folds, similar to how we perform no-common-bits OR(X,Y) cases. define i8 @src(i8 %x) { %r = xor i8 %x, 128 ret i8 %r } => define i8 @tgt(i8 %x) { %r = add i8 %x, 128 ret i8 %r } Transformation seems to be correct! https://alive2.llvm.org/ce/z/qV46E2 Differential Revision: https://reviews.llvm.org/D122754	2022-04-06 10:37:11 +01:00
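The equivalence claimed in the entry above is easy to confirm by brute force for i8. This is a minimal sketch rather than anything from the patch; `kMinSigned8`, `xor_min_signed`, and `add_min_signed` are hypothetical names, and unsigned 8-bit arithmetic is used to model wrapping i8 addition.

```cpp
#include <cassert>
#include <cstdint>

// Bit pattern of the minimum signed i8 value (only the sign bit set).
constexpr std::uint8_t kMinSigned8 = 0x80;

// XOR(X, MIN_SIGNED_VALUE): flips the sign bit.
constexpr std::uint8_t xor_min_signed(std::uint8_t x) {
  return static_cast<std::uint8_t>(x ^ kMinSigned8);
}

// ADD(X, MIN_SIGNED_VALUE) with wraparound: the carry out of the sign bit
// is discarded, so only the sign bit changes.
constexpr std::uint8_t add_min_signed(std::uint8_t x) {
  return static_cast<std::uint8_t>(x + kMinSigned8);
}

// Exhaustive check over all 256 i8 bit patterns.
inline void check_xor_matches_add_for_all_i8() {
  for (unsigned v = 0; v < 256; ++v) {
    const auto x = static_cast<std::uint8_t>(v);
    assert(xor_min_signed(x) == add_min_signed(x));
  }
}
```

The two agree because adding the top bit can only produce a carry out of the value, never into another bit.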
Simon Pilgrim	9e97b2a477	[DAG] SimplifySetCC - relax fold (X^C1) == C2 --> X == C1^C2 https://alive2.llvm.org/ce/z/A_auBq Remove limitation that wouldn't perform the fold if all the inverted bits are known zero The thumb2 changes look to be benign, although it does show that the TEQ/TST isel patterns could probably be improved. Fixes movmsk regression in D122754 Differential Revision: https://reviews.llvm.org/D123023	2022-04-06 09:18:08 +01:00
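The fold named in the entry above rests on XOR being its own inverse. The following standalone sketch checks it for every 8-bit combination; `setcc_before` and `setcc_after` are hypothetical names for the two sides of the fold and are not taken from SimplifySetCC.

```cpp
#include <cassert>
#include <cstdint>

// Left-hand side of the fold: (X ^ C1) == C2.
constexpr bool setcc_before(std::uint8_t x, std::uint8_t c1, std::uint8_t c2) {
  return static_cast<std::uint8_t>(x ^ c1) == c2;
}

// Right-hand side of the fold: X == (C1 ^ C2).
constexpr bool setcc_after(std::uint8_t x, std::uint8_t c1, std::uint8_t c2) {
  return x == static_cast<std::uint8_t>(c1 ^ c2);
}

// Exhaustive check over all 8-bit values of X, C1, and C2.
inline void check_xor_setcc_fold() {
  for (unsigned x = 0; x < 256; ++x)
    for (unsigned c1 = 0; c1 < 256; ++c1)
      for (unsigned c2 = 0; c2 < 256; ++c2)
        assert(setcc_before(static_cast<std::uint8_t>(x),
                            static_cast<std::uint8_t>(c1),
                            static_cast<std::uint8_t>(c2)) ==
               setcc_after(static_cast<std::uint8_t>(x),
                           static_cast<std::uint8_t>(c1),
                           static_cast<std::uint8_t>(c2)));
}
```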
Martin Storsjö	46776f7556	Fix warnings about variables that are set but only used in debug mode Add void casts to mark the variables used, next to the places where they are used in assert or `LLVM_DEBUG()` expressions. Differential Revision: https://reviews.llvm.org/D123117	2022-04-06 10:01:46 +03:00
Argyrios Kyrtzidis	330268ba34	[Support/Hash functions] Change the `final()` and `result()` of the hashing functions to return an array of bytes Returning `std::array<uint8_t, N>` is better ergonomics for the hashing functions usage, instead of a `StringRef`: * When returning `StringRef`, client code is "jumping through hoops" to do string manipulations instead of dealing with fixed array of bytes directly, which is more natural * Returning `std::array<uint8_t, N>` avoids the need for the hasher classes to keep a field just for the purpose of wrapping it and returning it as a `StringRef` As part of this patch also: * Introduce `TruncatedBLAKE3` which is useful for using BLAKE3 as the hasher type for `HashBuilder` with non-default hash sizes. * Make `MD5Result` inherit from `std::array<uint8_t, 16>` which improves & simplifies its API. Differential Revision: https://reviews.llvm.org/D123100	2022-04-05 21:38:06 -07:00
Matt Arsenault	634bf829a8	MachineVerifier: Diagnose undef set on full register defs An undef def of a full register would assert in LiveIntervalCalc.	2022-04-05 22:19:17 -04:00
Matt Arsenault	ced1250b0f	MIRParser: Fix asserting with invalid flags on machine operands Constructing an operand with kills on defs and deads on uses asserts in the constructor, so diagnose these.	2022-04-05 21:46:26 -04:00
Daniel Sanders	93977f37e6	Check if register class was changed in constrainOperandRegClass() NFC When no actual change happens there's no need to notify the observers about the fact the register class is being constrained. So we should avoid notifying observers when no change has happened, because this can dramatically affect compile time for particular test cases. Reviewed By: dsanders, arsenm Differential Revision: https://reviews.llvm.org/D122615	2022-04-05 11:55:07 -07:00
Muhammad Omair Javaid	0320115c16	Revert "[CodeGen] Async unwind - add a pass to fix CFI information" This reverts commit `980c3e6dd2`. This commit had failing tests with clang crashing across various AArch64/Linux buildbots. https://lab.llvm.org/buildbot/#/builders/179/builds/3346 Differential Revision: https://reviews.llvm.org/D114545	2022-04-05 13:12:30 +05:00
Max Kazantsev	9a2798c7a3	[CodeGen][NFC] Hoist budget check out of loop Less computations & early exit if we know for sure that the limit will be exceeded.	2022-04-05 14:20:42 +07:00
Jeremy Morse	920de9c94c	Revert "[DebugInfo] Correctly recognize bitfields when emitting dwarf" This reverts commit `059d1f84d2`. Some tests on green dragon failed as a result of this -- see notes on D96334.	2022-04-04 17:14:58 +01:00
Thomas Preud'homme	449ef2fcc6	[Pipeliner] Fix comment typo	2022-04-04 16:10:27 +01:00
Momchil Velikov	980c3e6dd2	[CodeGen] Async unwind - add a pass to fix CFI information This pass inserts the necessary CFI instructions to compensate for the inconsistency of the call-frame information caused by linear (non-CFG aware) nature of the unwind tables. Unlike the `CFIInstrInserter` pass, this one almost always emits only `.cfi_remember_state`/`.cfi_restore_state`, which results in smaller unwind tables and also transparently handles custom unwind info extensions like CFA offset adjustment and save locations of SVE registers. This pass takes advantage of the constraints that LLVM imposes on the placement of save/restore points (cf. `ShrinkWrap.cpp`): * there is a single basic block, containing the function prologue * possibly multiple epilogue blocks, where each epilogue block is complete and self-contained, i.e. CSR restore instructions (and the corresponding CFI instructions) are not split across two or more blocks. * prologue and epilogue blocks are outside of any loops Thus, during execution, at the beginning and at the end of each basic block the function can be in one of two states: - "has a call frame", if the function has executed the prologue, or has not executed any epilogue - "does not have a call frame", if the function has not executed the prologue, or has executed an epilogue These properties can be computed for each basic block by a single RPO traversal. In order to accommodate backends which do not generate unwind info in epilogues we compute an additional property "strong no call frame on entry" which is set for the entry point of the function and for every block reachable from the entry along a path that does not execute the prologue. If this property holds, it takes precedence over the "has a call frame" property. From the point of view of the unwind tables, the "has/does not have call frame" state at beginning of each block is determined by the state at the end of the previous block, in layout order. 
Where these states differ, we insert compensating CFI instructions, which come in two flavours: - CFI instructions, which reset the unwind table state to the initial one. This is done by a target specific hook and is expected to be trivial to implement, for example it could be: ``` .cfi_def_cfa <sp>, 0 .cfi_same_value <rN> .cfi_same_value <rN-1> ... ``` where `<rN>` are the callee-saved registers. - CFI instructions, which reset the unwind table state to the one created by the function prologue. These are the sequence: ``` .cfi_restore_state .cfi_remember_state ``` In this case we also insert a `.cfi_remember_state` after the last CFI instruction in the function prologue. Reviewed By: MaskRay, danielkiss, chill Differential Revision: https://reviews.llvm.org/D114545	2022-04-04 14:38:22 +01:00
Simon Pilgrim	328754474a	[DAG] SimplifySetCC - clang-format add/xor/sub with constant handling. NFC.	2022-04-04 13:30:17 +01:00
Jeremy Morse	059d1f84d2	[DebugInfo] Correctly recognize bitfields when emitting dwarf Use the "isBitfield" flag for debug types to determine whether something is a bitfield, rather than trying to guess from its layout. Fixes https://bugs.llvm.org/show_bug.cgi?id=44601 Patch by: mahkoh Differential Revision: https://reviews.llvm.org/D96334	2022-04-04 11:14:13 +01:00
Michael Gottesman	e24f534879	[debug-info] As an NFC commit, refactor EmitFuncArgumentDbgValue so that it can be extended to support llvm.dbg.addr. The reason why I am making this change is that before this commit, EmitFuncArgumentDbgValue relied on a boolean flag IsDbgDeclare both to signal that a DBG_VALUE should be made to be indirect /and/ that the original intrinsic was a dbg.declare. This is no longer always true if we add support for handling dbg.addr since we will have an indirect DBG_VALUE that is a different intrinsic from dbg.declare. With that in mind, in this NFC patch, we prepare for future fixes by introducing a 3 case-enum argument to EmitFuncArgumentDbgValue that allows the caller to explicitly specify how the argument's DBG_VALUE should be emitted. This then allows us to turn the indirect checks into a != FuncArgumentDbgValueKind::Value and prepare us for a future where we add support here for llvm.dbg.addr directly. rdar://83957028 Reviewed By: aprantl Differential Revision: https://reviews.llvm.org/D122945	2022-04-01 17:07:28 -07:00
Craig Topper	fa630e7594	[RISCV][AMDGPU][TargetLowering] Special case overflow expansion for (uaddo X, 1). If we expand (uaddo X, 1) we previously expanded the overflow calculation as (X + 1) <u X. This potentially increases the live range of X and can prevent X+1 from reusing the register that previously held X. Since we're adding 1, overflow only occurs if X was UINT_MAX in which case (X+1) would be 0. So this patch adds a special case to expand the overflow calculation to (X+1) == 0. This seems to help with uaddo intrinsics that get introduced by CodeGenPrepare after LSR. Alternatively, we could block the uaddo transform in CodeGenPrepare for this case. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D122933	2022-04-01 13:14:10 -07:00
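The equivalence behind this special case can be checked exhaustively for a small bit width. The sketch below is plain Python rather than LLVM code; the 8-bit width and the helper names are assumptions chosen for illustration.

```python
# Illustrative check, not LLVM code: for an unsigned add of 1, the generic
# overflow test (X + 1) <u X agrees with the cheaper test (X + 1) == 0.
BITS = 8  # assumed width, small enough to check every value
MASK = (1 << BITS) - 1


def overflow_generic(x: int) -> bool:
    """Overflow computed as (X + 1) <u X, which needs X after the add."""
    return ((x + 1) & MASK) < x


def overflow_special(x: int) -> bool:
    """Overflow computed as (X + 1) == 0, which only needs the sum."""
    return ((x + 1) & MASK) == 0


# The two forms agree for every 8-bit input.
assert all(overflow_generic(x) == overflow_special(x) for x in range(1 << BITS))
```

Because the second form never reads X after the addition, a register allocator is free to reuse X's register for the sum, which is the benefit the message describes.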
Simon Pilgrim	76cd11f303	[DAG] Add llvm::isMinSignedConstant helper. NFC Pulled out of D122754	2022-04-01 17:47:34 +01:00
Matt Arsenault	4a8665e23e	SelectionDAG: Avoid some uses of getPointerTy Avoids use of the default address space parameter, and avoids some assumptions about the incoming address space.	2022-03-31 18:49:22 -04:00
Matt Arsenault	395f8ccfc9	RegAllocGreedy: Fix typo	2022-03-31 16:30:01 -04:00
Abinav Puthan Purayil	898d5776ec	[AMDGPU][GlobalISel] Scalarize add/sub with overflow ops in the legalizer Differential Revision: https://reviews.llvm.org/D122803	2022-03-31 21:46:34 +05:30
Craig Topper	85eae45520	[SelectionDAG] Move extension type for ConstantSDNode from getCopyToRegs to HandlePHINodesInSuccessorBlocks. D122053 set the ExtendType for ConstantSDNodes in getCopyToRegs to ZERO_EXTEND to match assumptions in ComputePHILiveOutRegInfo. PHIs are probably not the only way ConstantSDNodeNodes can get to getCopyToRegs. This patch adds an ExtendType parameter to CopyValueToVirtualRegister and has HandlePHINodesInSuccessorBlocks pass ISD::ZERO_EXTEND for ConstantInts. This way we only affect ConstantSDNodes for PHIs. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D122171	2022-03-30 11:32:43 -07:00
Sanjay Patel	436b875e49	[SDAG] avoid libcalls to fmin/fmax for soft-float targets This is an extension of D70965 to avoid creating a mathlib call where it did not exist in the original source. Also see D70852 for discussion about an alternative proposal that was abandoned. In the motivating bug report: https://github.com/llvm/llvm-project/issues/54554 ...we also have a more general issue about handling "no-builtin" options. Differential Revision: https://reviews.llvm.org/D122610	2022-03-30 11:22:03 -04:00
Sanjay Patel	e18cc5277f	[SDAG] try to canonicalize logical shift after bswap When shifting by a byte-multiple: bswap (shl X, C) --> lshr (bswap X), C bswap (lshr X, C) --> shl (bswap X), C This is the backend version of D122010 and an alternative suggested in D120648. There's an extra check to make sure the shift amount is valid that was not in the rough draft. I'm not sure if there is a larger motivating case for RISCV (bug report?), but the ARM diffs show a benefit from having a late version of the transform (because we do not combine the loads in IR). Differential Revision: https://reviews.llvm.org/D122655	2022-03-30 09:29:32 -04:00
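The two rewrites in this message can be sanity-checked outside the compiler. The following is a minimal Python sketch under the assumption of 32-bit values; `bswap32`, `shl32`, `lshr32`, and `fold_holds` are hypothetical helper names, not LLVM APIs.

```python
# Illustrative check, not LLVM code: a byte swap commutes with a shift by a
# whole number of bytes if the shift direction is flipped.
import random

MASK32 = 0xFFFFFFFF


def bswap32(x: int) -> int:
    """Reverse the byte order of a 32-bit value."""
    return int.from_bytes(x.to_bytes(4, "little"), "big")


def shl32(x: int, c: int) -> int:
    """Logical shift left, truncated to 32 bits."""
    return (x << c) & MASK32


def lshr32(x: int, c: int) -> int:
    """Logical shift right of a 32-bit value."""
    return x >> c


def fold_holds(x: int, c: int) -> bool:
    """True if both rewrites from the commit message hold for (x, c)."""
    return (bswap32(shl32(x, c)) == lshr32(bswap32(x), c)
            and bswap32(lshr32(x, c)) == shl32(bswap32(x), c))


rng = random.Random(0)
samples = [rng.getrandbits(32) for _ in range(1000)]

# Byte-multiple shift amounts: the fold holds for every sample.
assert all(fold_holds(x, c) for x in samples for c in (0, 8, 16, 24))

# A shift amount that is not a byte multiple breaks the equivalence.
assert not fold_holds(1, 1)
```

The failing case with a shift of 1 shows why the byte-multiple restriction is needed.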
Fraser Cormack	43a91a8474	[SelectionDAG] Don't create illegally-typed nodes while constant folding This patch fixes a (seemingly very rare) crash during vector constant folding introduced in D113300. Normally, during legalization, if we create an illegally-typed node during a failed attempt at constant folding it's cleaned up before being visited, due to it having no uses. If, however, an illegally-typed node is created during one round of legalization and isn't cleaned up, it's possible for a second round of legalization to create new illegally-typed nodes which add extra uses to the old illegal nodes. This means that we can end up visiting the old nodes before they're known to be dead, at which point we crash. I'm not happy about this fix. Creating illegal types at all seems like a bad idea, but we all-too-often rely on illegal constants being successfully folded and being fixed up afterwards. However, we can't rely on constant folding actually happening, and we don't have a foolproof way of peering into the future. Perhaps the correct fix is to revisit the node-iteration order during legalization, ensuring we visit all uses of nodes before the nodes themselves. Or alternatively we could try and clean up dead nodes immediately after failing constant folding. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D122382	2022-03-30 13:17:55 +01:00
serge-sans-paille	01be9be2f2	Cleanup includes: final pass Cleanup a few extra files, this closes the work on libLLVM dependencies on my side. Impact on libLLVM preprocessed output: -35876 lines Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup Differential Revision: https://reviews.llvm.org/D122576	2022-03-29 09:00:21 +02:00
Craig Topper	e68257fcee	[RISCV][SelectionDAG] Enable TargetLowering::hasBitTest for masks that fit in ANDI. Modified DAGCombiner to pass the bittest input and the shift amount to hasBitTest. This matches the other call to hasBitTest in TargetLowering.h. This is an alternative to D122454. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D122458	2022-03-28 12:46:36 -07:00
David Blaikie	a5032b2633	DebugInfo: Don't allow type units to reference types in the CU We could only do this in limited ways (since we emit the TUs first, we can't use ref_addr (& we can't use that in Split DWARF either) - so we had to synthesize declarations into the TUs) and they were ambiguous in some cases (if the CU type had internal linkage, parsing the TU would require knowing which CU was referencing the TU to know which type the declaration was for, which seems not-ideal). So to avoid all that, let's just not reference types defined in the CU from TUs - instead moving the TU type into the CU (recursively). This does increase debug info size (by pulling more things out of type units, into the compile unit) - about 2% of uncompressed dwp file size for clang -O0 -g -gsplit-dwarf. (5% .debug_info.dwo section size increase in the .dwp)	2022-03-25 23:49:03 +00:00
Hongtao Yu	e25f4e4c4a	[PseudoProbe] Do not emit pseudo probes when module is not probed. There is a case when a function has pseudo probe intrinsics but the module it resides in does not have the probe desc. This could happen when the current module is not built with `-fpseudo-probe-for-profiling` while a function in it calls some other function from a probed module. In thinLTO mode, the callee function could be imported and inlined into the current function. While this is undefined behavior, I'm fixing the asm printer to not ICE and to warn the user about this. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D121737	2022-03-25 12:59:53 -07:00
Simon Pilgrim	e209190c2d	[SDAG] enable binop identity constant folds for multiplies Add mul to the list of ops that we canonicalize with a select to expose an identity merge Differential Revision: https://reviews.llvm.org/D122071	2022-03-25 11:07:04 +00:00
Simon Pilgrim	ae95f291e8	[AsmPrinter] AIXException::endFunction - use cast<> instead of dyn_cast<> to avoid dereference of nullptr The pointer is used immediately inside the getSymbol() call, so assert the cast is correct instead of returning nullptr	2022-03-25 10:23:30 +00:00
Craig Topper	67eb2f144e	[SelectionDAG] Add AssertAlign to AddNodeIDCustom so that it will CSE properly. The alignment needs to be part of the folding set hash. This is handled by getAssertAlign when nodes are created, but needs to be repeated here. No test case as I found it as part of a very early experimental patch. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D122279	2022-03-24 08:59:09 -07:00
Daniil Kovalev	c53cbce45e	[CodeGen] Define ABI breaking class members correctly Non-static class members declared under #ifndef NDEBUG should be declared under #if LLVM_ENABLE_ABI_BREAKING_CHECKS to make headers library-friendly and allow cross-linking, as discussed in D120714. Differential Revision: https://reviews.llvm.org/D121549	2022-03-24 12:42:59 +03:00
Julian Lettner	64902d335c	Reland "Lower `@llvm.global_dtors` using `__cxa_atexit` on MachO" For MachO, lower `@llvm.global_dtors` into `@llvm_global_ctors` with `__cxa_atexit` calls to avoid emitting the deprecated `__mod_term_func`. Reuse the existing `WebAssemblyLowerGlobalDtors.cpp` to accomplish this. Enable fallback to the old behavior via Clang driver flag (`-fregister-global-dtors-with-atexit`) or llc / code generation flag (`-lower-global-dtors-via-cxa-atexit`). This escape hatch will be removed in the future. Differential Revision: https://reviews.llvm.org/D121736	2022-03-23 18:36:55 -07:00
Zequan Wu	581dc3c729	Revert "Lower `@llvm.global_dtors` using `__cxa_atexit` on MachO" This reverts commit `22570bac69`.	2022-03-23 16:11:54 -07:00
Craig Topper	cac9773dcc	[SelectionDAG] Don't create entries in ValueMap in ComputePHILiveOutRegInfo Instead of using operator[], use DenseMap::find to prevent default constructing an entry if it isn't already in the map. Also simplify a condition to check for 0 instead of a virtual register. I'm pretty sure we can only get 0 or a virtual register out of the value map.	2022-03-23 09:52:07 -07:00
serge-sans-paille	60ca256953	Cleanup include: Add missing header Should fix https://lab.llvm.org/buildbot#builders/57/builds/16192 introduced by `02c28970b2`	2022-03-23 15:15:56 +01:00
Benjamin Kramer	9a6e0afac5	Unbreak the build after `02c28970b2`	2022-03-23 14:38:13 +01:00
serge-sans-paille	02c28970b2	Cleanup include: codegen second round Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup Differential Revision: https://reviews.llvm.org/D122180	2022-03-23 13:54:00 +01:00
Craig Topper	681fd2c11e	Revert "[SelectionDAG] Don't create entries in ValueMap in ComputePHILiveOutRegInfo" This reverts commit `1a9b55b63a`. Causing build bot failures	2022-03-22 23:41:47 -07:00
Craig Topper	1a9b55b63a	[SelectionDAG] Don't create entries in ValueMap in ComputePHILiveOutRegInfo Instead of using operator[], use DenseMap::find to prevent default constructing an entry if it isn't already in the map.	2022-03-22 23:24:53 -07:00
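The hazard this commit avoids (a lookup that silently inserts a default entry) is not specific to `DenseMap`. As a loose analogy only, and not LLVM code, Python's `collections.defaultdict` behaves the same way under subscripting; the map name and keys below are hypothetical.

```python
# Analogy in Python, not LLVM code: subscripting a defaultdict creates a
# default entry for a missing key, much like operator[] on a map, while
# .get() only looks the key up, much like find().
from collections import defaultdict


def lookup_with_subscript(value_map, key):
    """Lookup that inserts a default entry when the key is missing."""
    return value_map[key]


def lookup_with_get(value_map, key):
    """Lookup that leaves the map untouched when the key is missing."""
    return value_map.get(key, 0)


value_map = defaultdict(int)
value_map["v1"] = 42

# A find-style lookup of a missing key does not grow the map.
assert lookup_with_get(value_map, "v2") == 0
assert "v2" not in value_map

# A subscript-style lookup returns the same value but adds an entry.
assert lookup_with_subscript(value_map, "v2") == 0
assert "v2" in value_map
```

Both lookups return the same value; the difference is only the side effect on the container.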
Craig Topper	73f0af106b	[SelectionDAG] Add printing support for the Align value of AssertAlign nodes. Differential Revision: https://reviews.llvm.org/D122262	2022-03-22 14:16:32 -07:00
Carl Ritson	8e64d84995	[MachineSink] Check block prologue interference Sinking must check for interference between the block prologue and the instruction being sunk. Specifically check for clobbering of uses by the prologue, and overwrites to prologue defined registers by the sunk instruction. Reviewed By: rampitec, ruiling Differential Revision: https://reviews.llvm.org/D121277	2022-03-22 11:15:37 +09:00
Mircea Trofin	f658ca1aba	[mlgo] Fix build breaks introduced by includes cleanups These were not detected by the build bots because those went quietly offline, too, due to a misconfiguration (fixed since)	2022-03-21 13:49:40 -07:00
Craig Topper	37c0aacd71	[SelectionDAG] Make getPreferredExtendForValue take an Instruction * instead of a Value *. This is only called for instructions and the caller is already holding an Instruction *. This makes the code more explicit and makes it obvious the code doesn't make decisions about constants.	2022-03-21 12:15:22 -07:00
Jay Foad	1bb3a9c642	[MachineCopyPropagation] More robust isForwardableRegClassCopy Change the implementation of isForwardableRegClassCopy so that it does not rely on getMinimalPhysRegClass. Instead, iterate over all classes looking for any that satisfy a required property. NFCI on current upstream targets, but this copes better with downstream AMDGPU changes where some new smaller classes have been introduced, which was breaking regclass equality tests in the old code like: if (UseDstRC != CrossCopyRC && CopyDstRC == CrossCopyRC) Differential Revision: https://reviews.llvm.org/D121903	2022-03-21 16:41:01 +00:00
zhongyunde	828b89bc0b	[AArch64][SelectionDAG] Supports unpklo/hi instructions to reduce the number of loads Try to reduce the number of masked loads in favour of more unpklo/hi instructions. Both ISD::ZEXTLOAD and ISD::SEXTLOAD extensions from legal types are supported. Test cases for both normal and masked loads are added to guard against compile crashes. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D120953	2022-03-21 23:47:33 +08:00
Simon Pilgrim	35a7be6ccb	[SDAG] enable binop identity constant folds for shifts Add shl/srl/sra to the list of ops that we canonicalize with a select to expose an identity merge Differential Revision: https://reviews.llvm.org/D122070	2022-03-21 13:02:50 +00:00
Kazu Hirata	1eada2adda	[CodeGen] Apply clang-tidy fixes for readability-redundant-smartptr-get (NFC)	2022-03-20 23:11:06 -07:00
Luo, Yuanke	10bb623192	enable binop identity constant folds for add Differential Revision: https://reviews.llvm.org/D119654	2022-03-20 19:07:16 +08:00
Craig Topper	4eb59f0179	[SelectionDAG][RISCV] Make RegsForValue::getCopyToRegs explicitly zero_extend constants. ComputePHILiveOutRegInfo assumes that constant incoming values to Phis will be zero extended if they aren't a legal type. To guarantee that we should zero_extend rather than any_extend constants. This fixes a bug for RISCV where any_extend of constants can be treated as a sign_extend. Differential Revision: https://reviews.llvm.org/D122053	2022-03-19 18:43:14 -07:00
Craig Topper	306ff74154	[SelectionDAG] Use APInt::zextOrSelf instead of zextOrTrunc in ComputePHILiveOutRegInfo The width never decreases here.	2022-03-18 23:26:19 -07:00
Eli Friedman	5cd9fa551e	Fix computation of MadeChange bit in AtomicExpandPass. Fixes llvm-clang-x86_64-expensive-checks-debian failure with `2f497ec3`. expandAtomicStore always modifies the function, so make sure we set MadeChange unconditionally. Not sure how nobody else has stumbled over this before.	2022-03-18 13:47:11 -07:00
Kai Luo	31906a6090	[AtomicExpand][PowerPC] Fix all-one mask value When generating an all-one mask value whose bitwidth is larger than 64, signed extension should be used rather than zero extension. Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D120865	2022-03-18 13:35:54 +08:00
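The difference between the two extensions is easy to see numerically. The sketch below is plain Python, not the AtomicExpand code; it assumes the mask starts as a 64-bit all-ones value that is widened to 128 bits, and `zero_extend` and `sign_extend` are hypothetical helpers.

```python
# Illustrative only, not LLVM code: widening a 64-bit all-ones value to
# 128 bits gives an all-ones mask only under sign extension.
def zero_extend(value: int, from_bits: int, to_bits: int) -> int:
    """Widen by filling the new high bits with zeros."""
    assert to_bits >= from_bits
    return value & ((1 << from_bits) - 1)


def sign_extend(value: int, from_bits: int, to_bits: int) -> int:
    """Widen by copying the old sign bit into the new high bits."""
    assert to_bits >= from_bits
    value &= (1 << from_bits) - 1
    if value & (1 << (from_bits - 1)):
        # Set every bit between from_bits and to_bits.
        value |= ((1 << to_bits) - 1) ^ ((1 << from_bits) - 1)
    return value


ALL_ONES_64 = (1 << 64) - 1

wide_zext = zero_extend(ALL_ONES_64, 64, 128)
wide_sext = sign_extend(ALL_ONES_64, 64, 128)

# Zero extension leaves the upper 64 bits clear, so the mask is incomplete.
assert wide_zext == 0x0000000000000000FFFFFFFFFFFFFFFF
# Sign extension produces the intended 128-bit all-ones mask.
assert wide_sext == (1 << 128) - 1
```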
Julian Lettner	22570bac69	Lower `@llvm.global_dtors` using `__cxa_atexit` on MachO For MachO, lower `@llvm.global_dtors` into `@llvm_global_ctors` with `__cxa_atexit` calls to avoid emitting the deprecated `__mod_term_func`. Reuse the existing `WebAssemblyLowerGlobalDtors.cpp` to accomplish this. Enable fallback to the old behavior via Clang driver flag (`-fregister-global-dtors-with-atexit`) or llc / code generation flag (`-lower-global-dtors-via-cxa-atexit`). This escape hatch will be removed in the future. Differential Revision: https://reviews.llvm.org/D121736	2022-03-17 10:47:13 -07:00
Matt Arsenault	8d66603a48	Revert "RegAllocGreedy: Fix last chance recolor assert in impossible case" This reverts commit `c46aab01c0`. This evidently blocks compiling in some cases that used to work before. I'm also not fully convinced this is the correct place to fix this problem.	2022-03-17 13:12:01 -04:00
Marco Elver	b09439e20b	[AtomicExpandPass][NFC] Reformat with clang-format NFCI.	2022-03-17 16:58:16 +01:00
Jeremy Morse	12a2f7494e	[DebugInfo][InstrRef] Prefer stack locations for variables This patch adjusts what location is picked for a known variable value -- preferring to leave locations on the stack, even when a value is re-loaded into a register. The benefit is reduced location list entropy, on a clang-3.4 build I found that .debug_loclists reduces in size by 6%, from 29Mb down to 27Mb. Testing: a few tests need the stack slot to be written to explicitly, to force LiveDebugValues into restoring the variable location to a register. I've added an explicit test for the desired behaviour in livedebugvalues_recover_clobbers.mir . Differential Revision: https://reviews.llvm.org/D120732	2022-03-17 14:26:15 +00:00
Heejin Ahn	b8038a916d	[WebAssembly] Disable SimplifyDemandedVectorElts after legalization This fixes a reported bug that caused an infinite loop during the SelectionDAG optimization phase in ISel, by creating an overridable hook in `TargetLowering` that allows us to bail out from running `SimplifyDemandedVectorElts`. Reviewed By: tlively Differential Revision: https://reviews.llvm.org/D121869	2022-03-16 20:52:43 -07:00
Marco Elver	555df03012	[SelectionDAG][NFC] Clean up SDCallSiteDbgInfo accessors * Consistent naming: addCallSiteInfo vs. getCallSiteInfo; * Use ternary operator to reduce verbosity; * const'ify getters; * Add comments; NFCI. Differential Revision: https://reviews.llvm.org/D121820	2022-03-16 17:46:06 +01:00
Shengchen Kan	ac64d0d230	[NFC][CodeGen] Remove redundant if clause in TargetPassConfig::addPass	2022-03-16 22:14:23 +08:00
Shengchen Kan	37b378386e	[NFC][CodeGen] Rename some functions in MachineInstr.h and remove duplicated comments	2022-03-16 20:25:42 +08:00
Matthias Gehre	09854f2af3	[SelectionDAG] Emit calls to __divei4 and friends for division/remainder of large integers Emit calls to __divei4 and friends for division/remainder of large integers. This fixes https://github.com/llvm/llvm-project/issues/44994. The overall RFC is in https://discourse.llvm.org/t/rfc-add-support-for-division-of-large-bitint-builtins-selectiondag-globalisel-clang/60329 The compiler-rt part is in https://reviews.llvm.org/D120327 Differential Revision: https://reviews.llvm.org/D120329	2022-03-16 09:36:28 +00:00
serge-sans-paille	989f1c72e0	Cleanup codegen includes This is a (fixed) recommit of https://reviews.llvm.org/D121169 after: 1061034926 before: 1063332844 Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup Differential Revision: https://reviews.llvm.org/D121681	2022-03-16 08:43:00 +01:00
Craig Topper	1bf4bbc492	[LegalizeTypes][RISCV][WebAssembly] Expand ABS in PromoteIntRes_ABS if it will expand to sra+xor+sub later. If we promote the ABS and then Expand in LegalizeDAG, then both the sra and the xor will have their inputs sign extended. This generates extra code on RISCV which lacks an i8 or i16 sign extend instruction. If we expand during type legalization, then only the sra will get its input sign extended. RISCV is able to combine this with the sra by doing a shift left followed by an sra. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D121664	2022-03-15 08:27:39 -07:00
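The sra+xor+sub sequence named in this message is a standard branch-free absolute value. A minimal Python sketch follows, assuming an 8-bit type so every input can be checked; the helper names are hypothetical and the code is not taken from the legalizer.

```python
# Illustrative only, not LLVM code: branch-free ABS as sra + xor + sub,
# modelled on 8-bit two's complement values stored as unsigned ints.
BITS = 8  # assumed width for the sketch
MASK = (1 << BITS) - 1


def to_signed(x: int) -> int:
    """Interpret an unsigned BITS-wide value as two's complement."""
    return x - (1 << BITS) if x & (1 << (BITS - 1)) else x


def sra(x: int, c: int) -> int:
    """Arithmetic shift right on a BITS-wide value."""
    return (to_signed(x) >> c) & MASK


def abs_expand(x: int) -> int:
    """ABS via sra+xor+sub: sign = X >>s (BITS-1); (X ^ sign) - sign."""
    sign = sra(x, BITS - 1)          # 0x00 for non-negative, 0xFF for negative
    return ((x ^ sign) - sign) & MASK


# Matches the reference absolute value for every 8-bit input. The minimum
# value 0x80 wraps back to itself, as expected for two's complement.
assert all(abs_expand(x) == abs(to_signed(x)) & MASK for x in range(1 << BITS))
```

Only the `sra` consumes the sign of the input here, which is consistent with the message's point that expanding early leaves a single operation needing a sign-extended operand.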
Craig Topper	ad94dfb9a0	[DAGCombiner][RISCV] Adjust (aext (and (trunc x), cst)) -> (and x, cst) to sext cst based on target preference RISCV strongly prefers i32 values be sign extended to i64. This combine was always zero extending the constant using APInt methods. This adjusts the code so that it calls getNode using ISD::ANY_EXTEND instead. getNode will call TLI.isSExtCheaperThanZExt to decide how to handle the constant. Tests were copied from D121598 where I noticed that we were creating constants that were hard to materialize. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D121650	2022-03-15 08:26:47 -07:00
Simon Pilgrim	7262eacd41	Revert rG9c542a5a4e1ba36c24e48185712779df52b7f7a6 "Lower `@llvm.global_dtors` using `__cxa_atexit` on MachO" Many of the build bots are complaining: Unknown command line argument '-lower-global-dtors'	2022-03-15 13:01:35 +00:00
Fangrui Song	252bc2b9f5	[MachineLICM] Simplify code and avoid adding nullptr values to ParentMap. NFC	2022-03-15 01:24:01 -07:00
Julian Lettner	9c542a5a4e	Lower `@llvm.global_dtors` using `__cxa_atexit` on MachO For MachO, lower `@llvm.global_dtors` into `@llvm_global_ctors` with `__cxa_atexit` calls to avoid emitting the deprecated `__mod_term_func`. Reuse the existing `WebAssemblyLowerGlobalDtors.cpp` to accomplish this. Enable fallback to the old behavior via Clang driver flag (`-fregister-global-dtors-with-atexit`) or llc / code generation flag (`-lower-global-dtors-via-cxa-atexit`). This escape hatch will be removed in the future. Differential Revision: https://reviews.llvm.org/D121327	2022-03-14 17:51:18 -07:00
Amara Emerson	8cbf18cb04	[GlobalISel] Fix store merging incorrectly merging volatile stores. The existing volatile checks only handle aliasing hazards between stores, but that isn't enough since by that point volatile stores may have already been added to the current candidate group.	2022-03-14 13:48:51 -07:00
Kazu Hirata	9286786e87	[CodeGen] Remove an unused variable introduced in D121128	2022-03-14 11:41:04 -07:00
Mircea Trofin	294eca35a0	[regalloc] Remove -consider-local-interval-cost Discussed extensively on D98232. The functionality introduced in D35816 never worked correctly. In D98232, it was fixed, but, as it was introducing a large compile-time regression, and the value of the original patch was called into doubt, we disabled it by default everywhere. A year later, it appears that caused no grief, so it seems safe to remove the disabled code. This should be accompanied by re-opening bug 26810. Differential Revision: https://reviews.llvm.org/D121128	2022-03-14 10:49:16 -07:00
Sanjay Patel	c2592c374e	[SDAG] simplify bitwise logic with repeated operand We do not have general reassociation here (and probably do not need it), but I noticed these were missing in patches/tests motivated by D111530, so we can at least handle the simplest patterns. The VE test diff looks correct, but we miss that pattern in IR currently: https://alive2.llvm.org/ce/z/u66_PM	2022-03-13 11:12:30 -04:00
Wenlei He	4f320ca4ba	[DebugInfo] Include DW_TAG_skeleton_unit when looking for parent UnitDie `DIE::getUnitDie` looks up parent DIE until compile unit or type unit is found. However for skeleton CU with debug fission, we would have DW_TAG_skeleton_unit instead of DW_TAG_compile_unit as top level DIE. This change fixes the look up so we can get DW_TAG_skeleton_unit as UnitDie for skeleton CU. Differential Revision: https://reviews.llvm.org/D120610	2022-03-12 13:27:42 -08:00
serge-sans-paille	ed98c1b376	Cleanup includes: DebugInfo & CodeGen Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup Differential Revision: https://reviews.llvm.org/D121332	2022-03-12 17:26:40 +01:00
Yuanfang Chen	d538ad53c3	[JMCInstrument] infer proper path style based on debug info By default, the path style is decided by the host. This patch makes JMC use the path style used by the SP directory. This makes JMC output host-independent. Fixes: https://github.com/llvm/llvm-project/issues/54219 Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D121236	2022-03-10 10:50:44 -08:00
Lorenzo Albano	28cfa764c2	[VP] Strided loads/stores This patch introduces two new experimental IR intrinsics and SDAG nodes to represent vector strided loads and stores. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D114884	2022-03-10 18:46:54 +01:00
Nico Weber	a278250b0f	Revert "Cleanup codegen includes" This reverts commit `7f230feeea`. Breaks CodeGenCUDA/link-device-bitcode.cu in check-clang, and many LLVM tests, see comments on https://reviews.llvm.org/D121169	2022-03-10 07:59:22 -05:00
serge-sans-paille	7f230feeea	Cleanup codegen includes after: 1061034926 before: 1063332844 Differential Revision: https://reviews.llvm.org/D121169	2022-03-10 10:00:30 +01:00
Xiang1 Zhang	c31014322c	TLS loads optimization (hoist) Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D120000	2022-03-10 09:29:06 +08:00
Stanislav Mekhanoshin	0be6fd44f3	[SDAG] Use MMO flags in MemSDNode folding SDNodes with different target flags may now be folded together rightfully resulting in the assertion in the refineAlignment. Folding nodes with different target flags may result in the wrong load instructions produced at least on the AMDGPU. Fixes: SWDEV-326805 Differential Revision: https://reviews.llvm.org/D121335	2022-03-09 14:25:22 -08:00
Sanjay Patel	341623653d	[SDAG] match rotate pattern with extra 'or' operation This is another fold generalized from D111530. We can find a common source for a rotate operation hidden inside an 'or': https://alive2.llvm.org/ce/z/9pV8hn Deciding when this is profitable vs. a funnel-shift is tricky, but this does not show any regressions: if a target has a rotate but it does not have a funnel-shift, then try to form the rotate here. That is why we don't have x86 test diffs for the scalar tests that are duplicated from AArch64 ( `74a65e3834` ) - shld/shrd are available. That also makes it difficult to show vector diffs - the only case where I found a diff was on x86 AVX512 or XOP with i64 elements. There's an additional check for a legal type to avoid a problem seen with x86-32 where we form a 64-bit rotate but then it gets split inefficiently. We might avoid that by adding more rotate folds, but I didn't check to see what is missing on that path. This gets most of the motivating patterns for AArch64 / ARM that are in D111530. We still need a couple of enhancements to setcc pattern matching with rotate/funnel-shift to get the rest. Differential Revision: https://reviews.llvm.org/D120933	2022-03-09 13:19:00 -05:00
Thomas Preud'homme	67c14d5c69	[MachinePipeliner] Fix isPseduo typo.	2022-03-09 15:26:39 +00:00
Tom Stellard	fb616c9b31	SafeStack: Re-enable SafeStack coloring optimization This was disabled in `2acea2786b` as a work-around for Issue #31491. I've reduced the test case from that bug and confirmed that it is now fixed. Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D120866	2022-03-08 15:10:41 -08:00
Craig Topper	29511ec7da	[LegalizeTypes][VP] Add widening and splitting support for VP_FMA. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D120854	2022-03-08 09:59:59 -08:00
Craig Topper	c392b9924e	[LegalizeTypes][VP] Add splitting and widening support for VP_FNEG. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D120785	2022-03-08 09:59:34 -08:00
Fraser Cormack	17310f3d19	[SelectionDAG][NFC] Address a few clang-tidy warnings Fix a couple of else-after-return warnings and some unnecessary parentheses.	2022-03-08 16:22:26 +00:00
Yuanfang Chen	eddd94c27d	Reland "[clang][debug] port clang-cl /JMC flag to ELF" This relands commit `7313474319`. It failed on Windows/Mac because `-fjmc` is only checked for ELF targets. Check the flag unconditionally instead and issue a warning for non-ELF targets.	2022-03-07 21:55:41 -08:00
Yuanfang Chen	f46fa4de4a	Revert "[clang][debug] port clang-cl /JMC flag to ELF" This reverts commit `7313474319`. Break bots: http://45.33.8.238/win/54551/step_7.txt http://45.33.8.238/macm1/29590/step_7.txt	2022-03-07 12:40:43 -08:00
Craig Topper	8e132c5c1d	[LegalizeTypes][ARM][X86] Change ExpandIntRes_ABS to use sra+xor+sub. Previously we used sra+add+xor if ADDCARRY is supported. This changes to sra+xor+sub if SUBCARRY is available. This is consistent with the recent change to the default expansion in LegalizeDAG. Differential Revision: https://reviews.llvm.org/D121039	2022-03-07 11:28:32 -08:00
Yuanfang Chen	7313474319	[clang][debug] port clang-cl /JMC flag to ELF The motivation is to make the MSVC-style JMC instrumentation usable by an ELF-based debugger. Since there is no prior experience implementing the JMC feature for an ELF-based debugger, it might be better to just reuse existing MSVC-style JMC instrumentation. For debuggers that support both ELF&COFF (like lldb), the JMC implementation might be shared between ELF&COFF. If this is found to be inadequate, switching to alternatives is pretty low-cost. Implementation: - The '-fjmc' is already a driver and cc1 flag. Wire it up for ELF in the driver. - Refactor the JMC instrumentation pass a little bit. - The ELF handling is different from MSVC in two places: * the flag section name is ".just.my.code" instead of ".msvcjmc" * the way default function is provided: MSVC uses /alternatename; ELF uses weak function. Based on D118428. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D119910	2022-03-07 10:16:24 -08:00
David Green	4388f4f776	[DAG] Don't convert undef to 0 when creating buildvector When inserting undef into buildvectors created from shuffles of buildvectors, we convert elements to the largest needed type. This had the effect of converting undef into 0, which isn't needed as the buildvector implicitly truncates and trunc(zext(undef)) == undef. Differential Revision: https://reviews.llvm.org/D121002	2022-03-06 18:35:34 +00:00
Sanjay Patel	f4b53972ce	[SDAG] fold bitwise logic with shifted operands This extends `acb96ffd14` to 'and' and 'xor' opcodes. Copying from that message: LOGIC (LOGIC (SH X0, Y), Z), (SH X1, Y) --> LOGIC (SH (LOGIC X0, X1), Y), Z https://alive2.llvm.org/ce/z/QmR9rR This is a reassociation + factoring fold. The common shift operation is moved after a bitwise logic op on 2 input operands. We get simpler cases of these patterns in IR, but I suspect we would miss all of these exact tests in IR too. We also handle the simpler form of this plus several other folds in DAGCombiner::hoistLogicOpWithSameOpcodeHands().	2022-03-05 11:14:45 -05:00
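The reassociation in this message relies on a shift distributing over bitwise logic. The following Python sketch checks the identity on random 32-bit inputs; it is not LLVM code, and `before`, `after`, and the 32-bit width are assumptions made for illustration.

```python
# Illustrative check, not LLVM code:
#   LOGIC (LOGIC (SH X0, Y), Z), (SH X1, Y) == LOGIC (SH (LOGIC X0, X1), Y), Z
# for LOGIC in {and, or, xor} and SH in {shl, lshr} on 32-bit values.
import operator
import random

MASK32 = 0xFFFFFFFF


def shl(x: int, y: int) -> int:
    """Logical shift left, truncated to 32 bits."""
    return (x << y) & MASK32


def lshr(x: int, y: int) -> int:
    """Logical shift right of a 32-bit value."""
    return x >> y


def before(logic, sh, x0, x1, y, z):
    """Original form: two shifts, two logic ops."""
    return logic(logic(sh(x0, y), z), sh(x1, y))


def after(logic, sh, x0, x1, y, z):
    """Folded form: one shift, two logic ops."""
    return logic(sh(logic(x0, x1), y), z)


rng = random.Random(0)
for _ in range(2000):
    x0, x1, z = (rng.getrandbits(32) for _ in range(3))
    y = rng.randrange(32)
    for logic in (operator.and_, operator.or_, operator.xor):
        for sh in (shl, lshr):
            assert before(logic, sh, x0, x1, y, z) == after(logic, sh, x0, x1, y, z)
```

The folded form saves one shift, which is the factoring the message describes.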
Jeremy Morse	0e96d95d13	[DebugInfo][InstrRef] Accept register-reads after isel in any block When lowering LLVM-IR to instruction referencing stuff, if a value is defined by a COPY, we try and follow the register definitions back to where the value was defined, and build an instruction reference to that instruction. In a few scenarios (such as arguments), this isn't possible. I added some assertions to catch cases that weren't explicitly whitelisted. Over the course of a few months, several more scenarios have cropped up, the latest is the llvm.read_register intrinsic, which lets LLVM-IR read an arbitrary register at any point. In the face of this, there's little point in validating whether debug-info reads a register in an expected scenario. Thus: this patch just deletes those assertions, and adds a regression test to check that something is done with the llvm.read_register intrinsic. Fixes #54190 Differential Revision: https://reviews.llvm.org/D121001	2022-03-04 17:01:12 +00:00
Paul Walker	42b4a6227e	[DAGCombine] Prevent illegal ISD::SPLAT_VECTOR operations post legalisation. When triggered during operation legalisation the affected combine generates a splat_vector that when custom lowered for SVE fixed length code generation, results in the original precombine sequence and thus we enter a legalisation/combine hang. NOTE: The patch contains no tests because I observed this issue only when combined with other work that might never become public. The current way AArch64 lowers ISD::SPLAT_VECTOR meant a specific test was not possible so I'm hoping the DAGCombiner fix can be seen as obvious. The AArch64ISelLowering change is required to maintain existing code quality. Differential Revision: https://reviews.llvm.org/D120735	2022-03-04 11:54:03 +00:00
Maksim Panchenko	7e570308f2	[NFC] Fix typos Reviewed By: yota9, Amir Differential Revision: https://reviews.llvm.org/D120859	2022-03-03 13:26:39 -08:00
Vasileios Porpodas	6f9640d6a3	[RegAlloc] Add a complexity limit in growRegion() to cap compilation time. growRegion() does not scale in code with BBs with a very large number of edges. In such code growRegion() becomes a compile-time bottleneck, consuming 60% of the total compilation time. This patch adds a limit to the complexity of growRegion() by incrementing a counter in each iteration. We bail out once the limit is reached. Differential Revision: https://reviews.llvm.org/D120752	2022-03-03 11:31:07 -08:00
Paul Robinson	7b85f0f32f	[PS4] isPS4 and isPS4CPU are not meaningfully different	2022-03-03 11:36:59 -05:00
Sanjay Patel	e9302bf7ef	[SDAG] try harder to remove a rotate from X == 0 https://alive2.llvm.org/ce/z/mJP7XP This can be viewed as expanding the compare into and/or-of-compares: https://alive2.llvm.org/ce/z/bkZYWE followed by reduction of each compare. This could be extended in several ways: 1. There's a (X & Y) == -1 sibling. 2. We can recurse through more than 1 'or'. 3. The fold could be generalized beyond rotates - any operation that only changes the order of bits (bswap, bitreverse). This is a transform noted in D111530.	2022-03-03 09:25:46 -05:00
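A rotate only permutes bits, so it cannot change whether a value is zero, and the same holds when the rotated value is OR'ed with something else. The Python sketch below checks that reading exhaustively on 8-bit values. It assumes the fold has the shape `(rot(X, C) | Y) == 0` to `(X | Y) == 0`, which is an interpretation of the message rather than a transcription of the patch; `rotl8` is a hypothetical helper.

```python
# Illustrative check, not LLVM code: a rotate can be dropped from a
# comparison against zero, even under an 'or' with another value.
BITS = 8  # assumed width, small enough to check every pair of values
MASK = (1 << BITS) - 1


def rotl8(x: int, c: int) -> int:
    """Rotate an 8-bit value left by c bits."""
    c %= BITS
    return ((x << c) | (x >> (BITS - c))) & MASK


for c in (1, 3, 7):
    for x in range(1 << BITS):
        # Plain form: rot(X, C) == 0 exactly when X == 0.
        assert (rotl8(x, c) == 0) == (x == 0)
        for y in range(1 << BITS):
            # Form with one 'or': the rotate is still irrelevant to the test.
            assert ((rotl8(x, c) | y) == 0) == ((x | y) == 0)
```

The same argument applies to any operation that only reorders bits, which matches the message's note that the fold could be generalized to bswap and bitreverse.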
Sanjay Patel	c33dbc2a2d	[SDAG] refactor foldSetCCWithRotate; NFC There are more potential optimizations to make here, so rearrange to make it easier to append those.	2022-03-02 16:42:05 -05:00
Craig Topper	ab7a7cc1dd	Revert "[LegalizeTypes][VP] Add splitting and widening support for VP_FNEG." This reverts commit `ac93f95861`. Committed by accident.	2022-03-02 10:00:22 -08:00
Craig Topper	324c0a7206	[SelectionDAG][RISCV] Emit a canonical sign bit test from ExpandIntRes_ABS. Instead of emitting 0 > Hi, emit Hi < 0. If Hi needs to be expanded again this will allow the special case for sign bit tests in ExpandIntOp_SETCC to trigger. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D120761	2022-03-02 09:47:26 -08:00
Craig Topper	ac93f95861	[LegalizeTypes][VP] Add splitting and widening support for VP_FNEG. Differential Revision: https://reviews.llvm.org/D120785	2022-03-02 09:47:05 -08:00
Daniel McIntosh	d636b76eca	[CodeGen] Use AdjustStackOffset for Callee Saved Registers in PEI::calculateFrameObjectOffsets Also, changes how the CSR loop is indexed, which should avoid bugs like the one fixed by rG4a57bb5a3b74bdad9b0518009a7d7ac7ca2ac650 Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D120668	2022-03-02 11:41:12 -05:00
Nikita Popov	6fde043951	[MachineSink] Disable if there are any irreducible cycles This is an alternative to D120330, which disables MachineSink for functions with irreducible cycles entirely. This avoids both the correctness problem, and ensures we don't perform non-profitable sinks into cycles. At the same time, it may also disable profitable sinks in the same function. This can be made more precise by using MachineCycleInfo in the future. Fixes https://github.com/llvm/llvm-project/issues/53990. Differential Revision: https://reviews.llvm.org/D120800	2022-03-02 16:57:29 +01:00
Simon Pilgrim	5cce97d61e	[DAG] isSplatValue - improve ISD::VECTOR_SHUFFLE splat detection Currently we only check for splat shuffles, this extends it to see if the source operand is a splat across the demanded elts based upon the shuffle mask	2022-03-02 15:32:24 +00:00
spupyrev	bcdc047731	speeding up ext-tsp for huge instances Differential Revision: https://reviews.llvm.org/D120780	2022-03-02 07:17:48 -08:00
Simon Pilgrim	df0a2b4f30	[DAG] SelectionDAG::isSplatValue - add initial BITCAST handling This patch adds support for recognising vector splats by peeking through bitcasts to vectors with smaller element types - if all the offset subelements are splats then the bitcasted vector is a splat as well. We don't have great coverage for isSplatValue so I've made this pretty specific to the use case I'm trying to fix - regressions in some vXi64 vector shift by splat cases that 32-bit x86 doesn't recognise because the shift amount buildvector has been type legalised to v2Xi32. We can add further support (floats, bitcast from larger element types, undef elements) when we have actual test coverage. Differential Revision: https://reviews.llvm.org/D120553	2022-03-02 11:25:51 +00:00
Xiang1 Zhang	65588a0776	Revert "TLS loads opimization (hoist)" Revert for more reviews This reverts commit `30e612ebdf`.	2022-03-02 14:10:11 +08:00
Mircea Trofin	cb2160760e	[nfc][codegen] Move RegisterBank[Info].h under CodeGen This wraps up from D119053. The 2 headers are moved as described, fixed file headers and include guards, updated all files where the old paths were detected (simple grep through the repo), and `clang-format`-ed it all. Differential Revision: https://reviews.llvm.org/D119876	2022-03-01 21:53:25 -08:00
Xiang1 Zhang	30e612ebdf	TLS loads opimization (hoist) Reviewed By: Wang Pheobe, Topper Craig Differential Revision: https://reviews.llvm.org/D120000	2022-03-02 10:37:24 +08:00
Craig Topper	8787726609	[LegalizeTypes] Remove incomplete StrictFP support from SplitVecRes_UnaryOp. NFC There is no handling of Chain operands in this function so it can't work. There's a separate splitting function for all strict fp nodes.	2022-03-01 15:43:57 -08:00
Zequan Wu	5c9e20d7d0	[PDB] Add char8_t type Differential Revision: https://reviews.llvm.org/D120690	2022-03-01 13:39:51 -08:00
serge-sans-paille	a494ae43be	Cleanup includes: TransformsUtils Estimation on the impact on preprocessor output: before: 1065307662 after: 1064800684 Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup Differential Revision: https://reviews.llvm.org/D120741	2022-03-01 21:00:07 +01:00
Craig Topper	bf8054644d	[DAGCombiner] Don't expand (neg (abs x)) if the abs has an additional user. If the types aren't legal, the expansions may get type legalized in a different way preventing code sharing. If the type is legal, we will share some instructions between the two expansions, but we will need an extra register. Since we don't appear to fold (neg (sub A, B)) if the sub has an additional user, I think it makes sense not to expand NABS. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D120513	2022-03-01 07:32:07 -08:00
Jeremy Morse	ab49dce01f	[DebugInfo][InstrRef][NFC] Use unique_ptr instead of raw pointers InstrRefBasedLDV allocates some big tables of ValueIDNum, to store live-in and live-out block values in, that then get passed around as pointers everywhere. This patch wraps the allocation in a std::unique_ptr, names some types based on unique_ptr, and passes references to those around instead. There's no functional change, but it makes it clearer to the reader that references to these tables are borrowed rather than owned, and we get some extra validity assertions too. Differential Revision: https://reviews.llvm.org/D118774	2022-03-01 12:49:50 +00:00
Sam Parker	20d75059a2	Revert "[TypePromotion] Avoid some unnecessary truncs" This reverts commit `281d29b8fe`. Report of a miscompilation and awaiting a reproducer.	2022-03-01 08:59:52 +00:00
Phoebe Wang	e03d216c28	[X86] Use bit test instructions to optimize some logic atomic operations This is to match GCC's optimizations: https://gcc.godbolt.org/z/3odh9e7WE Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D120199	2022-03-01 09:57:08 +08:00
Sanjay Patel	69684b84c6	[SDAG] fold (rotate X) eq/ne (0/-1) This is the SDAG equivalent of an instcombine transform added with: `fd807601a7` This is another step towards solving #49541 and part of an alternative set of more general transforms than what is proposed in D111530. https://alive2.llvm.org/ce/z/ToxaE8	2022-02-27 11:31:19 -05:00
Sanjay Patel	acb96ffd14	[SDAG] fold bitwise logic with shifted operands LOGIC (LOGIC (SH X0, Y), Z), (SH X1, Y) --> LOGIC (SH (LOGIC X0, X1), Y), Z https://alive2.llvm.org/ce/z/QmR9rR This is a reassociation + factoring fold. The common shift operation is moved after a bitwise logic op on 2 input operands. We get simpler cases of these patterns in IR, but I suspect we would miss all of these exact tests in IR too. We also handle the simpler form of this plus several other folds in DAGCombiner::hoistLogicOpWithSameOpcodeHands(). This is a partial implementation of a transform suggested in D111530 (only handles 'or' bitwise logic as a first step - need to stamp out more tests for other opcodes). Several of the same tests added for D111530 are altered here (but not fully optimized). I'm not sure yet if this would help/hinder that patch, but this should be an improvement for all tests added with `ecf606cb43` since it removes a shift operation in those examples. Differential Revision: https://reviews.llvm.org/D120516	2022-02-27 09:54:12 -05:00
Simon Pilgrim	fadd20f80d	[DAG] Ensure type is legal for bswap(shl(x,c)) -> zext(bswap(trunc(shl(x,c-bw/2)))) fold As reported on D120192	2022-02-27 11:25:22 +00:00
Benjamin Kramer	1de11fe360	Use RegisterInfo::regsOverlaps instead of checking aliases This is both less code and faster since it doesn't have to expand all the sub & superreg sets. NFCI.	2022-02-26 20:32:12 +01:00
Jameson Nash	c4b1a63a1b	mark getTargetTransformInfo and getTargetIRAnalysis as const Seems like this can be const, since Passes shouldn't modify it. Reviewed By: wsmoses Differential Revision: https://reviews.llvm.org/D120518	2022-02-25 14:30:44 -05:00
Rong Xu	ccbbb4f6c7	[Sample-PGO] Emit FS discriminators only when -fdebug-info-for-profiling is set IR level addDiscriminator pass is guarded by DebugInfoForProfiling (set by option -fdebug-info-for-profiling). This patch syncs the logic for the MIR and IR level implementations. Differential Revision: https://reviews.llvm.org/D120536	2022-02-25 09:41:17 -08:00
Nikita Popov	87ebd9a36f	[IR] Use CallBase::getParamElementType() (NFC) As this method now exists on CallBase, use it rather than the one on AttributeList.	2022-02-25 10:01:58 +01:00
Rahman Lavaee	aeec9671fb	Revert "Encode address offsets of basic blocks relative to the end of the previous basic blocks." This reverts commit `029283c1c0`. The code in `ELFFile::decodeBBAddrMap` was not changed in the submitted patch. Differential Revision: https://reviews.llvm.org/D120457	2022-02-24 13:31:15 -08:00
Simon Pilgrim	370ebc9d9a	[DAG] Attempt to fold bswap(shl(x,c)) -> zext(bswap(trunc(shl(x,c-bw/2)))) If the shl is at least half the bitwidth (i.e. the lower half of the bswap source is zero), then we can reduce the shift and perform the bswap at half the bitwidth and just zero extend. Based off PR51391 + PR53867 Differential Revision: https://reviews.llvm.org/D120192	2022-02-24 19:33:51 +00:00
Sanjay Patel	4a3708cd6b	[SDAG] remove shift that is redundant with part of funnel shift This is the SDAG translation of D120253 : https://alive2.llvm.org/ce/z/qHpmNn The SDAG nodes can have different operand types than the result value. We can see an example of that with AArch64 - the funnel shift amount is an i64 rather than i32. We may need to make that match even more flexible to handle post-legalization nodes, but I have not stepped into that yet. Differential Revision: https://reviews.llvm.org/D120264	2022-02-24 11:25:46 -05:00
Jay Foad	719bac55df	[MIRParser] Diagnose too large align values in MachineMemOperands When parsing MachineMemOperands, MIRParser treated the "align" keyword the same as "basealign". Really "basealign" should specify the alignment of the MachinePointerInfo base value, and "align" should specify the alignment of that base value plus the offset. This worked OK when the specified alignment was no larger than the alignment of the offset, but in cases like this it just caused confusion: STW killed %18, 4, %stack.1.ap2.i.i :: (store (s32) into %stack.1.ap2.i.i + 4, align 8) MIRPrinter would never have printed this, with an offset of 4 but an align of 8, so it must have been written by hand. MIRParser would interpret "align 8" as "basealign 8", but I think it is better to give an error and force the user to write "basealign 8" if that is what they really meant. Differential Revision: https://reviews.llvm.org/D120400 Change-Id: I7eeeefc55c2df3554ba8d89f8809a2f45ada32d8	2022-02-24 15:32:08 +00:00
Matthias Braun	6a383369f9	PGOInstrumentation, GCOVProfiling: Split indirectbr critical edges regardless of PHIs The `SplitIndirectBrCriticalEdges` function was originally designed for `CodeGenPrepare` and skipped splitting of edges when the destination block didn't contain any `PHI` instructions. This only makes sense when reducing COPYs like `CodeGenPrepare`. In the case of `PGOInstrumentation` or `GCOVProfiling` it would result in missed counters and wrong result in functions with computed goto. Differential Revision: https://reviews.llvm.org/D120096	2022-02-23 16:27:37 -08:00
Craig Topper	c7d6448d03	[DAGCombiner][TargetLowering] Pass SDValue by value to isMulAddWithConstProfitable. Internally to DAGCombiner the SDValues were passed by non-const reference despite not being modified. They were then passed by const reference to TLI. This patch passes them by value which is consistent with the vast majority of code. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D120420	2022-02-23 12:40:45 -08:00
Paweł Bylica	afdaa86b77	[DAGCombine] Extend combineCarryDiamond() In combineCarryDiamond() use getAsCarry() to find more candidates for being a carry flag. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D118362	2022-02-23 21:37:49 +01:00
Jessica Paquette	68c718c8f4	Revert "[MachineOutliner][AArch64] NFC: Split MBBs into "outlinable ranges"" This reverts commit `d97f997eb7`. This commit was not NFC. (See: https://reviews.llvm.org/rGd97f997eb79d91b2872ac13619f49cb3a7120781)	2022-02-23 10:35:52 -08:00
Sanjay Patel	21d7c3bcc6	[DAG] try to convert multiply to shift via demanded bits This is a fix for a regression discussed in: https://github.com/llvm/llvm-project/issues/53829 We cleared more high multiplier bits with `995d400`, but that can lead to worse codegen because we would fail to recognize the now disguised multiplication by neg-power-of-2 as a shift-left. The problem exists independently of the IR change in the case that the multiply already had cleared high bits. We also convert shl+sub into mul+add in instcombine's negator. This patch fills in the high-bits to see the shift transform opportunity. Alive2 attempt to show correctness: https://alive2.llvm.org/ce/z/GgSKVX The AArch64, RISCV, and MIPS diffs look like clear wins. The x86 code requires an extra move register in the minimal examples, but it's still an improvement to get rid of the multiply on all CPUs that I am aware of (because multiply is never as fast as a shift). There's a potential follow-up noted by the TODO comment. We should already convert that pattern into shl+add in IR, so it's probably not common: https://alive2.llvm.org/ce/z/7QY_Ga Fixes #53829 Differential Revision: https://reviews.llvm.org/D120216	2022-02-23 12:09:32 -05:00
Rainer Orth	365be7ac72	[MC][ELF] Use SHF_SUNW_NODISCARD instead of SHF_GNU_RETAIN on Solaris As requested in D107955 <https://reviews.llvm.org/D107955>, this patch splits off the `MC` and `CodeGen` parts and adds a testcase. Tested on `sparcv9-sun-solaris2.11`, `amd64-pc-solaris2.11`, and `x86_64-pc-linux-gnu`. Differential Revision: https://reviews.llvm.org/D120318	2022-02-23 15:43:12 +01:00
Bill Wendling	a5bbc6ef99	[NFC] Remove unnecessary "#include"s from header files	2022-02-23 01:20:48 -08:00
Rahman Lavaee	029283c1c0	Encode address offsets of basic blocks relative to the end of the previous basic blocks. Conceptually, the new encoding emits the offsets and sizes as label differences between each two consecutive basic block begin and end label. When decoding, the offsets must be aggregated along with basic block sizes to calculate the final relative-to-function offsets of basic blocks. This encoding uses smaller values compared to the existing one (offsets relative to function symbol). Smaller values tend to occupy fewer bytes in ULEB128 encoding. As a result, we get about 25% reduction in the size of the bb-address-map section (reduction from about 9MB to 7MB). Reviewed By: tmsriram, jhenderson Differential Revision: https://reviews.llvm.org/D106421	2022-02-22 15:46:46 -08:00
Jay Foad	b47e2dc91f	[StableHashing] Hash machine basic blocks and functions This adds very basic support for hashing MachineBasicBlock and MachineFunction, for use in MachineFunctionPass to detect passes that modify the MachineFunction wrongly. Differential Revision: https://reviews.llvm.org/D120122	2022-02-22 17:38:47 +00:00
Joseph Huber	456ffd7a22	[OpenMP] Ensure offloading sections do not have SHF_ALLOC flag We use offloading sections in the new Clang driver scheme to embed device code into the host. We later use these sections to link the device image, after which point they are completely unused and should not be loaded into memory if they are still in the executable. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D120275	2022-02-21 21:35:17 -05:00
Jessica Paquette	d97f997eb7	[MachineOutliner][AArch64] NFC: Split MBBs into "outlinable ranges" We found a case in the Swift benchmarks where the MachineOutliner introduces about a 20% compile time overhead in comparison to building without the MachineOutliner. The origin of this slowdown is that the benchmark has long blocks which incur lots of LRU checks for lots of candidates. Imagine a case like this: ``` bb: i1 i2 i3 ... i123456 ``` Now imagine that all of the outlining candidates appear early in the block, and that something like, say, NZCV is defined at the end of the block. The outliner has to check liveness for certain registers across all candidates, because outlining from areas where those registers are used is unsafe at call boundaries. This is fairly wasteful because in the previously-described case, the outlining candidates will never appear in an area where those registers are live. To avoid this, precalculate areas where we will consider outlining from. Anything outside of these areas is mapped to illegal and not included in the outlining search space. This allows us to reduce the size of the outliner's suffix tree as well, giving us a potential memory win. By precalculating areas, we can also optimize other checks too, like whether or not LR is live across an outlining candidate. Doing all of this is about a 16% compile time improvement on the case. This is likely useful for other targets (e.g. ARM + RISCV) as well, but for now, this only implements the AArch64 path. The original "is the MBB safe" method still works as before.	2022-02-21 15:29:16 -08:00
Paweł Bylica	df0c16ce00	[NFC][DAGCombine] Use isOperandOf() in combineCarryDiamond Pre-commit for https://reviews.llvm.org/D118362.	2022-02-21 21:41:31 +01:00
Matt Arsenault	9c7ca51b2c	MIR: Start diagnosing too many operands on an instruction Previously this would just assert which was annoying and didn't point to the specific instruction/operand.	2022-02-21 10:36:39 -05:00
Simon Pilgrim	46f1e8359e	[DAG] visitBSWAP - pull out repeated SDLoc. NFC Cleanup for D120192	2022-02-21 13:08:01 +00:00
Jay Foad	9a547e7009	[StableHashing] Hash vregs with multiple defs This allows stableHashValue to be used on Machine IR that is not in SSA form. Differential Revision: https://reviews.llvm.org/D120121	2022-02-21 10:26:34 +00:00
Craig Topper	440c4b705a	[SelectionDAG][RISCV][ARM][PowerPC][X86][WebAssembly] Change default abs expansion to use sra (X, size(X)-1); sub (xor (X, Y), Y). Previously we used sra (X, size(X)-1); xor (add (X, Y), Y). By placing sub at the end, we allow RISCV to combine sign_extend_inreg with it to form subw. Some X86 tests for Z - abs(X) seem to have improved as well. Other targets look to be a wash. I had to modify ARM's abs matching code to match from sub instead of xor. Maybe instead ISD::ABS should be made legal. I'll try that in parallel to this patch. This is an alternative to D119099 which was focused on RISCV only. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D119171	2022-02-20 21:11:23 -08:00
Chen Zheng	efe5b8ad90	[ISEL] remove unnecessary getNode(); NFC Reviewed By: RKSimon, craig.topper Differential Revision: https://reviews.llvm.org/D120049	2022-02-20 21:08:49 -05:00
Luo, Yuanke	67ef63138b	[SDAG] enable binop identity constant folds for sub This patch extracts the sub folding from D119654 and leaves only the add folding in that patch. Differential Revision: https://reviews.llvm.org/D120116	2022-02-21 09:37:36 +08:00
David Blaikie	323c672789	DebugInfo: Add an assert about cross-unit references in dwo units This is helping me debug some issues with simplified template names	2022-02-20 14:53:17 -08:00
Amara Emerson	b09e63bad1	[AArch64][GlobalISel] Implement combines for boolean G_SELECT->bitwise ops. Differential Revision: https://reviews.llvm.org/D117160	2022-02-20 00:53:09 -08:00
Craig Topper	24bfa24355	[SelectionDAGBuilder] Simplify visitShift. NFC This code was detecting whether the value returned by getShiftAmountTy can represent all shift amounts. If not, it would use MVT::i32 as a placeholder. getShiftAmountTy was updated last year to return i32 if the type returned by the target couldn't represent all values. This means the MVT::i32 case here is dead and the logic can be simplified. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D120164	2022-02-19 12:40:59 -08:00
Craig Topper	1df8efae56	[SelectionDAG][X86] Support f16 in getReciprocalOpName. If the "reciprocal-estimates" attribute is present and it doesn't contain "all", "none", or "default", we previously crashed on f16 operations. This patch adds an 'h' suffix to prevent the crash. I've added simple tests that just enable the estimate for all vec-sqrt and one test case that explicitly tests the new 'h' suffix to override the default steps. There may be some frontend change needed too, but I haven't checked that yet. Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D120158	2022-02-18 21:55:49 -08:00
Craig Topper	8e7247a377	[SelectionDAG] Fix off by one error in range check in DAGTypeLegalizer::ExpandShiftByConstant. The code was considering shifts by an amount larger than the number of bits in the original VT to be out of range. Shifts exactly equal to the original bit width are also out of range. I don't know how to test this. DAGCombiner should usually fold this away. I just noticed while looking for something else in this code. The llvm-cov report shows that we don't have coverage for out of range shifts here. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D120170	2022-02-18 18:42:20 -08:00
Craig Topper	0d59a54cea	Revert "[SelectionDAG][X86] Support f16 in getReciprocalOpName." This reverts commit `86b5e25662`. This wasn't supposed to be committed yet	2022-02-18 15:39:50 -08:00
Craig Topper	04f815c26f	[SelectionDAGBuilder] Remove LegalTypes=false from a call to getShiftAmountConstant. getShiftAmountTy will return MVT::i32 if the shift amount coming from the target's getScalarShiftAmountTy can't represent all possible values. That should eliminate the need to use the pointer type which is what we do when LegalTypes is false. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D120165	2022-02-18 15:36:35 -08:00
Craig Topper	86b5e25662	[SelectionDAG][X86] Support f16 in getReciprocalOpName. If the "reciprocal-estimates" attribute is present and it doesn't contain "all", "none", or "default", we previously crashed on f16 operations. This patch adds an 'h' suffix to prevent the crash. I've added simple tests that just enable the estimate for all vec-sqrt and one test case that explicitly tests the new 'h' suffix to override the default steps. There may be some frontend change needed too, but I haven't checked that yet. Differential Revision: https://reviews.llvm.org/D120158	2022-02-18 15:36:35 -08:00
Sanjay Patel	a2963d871e	[SDAG] fold sub-of-shift to add-of-shift This fold is done in IR: https://alive2.llvm.org/ce/z/jWyFrP There is an x86 test that shows an improvement from the added flexibility of using add (commutative). The other diffs are presumed neutral. Note that this could also be folded to an 'xor', but I'm not sure if that would be universally better (eg, x86 can convert adds more easily into LEA). This helps prevent regressions from a potential fold for issue #53829.	2022-02-18 11:55:50 -05:00
Jay Foad	074d1e2536	[CodeGen] Return better Changed status from PostRAHazardRecognizer Differential Revision: https://reviews.llvm.org/D119954	2022-02-18 09:46:24 +00:00
Jessica Paquette	12389e3758	[MachineOutliner] Add statistics for unsigned vector size Useful for debugging + evaluating improvements to the outliner. Stats are the number of illegal, legal, and invisible instructions in the unsigned vector, and its total length.	2022-02-17 18:25:51 -08:00
Heejin Ahn	4f9b839772	[WebAssembly] Make EH/SjLj vars unconditionally thread local This makes three thread local variables (`__THREW__`, `__threwValue`, and `__wasm_lpad_context`) unconditionally thread local. If the target doesn't support TLS, they will be downgraded to normal variables in `stripThreadLocals`. This makes the object not linkable with other objects using shared memory, which is what we intend here; these variables should be thread local when used with shared memory. This is what we initially tried in D88262. But D88323 changed this: It only created these variables when threads were supported, because `__THREW__` and `__threwValue` were always generated even if Emscripten EH/SjLj was not used, making all objects built without threads not linkable with shared memory, which was too restrictive. But sometimes this is not safe. If we build an object using variables such as `__THREW__` without threads, it can be linked to other objects using shared memory, because the original object's `__THREW__` was not created thread local to begin with. So this CL basically reverts D88323 with some additional improvements: - This checks each of the functions and global variables created within `LowerEmscriptenEHSjLj` pass and removes it if it's not used at the end of the pass. So only modules using those variables will be affected. - Moves `CoalesceFeaturesAndStripAtomics` and `AtomicExpand` passes after all other IR passes that can create thread local variables. It is not sufficient to move them to the end of `addIRPasses`, because `__wasm_lpad_context` is created in `WasmEHPrepare`, which runs inside `addPassesToHandleExceptions`, which runs before `addISelPrepare`. So we override `addISelPrepare` and move atomic/TLS stripping and expanding passes there. This also merges `TLS` and `NO-TLS` FileCheck lines into one `CHECK` line, because in the bitcode level we always create them as thread local. Also, `CHECK` lines for some function declarations are deleted because they are unused. Reviewed By: tlively, sbc100 Differential Revision: https://reviews.llvm.org/D120013	2022-02-17 16:04:18 -08:00
Matt Arsenault	c46aab01c0	RegAllocGreedy: Fix last chance recolor assert in impossible case This example is not compilable without handling eviction of specific subregisters. Last chance recoloring was deciding it could try evicting an overlapping superregister, which doesn't help make any progress. The LiveIntervalUnion would then assert due to an overlapping / identical range when trying the new assignment. Unfortunately this is also producing a verifier error after the allocation fails. I've seen a number of these, and not sure if we should just start deleting the function on error rather than trying to figure out how to put together valid MIR. I'm not super confident this is the right place to fix this. I also have a number of failing testcases I need to fix by handling partial evictions of superregisters.	2022-02-17 18:30:56 -05:00
Paul Walker	6457f42bde	[DAGCombiner] Extend ISD::ABDS/U combine to handle more cases. The current ABD combine doesn't quite work for SVE because only a single scalable vector per scalar integer type is legal (e.g. for i32, <vscale x 4 x i32> is the only legal scalable vector type). This patch extends the combine to also trigger for the cases when operand extension must be retained. Differential Revision: https://reviews.llvm.org/D115739	2022-02-17 13:32:20 +00:00
Bjorn Pettersson	1a8bdf95a3	[DAG] Fix in ReplaceAllUsesOfValuesWith When doing SelectionDAG::ReplaceAllUsesOfValuesWith a worklist is prepared containing all users that should be updated. Then we use the RemoveNodeFromCSEMaps/AddModifiedNodeToCSEMaps helpers to handle recursive CSE updates while doing the replacements. This patch aims at solving a problem that could arise if the recursive CSE updates would result in an SDNode present in the worklist being removed as a side-effect of morphing a prior user in the worklist. To exemplify such a scenario, imagine that we have these nodes in the DAG t12: i64 = add t8, t11 t13: i64 = add t12, t8 t14: i64 = add t11, t11 t15: i64 = add t14, t8 t16: i64 = sub t13, t15 and that the t8 uses should be replaced by t11. An initial worklist (listing the users that should be morphed) could be [t12, t13, t15]. When updating t12 we get t12: i64 = add t11, t11 which results in a CSE update that replaces t14 by t12, so we get t15: i64 = add t12, t8 which results in a CSE update that replaces t13 by t12, so we get t16: i64 = sub t12, t15 and then t13 is removed given that it was the last use of t13. So when being done with the updates triggered by rewriting the use of t8 in t12 the t13 node no longer exists. And we used to end up hitting an assertion when continuing with the worklist aiming at replacing the t8 uses in t13. The solution is based on using a DAGUpdateListener, making sure that we prune a user from the worklist if it is removed during the recursive CSE updates. The bug was found using an OOT target. I think the problem is quite old, even if the particular intree target reproducer added in this patch seems to pass when using LLVM 13.0.0. Differential Revision: https://reviews.llvm.org/D119088	2022-02-17 14:29:59 +01:00
Jay Foad	50ddb5d2d1	[CodeGen] Return better Changed status from LocalStackSlotAllocation Differential Revision: https://reviews.llvm.org/D119942	2022-02-17 09:31:41 +00:00
Jay Foad	f0092f9ded	[CodeGen] Return false from LiveIntervals::runOnMachineFunction This is an analysis pass so it does not modify the MachineFunction. Differential Revision: https://reviews.llvm.org/D119941	2022-02-17 09:31:41 +00:00
Jay Foad	3c9229c663	[CodeGen] Return better Changed status from DetectDeadLanes Differential Revision: https://reviews.llvm.org/D119940	2022-02-17 09:31:41 +00:00
Heejin Ahn	c60d822965	[WebAssembly] Make __wasm_lpad_context thread-local This makes `__wasm_lpad_context`, a struct that is used as a communication channel between compiler-generated code and personality function in libunwind, thread local. The library code will be changed to thread local in the emscripten side. Reviewed By: sbc100, tlively Differential Revision: https://reviews.llvm.org/D119803	2022-02-16 15:56:38 -08:00
Craig Topper	1daa66d3fd	[SelectionDAG] Add SPLAT_VECTOR to SelectionDAG::isConstantFPBuildVectorOrConstantFP. Matches what is done for the int version. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D119793	2022-02-16 09:22:11 -08:00
Simon Pilgrim	30e9cdd1aa	[DAG] computeKnownBits - add ISD::AVGCEILU handling Expand the ISD::AVGCEILU to determine the known bits of the result. First part of PR53622 Differential Revision: https://reviews.llvm.org/D119629	2022-02-16 13:00:15 +00:00
Shengchen Kan	ce02c79dc6	[Debugify] Mark mir-check-debugify change nothing of input Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D119914	2022-02-16 18:37:26 +08:00
Shao-Ce SUN	2aed07e96c	[NFC][MC] remove unused argument `MCRegisterInfo` in `MCCodeEmitter` Reviewed By: skan Differential Revision: https://reviews.llvm.org/D119846	2022-02-16 13:10:09 +08:00
Shao-Ce SUN	9cc49c1951	Revert "[NFC][MC] remove unused argument `MCRegisterInfo` in `MCCodeEmitter`" This reverts commit `fe25c06cc5`.	2022-02-16 11:57:49 +08:00
Shao-Ce SUN	fe25c06cc5	[NFC][MC] remove unused argument `MCRegisterInfo` in `MCCodeEmitter` For ten years, it seems that `MCRegisterInfo` is not used by any target. Reviewed By: skan Differential Revision: https://reviews.llvm.org/D119846	2022-02-16 11:47:17 +08:00
Carl Ritson	ef949ecba5	[MachineSink] Use SkipPHIsAndLabels for sink insertion points For AMDGPU the insertion point for a block may not be the first non-PHI instruction. This happens when a block contains EXEC mask manipulation related to control flow (converging lanes). Use SkipPHIsAndLabels to determine the block insertion point so that the target can skip any block prologue instructions. Reviewed By: rampitec, ruiling Differential Revision: https://reviews.llvm.org/D119399	2022-02-16 12:44:22 +09:00
Mircea Trofin	c62eefb886	[nfc][codegen] Move RegisterBank[Info].cpp under CodeGen Layering-wise, it seems RegisterBank stuff fits under CodeGen, like other target abstraction. In particular, TargetSubtargetInfo has a getRegBankInfo member, but using that object requires making sure GlobalISel is linked, which is not always the case (e.g. llvm-jitlink doesn't). Differential Revision: https://reviews.llvm.org/D119053	2022-02-15 11:27:15 -08:00
David Green	655d0d86f9	[DAGCombine] Move AVG combine to SimplifyDemandBits This moves the matching of AVGFloor and AVGCeil into a place where demanded bits are available, so that it can detect more cases for more folds. It changes the transform to start from a shift, not from a truncate. We match the pattern shr(add(ext(A), ext(B)), 1), transforming to ext(hadd(A, B)). For signed values, because only the bottom bits are demanded llvm will transform the above to use a lshr too, as opposed to ashr. In order to correctly detect the hadd we need to know the demanded bits to turn it back. Depending on whether the shift is signed (ashr) or logical (lshr), and the extensions are signed or unsigned we can create different nodes. If the shift is signed: Needs >= 2 sign bits. https://alive2.llvm.org/ce/z/h4gQAW generating signed rhadd. Needs >= 2 zero bits. https://alive2.llvm.org/ce/z/B64DUA generating unsigned rhadd. If the shift is unsigned: Needs >= 1 zero bits. https://alive2.llvm.org/ce/z/ByD8sj generating unsigned rhadd. Needs 1 demanded bit zero and >= 2 sign bits https://alive2.llvm.org/ce/z/hvPGxX and https://alive2.llvm.org/ce/z/32P5n1 generating signed rhadd. Differential Revision: https://reviews.llvm.org/D119072	2022-02-15 10:17:02 +00:00
Momchil Velikov	6398903ac8	Extend the `uwtable` attribute with unwind table kind We have the `clang -cc1` command-line option `-funwind-tables=1|2` and the codegen option `VALUE_CODEGENOPT(UnwindTables, 2, 0) ///< Unwind tables (1) or asynchronous unwind tables (2)`. However, this is encoded in LLVM IR by the presence or the absence of the `uwtable` attribute, i.e. we lose the information whether we want to generate just some unwind tables or asynchronous unwind tables. Asynchronous unwind tables take more space in the runtime image, I'd estimate something like 80-90% more, as the difference is adding roughly the same number of CFI directives as for prologues, only a bit simpler (e.g. `.cfi_offset reg, off` vs. `.cfi_restore reg`). Or even more, if you consider tail duplication of epilogue blocks. Asynchronous unwind tables could also restrict code generation to having only a finite number of frame pointer adjustments (an example of not having a finite number of `SP` adjustments is on AArch64 when untagging the stack (MTE) in some cases the compiler can modify `SP` in a loop). Having the CFI precise up to an instruction generally also means one cannot bundle together CFI instructions once the prologue is done, they need to be interspersed with ordinary instructions, which means extra `DW_CFA_advance_loc` commands, further increasing the unwind tables size. That is to say, async unwind tables impose a non-negligible overhead, yet for the most common use cases (like C++ exceptions), they are not even needed. This patch extends the `uwtable` attribute with an optional value: - `uwtable` (default to `async`) - `uwtable(sync)`, synchronous unwind tables - `uwtable(async)`, asynchronous (instruction precise) unwind tables Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D114543	2022-02-14 14:35:02 +00:00
David Green	03380c70ed	[DAGCombine] Basic combines for AVG nodes. This adds very basic combines for AVG nodes, mostly for constant folding and handling degenerate (zero) cases. The code performs mostly the same transforms as visitMULHS, adjusted for AVG nodes. Constant folding extends to a higher bitwidth and drops the lowest bit. For undef nodes, `avg undef, x` is transformed to x. There is also a transform for `avgfloor x, 0` transforming to `shr x, 1`. Differential Revision: https://reviews.llvm.org/D119559	2022-02-14 11:18:35 +00:00
Tim Northover	a87d3ba61c	Reapply: StackProtector: ignore debug insts when splitting blocks. When deciding where to split a block to insert stack guard checks, we should move past any debug instructions we see that might (e.g.) be separating a tail call from its frame wrangling. This time, also don't run off the front of a basic block.	2022-02-14 10:58:22 +00:00
Nikita Popov	ff040eca93	[FastISel] Reuse register for bitcast that does not change MVT The current FastISel code reuses the register for a bitcast that doesn't change the IR type, but uses a reg-to-reg copy if it changes the IR type without changing the MVT. However, we can simply reuse the register in that case as well. In particular, this avoids unnecessary reg-to-reg copies for pointer bitcasts. This was found while inspecting O0 codegen differences between typed and opaque pointers. Differential Revision: https://reviews.llvm.org/D119432	2022-02-14 09:13:17 +01:00
Craig Topper	e72fe654b7	[DAGCombiner] Use getShiftAmountConstant in DAGCombiner::foldSelectOfConstants. This enables fshl to be matched earlier on X86 %6 = lshr i32 %3, 1 %7 = select i1 %4, i32 -2147483648, i32 0 %8 = or i32 %6, %7 X86 uses i8 for shift amounts. SelectionDAGBuilder creates the ISD::SRL with an i8 shift type. DAGCombiner turns the select into an ISD::SHL. Prior to this patch it would use i32 for the shift amount. fshl matching failed because the shift amounts have different types. LegalizeDAG fixes the ISD::SHL shift amount to i8. This allowed fshl matching to succeed. With this patch, the ISD::SHL will be created with an i8 shift amount. This allows the fshl to match immediately. No test case because we still end up with a fshl either way.	2022-02-13 19:09:26 -08:00
Benjamin Kramer	bee4531bee	[MachineSink] Inline getRegUnits Reg unit sets are uniqued, so no need to wrap it in a set.	2022-02-12 17:46:12 +01:00
Sanjay Patel	96b7e0b5a0	[SDAG] clean up scalarizing load transform I have not found a way to expose a difference for this patch in a test because it only triggers for a one-use load, but this is the code that was adapted into D118376 and caused miscompiles. The new code pattern is the same as what we do in narrowExtractedVectorLoad() (reduces load width for a subvector extract). This removes seemingly unnecessary manual worklist management and fixes the chain updating via "SelectionDAG::makeEquivalentMemoryOrdering()". Differential Revision: https://reviews.llvm.org/D119549	2022-02-12 11:41:19 -05:00
Sanjay Patel	429f10f5f2	[SDAG] reduce code duplication and fix formatting; NFC	2022-02-12 10:22:13 -05:00
Arthur Eubanks	c0281c7607	[OpaquePtr][SPARC] Remove getPointerElementType() call in SparcISelLowering Requires keeping better track of sret types.	2022-02-11 11:31:19 -08:00
David Green	4072e362c0	[ISel] Port AArch64 HADD and RHADD to ISel This ports the aarch64 combines for HADD and RHADD over to DAG combine, so that they can be used in more architectures (notably MVE in a followup patch). They are renamed to AVGFLOOR and AVGCEIL in the process, to avoid confusion with instructions such as X86 hadd. The code was also rewritten slightly to remove the AArch64 idiosyncrasies. The general pattern for an AVGFLOORS is %xe = sext i8 %x to i32 %ye = sext i8 %y to i32 %a = add i32 %xe, %ye %r = lshr i32 %a, 1 %t = trunc i32 %r to i8 An AVGFLOORU is equivalent with zext. Because of the truncate lshr==ashr, as the top bits are not demanded. An AVGCEIL also includes an extra rounding, so includes an extra add of 1. Differential Revision: https://reviews.llvm.org/D106237	2022-02-11 18:28:56 +00:00
Tim Northover	2ba06bed6b	Revert "StackProtector: ignore debug insts when splitting blocks." This reverts commit `7605ca85f1`. It caused an assertion failure in Fuchsia.	2022-02-11 18:06:28 +00:00
Julien Pages	dcb2da13f1	[AMDGPU] Add a new intrinsic to control fp_trunc rounding mode Add a new llvm.fptrunc.round intrinsic to precisely control the rounding mode when converting from f32 to f16. Differential Revision: https://reviews.llvm.org/D110579	2022-02-11 12:08:23 -05:00
Tim Northover	7605ca85f1	StackProtector: ignore debug insts when splitting blocks. When deciding where to split a block to insert stack guard checks, we should move past any debug instructions we see that might (e.g.) be separating a tail call from its frame wrangling.	2022-02-11 10:13:50 +00:00
serge-sans-paille	06943537d9	Cleanup MCParser headers As usual with that header cleanup series, some implicit dependencies now need to be explicit: llvm/MC/MCParser/MCAsmParser.h no longer includes llvm/MC/MCParser/MCAsmLexer.h Preprocessed lines to build llvm on my setup: after: 1068185081 before: 1068324320 So no compile time benefit to expect, but we still get the looser coupling between files which is great. Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup Differential Revision: https://reviews.llvm.org/D119359	2022-02-11 10:39:29 +01:00
Yuanfang Chen	f927021410	Reland "[clang-cl] Support the /JMC flag" This relands commit `b380a31de0`. Restrict the tests to Windows only since the flag symbol hash depends on system-dependent path normalization.	2022-02-10 15:16:17 -08:00
Yuanfang Chen	b380a31de0	Revert "[clang-cl] Support the /JMC flag" This reverts commit `bd3a1de683`. Break bots: https://luci-milo.appspot.com/ui/p/fuchsia/builders/toolchain.ci/clang-windows-x64/b8822587673277278177/overview	2022-02-10 14:17:37 -08:00
Reid Kleckner	64037afe01	[CodeView] Avoid integer overflow while parsing long version strings This came up on a funny vendor-provided version string that didn't have a standard dotted quad of numbers.	2022-02-10 13:52:11 -08:00
Yuanfang Chen	bd3a1de683	[clang-cl] Support the /JMC flag The introduction and some examples are on this page: https://devblogs.microsoft.com/cppblog/announcing-jmc-stepping-in-visual-studio/ The `/JMC` flag enables these instrumentations: - Insert at the beginning of every function immediately after the prologue with a call to `void __fastcall __CheckForDebuggerJustMyCode(unsigned char *JMC_flag)`. The argument for `__CheckForDebuggerJustMyCode` is the address of a boolean global variable (the global variable is initialized to 1) with the name convention `__<hash>_<filename>`. All such global variables are placed in the `.msvcjmc` section. - The `<hash>` part of `__<hash>_<filename>` has a one-to-one mapping with a directory path. MSVC uses some unknown hashing function. Here I used DJB. - Add a dummy/empty COMDAT function `__JustMyCode_Default`. - Add `/alternatename:__CheckForDebuggerJustMyCode=__JustMyCode_Default` link option via ".drectve" section. This is to prevent failure in case `__CheckForDebuggerJustMyCode` is not provided during linking. Implementation: All the instrumentations are implemented in an IR codegen pass. The pass is placed immediately before CodeGenPrepare pass. This is to not interfere with mid-end optimizations and make the instrumentation target-independent (I'm still working on an ELF port in a separate patch). Reviewed By: hans Differential Revision: https://reviews.llvm.org/D118428	2022-02-10 10:26:30 -08:00
Nikita Popov	6241f7dee0	[FastISel] Remove redundant reg class check (NFC) SrcVT and DstVT are the same in this branch, as such their register classes will also be the same.	2022-02-10 14:10:00 +01:00
Jeremy Morse	be5734ddaa	[DebugInfo][InstrRef] Don't fire assertions if debug-info is faulty It's inevitable that optimisation passes will fail to update debug-info: when that happens, it's best if the compiler doesn't crash as a result. Therefore, downgrade a few assertions / failure modes that would crash when illegal debug-info was seen, to instead drop variable locations. In practice this means that an instruction reference to a nonexistent or illegal operand should be tolerated. Differential Revision: https://reviews.llvm.org/D118998	2022-02-10 11:25:08 +00:00
Jay Foad	abda8d2229	[GlobalISel] CSE FP constants at -O0 At -O0 we claim to CSE constants only. I think this should apply to G_FCONSTANT as well as G_CONSTANT. Differential Revision: https://reviews.llvm.org/D119344	2022-02-10 09:17:11 +00:00
Reid Kleckner	b5a592a8e2	[DAG] Remove pointless std::function wrapper, NFC	2022-02-09 14:30:43 -08:00
Reid Kleckner	f63c150187	Revert "[DagCombine] Increase depth by number of operands to avoid a pathological compile time." Appears to be causing check-llvm to fail This reverts commit `49ab760090`.	2022-02-09 13:55:40 -08:00
Alina Sbirlea	49ab760090	[DagCombine] Increase depth by number of operands to avoid a pathological compile time. We're hitting a pathological compile-time case, profiled to be in DagCombiner::visitTokenFactor and many inserts into a SmallPtrSet. It looks like one of the paths around findBetterNeighborChains is not capped and leads to this. This patch resolves the issue. Looking for feedback if this solution looks reasonable. Differential Revision: https://reviews.llvm.org/D118877	2022-02-09 13:31:28 -08:00
Alexander Yermolovich	1be6ccfc02	[DWARF][codegen] Fix for Aranges when split inlining is present When we enable -fsplit-dwarf-inlining we end up with two entries in .debug_aranges for each CU. Because it processes Skeleton CU inline information and DWO CU. Furthermore address calculations were incorrect because we were processing sections in Skeleton CU. Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D118857	2022-02-09 11:51:43 -08:00
Sander de Smalen	ec46232517	[DAGCombiner] Fold `ty1 extract_vector(ty2 splat(V)) -> ty1 splat(V)` This seems like an obvious fold, which leads to a few improvements. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D118920	2022-02-09 14:30:01 +00:00
serge-sans-paille	ef736a1c39	Cleanup LLVMMC headers There's a few relevant forward declarations in there that may require downstream adding explicit includes: llvm/MC/MCContext.h no longer includes llvm/BinaryFormat/ELF.h, llvm/MC/MCSubtargetInfo.h, llvm/MC/MCTargetOptions.h llvm/MC/MCObjectStreamer.h no longer include llvm/MC/MCAssembler.h llvm/MC/MCAssembler.h no longer includes llvm/MC/MCFixup.h, llvm/MC/MCFragment.h Counting preprocessed lines required to rebuild llvm-project on my setup: before: 1052436830 after: 1049293745 Which is significant and backs up the change in addition to the usual benefits of decreasing coupling between headers and compilation units. Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup Differential Revision: https://reviews.llvm.org/D119244	2022-02-09 11:09:17 +01:00
Bill Wendling	deaf22bc0e	[X86] Implement -fzero-call-used-regs option The "-fzero-call-used-regs" option tells the compiler to zero out certain registers before the function returns. It's also available as a function attribute: zero_call_used_regs. The two upper categories are: - "used": Zero out used registers. - "all": Zero out all registers, whether used or not. The individual options are: - "skip": Don't zero out any registers. This is the default. - "used": Zero out all used registers. - "used-arg": Zero out used registers that are used for arguments. - "used-gpr": Zero out used registers that are GPRs. - "used-gpr-arg": Zero out used GPRs that are used as arguments. - "all": Zero out all registers. - "all-arg": Zero out all registers used for arguments. - "all-gpr": Zero out all GPRs. - "all-gpr-arg": Zero out all GPRs used for arguments. This is used to help mitigate Return-Oriented Programming exploits. Reviewed By: nickdesaulniers Differential Revision: https://reviews.llvm.org/D110869	2022-02-08 17:42:54 -08:00
Mircea Trofin	2868c57caf	[nfc][mlgo][regalloc] Add the url to a reference pre-trained model	2022-02-08 16:57:24 -08:00
Matt Arsenault	5af0f097ba	GlobalISel: Constant fold G_PTR_ADD Some globals lower to literal addresses on AMDGPU. This may be wrong for non-integral address spaces. I'm wondering if we should just allow regular G_ADD to use pointer types, and reserve G_PTR_ADD for non-integral address spaces.	2022-02-08 19:21:06 -05:00
Matt Arsenault	2af4a554fe	GlobalISel: Constant fold FP bin ops in MIRBuilder Might as well handle these if we're going to handle the integer ops here.	2022-02-08 18:51:10 -05:00
Matt Arsenault	930f2498d4	GlobalISel: Constant fold integer min/max opcodes	2022-02-08 18:50:35 -05:00
Matt Arsenault	0877fbcc16	GlobalISel: Add FoldBinOpIntoSelect combine This will do the combine in cases that should fold, but don't now. e.g. we're relying on the CSEMIRBuilder's incomplete constant folding. For instance it doesn't handle FP operations or vectors (and we don't have separate constant folding combines either to catch them).	2022-02-08 18:17:21 -05:00
Mircea Trofin	5a50ab4d5c	[nfc][mlgo][regalloc] Stop warnings about unused function Added a `NoopSavedModelImpl` type which can be used as a mock AOT-ed saved model, and further minimize conditional compilation cases. This also removes unused function warnings on gcc.	2022-02-08 08:35:33 -08:00
Sanjay Patel	905abc5b7d	[SDAG] enable binop identity constant folds for fmul/fdiv The test diffs are identical to D119111. This only affects x86 currently because no other target has an override for the TLI hook that controls this transform.	2022-02-08 10:52:28 -05:00
Roman Lebedev	ae9414d562	[ValueTracking] Only check for non-undef/poison if already known to be a self-multiply https://godbolt.org/z/js9fTTG9h ^ we don't care what `isGuaranteedNotToBeUndefOrPoison()` says unless we already knew that the operands were equal.	2022-02-08 18:35:29 +03:00
Sanjay Patel	a68e098024	[SDAG] move x86 select-with-identity-constant fold behind a target hook; NFC This is no-functional-change-intended because only the x86 target enables the TLI hook currently. We can add fmul/fdiv opcodes to the switch similar to the proposal D119111, but we don't need to make other changes like enabling target-specific combines. We can also add integer opcodes (add, or, shl, etc.) to the switch because this function is called from all of the generic binary opcodes. The goal is to incrementally enable the profitable diffs from D90113 while avoiding regressions. Differential Revision: https://reviews.llvm.org/D119150	2022-02-08 09:55:05 -05:00
Sheng	76c83e747f	[GlobalISel] Add big endian support in CallLowering When splitting values, CallLowering assumes Lo part goes first. But in big endian ISA such as M68k, Hi part goes first. This patch fixes this. Differential Revision: https://reviews.llvm.org/D116877	2022-02-08 14:43:38 +00:00
Nikita Popov	924696d271	[AsmPrinter] Avoid pointer element type access Instead of checking for a bitcast from a function type, check whether the aliasee is a function after stripping bitcasts. This is not strictly equivalent, but serves the same purpose.	2022-02-08 15:06:02 +01:00
Simon Pilgrim	fd2bb51f1e	[ADT] Add APInt/MathExtras isShiftedMask variant returning mask offset/length In many cases, calls to isShiftedMask are immediately followed with checks to determine the size and position of the bitmask. This patch adds variants of APInt::isShiftedMask, isShiftedMask_32 and isShiftedMask_64 that return these values as additional arguments. I've updated a number of cases that were either performing separate size/position calculations or had created their own local wrapper versions of these. Differential Revision: https://reviews.llvm.org/D119019	2022-02-08 12:04:13 +00:00
Carl Ritson	42ac4e1a12	[MachineLICM] Add shouldHoist method to TargetInstrInfo Add a shouldHoist method to TargetInstrInfo which is queried by MachineLICM to override hoisting decisions for a given target. This mirrors functionality provided by shouldSink. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D118773	2022-02-08 15:53:05 +09:00
Sheng	146c7820d9	[GlobalISel][Legalizer] Support reducing load/store width in big endian order	2022-02-07 20:06:17 -05:00
Sanjay Patel	d1ecfaa097	[SDAG] try to fold one-demanded-bit-of-multiply This is a translation of the transform added to InstCombine with: D118539	2022-02-07 17:24:35 -05:00
Sanjay Patel	fc6bee1c11	[SDAG] SimplifyDemandedBits - generalize fold for 2 LSB of X*X This is translated from recent changes to the IR version of this function: D119060 D119139	2022-02-07 15:38:50 -05:00
Vang Thao	570471199b	[AMDGPU] Fix debug values in scheduler not placed correctly when reverting Debug position data is cleared after ScheduleDAGMILive::schedule() due to it also calling placeDebugValues(). Make it so the data is not cleared after initial call to placeDebugValues since we will call it again after reverting a schedule. Secondly, since we skip debug instructions when reverting the schedule on AMDGPU, all debug instructions are now moved to the end of the scheduling region. RegionEnd points to the beginning of this chunk of debug instructions since it was not incremented when a debug instruction was skipped. RegionBegin may also point to the same debug instruction if Unsched.front() is a debug instruction thus shrinking the region to 1. Fix RegionBegin and RegionEnd so that they point to the current beginning and ending before calling placeDebugValues() since both vars will be used as reference points to move debug instructions back. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D119022	2022-02-07 11:01:13 -08:00
Simon Pilgrim	74555fd367	[DAG] visitINSERT_VECTOR_ELT - break if-else chain as they both return (style). NFC.	2022-02-07 09:58:47 +00:00
Simon Pilgrim	5d3a86489f	[GlobalISel] Move getOpcode() calls inside assert() to avoid (void)s. NFC. Tidier solution to the unused variable warnings - we already do this in other places in this file.	2022-02-07 09:50:27 +00:00
Djordje Todorovic	def10a2895	[GlobalIsel] Fix another "unused variable" warning	2022-02-07 09:32:22 +01:00
Djordje Todorovic	eab395fa40	Fix the warning after D118805 A variable was used within assert() only.	2022-02-07 09:25:02 +01:00
Craig Topper	c35ccd2ac8	[DAGCombiner][RISCV] Allow rotates by non-constant to be matched for i32 on riscv64 with Zbb. rv64izbb has RORW/ROLW instructions that operate on the lower 32-bits of a 64-bit value and sign extend bit 31 of the result. DAGCombiner won't match rotate idioms because the i32 type isn't Legal on riscv64. This patch teaches DAGCombiner to allow it if the type is going to be promoted and the target has Custom type legalization for ISD::ROTL or ISD::ROTR. I've restricted this to scalar types. It doesn't appear any in-tree targets other than riscv64 have custom type legalization for rotates. If this patch isn't acceptable, I guess I can match SRLW, SLLW, and OR after type legalization, but I'd like to avoid that if possible. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D119062	2022-02-06 10:58:12 -08:00
Kazu Hirata	3a8c51480f	[CodeGen] Use = default (NFC) Identified with modernize-use-equals-default	2022-02-06 10:54:44 -08:00
Bjorn Pettersson	cecf11c315	[DAGCombiner] Fold SSHLSAT/USHLSAT to SHL when no saturation will occur When the shift amount is known and a known sign bit analysis of the shiftee indicates that no saturation will occur, then we can replace SSHLSAT/USHLSAT by SHL. Differential Revision: https://reviews.llvm.org/D118765	2022-02-06 18:59:06 +01:00
Rong Xu	52d981a4c1	[SampleFDO] Enable FSAFDO loading passes if --enable-fs-discriminator is enabled FSAFDO profile loader is currently disabled even if --enable-fs-discriminator is enabled. They need to be turned on by options, which makes it cumbersome for experiments. This patch changes the FSAFDO profile loader to be enabled by default. Since they are guarded by EnableFSDiscriminator, they will only be turned on if --enable-fs-discriminator is enabled. Note that --enable-fs-discriminator is still disabled by default. Differential Revision: https://reviews.llvm.org/D119033	2022-02-05 22:37:09 -08:00
Benjamin Kramer	a40dc4eaf8	Simplify mask creation with llvm::seq. NFCI.	2022-02-05 23:35:41 +01:00
Sander de Smalen	6452549f30	[DAGCombiner] Fold vecreduce_or/and if operand is insert_subvector. Fold: vecreduce_or(insert_subvec(zeroinitializer, vec)) -> vecreduce_or(vec) vecreduce_and(insert_subvec(allones, vec)) -> vecreduce_and(vec) vecreduce_and/or(insert_subvec(undef, vec)) -> vecreduce_and/or(vec) This is useful for SVE which uses insert/extract subvector to convert fixed-width to/from scalable vectors. Reviewed By: bsmith Differential Revision: https://reviews.llvm.org/D118919	2022-02-05 14:35:53 +00:00
Hongtao Yu	dee058c670	[CSSPGO] Turn on ext-tsp by default for CSSPGO. I'm seeing ext-tsp helps CSSPGO for our internal large benchmarks so I'm turning it on for CSSPGO. For non-CS AutoFDO, ext-tsp doesn't seem to help, probably because of lower profile counts quality. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D119048	2022-02-04 19:46:44 -08:00
Róbert Ágoston	cd4ed08b5a	[GlobalISel] Don't combine instructions which are fed by memory instructions using different size Memory instructions like extending loads from the same address are not equal if their size is not equal. This fixes https://github.com/llvm/llvm-project/issues/53524. Differential Revision: https://reviews.llvm.org/D118805	2022-02-04 15:00:47 -08:00
John Brawn	0d8092dd48	[AArch64] Fix legalization of v1f64 strict_fsetcc and strict_fsetccs These operations are scalarized but the result type v1i1 isn't which needs special handling (the same as is done for the non-strict versions of these operations). Differential Revision: https://reviews.llvm.org/D118258	2022-02-04 12:55:38 +00:00
serge-sans-paille	ffe8720aa0	Reduce dependencies on llvm/BinaryFormat/Dwarf.h This header is very large (3M Lines once expanded) and was included in locations where dwarf-specific information was not needed. More specifically, this commit suppresses the dependencies on llvm/BinaryFormat/Dwarf.h in two headers: llvm/IR/IRBuilder.h and llvm/IR/DebugInfoMetadata.h. As these headers (esp. the former) are widely used, this has a decent impact on number of preprocessed lines generated during compilation of LLVM, as showcased below. This is achieved by moving some definitions back to the .cpp file, no performance impact implied[0]. As a consequence of that patch, downstream users may need to manually include some extra files: llvm/IR/IRBuilder.h no longer includes llvm/BinaryFormat/Dwarf.h llvm/IR/DebugInfoMetadata.h no longer includes llvm/BinaryFormat/Dwarf.h In some situations, code may be relying on the fact that llvm/BinaryFormat/Dwarf.h was including llvm/ADT/Triple.h, this hidden dependency now needs to be explicit. $ clang++ -E -Iinclude -I../llvm/include ../llvm/lib/Transforms/Scalar/*.cpp -std=c++14 -fno-rtti -fno-exceptions | wc -l after: 10978519 before: 11245451 Related Discourse thread: https://llvm.discourse.group/t/include-what-you-use-include-cleanup [0] https://llvm-compile-time-tracker.com/compare.php?from=fa7145dfbf94cb93b1c3e610582c495cb806569b&to=995d3e326ee1d9489145e20762c65465a9caeab4&stat=instructions Differential Revision: https://reviews.llvm.org/D118781	2022-02-04 11:44:03 +01:00
Bjorn Pettersson	3db39e7479	[DAGCombiner] Fix dependency analysis in checkMergeStoreCandidatesForDependencies In the aftermath of D116895 a problem was found in the analysis of dependencies between store merge candidates in checkMergeStoreCandidatesForDependencies, which is needed to avoid introducing cycles in the DAG. In the past it has been enough (or assumed to be enough) to start scanning from non-chain operands when analysing the store merge candidates for dependencies, assuming that the analysis of chain dependencies performed when finding the candidates would cover up for potential dependencies that exist involving the chain operands. It was however discovered that one could end up with scenarios such as described in the aarch64-checkMergeStoreCandidatesForDependencies.ll test case, when the dependency between two stores is given by a mix of chain operand dependencies and non-chain operand dependencies. The fix in this patch makes sure that we also account for chain operand dependencies when doing the more elaborate analysis in checkMergeStoreCandidatesForDependencies, no longer relying on the earlier check involving chain operands being enough. Differential Revision: https://reviews.llvm.org/D118943	2022-02-04 08:53:01 +01:00
Mircea Trofin	91a33ad32b	[nfc][mlgo][regalloc] Cache live interval feature components Lazily cache the feature components of a LiveInterval. Differential Revision: https://reviews.llvm.org/D118674	2022-02-03 17:01:42 -08:00
Jessica Paquette	9a61e731ff	[GlobalISel] Combine (G_ADDO x, 0) -> x + no carry out Similar to the G_MULO change. The code for checking if a constant is legal/pre-legalize is shared between these, and is kind of hairy. So, factor it out into a new function: `isConstantLegalOrBeforeLegalizer`. To make the refactoring clean, further refactor `isLegalOrBeforeLegalizer` into a wrapper for two functions: - `isPreLegalize` - `isLegal` This is a bit easier to read in general. https://godbolt.org/z/KW7oszP1o Differential Revision: https://reviews.llvm.org/D118655	2022-02-03 14:25:15 -08:00
Jessica Paquette	c636899dc1	[GlobalISel] Combine: (G_MULO x, 0) -> 0 + no carry out Similar to the following combine in `DAGCombiner::visitMULO`: ``` // fold (mulo x, 0) -> 0 + no carry out if (isNullOrNullSplat(N1)) return CombineTo(N, DAG.getConstant(0, DL, VT), DAG.getConstant(0, DL, CarryVT)); ``` This fixes some generally poor codegen for `mulo`: https://godbolt.org/z/eTxYsvz8f Differential Revision: https://reviews.llvm.org/D118635	2022-02-03 14:23:58 -08:00
Mircea Trofin	592f52de33	[nfc][regalloc] const LiveIntervals within the allocator Once built, LiveIntervals are immutable. This patch captures that. Differential Revision: https://reviews.llvm.org/D118918	2022-02-03 12:35:36 -08:00
Bjorn Pettersson	0352ee1a22	[CodeGenPrepare] Avoid out-of-bounds shift AddressingModeMatcher::matchOperationAddr may attempt to shift a variable by the same amount of steps as found in the IR in a SHL instruction. This was done without considering that there could be undefined behavior in the IR, so the shift performed when compiling could end up having undefined behavior as well. This patch avoids UB in CodeGenPrepare by making sure that we limit the shift amount used, in a similar way as already being done in CodeGenPrepare::optimizeLoadExt. Differential Revision: https://reviews.llvm.org/D118602	2022-02-03 21:03:58 +01:00
Mircea Trofin	79b98f0a07	Revert "[nfc][mlgo] De-const a parameter" This reverts commit `bc3b372161`. The planned change that would have needed non-const MachineFunction refs isn't needed after all.	2022-02-03 09:20:36 -08:00
John Brawn	94843ea7d7	[AArch64] Make machine combiner patterns preserve MIFlags This is mainly done so that we don't lose the nofpexcept flag once we start emitting it. Differential Revision: https://reviews.llvm.org/D118621	2022-02-03 11:58:59 +00:00
Sander de Smalen	01bfe9729a	[ISEL] Canonicalize STEP_VECTOR to LHS if RHS is a splat. This helps recognise patterns where we're trying to match STEP_VECTOR patterns to INDEX instructions that take a GPR for the Start/Step. The reason for canonicalising this operation to the LHS is because it will already be canonicalised to the LHS if the RHS is a constant splat vector. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D118459	2022-02-03 09:31:46 +00:00
Jeremy Morse	4654fa89ea	Follow up to `6e03a68b77`, squelch another leak This patch is a sticking-plaster until D118774 solves the situation with unique_ptrs. I'm certainly wishing I'd focused on that first X_X.	2022-02-02 21:02:11 +00:00
Jeremy Morse	6e03a68b77	[DebugInfo] Re-enable instruction referencing for x86_64 After discussion in D116821 this was turned off in `74db5c8c95`, `14aaaa1236` applied to limit the maximum memory consumption in rare conditions, plus some performance patches.	2022-02-02 19:41:59 +00:00
Matt Arsenault	a96dbb9035	CodeGen: Use asm register names in warning message This was using the ugly tablegenerated register enum names, which are really hideous for register tuples on AMDGPU. Use the prettier names which are recognized by the asm parser.	2022-02-02 14:20:12 -05:00
Jeremy Morse	206cafb680	Follow up to `9fd9d56dc6`, avoid a memory leak Gaps in the basic block number range (from blocks being deleted or folded) get block-value-tables allocated but never ejected, leading to a memory leak, currently tripping up the asan buildbots. Fix this up by manually freeing that memory. As suggested elsewhere, if these things were owned by a unique_ptr then cleanup would happen automagically. D118774 should eliminate the need for this dance.	2022-02-02 16:01:11 +00:00
Masoud Ataei	256d253332	[PowerPC] Scalar IBM MASS library conversion pass This patch introduces the conversions from math function calls to MASS library calls. To resolve calls generated with these conversions, one needs to link the libxlopt.a library. This patch is tested on PowerPC Linux and AIX. Differential: https://reviews.llvm.org/D101759 Reviewer: bmahjour	2022-02-02 07:54:19 -08:00
Mircea Trofin	660ff655c8	Fix buildbreak introduced in `ed2deab595`	2022-02-02 07:34:51 -08:00
Mircea Trofin	ed2deab595	[nfc][regalloc] Make the max inference cutoff configurable Added a flag to make configurable the number of interferences after which we 'bail out' and treat a set of intervals as un-evictable. Also using it on the ML side, as it turns out to be a good control for compile-time. With this configurable, we can do a bit of trial and error and see if bumping it has any effect on heuristic/policy quality. Differential Revision: https://reviews.llvm.org/D118707	2022-02-02 07:29:34 -08:00
Jeremy Morse	43de305704	[DebugInfo][InstrRef] Fix a tombstone-in-DenseMap crash from D117877 This is a follow-up to D117877: variable assignments of DBG_VALUE $noreg, or DBG_INSTR_REFs where no value can be found, are represented by a DbgValue object with Kind "Undef", explicitly meaning "there is no value". In D117877 I added a special case to make some assignment accounting faster, without considering this scenario. It causes variables to be given the value ValueIDNum::EmptyValue, which then ends up being a DenseMap key. The DenseMap asserts, because EmptyValue is the tombstone key. Fix this by handling the assign-undef scenario in the special case, to match what happens in the general case: the variable has no value if it's only ever assigned $noreg / undef. Differential Revision: https://reviews.llvm.org/D118715	2022-02-02 15:08:49 +00:00
Jeremy Morse	9fd9d56dc6	[DebugInfo][InstrRef][NFC] Use depth-first scope search for variable locs This patch aims to reduce max-rss from instruction referencing, by avoiding keeping variable value information in memory for too long. Instead of computing all the variable values then emitting them to DBG_VALUE instructions, this patch tries to stream the information out through a depth first search: * Make use of the fact LexicalScopes gives a depth-number to each lexical scope, * Produce a map that identifies the last lexical scope to make use of a block, * Enumerate each scope in LexicalScopes' DFS order, solving the variable value problem, * After each scope is processed, look for any blocks that won't be used by any other scope, and emit all the variable information to DBG_VALUE instructions. Differential Revision: https://reviews.llvm.org/D118460	2022-02-02 14:09:54 +00:00
Jeremy Morse	a80181a81e	[DebugInfo][InstrRef][NFC] Free resources at an earlier stage This patch releases some memory from InstrRefBasedLDV earlier than it would otherwise. The underlying problem is: * We store a big table of "live in values for each block", * We translate that into DBG_VALUE instructions in each block, And both exist in memory at the same time, which needlessly doubles that information. Most of what this patch does is: as we progressively translate live-in information into DBG_VALUEs, we free the variable-value / machine-value tracking information as we go, which significantly reduces peak memory. While I'm here, also add a clear method to wipe variable assignments that have been accumulated into VLocTracker objects, and turn a DenseMap into a SmallDenseMap to avoid an initial allocation. Differential Revision: https://reviews.llvm.org/D118453	2022-02-02 12:58:15 +00:00
Jeremy Morse	d556eb7e27	[DebugInfo][InstrRef][NFC] Cache some PHI resolutions Install a cache of DBG_INSTR_REF -> ValueIDNum resolutions, for scenarios where the value has to be reconstructed from several DBG_PHIs. Whenever this happens, it's because branch folding + tail duplication has messed with the SSA form of the program, and we have to solve a mini SSA problem to find the variable value. This is always called twice, so it makes sense to cache the value. This gives a ~0.5% geomean compile-time-performance improvement on CTMark. Differential Revision: https://reviews.llvm.org/D118455	2022-02-02 12:21:28 +00:00
Simon Pilgrim	5aa2acc86b	[DAG] SimplifyDemandedVectorElts - remove KnownZero/KnownUndef from DCI helper wrapper None of the external users actually touch these (they're purely used internally down the recursive call) - it's trivial to add another wrapper if anything ever does want to track known elements.	2022-02-02 12:04:49 +00:00
Jeremy Morse	14aaaa1236	Re-apply `3fab2d138e`, now with a triple added Was reverted in `1c1b670a73` as it broke all non-x86 bots. Original commit message: [DebugInfo][InstrRef] Add a max-stack-slots-to-track cut-out In certain circumstances with things like autogenerated code and asan, you can end up with thousands of Values live at the same time, causing a large working set and a lot of information spilled to the stack. Unfortunately InstrRefBasedLDV doesn't cope well with this and consumes a lot of memory when there are many many stack slots. See the reproducer in D116821. It seems very unlikely that a developer would be able to reason about hundreds of live named local variables at the same time, so a huge working set and many stack slots is an indicator that we're likely analysing autogenerated or instrumented code. In those cases: gracefully degrade by setting an upper bound on the number of stack slots to track. This limits peak memory consumption, at the cost of dropping some variable locations, but in a rare scenario where it's unlikely someone is actually going to use them. In terms of the patch, this adds a cl::opt for max number of stack slots to track, and has the stack-slot-numbering code optionally return None. That then filters through a number of code paths, which can then choose to not track a spill / restore if it touches an untracked spill slot. The added test checks that we drop variable locations that are on the stack, if we set the limit to zero. Differential Revision: https://reviews.llvm.org/D118601	2022-02-02 11:04:00 +00:00
Sam Parker	281d29b8fe	[TypePromotion] Avoid some unnecessary truncs Check for legal zext 'sinks' before inserting a trunc. Differential Revision: https://reviews.llvm.org/D115451	2022-02-02 10:05:15 +00:00
Simon Moll	7d926b7177	[VE] LEGALAVL and staged VVP legalization The new LEGALAVL node annotates that the AVL refers to packs of 64bit. We use a two-stage lowering approach with LEGALAVL: First, standard SDNodes are translated into illegal VVP layer nodes. Regardless of source (VP or standard), all VVP nodes have a mask and AVL parameter. The AVL parameter refers to the element position (just as in VP intrinsics). Second, we legalize the AVL usage in VVP layer nodes. If the element size is < 64bit, the EVL parameter has to be adjusted to refer to packs of 64bits. We wrap the legalized AVL in a LEGALAVL node to track this. Reviewed By: kaz7 Differential Revision: https://reviews.llvm.org/D118321	2022-02-02 09:11:41 +01:00
Kevin Athey	1c1b670a73	Revert "[DebugInfo][InstrRef] Add a max-stack-slots-to-track cut-out" This reverts commit `3fab2d138e`. Breaking PPC sanitizer build: https://lab.llvm.org/buildbot/#/builders/105/builds/20857	2022-02-01 18:37:02 -08:00
David Blaikie	f69f23396d	Revert "DebugInfo: Don't put types in type units if they reference internal linkage types" This reverts commit `ab4756338c`. Breaks some cases, including this: namespace { template <typename> struct a {}; } // namespace class c { c(); }; class b { b(); a<c> ax; }; b::b() {} c::c() {} By producing a reference to a type unit for "c" but not producing the type unit.	2022-02-01 16:13:07 -08:00
David Green	c89cfbd4dd	Revert "[DAG] Extend SearchForAndLoads with any_extend handling" This reverts commit `100763a88f` as it was making incorrect assumptions about implicit zero_extends.	2022-02-01 20:18:40 +00:00
Jeremy Morse	8e75536e51	[DebugInfo][InstrRef][NFC] Bypass a frequently-noop loop Bypass this loop if it would do nothing -- if there are no register masks to be examined, there's no point looking at each location to see if the location has been def'd. Awkwardly, this was responsible for almost an entire half a percent of performance improvement on CTMark. Differential Revision: https://reviews.llvm.org/D118613	2022-02-01 19:39:09 +00:00
Jeremy Morse	3fab2d138e	[DebugInfo][InstrRef] Add a max-stack-slots-to-track cut-out In certain circumstances with things like autogenerated code and asan, you can end up with thousands of Values live at the same time, causing a large working set and a lot of information spilled to the stack. Unfortunately InstrRefBasedLDV doesn't cope well with this and consumes a lot of memory when there are many many stack slots. See the reproducer in D116821. It seems very unlikely that a developer would be able to reason about hundreds of live named local variables at the same time, so a huge working set and many stack slots is an indicator that we're likely analysing autogenerated or instrumented code. In those cases: gracefully degrade by setting an upper bound on the number of stack slots to track. This limits peak memory consumption, at the cost of dropping some variable locations, but in a rare scenario where it's unlikely someone is actually going to use them. In terms of the patch, this adds a cl::opt for max number of stack slots to track, and has the stack-slot-numbering code optionally return None. That then filters through a number of code paths, which can then choose to not track a spill / restore if it touches an untracked spill slot. The added test checks that we drop variable locations that are on the stack, if we set the limit to zero. Differential Revision: https://reviews.llvm.org/D118601	2022-02-01 19:25:29 +00:00
Jeremy Morse	91fb66cf91	[DebugInfo][InstrRef][NFC] Don't build a map of un-needed values When finding locations for variable values at the start of a block, we build a large map of every value to every location, and then pick out the locations for values that are desired. This takes up quite a lot of time, because, unsurprisingly, there are usually more values in registers and stack slots than there are variables. This patch instead creates a map of desired values to their locations, which are initially illegal locations. Then, as we examine every available value, we can select locations for values we care about, and ignore those that we don't. This substantially reduces the amount of work done (i.e., building a map up of values to locations that nothing wants or needs). Geomean performance improvement of 1% on CTMark, woo. Differential Revision: https://reviews.llvm.org/D118597	2022-02-01 18:58:06 +00:00
Mircea Trofin	22d3bbdf4e	[nfc][regalloc] Move DefaultEvictionAdvisor::* to RegAllocEvictionAdvisor.cpp This is leftover from the advisor refactoring. Straight-forward copy and paste.	2022-02-01 07:59:25 -08:00
Simon Pilgrim	904395ab8f	[DAG] SimplifyMultipleUseDemandedBits - add default Depth = 0 argument. Simplifies an upcoming change.	2022-02-01 12:34:38 +00:00
Simon Pilgrim	d83a96f59f	[DAG] Make it clear mul(x,x) knownbits bit[1] == 0 check should be for x is undef only As raised on rGffd0e464b4b9, if x is poison, this fold is still ok.	2022-02-01 11:32:14 +00:00
Bjorn Pettersson	3885879046	[DAGCombine] Add simple folds for SSHLSAT/USHLSAT Do "simplifyShift" and "FoldConstantArithmetic" folds for the SSHLSAT and USHLSAT DAG nodes. This includes folds such as: (shlsat undef/poison, x) -> 0 (shlsat x, undef/poison) -> undef (shlsat x, too_large_shamt) -> undef (shlsat 0, x) -> 0 (shlsat x, 0) -> x (shlsat c1, c2) -> c3 Differential Revision: https://reviews.llvm.org/D118603	2022-02-01 10:51:35 +01:00
David Sherwood	daa80339df	[CodeGen] Support folds of not(cmp(cc, ...)) -> cmp(!cc, ...) for scalable vectors I have updated TargetLowering::isConstTrueVal to also consider SPLAT_VECTOR nodes with constant integer operands. This allows the optimisation to also work for targets that support scalable vectors. Differential Revision: https://reviews.llvm.org/D117210	2022-02-01 09:50:00 +00:00
Mircea Trofin	a3f1491849	[nfc][mlgo][regalloc] 'hasPreferredPhys' out of feature components It isn't cacheable, it can be updated by other events than live interval resizing.	2022-01-31 18:59:47 -08:00
Mircea Trofin	9aa2c914b9	[mlgo][regalloc] Factor live interval feature calculation Factoring it out so we can subsequently cache it. This should be a NFC, however, for the float quantities, we see small errors in the least significant digits. This is because, before, we were summing up one by one. Now, we sum up results of sums. This shouldn't matter for ML, and will require rework when we do quantization (avoiding floats altogether), but meanwhile, it did require an update to the reference file used for testing. The patch also bumps the precision of the variables involved in this, to reduce the error (note they are casted back to float at the end by the SET macro, since we only work with float and not double in TF) Differential Revision: https://reviews.llvm.org/D118659	2022-01-31 15:19:15 -08:00
Mircea Trofin	d46305e22d	[NFC][regalloc] Move evict advisor initialization before VRAI This is because a subsequent patch will propose obtaining the VRAI from the advisor, which will enable feature caching for the ML advisor, for better compile time. Making this change first as it's both innocuous and keeps the future patch to be reviewed small.	2022-01-31 14:04:59 -08:00
Mircea Trofin	bc3b372161	[nfc][mlgo] De-const a parameter We plan to pass the MachineFunction& to APIs that expect it non-const (for legitimate reasons). The advisor still holds the ref as a const ref, though, so we keep most of the maintainability value of that.	2022-01-31 13:44:33 -08:00
Philip Reames	57cf29ac1b	[Statepoint] Remove another use of getActualReturnType [NFC] For the cross block gc.result projection case, we only care about the return type if there is a cross block gc.result, and if there is one, we can take the type from the gc.result. At the moment, this makes little difference, but for opaque pointers we need a means to get result typing without relying on pointee types.	2022-01-31 09:57:46 -08:00
Adrian Prantl	f85c6b79f3	Fix a fragment overflow problem when composing super-registers. Addresses https://github.com/llvm/llvm-project/issues/53342 Differential Revision: https://reviews.llvm.org/D118412	2022-01-31 09:47:29 -08:00
Philip Reames	6e4f7c0823	[Statepoints] Take result type from gc.result [NFC] When lowering a gc.result, we can assume that the result type of the gc.result matches the type of the underlying call. This is explicitly required in LangRef. At the moment, this makes little difference, but for opaque pointers we need a means to get result typing without relying on pointee types.	2022-01-31 09:42:34 -08:00
Philip Reames	093b43f48d	Sink getGCResultLocality to sole use [NFC]	2022-01-31 09:33:57 -08:00
Jeremy Morse	4a2cb01370	[DebugInfo][InstrRef][NFC] Refactor ahead of further optimisations This patch shuffles some functions around so that some blocks of code can be reused. In particular, * Move the determination of "which blocks are in scope" to its own function, as it's non-trivial to solve. Delete the "InScopeBlocks" collection too, which nothing reads from. * Split transfer emission (i.e., installing DBG_VALUEs into blocks) into its own function. * Name some useful types. * Rename "ScopeToBlocks" to "ScopeToAssignBlocks", as that's what the collection contains, blocks where assignments happen. Differential Revision: https://reviews.llvm.org/D118454	2022-01-31 16:45:53 +00:00
Jeremy Morse	e9739f116d	Revert "[DebugInfo][InstrRef][NFC] Add a missing assignment operator" This reverts commit `f18429372f`. Bitten by -Werror,-Wdeprecated-copy on a buildbot, alas!	2022-01-31 16:15:21 +00:00
Jeremy Morse	f18429372f	[DebugInfo][InstrRef][NFC] Add a missing assignment operator ValueIDNum is supposed to be a value type that boils down to a uint64_t, that has some bitfields for convenience. If we use the default operator=, we end up with each bit field being individually assigned, which is un-necessarily slow. Implement the assignment operator by just copying the uint64_t value of the object. This is quicker, and matches how the comparison operators work already. Doing so is 0.1% faster on the compile-time-tracker.	2022-01-31 16:08:38 +00:00
Kerry McLaughlin	002b944dfa	[SVE] Fix TypeSize->uint64_t implicit conversion in visitAlloca() Fixes a crash ('Invalid size request on a scalable vector') in visitAlloca() when we call this function for a scalable alloca instruction, caused by the implicit conversion of TySize to uint64_t. This patch changes TySize to a TypeSize as returned by getTypeAllocSize() and ensures the allocation size is multiplied by vscale for scalable vectors. Reviewed By: sdesmalen, david-arm Differential Revision: https://reviews.llvm.org/D118372	2022-01-31 14:37:23 +00:00
Dávid Bolvanský	ae990a3cbd	[Analysis] Attribute noundef should not prevent tail call optimization Very similar to https://reviews.llvm.org/D101230 Fixes https://github.com/llvm/llvm-project/issues/53501	2022-01-31 15:13:52 +01:00
Simon Pilgrim	7ec8fc2932	[X86] combineAnd() - per-element simplification - call SimplifyDemandedBits using mask demanded bits if SimplifyDemandedVectorElts fails We already call SimplifyDemandedVectorElts using whether each vector mask element is zero/nonzero, this just extends this to also try SimplifyDemandedBits using the demanded bits mask generated from the nonzero elements. This also requires an additional TargetLowering::SimplifyDemandedBits DemandedBits/DemandedElts wrapper.	2022-01-31 13:58:00 +00:00
Jeremy Morse	c703d77a61	[DebugInfo][InstrRef] Don't fully propagate single assigned variables If we only assign a variable value a single time, we can take a short-cut when computing its location: the variable value is only valid up to the dominance frontier of where the assignment happens. Past that point, there are other predecessors from where the variable has no value, meaning the variable has no location past that point. This patch recognises this scenario, and avoids expensive SSA computation, to improve compile-time performance. Differential Revision: https://reviews.llvm.org/D117877	2022-01-31 12:54:17 +00:00
Simon Pilgrim	2d1390efbe	[DAG] SimplifyDemandedBits - mul(x,x) - if only demand bit[1] then fold to zero	2022-01-31 12:00:51 +00:00
Simon Pilgrim	48f45f6b25	[X86] Limit mul(x,x) knownbits tests with not undef/poison check We can only assume bit[1] == zero if it's the only demanded bit or the source is not undef/poison	2022-01-31 11:55:10 +00:00
Fangrui Song	0e691aed7e	[mlgo][regalloc] Fix -Wunused-variable in -DLLVM_ENABLE_ASSERTIONS=off builds after `a8a7bf922c`	2022-01-30 15:18:30 -08:00
Mircea Trofin	a8a7bf922c	[mlgo][regalloc] Fix register masking If AllocationOrder has less than 32 elements, we were treating the extra positions as if they were valid. This was detected by a subsequent assert. The fix also tightens the asserts.	2022-01-30 14:59:08 -08:00
Markus Böck	e0b11c7659	[Support][NFC] Fix generic `ChildrenGetterTy` of `IDFCalculatorBase` Both IDFCalculatorBase and its accompanying DominatorTreeBase only support pointer nodes. The template argument is the block type itself and any use of GraphTraits is therefore done via a pointer to the node type. However, the ChildrenGetterTy type of IDFCalculatorBase has a use on just the node type instead of a pointer to the node type. Various parts of the monorepo have worked around this issue by providing specializations of GraphTraits for the node type directly, or have not been affected because they use specializations instead of the generic case. These workarounds are unnecessary, however, and the generic code should be fixed instead. An example from within the tree is a use of IDFCalculatorBase in InstrRefBasedImpl.cpp. It basically instantiates an IDFCalculatorBase<MachineBasicBlock, false> but due to the bug above then goes on to specialize GraphTraits<MachineBasicBlock> although GraphTraits<MachineBasicBlock*> exists (and should be used instead). Similar dead code exists in clang which defines redundant GraphTraits to work around this bug. This patch both fixes the original issue and removes the dead code that was used to work around the issue. Differential Revision: https://reviews.llvm.org/D118386	2022-01-30 22:09:07 +01:00
Kazu Hirata	2bea207d26	[CodeGen] Use default member initialization (NFC) Identified with modernize-use-default-member-init.	2022-01-30 12:32:51 -08:00
Mircea Trofin	bc5644ee74	[MLGO] Regalloc: allow multiple occurrences of -regalloc-enable-advisor This allows scenarios where some central config sets it one way and a user wants to override it.	2022-01-29 09:00:52 -08:00
Fangrui Song	33b38339a0	[lld] Add module name to LTO inline asm diagnostic Close #52781: for LTO, the inline asm diagnostic uses `<inline asm>` as the file name (lib/CodeGen/AsmPrinter/AsmPrinterInlineAsm.cpp) and it is unclear which module has the issue. With this patch, we will see the module name (say `asm.o`) before `<inline asm>` with ThinLTO. ``` % clang -flto=thin -c asm.c && myld.lld asm.o -e f ld.lld: error: asm.o <inline asm>:1:2: invalid instruction mnemonic 'invalid' invalid ^~~~~~~ ``` For regular LTO, unfortunately the original module name is lost and we only get ld-temp.o. Reviewed By: #lld-macho, ychen, Jez Ng Differential Revision: https://reviews.llvm.org/D118434	2022-01-28 11:32:42 -08:00
Cullen Rhodes	5d089d9a83	[DAGCombiner] Fix invalid size request in combineRepeatedFPDivisors If we have a vector FP division with a splatted divisor, use getVectorMinNumElements when scaling the num of uses by splat factor. For AArch64 the combine kicks in for the <vscale x 4 x float> case since it's above the fdiv threshold (3) when scaling num uses by splat factor, but the codegen is worse (splat + vector fdiv + vector fmul) than the <vscale x 2 x double> case (splat + vector fdiv). If the combine could be converted into a scalar FP division by scalarizeBinOpOfSplats it may be cheaper, but it looks like this is predicated on the isExtractVecEltCheap TLI function which is implemented for x86 but not AArch64. Perhaps for now combineRepeatedFPDivisors should only scale num uses by splat if the division can be converted into scalar op. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D118343	2022-01-28 17:01:08 +00:00
Jeremy Morse	76fd78b4b3	[MVerifier] Don't check liveness of any debug instruction operands Shiny new DBG_PHI instructions usually have physical registers as operands -- however, the machine verifier checks to see whether they're live, and occasionally this fails. There's a filter for DBG_VALUE instructions to not get verified in this way: expand it to exempt all debug instructions from liveness checking, which means DBG_PHIs get treated like DBG_VALUEs. This also future proofs against us adding new debug instructions. Differential Revision: https://reviews.llvm.org/D117891	2022-01-28 15:04:54 +00:00
Martin Storsjö	f7d2afbac9	[CodeGen] Emit COFF symbol type for function aliases On the level of the generated object files, both symbols (both original and alias) are generally indistinguishable - both are regular defined symbols. But previously, only the original function had the COFF ComplexType set to IMAGE_SYM_DTYPE_FUNCTION, while the symbol created via an alias had the type set to IMAGE_SYM_DTYPE_NULL. This matches what GCC does, which emits directives for setting the COFF symbol type for this kind of alias symbol too. This makes a difference when GNU ld.bfd exports symbols without dllexport directives or a def file - it seems to decide between function or data exports based on the COFF symbol type. This means that functions created via aliases, like some C++ constructors, are exported as data symbols (missing the thunk for calling without dllimport). This hasn't been an issue when doing the same with LLD, as LLD decides between function or data export based on the flags of the section that the symbol points at. This should fix the root cause of https://github.com/msys2/MINGW-packages/issues/10547. Differential Revision: https://reviews.llvm.org/D118328	2022-01-28 13:06:16 +02:00
Ellis Hoag	11d3074267	[InstrProf] Add single byte coverage mode Use the llvm flag `-pgo-function-entry-coverage` to create single byte "counters" to track functions coverage. This mode has significantly less size overhead in both code and data because * We mark a function as "covered" with a store instead of an increment which generally requires fewer assembly instructions * We use a single byte per function rather than 8 bytes per block The trade off of course is that this mode only tells you if a function has been covered. This is useful, for example, to detect dead code. When combined with debug info correlation [0] we are able to create an instrumented Clang binary that is only 150M (the vanilla Clang binary is 143M). That is an overhead of 7M (4.9%) compared to the default instrumentation (without value profiling) which has an overhead of 31M (21.7%). [0] https://groups.google.com/g/llvm-dev/c/r03Z6JoN7d4 Reviewed By: kyulee Differential Revision: https://reviews.llvm.org/D116180	2022-01-27 17:38:55 -08:00
Simon Pilgrim	fdd3e2c943	[DAG] SelectionDAG::getNode(N1,N2) - detect N2 constant vector splats as well as scalars We already perform some basic folds (add/sub with zero etc.) on scalar types, this patch adds some basic support for constant splats as well in a few cases (we can add more with future test coverage). In the cases I've enabled, we can handle buildvector implicit truncation as we're not creating new constant nodes from the vector types - we're just returning existing nodes. This allows us to get a number of extra cases in the aarch64 tests. I haven't enabled support for undefs in buildvector splats, as we're often checking for zero/allones patterns that return the original constant and we shouldn't be returning undef elements in some of these cases - we can enable this later if we're OK with creating new constants. Differential Revision: https://reviews.llvm.org/D118264	2022-01-27 10:59:08 +00:00
Fraser Cormack	84e85e025e	[SelectionDAG][VP] Provide expansion for VP_MERGE This patch adds support for expanding VP_MERGE through a sequence of vector operations producing a full-length mask setting up the elements past EVL/pivot to be false, combining this with the original mask, and culminating in a full-length vector select. This expansion should work for any data type, though the only use for RVV is for boolean vectors, which themselves rely on an expansion for the VSELECT. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D118058	2022-01-27 09:00:41 +00:00
Adrian Prantl	ee72b17386	Fix UB in DwarfExpression::emitLegacyZExt() A shift-left > 63 triggers a UBSAN failure. This patch kicks the can down the road (to the consumer) by emitting a more compact representation of the shift computation in DWARF expressions. Relanding (I accidentally pushed an earlier version of the patch previously). Differential Revision: https://reviews.llvm.org/D118183	2022-01-26 13:08:35 -08:00
Adrian Prantl	f400a6012c	Revert "Fix UB in DwarfExpression::emitLegacyZExt()" This reverts commit `216002c4bb` while investigating bot breakage.	2022-01-26 12:46:07 -08:00
Matt Arsenault	2d670de84c	GlobalISel: Avoid crash on asm with lying result types The physical register in the asm has the wrong type for the declared IR. It seems to work in the DAG by extracting the 4 elements that are defined in the IR from the register, but that isn't handled here. This doesn't seem to be a well tested path since other mismatched cases are crashing the DAG asm handling.	2022-01-26 15:23:59 -05:00
Adrian Prantl	216002c4bb	Fix UB in DwarfExpression::emitLegacyZExt() A shift-left > 63 triggers a UBSAN failure. This patch kicks the can down the road (to the consumer) by emitting a more compact representation of the shift computation in DWARF expressions. Differential Revision: https://reviews.llvm.org/D118183	2022-01-26 10:57:11 -08:00
Chih-Ping Chen	28bfa57a73	[DebugInfo] Add stringLocationExp field to DIStringType DIStringType is used to encode the debug info of a character object in Fortran. A Fortran deferred-length character object is typically implemented as a pair of the following two pieces of info: An address of the raw storage of the characters, and the length of the object. The stringLocationExp field contains the DIExpression to get to the raw storage. This patch also enables the emission of DW_AT_data_location attribute in a DW_TAG_string_type debug info entry based on stringLocationExp in DIStringType. A test is also added to ensure that the bitcode reader is backward compatible with the old DIStringType format. Differential Revision: https://reviews.llvm.org/D117586	2022-01-26 11:56:57 -05:00
Benjamin Kramer	f15014ff54	Revert "Rename llvm::array_lengthof into llvm::size to match std::size from C++17" This reverts commit `ef82063207`. - It conflicts with the existing llvm::size in STLExtras, which will now never be called. - Calling it without llvm:: breaks C++17 compat	2022-01-26 16:55:53 +01:00
Sanjay Patel	63daea8b35	[SDAG] fix bug in ComputeNumSignBits of target constant The loop below the changed line assumes that the element width of the target constant is the same as the element width of the loaded value, but that is not always true. We could try harder to do some kind of min/max calc even if the sizes don't match, but that can be another patch if needed. This fixes #53401 (miscompile) and does not change the motivating cases added when this analysis was introduced: `ad298f86b7`	2022-01-26 10:22:41 -05:00
serge-sans-paille	ef82063207	Rename llvm::array_lengthof into llvm::size to match std::size from C++17 As a consequence move llvm::array_lengthof from STLExtras.h to STLForwardCompat.h (which is included by STLExtras.h so no build breakage expected).	2022-01-26 16:17:45 +01:00
alex-t	5157f984ae	[AMDGPU] Enable divergence-driven XNOR selection Currently the not (xor_one_use) pattern is always selected to S_XNOR irrespective of the node divergence. This relies on a further custom selection pass which converts to VALU if necessary and replaces with V_NOT_B32 ( V_XOR_B32) on those targets which have no V_XNOR. This change enables the patterns which explicitly select the not (xor_one_use) to the appropriate form. We assume that xor (not) is already turned into the not (xor) by the combiner. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D116270	2022-01-26 15:33:10 +03:00
Sebastian Neubauer	4723f3cf03	[AMDGPU][GlobalISel] Combine unmerge of undef Fold (unmerge undef) -> undef, undef, ... Differential Revision: https://reviews.llvm.org/D118138	2022-01-26 12:30:36 +01:00
David Green	57356d6bb7	[DAG] Create fptoui.sat from clamped fptoui This is the unsigned variant of D111976, where we convert a clamped fptoui to a fptoui.sat. Because we are unsigned, the condition this time is only UMIN of UINT_MAX. Similarly to D111976 it handles ISD::UMIN, ISD::SETCC/ISD::SELECT, ISD::VSELECT or ISD::SELECT_CC nodes. This especially helps on ARM/AArch64 where the vcvt instructions naturally saturate the result. Differential Revision: https://reviews.llvm.org/D114964	2022-01-26 08:37:44 +00:00
wangpc	8597458278	[regalloc] Fix assertion error when LiveInterval is empty When evicting interference, it causes an assertion error since LiveIntervals::intervalIsInOneMBB assumes that input is not empty. This patch fixes the bug mentioned in D118020. Reviewed By: MatzeB Differential Revision: https://reviews.llvm.org/D118124	2022-01-26 14:06:57 +08:00
Adrian Prantl	3efa016d4c	Revert accidentally pushed commit. It was not yet reviewed. "Fix UB in DwarfExpression::emitLegacyZExt()" This reverts commit `e37de5d36e`.	2022-01-25 13:53:14 -08:00
Adrian Prantl	e37de5d36e	Fix UB in DwarfExpression::emitLegacyZExt() A shift-left > 63 triggers a UBSAN failure. This patch kicks the can down the road (to the consumer) by emitting a more compact representation of the shift computation in DWARF expressions. Differential Revision: https://reviews.llvm.org/D118183	2022-01-25 13:49:14 -08:00
Sean Fertile	a2505bd063	[PowerPC][AIX] Override markFunctionEnd() During fast-isel calling 'markFunctionEnd' in the base class will call tidyLandingPads. This can cause an issue where we have determined that we need ehinfo and emitted a traceback table with the bits set to indicate that we will be emitting the ehinfo, but the tidying deletes all landing pads. In this case we end up emitting a reference to __ehinfo.N symbol, but not emitting a definition to said symbol and the resulting file fails to assemble. Differential Revision: https://reviews.llvm.org/D117040	2022-01-25 10:08:53 -05:00
Nikita Popov	a3a2239aaa	[GlobalISel] Avoid pointer element type access during InlineAsm lowering Same change as has been made for the SDAG lowering.	2022-01-25 14:26:47 +01:00
Simon Pilgrim	15e2be291f	[DAG] visitMULHS/MULHU/AND - remove some redundant LHS constant checks Now that we constant fold and canonicalize constants to the RHS, we don't need to check both LHS and RHS for specific constants	2022-01-25 11:54:23 +00:00
Bjorn Pettersson	109cc5adcc	[DAGCombine] Fold SRA of a load into a narrower sign-extending load An sra is basically sign-extending a narrower value. Fold away the shift by doing a sextload of a narrower value, when it is legal to reduce the load width accordingly. Differential Revision: https://reviews.llvm.org/D116930	2022-01-25 12:14:48 +01:00
Fraser Cormack	7cb452bfde	[SelectionDAG][VP] Add widening support for VP_MERGE This patch adds widening support for ISD::VP_MERGE, which widens identically to VP_SELECT and similarly to other select-like nodes. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D118030	2022-01-25 10:59:40 +00:00
Fraser Cormack	5f5c5603ce	[SelectionDAG][VP] Add splitting support for VP_MERGE This patch adds splitting support for ISD::VP_MERGE, which splits identically to VP_SELECT and similarly to other select-like nodes. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D118032	2022-01-25 10:33:23 +00:00
Victor Perez	2233befa5d	[LegalizeTypes][VP] Add splitting support for vp.gather and vp.scatter Split these nodes in a similar way as their masked versions. Reviewed By: frasercrmck, craig.topper Differential Revision: https://reviews.llvm.org/D117760	2022-01-25 10:08:07 +00:00
Nikita Popov	aa97bc116d	[NFC] Remove uses of PointerType::getElementType() Instead use either Type::getPointerElementType() or Type::getNonOpaquePointerElementType(). This is part of D117885, in preparation for deprecating the API.	2022-01-25 09:44:52 +01:00
Nikita Popov	9554aaa275	[Dwarf] Optimize getOrCreateSourceID() for repeated calls on same file (NFCI) DwarfCompileUnit::getOrCreateSourceID() is often called many times in sequence with the same DIFile. This is currently very expensive, because it involves creating a string from directory and file name and looking it up in a string map. This patch remembers the last DIFile and its ID and directly returns that. This gives a geomean -1.3% compile-time improvement on CTMark O0-g. Differential Revision: https://reviews.llvm.org/D118041	2022-01-25 09:27:11 +01:00
Ahmed Bougacha	e7298464c5	[ObjCARC] Use "UnsafeClaimRV" to refer to unsafeClaim in enums. NFC. This matches the actual runtime function more closely. I considered also renaming both RetainRV/UnsafeClaimRV to end with "ARV", for AutoreleasedReturnValue, but there's less potential for confusion there.	2022-01-24 19:37:01 -08:00
Paweł Bylica	9d32847b33	[DAGCombine] Remove unused param in combineCarryDiamond(). NFC	2022-01-24 20:57:00 +01:00
Mircea Trofin	b1af01fe6a	[NFC][MLGO] Simplify conditional compilation Most of the code that's shared between 'release' and 'development' modes doesn't depend on anything special.	2022-01-24 11:19:04 -08:00
Jeremy Morse	d27f022614	[NFC][DebugInfo] Strip out an undesired #if 0 block As mentioned in discussion of D116821, it's better to just delete this block than keep it hanging around.	2022-01-24 18:04:47 +00:00
Jeremy Morse	74db5c8c95	Revert rG6a605b97a200 due to excessive memory use Over in the comments for D116821, some use-cases have cropped up where there's a substantial increase in memory usage. A quick inspection shows that a) it's a lot of memory and b) there are several things to be done to reduce it. Reverting (via disabling this feature by default) to avoid bothering people in the meantime.	2022-01-24 17:08:21 +00:00
Sander de Smalen	699e22a083	[ISEL] Move trivial step_vector folds to FoldConstantArithmetic. Given that step_vector is practically a constant, doing this early helps with DAGCombine folds that happen before type legalization. There is currently no way to test that this happens earlier, although existing tests for step_vector folds continue to protect the folds happening at all. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D117863	2022-01-24 16:37:21 +00:00
Craig Topper	a43ed49f5b	[DAGCombiner][RISCV] Canonicalize (bswap(bitreverse(x))->bitreverse(bswap(x)). If the bitreverse gets expanded, it will introduce a new bswap. By putting a bswap before the bitreverse, we can ensure it gets cancelled out when this happens. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D118012	2022-01-24 08:31:53 -08:00
Craig Topper	b8c7cdcc81	[SelectionDAG][RISCV] Teach getNode to fold bswap(bswap(x))->x. This can show up when bitreverse is expanded to bswap and swap of bits within a byte. If the input is already a bswap, we should cancel them out before we further transform them in a way that makes it harder to see the redundancy. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D118007	2022-01-24 08:17:46 -08:00
Matt Arsenault	99e8e17313	Reapply "Revert "GlobalISel: Add G_ASSERT_ALIGN hint instruction" This reverts commit `a97e20a3a8`.	2022-01-24 09:26:52 -05:00
serge-sans-paille	5f290c090a	Move STLFunctionalExtras out of STLExtras Only using that change in StringRef already decreases the number of preprocessed lines from 7837621 to 7776151 for LLVMSupport. Perhaps more interestingly, it shows that many files were relying on the inclusion of StringRef.h to have the declaration from STLExtras.h. This patch tries hard to patch the relevant parts of llvm-project impacted by this hidden dependency removal. Potential impact: - "llvm/ADT/StringRef.h" no longer includes <memory>, "llvm/ADT/Optional.h" nor "llvm/ADT/STLExtras.h" Related Discourse thread: https://llvm.discourse.group/t/include-what-you-use-include-cleanup/5831	2022-01-24 14:13:21 +01:00
Bjorn Pettersson	46cacdbb21	[DAGCombiner] Adjust some checks in DAGCombiner::reduceLoadWidth In code review for D117104 two slightly weird checks were found in DAGCombiner::reduceLoadWidth. They were typically checking if BitsA was a multiple of BitsB by looking at (BitsA & (BitsB - 1)), but such a comparison actually only makes sense if BitsB is a power of two. The checks were related to the code that attempted to shrink a load based on the fact that the loaded value would be right shifted. Afaict the legality of the value types is checked later (typically in isLegalNarrowLdSt), so the existing checks were both overly conservative as well as being wrong whenever ExtVTBits wasn't a power of two. The latter was a situation triggered by a number of lit tests (so we could not just assert on ExtVTBits being a power of two). When attempting to simply remove the checks I found some problems that seem to have been guarded by the checks (maybe just out of luck). A typical example would be a pattern like this: t1 = load i96* ptr t2 = srl t1, 64 t3 = truncate t2 to i64 When DAGCombine is visiting the truncate, reduceLoadWidth is called attempting to narrow the load to 64 bits (ExtVT := MVT::i64). Then the SRL is detected and we set ShAmt to 64. In the past we've bailed out due to i96 not being a multiple of 64. If we simply remove that check then we would end up replacing the load with a new load that would read 64 bits but with a base pointer adjusted by 64 bits. So we would read 32 bits that weren't accessed by the original load. This patch will instead utilize the fact that the logical right shift can be folded away by using a zextload. Thus, the pattern above will now be combined into t3 = load i32* ptr+offset, zext to i64 Another case is shown in the X86/shift-folding.ll test case: t1 = load i32* ptr t2 = srl i32 t1, 8 t3 = truncate t2 to i16 In the past we bailed out due to the shift count (8) not being a multiple of 16. Now the narrowing kicks in and we get t3 = load i16* ptr+offset Differential Revision: https://reviews.llvm.org/D117406	2022-01-24 12:22:04 +01:00
Nikita Popov	0d1308a7b7	[AArch64][GlobalISel] Support returned argument with multiple registers The call lowering code assumed that a returned argument could only consist of one register. Pass an ArrayRef<Register> instead of Register to make sure that all parts get assigned. Fixes https://github.com/llvm/llvm-project/issues/53315. Differential Revision: https://reviews.llvm.org/D117866	2022-01-24 10:55:28 +01:00
Nikita Popov	e7c9a6cae0	[SDAG] Don't move DBG_VALUE instructions after insertion point during scheduling (PR53243) EmitSchedule() shouldn't be touching instructions after the provided insertion point. The change introduced in D83561 performs a scan to the end of the block, and thus may move unrelated instructions. In particular, this ends up moving instructions that have been produced by FastISel and will later be deleted. Moving them means that more instructions than intended are removed. Fix this by stopping the iteration when the insertion point is reached. Fixes https://github.com/llvm/llvm-project/issues/53243. Differential Revision: https://reviews.llvm.org/D117489	2022-01-24 10:50:49 +01:00
Sander de Smalen	4f8fdf7827	[ISEL] Canonicalise constant splats to RHS. SelectionDAG::getNode() canonicalises constants to the RHS if the operation is commutative, but it doesn't do so for constant splat vectors. Doing this early helps make certain folds on vector types, simplifying the code required for target DAGCombines that are enabled before Type legalization. Somewhat to my surprise, DAGCombine doesn't seem to traverse the DAG in a post-order DFS, so at the time of doing some custom fold where the input is a MUL, DAGCombiner::visitMUL hasn't yet reordered the constant splat to the RHS. This patch leads to a few improvements, but also a few minor regressions, which I traced down to D46492. When I tried reverting this change to see if the changes were still necessary, I ran into some segfaults. Not sure if there is some latent bug there. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D117794	2022-01-24 09:38:36 +00:00
Abinav Puthan Purayil	68b70d17d8	[GlobalISel] Fold or of shifts with constant amount to funnel shift. This change folds (or (shl x, C0), (lshr y, C1)) to funnel shift iff C0 and C1 are constants where C0 + C1 is the bit-width of the shift instructions. Differential Revision: https://reviews.llvm.org/D116529	2022-01-24 10:43:32 +05:30
David Blaikie	2e58a18910	DebugInfo: Include template parameters for simplified template decls in type units LLVM DebugInfo CodeGen synthesizes type declarations in type units when referencing types that are not in type units. When those synthesized types are templates and simplified template names (or mangled simplified template names) are in use, the template arguments must be attached to those declarations. A deeper fix (with a CU or DICompositeType flag) that would also support other uses of clang's -debug-forward-template-args (such as Sony's platform) could/should be implemented to fix this more broadly.	2022-01-23 16:10:14 -08:00
David Blaikie	ab4756338c	DebugInfo: Don't put types in type units if they reference internal linkage types Doing this causes a declaration of the internal linkage (anonymous namespace) type to be emitted in the type unit, which would then be ambiguous as to which internal linkage definition it refers to (since the name is only valid internally). It's possible these internal linkage types could be resolved relative to the unit the TU is referred to from - but that doesn't seem ideal, and there's no reason to put the type in a type unit since it can only be defined in one CU anyway (since otherwise it'd be an ODR violation) & so avoiding the type unit should be a smaller DWARF encoding anyway. This also addresses an issue with Simplified Template Names where the template parameter could not be rebuilt from the declaration emitted into the TU (specifically for an enum non-type template parameter, where looking up the enumerators is necessary to rebuild the full template name)	2022-01-23 14:07:31 -08:00
Simon Pilgrim	accc07e654	[DAG] Fold (X & Y) != 0 --> zextOrTrunc(X & Y) iff everything but LSB is known zero (PR51312) Fixes parity codegen issue where we know all but the lowest bit is zero, we can replace the ICMPNE with 0 comparison with an ext/trunc Differential Revision: https://reviews.llvm.org/D117983	2022-01-23 16:36:25 +00:00
Simon Pilgrim	6605057992	Revert rG7c66aaddb128dc0f342830c1efaeb7a278bfc48c "[DAG] Fold (X & Y) != 0 --> zextOrTrunc(X & Y) iff everything but LSB is known zero (PR51312)" Noticed a typo in the getBooleanContents call just after I pressed commit :(	2022-01-23 16:28:44 +00:00
Simon Pilgrim	7c66aaddb1	[DAG] Fold (X & Y) != 0 --> zextOrTrunc(X & Y) iff everything but LSB is known zero (PR51312) Fixes parity codegen issue where we know all but the lowest bit is zero, we can replace the ICMPNE with 0 comparison with an ext/trunc Differential Revision: https://reviews.llvm.org/D117983	2022-01-23 16:20:42 +00:00
Simon Pilgrim	20d46fbd4a	[CodeGenPrepare] Use dyn_cast result to check for null pointers Simplifies logic and helps the static analyzer correctly check for nullptr dereferences	2022-01-23 12:47:52 +00:00
David Green	b27e5459d5	[DAG] Convert truncstore(extend(x)) back to store(x) Pulled out of D106237, this folds truncstore(extend(x)) back to store(x) if the original store was legal. This can come up due to the order we fold nodes. A fold from X86 needs to be adjusted to prevent infinite loops, to have it pick the operand of a trunc more directly. Differential Revision: https://reviews.llvm.org/D117901	2022-01-22 13:20:36 +00:00
OCHyams	b6a41fddcf	[DWARF][DebugInfo] Fix off-by-one error in size of DW_TAG_base_type types Fix PR53163 by rounding the byte size of DW_TAG_base_type types up. Without this fix we risk emitting types with a truncated size (including rounding less-than-byte-sized types' sizes down to zero). Reviewed By: probinson Differential Revision: https://reviews.llvm.org/D117124	2022-01-21 11:37:49 +00:00
Craig Topper	9abc593e98	[TargetLowering][InstCombine] Simplify BSwap demanded bits code a little. NFC Use alignDown instead of &= ~7. Replace ResultBit with NLZ. (BitWidth - NLZ - NTZ == 8) so (BitWidth - NTZ - 8 == NLZ). Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D117804	2022-01-20 10:45:17 -08:00
Alexandre Ganea	5af2433e17	[clang-cl] Support the /HOTPATCH flag This patch adds support for the MSVC /HOTPATCH flag: https://docs.microsoft.com/sv-se/cpp/build/reference/hotpatch-create-hotpatchable-image?view=msvc-170&viewFallbackFrom=vs-2019 The flag is translated to a new -fms-hotpatch flag, which in turn adds a 'patchable-function' attribute for each function in the TU. This is then picked up by the PatchableFunction pass which would generate a TargetOpcode::PATCHABLE_OP of minsize = 2 (which means the target instruction must resolve to at least two bytes). TargetOpcode::PATCHABLE_OP is only implemented for x86/x64. When targeting ARM/ARM64, /HOTPATCH isn't required (instructions are always 2/4 bytes and suitable for hotpatching). Additionally, when using /Z7, we generate a 'hot patchable' flag in the CodeView debug stream, in the S_COMPILE3 record. This flag is then picked up by LLD (or link.exe) and is used in conjunction with the linker /FUNCTIONPADMIN flag to generate extra space before each function, to accommodate for live patching long jumps. Please see: `d703b92296/lld/COFF/Writer.cpp (L1298)` The outcome is that we can finally use Live++ or Recode along with clang-cl. NOTE: It seems that MSVC cl.exe always enables /HOTPATCH on x64 by default, although if we did the same I thought we might generate sub-optimal code (if this flag was active by default). Additionally, MSVC always generates a .debug$S section and a S_COMPILE3 record, which Clang doesn't do without /Z7. Therefore, the following MSVC command-line "cl /c file.cpp" would have to be written with Clang such as "clang-cl /c file.cpp /HOTPATCH /Z7" in order to obtain the same result. Depends on D43002, D80833 and D81301 for the full feature. Differential Revision: https://reviews.llvm.org/D116511	2022-01-20 12:57:19 -05:00
Lucas Prates	283f5a198a	[GlobalISel] Fix incorrect sign extension when combining G_INTTOPTR and G_PTR_ADD The GlobalISel combiner currently uses sign extension when manipulating the LHS constant when combining the following sequence of machine instructions into a single constant: ``` %0:_(s32) = G_CONSTANT i32 <CONSTANT> %1:_(p0) = G_INTTOPTR %0:_(s32) %2:_(s64) = G_CONSTANT i64 <CONSTANT> %3:_(p0) = G_PTR_ADD %1:_, %2:_(s64) ``` This causes an issue when the bit width of the first constant and the target pointer size are different, as G_INTTOPTR has no sign extension semantics. This patch fixes this by capturing an arbitrary-precision integer when matching the constant, allowing the matching function to correctly zero extend it. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D116941	2022-01-20 17:02:52 +00:00
Mircea Trofin	f29256a64a	[MLGO] Improved support for AOT cross-targeting scenarios The tensorflow AOT compiler can cross-target, but it can't run on (for example) arm64. We added earlier support where the AOT-ed header and object would be built on a separate builder and then passed at build time to a build host where the AOT compiler can't run, but clang can be otherwise built. To simplify such scenarios given we now support more than one AOT-able case (regalloc and inliner), we make the AOT scenario centered on whether files are generated, case by case (this includes the "passed from a different builder" scenario). This means we shouldn't need an 'umbrella' LLVM_HAVE_TF_AOT, in favor of case by case control. A builder can opt out of an AOT case by passing that case's model path as `none`. Note that the overrides still take precedence. This patch controls conditional compilation with case-specific flags, which can be enabled locally, for the component where those are available. We still keep an overall flag for some tests. The 'development/training' mode is unchanged, because there the model is passed from the command line and interpreted. Differential Revision: https://reviews.llvm.org/D117752	2022-01-20 07:05:39 -08:00
Nikita Popov	81d35f27dd	[DebugInstrRef] Memoize variable order during sorting (NFC) Instead of constructing DebugVariables and looking up the order in the comparison function, compute the order upfront and then sort a vector of (order, instr). This improves compile-time by -0.4% geomean on CTMark ReleaseLTO-g. Differential Revision: https://reviews.llvm.org/D117575	2022-01-20 16:04:24 +01:00
Victor Perez	c10c748878	[LegalizeTypes][VP] Add widening support for vp.gather and vp.scatter Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D117557	2022-01-20 08:57:57 +00:00
Alexandre Ganea	aba5b91b69	Re-land [CodeView] Add full repro to LF_BUILDINFO record This patch writes the full -cc1 command into the resulting .OBJ, like MSVC does. This allows for external tools (Recode, Live++) to rebuild a source file without any external dependency but the .OBJ itself (other than the compiler) and without knowledge of the build system. The LF_BUILDINFO record stores a full path to the compiler, the PWD (CWD at program startup), a relative or absolute path to the source, and the full CC1 command line. The stored command line is self-standing (does not depend on the environment). In the same way, MSVC doesn't exactly store the provided command-line, but an expanded version (somewhat equivalent to CC1) which is also self-standing. For more information see PR36198 and D43002. Differential Revision: https://reviews.llvm.org/D80833	2022-01-19 19:44:37 -05:00
Johannes Doerfert	dd75a6b2ae	[DWARF][FIX] Try not to crash for nvptx with missing debug information This prevents crashes in the OpenMP offload pipeline as not everything is properly annotated with debug information, e.g., the runtimes we link in. While we might want to have them annotated, it seems to be generally useful to gracefully handle missing debug info rather than crashing. TODO: A test is missing and can hopefully be distilled prior to landing. This fixes #51079. Differential Revision: https://reviews.llvm.org/D116959	2022-01-19 18:40:13 -06:00
Mircea Trofin	073e09683d	Fix build break introduced by D117147	2022-01-19 11:43:51 -08:00
Mircea Trofin	e67430cca4	[MLGO] ML Regalloc Eviction Advisor The bulk of the implementation is common between 'release' mode (==AOT-ed model) and 'development' mode (for training), the main difference is that in development mode, we may also log features (for training logs), inject scoring information (currently after the Virtual Register Rewriter) and then produce the log file. This patch also introduces the score injection pass, 'Register Allocation Pass Scoring', which is trivially just logging the score in development mode. Differential Revision: https://reviews.llvm.org/D117147	2022-01-19 11:00:32 -08:00
Simon Pilgrim	d6fee6c3b0	[DAG] SelectionDAG::computeKnownBits - add mul(x,x) self-multiply handling (PR48683) Pass the SelfMultiply flag to KnownBits::mul() - added at D108992 https://alive2.llvm.org/ce/z/NN_eaR	2022-01-19 17:39:32 +00:00
Daniel Thornburgh	2e2999cd44	[NFC] Test commit to verify commit access.	2022-01-18 18:03:26 -08:00
Matt Arsenault	5599c43124	GlobalISel: Swap order of operand checks in ConstantFoldVectorBinop Since constants are canonicalized to the RHS, this is more likely to exit early.	2022-01-18 17:21:02 -05:00
Matt Arsenault	da72822763	GlobalISel: Fix CSEMIRBuilder mishandling constant folds of vectors This was ignoring the requested result register, resulting in a missing def when this happened in the IRTranslator. Fixes some crashes and verifier errors at -O0. Alternatively we could pass DstOps to the constant fold functions.	2022-01-18 17:21:02 -05:00
David Green	100763a88f	[DAG] Extend SearchForAndLoads with any_extend handling This extends the code in SearchForAndLoads to be able to look through ANY_EXTEND nodes, which can be created from mismatching IR types where the AND node we begin from only demands the low parts of the register. That turns zext and sext into any_extends as only the low bits are demanded. To be able to look through ANY_EXTEND nodes we need to handle mismatching types in a few places, potentially truncating the mask to the size of the final load. Recommitted with a more conservative check for the type of the extend. Differential Revision: https://reviews.llvm.org/D117457	2022-01-18 21:03:08 +00:00
Matt Arsenault	984451eafc	PostRAPseudos: Don't preserve kills on some implicit copy operands This fixes a verifier error I ran into at -O0. A subregister copy had an implicit kill of an overlapping superregister, which was partially redefined by the copy. The preserved implicit operand killed subregisters made live earlier in the sequence. AMDGPU already uses similar logic for whether to preserve the kill of the superregister on the final instruction if there's overlap.	2022-01-18 13:52:04 -05:00
Fraser Cormack	c8e33978fb	[VP] Propagate align parameter attr on VP gather/scatter to ISel This patch fixes a case where the 'align' parameter attribute on the pointer operands to llvm.vp.gather and llvm.vp.scatter was being dropped during the conversion to the SelectionDAG. The default alignment equal to the ABI type alignment of the vector type was kept. It also updates the documentation to reflect the fact that the parameter attribute is now properly supported. The default alignment of these intrinsics was previously documented as being equal to the ABI alignment of the scalar type, when in fact that wasn't the case: the ABI alignment of the vector type was used instead. This has also been fixed in this patch. Reviewed By: simoll, craig.topper Differential Revision: https://reviews.llvm.org/D114423	2022-01-18 17:33:24 +00:00
Sanjay Patel	870591200d	[SDAG] remove duplicate functionality when getting shift type for demanded bits; NFCI This was noted as a potential cleanup in D117508. getShiftAmountTy() has checks for vector, phase, etc. so it should handle anything that the caller was trying to account for.	2022-01-18 12:13:45 -05:00
Nikita Popov	0d51b6ab15	[DebugInstrRef] Add some missing const qualifiers (NFC)	2022-01-18 17:19:23 +01:00
Nikita Popov	cbaae61422	[DebugInstrRef] Use DenseMap for ValueToLoc (NFC) Just replacing std::map with DenseMap here is a major regression -- because this code used an identity hash for ValueIDNum. Because ValueIDNum is composed of multiple components, it is important that we use a reasonably good hash function here, so switch it to hash_value. DenseMapInfo::getHashValue<uint64_t> would not be sufficient. This gives a -0.8% geomean improvement on CTMark ReleaseLTO-g.	2022-01-18 17:02:14 +01:00
Vang Thao	10ed1eca24	[MachineSink] Allow sinking of constant or ignorable physreg uses For AMDGPU, any use of the physical register EXEC prevents sinking even if it is not a real physical register read. Add check to see if a physical register use can be ignored for sinking. Also perform same constant and ignorable physical register check when considering sinking in loops. https://reviews.llvm.org/D116053	2022-01-18 14:17:40 +00:00
Victor Perez	b7bf96a258	[LegalizeTypes][VP] Add widening support for vp.reduce.* When widening these intrinsics, we do not have to insert neutral elements at the end of the vector as when widening vector.reduce.* intrinsics, thanks to vector predication semantics. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D117467	2022-01-18 10:21:01 +00:00
Hans Wennborg	f4615feaa1	Revert "[DAG] Extend SearchForAndLoads with any_extend handling" This caused builds to fail with llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:5638: bool (anonymous namespace)::DAGCombiner::BackwardsPropagateMask(llvm::SDNode *): Assertion `NewLoad && "Shouldn't be masking the load if it can't be narrowed"' failed. See the code review for a link to a reproducer. > This extends the code in SearchForAndLoads to be able to look through > ANY_EXTEND nodes, which can be created from mismatching IR types where > the AND node we begin from only demands the low parts of the register. > That turns zext and sext into any_extends as only the low bits are > demanded. To be able to look through ANY_EXTEND nodes we need to handle > mismatching types in a few places, potentially truncating the mask to > the size of the final load. > > Differential Revision: https://reviews.llvm.org/D117457 This reverts commit `578008789f`.	2022-01-18 10:50:55 +01:00
Victor Perez	fd1dce35bd	[LegalizeTypes][VP] Add splitting support for vp.reduction.* Split vp.reduction.* intrinsics by splitting the vector to reduce in two halves, perform the reduction operation in each one of them and accumulate the results of both operations. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D117469	2022-01-18 09:29:24 +00:00
Bjorn Pettersson	65fbe38f0a	[DwarfDebug] Restore code that make comments stay aligned in DwarfDebug::emitDebugLocEntry Commit `2bddab25db` removed a piece of code from DwarfDebug::emitDebugLocEntry that according to code comments "Make sure comments stay aligned". This patch restores that piece of code, together with the addition of some extra checks in an existing lit test to work as a regression test. Without this patch we incorrectly get .byte 159 # 0 instead of .byte 159 # DW_OP_stack_value Differential Revision: https://reviews.llvm.org/D117441	2022-01-18 09:46:03 +01:00
David Sherwood	f4515ab858	Revert "[CodeGen][AArch64] Ensure isSExtCheaperThanZExt returns true for negative constants" This reverts commit `197f3c0deb`. Reverting after miscompilation errors discovered with ffmpeg.	2022-01-18 08:40:20 +00:00
Sanjay Patel	ba6485e25f	[SDAG] add demanded bits transform for bswap A possible codegen regression for PowerPC is noted in D117406 because we don't recognize a pattern that demands only 1 byte from a bswap. This fold has existed in IR since close to the beginning of LLVM: https://github.com/llvm/llvm-project/blame/main/llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp#L794 ...so this patch copies that code as much as possible and adapts it for SDAG. The test for PowerPC that would change in D117406 is over-reduced with undefs, so I recreated it for AArch64 and x86 by passing in pointer args and renamed the values to make the logic clearer. Differential Revision: https://reviews.llvm.org/D117508	2022-01-17 18:25:42 -05:00
David Green	578008789f	[DAG] Extend SearchForAndLoads with any_extend handling This extends the code in SearchForAndLoads to be able to look through ANY_EXTEND nodes, which can be created from mismatching IR types where the AND node we begin from only demands the low parts of the register. That turns zext and sext into any_extends as only the low bits are demanded. To be able to look through ANY_EXTEND nodes we need to handle mismatching types in a few places, potentially truncating the mask to the size of the final load. Differential Revision: https://reviews.llvm.org/D117457	2022-01-17 15:25:11 +00:00
David Sherwood	197f3c0deb	[CodeGen][AArch64] Ensure isSExtCheaperThanZExt returns true for negative constants When we know the value we're extending is a negative constant then it makes sense to use SIGN_EXTEND because this may improve code quality in some cases, particularly when doing a constant splat of an unpacked vector type. For example, for SVE when splatting the value -1 into all elements of a vector of type <vscale x 2 x i32> the element type will get promoted from i32 -> i64. In this case we want the splat value to sign-extend from (i32 -1) -> (i64 -1), whereas currently it zero-extends from (i32 -1) -> (i64 0xFFFFFFFF). Sign-extending the constant means we can use a single mov immediate instruction. New tests added here: CodeGen/AArch64/sve-vector-splat.ll I believe we see some code quality improvements in these existing tests too: CodeGen/AArch64/reduce-and.ll CodeGen/AArch64/unfold-masked-merge-vector-variablemask.ll The apparent regressions in CodeGen/AArch64/fast-isel-cmp-vec.ll only occur because the test disables codegen prepare and branch folding. Differential Revision: https://reviews.llvm.org/D114357	2022-01-17 11:08:57 +00:00
Nikita Popov	873a7ee7e4	[MachineInstr] Don't include debug uses in bundle header (PR52817) Following the recommendation in https://github.com/llvm/llvm-project/issues/52817#issuecomment-1007635426, this excludes debug instructions when finalizing the bundle. As uses in debug instructions don't have effects, they will no longer be included in the BUNDLE header. Fixes https://github.com/llvm/llvm-project/issues/52817. Differential Revision: https://reviews.llvm.org/D116945	2022-01-17 10:43:21 +01:00
Bjorn Pettersson	9f237c9e7d	[DAGCombine] Refactor DAGCombiner::ReduceLoadWidth. NFCI Update code comments in DAGCombiner::ReduceLoadWidth and refactor the handling of SRL a bit. The refactoring is done with the intent of adding support for folding away SRA by using SEXTLOAD in a follow-up patch. The function is also renamed as DAGCombiner::reduceLoadWidth. Differential Revision: https://reviews.llvm.org/D117104	2022-01-16 20:24:52 +01:00
Fangrui Song	5456249736	[SelectionDAG] Fix -Wunused-variable in -DLLVM_ENABLE_ASSERTIONS=off builds after D117235	2022-01-15 17:13:09 -08:00
Nikita Popov	c63a3175c2	[AttrBuilder] Remove ctor accepting AttributeList and Index Use the AttributeSet constructor instead. There's no good reason why AttrBuilder itself should extract the AttributeSet from the AttributeList. Moving this out of the AttrBuilder generally results in cleaner code.	2022-01-15 22:39:31 +01:00
Fraser Cormack	877d1b3d07	[SelectionDAG][VP] Add splitting/widening for VP_LOAD and VP_STORE Original patch by @hussainjk. This patch was split off from D109377 to keep vector legalization (widening/splitting) separate from vector element legalization (promoting). While the original patch added a third overload of SelectionDAG::getVPStore, this patch takes the liberty of collapsing those all down to 1, as three overloads seems excessive for a little-used node. The original patch also used ModifyToType in places, but that method still crashes on scalable vector types. Seeing as the other VP legalization methods only work when all operands need identical widening, this patch follows in that vein. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D117235	2022-01-15 11:41:29 +00:00
Craig Topper	e0841f6920	[SelectionDAGBuilder] Remove unneeded vector bitcast from visitTargetIntrinsic. This seems to be a leftover from a long time ago when there was an ISD::VBIT_CONVERT and a MVT::Vector. It looks like in those days the vector type was carried in a VTSDNode. As far as I know, these days ComputeValueTypes would have already assigned "Result" the same type we're getting from TLI.getValueType here. Thus the BITCAST is always a NOP. Verified by adding an assert and running check-llvm. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D117335	2022-01-14 12:52:49 -08:00
James Y Knight	a97e20a3a8	Revert "GlobalISel: Add G_ASSERT_ALIGN hint instruction" This commit sometimes causes a crash when compiling a vtable thunk. E.g.: clang '--target=aarch64-grtev4-linux-gnu' -xc++ - -c -o /dev/null <<EOF struct a { virtual int f(); }; struct c { virtual int &g() const; }; struct d : a, c { int &g() const; }; int &d::g() const {} EOF Some follow-up commits have been reverted as well: Revert "IR: Make getRetAlign check callee function attributes" Revert "Fix MSVC "32-bit shift implicitly converted to 64 bits" warning. NFC." Revert "Fix MSVC "32-bit shift implicitly converted to 64 bits" warning. NFC." This reverts commit `4f414af6a7`. This reverts commit `a5507d2e25`. This reverts commit `3d2d208f6a`. This reverts commit `07ddfa95e3`.	2022-01-14 04:50:07 +00:00
David Sherwood	ba471ba8d2	Revert "[CodeGen][AArch64] Ensure isSExtCheaperThanZExt returns true for negative constants" This reverts commit `31009f0b5a`. It seems to be causing SVE VLA buildbot failures and has introduced a genuine regression. Reverting for now.	2022-01-13 15:59:43 +00:00
Eugene Zhulenev	764e52f0d4	[DebugInfo][InstrRef] Short-circuit unnecessary preferred location map construction Reviewed By: cota Differential Revision: https://reviews.llvm.org/D117162	2022-01-13 06:24:52 -08:00
Simon Pilgrim	4f414af6a7	Fix MSVC "32-bit shift implicitly converted to 64 bits" warning. NFC.	2022-01-13 11:10:50 +00:00
Sebastian Neubauer	f4139440f1	[Docs] Fix IR and TableGen grammar inconsistencies IR: - globals (and functions, ifuncs, aliases) can have a partition - catchret has a `to` before the label - the sint/int types do not exist - signext comes after the type - a variable was missing its type TableGen: - The second value after a `#` concatenation is optional See e.g. llvm/lib/Target/X86/X86InstrAVX512.td:L3351 - IncludeDirective and PreprocessorDirective were never referenced in the grammar - Add some missing ; - Parent classes of multiclasses can have generic arguments. Reuse the `ParentClassList` that is already used in other places. MIR: - liveins only allows physical registers, which start with a $ Differential Revision: https://reviews.llvm.org/D116674	2022-01-13 11:55:13 +01:00
David Sherwood	31009f0b5a	[CodeGen][AArch64] Ensure isSExtCheaperThanZExt returns true for negative constants When we know the value we're extending is a negative constant then it makes sense to use SIGN_EXTEND because this may improve code quality in some cases, particularly when doing a constant splat of an unpacked vector type. For example, for SVE when splatting the value -1 into all elements of a vector of type <vscale x 2 x i32> the element type will get promoted from i32 -> i64. In this case we want the splat value to sign-extend from (i32 -1) -> (i64 -1), whereas currently it zero-extends from (i32 -1) -> (i64 0xFFFFFFFF). Sign-extending the constant means we can use a single mov immediate instruction. New tests added here: CodeGen/AArch64/sve-vector-splat.ll I believe we see some code quality improvements in these existing tests too: CodeGen/AArch64/dag-numsignbits.ll CodeGen/AArch64/reduce-and.ll CodeGen/AArch64/unfold-masked-merge-vector-variablemask.ll The apparent regressions in CodeGen/AArch64/fast-isel-cmp-vec.ll only occur because the test disables codegen prepare and branch folding. Differential Revision: https://reviews.llvm.org/D114357	2022-01-13 09:43:07 +00:00
Matt Arsenault	5a16306c09	GlobalISel: Always enable GISelKnownBits for InstructionSelect This wasn't running at -O0, and causing crashes for AMDGPU. AMDGPU needs this to match the addressing modes of stack access instructions, which is even more important at -O0 than with optimizations. It currently costs nothing to run ahead of time, so just always enable it.	2022-01-12 18:57:24 -05:00
Matt Arsenault	5f39a02ea9	RegScavenger: Remove used regs from scavenge candidates In a future change, AMDGPU will have 2 emergency scavenging indexes in some situations. The secondary scavenging index ends up being used recursively when the scavenger calls eliminateFrameIndex for the emergency spill slot. Without this, it would end up seeing the same register which was just scavenged in the parent call as free, inserting a second emergency spill to the same location and returning the same register when 2 unique free registers are required. We need to only do this if the register is used. SystemZ uses 2 scavenging slots, but calls the scavenger twice in sequence and not recursively. In this case the previously scavenged register can be re-clobbered, but is still tracked in the scavenger until it sees the deferred restore instruction.	2022-01-12 18:56:52 -05:00
Matt Arsenault	07ddfa95e3	GlobalISel: Add G_ASSERT_ALIGN hint instruction Insert it for call return values only for now, which is the only case the DAG handles also.	2022-01-12 18:20:58 -05:00
Matt Arsenault	8a16201a0b	GlobalISel: Fix insert point in localizer This was inserting the new G_CONSTANT after the use, and the later block scan would run off the end. Fix calling SkipPHIsAndLabels for no apparent reason.	2022-01-12 13:44:05 -05:00
Mircea Trofin	b2d2e93138	[NFC][MLGO] The regalloc reward is float, not int64_t	2022-01-12 09:32:41 -08:00
Mircea Trofin	3150bce078	[NFC][MLGO] Prep a few files before the main ML Regalloc adviser To avoid trivial changes.	2022-01-12 08:54:00 -08:00
Petar Avramovic	c8c5dc766b	GlobalISel: Fix fma combine when one of the operands comes from unmerge Fma combine assumes that MRI.getVRegDef(Reg)->getOperand(0).getReg() = Reg which is not true when Reg is defined by an instruction with multiple defs, e.g. G_UNMERGE_VALUES. The fix is to keep the register and the instruction that defines it in DefinitionAndSourceRegister and use them when needed. Differential Revision: https://reviews.llvm.org/D117032	2022-01-12 17:47:25 +01:00
Leonard Grey	0f85393004	[MachO] Port call graph profile section and directive This ports the `.cg_profile` assembly directive and call graph profile section generation to MachO from COFF/ELF. Due to MachO section naming rules, the section is called `__LLVM,__cg_profile` rather than `.llvm.call-graph-profile` as in COFF/ELF. Support for llvm-readobj is included to facilitate testing. Corresponding LLD change is D112164 Differential Revision: https://reviews.llvm.org/D112160	2022-01-12 09:22:26 -05:00
Jeremy Morse	6a605b97a2	[DebugInfo] Move flag for instr-ref to LLVM option, from TargetOptions This feature was previously controlled by a TargetOptions flag, and I figured that codegen::InitTargetOptionsFromCodeGenFlags would default it to "on" for all frontends. Enabling by default was discussed here: https://lists.llvm.org/pipermail/llvm-dev/2021-November/153653.html and originally supposed to happen in `3c04507088`, but it didn't actually take effect, as it turns out frontends initialize TargetOptions themselves. This patch moves the flag from a TargetOptions flag to a global flag to CodeGen, where it isn't immediately affected by the frontend being used. Hopefully this will actually cause instr-ref to be on by default on x86_64 now! This patch is easily reverted, and chances of turbulence are moderately high. If you need to revert, please consider instead commenting out the 'return true' part of llvm::debuginfoShouldUseDebugInstrRef to turn the feature off, and dropping me an email. Differential Revision: https://reviews.llvm.org/D116821	2022-01-12 13:28:01 +00:00
Alexey Lapshin	39385d4cd1	[CodeGen][Debuginfo][NFC] Refactor DIE values SizeOf method to not depend on AsmPrinter. The SizeOf() method of DIE values (unsigned SizeOf(const AsmPrinter *AP, dwarf::Form Form) const) depends on AsmPrinter. AsmPrinter is too specific a class here. This patch removes the dependency on AsmPrinter and uses the dwarf::FormParams structure instead. This allows calculating DIE value sizes without using AsmPrinter. That refactoring is useful for D96035 ([dsymutil][DWARFlinker] implement separate multi-thread processing for compile units.) Differential Revision: https://reviews.llvm.org/D116997	2022-01-12 13:15:26 +03:00
Craig Topper	63b17eb9ec	[RISCV] Add strictfp support for compares. This adds support for STRICT_FSETCC(quiet) and STRICT_FSETCCS(signaling). FEQ matches well to STRICT_FSETCC oeq. FLT/FLE matches well to STRICT_FSETCCS olt/ole. Others require commuting operands or multiple instructions. STRICT_FSETCC olt/ole/ogt/oge/ult/ule/ugt/uge uses FLT/FLE, but we need to save/restore FFLAGS around them to avoid spurious exceptions. I've implemented pseudo instructions with a CustomInserter to insert the save/restore CSR instructions. Unfortunately, this doesn't honor exceptions for signaling NANs but I'm not sure if signaling nans are really supported by the constrained intrinsics. STRICT_FSETCC one and ueq expand to a pair of FLT instructions with a save/restore of fflags around each. This could be improved in the future. There may be some opportunities to generate better code for strict comparisons mixed with nonans fast math flags. I've left FIXMEs in the .td files for that. Co-Authored-by: ShihPo Hung <shihpo.hung@sifive.com> Reviewed By: arcbbb Differential Revision: https://reviews.llvm.org/D116694	2022-01-11 20:01:41 -08:00
Matt Arsenault	5a434ceafb	GlobalISel: Use cloneVirtualRegister in localizer	2022-01-11 16:10:12 -05:00
Nick Desaulniers	4edb9983cb	[SelectionDAG] treat X constrained labels as i for asm Completely rework how we handle X constrained labels for inline asm. X should really be treated as i. Then existing tests can be moved to use i D115410 and clang can just emit i D115311. (D115410 and D115311 are callbr, but this can be done for label inputs, too). Coincidentally, this simplification solves an ICE uncovered by D87279 based on assumptions made during D69868. This is the third approach considered. See also discussions v1 (D114895) and v2 (D115409). Reported-by: kernel test robot <lkp@intel.com> Fixes: https://github.com/ClangBuiltLinux/linux/issues/1512 Reviewed By: void, jyknight Differential Revision: https://reviews.llvm.org/D115688	2022-01-11 10:29:40 -08:00
Nick Desaulniers	9c4b49db19	[ShrinkWrap] check for PPC's non-callee-saved LR As pointed out in https://reviews.llvm.org/D115688#inline-1108193, we don't want to sink the save point past an INLINEASM_BR, otherwise prologepilog may incorrectly sink a prolog past the MBB containing an INLINEASM_BR and into the wrong MBB. ShrinkWrap is getting this wrong because LR is not in the list of callee saved registers. Specifically, ShrinkWrap::useOrDefCSROrFI calls RegisterClassInfo::getLastCalleeSavedAlias which reads CalleeSavedAliases which was populated by RegisterClassInfo::runOnMachineFunction by iterating the list of MCPhysReg returned from MachineRegisterInfo::getCalleeSavedRegs. Because PPC's LR is non-allocatable, it's NOT considered callee saved. Add an interface to TargetRegisterInfo for such a case and use it in Shrinkwrap to ensure we don't sink a prolog past an INLINEASM or INLINEASM_BR that clobbers LR. Reviewed By: jyknight, efriedma, nemanjai, #powerpc Differential Revision: https://reviews.llvm.org/D116424	2022-01-11 10:01:34 -08:00
David Sherwood	51497dc0b2	[IR] Change vector.splice intrinsic to reject out-of-bounds indices I've changed the definition of the experimental.vector.splice intrinsic to reject indices that are known to be, or are possibly, out-of-bounds. In practice, this means changing the definition so that the index is now only valid in the range [-VL, VL-1] where VL is the known minimum vector length. We use the vscale_range attribute to take the minimum vscale value into account so that we can permit more indices when the attribute is present. The splice intrinsic is currently only ever generated by the vectoriser, which will never attempt to splice vectors with out-of-bounds values. Changing the definition also makes things simpler for codegen since we can always assume that the index is valid. This patch was created in response to review comments on D115863 Differential Revision: https://reviews.llvm.org/D115933	2022-01-11 09:37:39 +00:00
Nick Desaulniers	649b11ef8b	git-clang-format HEAD~	2022-01-10 18:34:30 -08:00
Nick Desaulniers	301e911740	[TargetLowering] precommit refactor from D115688 NFC Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>	2022-01-10 18:32:13 -08:00
Mircea Trofin	b191c1f0f9	[NFC][regalloc] Pull out some AllocationOrder/CostPerUseLimit eviction logic We are reusing that logic in the ML implementation. Differential Revision: https://reviews.llvm.org/D116075	2022-01-10 15:47:31 -08:00
Nadav Rotem	e2cc091a7d	Fix a missed opportunity to merge stores. This commit fixes a missed opportunity in merging consecutive stores. The code that searches for stores skipped the case of stores that directly connect to the root. The comment above the implementation lists this case but the code did not handle it. I found this pattern when looking into the shared_ptr destructor. GCC generates the right sequence. Here is a small repro: int foo(int* buff) { buff[0] = 0; int x = buff[1]; buff[1] = 0; return x; } Differential Revision: https://reviews.llvm.org/D116895	2022-01-10 13:49:02 -08:00
Mircea Trofin	e121269131	[NFC][regalloc] Pass RAGreedy to eviction adviser This patch simplifies the interface between RAGreedy and the eviction adviser by passing the allocator to the adviser, which allows the latter to extract needed information as needed, rather than requiring it be passed piecemeal at construction time (which would also complicate later evolution). Part of this, the patch also moves ExtraRegInfo back to RAGreedy. We keep the encapsulation of ExtraRegInfo because it has benefits (e.g. improved readability by abstracting access to the cascade info) and also simpler re-initialization at regalloc pass re-entry time (we just flush the Optional). Differential Revision: https://reviews.llvm.org/D116669	2022-01-10 11:55:16 -08:00
Matt Arsenault	0ba4e4b500	GlobalISel: Pass DebugLoc to getFunctionLiveInPhysReg Fixes crash in assertion about dropping debug info.	2022-01-10 13:50:52 -05:00
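
The isSExtCheaperThanZExt entries above (`31009f0b5a`, relanded as `197f3c0deb`) turn on how a negative constant looks after promotion from i32 to i64. The following is a minimal sketch of that arithmetic only, in plain C++. The helper names `zeroExtend32To64`, `signExtend32To64`, and `checkSplatExample` are hypothetical and are not LLVM APIs; the sketch says nothing about how the patch itself is implemented.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: zero-extend a 32-bit value to 64 bits.
// The upper 32 bits of the result are always 0.
constexpr std::uint64_t zeroExtend32To64(std::int32_t value) {
  return static_cast<std::uint64_t>(static_cast<std::uint32_t>(value));
}

// Hypothetical helper: sign-extend a 32-bit value to 64 bits.
// The upper 32 bits of the result are copies of the sign bit.
constexpr std::uint64_t signExtend32To64(std::int32_t value) {
  return static_cast<std::uint64_t>(static_cast<std::int64_t>(value));
}

// Mirrors the example in the commit message: promoting (i32 -1).
inline void checkSplatExample() {
  // Zero-extension leaves only the low 32 bits set.
  assert(zeroExtend32To64(-1) == 0x00000000FFFFFFFFull);
  // Sign-extension yields the 64-bit representation of -1 (all bits set).
  assert(signExtend32To64(-1) == 0xFFFFFFFFFFFFFFFFull);
  // For non-negative constants the two extensions agree.
  assert(signExtend32To64(7) == zeroExtend32To64(7));
}
```

Calling `checkSplatExample()` from any test driver exercises the three cases. The two results differ only when the sign bit of the 32-bit input is set, which is the negative-constant case the commit message describes.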

33072 Commits