Commit Graph

32796 Commits

Author SHA1 Message Date
Paul Kirth 6e9bab71b6 Revert "[llvm][NFC] Refactor code to use ProfDataUtils"
This reverts commit 300c9a7881.

We will reland once these issues are ironed out.
2022-07-27 21:38:11 +00:00
Paul Kirth 300c9a7881 [llvm][NFC] Refactor code to use ProfDataUtils
In this patch we replace common code patterns with the use of utility
functions for dealing with profiling metadata. There should be no change
in functionality, as the existing checks should be preserved in all
cases.

Reviewed By: bogner, davidxl

Differential Revision: https://reviews.llvm.org/D128860
2022-07-27 21:13:54 +00:00
Adrian Prantl 719ab04acf [GlobalISel] Handle IntToPtr constants in dbg.value
Currently, the IR to MIR translator can only handle two kinds of constant
inputs to dbg.value intrinsics: constant integers and constant floats. In
particular, it cannot handle pointers created from IntToPtr ConstantExpression
objects.

This patch addresses the limitation above by replacing the IntToPtr with
its input integer prior to converting the dbg.value input.

Patch by Felipe Piovezan!

Differential Revision: https://reviews.llvm.org/D130642
2022-07-27 13:42:07 -07:00
Amara Emerson 65246d3eb4 Use hasNItemsOrLess() in MRI::hasAtMostUserInstrs(). 2022-07-27 11:42:14 -07:00
Amara Emerson 19cdd1908b [AArch64][GlobalISel] Add heuristics for localizing G_CONSTANT.
This adds heuristics similar to those used for G_GLOBAL_VALUE, querying the
code-size cost of materializing a specific constant. Doing so prevents us from
sinking constants which require multiple instructions to generate into
their use blocks.

Code size savings on CTMark -Os:
Program                                       size.__text
                                              before         after           diff
ClamAV/clamscan                               381940.00      382052.00       0.0%
lencod/lencod                                 428408.00      428428.00       0.0%
SPASS/SPASS                                   411868.00      411876.00       0.0%
kimwitu++/kc                                  449944.00      449944.00       0.0%
Bullet/bullet                                 463588.00      463556.00      -0.0%
sqlite3/sqlite3                               284696.00      284668.00      -0.0%
consumer-typeset/consumer-typeset             414492.00      414424.00      -0.0%
7zip/7zip-benchmark                           595244.00      594972.00      -0.0%
mafft/pairlocalalign                          247512.00      247368.00      -0.1%
tramp3d-v4/tramp3d-v4                         372884.00      372044.00      -0.2%
                           Geomean difference                               -0.0%

Differential Revision: https://reviews.llvm.org/D130554
2022-07-27 10:51:16 -07:00
Simon Pilgrim c0b3f7a50f [DAG] SimplifyDemandedBits - ensure we clear known One bits that AssertZext asserts are really known Zero
Matches ComputeKnownBits behaviour

Thanks to @uabelho for the fuzz regression report on D129765
2022-07-27 13:57:47 +01:00
Simon Pilgrim 529bd4f352 [DAG] SimplifyDemandedBits - don't early-out for multiple use values
SimplifyDemandedBits currently early-outs for multi-use values beyond the root node (just returning the knownbits), which misses a number of optimizations, as there are plenty of cases where we can still simplify when initially demanding all elements/bits.

@lenary has confirmed that the test cases in aea-erratum-fix.ll need refactoring and the current codegen increase is not a major concern.

Differential Revision: https://reviews.llvm.org/D129765
2022-07-27 10:54:06 +01:00
Dmitry Vassiliev e3e63f30a5 [CodeGen] Fixed ambiguous symbol ExtAddrMode in case of NDEBUG and LLVM_ENABLE_DUMP
This patch fixes the following error with MSVC 16.9.2 in case of NDEBUG and LLVM_ENABLE_DUMP:
llvm/lib/CodeGen/CodeGenPrepare.cpp(2581): error C2872: 'ExtAddrMode': ambiguous symbol
llvm/include/llvm/CodeGen/TargetInstrInfo.h(86): note: could be 'llvm::ExtAddrMode'
llvm/lib/CodeGen/CodeGenPrepare.cpp(2447): note: or '`anonymous-namespace'::ExtAddrMode'
llvm/lib/CodeGen/CodeGenPrepare.cpp(2581): error C2039: 'print': is not a member of 'llvm::ExtAddrMode'

Reviewed By: aaron.ballman

Differential Revision: https://reviews.llvm.org/D130426
2022-07-27 00:21:57 +02:00
Fangrui Song f106525de2 [MachineFunctionPass] Support -print-changed and -print-changed=quiet
-print-changed for new pass manager is handy beside -print-after-all.
Port it to MachineFunctionPass.

Note: lib/Passes/StandardInstrumentations.cpp implements a number of
misc features. If we want to use them for codegen, we may need to lift
some functionality to LLVMIR.

Reviewed By: aeubanks, jamieschmeiser

Differential Revision: https://reviews.llvm.org/D130434
2022-07-26 10:16:49 -07:00
Simon Pilgrim 1ea7b9c6ee [DAG] matchRotateSub - set demanded bits to the shift amount type size, not the shift result size.
This should fix a report on D130251 of an assert due to a bitwidth mismatch in APInt::isSubSetOf
2022-07-26 17:58:51 +01:00
Stefan Gränitz 1e30820483 [WinEH] Apply funclet operand bundles to nounwind intrinsics that lower to function calls in the course of IR transforms
WinEHPrepare marks any function call from EH funclets as unreachable, if it's not a nounwind intrinsic or has no proper funclet bundle operand. This
affects ARC intrinsics on Windows, because they are lowered to regular function calls in the PreISelIntrinsicLowering pass. It caused silent binary truncations and crashes during unwinding with the GNUstep ObjC runtime: https://github.com/gnustep/libobjc2/issues/222

This patch adds a new function `llvm::IntrinsicInst::mayLowerToFunctionCall()` that aims to collect all affected intrinsic IDs.
* Clang CodeGen uses it to determine whether or not it must emit a funclet bundle operand.
* PreISelIntrinsicLowering asserts that the function returns true for all ObjC runtime calls it lowers.
* LLVM uses it to determine whether or not a funclet bundle operand must be propagated to inlined call sites.

Reviewed By: theraven

Differential Revision: https://reviews.llvm.org/D128190
2022-07-26 17:52:43 +02:00
Paul Walker e5c892dd85 [SVE][SelectionDAG] Use INDEX to generate matching instances of BUILD_VECTOR.
This patch starts small, only detecting sequences of the form
<a, a+n, a+2n, a+3n, ...> where a and n are ConstantSDNodes.
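
A rough standalone sketch in plain C of the stride check this amounts to (the helper name and types are hypothetical; this is not the DAG code):

```
#include <stdbool.h>
#include <stddef.h>

/* A build vector of constants <a, a+n, a+2n, ...> qualifies for a single
 * INDEX(a, n) when every element follows the same stride. */
static bool is_index_sequence(const long long *elts, size_t num,
                              long long *a, long long *n) {
  if (num < 2)
    return false;
  *a = elts[0];
  *n = elts[1] - elts[0];
  for (size_t i = 2; i < num; ++i)
    if (elts[i] - elts[i - 1] != *n)
      return false;
  return true;
}
```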

Differential Revision: https://reviews.llvm.org/D125194
2022-07-26 15:28:37 +00:00
wangpc 1a7078d106 [DAGCombine] Mask doesn't have to be (EltSize - 1) exactly when combining rotation
I think what we need is that the least significant Log2(EltSize) bits are known to be ones.
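
A small standalone check in plain C (not the DAG combine itself) of why that suffices for a 32-bit rotate: only the shift amount modulo 32 matters, so any mask whose low five bits are ones selects the same rotation as the exact 31 mask.

```
#include <assert.h>
#include <stdint.h>

int main(void) {
  uint32_t mask = 0xFFu; /* wider than 31, but the low 5 bits are all ones */
  for (uint32_t amt = 0; amt < 1024; ++amt)
    assert(((amt & mask) & 31u) == (amt & 31u));
  return 0;
}
```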

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D130251
2022-07-26 21:14:45 +08:00
Sven van Haastregt c8d91b07bb Reassoc FMF should not optimize FMA(a, 0, b) to (b)
Optimizing (a * 0 + b) to (b) requires assuming that a is finite and not
NaN. DAGCombiner will do this optimization when the reassoc fast math
flag is set, which is not correct. Change DAGCombiner to only consider
UnsafeMath for this optimization.
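
A tiny standalone C example of why the finite/non-NaN assumption matters:

```
#include <math.h>
#include <stdio.h>

/* If a is infinite (or NaN), a * 0 is NaN, so (a * 0 + b) yields NaN, not b. */
int main(void) {
  double a = INFINITY, b = 1.0;
  printf("%f\n", a * 0.0 + b); /* prints nan, not 1.000000 */
  return 0;
}
```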

Differential Revision: https://reviews.llvm.org/D130232

Co-authored-by: Andrea Faulds <andrea.faulds@arm.com>
2022-07-26 09:39:12 +01:00
Kazu Hirata 3f3930a451 Remove redundant virtual specifiers (NFC)
Identified with tidy-modernize-use-override.
2022-07-25 23:00:59 -07:00
jacquesguan cb370cf413 [DAGCombiner] Teach scalarizeExtractedBinop to support scalable splat.
This patch supports the scalable splat part for scalarizeExtractedBinop.

Differential Revision: https://reviews.llvm.org/D129725
2022-07-26 09:31:45 +08:00
Amara Emerson 5ae0472694 [GlobalISel] Fix miscompile of G_UREM + G_UDIV due to not checking for equality
of the first operands of each.

Fixes issue #55287

Differential Revision: https://reviews.llvm.org/D130525
2022-07-25 16:03:05 -07:00
Alexander Shaposhnikov 1e636f2676 [IRBuilder] Add assert for AtomicRMW ordering
Add assert for AtomicRMW: Ordering != AtomicOrdering::Unordered
(https://github.com/llvm/llvm-project/blob/main/llvm/lib/IR/Verifier.cpp#L3944)
and adjust expandAtomicStore accordingly.

Test plan:
1/ ninja check-llvm check-clang check-lld
2/ Bootstrapped LLVM/Clang pass tests

Differential revision: https://reviews.llvm.org/D130457
2022-07-25 22:51:25 +00:00
Matt Arsenault 62531518f9 RegAllocGreedy: Add a command line flag for reverseLocalAssignment
Introduce a flag, as for some of the other target heuristic controls,
to help with experimentation.
2022-07-25 15:47:15 -04:00
Vladislav Dzhidzhoev fc93ba061a [GlobalISel][DebugInfo] Remove debug info with zero line from constants inserted at entry block
Emission of constants having DebugLoc with line 0 causes significant increase of debug_line section size for some source files.

To illustrate, we can compare section sizes of several files from llvm test-suite, built with SelectionDAG vs GlobalISel, on Aarch64 (macOS), using -O0 optimization level:

| Source path                                                    | SDAG text sz | GISel text sz | SDAG debug_line sz |  GISel debug_line sz
| -------------------------------------------------------------- | ------------ | ------------- | ------------------ | --------------------
| `SingleSource/Regression/C/gcc-c-torture/execute/strlen-2.c`   | 15320        | 660           | 14872              | 6340
| `SingleSource/Regression/C/gcc-c-torture/execute/20040629-1.c` | 33640        | 26300         | 2812               | 6693
| `SingleSource/Benchmarks/Misc/flops-4.c`                       | 1428         | 1196          | 594                | 1008
| `MultiSource/Benchmarks/MiBench/consumer-typeset/z31.c`        | 2716         | 964           | 809                | 903
| `MultiSource/Benchmarks/Prolangs-C/gnugo/showinst.c`           | 2534         | 2502          | 189                | 573

For instance, here is a fragment of `flops-4.c.o` debug line section dump

```
Address            Line   Column File   ISA Discriminator Flags
------------------ ------ ------ ------ --- ------------- -------------
0x0000000000000000    174      0      1   0             0  is_stmt
0x0000000000000010      0      0      1   0             0
0x0000000000000018    185      4      1   0             0  is_stmt prologue_end
0x000000000000001c      0      0      1   0             0
0x0000000000000024    186      4      1   0             0  is_stmt
0x000000000000002c    189     10      1   0             0  is_stmt
0x0000000000000030      0      0      1   0             0
0x0000000000000038    207     11      1   0             0  is_stmt
0x0000000000000044    208     11      1   0             0  is_stmt
0x0000000000000048      0      0      1   0             0
0x0000000000000058    210     10      1   0             0  is_stmt
0x000000000000005c      0      0      1   0             0
0x0000000000000060    211     10      1   0             0  is_stmt
0x0000000000000064      0      0      1   0             0
0x000000000000006c    212     10      1   0             0  is_stmt
0x0000000000000070      0      0      1   0             0
0x000000000000007c    213     10      1   0             0  is_stmt
0x0000000000000080      0      0      1   0             0
0x0000000000000088    214     10      1   0             0  is_stmt
0x000000000000008c      0      0      1   0             0
0x0000000000000094    215     10      1   0             0  is_stmt
```

A lot of zero lines are produced by constants (global values) having a DebugLoc with line 0.
It seems that they're not significant for the debugging experience.

With the commit applied, total size of debug_line sections of llvm shared libraries has reduced by 2.5%.
Change of debug line section size of files listed above:

| Source path                                                    | GISel debug_line sz | Patch debug_line sz
| -------------------------------------------------------------- | ------------------- | --------------------
| `SingleSource/Regression/C/gcc-c-torture/execute/strlen-2.c`   | 6340                | 1465
| `SingleSource/Regression/C/gcc-c-torture/execute/20040629-1.c` | 6693                | 3782
| `SingleSource/Benchmarks/Misc/flops-4.c`                       | 1008                | 609
| `MultiSource/Benchmarks/MiBench/consumer-typeset/z31.c`        | 903                 | 841
| `MultiSource/Benchmarks/Prolangs-C/gnugo/showinst.c`           | 573                 | 190

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D127488
2022-07-25 17:19:01 +00:00
Nikita Popov fb7caa3c7b [AsmPrinter] Reject ptrtoint to larger size in lowerConstant()
When using a ptrtoint to a size larger than the pointer width in a
global initializer, we currently create a ptr & low_bit_mask style
MCExpr, which will later result in a relocation error during object
file emission.

This patch rejects the constant expression already during
lowerConstant(), which results in a much clearer error message
that references the constant expression at fault.

This fixes https://github.com/llvm/llvm-project/issues/56400,
for certain definitions of "fix".

Differential Revision: https://reviews.llvm.org/D130366
2022-07-25 10:18:27 +02:00
Kazu Hirata b5188591a0 [llvm] Remove redundant virtual specifiers (NFC)
Identified with modernize-use-override.
2022-07-24 21:50:35 -07:00
Kazu Hirata acf648b5e9 Use llvm::less_first and llvm::less_second (NFC) 2022-07-24 16:21:29 -07:00
Kazu Hirata ea29810c9d [CodeGen] Remove a redundant void (NFC)
Identified with modernize-redundant-void-arg.
2022-07-24 12:27:14 -07:00
Matt Arsenault 40abb28f61 RegAllocGreedy: Fix subranges when rematerializing dead subreg defs
This would create a new interval missing the subrange and hit this
verifier error:

*** Bad machine code: Live interval for subreg operand has no subranges ***
- function:    test_remat_subreg_def
- basic block: %bb.0  (0xa568758) [0B;128B)
- instruction: 32B	dead undef %4.sub0:vreg_64 = V_MOV_B32_e32 2, implicit $exec
2022-07-24 11:51:59 -04:00
Simon Pilgrim 562ee7cc5f [DAG] visitSMUL_LOHI/visitUMUL_LOHI - ensure we canonicalize constants to the RHS 2022-07-24 16:09:56 +01:00
Simon Pilgrim 428c0f2adc [DAG] getNode - assert that SMUL_LOHI/UMUL_LOHI nodes have the correct ops + types 2022-07-24 15:30:57 +01:00
Simon Pilgrim 0708771cce [DAG] MaskedVectorIsZero - don't bother with (-1).isSubsetOf mask check. NFC.
Just use KnownBits::isZero() to ensure all the bits are known zero.
2022-07-24 13:12:21 +01:00
Simon Pilgrim e82d49bfed [DAG] SimplifyMultipleUseDemandedBits - early-out for any scalable vector types
Noticed while working to remove SelectionDAG::GetDemandedBits - we were relying on the callers to have already bailed for scalable vectors
2022-07-24 12:59:43 +01:00
Simon Pilgrim a3e38b4a20 [DAG] SimplifyDemandedVectorElts - if every and/mul element-pair has a zero/undef then just constant fold to zero 2022-07-24 12:00:31 +01:00
Kazu Hirata 7bfa06f6c0 [CodeGen] Use range-based for loops (NFC) 2022-07-23 16:10:46 -07:00
Simon Pilgrim ac8be21365 [DAG] isSplatValue - don't attempt to merge any BITCAST sub elements if they contain UNDEFs
We still haven't found a solution that handles 'don't care' sub elements correctly - given how close we are to the next release branch, I'm making this fail-safe change and we can revisit this later if we can't find alternatives.

NOTE: This isn't a reversion of D128570 - it's the removal of undef handling across bitcasts entirely

Fixes #56520
2022-07-23 18:38:48 +01:00
Dmitri Gribenko aba43035bd Use llvm::sort instead of std::sort where possible
llvm::sort is beneficial even when we use the iterator-based overload,
since it can optionally shuffle the elements (to detect
non-determinism). However llvm::sort is not usable everywhere, for
example, in compiler-rt.

Reviewed By: nhaehnle

Differential Revision: https://reviews.llvm.org/D130406
2022-07-23 15:19:05 +02:00
Simon Pilgrim 5f89d2bae9 [DAG] Move OR(AND(X,C1),AND(OR(X,Y),C2)) -> OR(AND(X,OR(C1,C2)),AND(Y,C2)) fold to SimplifyDemandedBits
This will fix the SystemZ v3i31 memcpy regression in D77804 (with the help of D129765 as well....).

It should also allow us to /bend/ the oneuse limitation for cases where we can use demanded bits to safely peek through multiple uses of the AND ops.
2022-07-23 13:17:24 +01:00
Simon Pilgrim 6aff1b7b3c [DAG] SimplifyDemandedBits - pull out repeated getValueType() calls. NFC. 2022-07-23 12:01:54 +01:00
Simon Pilgrim 2421a5af72 [DAG] ExpandIntRes_ADDSUB - create UADDO/USUBO instead of ADDCARRY/SUBCARRY if overflow is known to be zero
As noticed on D127115, when splitting ADD/SUB nodes we often end up with cases where overflow from the lower bits is impossible - in such cases we're better off breaking the carry chain dependency as soon as possible.

This path is being exercised by llvm/test/CodeGen/ARM/dsp-mlal.ll, although I haven't been able to get any codegen diff without a topological worklist.
2022-07-23 11:13:44 +01:00
Simon Pilgrim 8937252465 [DAG] computeKnownBits - add basic shift-by-parts handling
Concat KnownBits from ISD::SHL_PARTS / ISD::SRA_PARTS / ISD::SRL_PARTS lo/hi operands and perform the KnownBits calculation by the shift amount on the extended type, before splitting the KnownBits based on the requested lo/hi result.
2022-07-23 09:46:30 +01:00
ARCHIT SAXENA 3bb1ce2319 Add a nop instruction if a section starts with landing pad for function splitter
This change adds a nop instruction if a section starts with a landing pad. This change is like [D73739](https://reviews.llvm.org/D73739), which avoids zero-offset landing pads in basic block sections.

Detailed description:
The current machine function splitter can create sections which start with a landing pad themselves. This places the landing pad at offset zero from LPStart.
```
	.section	.text.split.foo10,"ax",@progbits
foo10.cold:                             # %lpad
	.cfi_startproc
	.cfi_personality 3, __gxx_personality_v0
	.cfi_lsda 3, .Lexception5
	.cfi_def_cfa %rsp, 16
.Ltmp11: <--- This is a Landing pad and also LP Start as it is start of this section
	movq	%rax, %rdi <--- first instruction is at offset 0 from LPStart
	callq	_Unwind_Resume@PLT

 ```
This will cause landing pad entries to become zero (.Ltmp11-foo10.cold)
```
.Lcst_begin4:
	.uleb128 .Ltmp9-.Lfunc_begin2           # >> Call Site 1 <<
	.uleb128 .Ltmp10-.Ltmp9                 #   Call between .Ltmp9 and .Ltmp10
	.uleb128 .Ltmp11-foo10.cold  <---This is zero           #     jumps to .Ltmp11
	.byte	3                               #   On action: 2
	.uleb128 .Ltmp10-.Lfunc_begin2          # >> Call Site 2 <<
	.uleb128 .Lfunc_end9-.Ltmp10            #   Call between .Ltmp10 and .Lfunc_end9
	.byte	0                               #     has no landing pad
	.byte	0                               #   On action: cleanup
	.p2align	2
```
The C++ ABI somehow assumes that no landing pads point directly to LPStart (which works in the normal case since the function beginning is never a landing pad), and uses LP.offset = 0 to specify no landing pad. This change adds a nop instruction at the start of such sections so that this case is avoided. Output:
```
	.section	.text.split.foo10,"ax",@progbits
foo10.cold:                             # %lpad
	.cfi_startproc
	.cfi_personality 3, __gxx_personality_v0
	.cfi_lsda 3, .Lexception5
	.cfi_def_cfa %rsp, 16
	nop <--- new instruction that is added
.Ltmp11:
	movq	%rax, %rdi
	callq	_Unwind_Resume@PLT
```

Reviewed By: modimo, snehasish, rahmanl

Differential Revision: https://reviews.llvm.org/D130133
2022-07-22 15:20:10 -07:00
Craig Topper be208b40c1 [DAGCombiner] Simplify code around call to reduceLoadWidth in visitAND. NFC
We were looking for loads or any_extend+load. reduceLoadWidth
hasn't known how to look through such an any_extend to find the
load since D40667 almost 5 years ago.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D130333
2022-07-22 08:36:56 -07:00
Nikita Popov c2be703c6c [AsmPrinter] Move lowerConstant() error code out of switch (NFC)
Move this out of the switch, so that different branches can
indicate an error by breaking out of the switch. This becomes
important if there are more than the two current error cases.
2022-07-22 16:08:28 +02:00
Cullen Rhodes bf268a05cd [AArch64] Emit vector FP cmp when LE is used with fast-math
Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D130093
2022-07-22 07:53:55 +00:00
jacquesguan e60eb7053d recommit "[DAGCombiner] Teach scalarizeBinOpOfSplats handle scalable splat."
With a fix for the AArch64 and Hexagon test cases.
2022-07-21 17:34:34 +08:00
David Green 23d6186be0 [SelectionDAG] Fix fptoi.sat scalable vector lowering
Vector fptosi_sat and fptoui_sat were being expanded by unrolling the
vector operation. This doesn't work for scalable vectors, so this patch
adds a call to TLI.expandFP_TO_INT_SAT if the vector is scalable.

Scalable tests are added for AArch64 and RISCV. Some of the AArch64
fptoi_sat operations should be legal, but that will be handled in
another patch.

Differential Revision: https://reviews.llvm.org/D130028
2022-07-21 08:00:22 +01:00
esmeyi 339392ecf2 [AIX] follow-up of D124654.
Emit the remaining aliases instead of reporting
an error, to avoid SPEC2017 PEAK failures,
and mark this as a TODO.
2022-07-21 01:10:09 -04:00
Simon Pilgrim 029e83b401 [DAG] getNode - don't bother creating ADDO(X,0) or SUBO(X,0) nodes.
Similar to what we already do in getNode for basic ADD/SUB nodes, return the X operand directly, but here we know that there will be no/zero overflow as well.

As noted on D127115 - this path is being exercised by llvm/test/CodeGen/ARM/dsp-mlal.ll, although I haven't been able to get any codegen diff without a topological worklist.
2022-07-20 12:04:33 +01:00
Simon Pilgrim 766cd95481 [DAG] getNode - assert that ADDO/SUBO nodes have the correct ops + types 2022-07-20 11:23:58 +01:00
Simon Pilgrim 9fc347aa4e [DAG] PromoteIntRes_BUILD_VECTOR - extend constant boolean vectors according to target BooleanContents
PromoteIntRes_BUILD_VECTOR currently always ANY_EXTENDs build vector operands, but if this is a constant boolean vector we're losing the useful ability to keep the vector matching the BooleanContents mode used by the target.

This patch extends constant boolean vectors according to target BooleanContents, allowing a number of additional all-bits folds (notably XOR -> NOT conversions) to occur.

Differential Revision: https://reviews.llvm.org/D129641
2022-07-20 10:49:31 +01:00
Lorenzo Albano 07d69d9fc9 [VP] Legalize the stride operand for EXPERIMENTAL_VP_STRIDED SDNodes
Add promotion and expansion of integer operands for
experimental_vp_strided SelectionDAG nodes; the expansion is actually
just a truncation of the stride operand.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D123112
2022-07-20 10:22:43 +02:00
Kazu Hirata 76e18cc4f6 [llvm] Use llvm::any_of and llvm::none_of (NFC) 2022-07-20 00:36:19 -07:00
Kazu Hirata 0387da6f4f Use value instead of getValue (NFC) 2022-07-19 21:18:26 -07:00
Kazu Hirata 41ae78ea3a Use has_value instead of hasValue (NFC) 2022-07-19 20:15:44 -07:00
Kazu Hirata bbbb4393ee [CodeGen] Use value_or instead of getValueOr (NFC) 2022-07-19 19:50:43 -07:00
David Truby 4c82f56d8f [llvm][SVE] Remove redundant and when comparing against extending load
When determining if an `and` should be merged into an extending load,
the constant argument to the `and` is currently not checked if the
argument requires truncation. This prevents the combine from happening when
the vector width is half the normal available vector width for SVE VLA
vectors.

Reviewed By: c-rhodes

Differential Revision: https://reviews.llvm.org/D129281
2022-07-19 17:08:32 +01:00
Simon Pilgrim 71c502cbca [DAG] Call SimplifyDemandedBits from ISD::MUL nodes
Noticed while triaging D129765.
2022-07-19 14:11:04 +01:00
Benjamin Kramer 8aff88fd3a [LegalizeDAG] Propagate alignment in ExpandExtractFromVectorThroughStack
Unlike what the name suggests, this can reuse any store as a base for a
memory-based vector extract. If that store is underaligned the loads
created to extract will have an invalid alignment. Since most CPUs are
forgiving wrt alignment this is almost never an issue, on x86 this is
only reproducible by extracting a 128 bit vector out of a wider vector.

I tried making a test case in the context of
https://reviews.llvm.org/D127982 but it's really really fragile, as the
output pretty much looks like a missed optimization.
2022-07-19 13:13:55 +02:00
Simon Pilgrim 0f6b0461b0 [DAG] SimplifyDemandedBits - relax "xor (X >> ShiftC), XorC --> (not X) >> ShiftC" to match only demanded bits
The "xor (X >> ShiftC), XorC --> (not X) >> ShiftC" fold is currently limited to the XOR mask being a shifted all-bits mask, but we can relax this to only need to match under the demanded bits.

This helps expose more bit extraction/clearing patterns and fixes the PowerPC testCompares*.ll regressions from D127115

Alive2: https://alive2.llvm.org/ce/z/fl7T7K

Differential Revision: https://reviews.llvm.org/D129933
2022-07-19 10:59:07 +01:00
Max Kazantsev 69b284aaf6 Revert "[DAGCombiner] Teach scalarizeBinOpOfSplats handle scalable splat."
This reverts commit 58dfaaaace.

Massive AArch64 test failures in buildbot.
2022-07-19 13:41:52 +07:00
jacquesguan 58dfaaaace [DAGCombiner] Teach scalarizeBinOpOfSplats handle scalable splat.
This revision supports scalarizing a binary operation of two scalable splat vectors.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D122791
2022-07-19 11:20:51 +08:00
Matt Arsenault 8d0383eb69 CodeGen: Remove AliasAnalysis from regalloc
This was stored in LiveIntervals, but not actually used for anything
related to LiveIntervals. It was only used in one check for if a load
instruction is rematerializable. I also don't think this was entirely
correct, since it was implicitly assuming constant loads are also
dereferenceable.

Remove this and rely only on the invariant+dereferenceable flags in
the memory operand. Set the flag based on the AA query upfront. This
should have the same net benefit, but has the possible disadvantage of
making this AA query nonlazy.

Preserve the behavior of assuming pointsToConstantMemory implying
dereferenceable for now, but maybe this should be changed.
2022-07-18 17:23:41 -04:00
Jay Foad dbed4326dd [LiveIntervals] Find better anchoring end points when repairing ranges
r175673 changed repairIntervalsInRange to find anchoring end points for
ranges automatically, but the calculation of Begin included the first
instruction found that already had an index. This patch changes it to
exclude that instruction:

1. For symmetry, so that the half open range [Begin,End) only includes
   instructions that do not already have indexes.
2. As a possible performance improvement, since repairOldRegInRange
   will scan fewer instructions.
3. Because repairOldRegInRange hits assertion failures in some cases
   when it sees a def that already has a live interval.

(3) fixes about ten tests in the CodeGen lit test suite when
-early-live-intervals is forced on.

Differential Revision: https://reviews.llvm.org/D110182
2022-07-18 19:34:43 +01:00
Itay Bookstein 2570f226d1 [SDAG] Remove single-result restriction on commutative CSE
The DAG Combiner unnecessarily restricts commutative CSE
to nodes with a single result value. This commit removes
that restriction.

Signed-off-by: Itay Bookstein <ibookstein@gmail.com>

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D129666
2022-07-18 19:19:13 +03:00
Lorenzo Albano c00a44fa68 [VP] IR expansion pass for VP gather and scatter
Add expansion of vp_gather and vp_scatter to unpredicated intrinsics.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D120664
2022-07-18 17:00:38 +02:00
Nikita Popov 56b4b6e81b [SDAG] Fix release build
This variable was only declared in debug builds, but is needed
in release builds as well.
2022-07-18 14:10:31 +02:00
Max Kazantsev d693fd29f1 [Verifier] Make Verifier recognize undef tokens as correct IR
Undef tokens may appear in unreachable code as a result of RAUW in some optimization,
and should not be considered bad IR.

Patch by Dmitry Bakunevich!

Differential Revision: https://reviews.llvm.org/D128904
Reviewed By: mkazantsev
2022-07-18 16:26:06 +07:00
Lorenzo Albano f390781cec [VP] Implementing expansion pass for VP load and store.
Added function to the ExpandVectorPredication pass to handle VP loads
and stores.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D109584
2022-07-18 08:47:54 +02:00
Craig Topper 7fa1c32634 [CodeGen] Remove unnecessary APInt copy. NFC 2022-07-17 23:41:53 -07:00
Craig Topper a55ff6aadd [Support][CodeGen] Fix spelling Divison->Division. NFC 2022-07-17 23:16:29 -07:00
Craig Topper 795602af0c [CodeGen] Don't compare bool with integer 0. NFC
The IsAdd field is a bool.
2022-07-17 23:16:14 -07:00
Kazu Hirata 3112987d5c Remove unused forward declarations (NFC) 2022-07-17 15:37:48 -07:00
Simon Pilgrim 53b90dd372 [DAG] Fold (or (and X, C1), (and (or X, Y), C2)) -> (or (and X, C1|C2), (and Y, C2))
Pulled out of D77804
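
A standalone brute-force check in plain C of the bitwise identity behind the fold (the two constants are arbitrary; the identity holds for any C1, C2):

```
#include <assert.h>
#include <stdint.h>

/* (X & C1) | ((X | Y) & C2)  ==  (X & (C1 | C2)) | (Y & C2), over 8 bits. */
int main(void) {
  const uint8_t C1 = 0xA5, C2 = 0x3C;
  for (unsigned x = 0; x < 256; ++x)
    for (unsigned y = 0; y < 256; ++y) {
      uint8_t X = (uint8_t)x, Y = (uint8_t)y;
      uint8_t lhs = (uint8_t)((X & C1) | ((X | Y) & C2));
      uint8_t rhs = (uint8_t)((X & (C1 | C2)) | (Y & C2));
      assert(lhs == rhs);
    }
  return 0;
}
```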

Alive2: https://alive2.llvm.org/ce/z/g61VRe
2022-07-17 18:51:41 +01:00
Simon Pilgrim 26ce33706f [DAG] computeKnownBits - move UDIV handling to same place as UREM/SREM. NFC. 2022-07-17 11:59:42 +01:00
Simon Pilgrim 5ec47c6dc5 [DAG] Add MERGE_VALUE computeKnownBits/ComputeNumSignBits handling.
Just forward the value tracking to the operand specified by the ResNo
2022-07-17 11:58:08 +01:00
Kazu Hirata 9e6d1f4b5d [CodeGen] Qualify auto variables in for loops (NFC) 2022-07-17 01:33:28 -07:00
Kazu Hirata c0fe37de04 [CodeGen] Remove redundant declaration createGreedyRegisterAllocator (NFC)
The function is declared in llvm/include/llvm/CodeGen/Passes.h.

Identified with readability-redundant-declaration.
2022-07-16 15:43:34 -07:00
Kazu Hirata 4d9d07c5fb [CodeGen] Use RegClassFilterFunc where appropriate (NFC) 2022-07-16 15:43:33 -07:00
Sanjay Patel 7ca3e23f25 [SDAG] narrow truncated sign_extend_inreg
trunc (sign_ext_inreg X, iM) to iN --> sign_ext_inreg (trunc X to iN), iM

There are improvements on existing tests from this, and there are a pair
of large regressions in D127115 for Thumb2 caused by not folding this
pattern.

Differential Revision: https://reviews.llvm.org/D129890
2022-07-16 16:29:15 -04:00
Simon Pilgrim a44bdf9bc1 [DAG] visitINSERT_VECTOR_ELT - refactor BUILD_VECTOR creation from INSERT_VECTOR_ELT chain.
D127595 added the ability to recurse up a (one-use) INSERT_VECTOR_ELT chain to create a BUILD_VECTOR before other combines manage to break the chain, something that is particularly bad in D127115.

The patch generalises this so it doesn't have to build the chain starting from the last element insertion; instead it can now start from any insertion and will recurse up the chain until it finds all elements or finds an UNDEF/BUILD_VECTOR/SCALAR_TO_VECTOR which represents the start of the chain.

Fixes several regressions in D127115
2022-07-16 16:37:31 +01:00
Simon Pilgrim 52b6168c16 [DAG] visitINSERT_VECTOR_ELT - remove duplicate VT.getVectorNumElements() call. NFC. 2022-07-16 16:20:49 +01:00
Tim Besard a323dfc015 Don't sink ptrtoint/inttoptr sequences into non-noop addrspacecasts.
In https://reviews.llvm.org/D30114, support for mismatching address
spaces was introduced to CodeGenPrepare's optimizeMemoryInst, using
addrspacecast as it was argued that only no-op addrspacecasts would be
considered when constructing the address mode. However, by doing
inttoptr/ptrtoint, it's possible to get CGP to emit an addrspacecast
that's not actually a no-op, introducing a miscompilation:

define void @kernel(i8* %julia_ptr) {
  %intptr = ptrtoint i8* %julia_ptr to i64
  %ptr = inttoptr i64 %intptr to i32 addrspace(3)*

  br label %end
end:

  store atomic i32 1, i32 addrspace(3)* %ptr unordered, align 4
  ret void
}

Gets compiled to:

define void @kernel(i8* %julia_ptr) {
end:
  %0 = addrspacecast i8* %julia_ptr to i32 addrspace(3)*
  store atomic i32 1, i32 addrspace(3)* %0 unordered, align 4
  ret void
}

In the case of NVPTX, this introduces a cvta.to.shared, whereas
leaving out the %end block and branch doesn't trigger this
optimization. This results in illegal memory accesses as seen in
https://github.com/JuliaGPU/CUDA.jl/issues/558

In this change, I introduced a check before doing the pointer cast
that verifies address spaces are the same. If not, it emits a
ptrtoint/inttoptr combination to get a no-op cast between address
spaces. I decided against disallowing ptrtoint/inttoptr with
non-default AS in matchOperationAddr, because now it's still possible
to look through multiple sequences of them that ultimately do not
result in an address space mismatch (i.e. the second lit test).
2022-07-16 10:56:42 -04:00
Simon Pilgrim 2bb6b03d71 Fix signed/unsigned mismatch 2022-07-16 11:48:41 +01:00
Simon Pilgrim a5d0122f75 [DAG] Canonicalize non-inlane shuffle -> AND if all non-inlane referenced elements are known zero
As mentioned on D127115, this patch attempts to recognise shuffle masks that could be simplified to an AND mask - we already have a similar transform that will fold AND -> 'clear mask' shuffle, but this patch handles cases where the referenced elements are not from the same lane indices but are known to be zero.

Differential Revision: https://reviews.llvm.org/D129150
2022-07-16 11:38:24 +01:00
Simon Pilgrim 1cb7416ee3 [DAG] combineShiftAnd1ToBitTest - match "and (srl (not X), C)), 1 --> (and X, 1<<C) == 0" patterns
combineShiftAnd1ToBitTest already matches "and (not (srl X, C)), 1 --> (and X, 1<<C) == 0" patterns, but we can end up with situations where the not is before the shift.
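
A standalone brute-force check in plain C of the equivalence being matched:

```
#include <assert.h>
#include <stdint.h>

/* Testing bit C of the inverted value is the same as testing whether
 * bit C of the original value is zero. */
int main(void) {
  for (uint32_t x = 0; x < 4096; ++x)
    for (unsigned c = 0; c < 12; ++c)
      assert(((~x >> c) & 1u) == (uint32_t)((x & (1u << c)) == 0));
  return 0;
}
```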

Part of some yak shaving for D127115 to generalise the "xor (X >> ShiftC), XorC --> (not X) >> ShiftC" fold.
2022-07-16 11:00:07 +01:00
Kazu Hirata 1a5d007659 Use has_value/value instead of hasValue/getValue (NFC) 2022-07-15 21:48:17 -07:00
Simon Pilgrim 3c8bf29696 [DAG] Move "xor (X logical_shift ShiftC), XorC --> (not X) logical_shift ShiftC" fold into SimplifyDemandedBits
SimplifyDemandedBits is called slightly later which allows the not(sext(x)) -> sext(not(x)) fold to occur via foldLogicOfShifts

As mentioned on D127115, we should be able to further generalise this based off the demanded bits.
2022-07-15 13:10:15 +01:00
Edd Barrett 2e62a26fd7
[stackmaps] Legalise patchpoint arguments.
This is similar to D125680, but for llvm.experimental.patchpoint
(instead of llvm.experimental.stackmap).

Differential review: https://reviews.llvm.org/D129268
2022-07-15 12:01:59 +01:00
Nikita Popov 2a721374ae [IR] Don't use blockaddresses as callbr arguments
Following some recent discussions, this changes the representation
of callbrs in IR. The current blockaddress arguments are replaced
with `!` label constraints that refer directly to callbr indirect
destinations:

    ; Before:
    %res = callbr i8* asm "", "=r,r,i"(i8* %x, i8* blockaddress(@test8, %foo))
    to label %asm.fallthrough [label %foo]
    ; After:
    %res = callbr i8* asm "", "=r,r,!i"(i8* %x)
    to label %asm.fallthrough [label %foo]

The benefit of this is that we can easily update the successors of
a callbr, without having to worry about also updating blockaddress
references. This should allow us to remove some limitations:

* Allow unrolling/peeling/rotation of callbr, or any other
  clone-based optimizations
  (https://github.com/llvm/llvm-project/issues/41834)
* Allow duplicate successors
  (https://github.com/llvm/llvm-project/issues/45248)

This is just the IR representation change though, I will follow up
with patches to remove limitations in various transformation passes
that are no longer needed.

Differential Revision: https://reviews.llvm.org/D129288
2022-07-15 10:18:17 +02:00
Craig Topper dcfc1fd26f [SelectionDAG][RISCV][AMDGPU][ARM] Improve SimplifyDemandedBits for SHL with variable shift amount.
If we have a variable shift amount and the demanded mask has leading
zeros, we can propagate those leading zeros to not demand those bits
from operand 0. This can allow zero_extend/sign_extend to become
any_extend. This pattern can occur due to C integer promotion rules.
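
A hypothetical C-level source of this pattern, for illustration only (the function and constants below are made up, not taken from any test case):

```
#include <stdint.h>

/* The uint8_t operand is promoted to int before the variable shift, but only
 * the low 8 bits of the result are demanded, so the upper bits of the
 * promotion never matter and the zero/sign extension can relax to any_extend. */
uint8_t shl_low_byte(uint8_t x, unsigned amt) {
  return (uint8_t)(x << (amt & 7));
}
```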

This transform is already done by InstCombineSimplifyDemanded.cpp where
sign_extend can be turned into zero_extend for example.

Reviewed By: spatel, foad

Differential Revision: https://reviews.llvm.org/D121833
2022-07-14 16:10:14 -07:00
Amara Emerson d4f84df0a0 [GlobalISel] Change widenScalar of G_FCONSTANT to mutate into G_CONSTANT.
Widening a G_FCONSTANT by extending and then generating G_FPTRUNC doesn't produce
the same result all the time. Instead, we can just transform it to a G_CONSTANT
of the same bit pattern and truncate using a plain G_TRUNC instead.

Fixes https://github.com/llvm/llvm-project/issues/56454

Differential Revision: https://reviews.llvm.org/D129743
2022-07-14 11:05:10 -07:00
Guozhi Wei 2f11b3a6d7 [MachineCombiner] Don't compute the latency of transient instructions
If an MI will not generate a target instruction, we should not compute its
latency. Then we can compute more precise instruction sequence cost, and get
better result.

Differential Revision: https://reviews.llvm.org/D129615
2022-07-14 17:08:14 +00:00
Nikita Popov dcf4b733ef [SCEVExpander] Make CanonicalMode handling in isSafeToExpand() more robust (PR50506)
isSafeToExpand() for addrecs depends on whether the SCEVExpander
will be used in CanonicalMode. At least one caller currently gets
this wrong, resulting in PR50506.

Fix this by a) making the CanonicalMode argument on the freestanding
functions required and b) adding member functions on SCEVExpander
that automatically take the SCEVExpander mode into account. We can
use the latter variant nearly everywhere, and thus make sure that
there is no chance of CanonicalMode mismatch.

Fixes https://github.com/llvm/llvm-project/issues/50506.

Differential Revision: https://reviews.llvm.org/D129630
2022-07-14 14:41:51 +02:00
Jannik Silvanus e5c4cde451 [AMDGPU] SIMachineScheduler: Add support for several MachineScheduler features
The SI machine scheduler inherits from ScheduleDAGMI.
This patch adds support for a few features that are implemented
in ScheduleDAGMI (or its base classes) that were missing so far
because their support is implemented in overridden functions.

* Support cl::opt -view-misched-dags
  This option allows opening a graphical window of the scheduling DAG.

* Support cl::opt -misched-print-dags
  This option allows printing the scheduling DAG in text form.

* After constructing the scheduling DAG, call postprocessDAG()
  to apply any registered DAG mutations.
  Note that currently there are no mutations defined in AMDGPUTargetMachine.cpp
  in case SIScheduler is used.
  Still add this to avoid surprises in the future in case mutations are added.

Differential Revision: https://reviews.llvm.org/D128808
2022-07-14 09:45:31 +02:00
Kazu Hirata 611ffcf4e4 [llvm] Use value instead of getValue (NFC) 2022-07-13 23:11:56 -07:00
Amara Emerson 2824bdd92f [GlobalISel] Fix and(load)->zextload combine crash.
We shouldn't use getOpcodeDef() if we need to guarantee the def has only one
user since under the hood it may look through copies and optimization hints,
which themselves may have multiple users.
2022-07-13 14:58:45 -07:00
Philip Reames dde2a7fb6d [RISCV] Exploit fact that vscale is always power of two to replace urem sequence
When doing scalable vectorization, the loop vectorizer uses a urem in the computation of the vector trip count. The RHS of that urem is a (possibly shifted) call to @llvm.vscale.

vscale is effectively the number of "blocks" in the vector register. (That is, types such as <vscale x 8 x i8> and <vscale x 1 x i8> both fill one 64 bit block, and vscale is essentially how many of those blocks there are in a single vector register at runtime.)

We know from the RISCV V extension specification that VLEN must be a power of two between ELEN and 2^16. Since our block size is 64 bits, there must be a power-of-two number of blocks. (For everything other than VLEN<=32, but that's already broken.)
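
A standalone check in plain C of the arithmetic identity the replacement relies on (here the power-of-two divisor stands in for the shifted vscale value):

```
#include <assert.h>
#include <stdint.h>

/* For a power-of-two divisor p, x % p == x & (p - 1). */
int main(void) {
  for (uint64_t x = 0; x < 4096; ++x)
    for (unsigned s = 0; s < 12; ++s) {
      uint64_t p = 1ull << s;
      assert(x % p == (x & (p - 1)));
    }
  return 0;
}
```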

It is worth noting that the AArch64 SVE specification explicitly allows non-power-of-two sizes for the vector registers, so AArch64 can't claim that vscale is a power of two by this logic.

Differential Revision: https://reviews.llvm.org/D129609
2022-07-13 10:54:47 -07:00
Simon Pilgrim d172842b51 [DAG] SimplifyDemandedVectorElts - adjust demanded elements for selection mask for known zero results
If an element is known zero from both selections then it shouldn't matter what the selection mask element is.
2022-07-13 17:36:05 +01:00
Philip Reames fd67992f9c [DAGCombine] fold (urem x, (lshr pow2, y)) -> (and x, (add (lshr pow2, y), -1))
We have the same fold in InstCombine - though implemented via OrZero flag on isKnownToBePowerOfTwo. The reasoning here is that either a) the result of the lshr is a power-of-two, or b) we have a div-by-zero triggering UB which we can ignore.

Differential Revision: https://reviews.llvm.org/D129606
2022-07-13 08:34:38 -07:00
esmeyi 100319cdb4 [AIX] follow-up of D124654.
Report an error when alias symbols are not all emitted.
2022-07-13 03:39:08 -04:00
Kai Nacke 4ae254e488 Revert "[GISel] Unify use of getStackGuard"
This reverts commit e60b4fb2b7.
2022-07-12 17:00:43 -04:00
Kai Nacke e60b4fb2b7 [GISel] Unify use of getStackGuard
Some rework of getStackGuard() based on comments in
https://reviews.llvm.org/D129505.

- getStackGuard() now creates and returns the destination
  register, simplifying calls
- the pointer type is passed to getStackGuard() to avoid
  recomputation
- removed PtrMemTy in emitSPDescriptorParent(), because
  this type is only used here when loading the value but
  not when storing the value

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D129576
2022-07-12 16:46:37 -04:00
Craig Topper 8eaf00e04d [TargetLowering][RISCV] Make expandCTLZ work for non-power of 2 types.
To convert CTLZ to popcount we do

x = x | (x >> 1);
x = x | (x >> 2);
...
x = x | (x >>16);
x = x | (x >>32); // for 64-bit input
return popcount(~x);

This smears the most significant set bit across all of the bits
below it then inverts the remaining 0s and does a population count.

To support non-power of 2 types, the last shift amount must be
more than half of the size of the type. For i15, the last shift
was previously a shift by 4, with this patch we add another shift
of 8.
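
A runnable standalone sketch in plain C of the same expansion for a 32-bit value (not the TargetLowering code itself):

```
#include <assert.h>
#include <stdint.h>

/* The shift-or cascade smears the highest set bit downwards; the popcount of
 * the inverted result is the number of leading zeros. */
static unsigned ctlz32(uint32_t x) {
  x |= x >> 1;
  x |= x >> 2;
  x |= x >> 4;
  x |= x >> 8;
  x |= x >> 16;
  return (unsigned)__builtin_popcount(~x);
}

int main(void) {
  assert(ctlz32(0) == 32);
  assert(ctlz32(1) == 31);
  assert(ctlz32(0x80000000u) == 0);
  return 0;
}
```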

Fixes PR56457.

Differential Revision: https://reviews.llvm.org/D129431
2022-07-12 11:36:37 -07:00
Kai Nacke 42f7364fcb [GISel] Check useLoadStackGuardNode() before generating LOAD_STACK_GUARD
When lowering the llvm::stackprotect intrinsic, the SDAG implementation
checks useLoadStackGuardNode() to either create a LOAD_STACK_GUARD or use
the first argument of the intrinsic. This check is not present in the
IRTranslator, which results in always generating a LOAD_STACK_GUARD even
if the target does not support it.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D129505
2022-07-12 11:44:42 -04:00
Simon Pilgrim ded62411f7 [DAG] SimplifyDemandedBits - AND/OR/XOR - attempt basic knownbits simplifications before calling SimplifyMultipleUseDemandedBits
Noticed while investigating the SystemZ regressions in D77804, prefer handling the knownbits analysis/simplification in the bitop nodes directly before falling back to SimplifyMultipleUseDemandedBits
2022-07-12 14:09:00 +01:00
Jay Foad 0d1b5268e8 [MachineVerifier] Try harder to verify LiveStacks
Verify the LiveStacks analysis after a pass that claims to preserve it,
even if there are no further passes (apart from the verifier itself)
that would use the analysis.

Differential Revision: https://reviews.llvm.org/D129200
2022-07-12 09:54:54 +01:00
Nikita Popov c64aba5d93 [SDAG] Don't duplicate ParseConstraints() implementation SDAGBuilder (NFCI)
visitInlineAsm() in SDAGBuilder was duplicating a lot of the code
in ParseConstraints(), in particular all the logic to determine the
operand value and constraint VT.

Rely on the data computed by ParseConstraints() instead, and update
its ConstraintVT implementation to match getCallOperandValEVT()
more precisely.
2022-07-12 10:42:02 +02:00
Craig Topper b05160dbdf [SelectionDAG] Simplify how we drop poison flags in SimplifyDemandedBits.
As far as I can tell what was happening in the original code is
that the getNode call receives the same operands as the original
node with different SDNodeFlags. The logic inside getNode detects
that the node already exists and intersects the flags into the
existing node and returns it. This results in Op and NewOp for the
TLO.CombineTo call always being the same node.

We may have already called CombineTo as part of the recursive handling.
A second call to CombineTo as we unwind the recursion overwrites
the previous CombineTo. I think this means any time we updated the
poison flags that was the only change that ends up getting made
and we relied on DAGCombiner to revisit and call SimplifyDemandedBits
again. The second time the poison flags wouldn't need to be dropped
and we would keep the CombineTo call from further down the recursion.

We can instead call setFlags to drop the poison flags and remove the
call to TLO.CombineTo. This way we keep the CombineTo from deeper in
the recursion which should be more efficient.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D129511
2022-07-11 13:42:33 -07:00
Sanjay Patel d0eec5f7e7 [SDAG] enhance sub->xor fold to ignore signbit
As suggested in the post-commit feedback for D128123,
we can ease the mask constraint to ignore the MSB
(and make the code easier to read by adjusting the check).

https://alive2.llvm.org/ce/z/bbvqWv
2022-07-11 12:37:50 -04:00
Mircea Trofin 24c6c35270 [mlgo] Don't provide default model URLs
Pointed out in Issue #56432: the current reference models may not be
quite friendly to open source projects. Their purpose is only
illustrative - the expectation is that projects would train their own.
To avoid unintentionally pulling such a model, made the URL cmake
setting require explicit user setting.

Differential Revision: https://reviews.llvm.org/D129342
2022-07-11 07:37:14 -07:00
Stephen Tozer f9ac161af9 [DebugInfo][InstrRef] Fix error in copy handling in InstrRefLDV
Currently, an error exists when InstrRefBasedLDV observes transfers of
variables across copies, which causes it to lose track of variables
under certain circumstances, resulting in shorter lifetimes for those
variables as LDV gives up searching for live locations for them. This
patch fixes this issue by storing the currently tracked values in
the destination first, then updating them manually later without
clobbering or assigning them the wrong value.

Differential Revision: https://reviews.llvm.org/D128101
2022-07-11 13:38:23 +01:00
Kazu Hirata 5b55b7f6d2 [CodeGen] Remove unused member variable NextCascade (NFC) 2022-07-10 18:57:40 -07:00
Kazu Hirata 1fd6611fc8 [SelectionDAG] Restore calls to has_value (NFC)
This patch restores calls to has_value to make it clear that we are
checking the presence of an optional value, not the underlying value.

This patch partially reverts d08f34b592.

Differential Revision: https://reviews.llvm.org/D129454
2022-07-10 14:37:23 -07:00
David Green 28b41237e6 [InterleaveAccessPass] Handle multi-use binop shuffles
D89489 added some logic to the interleaved access pass to attempt to
undo the folding of shuffles into binops that instcombine performs. If
early-cse is run too, the binops may be commoned into a single operation
with multiple shuffle uses. It is still profitable to reverse the transform
though, so long as all the uses are shuffles.

Differential Revision: https://reviews.llvm.org/D129419
2022-07-10 17:24:37 +01:00
Nicolai Hähnle ede600377c ManagedStatic: remove many straightforward uses in llvm
(Reapply after revert in e9ce1a5880 due to
Fuchsia test failures. Removed changes in lib/ExecutionEngine/ other
than error categories, to be checked in more detail and reapplied
separately.)

Bulk remove many of the more trivial uses of ManagedStatic in the llvm
directory, either by defining a new getter function or, in many cases,
moving the static variable directly into the only function that uses it.

Differential Revision: https://reviews.llvm.org/D129120
2022-07-10 10:29:15 +02:00
Nicolai Hähnle e9ce1a5880 Revert "ManagedStatic: remove many straightforward uses in llvm"
This reverts commit e6f1f06245.

Reverting due to a failure on the fuchsia-x86_64-linux buildbot.
2022-07-10 09:54:30 +02:00
Nicolai Hähnle e6f1f06245 ManagedStatic: remove many straightforward uses in llvm
Bulk remove many of the more trivial uses of ManagedStatic in the llvm
directory, either by defining a new getter function or, in many cases,
moving the static variable directly into the only function that uses it.

Differential Revision: https://reviews.llvm.org/D129120
2022-07-10 09:15:08 +02:00
Craig Topper 40866b74bd [DAGCombiner][X86] Fold sra (sub AddC, (shl X, N1C)), N1C --> sext (sub AddC1',(trunc X to (width - N1C)))
We already handled this case for add with a constant RHS. A
similar pattern can occur for sub with a constant left hand side.

Test cases use add and a mul representing (neg (shl X, C)) because
that's what I saw in the wild. The mul will be decomposed and then
the new transform can kick in.

Tests have not been committed, but this patch shows the changes.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D128769
2022-07-09 11:53:44 -07:00
Alexander Yermolovich a84e1e6c0d [DWARF] Add linkagename to hash
Originally encountered with Rust, but there are also cases with distributed LTO
where debug info dwo units contain structurally the same debug information,
differing only in DW_AT_linkage_name. This causes collisions on DWO ID.

Differential Revision: https://reviews.llvm.org/D129317
2022-07-08 10:15:25 -07:00
Matt Arsenault 13ac4c3de9 GlobalISel: Add buildBoolExtInReg helper 2022-07-08 11:55:08 -04:00
Matt Arsenault e9a45d45d0 GlobalISel: Allow forming atomic/volatile G_SEXTLOAD
Mirror the change to G_ZEXTLOAD.
2022-07-08 11:55:08 -04:00
Matt Arsenault 1ee6ce9bad GlobalISel: Allow forming atomic/volatile G_ZEXTLOAD
SelectionDAG has a target hook, getExtendForAtomicOps, which it uses
in the computeKnownBits implementation for ATOMIC_LOAD. This is pretty
ugly (as is having a separate load opcode for atomics), so instead
allow making use of atomic zextload. Enable this for AArch64 since the
DAG path defaults in to the zext behavior.

The tablegen changes are pretty ugly, but partially helps migrate
SelectionDAG from using ISD::ATOMIC_LOAD to regular ISD::LOAD with
atomic memory operands. For now the DAG emitter will emit matchers for
patterns which the DAG will not produce.

I'm still a bit confused by the intent of the isLoad/isStore/isAtomic
bits. The DAG implementation rejects trying to use any of these in
combination. For now I've opted to make the isLoad checks also check
isAtomic, although I think having isLoad and isAtomic set on these
makes most sense.
2022-07-08 11:55:08 -04:00
Simon Pilgrim b53046122f [DAG] SimplifyDemandedBits - fold AND(INSERT_SUBVECTOR(C,X,I),M) -> INSERT_SUBVECTOR(AND(C,M),X,I)
If all the demanded bits of the AND mask covering the inserted subvector 'X' are known to be one, then the mask isn't affecting the subvector at all.

In which case, if the base vector 'C' is undef/constant, then move the AND mask up to just (constant) fold it directly.

Addresses some of the regressions from D129150, particularly the cases where we're attempting to zero the upper elements of a widened vector.

Differential Revision: https://reviews.llvm.org/D129290
2022-07-08 16:08:31 +01:00
Daniil Fukalov 6858a17f66 [LiveIntervals] Fix incorrect range (re)construction from subranges.
After D82916 `updateAllRanges()` started to fix holes in the main range with
subranges, but it fails on instructions with two subreg defs which are parts of
one reg. The main range is constructed with //all// subranges of the subregs just after
processing the first operand. So the main range gets intervals from subranges
that are not updated yet.

The patch takes into account lane mask to update the main range.

Reviewed By: rampitec, arsenm

Differential Revision: https://reviews.llvm.org/D128553
2022-07-08 16:07:19 +03:00
Sanjay Patel 8b75671314 [SDAG] try to replace subtract-from-constant with xor
This is almost the same as the abandoned D48529, but it
allows splat vector constants too.

This replaces the x86-specific code that was added with
the alternate patch D48557 with the original generic
combine.

This transform is a less restricted form of an existing
InstCombine and the proposed SDAG equivalent for that
in D128080:
https://alive2.llvm.org/ce/z/OUm6N_
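
A standalone brute-force check in plain C, over an 8-bit domain, of the no-borrow identity this kind of fold relies on (the constant is arbitrary; the exact legality condition in the DAG code may be narrower):

```
#include <assert.h>
#include <stdint.h>

/* When every possibly-set bit of x is covered by the constant C (so no borrow
 * can occur), C - x equals C ^ x. */
int main(void) {
  const uint8_t C = 0x7B;
  for (unsigned x = 0; x < 256; ++x) {
    if (x & (uint8_t)~C)
      continue; /* x has a bit outside C: the identity does not apply */
    assert((uint8_t)(C - x) == (uint8_t)(C ^ x));
  }
  return 0;
}
```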

Differential Revision: https://reviews.llvm.org/D128123
2022-07-08 08:14:24 -04:00
OCHyams 6b62ca9043 [NFC][SelectionDAG] Fix debug prints in salvageUnresolvedDbgValue
The prints are printing pointer values - fix by dereferencing the pointers.
2022-07-08 12:09:30 +01:00
Petar Avramovic 2483f43d47 [AArch64][GlobalISel] Fix call lowering for <3 x i32> vector arguments
Differential Revision: https://reviews.llvm.org/D129194
2022-07-08 10:25:45 +02:00
Sergei Barannikov 2247fdc84d [SelectionDAG] computeKnownBits / ComputeNumSignBits for the remaining overflow-aware nodes
Some overflow-aware nodes were missing from the switches in
computeKnownBits and ComputeNumSignBits.
2022-07-08 09:19:19 +01:00
Joseph Huber 41fba3c107 [Metadata] Add 'exclude' metadata to add the exclude flags on globals
This patch adds a new metadata kind `exclude` which implies that the
global variable should be given the necessary flags during code
generation to not be included in the final executable. This is done
using the ``SHF_EXCLUDE`` flag on ELF for example. This should make it
easier to specify this flag on a variable without needing to explicitly
check the section name in the target backend.

Depends on D129053 D129052

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D129151
2022-07-07 12:20:40 -04:00
Joseph Huber 1d2ce4da84 [Object] Add ELF section type for offloading objects
Currently we use the `.llvm.offloading` section to store device-side
objects inside the host, creating a fat binary. The contents of these
sections is currently determined by the name of the section while it
should ideally be determined by its type. This patch adds the new
`SHT_LLVM_OFFLOADING` section type to the ELF section types. Which
should make it easier to identify this specific data format.

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D129052
2022-07-07 12:20:30 -04:00
Bradley Smith 60d6be5dd3 [LegalizeTypes] Replace vecreduce_xor/or/and with vecreduce_add/umax/umin if not legal
This is done during type legalization since the target representation of
these nodes may not be valid until after type legalization, and after
type legalization the fact that these are dealing with i1 types may be
lost.

Differential Revision: https://reviews.llvm.org/D128996
2022-07-07 09:33:54 +00:00
Sander de Smalen 15c3ba8a44 [AArch64] Legalisation of compares and truncates of nxv1i1 types.
Truncates and compares require some changes to generic legalisation functions
to use ElementCount instead of getNumElements.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D129082
2022-07-07 07:39:27 +00:00
Eli Friedman 696f53665d [AsmPrinter] Fix bit pattern for i1 vectors.
Vectors are defined to be tightly packed, regardless of the element
type.  The AsmPrinter didn't realize this, and was allocating extra
padding.

Fixes https://github.com/llvm/llvm-project/issues/49286
Fixes https://github.com/llvm/llvm-project/issues/53246
Fixes https://github.com/llvm/llvm-project/issues/55522

Differential Revision: https://reviews.llvm.org/D129164
2022-07-06 12:56:47 -07:00
Edd Barrett ed8ef65f3d
[stackmaps] Start legalizing live variable operands
Prior to this change, live variable operands passed to
`llvm.experimental.stackmap` would be emitted directly to target nodes,
meaning that they don't get legalised. The upshot of this is that LLVM
may crash when encountering illegally typed target nodes.

e.g. https://github.com/llvm/llvm-project/issues/21657

This change introduces a platform independent stackmap DAG node whose
operands are legalised as per usual, thus avoiding aforementioned
crashes.

Note that some kinds of argument are still not handled properly, namely
vectors, structs, and large integers, like i128s. These will need to be
addressed in follow-up changes.

Note also that this does not change the behaviour of
`llvm.experimental.patchpoint`. A follow up change will do the same for
this intrinsic.

Differential review:
https://reviews.llvm.org/D125680
2022-07-06 14:01:54 +01:00
Shilei Tian 1023ddaf77 [LLVM] Add the support for fmax and fmin in atomicrmw instruction
This patch adds support for `fmax` and `fmin` operations in the `atomicrmw`
instruction. For now (at least in this patch), the instruction will be expanded
to a CAS loop. There are already a couple of targets supporting the feature;
I'll create further patches to enable them accordingly.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D127041
2022-07-06 10:57:53 -04:00
Nikita Popov f96cb66d19 [ValueTracking] Accept Instruction in isSafeToSpeculativelyExecute() (NFC)
As constant expressions can no longer trap, it only makes sense to
call isSafeToSpeculativelyExecute on Instructions, so limit the
API to accept only them, rather than general Operators or Values.
2022-07-06 11:12:49 +02:00
Nikita Popov bb84e5eeff [SelectionDAGISel] Drop unused variable (NFC) 2022-07-06 10:46:13 +02:00
Nikita Popov 8ee913d83b [IR] Remove Constant::canTrap() (NFC)
As integer div/rem constant expressions are no longer supported,
constants can no longer trap and are always safe to speculate.
Remove the Constant::canTrap() method and its usages.
2022-07-06 10:36:47 +02:00
Simon Pilgrim 7068c843d2 [DAG] visitREM - use isAllOnesOrAllOnesSplat instead of isConstOrConstSplat
We were only using the N1C scalar/splat value once, so for clarity use isAllOnesOrAllOnesSplat instead if we actually need it.
2022-07-05 16:44:31 +01:00
Simon Pilgrim e7a0fa4df0 [DAG] foldAddSubOfSignBit - don't bother creating the new shift node unless constant folding succeeds
Noticed by inspection - the new shift is only ever used if the constant fold occurs
2022-07-05 16:44:31 +01:00
Thomas Symalla 04c5fed5e0 [NFC] Fix wrong comment. 2022-07-05 13:37:44 +02:00
Simon Pilgrim cce64e7a9c [DAG] visitTRUNCATE - move GetDemandedBits AFTER SimplifyDemandedBits.
Another cleanup step before removing GetDemandedBits entirely.
2022-07-04 11:25:40 +01:00
Nikita Popov 7283f48a05 [IR] Remove support for insertvalue constant expression
This removes the insertvalue constant expression, as part of
https://discourse.llvm.org/t/rfc-remove-most-constant-expressions/63179.
This is very similar to the extractvalue removal from D125795.
insertvalue is also not supported in bitcode, so no auto-ugprade
is necessary.

ConstantExpr::getInsertValue() can be replaced with
IRBuilder::CreateInsertValue() or ConstantFoldInsertValueInstruction(),
depending on whether a constant result is required (with the latter
being fallible).

The ConstantExpr::hasIndices() and ConstantExpr::getIndices()
methods also go away here, because there are no longer any constant
expressions with indices.

Differential Revision: https://reviews.llvm.org/D128719
2022-07-04 09:27:22 +02:00
esmeyi d2a35e4d39 [AIX] Handling the label alignment of a global
variable with its multiple aliases.

This patch handles the case where a variable has
multiple aliases.
AIX's assembly directive .set is not usable for the
aliasing purpose, and using different labels allows
AIX to emulate symbol aliases. If a value is emitted
between any two labels, meaning they are not aligned,
XCOFF will automatically calculate the offset for them.

This patch implements:
1) Emits the label of the alias just before emitting
the value of the sub-element that the alias referred to.
2) A set of aliases that refers to the same offset
should be aligned.
3) We didn't emit aliasing labels for common and
zero-initialized local symbols in
PPCAIXAsmPrinter::emitGlobalVariableHelper, but
emitted linkage for them in
AsmPrinter::emitGlobalAlias, which caused a FAILURE.
This patch fixes the bug by blocking emitting linkage
for the alias without a label.

Reviewed By: shchenz

Differential Revision: https://reviews.llvm.org/D124654
2022-07-03 23:16:16 -04:00
Quentin Colombet f4145ddf5b [GISel] Don't fold convergent instruction across CFG
Before merging two instructions together, GISel does some sanity checks
that the folding is legal. However, that check did not account for the
source of the pattern being convergent. When the destination location
is in a different basic block, the folding is invalid.

Differential Revision: https://reviews.llvm.org/D128539
2022-07-01 10:24:24 -07:00
Sander de Smalen 690db16422 [AArch64] Make nxv1i1 types a legal type for SVE.
One motivation to add support for these types is the LD1Q/ST1Q
instructions in SME, for which we have defined a number of load/store
intrinsics which at the moment still take a `<vscale x 16 x i1>` predicate
regardless of their element type.

This patch adds basic support for the nxv1i1 type such that it can be passed/returned
from functions, as well as some basic support to support some existing tests that
result in a nxv1i1 type. It also adds support for splats.

Other operations (e.g. insert/extract subvector, logical ops, etc) will be
supported in follow-up patches.

Reviewed By: paulwalker-arm, efriedma

Differential Revision: https://reviews.llvm.org/D128665
2022-07-01 15:11:13 +00:00
Xiang1 Zhang 72a23cef7e [ISel] Match all bits when merge undefs for DAG combine
Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D128570
2022-07-01 09:09:43 +08:00
Xiang1 Zhang 64f44a90ef Revert "[ISel] Match all bits when merge undef(s) for DAG combine"
This reverts commit 5fe5aa284e.
2022-07-01 08:59:04 +08:00
Xiang1 Zhang 5fe5aa284e [ISel] Match all bits when merge undef(s) for DAG combine 2022-07-01 08:58:00 +08:00
Nuno Lopes 373571dbb4 [NFC] Switch a few uses of undef to poison as placeholders for unreachable code 2022-06-30 23:01:43 +01:00
jeff 09424f802c [AMDGPU] Check for CopyToReg PhysReg clobbers in pre-RA-sched
Differential Revision: https://reviews.llvm.org/D128681
2022-06-30 09:18:04 -07:00
Luo, Yuanke fa8656d28d [greedyalloc] Return early when there is no register to allocate.
In X86 we split greedy register allocation into 2 passes. The 1st pass
is to allocate tile register, and the 2nd pass is to allocate the rest
of virtual register. In most cases there is no tile register, so the 1st
pass is unnecessary. To improve the compiling time, we check if there is
any register need to be allocated by invoking callback
`ShouldAllocateClass`. If there is no register to be allocated, just
return false in the pass. This would improve the 1st greedy RA pass for
normal cases.

Differential Revision: https://reviews.llvm.org/D128804
2022-06-30 11:12:05 +08:00
Stefan Pintilie e50a8c8435 [GlobalMerge] Ensure that the MustKeepGlobalVariables has all globals from each landingpad clause.
The filter clause in the landingpad may not have a GlobalVariable operand.
It may instead have a ConstantArray of operands and each operand within this
ConstantArray should also be checked to see if it is a GlobalVariable.

This patch adds the check for the ConstantArray, as well as a debug message that
outputs the contents of MustKeepGlobalVariables.

Reviewed By: lei, amyk, scui

Differential Revision: https://reviews.llvm.org/D128287
2022-06-29 15:55:47 -05:00