llvm-project

Commit Graph

Author	SHA1	Message	Date
Sanjay Patel	74b5e797d5	[InstSimplify] fold scalable vectors with over-shift splat constant to poison Fixes #56968	2022-08-07 16:26:05 -04:00
Lang Hames	41c41fcbc0	Revert "[JITLink] Fix some C++17 related fixmes." This reverts commit `6ea5bf436a`. `6ea5bf436a` made use of new c++17 rules regarding order of evaluation (specifically: in function calls the expression naming the function should be sequenced before the evalution of any operands) to simplify some continuation-passing calls. Unfortunately this appears to break at least one MSVC bot: https://lab.llvm.org/buildbot/#/builders/123/builds/12149 . Includes an update to the comments to note that the workaround is now based on MSVC limitations, not on LLVM adopting c++17.	2022-08-07 12:15:59 -07:00
Sanjay Patel	8148c28fad	[ConstFolding] fix overzealous assert when converting FP half Fixes #56981	2022-08-07 13:34:51 -04:00
Lang Hames	6ea5bf436a	[JITLink] Fix some C++17 related fixmes.	2022-08-07 09:37:14 -07:00
Aaron Ballman	32fd0b7fd5	Revert "[RDF] Remove explicit template arguments from Print" This reverts commit `ede96de751`. This breaks the build on Windows with Visual Studio: https://lab.llvm.org/buildbot/#/builders/123/builds/12134	2022-08-07 08:24:01 -04:00
Kazu Hirata	ba0407ba86	[llvm] Use range-based for loops (NFC)	2022-08-07 00:16:21 -07:00
Kazu Hirata	54199d805a	[x86] Remove unused declaration processWaitCnt (NFC) The declaration was introduced without a corresponding definition on Jan 2, 2022 in commit `85e6e748d4`.	2022-08-07 00:16:19 -07:00
Kazu Hirata	d0ec61c9ff	[Target] Remove unused forward declarations (NFC)	2022-08-07 00:16:16 -07:00
Kazu Hirata	a2d4501718	[llvm] Fix comment typos (NFC)	2022-08-07 00:16:14 -07:00
Kazu Hirata	3b114087c3	[llvm] Drop unnecessary const from return types (NFC) Identified with readability-const-return-type.	2022-08-07 00:16:11 -07:00
Fangrui Song	fa66789d06	[llvm] LLVM_NODISCARD => [[nodiscard]]. NFC With C++17 there is no Clang pedantic warning.	2022-08-07 00:26:33 +00:00
Fangrui Song	5deb678289	Revert "[SampleProfileInference] Work around odr-use of const non-inline static data member to fix -O0 builds after D120508" This reverts commit `48c74bb2e2`. With C++17 the workaround is no longer needed.	2022-08-06 16:48:23 -07:00
Krzysztof Parzyszek	2bc390bdd6	[RDF] Use default TargetOperandInfo if not given in constructor All current in-tree users use the default implementation.	2022-08-06 14:32:52 -05:00
Krzysztof Parzyszek	ede96de751	[RDF] Remove explicit template arguments from Print CTAD takes care of it.	2022-08-06 13:29:15 -05:00
Kazu Hirata	c8e6ebd74e	Use value instead of getValue (NFC)	2022-08-06 11:21:39 -07:00
Chen Zheng	ef60e44fe8	[PowerPC] fix stack size allocated for float point argument This is for https://github.com/llvm/llvm-project/issues/56469 Allocate 4 bytes for float point arguments on PPC32. Reviewed By: nemanjai Differential Revision: https://reviews.llvm.org/D129558	2022-08-06 08:38:52 -04:00
Markus Böck	f7b73b7e8e	[llvm] Remove uses of deprecated `std::iterator` std::iterator has been deprecated in C++17 and some standard library implementations such as MS STL or libc++ emit deperecation messages when using the class. Since LLVM has now switched to C++17 these will emit warnings on these implementations, or worse, errors in build configurations using -Werror. This patch fixes these issues by replacing them with LLVMs own llvm::iterator_facade_base which offers a superset of functionality of std::iterator. Differential Revision: https://reviews.llvm.org/D131320	2022-08-06 14:07:37 +02:00
Leon Clark	6a275cd53c	Transform illegal intrinsics to V_ILLEGAL Related tasks: - SWDEV-240194 - SWDEV-309417 - SWDEV-334876 Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D123693	2022-08-06 08:59:00 +01:00
Filipp Zhinkin	c55899f763	[DAGCombiner] Hoist funnel shifts from logic operation Hoist funnel shift from logic op: logic_op (FSH x0, x1, s), (FSH y0, y1, s) --> FSH (logic_op x0, y0), (logic_op x1, y1), s The transformation improves code generated for some cases related to issue https://github.com/llvm/llvm-project/issues/49541. Reduced amount of funnel shifts can also improve throughput on x86 CPUs by utilizing more available ports: https://quick-bench.com/q/gC7AKkJJsDZzRrs_JWDzm9t_iDM Transformation correctness checks: https://alive2.llvm.org/ce/z/TKPULH https://alive2.llvm.org/ce/z/UvTd_9 https://alive2.llvm.org/ce/z/j8qW3_ https://alive2.llvm.org/ce/z/7Wq7gE https://alive2.llvm.org/ce/z/Xr5w8R https://alive2.llvm.org/ce/z/D5xe_E https://alive2.llvm.org/ce/z/2yBZiy Differential Revision: https://reviews.llvm.org/D130994	2022-08-05 17:02:22 -04:00
Lang Hames	bc062e034f	[ORC] Fix a memory leak in LLVMOrcIRTransformLayerSetTransform. This function heap-allocates a ThreadSafeModule (the current C bindings assume that TSMs are always heap-allocated), but was failing to free it. Should fix http://llvm.org/PR56953.	2022-08-05 13:52:03 -07:00
Vitaly Buka	8d2901d537	[NFC][Inliner] Add Load/Store handler This is an additional signal which may benefit sanitizers. Reviewed By: kda Differential Revision: https://reviews.llvm.org/D131129	2022-08-05 13:42:17 -07:00
Craig Topper	75c64c7c4e	[RISCV] Don't use li+sh3add for constants that can use lui+add. If we're adding a constant that can't use addi we try a few tricks, one of which is using li+sh3add. We should not do this if lui+add would work. For example adding 8192. Using sh3add prevents folding a sext.w to form addw, thus increasing instruction count.	2022-08-05 12:47:03 -07:00
Philip Reames	9a9848f4b9	[RISCVInsertVSETVLI] Remove an unsound optimization This fixes a bug reported privately by @craig.topper. Here's an example which illustrates the problem: vsetivli a1, a0, e32, m1, ta, mu # both DefInfo and PrevInfo vsetivli a2, a1, e32, m4, ta, mu With the unsound result being: vsetivli a1, a0, e32, m1, ta, mu vsetivli a2, a0, e32, m4, ta, mu Consider the case where this is running on a machine with VLEN=512,. For this case, the VLMAXs are 16 and 64 respectively. Consider for a0 = 33. The correct result is: a1 = 16, and a2 = 16 After the unsound optimization: a1 = 16 and a2 = 33 This particular example used VLMAXs which differed by more than a power of two. With a difference of only one power of two, there's another form of this bug which involves the AVL < 2 x VLMAX special case, but that ones more complicated to construct as many examples turn out accidentally sound. This patch takes the approach of simply removing the unsound optimization, but there are multiple sound sub-cases of it. I plan to return to at least a couple of them, but figured it was cleaner to remove the unsound optimization (for ease of backporting), and then review the new optimizations on their own. Differential Revision: https://reviews.llvm.org/D131264	2022-08-05 12:13:08 -07:00
Zhaoshi Zheng	99e50e5838	[WinEH][ARM64] Split Unwind Info for Fucntions Larger than 1MB Create function segments and emit unwind info of them. A segment must be less than 1MB and no prolog or epilog is splitted between two segments. This patch should generate correct, though not optimal, unwind info for large functions. Currently it only generate pacted info (.pdata) only for functions that are less than 1MB (single-segment functions). This is NFC from before this patch. The next step is to enable (.pdata) only unwind info for the first segment or segments that have neither prolog or epilog in a multi-segment function. Another future work item is to further split segments that require more than 255 code words or have more than 65535 epilogs. Reference: https://docs.microsoft.com/en-us/cpp/build/arm64-exception-handling#function-fragments Differential Revision: https://reviews.llvm.org/D130049	2022-08-05 11:46:41 -07:00
Sanjay Patel	b63fc26d33	[InstSimplify] make uses of isImpliedCondition more efficient (NFCI) As suggested in the post-commit comments for `019d76196f`, this makes the usage symmetric with the 'and' patterns and should be more efficient.	2022-08-05 12:06:47 -04:00
Paul Walker	0533c39a76	[SVE] Expand DUPM patterns to handle all integer vector types. NOTE: i8 vector splats are ignored because the immediate range of DUP already has full coverage. Differential Revision: https://reviews.llvm.org/D131078	2022-08-05 16:00:08 +00:00
Sanjay Patel	019d76196f	[InstSimplify] use isImpliedCondition() instead of semi-duplicated code We get a couple of improvements from recognizing swapped operand patterns that were not handled by the replicated code. This should also enable simplifying larger patterns as seen in issue #56653 and issue #56654, but that requires enhancements to isImpliedCondition() itself.	2022-08-05 10:59:09 -04:00
Mirko Brkusanin	19bb535ed9	[AMDGPU] Remove unused MIMG tablegen variants There are no AMDGPUSampleVariant versions for _G16, it is treated more like a modifier for derivatives (_D) (also for intrinsics where it is overloaded type instead of part of instrinsic name) so we ended up making more variants for these instruction then we actually needed. 32-bit derivatives need 6 dwords at most, while 16-bit need 4 at most. Using same AMDGPUSampleVariant for both, we ended up creating 2 extra variants per instruction than were necessary. In total this deletes 260 unused tablegen records. Differential Revision: https://reviews.llvm.org/D131252	2022-08-05 15:30:47 +02:00
Dawid Jurczak	1bd31a6898	[NFC] Add SmallVector constructor to allow creation of SmallVector<T> from ArrayRef of items convertible to type T Extracted from https://reviews.llvm.org/D129781 and address comment: https://reviews.llvm.org/D129781#3655571 Differential Revision: https://reviews.llvm.org/D130268	2022-08-05 13:35:41 +02:00
David Green	b2de84633a	[ConstProp] Don't fallthorugh for poison constants on vctp and active_lane_mask. Given a poison constant as input, the dyn_cast to a ConstantInt would fail so we would fall through to the generic code that attempts to fold each element of the input vectors. The inputs to these intrinsics are not vectors though, leading to a compile time crash. Instead bail out properly for poison values by returning nullptr. This doesn't try to define what poison means for these intrinsics. Fixes #56945	2022-08-05 11:19:36 +01:00
David Spickett	c401dbde71	[llvm][IROutliner] Account for return void in sort comparator This fixes 69 llvm tests that failed when EXPENSIVE_CHECKS was enabled. llvm/test/Transforms/IROutliner/outlining-commutative-operands-opposite-order.ll is one example. When we have EXPENSIVE_CHECKS, _GLIBCXX_DEBUG is defined. This means that libstdc++ will call the compare function to check if it is implemented correctly (that !(a < a) is true). This happens even if there is only one item and here, we expect to see one return void or multiple return constant integer. Don't sort if we have 1 item, but do assert that it is the 1 ret void we expect. In the comparator, assert that neither Value is a nullptr in case one ended up in a the list somehow. Reviewed By: AndrewLitteken Differential Revision: https://reviews.llvm.org/D130230	2022-08-05 09:36:43 +00:00
Phoebe Wang	2312b747b8	[X86] Move getting module flag into `runOnMachineFunction` to reduce compile-time. NFCI Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D131245	2022-08-05 01:58:17 -07:00
wanglei	57eb77d411	[LoongArch] Implement more of the ABI According to the description of the LoongArch abi documentation, (https://loongson.github.io/LoongArch-Documentation/LoongArch-ELF-ABI-EN.html) the calling convention of LoongArch is almost the same as the RISCV's (except for the vector part), so we borrow the implementation of RISCV. This patch only guarantees the correctness of lp64d, because only the part of lp64d is described in detail in the documentation. Differential Revision: https://reviews.llvm.org/D130249	2022-08-05 15:14:16 +08:00
Chuanqi Xu	230d6f93aa	[Coroutines] Remove lifetime intrinsics for spliied allocas in coroutine frames Closing https://github.com/llvm/llvm-project/issues/56919 It is meaningless to preserve the lifetime markers for the spilled allocas in the coroutine frames and it would block some optimizations too.	2022-08-05 14:50:43 +08:00
David Green	38c2366b3f	[AArch64][GlobalISel] Recognise some CCMPri This is a simple addition to emitConditionalComparison, to match CCMP with immediates using getIConstantVRegValWithLookThrough, letting it select the CCMPri variants of the instructions. Differential Revision: https://reviews.llvm.org/D131073	2022-08-05 07:48:42 +01:00
Paul Kirth	a812b39e8c	[llvm][ir] Add missing license to ProfDataUtils We failed to add these in D128860 or D128858 Reviewed By: davidxl Differential Revision: https://reviews.llvm.org/D131226	2022-08-05 03:39:13 +00:00
Phoebe Wang	7f648d27a8	Reland "[X86][MC] Always emit `rep` prefix for `bsf`" `BMI` new instruction `tzcnt` has better performance than `bsf` on new processors. Its encoding has a mandatory prefix '0xf3' compared to `bsf`. If we force emit `rep` prefix for `bsf`, we will gain better performance when the same code run on new processors. GCC has already done this way: https://c.godbolt.org/z/6xere6fs1 Fixes #34191 Reviewed By: craig.topper, skan Differential Revision: https://reviews.llvm.org/D130956	2022-08-05 10:22:48 +08:00
Fangrui Song	7d6017fd31	[TTI] Change new getVectorInstrCost overload to use const reference after D131114 A const reference is preferred over a non-null const pointer. `Type *` is kept as is to match the other overload. Reviewed By: davidxl Differential Revision: https://reviews.llvm.org/D131197	2022-08-04 15:16:51 -07:00
Sanjay Patel	8e7acb670b	[ValueTracking] improve readability in isImpliedCond helper functions; NFC This matches the caller code naming scheme and avoids the potentially confusing transition from left/right to A/B.	2022-08-04 17:43:31 -04:00
Craig Topper	12a1ca9c42	[RISCV] Relax another one use restriction in performSRACombine. When folding (sra (add (shl X, 32), C1), 32 - C) -> (shl (sext_inreg (add X, C1), i32), C) it's possible that the add is used by multiple sras. We should allow the combine if all the SRAs will eventually be updated. After transforming all of the sras, the shls will share a single (sext_inreg (add X, C1), i32). This pattern occurs if an sra with 32 is used as index in multiple GEPs with different scales. The shl from the GEPs will be combined with the sra before we get a chance to match the sra pattern.	2022-08-04 14:32:31 -07:00
Sanjay Patel	657bfa364f	[ValueTracking] reduce code in isImpliedCondICmps; NFC This copies the implementation of the subsequent match with constants.	2022-08-04 17:03:42 -04:00
Arthur Eubanks	6e45162adf	[InstrProf] Set prof global variables to internal linkage if adding a comdat COFF has a verifier check that private global variables don't have a comdat of the same name. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D131043	2022-08-04 13:24:55 -07:00
Mingming Liu	bc8f2f3649	[AArch64][TTI][NFC] Overload method 'getVectorInstrCost' to provide vector instruction itself, as a context information for cost estimation. 1) Overloaded (instruction-based) method is a wrapper around the current (opcode-based) method. 2) This patch also changes a few callsites (VectorCombine.cpp, SLPVectorizer.cpp, CodeGenPrepare.cpp) to call the overloaded method. 3) This is a split of D128302. Differential Revision: https://reviews.llvm.org/D131114	2022-08-04 12:58:25 -07:00
Johannes Doerfert	f81a209337	[Attributor][FIX] Deal with implicit `undef` in AAPotentialConstantValues. In contrast to AAPotentialValues, the constant values version can contain implicit `undef` in the set. We had an assertion that could misfire before. Handle it properly now.	2022-08-04 14:44:51 -05:00
Craig Topper	a2de12c987	[RISCV] Relax a one use restriction performSRACombine When folding (sra (add (shl X, 32), C1), 32 - C) -> (shl (sext_inreg (add X, C1), C) ignore the use count on the (shl X, 32). The sext_inreg after the transform is free. So we're only making 2 new instructions, the add and the shl. So we only need to be concerned with replacing the original sra+add. The original shl can have other uses. This helps if there are multiple different constants being added to the same shl.	2022-08-04 11:25:08 -07:00
Daniel Thornburgh	22df238d4a	[Symbolizer] Implement data symbolizer markup element. This connects the Symbolizer to the markup filter and enables the first working end-to-end flow using the filter. Reviewed By: peter.smith Differential Revision: https://reviews.llvm.org/D130187	2022-08-04 10:20:29 -07:00
Ellis Hoag	12e78ff881	[InstrProf] Add the skipprofile attribute As discussed in [0], this diff adds the `skipprofile` attribute to prevent the function from being profiled while allowing profiled functions to be inlined into it. The `noprofile` attribute remains unchanged. The `noprofile` attribute is used for functions where it is dangerous to add instrumentation to while the `skipprofile` attribute is used to reduce code size or performance overhead. [0] https://discourse.llvm.org/t/why-does-the-noprofile-attribute-restrict-inlining/64108 Reviewed By: phosek Differential Revision: https://reviews.llvm.org/D130807	2022-08-04 08:45:27 -07:00
Joshua Cranmer	2138c90645	[IR] Move support for dxil::TypedPointerType to LLVM core IR. This allows the construct to be shared between different backends. However, it still remains illegal to use TypedPointerType in LLVM IR--the type is intended to remain an auxiliary type, not a real LLVM type. So no support is provided for LLVM-C, nor bitcode, nor LLVM assembly (besides the bare minimum needed to make Type->dump() work properly). Reviewed By: beanz, nikic, aeubanks Differential Revision: https://reviews.llvm.org/D130592	2022-08-04 10:41:11 -04:00
Lorenzo Albano	74940d2668	[VP] Add widening for VP_STRIDED_LOAD and VP_STRIDED_STORE Reviewed By: frasercrmck, craig.topper Differential Revision: https://reviews.llvm.org/D121114	2022-08-04 16:12:01 +02:00
Martin Storsjö	46bc1b5689	[ORC] Actually propagate memory unmapping errors on Windows This fixes warnings like these: ../lib/ExecutionEngine/Orc/MemoryMapper.cpp:364:9: warning: ignoring return value of function declared with 'warn_unused_result' attribute [-Wunused-result] joinErrors(std::move(Err), ^~~~~~~~~~ ~~~~~~~~~~~~~~~ Differential Revision: https://reviews.llvm.org/D131056	2022-08-04 11:14:52 +03:00
Martin Storsjö	46196db4d3	[ORC] Fix a warning about an unused variable on Windows. NFC. Differential Revision: https://reviews.llvm.org/D131055	2022-08-04 11:14:52 +03:00
wanglian	b6b0690355	[LegalizeTypes][VP] Add split operand support for VP float and integer casting Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D130685	2022-08-04 15:41:50 +08:00
jacquesguan	b61cfc91ea	[RISCV] Add cost modelling for vector widenning reduction. In RVV, we use vwredsum.vs and vwredsumu.vs for vecreduce.add(ext(Ty A)) if the result type's width is twice of the input vector's SEW-width. In this situation, the cost of extended add reduction should be same as single-width add reduction. So as the vector float widenning reduction. Differential Revision: https://reviews.llvm.org/D129994	2022-08-04 15:31:31 +08:00
Phoebe Wang	6f867f9102	[X86] Support ``-mindirect-branch-cs-prefix`` for call and jmp to indirect thunk This is to address feature request from https://github.com/ClangBuiltLinux/linux/issues/1665 Reviewed By: nickdesaulniers, MaskRay Differential Revision: https://reviews.llvm.org/D130754	2022-08-04 15:12:15 +08:00
Thomas Lively	b19de814ad	[WebAssembly] Improve codegen for v128.bitselect Add patterns selecting ((v1 ^ v2) & c) ^ v2 and ((v1 ^ v2) & ~c) ^ v2 to v128.bitselect. Resolves #56827. Reviewed By: aheejin Differential Revision: https://reviews.llvm.org/D131131	2022-08-03 23:28:37 -07:00
Craig Topper	91e8079cd5	[X86] Teach PostprocessISelDAG to fold ANDrm+TESTrr when chain result is used. The isOnlyUserOf prevented the fold if the chain result had any users. What we really care about is the the data result from the AND is only used by the TEST, and the flags results from the ANDs aren't used at all. It's ok if the chain has users, we just need to replace those users with the chain from the TESTrm. Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D131117	2022-08-03 21:00:22 -07:00
Arthur Eubanks	203296d642	[BoundsChecking] Fix merging of sizes BoundsChecking uses ObjectSizeOffsetEvaluator to keep track of the underlying size/offset of pointers in allocations. However, ObjectSizeOffsetVisitor (something ObjectSizeOffsetEvaluator uses to check for constant sizes/offsets) doesn't quite treat sizes and offsets the same way as BoundsChecking. BoundsChecking wants to know the size of the underlying allocation and the current pointer's offset within it, but ObjectSizeOffsetVisitor only cares about the size from the pointer to the end of the underlying allocation. This only comes up when merging two size/offset pairs. Add a new mode to ObjectSizeOffsetVisitor which cares about the underlying size/offset rather than the size from the current pointer to the end of the allocation. Fixes a false positive with -fsanitize=bounds. Reviewed By: vitalybuka, asbirlea Differential Revision: https://reviews.llvm.org/D131001	2022-08-03 17:21:19 -07:00
Vitaly Buka	a2aa6809a8	[NFC][Inliner] Add cl::opt<int> to tune InstrCost The plan is tune this for sanitizers. Differential Revision: https://reviews.llvm.org/D131123	2022-08-03 17:14:10 -07:00
Congzhe Cao	8dc4b2edfa	[LoopInterchange][PR56275] Fix legality with negative dependence vectors This is the 2nd patch of the two-patch series (D130188, D130189) that fix PR56275 (https://github.com/llvm/llvm-project/issues/56275) which is a missed opportunity for loop interchange. As follow-up on the dependence analysis (DA) patch D130188, this patch normalizes DA results in loop interchange, such that negative dependence vectors queried by loop interchange are reversed to be non-negative. Now all tests in PR56275 can get interchanged. Those tests are added in lit test as `pr56275.ll`. Reviewed By: kawashima-fj, bmahjour, Meinersbur, #loopoptwg Differential Revision: https://reviews.llvm.org/D130189	2022-08-03 19:59:01 -04:00
Congzhe Cao	76be554931	[DependenceAnalysis][PR56275] Normalize negative dependence analysis results This patch is the first of the two-patch series (D130188, D130179) that resolve PR56275 (https://github.com/llvm/llvm-project/issues/56275) which is a missed opportunity, where a perfrectly valid case for loop interchange failed interchange legality. If the distance/direction vector produced by dependence analysis (DA) is negative, it needs to be normalized (reversed). This patch provides helper functions `isDirectionNegative()` and `normalize()` in DA that does the normalization, and clients can query DA to do normalization if needed. A pass option `<normalized-results>` is added to DependenceAnalysisPrinterPass, and we leverage it to update DA test cases to make sure of test coverage. The test cases added in `Banerjee.ll` shows that negative vectors are normalized with `print<da><normalized-results>`. Reviewed By: bmahjour, Meinersbur, #loopoptwg Differential Revision: https://reviews.llvm.org/D130188	2022-08-03 19:59:00 -04:00
Bill Wendling	239c831de4	Add switch to use "source_filename" instead of a hash ID for globally promoted local During LTO a local promoted to a global gets a unique suffix based on a hash of the module IR. This means that changes in the local's module can affect the contents in another module that imported it (because the name of the imported promoted local is changed, but that doesn't reflect a real change in the importing module). So any tool that's validating changes to the importing module will see a superficial change. Instead of using the module hash, we can use the "source_filename" if it exists to generate a unique identifier that doesn't change due to LTO shenanigans. Differential Revision: https://reviews.llvm.org/D128863	2022-08-03 16:41:56 -07:00
Mircea Trofin	0cb9746a7d	[nfc][mlgo] Separate logger and training-mode model evaluator This just shuffles implementations and declarations around. Now the logger and the TF C API-based model evaluator are separate. Differential Revision: https://reviews.llvm.org/D131116	2022-08-03 16:20:28 -07:00
Craig Topper	53d560b22f	[RISCV] Prevent infinite loop after D129980. D129980 converts (seteq (i64 (and X, 0xffffffff)), C1) into (seteq (i64 (sext_inreg X, i32)), C1). If bit 31 of X is 0, it will be turned back into an 'and' by SimplifyDemandedBits which can cause an infinite loop. To prevent this, check if bit 31 is 0 with computeKnownBits before doing the transformation. Fixes PR56905. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D131113	2022-08-03 15:19:07 -07:00
Vitaly Buka	26dd42705c	[NFC][Inliner] Simplify clamping in addCost	2022-08-03 14:54:37 -07:00
Craig Topper	84e9194828	Revert "[X86][MC] Always emit `rep` prefix for `bsf`" This reverts commit `c2066d19cd`. It's causing failures on the build bots.	2022-08-03 14:51:34 -07:00
Chris Bieneman	ce0bb316eb	[DX] [NFC] Move hasSection check up Juming out earlier if the global doesn't have a section is just a cleaner early out.	2022-08-03 15:54:53 -05:00
Vitaly Buka	e056e74dda	[NFC][inline] Add const to an argument	2022-08-03 13:20:47 -07:00
Craig Topper	ff91b2d9df	[X86] Promote i16 CTTZ/CTTZ_ZERO_UNDEF always. If we're going to emit a rep prefix before bsf as proposed in D130956, it makes sense to promote i16 operations to i32 to avoid the false depedency of tzcntw. Reviewed By: skan, pengfei Differential Revision: https://reviews.llvm.org/D130995	2022-08-03 13:12:20 -07:00
Adrian Prantl	905f2d1ecb	Fix LDV InstrRefBasedImpl to not crash when encountering unreachable MBBs. The testcase was delta-reduced from an LTO build with sanitizer coverage and the MIR tail duplication pass caused a machine basic block to become unreachable in MIR. This caused the MBB to be invisible to the reverse post-order traversal used to initialize the MBB <-> RPONumber lookup tables. rdar://97226240 Differential Revision: https://reviews.llvm.org/D130999	2022-08-03 13:05:05 -07:00
Felipe de Azevedo Piovezan	a5a8a05c78	[SelectionDAG] Handle IntToPtr constants in dbg.value The function `handleDebugValue` has custom logic to handle certain kinds constants, namely integers, floats and null pointers. However, it does not handle constant pointers created from IntToPtr ConstantExpressions. This patch addresses the issue by replacing the Constant with its integer operand. A similar bug was addressed for GlobalISel in D130642. Reviewed By: aprantl, #debug-info Differential Revision: https://reviews.llvm.org/D130908	2022-08-03 14:10:05 -04:00
Nicolai Hähnle	5c7c83885f	Revert "ManagedStatic: remove from DynamicLibrary" This reverts commit `38817af6a7`. Buildbots report a Windows build error. Revert until I can look at it more carefully.	2022-08-03 17:56:46 +02:00
Philip Reames	569a7f6aa3	[LV] Move definition of isPredicatedInst out of line and make it const [nfc]	2022-08-03 08:53:11 -07:00
Nicolai Hähnle	38817af6a7	ManagedStatic: remove from DynamicLibrary Differential Revision: https://reviews.llvm.org/D129127	2022-08-03 17:43:52 +02:00
Philip Reames	a1cab0daae	[LV] Use cost base decision for uniform mem op strategy [nfc-ish] This is mostly a stylistic change to make the uniform memop widening cost code fit more naturally with the sourounding code. Its not strictly speaking NFC as I added in the store with invariant value case, and we could in theory have a target where a gather/scatter is cheaper than a single load/store... but it's probably NFC in practice. Note that the scatter/gather result can still be overriden later if the result is uniform-by-parts.	2022-08-03 07:47:24 -07:00
David Truby	9a976f3661	[llvm] Always use TargetConstant for FP_ROUND ISD Nodes This patch ensures consistency in the construction of FP_ROUND nodes such that they always use ISD::TargetConstant instead of ISD::Constant. This additionally fixes a bug in the AArch64 SVE backend where patterns were matching against TargetConstant nodes and sometimes failing when passed a Constant node. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D130370	2022-08-03 14:02:11 +01:00
Alex Bradbury	28f12a09ae	[RISCV] Teach ComputeNumSignBitsForTargetNode about masked atomic intrinsics An unnecessary sext.w is generated when masking the result of the riscv_masked_cmpxchg_i64 intrinsic. Implementing handling of the intrinsic in ComputeNumSignBitsForTargetNode allows it to be removed. Although this isn't a particularly important optimisation, removing the sext.w simplifies implementation of an additional cmpxchg-related optimisation in D130192. Although I can't produce a test with different codegen for the other atomics intrinsics, these are added as well for completeness. Differential Revision: https://reviews.llvm.org/D130191	2022-08-03 13:41:58 +01:00
Nicolai Hähnle	d4cab87094	ManagedStatic: remove from CrashRecoveryContext Differential Revision: https://reviews.llvm.org/D129126	2022-08-03 14:35:30 +02:00
Dmitry Preobrazhensky	05b3aadfff	[AMDGPU][MC][GFX11] Correct v_dot2_f16_f16 and v_dot2_bf16_bf16 Enable SGPRs for the following operands of these opcodes: - src operands of VOP3 variant. - src2 operand of DPP variants. Differential Revision: https://reviews.llvm.org/D130989	2022-08-03 15:08:23 +03:00
Dmitry Preobrazhensky	ae553f9e49	[AMDGPU][MC][GFX10] Correct encoding of VOP3 v_cmpx* opcodes Encode dst=EXEC but allow disassembler accept any dst value. Differential Revision: https://reviews.llvm.org/D130978	2022-08-03 15:03:44 +03:00
Nicolai Hähnle	4cf0a9d4ae	ManagedStatic: remove from Interpreter/ExternalFunctions Differential Revision: https://reviews.llvm.org/D129124	2022-08-03 13:29:24 +02:00
Johannes Reifferscheid	3e9e43b48e	Fix compiler error: init-statements in if/switch. Reviewed By: pifon2a Differential Revision: https://reviews.llvm.org/D131061	2022-08-03 11:36:41 +02:00
Fraser Cormack	646e2f4803	[VP] Rename VP int<->float conversion ISD opcodes These should be named like the non-VP versions for consistency. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D130967	2022-08-03 10:04:38 +01:00
Phoebe Wang	c2066d19cd	[X86][MC] Always emit `rep` prefix for `bsf` `BMI` new instruction `tzcnt` has better performance than `bsf` on new processors. Its encoding has a mandatory prefix '0xf3' compared to `bsf`. If we force emit `rep` prefix for `bsf`, we will gain better performance when the same code run on new processors. GCC has already done this way: https://c.godbolt.org/z/6xere6fs1 Fixes #34191 Reviewed By: skan Differential Revision: https://reviews.llvm.org/D130956	2022-08-03 17:09:36 +08:00
Johannes Reifferscheid	7ae5d00afa	Fix a stack overflow in ScalarEvolution. Unfortunately, this overflow is extremely hard to reproduce reliably (in fact, I was unable to do so). The issue is that: - getOperandsToCreate sometimes skips creating an SCEV for the LHS - then, createSCEV is called for the BinaryOp - ... which calls getNoWrapFlagsFromUB - ... which under certain circumstances calls isSCEVExprNeverPoison - ... which under certain circumstances requires the SCEVs of all operands For certain deep dependency trees, this causes a stack overflow. Reviewed By: bkramer, fhahn Differential Revision: https://reviews.llvm.org/D129745	2022-08-03 11:08:01 +02:00
Nicolai Hähnle	2fe3589acd	ManagedStatic: remove from PluginLoader Differential Revision: https://reviews.llvm.org/D129123	2022-08-03 10:41:58 +02:00
Nicolai Hähnle	48c401a60e	ManagedStatic: remove from TimeProfiler Differential Revision: https://reviews.llvm.org/D129121	2022-08-03 10:41:58 +02:00
Liu, Chen3	5bbb0a831f	[X86] Using `X86MemOperand` instead of `Operand` for `i32mem_TC` and `i64mem_TC` To fix build fail when X86_GEN_FOLD_TABLES is enabled. Differential Revision: https://reviews.llvm.org/D131049	2022-08-03 16:17:51 +08:00
Nikita Popov	b128e057c1	[AA] Make ModRefInfo a bitmask enum (NFC) Mark ModRefInfo as a bitmask enum, which allows using normal & and \| operators on it. This supersedes various functions like unionModRef() and intersectModRef(). I think this makes the code cleaner than going through helper functions... Differential Revision: https://reviews.llvm.org/D130870	2022-08-03 10:05:55 +02:00
Max Kazantsev	34ae308c73	[SCEV] Use context to strengthen flags of BinOps Sometimes SCEV cannot infer nuw/nsw from something as simple as ``` len in [0, MAX_INT] ... iv = phi(0, iv.next) guard(iv <s len) guard(iv <u len) iv.next = iv + 1 ``` just because flag strenthening only relies on definition and does not use local facts. This patch adds support for the simplest case: inference of flags of `add(x, constant)` if we can contextually prove that `x <= max_int - constant`. In case if it has negative CT impact, we can add an option to switch it off. I woudln't expect that though. Differential Revision: https://reviews.llvm.org/D129643 Reviewed By: apilipenko	2022-08-03 14:08:57 +07:00
Craig Topper	f19497f7b0	[RISCV] Use InstVisitor in RISCVCodeGenPrepare. NFC Makes it easy to add new instructions to look at without dispatching manually.	2022-08-02 21:19:30 -07:00
Chuanqi Xu	ce1b24cca8	[IRBuilder] Handle constexpr-bitcast for IRBuilder::CreateThreadLocalAddress In case that opaque pointers not enabled, there may be some constexpr bitcast uses for thread local variables and the design of llvm allow people to sink constant arbitrarily. This breaks the assumption of IRBuilder::CreateThreadLocalAddress. This patch tries to handle the case.	2022-08-03 11:13:49 +08:00
Paul Kirth	d434e40f39	[llvm][NFC] Refactor code to use ProfDataUtils In this patch we replace common code patterns with the use of utility functions for dealing with profiling metadata. There should be no change in functionality, as the existing checks should be preserved in all cases. Reviewed By: bogner, davidxl Differential Revision: https://reviews.llvm.org/D128860	2022-08-03 00:09:45 +00:00
Austin Kerbow	3dfa562643	[AMDGPU] Add CL option for max-ilp scheduler. When compiling for multiple targets the scheduler that is selected via the -misched option is applied globally. This patch adds a target CL option instead. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D131022	2022-08-02 16:52:14 -07:00
Craig Topper	a5605f1f68	[RISCV] Fix operand number in debug message in RISCVMergeBaseOffset. This used to print from the ADDI where the operand number was correct. It recently changed to print from the LUI or AUIPC which needs to use operand 1 instead of 2. This shows up as a crash with -debug.	2022-08-02 15:27:23 -07:00
Nicolai Hähnle	f7872cdce1	CommandLine: add and use cl::SubCommand::get{All,TopLevel} Prefer using these accessors to access the special sub-commands corresponding to the top-level (no subcommand) and all sub-commands. This is a preparatory step towards removing the use of ManagedStatic: with a subsequent change, these global instances will be moved to be regular function-scope statics. It is split up to give downstream projects a (albeit short) window in which they can switch to using the accessors in a forward-compatible way. Differential Revision: https://reviews.llvm.org/D129118	2022-08-02 23:49:16 +02:00
Austin Kerbow	40eec27618	[AMDGPU] Add llvm_unreachable to switch statement added in `d7100b398`.	2022-08-02 13:45:38 -07:00
Mircea Trofin	4146c1756d	[nfc] Remove unused parameter in TailDuplicator::duplicateSimpleBB Differential Revision: https://reviews.llvm.org/D131008	2022-08-02 13:39:34 -07:00
Austin Kerbow	d7100b398b	[AMDGPU] Add GCNMaxILPSchedStrategy Creates a new scheduling strategy that attempts to maximize ILP for a single wave. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D130869	2022-08-02 13:21:24 -07:00
Xiang Li	20f7f9b709	[NFC][DirectX backend] Fix crash when emit_obj for DirectX backend. When emit-obj from clang directly, DirectX backend will hit assert caused by not initialize passes for AsmPrinter. The fix will initialize the passes by calling createPassConfig. Also ignore global variable which not has section in DXILAsmPrinter::emitGlobalVariable to avoid hit llvm_unreachable in DXILTargetObjectFile::SelectSectionForGlobal. Reviewed By: beanz Differential Revision: https://reviews.llvm.org/D130856	2022-08-02 12:09:07 -07:00
Arthur Eubanks	43aa4ac70b	[StandardInstrumentations] Assign names to basic blocks without names Fixes code in OrderedChangedData<T>::report which assumes that a string will only appear once in Before/After. Reviewed By: jamieschmeiser Differential Revision: https://reviews.llvm.org/D130587	2022-08-02 11:04:01 -07:00
Kai Nacke	b38375378d	[GIsel] Add missing libcall for G_MUL to LegalizerHelper The LegalizerHelper misses the code to lower G_MUL to a library call, which this change adds. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D130987	2022-08-02 13:35:25 -04:00
Vladislav Dzhidzhoev	71aecbb75c	[AArch64] Treat x18 as callee-saved in functions with Windows calling convention on Darwin rGcf97e0ec42b8 makes $x18 to be treated as callee-saved in functions with Windows calling convention on non-Windows OSes. Here we mark $x18 as callee-saved for functions with Windows calling convention on Darwin, as well as on other non-Windows platforms, in order to prevent some miscompilations (like miscompilation of win64cc-darwin-backup-x18.ll). Since getCalleeSavedRegs doesn't return x18 in list of callee-saved registers, assignCalleeSavedSpillSlots and determineCalleeSaves consider different sets of registers as callee-saved. It causes an error: ``` Assertion failed: ((!HasCalleeSavedStackSize \|\| getCalleeSavedStackSize() == Size) && "Invalid size calculated for callee saves"), function getCalleeSavedStackSize, file AArch64MachineFunctionInfo.h, line 292. ``` Differential Revision: https://reviews.llvm.org/D130676	2022-08-02 20:33:42 +03:00
Vladislav Dzhidzhoev	f6d9f00031	[DebugInfo] Test commit: update irrelevant comments Differential Revision: https://reviews.llvm.org/D130998	2022-08-02 20:21:24 +03:00
Guozhi Wei	85a6dd50ad	[MIPS] Expose the ZERO register as a constant physical register The ZERO register should be exposed as a constant physical register through the interface TargetRegisterInfo::isConstantPhysReg. Differential Revision: https://reviews.llvm.org/D130932	2022-08-02 17:04:52 +00:00
Craig Topper	ae6877836e	[RISCV] Add scheduler classes to PseudoVMV*R_V. I think these pseudos will exist when the post-RA scheduler runs so they should have sched classes. Reviewed By: monkchiang Differential Revision: https://reviews.llvm.org/D130945	2022-08-02 09:38:32 -07:00
Craig Topper	2e5c516a3d	[RISCV] Add scheduler class to PseudoReadVLENB. Reviewed By: monkchiang Differential Revision: https://reviews.llvm.org/D130938	2022-08-02 09:38:32 -07:00
Alexander Timofeev	a321d95b59	[AMDGPU] avoid blind converting to VALU REG_SEQUENCE and PHIs In the `2e29b0138c` we introduce a specific solving algorithm that analyzes the VGPR to SGPR copies use chains and either lowers the copy to v_readfirstlane_b32 or converts the whole chain to VALU forms. Same time we still have the code that blindly converts to VALU REG_SEQUENCE and PHIs in case they produce SGPR but have VGPRs input operands. In case the REG_SEQUENCE and PHIs are in the VGPR to SGPR copy use chain, and this chain was considered long enough to convert copy to v_readfistlane_b32, further lowering them to VALU leads to several kinds of issues. At first, we have v_readfistlane_b32 which is completely useless because most parts of its use chain were moved to VALU forms. Second, we may encounter subtle bugs related to the EXEC-dependent CF because of the weird mixing of SALU and VALU instructions. This change removes the code that moves REG_SEQUENCE and PHIs to VALU. Instead, we use the fact that both REG_SEQUENCE and PHIs have copy semantics. That is, if they define SGPR but have VGPR inputs, we insert VGPR to SGPR copies to make them pure SGPR. Then, the new copies are processed by the common VGPR to SGPR lowering algorithm. This is Part 2 in the series of commits aiming at the massive refactoring of the SIFixSGPRCopies pass. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D130367	2022-08-02 18:37:57 +02:00
Jay Foad	e301e071ba	[AMDGPU] Remove IR SpeculativeExecution pass from codegen pipeline This pass seems to have very little effect because all it does is hoist some instructions, but it is followed later in the codegen pipeline by the IR CodeSinking pass which does the opposite. Differential Revision: https://reviews.llvm.org/D130258	2022-08-02 17:35:20 +01:00
Jay Foad	c24d68fff1	[AMDGPU] Take advantage of VOP3 literals in convertToThreeAddress This improves a corner case where v_fmac can be converted to v_fma on GFX10+ even if it has a literal operand. Differential Revision: https://reviews.llvm.org/D130992	2022-08-02 17:27:11 +01:00
Philip Reames	0b47615fcf	[LV] Recognize store of invariant value to invariant address as uniform This extends the handling of uniform memory operations to handle the case where a store is storing a loop invariant value. Unlike the general case of a store to an invariant address where we must use the last active lane, in this case we can use any lane since all lanes must produce the same result. For context, the basic structure of the existing code and how the change fits in: * First, we select a widening strategy. (The result is irrelevant for this patch.) * Then we determine if a computation is uniform within all lanes of VF. (Note this is the uniform-per-part definition, not LAI's uniform across all unrolled iterations definition.) * If it is, we overrule the widening strategy, and unconditionally scalarize. * VPReplicationRecipe - which is what actually does the scalarization - knows how to handle unform-per-part values including for scalable vectors. However, we do need to know that the expression is safe to execute without predication - e.g. the uniform mem op was unconditional in the original loop. (This part was split off and already landed.) An obvious question is why not simply implement the generic case? The answer is that I'm going to, but doing so without a canonicalization towards uniform causes regressions due to bad interaction with scalarization/uniformity of values feeding the uniform mem-op. This patch is needed to avoid those regressions. Differential Revision: https://reviews.llvm.org/D130364	2022-08-02 08:09:49 -07:00
Phoebe Wang	23021d4d8c	[X86][FP16] Fix vector_shuffle and lowering without f16c feature problems The problem Alexander reported on D127982 was caused by an optimization for AVX512-FP16 instruction. We must limit it to the feature enabled only. During the investigation, I found we didn't expand for fp_round/fp_extend without F16C. This may result runtime crash, so change them too. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D130817	2022-08-02 22:26:41 +08:00
Jay Foad	bb2832410e	[IRBuilder] CreateIntrinsic with implicit mangling Add a new IRBuilderBase::CreateIntrinsic which takes the return type and argument values for the intrinsic call but does not take an explicit list of types to mangle. Instead the builder works this out from the intrinsic declaration and the types of the supplied arguments. This means that the mangling is hidden from the client, which in turn means that intrinsic definitions can change which arguments are mangled without requiring any changes to the client code. Differential Revision: https://reviews.llvm.org/D130776	2022-08-02 13:08:35 +01:00
David Green	1206f72e31	[AArch64] Fold Mul(And(Srl(X, 15), 0x10001), 0xffff) to CMLTz This folds a v4i32 Mul(And(Srl(X, 15), 0x10001), 0xffff) into a v8i16 CMLTz instruction. The Srl and And extract the top bit (whether the input is negative) and the Mul sets all values in the i16 half to all 1/0 depending on if that top bit was set. This is equivalent to a v8i16 CMLTz instruction. The same applies to other sizes with equivalent constants. Differential Revision: https://reviews.llvm.org/D130874	2022-08-02 13:01:59 +01:00
Simon Pilgrim	b651fdff79	[DAG] matchRotateSub - ensure the (pre-extended) shift amount is wide enough for the amount mask (PR56859) matchRotateSub is given shift amounts that will already have stripped any/zero-extend nodes from - so make sure those values are wide enough to take a mask.	2022-08-02 11:38:52 +01:00
David Sherwood	4ef9cb6c17	[AArch64][LoopVectorize] Disable tail-folding for SVE when loop has interleaved accesses If we have interleave groups in the loop we want to vectorise then we should fall back on normal vectorisation with a scalar epilogue. In such cases when tail-folding is enabled we'll almost certainly go on to create vplans with very high costs for all vector VFs and fall back on VF=1 anyway. This is likely to be worse than if we'd just used an unpredicated vector loop in the first place. Once the vectoriser has proper support for analysing all the costs for each combination of VF and vectorisation style, then we should be able to remove this. Added an extra test here: Transforms/LoopVectorize/AArch64/sve-tail-folding-option.ll Differential Revision: https://reviews.llvm.org/D128342	2022-08-02 09:52:33 +01:00
Alex Bradbury	5ad59c9e59	[RISCV][NFCI] Set TransientStackAlignment and rely on it rather than RVV-specific logic on RVV-less functions * TargetFrameLowering has a TransientStackAlignment field that "returns the number of bytes to which the stack pointer must be aligned at all times, even between calls. * As explained in the [RISC-V calling convention](https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/master/riscv-cc.adoc), the stack pointer must remain fully aligned throughout execution for compliant code. This is important for embedded targets that might avoid realigning the stack pointer for interrupt service routines. Systems running full OSes may always realign the stack anyway. * TransientStackAlignment is used in estimateStackSize in MachineFrameInfo and in PEI::calculateFrameObjectOffsets. * estimateStackSize is only used in the RISC-V backend for scavenging slots. It may be possible to craft a function where the difference is observable, but it wouldn't be a meaningful test. * calculateFrameObjectOffsets makes use of TransientStackAlignment, but then sets the stack alignment to the max of that alignment and MaxAlign, which is unconditionally set to 16 in RISCVFrameLowering::processFunctionBeforeFrameFinalized * I've changed this logic to only set MaxAlign if there are RVV frame objects. There should be no functional change here for either RVV targets (MaxAlign is set as before) or non-RVV targets (TransientStackAlign is now 16 anyway). Differential Revision: https://reviews.llvm.org/D130068	2022-08-02 09:46:06 +01:00
Tim Northover	b586dc21a7	Outliner: add "target-cpu" feature from source function to outlined The CPU is used to determine which inline asm instructions are allowed, so needs to be copied across in case the outlined function contains any.	2022-08-02 09:33:29 +01:00
wanglian	e208bab55f	[RISCV][NFC] Use defined variable instead some code. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D130687	2022-08-02 16:26:33 +08:00
jacquesguan	e38af7ba95	[LV] Refactor getExtendedAddReductionCost to support other extended reduction more than Add. Now the API getExtendedAddReductionCost is used to determine the cost of extended Add reduction with optional Mul. For Arm, it could cover the cases. But for other target, for example: RISCV, they support other kinds of extended recution, such as FAdd. This patch does the following changes: 1, Split getExtendedAddReductionCost into 2 new API: getExtendedReductionCost which handles the extended reduction with addtional input of Opcode; getMulAccReductionCost which handle the MLA cases the getExtendedAddReductionCost. 2, Refactor getReductionPatternCost, add some contraint condition to make sure the getMulAccReductionCost should only handle the reuction of Add + Mul. Differential Revision: https://reviews.llvm.org/D130868	2022-08-02 16:02:38 +08:00
Sotiris Apostolakis	995b61cdac	[SelectOpti] Auto-disable other cmov optis when the new select-opti pass is enabled Reviewed By: davidxl Differential Revision: https://reviews.llvm.org/D129817	2022-08-02 00:19:59 +00:00
Sunho Kim	5680ef870f	[IntelJITEvents] Add missing include. Fixes compilation error. Differential Revision: https://reviews.llvm.org/D130898	2022-08-02 08:45:14 +09:00
Martin Sebor	bcef4d238d	[InstCombine] Correct strtol folding with nonnull endptr Reflect in the pointer's offset the length of the leading part of the consumed string preceding the first converted digit. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D130912	2022-08-01 16:47:05 -06:00
Craig Topper	da5b1bf5bb	[RISCV] Teach RISCVMergeBaseOffset to merge %lo/%pcrel_lo into load/store after folding arithmetic. It's possible we have: lui a0, %hi(sym) addi a0, %lo(sym) addi a0, <offset1> lw a0, <offset2>(a0) We want to arrive at lui a0, %hi(sym+offset1+offset2) lw a0, %lo(sym+offset1+offset2) We currently fail to do this because we only consider loads/stores if we didn't find any arithmetic. This patch splits arithmetic folding and load/store folding into two separate phases. The load/store folding can no longer assume the offset in hi/lo is 0 so we must combine the offsets. I've applied the same simm32 limit that we applied in the arithmetic folding. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D130931	2022-08-01 15:33:21 -07:00
Ilia Diachkov	b25b507c77	[SPIRV] use tablegen to create SPIRVBaseInfo* The patch replaces SPIRVBaseInfo.* previously created using macros by the tablegen approach. There are many small changes in other files due to differences in namespaces. Also, functions in SPIRVUtils are moved to the llvm namespace. Differential Revision: https://reviews.llvm.org/D130518 Co-authored-by: Aleksandr Bezzubikov <zuban32s@gmail.com> Co-authored-by: Michal Paszkowski <michal.paszkowski@outlook.com> Co-authored-by: Andrey Tretyakov <andrey1.tretyakov@intel.com> Co-authored-by: Konrad Trifunovic <konrad.trifunovic@intel.com>	2022-08-02 01:57:23 +03:00
Craig Topper	e07a8155f5	[RISCV] Move Pre-RA pseudo expansion from addMachineSSAOptimization to addPreRegAlloc. addMachineSSAOptimization is skipped for -O0, but this pass is required for -O0.	2022-08-01 13:44:43 -07:00
Vasileios Porpodas	f669030373	[TTI][AArch64][SLP] Sets the cost of an ADD reduction 2xi64 to 2. 2xi64 is the legalized type for wide reductions (like 16xi64) and setting the cost to 2 makes `load-reduce` and `load-zext-reduce` patterns profitable. The few performance measurments that I did on an aarch64 machine confirm that these patterns are actually faster when vectorized. Differential Revision: https://reviews.llvm.org/D130740	2022-08-01 13:03:14 -07:00
Fangrui Song	2b70bebc6d	[MachineFunctionPass] Support -print-changed={,c}diff{,-quiet} Follow-up to D130434. Move doSystemDiff to PrintPasses.cpp and call it in MachineFunctionPass.cpp. Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D130833	2022-08-01 12:56:15 -07:00
Vang Thao	7fc52d7c8b	[AMDGPU] Fix DGEMM hazard for GFX90a For VALU write and memory (VM, L/DS, FLAT) instructions, SQ would insert wait-states to avoid data hazard. However when there is a DGEMM instruction in-between them, SQ incorrectly disables the wait-states thus the data hazard needs to be handled with this workaround. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D130677	2022-08-01 11:56:22 -07:00
Craig Topper	450edb0b37	[RISCV] Explicitly select second operand of branch condition to X0. At least based on the lit tests, the coalescer sometimes fails to propagate the copy from X0 into the branch instruction. This patch does it manually during isel. The majority of the changes are from the select patterns. Some of the changes are just register allocation changes. Only the Select change affects the whether a b*z instruction is generated in the tests. I changed the branch pattern for consistency. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D130809	2022-08-01 11:16:48 -07:00
Craig Topper	ad8db972b0	[RISCV] Eagerly delete instructions in MergeBaseOffset. The only iterator we're holding points to HiLUI and we never delete that so I think it is safe to delete everything else immediately. I want to split detectAndFoldOffset into two phases. First, combine LUI+ADDI with any ADD/ADDI/SHXADD that comes after it. This may open opportunities to fold the ADDI from the LUI+ADDI into a load/store address. So the load/store folding should run as a second phase even if the ADD/ADDI/SHXADD made changes. In order to do this we need to eagerly delete instructions in the first phase so that we don't have dead users of the LUI+ADDI when we start the second phase. Patches to split the phases will come later. Reviewed By: asb, luismarques Differential Revision: https://reviews.llvm.org/D130119	2022-08-01 09:32:46 -07:00
Lorenzo Albano	71b7c03fd6	[RISCV][VP] Custom lower VP_STRIDED_LOAD and VP_STRIDED_STORE Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D121113	2022-08-01 09:23:45 -07:00
Piotr Sobczak	f29a19b0b8	[AMDGPU] Extend cases for ReadM0MovRelInterpHazard Extend hazard recognizer of ReadM0MovRelInterpHazard with DS_READ_ADDTID and DS_WRITE_ADDTID, as they also require a manually inserted S_NOP after SALU writing m0. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D130783	2022-08-01 17:59:33 +02:00
Simon Pilgrim	27105e2f30	MisExpect.h - fix Wdocumentation warnings. NFC.	2022-08-01 15:06:30 +01:00
Dmitry Preobrazhensky	3aae8cd842	[AMDGPU][MC] Verify selection of LDS MUBUF opcodes Differential Revision: https://reviews.llvm.org/D130761	2022-08-01 16:44:39 +03:00
Anubhab Ghosh	ac3cb4ecd0	[Orc] Disable use of shared memory on Android shm_open and shm_unlink are not available on Android. This commit disables SharedMemoryMapper on Android until a better solution is available. https://android.googlesource.com/platform/bionic/+/refs/heads/master/docs/status.md https://github.com/llvm/llvm-project/issues/56812 Differential Revision: https://reviews.llvm.org/D130814	2022-08-01 18:48:39 +05:30
Dmitry Preobrazhensky	bb901dcc5a	[AMDGPU][MC][GFX940] Correct disassembly of MFMA opcodes Add a decoder table for GFX940 MFMA opcodes. Differential Revision: https://reviews.llvm.org/D130759	2022-08-01 16:00:47 +03:00
Jay Foad	a5a7a9da39	[X86] Fix updating LiveVariables in convertToThreeAddress Fix all instances of: * Bad machine code: Kill missing from LiveVariables * in the X86 CodeGen tests with D129213 applied, which adds verification of LiveIntervals after the TwoAddressInstruction pass runs. Differential Revision: https://reviews.llvm.org/D129634	2022-08-01 13:45:21 +01:00
Lucas Prates	ba9caf9170	[Arm] Fix parsing and emission of Tag_also_compatible_with eabi attribute According to the ABI for the Arm Architecture, the value for the Tag_also_compatible_with eabi attribute is represented by an NTBS entry. This string value, in turn, is composed of a pair of tag+value encoded in one of two formats: - ULEB128: tag, ULEB128: value, 0. - ULEB128: tag, NBTS: data. (See [[ `60a8eb8c55/addenda32/addenda32.rst (3373secondary-compatibility-tag)` \| section 3.3.7.3 on the Addenda to, and Errata in, the ABI for the Arm Architecture ]].) Currently the Arm assembly parser and streamer ignore the encoding of the attribute's NTBS value, which can result in incorrect attributes being emitted in both assembly and object file outputs. This patch fixes these issues by properly handing the value's encoding. An update to llvm-readobj to properly handle the attribute's value will be covered by a separate patch. Patch by Victor Campos and Lucas Prates. Reviewed By: vhscampos Differential Revision: https://reviews.llvm.org/D129500	2022-08-01 13:28:01 +01:00
Marius Brehler	ddb6c28638	Avoid comparison of integers of different signs Otherwiese a warning is emitted when compiling with `-Wsign-compare`.	2022-08-01 11:20:41 +00:00
Pierre van Houtryve	a847e3dc52	[NFC][AMDGPU] Fix typo in SIRegisterInfo.cpp	2022-08-01 07:01:33 -04:00
Simon Pilgrim	b43d7aacf8	[DAG] visitINSERT_VECTOR_ELT - extend folding to BUILD_VECTOR if all missing elements from an insertion chain are known zero	2022-08-01 11:32:33 +01:00
Petar Avramovic	e8d260753e	[AMDGPU] gfx11 allow dlc for MUBUF atomics Add MC support for dlc in gfx11 MUBUF atomic instructions. Differential Revision: https://reviews.llvm.org/D129075	2022-08-01 12:18:01 +02:00
Dominik Adamski	d90b7bf2c5	Add support for lowering simd if clause to LLVM IR Scope of changes: 1) Added new function to generate loop versioning 2) Added support for if clause to applySimd function 2) Added tests which confirm that lowering is successful If ifCond is specified, then collapsed loop is duplicated and if branch is added. Duplicated loop is executed if simd ifCond is evaluated to false. Reviewed By: Meinersbur Differential Revision: https://reviews.llvm.org/D129368 Signed-off-by: Dominik Adamski <dominik.adamski@amd.com>	2022-08-01 04:43:32 -05:00
David Sherwood	41119a0f52	[DAGCombiner] Extend visitAND to include EXTRACT_SUBVECTOR Eliminate an AND by redefining an anyext\|sext\|zext. (and (extract_subvector (anyext\|sext\|zext v) _) iN_mask) => (extract_subvector (zeroext_iN v)) Differential Revision: https://reviews.llvm.org/D130782	2022-08-01 10:32:32 +01:00
Luís Marques	0bc177b6f5	[RISCV] Extend the Merge Base Offset pass to handle AUIPC+ADDI Builds upon D123264, adding support for merging the low part of the LLA address into the load/store instruction offsets. Differential Revision: https://reviews.llvm.org/D123265	2022-08-01 11:30:02 +02:00
Vladislav Dzhidzhoev	facb3ac385	[GlobalISel][DebugInfo] salvageDebugInfo analogue for gMIR Salvage debug info of instruction that is about to be deleted as dead in Combiner pass. Currently supported instructions are COPY and G_TRUNC. It allows to salvage debug info of some dead arguments of functions, by putting DWARF expression corresponding to the instruction being deleted into related DBG_VALUE instruction. Here is an example of missing variables location https://godbolt.org/z/K48osb9dK. We see that arguments x, y of function foo are not available in debugger, and corresponding DBG_VALUE instructions have undefined register operand instead of variables locaton after Aarch64PreLegalizerCombiner pass. The reason is that registers where variables are located are removed as dead (with instruction G_TRUNC). We can use salvageDebugInfo analogue for gMIR to preserve debug locations of dead variables. Statistics of llvm object files built with vs without this commit on -O2 optimization level (CMAKE_BUILD_TYPE=RelWithDebInfo, -fglobal-isel) on Aarch64 (macOS): Number of variables with 100% of parent scope covered by DW_AT_location has been increased by 7,9%. Number of variables with 0% coverage of parent scope has been decreased by 1,2%. Number of variables processed by location statistics has been increased by 2,9%. Average PC ranges coverage has been increased by 1,8 percentage points. Coverage can be improved by supporting more instructions, or by calling salvageDebugInfo for instructions that are deleted during Combiner rules exection. Reviewed By: aprantl Differential Revision: https://reviews.llvm.org/D129909	2022-08-01 11:14:53 +02:00
Alex Bradbury	9bf2d8cbbe	[NFC] Use AllocaInst's getAddressSpace helper	2022-08-01 10:11:16 +01:00
Nikita Popov	7314ad7a06	Revert "[SimplifyCFG] Allow SimplifyCFG hoisting to skip over non-matching instructions" This reverts commit `7b0f6378e2`. As commented on the review, this patch has a correctness issue regarding the modelling of memory effects.	2022-08-01 09:20:56 +02:00
Momchil Velikov	7b0f6378e2	[SimplifyCFG] Allow SimplifyCFG hoisting to skip over non-matching instructions SimplifyCFG does some common code hoisting, which is limited to hoisting a sequence of identical instruction in identical order and stops at the first non-identical instruction. This patch allows hoisting instruction pairs over same-length sequences of non-matching instructions. The linear asymptotic complexity of the algorithm stays the same, there's an extra parameter `simplifycfg-hoist-common-skip-limit` serving to limit compilation time and/or the size of the hoisted live ranges. The patch improves SPECv6/525.x264_r by about 10%. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D129370	2022-08-01 07:55:14 +01:00
Nikita Popov	4ec22ba9c8	[GlobalsAA] Remove unnecessary AAResultBase fallback (NFC) This is unnecessary, as AA result chaining is implemented at a higher level now.	2022-08-01 08:34:58 +02:00
Nikita Popov	a21c245307	[ARMParallelDSP] Remove unnecessary ModRef intersection (NFC) Intersecting with ModRef is a no-op, as these are the only two possible values.	2022-08-01 08:34:58 +02:00
Nikita Popov	5b1d10bda6	[AA] Drop setModAndRef() function (NFC) Without the "must" state, this function is pointless, because we can just directly create a ModRef instead.	2022-08-01 07:55:39 +02:00
Nikita Popov	34683c3e35	[MSSA] Fix expensive checks build	2022-08-01 07:28:52 +02:00
Nikita Popov	f96ea53e89	[AA] Do not track Must in ModRefInfo getModRefInfo() queries currently track whether the result is a MustAlias on a best-effort basis. The only user of this functionality is the optimized memory access type in MemorySSA -- which in turn has no users. Given that this functionality has not found a user since it was introduced five years ago (in D38862), I think we should drop it again. The context is that I'm working to separate FunctionModRefBehavior to track mod/ref for different location kinds (like argmem or inaccessiblemem) separately, and the fact that ModRefInfo also has an unrelated Must flag makes this quite awkward, especially as this means that NoModRef is not a zero value. If we want to retain the functionality, I would probably split getModRefInfo() results into a part that just contains the ModRef information, and a separate part containing a (best-effort) AliasResult. Differential Revision: https://reviews.llvm.org/D130713	2022-08-01 07:14:31 +02:00
Chuanqi Xu	9701053517	Introduce @llvm.threadlocal.address intrinsic to access TLS variable This belongs to a series of patches which try to solve the thread identification problem in coroutines. See https://discourse.llvm.org/t/address-thread-identification-problems-with-coroutine/62015 for a full background. The problem consists of two concrete problems: TLS variable and readnone functions. This patch tries to convert the TLS problem to readnone problem by converting the access of TLS variable to an intrinsic which is marked as readnone. The readnone problem would be addressed in following patches. Reviewed By: nikic, jyknight, nhaehnle, ychen Differential Revision: https://reviews.llvm.org/D125291	2022-08-01 10:51:30 +08:00
Kazu Hirata	bf6021709a	Use drop_begin (NFC)	2022-07-31 15:17:09 -07:00
Kazu Hirata	d11103f9a0	[Hexagon] Remove unused declaration adjustForCalleeSavedRegsSpillCall (NFC) The function definition was removed on Apr 23, 2015 in commit `876a19d855`, but the declaration has remained since.	2022-07-31 15:17:06 -07:00
Kazu Hirata	71638b8be7	[ExecutionEngine] Ensure newlines at the end of files (NFC)	2022-07-31 15:16:58 -07:00
Luís Marques	260a641068	[RISCV] Pre-RA expand pseudos pass Expand load address pseudo-instructions earlier (pre-ra) to allow follow-up patches to fold the addi of PseudoLLA instructions into the immediate operand of load/store instructions. Differential Revision: https://reviews.llvm.org/D123264	2022-07-31 23:19:00 +02:00
Sanjay Patel	02b3a35892	[InstSimplify] fold FP rounding intrinsic with rounded operand issue #56775 I rearranged the Thumb2 codegen test to avoid simplifying the chain of rounding instructions. I'm assuming the intent of the test is to verify lowering of each of those intrinsics.	2022-07-31 10:00:27 -04:00
Simon Pilgrim	acb5abb7d3	[X86] getFauxShuffleMask - use DemandedElts variant of getTargetShuffleInputs. NFCI. We don't specify the demanded elts yet, this patch just rewires the getTargetShuffleInputs calls and gives an "all demanded elts" mask.	2022-07-31 12:15:04 +01:00
Simon Pilgrim	9cdba33337	[X86] combineX86ShufflesRecursively - determine demanded elts to pass to getTargetShuffleInputs Only PACKSS/PACKUS faux shuffles make use of the demanded elts at the moment, but this at least improves the handling of a couple of truncation patterns.	2022-07-31 11:30:40 +01:00
Sunho Kim	c559072e46	[JITLink][COFF] Remove unused variable.	2022-07-31 09:19:17 +09:00
Sunho Kim	b501770aef	[JITLink][COFF] Handle COMDAT symbol with offset. Handles COMDAT symbol with an offset and refactor the code to only generated symbol if the second symbol was encountered. This happens very infrequently but happens in recursive_mutex implementation of MSVC STL library. Reviewed By: lhames Differential Revision: https://reviews.llvm.org/D130454	2022-07-31 09:09:48 +09:00
Sunho Kim	d86f903b1d	[JITLink][COFF][x86_64] Implement remaining IMAGE_REL_AMD64_REL32_. Implements remaining IMAGE_REL_AMD64_REL32_. We only need IMAGE_REL_AMD64_REL32_4 for now but doing all remaining ones for completeness. (clang only uses IMAGE_REL_AMD64_REL32_1 and IMAGE_REL_AMD64_REL32) Reviewed By: lhames Differential Revision: https://reviews.llvm.org/D130452	2022-07-31 09:03:28 +09:00
Sunho Kim	e781451140	[JITLink] Relax zero-fill edge assertions. Relax zero-fill edge assertions to only consider relocation edges. Keep-alive edges to zero-fill blocks can cause this assertion which is too strict. Reviewed By: lhames Differential Revision: https://reviews.llvm.org/D130450	2022-07-31 08:34:10 +09:00
Sunho Kim	ee9cf336d6	[JITLink][COFF] Remove obsolete FIXMEs. (NFC)	2022-07-31 08:10:14 +09:00
Sunho Kim	ea75c25833	[JITLInk][COFF] Remove unnecessary unique_ptr. (NFC)	2022-07-31 08:08:19 +09:00
Sunho Kim	067faddb55	[JITLink][COFF] Add explicit std::move. Since ArgList is not copyable we need to make sure it's moved explicitly.	2022-07-31 08:01:00 +09:00
Sunho Kim	88181375a3	[JITLink][COFF] Implement include/alternatename linker directive. Implements include/alternatename linker directive. Alternatename is used by static msvc runtime library. Alias symbol is technically incorrect (we have to search for external definition) but we don't have a way to represent this in jitlink/orc yet, this is solved in the following up patch. Inlcude linker directive is used in ucrt to forcelly lookup the static initializer symbols so that they will be emitted. It's implemented as extenral symbols with live flag on that cause the lookup of these symbols. Reviewed By: lhames Differential Revision: https://reviews.llvm.org/D130276	2022-07-31 07:49:59 +09:00
Craig Topper	d21b315360	[RISCV] Remove vmerges from vector ceil, floor, trunc lowering. Use masked operations to suppress spurious exception bits being set in fflags. Unfortunately, doing this adds extra copies.	2022-07-30 10:58:41 -07:00
Simon Pilgrim	df457f583a	[X86] Use std::tie so we can have more meaningful variable names for demanded bits/elts pairs. NFCI. .first + .second were proving difficult to keep track of.	2022-07-30 18:57:15 +01:00
Kazu Hirata	12b29900a1	Use any_of (NFC)	2022-07-30 10:35:56 -07:00
Kazu Hirata	60db8d9b4e	Use nullptr instead of 0 (NFC) Identified with modernize-use-nullptr.	2022-07-30 10:35:48 -07:00
Kazu Hirata	66b6cc3acd	[ExecutionEngine] Ensure a newline at the end of a file (NFC)	2022-07-30 10:35:43 -07:00
Craig Topper	a23f07fb1d	[RISCV] Add merge operands to more RISCVISD::*_VL opcodes. This adds a merge operand to all of the binary _VL nodes. Including integer and widening. They all share multiclasses in tablegen so doing them all at once was easiest. I plan to use FADD_VL in an upcoming patch. The rest are just for consistency to keep tablegen working. This does reduce the isel table size by about 25k so that's nice. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D130816	2022-07-30 10:26:38 -07:00
Craig Topper	9bf305fe2b	[RISCV] Swap the merge and mask operand order for VRGATHER*_VL and FCOPYSIGN_VL nodes. Based on review feedback from D130816.	2022-07-30 09:57:05 -07:00
Simon Pilgrim	a14f94c20c	[X86] computeKnownBitsForTargetNode - out of range X86ISD::VSRAI doesn't fold to zero Noticed by inspection and I can't seem to make a test case, but SSE arithmetic bit shifts clamp to the max shift amount (i.e. create a sign splat) - combineVectorShiftImm already does something similar.	2022-07-30 17:55:39 +01:00
Dmitry Vassiliev	adc387460d	[CodeGen] Fixed undeclared MISchedCutoff in case of NDEBUG and LLVM_ENABLE_ABI_BREAKING_CHECKS This patch fixes the error llvm/lib/CodeGen/MachineScheduler.cpp(755): error C2065: 'MISchedCutoff': undeclared identifier in case of NDEBUG and LLVM_ENABLE_ABI_BREAKING_CHECKS. Note MISchedCutoff is declared under #ifndef NDEBUG. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D130425	2022-07-30 18:24:50 +02:00
Sanjay Patel	7073ec530e	[InstCombine] canonicalize more zext-and-of-bool compare to narrow and https://alive2.llvm.org/ce/z/vBNiiM This matches variants of patterns that were folded with: `b5a9361c90`	2022-07-30 11:22:05 -04:00
Austin Kerbow	7898426a72	[AMDGPU] Remove unused function	2022-07-30 07:47:35 -07:00
Simon Pilgrim	49c0980eac	Fix Wdocumentation warning. NFC. warning: '\returns' command used in a comment that is attached to a function returning void	2022-07-30 15:41:13 +01:00
Simon Pilgrim	813459ed2b	[X86] combineSelect fold 'smin' style pattern select(pcmpgt(RHS, LHS), LHS, RHS) -> select(pcmpgt(LHS, RHS), RHS, LHS) if pcmpgt(LHS, RHS) already exists Avoids repeated commuted comparisons when we're performing min/max and clamp patterns	2022-07-30 15:31:36 +01:00
Nuno Lopes	d4b4747de5	ConstantFolding: fold OOB accesses to poison instead of undef	2022-07-30 15:20:32 +01:00
Sanjay Patel	f95a6aea1b	[InstCombine] avoid splitting a constant expression with div/rem fold Follow-up to `d4940c0f3d` to further limit the transform to avoid an unintended pattern/fold of a constant expression.	2022-07-30 09:45:25 -04:00
Simon Pilgrim	9ad082eb5a	[DAG] Pull out repeated getOperand() calls for shuffle ops. NFC.	2022-07-30 14:02:54 +01:00
Simon Pilgrim	276480b1d3	[AMDGPU] Fix \|\| vs && precedence warning. NFC.	2022-07-30 14:02:54 +01:00
Nuno Lopes	fffabd5348	[NFC] Switch a few uses of undef to poison as placeholders for unreachable code	2022-07-30 13:55:56 +01:00
Alexander Shaposhnikov	4220ef2be1	[InstCombine] Add fold for redundant sign bits count comparison For power-of-2 C: ((X s>> ShiftC) ^ X) u< C --> (X + C) u< (C << 1) ((X s>> ShiftC) ^ X) u> (C - 1) --> (X + C) u> ((C << 1) - 1) (https://github.com/llvm/llvm-project/issues/56479) Test plan: 0/ ninja check-llvm check-clang + bootstrap LLVM/Clang 1/ https://alive2.llvm.org/ce/z/eEUfx3 Differential revision: https://reviews.llvm.org/D130433	2022-07-30 09:06:53 +00:00
Carl Ritson	4c4db81630	[AMDGPU] Extend SILoadStoreOptimizer to s_load instructions Apply merging to s_load as is done for s_buffer_load. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D130742	2022-07-30 11:38:39 +09:00
Alexander Shaposhnikov	d982f1e0c6	[InstCombine] Refactor foldICmpMulConstant This is a follow-up to `2ebfda2417` (replace "if" with "else if" since the cases nuw/nsw were meant to be handled separately). Test plan: 1/ ninja check-llvm check-clang check-lld 2/ Bootstrapped LLVM/Clang pass tests	2022-07-30 02:29:15 +00:00
Fangrui Song	ce6dd4e835	Revert D130458 "[llvm-objcopy] Support --{,de}compress-debug-sections for zstd" This reverts commit `c26dc2904b`. The new Zstd dispatch has an ongoing design discussion related to https://reviews.llvm.org/D130516#3688123 . Revert for now before it is resolved.	2022-07-29 15:46:51 -07:00
Sanjay Patel	d4940c0f3d	[InstCombine] fix miscompile from urem/udiv transform with constant expression The isa<Constant> check could misfire on an instruction with 2 constant operands. This bug was introduced with `bb789381fc` (D36988). See issue #56810 for a C source example that exposed the bug.	2022-07-29 17:14:30 -04:00
Craig Topper	e637feee80	[RISCV] Add isel pattern for (setne/eq GPR, -2048) For constants in the range [-2047, 2048] we use addi. If the constant is -2048 we can use xori. If we don't match this explicitly, we'll emit an LI for the -2048 followed by an XOR.	2022-07-29 14:07:38 -07:00
Jay Foad	9436a85eb6	[IRBuilder] Make createCallHelper a member function. NFC. This just avoids explicitly passing in 'this' as an argument in a bunch of places. Differential Revision: https://reviews.llvm.org/D130752	2022-07-29 21:17:26 +01:00
Austin Kerbow	2c82a126d7	[AMDGPU] Omit unnecessary waitcnt before barriers It is not necessary to wait for all outstanding memory operations before barriers on hardware that can back off of the barrier in the event of an exception when traps are enabled. Add a new subtarget feature which tracks which HW has this ability. Reviewed By: #amdgpu, rampitec Differential Revision: https://reviews.llvm.org/D130722	2022-07-29 11:12:36 -07:00
Sanjay Patel	b5a9361c90	[InstCombine] canonicalize zext-and-of-bool compare to narrow and https://alive2.llvm.org/ce/z/3jYbEH We should choose one of these forms, and the option that uses the narrow type allows the motivating example from issue #56294 to reduce. In the best case (no 'not' needed and 'trunc' remains), this does remove an instruction. Note that there is what looks like a regression because there is an existing canonicalization that turns trunc into and+icmp. That is a long-standing transform, and I'm not sure what effect reversing it would have.	2022-07-29 12:02:54 -04:00
Matt Devereau	a8b726ac65	[AArch64][SVE] Change DupLane128Combine Index comparison to 0 IdxInsert == IdxDupLane is incorrect. IdxInsert is the starting element number, whereas IdxIndex is the index of a quadword	2022-07-29 14:31:00 +00:00
Simon Pilgrim	bc2c4f6c85	[X86] combineAndnp - constant fold ANDNP(C,X) -> AND(~C,X) (REAPPLIED) If the LHS op has a single use then using the more general AND op is likely to allow commutation, load folding, generic folds etc. Updated version - original version rG057db2002bb3 didn't correctly account for multiple uses of the mask that might be folding "OR(AND(X,C),AND(Y,~C)) -> OR(AND(X,C),ANDNP(C,Y))" in canonicalizeBitSelect	2022-07-29 15:12:26 +01:00
Nikita Popov	5eaeeed8cb	[InstCombine] Avoid ConstantExpr::getFNeg() calls (NFCI) Instead call the constant folding API, which can fail. For now, this should be NFC, as we still allow the creation of fneg constant expressions.	2022-07-29 16:01:46 +02:00
Amaury Séchet	226086230c	[DAG] Use recursivelyDeleteUnusedNodes in CommitTargetLoweringOpt. It simplifies the logic and removes the need for manual bookkeeping. Differential Revision: https://reviews.llvm.org/D130445	2022-07-29 13:49:03 +00:00
Mirko Brkusanin	6a1aa627fa	[AMDGPU] Enable image_gather4h instruction for gfx10 and gfx11 Differential Revision: https://reviews.llvm.org/D130764	2022-07-29 15:42:06 +02:00
Alexey Lapshin	ece341f598	[Debuginfo][DWARF][NFC] Add paired methods working with DWARFDebugInfoEntry. This review is extracted from D96035. DWARF Debuginfo classes have two representations for DIEs: DWARFDebugInfoEntry (short) and DWARFDie(extended). Depending on the task, it might be more convenient to use DWARFDebugInfoEntry or/and DWARFDie. DWARFUnit class already has methods working with DWARFDie and DWARFDebugInfoEntry. This patch adds more methods working with DWARFDebugInfoEntry to have paired functionality. Reviewed By: aprantl Differential Revision: https://reviews.llvm.org/D126059	2022-07-29 16:40:17 +03:00
Jay Foad	3cfa9b1431	[AMDGPU] user-sgpr-init16-bug does not apply to gfx1103 Differential Revision: https://reviews.llvm.org/D130347	2022-07-29 14:21:13 +01:00
Simon Pilgrim	af1b7ebcdf	[TargetLowering] Move a few hasOneUse() tests later to reduce unnecessary computations. NFC. Many of these cases, an early-out on the much cheaper getOpcode() check will avoid us needing to call hasOneUse() entirely.	2022-07-29 14:20:35 +01:00
Matt Arsenault	ef906f287e	AMDGPU: Fix assertion when printing unreachable functions Since `814a0abcce`, this would break if we had a function in the module that becomes dead in any codegen IR pass. The function wasn't deleted since it was initially used in dead code, but is detached from the call graph and doesn't appear in the PO traversal. Do a second walk over the module to populate the resources of any functions which weren't already processed.	2022-07-29 08:57:43 -04:00
Matt Arsenault	a4834ad068	RegisterCoalescer: Shrink main range after shrinking subranges If the subregister uses were dead, this would leave the main range segment pointing to a deleted instruction. Not sure if this should try to avoid shrinking if we know we don't have dead components.	2022-07-29 08:57:28 -04:00
Alexander Timofeev	d7ae1a9097	Revert "[AMDGPU] avoid blind converting to VALU REG_SEQUENCE and PHIs" This reverts commit `76d9ae924c`. because it causes several VK CTS tests to fail	2022-07-29 14:19:07 +02:00
Simon Pilgrim	641dba9e28	[DAG] Move a few hasOneUse() tests later to reduce unnecessary computations. NFC. Many of these cases, an early-out on the much cheaper getOpcode() check will avoid us needing to call hasOneUse() entirely.	2022-07-29 11:34:39 +01:00
Simon Pilgrim	9082c13106	[Support] Add KnownBits::concat method Add a method for the various cases where we need to concatenate 2 KnownBits together (BUILD_PAIR and SHIFT_PARTS in particular) - uses the existing APInt::concat 'HiBits.concat(LoBits)' convention Differential Revision: https://reviews.llvm.org/D130557	2022-07-29 11:06:39 +01:00
Jay Foad	d03110155b	[IR] Simplify Intrinsic::getDeclaration. NFC.	2022-07-29 10:45:22 +01:00
wanglei	56ab2f4ccd	[LoongArch] Offset folding for frameindex This patch is for frameindex calculations. Differential Revision: https://reviews.llvm.org/D130248	2022-07-29 17:27:34 +08:00
wanglei	fd6545322c	[LoongArch] Refactor insertDivByZeroTrap Ensure non-terminators don't follow terminators. This patch fixes the `sdiv-udiv-srem-urem.ll` test failure with expensive check. Differential Revision: https://reviews.llvm.org/D130247	2022-07-29 17:06:49 +08:00
David Sherwood	487fa6f8c3	[AArch64][DAGCombine] Add performBuildVectorCombine 'extract_elt ~> anyext' A build vector of two extracted elements is equivalent to an extract subvector where the inner vector is any-extended to the extract_vector_elt VT, because extract_vector_elt has the effect of an any-extend. (build_vector (extract_elt_i16_to_i32 vec Idx+0) (extract_elt_i16_to_i32 vec Idx+1)) => (extract_subvector (anyext_i16_to_i32 vec) Idx) Depends on D130697 Differential Revision: https://reviews.llvm.org/D130698	2022-07-29 09:51:09 +01:00
Florian Hahn	214e2d8fe5	[SCEV] Avoid repeated proveNoSignedWrapViaInduction calls. At the moment, proveNoSignedWrapViaInduction may be called for the same AddRec a large number of times via getSignExtendExpr. This can have a severe compile-time impact for very loop-heavy code. If proveNoSignedWrapViaInduction failed to prove NSW the first time, it is unlikely to succeed on subsequent tries and the cost doesn't seem to be justified. This is the signed version of `8daa338297` / D130648. This can drastically improve compile-time in some excessive cases and also has a slightly positive compile-time impact on CTMark: NewPM-O3: -0.06% NewPM-ReleaseThinLTO: -0.04% NewPM-ReleaseLTO-g: -0.04% https://llvm-compile-time-tracker.com/compare.php?from=8daa338297d533db4d1ae8d3770613eb25c29688&to=aed126a196e7a5a9803543d9b4d6bdb233d0009c&stat=instructions Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D130694	2022-07-29 09:15:03 +01:00
Sunho Kim	e590f945c6	Revert "[JITLink][COFF] Implement include/alternatename linker directive." This reverts commit `f1fcd06a2a`. Faliures reported in https://lab.llvm.org/buildbot/#/builders/193/builds/16143 and http://lab.llvm.org/buildbot/#/builders/91/builds/13010	2022-07-29 17:03:19 +09:00
Sunho Kim	f1fcd06a2a	[JITLink][COFF] Implement include/alternatename linker directive. Implements include/alternatename linker directive. Alternatename is used by static msvc runtime library. Alias symbol is technically incorrect (we have to search for external definition) but we don't have a way to represent this in jitlink/orc yet, this is solved in the following up patch. Inlcude linker directive is used in ucrt to forcelly lookup the static initializer symbols so that they will be emitted. It's implemented as extenral symbols with live flag on that cause the lookup of these symbols. Reviewed By: lhames Differential Revision: https://reviews.llvm.org/D130276	2022-07-29 16:48:29 +09:00
Sunho Kim	049fd21b42	[JITLink][COFF][x86_64] Implement ADDR64 relocation. Implements ADDR64 relocation using x86_64 edge kind. Reviewed By: lhames Differential Revision: https://reviews.llvm.org/D130178	2022-07-29 16:32:07 +09:00
Sunho Kim	410e0aa759	[JITLink][COFF] Implement dllimport stubs. Implements dllimport stubs using GOT table manager. Benefit of using GOT table manager is that we can just reuse jitlink-check architecture. Reviewed By: lhames Differential Revision: https://reviews.llvm.org/D130175	2022-07-29 16:29:53 +09:00
Sunho Kim	6b27890b2c	[ORC][COFF] Handle COFF import files of static archive. Handles COFF import files of static archive. Changes static library genrator to build up object file map keyed by symbol name that excludes the symbols from dllimported symbols so that static generator will not be responsible for them. It exposes the list of dynamic libraries that need to be imported. Client should properly load the libraries in this list beforehand. Object file map is also an improvment from the past in terms of performance. Archive.findSym does a slow O(n) linear serach of symbol list to find the symbol. (we call findSym O(n) times, thus full time complexity is O(n^2); we were the only user of findSym function in fact) There is a room for improvements in how to load the libraries in the list. We currently just hand the responsibility over to the clinet. A better way would be let ORC read this list and hand them over to JITLink side that would also help validation (e.g. not trying to generate stub for non dllimported targets) Nevertheless, we will have to exclude the symbols from COFF import object file list and need a way to access this list, which this patch offers. Reviewed By: lhames Differential Revision: https://reviews.llvm.org/D129952	2022-07-29 16:25:08 +09:00
Changpeng Fang	2b731b30a7	AMDGPU: Take care of "tied" operand when removeOperand Summary: Flat scratch load of D16 type by default has tied vdst_in operand (with vdst). This should be taken care of at the time of "removeOperand" in eliminateFrameIndex. Otherwise we will hit an assert saying "Cannot move tied operands". This patch unties vdst_in before the move, and retie it with vdst afterwards. Reviewers: arsenm, foad Differential Revision: https://reviews.llvm.org/D130537	2022-07-28 17:30:49 -07:00
Francis Visoiu Mistrih	bfd3883e83	[Matrix] Refactor transpose distribution. NFC Use a function to distribute transposes. Preparation for future patches.	2022-07-28 17:30:00 -07:00
Felipe de Azevedo Piovezan	58526b2d2b	[GlobalISel] Handle nullptr constants in dbg.value Currently, the LLVM IR -> MIR translator fails to translate dbg.values whose first argument is a null pointer. However, in other portions of the code, such pointers are always lowered to the constant zero, for example see IRTranslator::Translate(Constant, Register). This patch addresses the limitation by following the same approach of lowering null pointers to zero. A prior test was checking that null pointers were always lowered to $noreg; this test is changed to check for zero, and the previous behavior is now checked by introducing a dbg.value whose first argument is the address of a global variable. Differential Revision: https://reviews.llvm.org/D130721	2022-07-28 14:58:14 -07:00
Felipe de Azevedo Piovezan	0ef6809c48	[GlobalISel][nfc] Remove unnecessary cast The getOperand method already returns a Constant when it is called on a ConstantExpression, as such the cast is not needed. To prevent a type mismatch between the different return statements of the lambda, the lambda return type is explicitly provided. Differential Revision: https://reviews.llvm.org/D130719	2022-07-28 14:55:07 -07:00
Anshil Gandhi	5c38056431	[AMDGPU][Scheduler] Avoid initializing Register pressure tracker when tracking is disabled When register pressure tracking is disabled, the scheduler attempts to load pressures at SReg_32 and VGPR_32. This causes an index out of bounds error. This patch fixes this issue by disabling the initialization of RPTracker when not needed. NFC Reviewed By: rampitec, kerbowa, arsenm Differential Revision: https://reviews.llvm.org/D129322	2022-07-28 15:39:28 -06:00
David Blaikie	6139626d73	llvm-dwp: Include dwo name even when the input is a dwo This still only includes the dwo name if it's in the DW_AT_dwo_name attribute in the split unit - though it could be improved/modified to use the dwo name from the command line (if linking raw dwo files) or retrieved from the DW_AT_dwo_name in the executable (when using -e). It's useful in any case because you might have a large command line with many files and knowing exactly which dwo files are relevant will simplify debugging, but especially with '-e' when you didn't pass the dwo files explicitly in nthe first place it would be quite non-obvious where the duplicate units are coming from.	2022-07-28 20:24:05 +00:00
Alexey Lapshin	e74197bc12	[Reland][Debuginfo][llvm-dwarfutil] Add check for unsupported debug sections. Current DWARFLinker implementation does not support some debug sections (mainly DWARF v5 sections). This patch adds diagnostic for such sections. The warning would be displayed for critical(such that could not be removed) sections and the source file would be skipped. Other unsupported sections would be removed and warning message should be displayed. The zero exit status would be returned for both cases. Reviewed By: JDevlieghere Differential Revision: https://reviews.llvm.org/D123623	2022-07-28 21:29:16 +03:00
Austin Kerbow	0f93a45b11	[AMDGPU] Add isMeta flag to SCHED_GROUP_BARRIER	2022-07-28 11:04:33 -07:00
Fangrui Song	c26dc2904b	[llvm-objcopy] Support --{,de}compress-debug-sections for zstd Also, add ELFCOMPRESS_ZSTD (2) from the approved generic-abi proposal: https://groups.google.com/g/generic-abi/c/satyPkuMisk ("Add new ch_type value: ELFCOMPRESS_ZSTD") Link: https://discourse.llvm.org/t/rfc-zstandard-as-a-second-compression-method-to-llvm/63399 ("[RFC] Zstandard as a second compression method to LLVM") Differential Revision: https://reviews.llvm.org/D130458	2022-07-28 10:45:53 -07:00
Austin Kerbow	f5b21680d1	[AMDGPU] Add amdgcn_sched_group_barrier builtin This builtin allows the creation of custom scheduling pipelines on a per-region basis. Like the sched_barrier builtin this is intended to be used either for testing, in situations where the default scheduler heuristics cannot be improved, or in critical kernels where users are trying to get performance that is close to handwritten assembly. Obviously using these builtins will require extra work from the kernel writer to maintain the desired behavior. The builtin can be used to create groups of instructions called "scheduling groups" where ordering between the groups is enforced by the scheduler. __builtin_amdgcn_sched_group_barrier takes three parameters. The first parameter is a mask that determines the types of instructions that you would like to synchronize around and add to a scheduling group. These instructions will be selected from the bottom up starting from the sched_group_barrier's location during instruction scheduling. The second parameter is the number of matching instructions that will be associated with this sched_group_barrier. The third parameter is an identifier which is used to describe what other sched_group_barriers should be synchronized with. Note that multiple sched_group_barriers must be added in order for them to be useful since they only synchronize with other sched_group_barriers. Only "scheduling groups" with a matching third parameter will have any enforced ordering between them. As an example, the code below tries to create a pipeline of 1 VMEM_READ instruction followed by 1 VALU instruction followed by 5 MFMA instructions... // 1 VMEM_READ __builtin_amdgcn_sched_group_barrier(32, 1, 0) // 1 VALU __builtin_amdgcn_sched_group_barrier(2, 1, 0) // 5 MFMA __builtin_amdgcn_sched_group_barrier(8, 5, 0) // 1 VMEM_READ __builtin_amdgcn_sched_group_barrier(32, 1, 0) // 3 VALU __builtin_amdgcn_sched_group_barrier(2, 3, 0) // 2 VMEM_WRITE __builtin_amdgcn_sched_group_barrier(64, 2, 0) Reviewed By: jrbyrnes Differential Revision: https://reviews.llvm.org/D128158	2022-07-28 10:43:14 -07:00
Craig Topper	2750873dfe	[RISCV] Update lowerFROUND to use masked instructions. This avoids a vmerge at the end and avoids spurious fflags updates. This isn't used for constrained intrinsic so we technically don't have to worry about fflags, but it doesn't cost much to support it. To support I've extend our FCOPYSIGN_VL node to support a passthru operand. Similar to what was done for VRGATHER*_VL nodes. I plan to do a similar update for trunc, floor, and ceil. Reviewed By: reames, frasercrmck Differential Revision: https://reviews.llvm.org/D130659	2022-07-28 10:05:19 -07:00
Craig Topper	89173dee71	[RISCV] Remove duplicate code. NFC The same operations are part of `FloatingPointVecReduceOps` a little bit earlier.	2022-07-28 10:05:19 -07:00
Simon Pilgrim	8c99cef1e7	[DAG] Remove SelectionDAG::GetDemandedBits and use SimplifyMultipleUseDemandedBits directly. GetDemandedBits is mainly a wrapper around SimplifyMultipleUseDemandedBits now, and is only used by DAGCombiner::visitSTORE so I've moved all remaining functionality there. visitSTORE was making use of this to 'simplify' constants for a trunc-store. Just removing this code left to a mixture of regressions and gains - it came down to whether a target preferred a sign or zero extended constant for materialization/truncation. I've just moved the code over for now, but a next step would be to move this to targetShrinkDemandedConstant, but some targets that override the method expect a basic binop, and might react badly to a store node.....	2022-07-28 17:03:44 +01:00
Philip Reames	82c1b136db	[LV] Don't predicate uniform mem op stores unneccessarily We already had the reasoning about uniform mem op loads; if the address is accessed at least once, we know the instruction doesn't need predicated to ensure fault safety. For stores, we do need to ensure that the values visible in memory are the same with and without predication. The easiest sub-case to check for is that all the values being stored are the same. Since we know that at least one lane is active, this tells us that the value must be visible. Warning on confusing terminology: "uniform" vs "uniform mem op" mean two different things here, and this patch is specific to the later. It would not be legal to make this same change for merely "uniform" operations. Differential Revision: https://reviews.llvm.org/D130637	2022-07-28 08:55:52 -07:00
Liqiang Tao	d52e775b05	[llvm][ModuleInliner] Add inline cost priority for module inliner This patch introduces the inline cost priority into the module inliner, which uses the same computation as InlineCost. Reviewed By: kazu Differential Revision: https://reviews.llvm.org/D130012	2022-07-28 22:44:03 +08:00
Liqiang Tao	c113594378	Revert "[llvm][ModuleInliner] Add inline cost priority for module inliner" This reverts commit `bb7f62bbbd`.	2022-07-28 22:36:28 +08:00
Florian Hahn	f912bab111	Revert "[X86][DAGISel] Don't widen shuffle element with AVX512" This reverts commit `5fb4134210`. This patch is causing crashes when building llvm-test-suite when optimizing for CPUs with AVX512. Reproducer crashing with llc: target datalayout = "e-m:o-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128" target triple = "x86_64-apple-macosx" define i32 @test(<32 x i32> %0) #0 { entry: %1 = mul <32 x i32> %0, <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1> %2 = tail call i32 @llvm.vector.reduce.add.v32i32(<32 x i32> %1) ret i32 %2 } ; Function Attrs: nocallback nofree nosync nounwind readnone willreturn declare i32 @llvm.vector.reduce.add.v32i32(<32 x i32>) #1 attributes #0 = { "min-legal-vector-width"="0" "target-cpu"="skylake-avx512" } attributes #1 = { nocallback nofree nosync nounwind readnone willreturn }	2022-07-28 15:26:42 +01:00
Simon Pilgrim	be488ba7de	[DAG] DAGCombiner::visitTRUNCATE - remove GetDemandedBits call This should now all be handled by SimplifyDemandedBits.	2022-07-28 15:23:04 +01:00
Simon Pilgrim	ea7f14dad0	[DAG] SelectionDAG::GetDemandedBits - don't simplify opaque constants I'm actually trying to get rid of GetDemandedBits - but while dismantling it I noticed that we were altering opaque constants. Fixing that causes a FP_TO_INT_SAT regression that should be addressed separately - I'll raise a bug.	2022-07-28 14:46:59 +01:00
Liqiang Tao	bb7f62bbbd	[llvm][ModuleInliner] Add inline cost priority for module inliner This patch introduces the inline cost priority into the module inliner, which uses the same computation as InlineCost. Reviewed By: kazu Differential Revision: https://reviews.llvm.org/D130012	2022-07-28 21:28:07 +08:00
Simon Pilgrim	69d5a038b9	[DAG] Enable ISD::SRL SimplifyMultipleUseDemandedBits handling inside SimplifyDemandedBits This patch allows SimplifyDemandedBits to call SimplifyMultipleUseDemandedBits in cases where the ISD::SRL source operand has other uses, enabling us to peek through the shifted value if we don't demand all the bits/elts. This is another step towards removing SelectionDAG::GetDemandedBits and just using TargetLowering::SimplifyMultipleUseDemandedBits. There a few cases where we end up with extra register moves which I think we can accept in exchange for the increased ILP. Differential Revision: https://reviews.llvm.org/D77804	2022-07-28 14:10:44 +01:00
Amaury Séchet	474a8ee03d	[DAG] Use recursivelyDeleteUnusedNodes in PromoteLoad It simplifies the code overall and removes the need for manual bookkeeping. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D130447	2022-07-28 12:54:52 +00:00
Amaury Séchet	7920805b27	[DAG] Use recursivelyDeleteUnusedNodes in ReplaceLoadWithPromotedLoad It simplifies the code overall and removes the need for manual bookkeeping. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D130444	2022-07-28 12:32:37 +00:00
Alexander Timofeev	76d9ae924c	[AMDGPU] avoid blind converting to VALU REG_SEQUENCE and PHIs In the `2e29b0138c` we introduce a specific solving algorithm that analyzes the VGPR to SGPR copies use chains and either lowers the copy to v_readfirstlane_b32 or converts the whole chain to VALU forms. Same time we still have the code that blindly converts to VALU REG_SEQUENCE and PHIs in case they produce SGPR but have VGPRs input operands. In case the REG_SEQUENCE and PHIs are in the VGPR to SGPR copy use chain, and this chain was considered long enough to convert copy to v_readfistlane_b32, further lowering them to VALU leads to several kinds of issues. At first, we have v_readfistlane_b32 which is completely useless because most parts of its use chain were moved to VALU forms. Second, we may encounter subtle bugs related to the EXEC-dependent CF because of the weird mixing of SALU and VALU instructions. This change removes the code that moves REG_SEQUENCE and PHIs to VALU. Instead, we use the fact that both REG_SEQUENCE and PHIs have copy semantics. That is, if they define SGPR but have VGPR inputs, we insert VGPR to SGPR copies to make them pure SGPR. Then, the new copies are processed by the common VGPR to SGPR lowering algorithm. This is Part 2 in the series of commits aiming at the massive refactoring of the SIFixSGPRCopies pass. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D130367	2022-07-28 14:30:29 +02:00
Sunho Kim	72ea1a721e	[ORC] Fix weak hidden symbols failure on PPC with runtimedyld Fix "JIT session error: Symbols not found: [ DW.ref.__gxx_personality_v0 ] error" which happens when trying to use exceptions on ppc linux. To do this, it expands AutoClaimSymbols option in RTDyldObjectLinkingLayer to also claim weak symbols before they are tried to be resovled. In ppc linux, DW.ref symbols is emitted as weak hidden symbols in the later stage of MC pipeline. This means when using IRLayer (i.e. LLJIT), IRLayer will not claim responsibility for such symbols and RuntimeDyld will skip defining this symbol even though it couldn't resolve corresponding external symbol. Reviewed By: sgraenitz Differential Revision: https://reviews.llvm.org/D129175	2022-07-28 21:12:25 +09:00
Dmitry Preobrazhensky	2b230d69ad	[AMDGPU][MC][GFX90A] Correct MIMG dst size validation Correct validator to enable MIMG dst size checks. Differential Revision: https://reviews.llvm.org/D130512	2022-07-28 14:30:08 +03:00
Sanjay Patel	28ad5dc3f7	[InstCombine] try harder to narrow bitwise logic with cast operands This works with any logic + extend: https://alive2.llvm.org/ce/z/vzsqQD The motivating case is from issue #56294, but that's still not optimal (it should simplify completely).	2022-07-28 07:23:22 -04:00
Dmitry Preobrazhensky	fa7fd8ec31	[AMDGPU][MC][GFX11] Disable SGPRs for src1 of v_fma_mix*_dpp opcodes Differential Revision: https://reviews.llvm.org/D130634	2022-07-28 14:20:05 +03:00
chendewen	7eeb468ae5	[Aarch64] Add cost for missing extensions. This patch adds a cost estimate for some missing sign extensions. ref: https://reviews.llvm.org/D14730 Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D130565	2022-07-28 17:34:00 +08:00
Florian Hahn	8daa338297	[SCEV] Avoid repeated proveNoUnsignedWrapViaInduction calls. At the moment, proveNoUnsignedWrapViaInduction may be called for the same AddRec a large number of times via getZeroExtendExpr. This can have a severe compile-time impact for very loop-heavy code. One one particular workload, LSR takes ~51s without this patch, almost exlusively in proveNoUnsignedWrapViaInduction. With this patch, the time in LSR drops to ~0.4s. If proveNoUnsignedWrapViaInduction failed to prove NUW the first time, it is unlikely to succeed on subsequent tries and the cost doesn't seem to be justified. Besides drastically improving compile-time in some excessive cases, this also has a slightly positive compile-time impact on CTMark: NewPM-O3: -0.07% NewPM-ReleaseThinLTO: -0.08% NewPM-ReleaseLTO-g: -0.06 https://llvm-compile-time-tracker.com/compare.php?from=b435da027d7774c24cdb8c88d09f6b771e07fb14&to=f2729e33e8284b502f6c35a43345272252f35d12&stat=instructions Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D130648	2022-07-28 10:02:19 +01:00
David Spickett	a0ccba5e19	[llvm] Fix some test failures with EXPENSIVE_CHECKS and libstdc++ DebugLocEntry assumes that it either contains 1 item that has no fragment or many items that all have fragments (see the assert in addValues). When EXPENSIVE_CHECKS is enabled, _GLIBCXX_DEBUG is defined. On a few machines I've checked, this causes std::sort to call the comparator even if there is only 1 item to sort. Perhaps to check that it is implemented properly ordering wise, I didn't find out exactly why. operator< for a DbgValueLoc will crash if this happens because the optional Fragment is empty. Compiler/linker/optimisation level seems to make this happen or not. So I've seen this happen on x86 Ubuntu but the buildbot for release EXPENSIVE_CHECKS did not have this issue. Add an explicit check whether we have 1 item. Reviewed By: aprantl Differential Revision: https://reviews.llvm.org/D130156	2022-07-28 08:53:38 +00:00
Max Kazantsev	2d1c6e0b44	[LAA] Remove block order sensitivity in LAA algorithm. PR56672 As test in PR56672 shows, LAA produces different results which lead to either positive or negative vectorization decisions depending on the order of blocks in loop. The exact reason of this is not clear to me, however this makes investigation of related bugs extremely complex. Current order of blocks in the loop is arbitrary. It may change, for example, if loop info analysis is dropped and recomputed. Seems that it interferes with LAA's logic. This patch chooses fixed traversal order of blocks in loops, making it RPOT. Note: this is not a fix for bug with incorrect analysis result. It just makes the answer more robust to make the investigation easier. Differential Revision: https://reviews.llvm.org/D130482 Reviewed By: aeubanks, fhahn	2022-07-28 13:36:56 +07:00
Phoebe Wang	726d9f8e8c	[X86][MC] Avoid emitting incorrect warning for complex FMUL We will insert a new operand which is identical to the Dest for complex FMUL with a mask. https://godbolt.org/z/eTEdnYv3q Complex FMA and FMUL with maskz don't have this problem. Reviewed By: LuoYuanke, skan Differential Revision: https://reviews.llvm.org/D130638	2022-07-28 13:58:34 +08:00
Austin Kerbow	ba0d079c7a	[AMDGPU] Aggressively schedule to reduce RP in occupancy limited regions By not clustering loads and adjusting heuristics to more aggressively reduce register pressure we may be able to increase occupancy for the function if it was dropped in a first pass scheduling. Similarly, try to reduce spilling if register usage exceeds lower bound occupancy. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D130329	2022-07-27 22:34:37 -07:00
Amara Emerson	93e3aeb9a8	[AArch64][GlobalISel] Fix custom legalization of rotates using sext for shift vs zext. Rotates are defined according to DAG documentation as having unsigned shifts, so we need to zero-extend instead of sign-extend here. Fixes issue 56664	2022-07-27 22:10:42 -07:00
Carl Ritson	dbda30e294	[AMDGPU][SIFoldOperands] Clear kills when folding COPY Clear all kill flags on source register when folding a COPY. This is necessary because the kills may now be out of order with the uses. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D130622	2022-07-28 11:57:55 +09:00
Craig Topper	a304d70ee9	[RISCV] Reorder (and/or/xor (shl X, C1), C2) if we can form ANDI/ORI/XORI. InstCombine and DAGCombine prefer to keep shl before binops. This patch teaches isel to convert to (shl (and/or/xor X, C1 >> C2), C2) if (C1 >> C2) is a simm12. The idea was taken from X86's isel code. There's a special case implemented for a sext_inreg between the shift and the binop. Differential Revision: https://reviews.llvm.org/D130610	2022-07-27 17:35:26 -07:00
Craig Topper	1d1d8d6025	[RISCV] Reorder code in lowerFROUND to make the diff in D130659 cleaner. NFC	2022-07-27 17:13:04 -07:00
Matt Arsenault	bfdca1535c	RegAllocGreedy: Fix nondeterminism in tryLastChanceRecoloring tryLastChanceRecoloring iterates over the set of LiveInterval pointers and used that to seed the recoloring stack, which was nondeterministic. Fixes a future test failing about 20% of the time. This just takes the order the interfering vreg was encountered. Not sure if we should try to order this more intelligently.	2022-07-27 19:02:06 -04:00
Craig Topper	98647330bf	[RISCV] Add merge operand to RISCVISD::FCOPYSIGN_VL. Similar to what was done for VRGATHER*_VL recently. This will be used in D130659.	2022-07-27 15:25:34 -07:00
Paul Kirth	6e9bab71b6	Revert "[llvm][NFC] Refactor code to use ProfDataUtils" This reverts commit `300c9a7881`. We will reland once these issues are ironed out.	2022-07-27 21:38:11 +00:00
Paul Kirth	300c9a7881	[llvm][NFC] Refactor code to use ProfDataUtils In this patch we replace common code patterns with the use of utility functions for dealing with profiling metadata. There should be no change in functionality, as the existing checks should be preserved in all cases. Reviewed By: bogner, davidxl Differential Revision: https://reviews.llvm.org/D128860	2022-07-27 21:13:54 +00:00
Paul Kirth	6047deb7c2	[llvm] Provide utility function for MD_prof Currently, there is significant code duplication for dealing with MD_prof metadata throughout the compiler. These utility functions can improve code reuse and simplify boilerplate code when dealing with profiling metadata, such as branch weights. The inent is to provide a uniform set of APIs that allow common tasks, such as identifying specific types of MD_prof metadata and extracting branch weights. Future patches can build on this initial implementation and clean up the different implementations across the compiler. Reviewed By: bogner Differential Revision: https://reviews.llvm.org/D128858	2022-07-27 21:13:51 +00:00
Adrian Prantl	719ab04acf	[GlobalISel] Handle IntToPtr constants in dbg.value Currently, the IR to MIR translator can only handle two kinds of constant inputs to dbg.values intrinsics: constant integers and constant floats. In particular, it cannot handle pointers created from IntToPtr ConstantExpression objects. This patch addresses the limitation above by replacing the IntToPtr with its input integer prior to converting the dbg.value input. Patch by Felipe Piovezan! Differential Revision: https://reviews.llvm.org/D130642	2022-07-27 13:42:07 -07:00
Philip Reames	15c645f7ee	[RISCV] Enable (scalable) vectorization by default This change enables vectorization (using scalable vectorization only, fixed vectors are not yet enabled) for RISCV when vector instructions are available for the target configuration. At this point, the resulting configuration should be both stable (e.g. no crashes), and profitable (i.e. few cases where scalar loops beat vector ones), but is not going to be particularly well tuned (i.e. we emit the best possible vector loop). The goal of this change is to align testing across organizations and ensure the default configuration matches what downstreams are using as closely as possible. This exposes a large amount of code which hasn't otherwise been on by default, and thus may not have been fully exercised. Given that, having issues fall out is not unexpected. If you find issues, please make sure to include as much information as you can when reverting this change. Differential Revision: https://reviews.llvm.org/D129013	2022-07-27 12:36:04 -07:00
Stanislav Mekhanoshin	68901fdbeb	[AMDGPU] Consider S_SETPRIO a scheduling boundary The instruction is used to modify wave priority with the intent to affect VALU execution and currently we can reschedule VALU around it since that VALU does not have side effects. Differential Revision: https://reviews.llvm.org/D130654	2022-07-27 11:50:23 -07:00
Jonas Devlieghere	a8c3d9815e	[DebugInfo] Teach LLVM and LLDB about ptrauth in DWARF Teach libDebugInfo (llvm-dwarfdump) and lldb about DWARF tags and attributes for pointer authentication. These values have been emitted by Apple clang for several releases. Although upstream LLVM doesn't emit these values yet, we hope to upstream that part sometime soon. Differential revision: https://reviews.llvm.org/D130215	2022-07-27 11:48:35 -07:00
Amara Emerson	65246d3eb4	Use hasNItemsOrLess() in MRI::hasAtMostUserInstrs().	2022-07-27 11:42:14 -07:00
Florian Hahn	16e0620d6d	[VPlan] Mark VPPredInstPHIRecipe as not having side-effects. Now that all uses of VPPredInstPHIRecipes are properly modeled, they can be treated as not having side-effects, enabling removal.	2022-07-27 19:29:26 +01:00
Mingming Liu	34348814e1	[AArch64] Explicitly use v1i64 type for llvm.aarch64.neon.pmull64 Without this, the intrinsic will be expanded to an integer; thereby an explicit copy (from GPR to SIMD register) will be codegen'd. This matches the general convention of using "v1" types to represent scalar integer operations in vector registers. The similar approach is observed in D56616, and the pattern likely applies on other intrinsic that accepts integer scalars (e.g., int_aarch64_neon_sqdmulls_scalar) Differential Revision: https://reviews.llvm.org/D130548	2022-07-27 11:11:16 -07:00
Amara Emerson	19cdd1908b	[AArch64][GlobalISel] Add heuristics for localizing G_CONSTANT. This adds similar heuristics to G_GLOBAL_VALUE, querying the cost of materializing a specific constant in code size. Doing so prevents us from sinking constants which require multiple instructions to generate into use blocks. Code size savings on CTMark -Os: Program size.__text before after diff ClamAV/clamscan 381940.00 382052.00 0.0% lencod/lencod 428408.00 428428.00 0.0% SPASS/SPASS 411868.00 411876.00 0.0% kimwitu++/kc 449944.00 449944.00 0.0% Bullet/bullet 463588.00 463556.00 -0.0% sqlite3/sqlite3 284696.00 284668.00 -0.0% consumer-typeset/consumer-typeset 414492.00 414424.00 -0.0% 7zip/7zip-benchmark 595244.00 594972.00 -0.0% mafft/pairlocalalign 247512.00 247368.00 -0.1% tramp3d-v4/tramp3d-v4 372884.00 372044.00 -0.2% Geomean difference -0.0% Differential Revision: https://reviews.llvm.org/D130554	2022-07-27 10:51:16 -07:00
Stanislav Mekhanoshin	0562cf442f	Allow data prefetch into non-default address space I am playing with the LoopDataPrefetch pass and found out that it bails to work with a pointer in a non-zero address space. This patch adds the target callback to check if an address space is to be considered for prefetching. Default implementation still only allows address space 0, so this is NFCI. This does not currently affect any known targets, but seems to be generally useful for the future. Differential Revision: https://reviews.llvm.org/D129795	2022-07-27 10:01:26 -07:00
Eli Friedman	1a6d82b93f	Fix misc uses of "long" variables to use "int64_t". I don't have any evidence these particular uses are actually causing any issues, but we should avoid accidentally truncating immediate values depending on the host.	2022-07-27 09:47:19 -07:00
Craig Topper	32622d6de4	[RISCV] Add isel pattern for (mul (and X, 0xffffffff), 3<<C) with Zba. We can use slli.uw by C followed by sh1add. Similar can be done for multiples of 5 and 9. We need to make sure that C is less than 32 to stay in bounds of the 5-bit immediate for slli.uw. We have existing patterns for (mul X, 3<<C) that use sh1add followed by slli. That order doesn't allow the and to be folded. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D130146	2022-07-27 09:41:59 -07:00
Craig Topper	9b27d13204	[RISCV] Disable constant hoisting for multiply by negated power of 2. A mul by a negated power of 2 is a slli followed by neg. This doesn't require any constant materialization and may be lower latency than mul. The neg may also be foldable into other arithmetic. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D130047	2022-07-27 09:37:59 -07:00
Umesh Kalappa	f38ea84a9f	[PowerPC] Change long to int64_t (which is always 64 bit or 8 bytes ) We can't guarantee the long always 64 bits like WINDOWS or LLP64 data model (rare but we should consider). So use int64_t from inttypes.h and safe in this case. Fixes https://github.com/llvm/llvm-project/issues/55911 .	2022-07-27 09:34:45 -07:00
Sanjay Patel	e079bf6558	[AggressiveInstCombine] check sqrt operand to allow more libcall->intrinsic transforms This should fix issue #56383 (at least when compiled with -O3 because this pass is only run at -O3 currently).	2022-07-27 11:36:13 -04:00
Dmitri Gribenko	b435da027d	[amdgpu][nfc] Fix build with a certan Clang version It errors out in the Bazel CI: AMDGPULowerModuleLDSPass.cpp:384:12: error: chosen constructor is explicit in copy-initialization return {SGV, std::move(Map)}; Reviewed By: rupprecht Differential Revision: https://reviews.llvm.org/D130623	2022-07-27 17:29:36 +02:00
Joseph Huber	b08369f7f2	Revert "[OpenMP] Remove noinline attributes in the device runtime" The behaviour of this patch is not great, but it has some side-effects that are required for OpenMPOpt to work. The problem is that when we use `-mlink-builtin-bitcode` we only import used symbols from the runtime. Then OpenMPOpt will insert calls to symbols that were not previously included. This patch removed this implicit behaviour as these functions were kept alive by the `noinline` simply because it kept calls to them in the module. This caused regression in some tests that relied on some OpenMPOpt passes without using LTO. Reverting for the LLVM15 release but will try to fix it more correctly on main. This reverts commit `d61d72dae6`. Fixes #56752	2022-07-27 11:09:18 -04:00
Quinn Pham	b6cc5ddc94	[libLTO] Set data-sections by default in libLTO. This patch changes legacy LTO to set data-sections by default. The user can explicitly unset data-sections. The reason for this patch is to match the behaviour of lld and gold plugin. Both lld and gold plugin have data-sections on by default. This patch also fixes the forwarding of the clang options -fno-data-sections and -fno-function-sections to libLTO. Now, when -fno-data/function-sections are specified in clang, -data/function-sections=0 will be passed to libLTO to explicitly unset data/function-sections. Reviewed By: w2yehia, MaskRay Differential Revision: https://reviews.llvm.org/D129401	2022-07-27 09:39:39 -05:00
Quinn Pham	70ec8cd024	Revert "[libLTO] Set data-sections by default in libLTO." This reverts commit `f565444b48`.	2022-07-27 08:47:00 -05:00
Quinn Pham	f565444b48	[libLTO] Set data-sections by default in libLTO. This patch changes legacy LTO to set data-sections by default. The user can explicitly unset data-sections. The reason for this patch is to match the behaviour of lld and gold plugin. Both lld and gold plugin have data-sections on by default. This patch also fixes the forwarding of the clang options -fno-data-sections and -fno-function-sections to libLTO. Now, when -fno-data/function-sections are specified in clang, -data/function-sections=0 will be passed to libLTO to explicitly unset data/function-sections. Reviewed By: w2yehia, MaskRay Differential Revision: https://reviews.llvm.org/D129401	2022-07-27 08:34:40 -05:00
Simon Pilgrim	c0b3f7a50f	[DAG] SimplifyDemandedBits - ensure we clear known One bits that AssertZext asserts are really known Zero Matches ComputeKnownBits behaviour Thanks to @uabelho for the fuzz regression report on D129765	2022-07-27 13:57:47 +01:00
Aaron Kogon	dd3ca65c37	Sinking or hoisting instructions between loops before fusion Instructions between two adjacent loops will be hoisted above the first loop, or sunk below the second to facilitate loop fusion. Hoisting will be attempted for an instruction that dominates the first loop. Otherwise, sinking this instructions will be attempted. Instructions with side effects will not be considered for sinking or hoisting. Hoisting/sinking of any instructions between loops will only be performed if all the instructions can be moved. As well, sinking/hoisting is considered for each instruction in isolation, without taking into account sinking/hoisting decisions for other instructions in the preheader. Differential Revision: https://reviews.llvm.org/D118076	2022-07-27 06:55:09 -04:00
LiaoChunyu	bf4f9a468a	[RISCV]Enable isIntDivCheap when attribute is minsize Don't expand divisions by constants when attribute is minsize. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D130543	2022-07-27 18:22:51 +08:00
Rainer Orth	979ddfff37	[Support] Handle SPARC in sys::getHostCPUName While working on D118450 <https://reviews.llvm.org/D118450>, I noticed that `sys::getHostCPUName` lacks SPARC support. This patch implements it. The code is taken from/inspired by GCC's `gcc/config/sparc/driver-sparc.cc`. There's one caveat: since LLVM, unlike GCC, doesn't support the SPARC-M7, -S7, and -M8 CPUs, I map all those to the latest supported one (UltraSparc T4/`niagara4`). Tested on `sparcv9-sun-solaris2.11` and `sparc64-unknown-linux-gnu` by running `savcov --version` on - Netra SPARC S7-2 (SPARC-S7, Solaris 11.4) - SPARC T5-2 (SPARC T5, Solaris 11.4) - SPARC Enterprise T5220 (UltraSPARC T2, Solaris 11.3) - SPARC T5 (UltraSPARC T5, Debian sid) - SPARC T3 (UltraSPARC T3, Debian sid) - SPARC Enterprise T5220 (Debian sid) Differential Revision: https://reviews.llvm.org/D130272	2022-07-27 12:21:03 +02:00
Simon Pilgrim	529bd4f352	[DAG] SimplifyDemandedBits - don't early-out for multiple use values SimplifyDemandedBits currently early-outs for multi-use values beyond the root node (just returning the knownbits), which is missing a number of optimizations as there are plenty of cases where we can still simplify when initially demanding all elements/bits. @lenary has confirmed that the test cases in aea-erratum-fix.ll need refactoring and the current increase codegen is not a major concern. Differential Revision: https://reviews.llvm.org/D129765	2022-07-27 10:54:06 +01:00
Zi Xuan Wu (Zeson)	70b8b738c5	[CSKY] Fix the btsti16 instruction missing in generic processor Normally, generic processor does not have any SubtargetFeature. And it can just generate most basic instructions which have no Predicates to guard. But it needs to enbale predicate for the btsti16 instruction as one of the most basic instructions. Or the generic processor can't finish codegen process. So Add FeatureBTST16 SubtargetFeature to generic ProcessorModel.	2022-07-27 17:39:15 +08:00
Alexey Lapshin	79ff02a122	Revert "[Debuginfo][llvm-dwarfutil] Add check for unsupported debug sections." This reverts commit `0d191b7553`.	2022-07-27 11:48:56 +03:00
Ying Yi	8d46cb343f	Emit a simple StackSizesSection on PS4. Differential Revision: https://reviews.llvm.org/D130495	2022-07-27 09:39:24 +01:00
David Green	39f8384964	[ARM] Correct features on pacbti instructions. Given a patch like D129506, using instructions not valid for the current feature set becomes an error. This updates the Arm hint-space instructions for pac/bti to require thumbv7m as opposed to 8.1-m.main, to make them valid when compiling for thumbv7m with -mbranch-protection. Differential Revision: https://reviews.llvm.org/D129692	2022-07-27 09:15:14 +01:00
Nikita Popov	b1b1086973	[ARM] Add target feature to force 32-bit atomics This adds a +atomic-32 target feature, which instructs LLVM to assume that lock-free 32-bit atomics are available for this target, even if they usually wouldn't be. If only atomic loads/stores are used, then this won't emit libcalls. If atomic CAS is used, then the user is responsible for providing any necessary __sync implementations (e.g. by masking interrupts for single-core privileged use cases). See https://reviews.llvm.org/D120026#3674333 for context on this change. The tl;dr is that the thumbv6m target in Rust has historically made atomic load/store only available, which is incompatible with the change from D120026, which switched these to use libatomic. Differential Revision: https://reviews.llvm.org/D130480	2022-07-27 10:00:31 +02:00
Amara Emerson	9cc1dd209d	[AArch64][GlobalISel] Lower vector G_CTTZ. Fixes issue 56398	2022-07-27 00:14:30 -07:00
Kirill Stoimenov	d6e1e0a019	[ASan] Use stack safety analysis to optimize allocas instrumentation. Added alloca optimization which was missed during the implemenation of D112098. Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D130503	2022-07-26 18:48:16 -07:00
Jon Chesterfield	3ccd88f209	[amdgpu][nfc] Separate processUsedLDS into independent pieces, rename it	2022-07-27 01:55:43 +01:00
Jon Chesterfield	9981afdd42	[amdgpu][nfc] Extract kernel annotation from processUsedLDS	2022-07-27 01:38:41 +01:00
Tom Stellard	fd84d97ba6	Revert "[Support] Workaround compiler bug in MSVC" This reverts commit `ec8f4fd68c`. This caused a failure in the mlir-windows bot.	2022-07-26 15:49:35 -07:00
Dmitry Vassiliev	e3e63f30a5	[CodeGen] Fixed ambiguous symbol ExtAddrMode in case of NDEBUG and LLVM_ENABLE_DUMP This patch fixes the following error with MSVC 16.9.2 in case of NDEBUG and LLVM_ENABLE_DUMP: llvm/lib/CodeGen/CodeGenPrepare.cpp(2581): error C2872: 'ExtAddrMode': ambiguous symbol llvm/include/llvm/CodeGen/TargetInstrInfo.h(86): note: could be 'llvm::ExtAddrMode' llvm/lib/CodeGen/CodeGenPrepare.cpp(2447): note: or '`anonymous-namespace'::ExtAddrMode' llvm/lib/CodeGen/CodeGenPrepare.cpp(2581): error C2039: 'print': is not a member of 'llvm::ExtAddrMode' Reviewed By: aaron.ballman Differential Revision: https://reviews.llvm.org/D130426	2022-07-27 00:21:57 +02:00
Martin Sebor	4447603616	[InstCombine] Fold strtoul and strtoull and avoid PR #56293 Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D129224	2022-07-26 14:11:40 -06:00
Jon Chesterfield	923b90bddb	[amdgpu][nfc] Separate LDS struct creation from RAUW	2022-07-26 20:59:17 +01:00

... 4 5 6 7 8 ...

160922 Commits