When zlib is disabled at build time, the diagnostic `LLVM was not compiled with
LLVM_ENABLE_ZLIB: cannot decompress` for --decompress-debug-sections may be
inaccurate: if zstd is enabled, we should still support zstd decompression.
It's not useful to test for both zlib and zstd availability here. Just remove
the diagnostic and add a new one before `compression::decompress`.
This fixes compress-debug-sections-zstd.test.
Reviewed By: mariusz-sikora-at-amd, jhenderson, phosek
Differential Revision: https://reviews.llvm.org/D135744
Add a check (can be disabled via a flag) that the pipeline we generate is actually parsable.
It can be disabled because we don't expect to handle every pass in -print-pipeline-passes.
Fixes #58280.
Reviewed By: ChuanqiXu
Differential Revision: https://reviews.llvm.org/D135703
(X << Z) / (Y << Z) --> X / Y
https://alive2.llvm.org/ce/z/CLKzqT
This requires a surprising "nuw" constraint because we have
to guard against immediate UB via signed-div overflow with
-1 divisor.
This extends 008a89037a and is another transform
derived from issue #58137.
We need to set the insert point for extractelement to point to the first
instruction in the node to avoid a possible crash during the external-uses
combine process. Without it we may end up with an incorrect
transformation.
Differential Revision: https://reviews.llvm.org/D135591
(X << Z) / (Y << Z) --> X / Y
https://alive2.llvm.org/ce/z/E5eaxU
This fixes the motivating example from issue #58137,
but it is not the most general transform. We should
probably also convert left-shift in the divisor to
right-shift in the dividend for that, but that exposes
another missed canonicalization for shifts and adds.
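A minimal IR sketch of the shape being folded (assuming the unsigned form
with nuw on both shifts; the alive2 link above has the verified
preconditions):
```
define i8 @src(i8 %x, i8 %y, i8 %z) {
  %xs = shl nuw i8 %x, %z
  %ys = shl nuw i8 %y, %z
  %d  = udiv i8 %xs, %ys
  ret i8 %d
}
; --> %d = udiv i8 %x, %y
```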
A freeze instruction in some cases makes codegen worse, so we need to be very
careful when emitting it. Instead, improve the analysis in the isUndefVector
function to generate a mask of unused elements and use it in the analysis.
Differential Revision: https://reviews.llvm.org/D135382
If an AM* atomic memory access instruction uses the same register number for
rd and rj, execution will trigger an Instruction Non-defined Exception.
If it uses the same register number for rd and rk, the execution result is
uncertain.
Reference: https://github.com/loongson/LoongArch-Documentation
Differential Revision: https://reviews.llvm.org/D135641
optimizeInductions may leave dead recipes which can prevent sinking.
Sinking on the other hand should not introduce new dead recipes, so
clean up dead recipes before sinking.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D133762
They have been scattered over the code. For better structuring, perform
them in one place. A slight compile-time drop is possible because we collect
exit blocks twice, but it's a small price to pay for much better code structure.
Fix the crash issue of D129537 and reopen it.
Currently the X86 shuffle lowering would widen the element type for a
shuffle if the mask element values are adjacent. For the example below:
%t2 = add nsw <16 x i32> %t0, %t1
%t3 = sub nsw <16 x i32> %t0, %t1
%t4 = shufflevector <16 x i32> %t2, <16 x i32> %t3,
<16 x i32> <i32 16, i32 17, i32 2, i32 3, i32 4,
i32 5, i32 6, i32 7, i32 8, i32 9, i32 10,
i32 11, i32 12, i32 13, i32 14, i32 15>
ret <16 x i32> %t4
The compiler would transform the shuffle to:
%t4 = shufflevector <8 x i64> %t2, <8 x i64> %t3,
<8 x i64> <i32 8, i32 1, i32 2, i32 3, i32 4,
i32 5, i32 6, i32 7>
This may lose the opportunity to let ISel select a mask instruction when
AVX512 is enabled.
This patch prevents the transform when the AVX512 feature is enabled.
Thanks to Simon for the idea.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D130830
After setting up the FP, the rest of the prologue doesn't need to
be replayed for unwinding the stack frame.
This allows reverting the functional parts of
2f7fbf8376 (but fixing inconsistent
duplicate setting of HasWinCFI).
Differential Revision: https://reviews.llvm.org/D135686
Given this is an OR reduction, the two are equivalent and later
optimizations (AArch64InstrInfo::optimizePTestInstr) may rewrite the
sequence to use the flag-setting variant of instruction X, to remove the
PTEST altogether.
Reviewed By: paulwalker-arm, bsmith
Differential Revision: https://reviews.llvm.org/D134946
The BRKNS instruction is unlike the other instructions that set flags
since it has an all active implicit predicate, so the existing
PTEST(PG, BRKN(PG, A, B)) -> BRKNS(PG, A, B)
in AArch64InstrInfo::optimizePTestInstr is incorrect, however
PTEST(PTRUE_B(31), BRKN(PG, A, B)) -> BRKNS(PG, A, B)
is correct.
Spotted by @paulwalker-arm in D134946.
Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D135655
These are semantically two different stages, but they were entwined in the
old implementation. Now cost computation does not do legality checks,
and they are all done beforehand.
The patch selects VSELECT/VP_MERGE_VL which uses fmadd/fnmsub as the true
operand and the addend of the fmadd/fnmsub as the false operand.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D135330
If the source is implicit_def, the register allocator won't have
any constraint on what register it picks for the destination. This
doesn't give the user much control of what register is being used.
So in my mind that means the only reason to honor the policy operand
is to control what policy is used in vsetvli to maybe avoid a vtype
change. Given the other optimizations we do on the policy field, I
don't think allowing the user this control is reliable.
Therefore, I think we should use agnostic policies if the source is
undef.
This should give better performance on some CPUs for VP intrinsics where
there is no merge operand and the backend adds IMPLICIT_DEF to the instruction.
Differential Revision: https://reviews.llvm.org/D135396
(sra X, BW-1) is either 0 or -1. So the multiply is a conditional
negate of Y.
This pattern shows up when type legalizing wide multiplies involving
a sign extended value.
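At the IR level the equivalence looks like this (a sketch; per the message
the pattern arises in the backend during type legalization):
```
; %s is 0 when %x >= 0 and -1 when %x < 0, so %m is 0 or -%y:
%s = ashr i32 %x, 31
%m = mul i32 %s, %y
; equivalent to a conditional negate:
%isneg = icmp slt i32 %x, 0
%neg   = sub i32 0, %y
%m2    = select i1 %isneg, i32 %neg, i32 0
```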
Fixes PR57549.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D133399
A local linkage GlobalObject in a non-prevailing COMDAT remains defined while
its leader has been made available_externally. This violates the COMDAT rule that
its members must be retained or discarded as a unit.
To fix this, update the regular LTO change D34803 to track local linkage
GlobalValues, and port the code to ThinLTO (GlobalAliases are not handled.)
This fixes two problems.
(a) `__cxx_global_var_init` in a non-prevailing COMDAT group used to
linger around (unreferenced, hence benign), and is now correctly discarded.
```
int foo();
inline int v = foo();
```
(b) Fix https://github.com/llvm/llvm-project/issues/58215:
as a size optimization, we place private `__profd_` in a COMDAT with a
`__profc_` key. When FuncImport.cpp makes `__profc_` available_externally due to
a non-prevailing COMDAT, `__profd_` incorrectly remains private. This change
makes the `__profd_` available_externally.
```
cat > c.h <<'eof'
extern void bar();
inline __attribute__((noinline)) void foo() {}
eof
cat > m1.cc <<'eof'
#include "c.h"
int main() {
bar();
foo();
}
eof
cat > m2.cc <<'eof'
#include "c.h"
__attribute__((noinline)) void bar() {
foo();
}
eof
clang -O2 -fprofile-generate=./t m1.cc m2.cc -flto -fuse-ld=lld -o t_gen
rm -fr t && ./t_gen && llvm-profdata show -function=foo t/default_*.profraw
clang -O2 -fprofile-generate=./t m1.cc m2.cc -flto=thin -fuse-ld=lld -o t_gen
rm -fr t && ./t_gen && llvm-profdata show -function=foo t/default_*.profraw
```
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D135427
.addrsig_sym forces registering the symbol regardless of whether it is otherwise
registered. This creates an undefined symbol which is inconvenient/undesired:
* `extern int x; void f() { (void)x; }` has inconsistent behavior regarding whether `x` is emitted as an undefined symbol.
`-O0 -faddrsig` makes `x` undefined while other -O levels and -fno-addrsig eliminate the symbol.
* In ThinLTO, after a non-prevailing linkonce_odr definition is converted to available_externally, and then a declaration,
the addrsig code emits a symbol while the symbol is otherwise unseen.
D135427 fixed a bug that a non-prevailing `__cxx_global_var_init` was
incorrectly retained. However, the IR declaration causes an undesired
`.addrsig_sym __cxx_global_var_init`. This can be addressed in a way similar
to D101512 (`isTransitiveUsedByMetadataOnly`) but the increased
`OutStreamer->emitAddrsigSym(getSymbol(&GV));` complexity makes me nervous.
Just ignoring unregistered symbols circumvents the problem.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D135642
This adds infrastructural pieces for an analysis to compute the DXIL
shader flags. In this state the analysis can compute two fairly
straightforward feature flags for use of double-precision floating
point values and the DX 11.1 extended double support.
This patch does conflict with D135190, conflicts will be resolved prior
to merging.
Reviewed By: python3kgae
Differential Revision: https://reviews.llvm.org/D135393
# Conflicts:
# llvm/lib/Target/DirectX/CMakeLists.txt
# llvm/lib/Target/DirectX/DirectXTargetMachine.cpp
Unfortunately, the handling of this in the runtime of ROCm 5.3 is broken. The
runtime is expected to handle this correctly when v5 becomes
the default.
Differential Revision: https://reviews.llvm.org/D134714
The previous code used an APInt(1, 0) to represent the demanded elts of a scalable vector, and then ignored that argument if the type was scalable. This was inconsistent with the UndefElts parameter, which is set to either APInt(1, 0) or APInt(1, 1) - that is, implicitly broadcast across all lanes. Particularly since the undef code relied on the DemandedElts parameter having bitwidth 1 to achieve that result!
This change switches the demanded parameter to APInt(1,1), documents the broadcast semantics, and takes advantage of it to remove one special case for scalable vectors which is no longer required.
Make support more generic to support future instructions.
Currently NFC.
Reviewed By: foad, arsenm
Differential Revision: https://reviews.llvm.org/D135678
Update the comment, and add an assertion to check a property expected by the sole (non-test) caller. Remove tests which appear to have been copied from fixed-vector tests, and whose demanded bits don't correspond to the way this interface is otherwise used.
`getFunctionParamOptimizedAlign` was being passed a null function
argument when getting the callee of a bitcasted function symbol. This is
because `CallBase::getCalledFunction` does not look through bitcasts.
There is already code to handle this case in
`NVPTXTargetLowering::getArgumentAlignment`, which is now hoisted into
an NVPTX util.
The alignment computation now gracefully handles computing alignment of
virtual functions with a check for null.
((Op1 * X) / Y) / Op1 --> X / Y
https://alive2.llvm.org/ce/z/JYxWjA
InstSimplify handles the more basic mul+div pattern with
shared operand, but we don't seem to have any reassociation
folds to handle cases where the common op is further away.
This is a generalization of 9cff4711ac and another
transform derived from issue #58137.
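A sketch of the fold in IR (the overflow flags are assumptions; the alive2
link above has the verified preconditions):
```
define i8 @src(i8 %x, i8 %y, i8 %op1) {
  %m  = mul nuw i8 %op1, %x
  %d1 = udiv i8 %m, %y
  %d2 = udiv i8 %d1, %op1
  ret i8 %d2
}
; --> %d2 = udiv i8 %x, %y
```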
Reference: https://gcc.gnu.org/onlinedocs/gccint/Machine-Constraints.html
k: A memory operand whose address is formed by a base register and
(optionally scaled) index register.
m: A memory operand whose address is formed by a base register and
offset that is suitable for use in instructions with the same
addressing mode as st.w and ld.w.
ZB: An address that is held in a general-purpose register. The offset
is zero.
ZC: A memory operand whose address is formed by a base register and
offset that is suitable for use in instructions with the same
addressing mode as ll.w and sc.w.
Note:
The INLINEASM SDNode flags in the tests below are updated because the newly
introduced enum `Constraint_k` is added before `Constraint_m`.
llvm/test/CodeGen/AArch64/GlobalISel/irtranslator-inline-asm.ll
llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-inline-asm.ll
llvm/test/CodeGen/X86/callbr-asm-kill.mir
This patch passes `ninja check-all` on a X86 machine with all official
targets and the LoongArch target enabled.
Differential Revision: https://reviews.llvm.org/D134638
These are harmless for the unwinder - the unwinder doesn't need to
handle them to be able to unwind correctly.
Only add the opcodes when the branch target is in a SEH prologue;
for jumptables e.g. within a function, we shouldn't add any SEH
opcodes.
Differential Revision: https://reviews.llvm.org/D135277
There are static and dynamic TLS address lowering in DAG stage according
to different TLS models.
TLS address will be lowered to pseudo instruction and then expanded by
the `LoongArch Pre-RA pseudo instruction expansion` pass.
Differential Revision: https://reviews.llvm.org/D134713
As suggested on D135572, return Optional<> from getAllocSizeArgs()
rather than the peculiar pair(0, 0) sentinel.
The method on Attribute itself does not return Optional, because
the attribute must exist in that case.
This regularly comes up as a stumbling stone when adding int
attributes: they currently need to be encoded in a way that avoids
the zero value.
This adds support for zero-value int attributes by a) making the
ctor determine int/enum attribute based on attribute kind, not
whether the value is non-zero and b) switching getRawIntAttr()
to return an Optional, so that it's possible to distinguish a zero
value from non-existence.
Differential Revision: https://reviews.llvm.org/D135572
These accessors are not used. Generally, nowadays it is preferable
to perform queries on AttributeSets/Lists, rather than the
AttrBuilder, which is optimized towards attribute construction now.
This was the odd one out, with similar methods not existing for
any other attributes. In the places where it is used, it is best
replaced by AttrBuilder::getAttribute(), which allows us to both
test for presence of the attribute and retrieve its value at the
same time. (To just check for presence, contains() could be used.)
Proper construction functions for these have long since been
exposed, and these attributes require a type nowadays, so drop the
old compatibility code.
The algorithm in allLoopPathsLeadToBlock() does not correctly handle the case
where the loop latch is part of the predecessor set: in
this case, we may take the backedge (escaping to a different loop
iteration) and not execute other latch successors. This can happen
if the latch is part of an inner cycle.
Fixes https://github.com/llvm/llvm-project/issues/57780.
Differential Revision: https://reviews.llvm.org/D134279
When Index is variable but still trivially known to be equal, we can use the
Value from before the insertion, possibly eliminating the vector.
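A sketch of the shape (names and types assumed):
```
%w = insertelement <4 x float> %v, float %val, i64 %idx
%e = extractelement <4 x float> %w, i64 %idx
; --> %e is just %val, and %w may become dead
```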
Reverts a functional change from:
Author: Philip Reames <listmail@philipreames.com>
Date: Wed Dec 8 12:21:10 2021 -0800
[instcombine] A couple style tweaks to visitExtractElementInst [nfc]
Thanks to Michele Scandale for identifying the bug
Differential Revision: https://reviews.llvm.org/D135625
The patch fixes lowering of anonymous functions, removes file/linkage
info for builtin call demangling, and adds relevant test demonstrating
a fixed problem.
Differential Revision: https://reviews.llvm.org/D135390
These instructions already had errors for operands that could not share
the same register:
VCMUL, VMULL, VQDMULL.
This extends that to a few others:
VREV64, VQDMULLqr, VCADD and VHCADD.
Only the i32 types require the error.
Differential Revision: https://reviews.llvm.org/D135560
This extends the existing SCEV verification to catch cache invalidation
issues as in #57837.
The validation logic is similar to the recently added loop disposition
cache validation in bb68b2402d.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D134531
insertelt DestVec, (fneg (extractelt SrcVec, Index)), Index --> shuffle DestVec, (fneg SrcVec), Mask
This is a specialized form of what could be a more general fold for a binop.
It's also possible that fneg is overlooked by SLP in this kind of
insert/extract pattern since it's a unary op.
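Concretely, with an assumed index of 2 and 4-wide vectors, the fold looks like:
```
%e = extractelement <4 x float> %src, i64 2
%n = fneg float %e
%r = insertelement <4 x float> %dest, float %n, i64 2
; -->
%negsrc = fneg <4 x float> %src
%r      = shufflevector <4 x float> %dest, <4 x float> %negsrc,
                        <4 x i32> <i32 0, i32 1, i32 6, i32 3>
```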
This shows up in the motivating example from issue #58139, but it won't solve
it (that probably requires some x86-specific backend changes). There are also
some small enhancements (see TODO comments) that can be done as follow-up
patches.
Differential Revision: https://reviews.llvm.org/D135278
This reverts commit 4fbe33593c. It causes linking errors, with details provided internally. (Hopefully the author/reviewers will be able to upstream the internal repro).
If the divisor is even, we can first shift the dividend and divisor
right by the number of trailing zeros. Now the divisor is odd and we
can do the original algorithm to calculate a remainder. Then we shift
that remainder left by the number of trailing zeros and add the bits
that were shifted out of the dividend.
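For example, assuming a divisor of 12 (= 3 << 2): x urem 12 can be computed
as (((x >> 2) urem 3) << 2) | (x & 3), because the two low bits shifted out
of the dividend pass straight through to the remainder.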
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D135541
If the single-thread model is used, or the
-licm-force-thread-model-single flag is specified, skip checks
related to thread-safety. This means that store promotion for
conditionally executed stores only requires proof of
dereferenceability and writability, but not of thread-safety. For
example, this enables promotion of stores to (non-constant) globals,
as well as captured allocas.
Fixes https://github.com/llvm/llvm-project/issues/50537.
Differential Revision: https://reviews.llvm.org/D130466
The return type is two u8 packed into a 16 bit VGPR, so this instruction
should be True16.
Reviewed By: dp
Differential Revision: https://reviews.llvm.org/D135478
This commit fixes https://github.com/llvm/llvm-project/issues/57326.
Currently we would take a Mask out and directly use it by doing
auto Mask = SVI->getShuffleMask();
However, if the mask is undef, this Mask is not initialized. It might be
a vector of -1 or random integers.
This would cause an out-of-bounds read later when trying to find a
StartMask.
This change checks that all indices in the Mask are in the allowed range,
and fixes the out-of-bounds accesses.
Differential Revision: https://reviews.llvm.org/D132634
When determining the initial value of the object, use the constant
folding API to load a given type at a given offset in the global
initializer. This makes it work for cases where the load doesn't
directly correspond to an aggregate member.
Differential Revision: https://reviews.llvm.org/D135435
Now that ExecutionSession objects always have ExecutorProcessControl (EPC)
objects attached, we can use EPCEHFrameRegistrar by default, rather than
InProcessEHFrameRegistrar. This allows LLJIT to work out-of-the-box with remote
EPCs on platforms that use JITLink, without requiring a custom
ObjectLinkingLayerCreator to override the eh-frame registrar.
On many AArch64 processors (Cortex A78, Neoverse N1/N2/V1, etc.), ADD with an LSL shift (shift amount <= 4) has smaller latency and higher
throughput than ADD with a larger shift (shift amount > 4). This is at least a no-op for the rest of the processors.
Differential Revision: https://reviews.llvm.org/D135208
The current decomposition for GEPs did not correctly handle cases where
GEPs access different source types. Adjust the constraints by including
the indexed type-size as coefficients.
Further generalization to allow GEPs with more than one index is a
needed follow-up improvement.
If the location pointer to be killed is in no loop and the function does not
have irreducible loops, then we can regard it as loop-invariant.
Differential Revision: https://reviews.llvm.org/D135369
Move common logic shared by callers of getConstraint that use the result
to query the constraint system to a new helper getConstraintForSolving.
This includes common legality checks (i.e. not an equality constraint,
no new variables) and the logic to query the unsigned system if possible
for signed predicates.
This is the AIX part of the update after https://reviews.llvm.org/D117225
Fixes the issue that AIX64 with vector pairs enabled saw redundant
spill/reload of callee-saved vector registers.
Based on original patch by: Kai Luo
Reviewed By: lkail
Differential Revision: https://reviews.llvm.org/D133466
Otherwise eliminateFrameIndex cannot figure out how to fixup the stack
offset with its stateless logic, because there wouldn't be an immediate
slot for it to trivially write to, and it may not be easy to transform
the surrounding code to make it work.
This fixes a fairly common crash when compiling moderately complex code with
Clang.
Differential Revision: https://reviews.llvm.org/D135251
InclusionRewriter on Windows (CRLF line endings) will exercise this in a
hot path. Calling memcmp repeatedly would be highly suboptimal for that
use case, so give it a specialized path.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D133660
See the updated linkonce_resolution_comdat.ll. A local linkage GV in a
non-prevailing COMDAT remains defined while its leader has been made
available_externally. This violates the COMDAT rule that its members must be
retained or discarded as a unit.
To fix this, update the regular LTO change D34803 to track local linkage
GlobalValues, and port the code to ThinLTO (GlobalAliases are not handled.)
Fix https://github.com/llvm/llvm-project/issues/58215:
as a size optimization, we place private `__profd_` in a COMDAT with a
`__profc_` key. When FuncImport.cpp makes `__profc_` available_externally due to
a non-prevailing COMDAT, `__profd_` incorrectly remains private. This change
makes the `__profd_` available_externally.
```
cat > c.h <<'eof'
extern void bar();
inline __attribute__((noinline)) void foo() {}
eof
cat > m1.cc <<'eof'
#include "c.h"
int main() {
bar();
foo();
}
eof
cat > m2.cc <<'eof'
#include "c.h"
__attribute__((noinline)) void bar() {
foo();
}
eof
clang -O2 -fprofile-generate=./t m1.cc m2.cc -flto -fuse-ld=lld -o t_gen
rm -fr t && ./t_gen && llvm-profdata show -function=foo t/default_*.profraw
# one _Z3foov
clang -O2 -fprofile-generate=./t m1.cc m2.cc -flto=thin -fuse-ld=lld -o t_gen
rm -fr t && ./t_gen && llvm-profdata show -function=foo t/default_*.profraw
# one _Z3foov
```
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D135427
The 1st attempt failed to update the test checks as expected.
Original commit message:
sdiv exact X, (1<<ShAmt) --> ashr exact X, ShAmt (if shl is non-negative)
https://alive2.llvm.org/ce/z/kB6VF7
It would probably be better to use ValueTracking to replace this
and the existing transform above it, but the analysis does not
account for the no-wrap properly, and it's not immediately clear
to me how to fix it.
sdiv exact X, (1<<ShAmt) --> ashr exact X, ShAmt (if shl is non-negative)
https://alive2.llvm.org/ce/z/kB6VF7
It would probably be better to use ValueTracking to replace this
and the existing transform above it, but the analysis does not
account for the no-wrap properly, and it's not immediately clear
to me how to fix it.
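A sketch of the shape (how non-negativity of the shl is established is left
to the analysis):
```
%pow2 = shl i8 1, %sh            ; assumed known non-negative
%d    = sdiv exact i8 %x, %pow2
; --> %d = ashr exact i8 %x, %sh
```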
The logic added in 3771310eed was placed sub-optimally. Applying the
transform in ::getConstraint meant that it would also impact conditions
that are added to the system by the signed <-> unsigned transfer logic.
This meant we failed to add some signed facts to the signed system. To
make sure we still add as many useful facts to the signed/unsigned
systems, move the logic to the point where we query the system.
This patch fixes the failure of llvm/test/CodeGen/Generic/vector.ll and
CodeGen/PowerPC/2007-11-19-VectorSplitting.ll for a LoongArch native build.
Differential Revision: https://reviews.llvm.org/D134798
There are intrinsics for most scalar instructions and almost all HVX
instructions. What's somewhat painful is that there are two intrinsics
for each HVX instruction: one for 64- and one for 128-byte mode.
Instead of checking the current codegen settings every time, this
function would simply return the right intrinsic.
The first two parameters of addcarry are commutative. We may face a situation where both variants are present in the DAG, in which case we benefit from using just one.
Depends on D57302 and D33587
Reviewed By: RKSimon, chfast
Differential Revision: https://reviews.llvm.org/D57317
Typically when you match something, you want to check the result.
Fix a couple warnings in the AMDGPUPostLegalizerCombiner which appear as a
result of this.
Differential Revision: https://reviews.llvm.org/D135491
Clear all dispositions if there are any dead blocks (which will get
removed later) and also clear dispositions for removed instructions.
Clearing all dispositions in case there are dead blocks happens first,
which should avoid traversing SCEV use-lists for invalidating
dispositions for individual values.
Fixes #58179.
Apparently StackColoring depends on SlotIndexes, but not
LiveIntervals. If regalloc fast were manually requested, LiveIntervals
would be dropped before SILowerSGPRSpills but not SlotIndexes.
SILowerSGPRSpills preserved SlotIndexes, but only through
LiveIntervals. As a result, SILowerSGPRSpills was incorrectly
reporting it preserved SlotIndexes. Start updating these directly,
instead of depending on LiveIntervals also being available.
This pass was added way back in the beginning of the work which became the statepoint infrastructure. The idea was that safepoints could be inserted late in the optimization pipeline. This is true if the only concern is garbage collection, but this approach turned out to be incompatible with the requirement to also support deoptimization at safepoints.
In theory, this pass would still be quite useful for an AOT compiled language which wants to support garbage collection, but we have no known users, and haven't for over 5 years. Time to remove unused code. If someone wants to use this, restoring it would not be hard. The immediate motivation for removal is that this is one of the last remaining passes which hasn't been ported to the new pass manager, and the (straightforward) work to do so is not justified for unused code.
Differential Revision: https://reviews.llvm.org/D135371
Change the behavior of the `llvm-profdata show --debug-info=` command to dump a YAML file when using debug info correlation since it provides more information in a parseable format.
Reviewed By: yozhu, phosek
Differential Revision: https://reviews.llvm.org/D134770
1. `length(value/type)`: return the number of elements in the vector
input,
2. `getHvxTy(elem_type)`: return the HVX vector type with the element
type provided.
These will help write things more succinctly.
EVT can be created for any Type, and so this function can now be used to
check if given Type, as-is, is an HVX type (as opposed to a type that may
be subject to legalization to an HVX type).
Extend forgetBlockAndLoopDisposition to allow clearing information for a
single value. This can be useful when only a single value is changed,
e.g. because the instruction is moved.
We also need to clear the cached values for all SCEV users, because they
may depend on the starting value's disposition.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D134614
When SimplifyLibCalls is dealing with wchar_t (e.g. optimizing wcslen)
it uses ValueTracking helpers with a CharSize/ElementSize that isn't
8, but rather 16 or 32 (to match with the size in bits of a wchar_t).
The problem I've seen is that llvm::getConstantDataArrayInfo takes
both an "ElementSize" argument (basically indicating the size of a
char/element in bits) and an "Offset" which, afaict, is an offset
in the unit "number of elements". Then it also uses
stripAndAccumulateConstantOffsets to get a "StartIdx" which, afaict,
is calculated in bytes. The returned Slice.Length is based on
arithmetic that adds/subtracts variables that have different
units (bytes vs elements). Most notably, I think the "StartIdx" must
be scaled using the "ElementSize" to get correct results.
The symptom of the above problem was seen in the wcslen-1.ll test
case which miscompiled.
This patch is supposed to resolve the bug by converting between
bytes and elements when needed.
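For example, with a 32-bit wchar_t, a "StartIdx" of 8 bytes corresponds to
element index 2, not 8; using the byte value directly in element-based
arithmetic makes the computed Slice.Length wrong by a factor of ElementSize/8.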
Differential Revision: https://reviews.llvm.org/D135263
This patch moves the emitOffloadingArraysArgument function and
supporting data structures to OpenMPIRBuilder. This will later be used
in flang as well. The TargetDataInfo class was split up into generic
information and clang-specific data, which remain in clang. Further
migration will be done in the future.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D134662
Loop peeling currently requires that a) the latch is exiting
b) a branch and c) other exits are unreachable/deopt. This patch
removes all of these limitations, and adds the necessary branch
weight updating support. It essentially works the same way as
before with latch -> exiting terminator and
loop trip count -> per exit trip count.
It's worth noting that there are still other limitations in
profitability heuristics: This patch enables peeling of loops to
make conditions invariant (which is pretty much always highly
profitable if possible), while peeling to make loads dereferenceable
still checks that non-latch exits are unreachable and PGO-based
peeling has even more conditions. Those checks could be relaxed
later if we consider those cases profitable.
The motivation for this change is that loops using iterator adaptors
in Rust often optimize very badly, and end up with a loop phi of the
form phi(true, false) in the final result. Peeling eliminates that
phi and conditions based on it, which enables a lot of follow-on
simplification.
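A sketch of the pattern (names assumed): after peeling one iteration, %first
is known to be false on every remaining iteration, so the branch becomes
loop-invariant and folds away:
```
loop:
  %first = phi i1 [ true, %entry ], [ false, %latch ]
  br i1 %first, label %first.iter.code, label %rest
```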
Differential Revision: https://reviews.llvm.org/D134803
The scalar counterpart of this is `llvm.trunc`. However, the name
ISD::VP_TRUNC is already taken by `trunc` of the LLVM IR. Naming this as
`vp.ftrunc` would likely cause confusion with `vp.fptrunc`. So adding
`vp.roundtozero` that will look similar to `vp.roundeven`.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D135233
As LoopPredication performs non-equivalent transforms removing some
checks from loops, other passes may not be able to perform transforms
they'd be able to do if the checks were left in loops.
This patch makes LoopPredication insert assumes of the replaced
conditions either after a guard call or in the true block of
widenable condition branch.
Differential Revision: https://reviews.llvm.org/D135354
Relative to the previous attempt, this adjusts simplification to
use the correct context instruction: We need to use the terminator
of the incoming block, not the original instruction.
-----
foldOpIntoPhi() currently only folds operations into the phi if all
but one operands constant-fold. The two exceptions to this are freeze
and select, where we allow more general simplification.
This patch makes foldOpIntoPhi() generally simplification based and
removes all the instruction-specific logic. We just try to simplify
the instruction for each operand, and for the (potentially) one
non-simplified operand, we move it into the new block with adjusted
operands.
This fixes https://github.com/llvm/llvm-project/issues/57448, which
was my original motivation for the change.
Differential Revision: https://reviews.llvm.org/D134954
Similar to the current "Trunc/BuildVector" folding, which folds low-element extracts of BuildVectors, this folds high-element extracts done using bitshifts.
For D134354
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D135148
Regression from D131400: cross-language LTO causes a crash in the
compiler on the NULL deref of Scope in `isa` call when Rust IR is
involved. Presumably, this might affect other languages too, and
even Rust itself without cross-language LTO when the Rust compiler
switched to LLVM 16.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D134616
Now only DXILTranslateMetadata uses DXILResources, so DXILResourceWrapper is only used by DXILTranslateMetadata.
Once we add lowering for createHandle, DXILResourceWrapper will be used in more passes.
We can also add resource index allocation in DXILResourceWrapper.
Reviewed By: beanz
Differential Revision: https://reviews.llvm.org/D135190
I tend to think we should ignore the policy bit in vsetvli insertion
if the tied operand is IMPLICIT_DEF. But that raises questions about
what the policy operand on RVV intrinsics means if you also pass
vundefined().
This change at least fixes some cases. I'll post a separate patch
for vsetvli insertion for discussion.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D135386
We can lower these as an or with the negative of the condition value. This appears to result in significantly less branch-y code on multiple common idioms (as seen in tests).
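Assuming the selects in question have an all-ones arm, the idea at the IR
level is (a sketch; the actual lowering happens in the backend):
```
%r = select i1 %c, i32 -1, i32 %x
; is equivalent to:
%ext = zext i1 %c to i32
%neg = sub i32 0, %ext           ; 0 or all-ones
%r2  = or i32 %x, %neg
```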
Differential Revision: https://reviews.llvm.org/D135316
This is a non-trivial property relied upon by D135396. I wrote this to convince myself it was true.
Differential Revision: https://reviews.llvm.org/D135403
This patch had to be reverted because on gcc 7.5.0 we see an error converting from std::unique_ptr<MCRegisterInfo> to Expected<std::unique_ptr<MCRegisterInfo>> as the return type for the function createRegInfo. This has now been fixed.
Added analysis for invariant extractelement instructions and improved
detection of the CSE blocks for generated extractelement instructions.
Differential Revision: https://reviews.llvm.org/D135279
Clang may optimize conditional tailcall blocks with the following layout:
cmp <condition>
je tailcall_target
ret
When retpoline is in place, indirect calls are converted into direct calls to a retpoline thunk. When these indirect calls are tail calls, they may be subject to the above described optimization (there is no indirect JCC, but since now the jump is direct it can be made conditional). The above layout is non-ideal for the Linux kernel scenario because the branches into thunks may be patched back into indirect branches during runtime depending on the underlying CPU features, which would not be feasible if the binary is emitted with the optimized layout above.
Thus, prevent clang from emitting it if the CodeModel is Kernel.
Feature request from the respective kernel mailing list: https://lore.kernel.org/llvm/Yv3uI%2FMoJVctmBCh@worktop.programming.kicks-ass.net/
Reviewed By: nickdesaulniers, pengfei
Differential Revision: https://reviews.llvm.org/D134915
The limitation in LibCallSimplifier::optimizeStringLength to only
optimize when the string is an i8 array was already changed in
commit 50ec0b5dce back in 2017.
We still only simplify when 's' points at an array of 'CharSize', so
the comment is still valid in the sense that we do not support
arbitrary array types.
Differential Revision: https://reviews.llvm.org/D135261
Optimization for using compressed beqz and bnez.
If there is a pattern
```
br_cc val1 constval eq/neq place
select_cc val1 constval eq/neq trueval falseval
```
and constval does not fit in the compressed imm format (6 bits) but fits in
the imm format (12 bits), we can replace it with a non-compressed subtraction
(addi with -constval) and a compressed c.beqz/c.bnez:
```
addi val val -constval
c.beqz val place
```
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D132839
Prior to this patch, FixedPointSemantics and APFixedPoint only support semantics where
the Scale is greater than or equal to zero and the Width is greater than or equal to the Scale.
This patch removes both those requirements while staying API compatible.
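For example, with the requirements removed, semantics such as Width=8,
Scale=-2 become expressible (each raw integer step represents 4, so only
multiples of 4 are representable), as do semantics like Width=8, Scale=10,
where every representable value is purely fractional.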
This changes the default value used for mask policy from mask undisturbed to mask agnostic. In hardware, there may be a minor preference for ta/ma, but since this is only going to apply to instructions which don't use the mask policy bit, this is functionally mostly a nop. The main value is to make future changes to using MA when legal for masked instructions easier to review by reducing test churn.
The prior code was motivated by a desire to minimize state transitions between masked and unmasked code. This patch achieves the same effect using the demanded field logic (landed in afb45ff), and there are no regressions I spotted in the test diffs. (Given the size, I have only been able to skim.) I do want to call out that regressions are possible here; the demanded analysis only works on a block local scope right now, so e.g. a tight loop mixing masked and unmasked computation might see an extra vsetvli or two.
Differential Revision: https://reviews.llvm.org/D133803
Make sure conditions with constant operands come before conditions
without constant operands. This increases the effectiveness of the
current signed <-> unsigned fact transfer logic.
Currently, AAResultBase (from which alias analysis providers inherit)
stores a reference back to the AAResults aggregation it is part of,
so it can perform recursive alias analysis queries via
getBestAAResults().
This patch removes the back-reference from AAResultBase to AAResults,
and instead passes the used aggregation through the AAQueryInfo.
This can be used to perform recursive AA queries using the full
aggregation.
Differential Revision: https://reviews.llvm.org/D94363
We can still get a NaN even if none of the operands are NaN,
e.g. from +inf/-inf. D50804 didn't catch that.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D134854
Previously we would be unable to legalize V2S16 BUILD_VECTOR_TRUNC on GFX8 & below as the custom legalization was missing.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D135149
The patch introduces reading kernel argument attributes from both
function-attached and module-level metadata during kernel argument lowering.
Two tests are added to show the improvement.
Differential Revision: https://reviews.llvm.org/D135106
Co-authored-by: Aleksandr Bezzubikov <zuban32s@gmail.com>
Co-authored-by: Michal Paszkowski <michal.paszkowski@outlook.com>
Co-authored-by: Andrey Tretyakov <andrey.tretyakov@mail.com>
Co-authored-by: Konrad Trifunovic <konrad.trifunovic@intel.com>
This patch allows the combines that fold extensions in binary operations
to have more than one use.
The approach here is pretty conservative: if all the users of an
extension can fold the extension, then the folding is done, otherwise we
don't fold.
This is the first step towards avoiding the one-use limitation.
As a result, we make a decision to fold/don't fold for a web of
instructions. An instruction is part of the web of instructions as soon
as it consumes an extension that needs to be folded for all its users.
Because of how SDISel works a web of instructions can be visited over
and over. More precisely, if the folding happens, it happens for the
whole web and that's the end of it, but if the folding fails, the whole
web may be revisited when another member of the web is visited.
To avoid a compile time explosion in pathological cases, we bail out
earlier for webs that are bigger than a given threshold (arbitrarily set
at 18 for now.) This size can be changed using
`--riscv-lower-ext-max-web-size=<maxWebSize>`.
At the current time, I didn't see a better scheme for that, assuming we
want to stick with doing this in SDISel.
Differential Revision: https://reviews.llvm.org/D133739
This adds some missing tablegen patterns to handle trn1/trn2/zip1/zip2/uzp1/uzp2,
similar to the Arm handling in 5e1a9d319d, but via tablegen
patterns for the AArch64 backend.
This patch centralizes all the combines of add|sub|mul with extended
operands in one "framework".
The rationale for this change is to offer a one-stop-shop for all these
transformations so that, in the future, it is easier to make combine
decisions for a web of instructions (i.e., instructions connected
through s|zext operands).
Technically this patch is not NFC because the new version is more
powerful than the previous version.
In particular, it diverges in two cases:
- VWMULSU can now also be produced from `mul(splat, zext)`, whereas
previously only `mul(sext, splat)` were supported when `splat`s were
involved. (As demonstrated in rvv/fixed-vectors-vwmulsu.ll)
- VWSUB(U) can now also be produced from `sub(splat, ext)`, whereas
previously only `sub(ext, splat)` were supported when `splat`s were
involved. (As demonstrated in rvv/fixed-vectors-vwsub.ll)
If we wanted, we could block these transformations to make this
patch really NFC. For instance, we could do something similar to
`AllowSplatInVW_W`, which prevents the combines to form vw(add|sub)(u)_w
when the RHS is a splat.
Regarding the "framework" itself, the bulk of the patch is some
boilerplate code that abstracts away the actual extensions that are
present in the DAG. This allows us to handle `vwadd_w(ext a, b)` as if
it was a regular `add(ext a, ext b)`. Since the node `ext b` doesn't
actually exist in the DAG, we have a bunch of methods (all in the
NodeExtensionHelper class) that fake all that for us.
The other half of the change is around `CombineToTry` and
`CombineResult`. These helper structures respectively:
- Represent the kind of combines that can be applied to a node, and
- Store what needs to happen to do that combine.
This can be viewed as a two step approach:
- First, check if a pattern applies, and
- Second apply it.
The checks and the materialization of the combines are decoupled so that
in the future we can perform several checks and do all the related
applies in one go.
Differential Revision: https://reviews.llvm.org/D134703
Sometimes when a function is inlined into a different CU, `llvm-dwarfdump --verify` would find an inlined subroutine with an invalid abstract origin. This is because `DwarfUnit::addDIEEntry()` will incorrectly assume the inlined subroutine and the abstract origin are from the same CU if it can't find the CU for the inlined subroutine.
In the added test, the inlined subroutine for `bar()` is created before the CU for `B.swift` is created, so it tries to point to `goo()` in the wrong CU. Interestingly, if we swap the order of the two functions then we don't see a crash since the module for `goo()` is created first.
The fix is to give a parent DIE to `ScopeDIE` before calling `addDIEEntry()` so that its CU can be found. Luckily, `constructInlinedScopeDIE()` is only called once so we can pass it the DIE of the scope's parent and give it a child just after it's created.
`constructInlinedScopeDIE()` should always return a DIE, so assert that it is not null.
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D135114
For V_CMP_CLASS_F16_t16_e64 and V_CMPX_CLASS_F16_t16_e64,
https://reviews.llvm.org/D133723 changed the value type of src1 from i32 to i16.
These src1 operands are 16 bits, therefore need to be encoded as true16
operands. So the _e32 type was correctly set to VGPR_32_Lo128.
In the _e64 form the operand class went from
VSrc_b32 to VSrc_b16. For some reason, we cannot encode inline literals for
VSrc_b16, see 5f5f566b26. In this phase of
the true16 implementation, VSrc_b16 and VSrc_b32 are still similar,
except for that quirk of inline literals. So set the operand class back to
regain that functionality.
Reviewed By: dp, arsenm
Differential Revision: https://reviews.llvm.org/D134897
Setting up a lazy-save mechanism around calls is done during SelectionDAG
because calls to intrinsics may be expanded into an actual function call
(e.g. calls to @llvm.cos()), and maintaining an allowed-list in the SMEABI
pass is not feasible.
The approach for conditionally restoring the lazy-save based on the runtime
value of TPIDR2_EL0 is similar to how we handle conditional smstart/smstop.
We create a pseudo-node which gets expanded into a conditional branch and
expands to a call to __arm_tpidr2_restore(%tpidr2_object_ptr).
The lazy-save buffer and TPIDR2 block are only allocated once at the start
of the function. For each call, the TPIDR2 block is initialised, and at
the end of the call, a pseudo node (RestoreZA) is planted.
Patch by Sander de Smalen.
Differential Revision: https://reviews.llvm.org/D133900
If a call base use will not capture a pointer, we can approximate the
effects. This is important especially for readnone/readonly uses. Even
may-write uses are not too bad with reachability in place. Capturing
is the problem, as we lose track of update sites.
If we have a constant aggregate, e.g., as an initializer, we usually
failed to extract the proper value/type from it. This patch provides the
size and offset information necessary to extract the right part of the
constant.
This was already handled correctly below, but not checked for the
original store pointer operand. Encountered when converting tests
to opaque pointers, where the intermediate bitcast goes away.
In the case of non-opaque pointers, when combining consecutive loads, we
need to bitcast the pointer source to the combined type size; otherwise
asserts are triggered.
Differential Revision: https://reviews.llvm.org/D135249
The infinite loop seen on buildbots should be fixed by
11897708c0 (assuming there are not
multiple infinite combine loops...)
-----
foldOpIntoPhi() currently only folds operations into the phi if all
but one operands constant-fold. The two exceptions to this are freeze
and select, where we allow more general simplification.
This patch makes foldOpIntoPhi() generally simplification based and
removes all the instruction-specific logic. We just try to simplify
the instruction for each operand, and for the (potentially) one
non-simplified operand, we move it into the new block with adjusted
operands.
This fixes https://github.com/llvm/llvm-project/issues/57448, which
was my original motivation for the change.
Differential Revision: https://reviews.llvm.org/D134954
Rather than inserting a ptrtoint + inttoptr pair, directly replace
the inttoptr with the new phi node. This ensures that no other
transform can undo it before the pair gets folded away.
This avoids the infinite loop when combined with D134954.
This is NFCI in the sense that it shouldn't make a difference, but
could due to different worklist order.
The new pass implements the following:
* Inserts code at the start of an arm_new_za function to
commit a lazy-save when the lazy-save mechanism is active.
* Adds a smstart intrinsic at the start of the function.
* Adds a smstop intrinsic at the end of the function.
Patch co-authored by kmclaughlin.
Differential Revision: https://reviews.llvm.org/D133896
SimpleLoopUnswitch may remove blocks from loops. Clear block and loop
dispositions in that case, to clean up invalid entries in the cache.
Fixes #58158.
Fixes #58159.
This patch introduces a new AArch64 ISD node (OBSCURE_COPY) that can
be used when we want to prevent SVE object address calculations
from being rematerialised between a smstop/smstart and a call.
At the moment we use COPY to copy the frame index to a register,
which leads to problems because the "simple register coalescing"
pass understands the COPY instruction and attempts to rematerialise
an address calculation with 'addvl' between an smstop and a call.
When in streaming mode the 'addvl' instruction may have different
behaviour because the streaming SVE vector length is not guaranteed
to equal the normal SVE vector length.
The new ISD opcode OBSCURE_COPY gets lowered to a new pseudo
instruction also called OBSCURE_COPY. This ensures it cannot be
rematerialised and we expand this into a simple move very late in
the machine instruction pipeline.
A new test is added here:
CodeGen/AArch64/sme-streaming-interface.ll
Differential Revision: https://reviews.llvm.org/D134940
This makes sure that the instructions of the prologue matches the
SEH opcodes.
Also remove a couple redundant cases of setting HasWinCFI; it was
already set unconditionally after the conditional cases.
Differential Revision: https://reviews.llvm.org/D135101
These intrinsics are simply expanded to regular icmp/fcmp instructions.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D121594
The patch selects VSELECT_VL/VP_MERGE_VL that uses VF(N)M(ACC|SAC) as its
true operand and the addend of the true operand as its false operand.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D135080
The allocation hints for copies of ACC registers assumed that we would only be
copying between VSRp and UACC registers. In reality it is also possible to copy
between UACC and ACC registers.
This patch adds a new case for the ACC copy to fix that issue.
Note that the test case added with this patch will hit an assert without the
fix.
Reviewed By: lei, amyk
Differential Revision: https://reviews.llvm.org/D134501
(Re-Apply with fixes to clang MicrosoftMangle.cpp)
This is a first step towards high level representation for fp8 types
that have been built in to hardware with near term roadmaps. Like the
BFLOAT16 type, the family of fp8 types are inspired by IEEE-754 binary
floating point formats but, due to the size limits, have been tweaked in
various ways in order to maximally use the range/precision in various
scenarios. The list of variants is small/finite and bounded by real
hardware.
This patch introduces the E5M2 FP8 format as proposed by Nvidia, ARM,
and Intel in the paper: https://arxiv.org/pdf/2209.05433.pdf
As the more conformant of the two implemented datatypes, we are plumbing
it through LLVM's APFloat type and MLIR's type system first as a
template. It will be followed by the range optimized E4M3 FP8 format
described in the paper. Since that format deviates further from the
IEEE-754 norms, it may require more debate and implementation
complexity.
Given that we see two parts of the FP8 implementation space represented
by these cases, we are recommending naming of:
* `F8M<N>` : For FP8 types that can be conceived of as following the
same rules as FP16 but with a smaller number of mantissa/exponent
bits. Including the number of mantissa bits in the type name is enough
to fully specify the type. This naming scheme is used to represent
the E5M2 type described in the paper.
* `F8M<N>F` : For FP8 types such as E4M3 which only support finite
values.
The first of these (this patch) seems fairly non-controversial. The
second is previewed here to illustrate options for extending to the
other known variant (but can be discussed in detail in the patch
which implements it).
Many conversations about these types focus on the Machine-Learning
ecosystem where they are used to represent mixed-datatype computations
at a high level. At that level (which is why we also expose them in
MLIR), it is important to retain the actual type definition so that when
lowering to actual kernels or target specific code, the correct
promotions, casts and rescalings can be done as needed. We expect that
most LLVM backends will only experience these types as opaque `I8`
values that are applicable to some instructions.
MLIR does not make it particularly easy to add new floating point types
(i.e. the FloatType hierarchy is not open). Given the need to fully
model FloatTypes and make them interop with tooling, such types will
always be "heavy-weight" and it is not expected that a highly open type
system will be particularly helpful. There are also a bounded number of
floating point types in use for current and upcoming hardware, and we
can just implement them like this (perhaps looking for some cosmetic
ways to reduce the number of places that need to change). Creating a
more generic mechanism for extending floating point types seems like it
wouldn't be worth it and we should just deal with defining them one by
one on an as-needed basis when real hardware implements a new scheme.
Hopefully, with some additional production use and complete software
stacks, hardware makers will converge on a set of such types that is not
terribly divergent at the level that the compiler cares about.
(I cleaned up some old formatting and sorted some items for this case:
If we converge on landing this in some form, I will NFC commit format
only changes as a separate commit)
Differential Revision: https://reviews.llvm.org/D133823
Once we are in the `Unreachable` state we want to disable type checking, but
we were unconditionally returning `true` here, which means we encountered
an error. Instead we unconditionally return false to signal no error.
Fixes: https://github.com/llvm/llvm-project/issues/56935
Differential Revision: https://reviews.llvm.org/D135195
As a result of making these legal, and tweaking the combine to allow vectors,
we generate vector G_SEXT_INREG during legalization.
The reason we want to make these legal in the first place is to allow for
more combine opportunities. Once those have been done, we can just lower them
back to shifts in the post-legalizer lowering.
This needs to be one commit otherwise we start causing tests to fail due to
incomplete support for selection etc.
This information is not preserved in MIR today. So this patch adds
information to RISCVMachineFunctionInfo when the vreg is created for
the argument.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D134621
Loop versioning changes the control-flow, which may impact SCEVs cached
by for other loops in LoopAccessInfoManager. Clear the manager after
making changes.
Fixes #57825.
Depends on D134609.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D134611
performCONDCombine removes the `and` with 0xff in patterns of
SUBS (and (add(..), 0xff), C)
under certain complex conditions. It doesn't come up often,
but in the lowering of usub.sat, where the SUBS is used both as a
condition and as a value, the `and` is removed where it would only be
valid for the condition.
Fixes #58109.
Differential Revision: https://reviews.llvm.org/D135043
Since SROA chooses promotion based on the reaching loads/stores of allocas, we may run into scenarios in which we alloca a vector but promote it to an integer. The result is the familiar LoadCombine pattern (i.e. ZEXT, SHL, OR). However, instead of coming directly from distinct loads, the elements to be combined come from ExtractVectorElements which stem from a shared load.
This patch identifies such a pattern and combines it into a load.
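For instance (a sketch with assumed types; endianness matters for the
combined load):
```
%v  = load <2 x i8>, ptr %p
%e0 = extractelement <2 x i8> %v, i64 0
%e1 = extractelement <2 x i8> %v, i64 1
%z0 = zext i8 %e0 to i16
%z1 = zext i8 %e1 to i16
%hi = shl i16 %z1, 8
%r  = or i16 %hi, %z0
; on a little-endian target this combines to:
%r2 = load i16, ptr %p
```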
The function isOuterMostDepPositive() is checked after negative dependence
vectors are normalized to be non-negative, so there will not be any negative
dependency ('>' as the outermost non-equal sign) after normalization.
Therefore the check in isOuterMostDepPositive() is irrelevant and redundant.
Reviewed By: congzhe
Differential Revision: https://reviews.llvm.org/D132982
AArch64LoadStoreOptimizer has a bunch of different guards to avoid
corrupting Windows SEH prologues/epilogues, but apparently we missed the
case of merging two instructions where the first instruction isn't part
of the epilogue, but the second instruction is.
Fixes issue discovered at https://reviews.llvm.org/D130049#3704064
Differential Revision: https://reviews.llvm.org/D134992
This code adds initial support for generating the HLSL resources
metadata entries. It has a lot of `FIXMEs` laying around because there
is a lot more work to do here, but this lays a solid groundwork and can
accurately handle some trivial cases.
I've filed a swath of issues covering the deficiencies here and left the
issues in comments so that we can easily follow them.
One big change to make sooner rather than later is to move some of this
code into a new libLLVMFrontendHLSL so that we can share it with the
Clang CodeGen layer.
Reviewed By: python3kgae
Differential Revision: https://reviews.llvm.org/D134682
This is a split of D134250.
Support for parsing and dumping the LC_DATA_IN_CODE contents (as binary
data).
This allows more complete testing of llvm-objdump in D133974.
Reviewed By: Higuoxing
Differential Revision: https://reviews.llvm.org/D134569
There are a few changes mixed in here.
- Try to reuse the destination register from ADDI instead of always
  creating a virtual register. This way we lean on the register
  scavenger in fewer cases.
- Explicitly reuse the primary virtual register when possible. There's
  still a case where getVLENFactoredAmount and handling large
  fixed offsets can both create a secondary virtual register.
- Combine similar BuildMI calls by manipulating the Register variables.
There are still a couple of early outs for ADDI, but overall I tried to
arrange the code into steps.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D135009
The old code took two different paths based on whether there is
a scalable offset, but these two paths had some code in common.
The main difference between the two code paths was whether we needed
to create a GPR or not for the ADDI that gets created for RVVSpill.
If we had a scalable offset, the same GPR was used as the destination
for adding the scalable offset and the ADDI. To manage this, we now
cache the scratch register and reuse it if it has already been created.
This is a pre-patch for D135009.
Reviewed By: reames, frasercrmck
Differential Revision: https://reviews.llvm.org/D135092
`dumpExportEntry` was dumping everything using signed LEB128, but
the format uses unsigned LEB128. This can be cross-checked against
the implementations in MachOObjectFile.cpp, LLD's ExportTrie.cpp, and
macho2yaml.cpp, which all use ULEB128 functions.
The difference is only apparent when encoding some values with
specific bit patterns (a bit set in the 7th, 14th, ... bit of the
value). The encoding did not always create problems in the resulting
binaries: if the extra byte was part of the padding, decoding it as
ULEB128 gives the same result as decoding it as SLEB128. However, the
code in MachOObjectFile.cpp (used by llvm-objdump) checks the buffer
decoding position against the reported length, which triggered an
error.
Modified a test to use an address with this pattern (0x3FA0, where the
14th bit is set), to show that a round trip still produces the same
results, and added a check using llvm-objdump so that its extra
validation exercises this implementation.
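To make the difference concrete with the test's address: 0x3FA0
(16288) encodes as the two bytes 0xA0 0x7F in ULEB128, but as the
three bytes 0xA0 0xFF 0x00 in SLEB128, because bit 6 of the second
byte would otherwise be read as a sign bit. Decoding those three bytes
as ULEB128 still yields 16288 (the extra byte contributes zero), which
is why the bug stayed hidden until the decoding position was checked
against the reported length.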
Reviewed By: pete
Differential Revision: https://reviews.llvm.org/D134563
https://alive2.llvm.org/ce/z/oShzr3
This was noted as a missing fold in D134876 (with additional
examples based on issue #58046).
I'm assuming that fmul with a zero operand is rare enough
that the use of ValueTracking will not noticeably increase
compile-time.
This adjusts a PowerPC codegen test that was added with D88388
because it would get folded away and no longer provide coverage
for the bug fix.
In the canonical form of the shuffle, the poison/undef operand is the
second operand; the patch tries to emit the canonical form for partial
vectorization of the buildvector sequence.
Also, this patch starts emitting a freeze instruction for shuffles
with undef indices if the second shuffle operand is undef rather than
poison. It is an initial step towards D93818, where undef mask
elements are treated as returning poison values.
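A minimal illustration of the canonicalization (hypothetical IR):
; non-canonical: poison as the first operand
%s = shufflevector <2 x i32> poison, <2 x i32> %v, <2 x i32> <i32 2, i32 3>
; canonical: poison as the second operand, with the mask remapped
%s = shufflevector <2 x i32> %v, <2 x i32> poison, <2 x i32> <i32 0, i32 1>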
Differential Revision: https://reviews.llvm.org/D134377
The bitmask used to extract the bits assumed 16-bit elements and
didn't take the actual element size into account.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D135156
Fix a crash in the FMA combine added by D132837 and amended by D134810.
In cases where the newly created node could be folded, the combiner
would fail this assertion:
llc: DAGCombiner.cpp:268: void (anonymous namespace)::DAGCombiner::AddToWorklist(llvm::SDNode *): Assertion `N->getOpcode() != ISD::DELETED_NODE && "Deleted Node added to Worklist"' failed.
Differential Revision: https://reviews.llvm.org/D135150
If the 'order(concurrent)' clause is specified, the iterations of a
SIMD loop can be executed concurrently.
This patch adds support for LLVM IR codegen via OMPIRBuilder for SIMD loop
with 'order(concurrent)' clause. The functionality added to OMPIRBuilder is
similar to the functionality implemented in 'CodeGenFunction::EmitOMPSimdInit'.
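For reference, a rough sketch of the IR shape this produces (the same
access-group scheme used by 'CodeGenFunction::EmitOMPSimdInit'; names
are made up): loads/stores in the loop body are tagged with an access
group that the loop metadata declares parallel:
%val = load float, ptr %addr, align 4, !llvm.access.group !1
...
br i1 %cond, label %body, label %exit, !llvm.loop !2

!1 = distinct !{}
!2 = distinct !{!2, !3}
!3 = !{!"llvm.loop.parallel_accesses", !1}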
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D134046
Signed-off-by: Dominik Adamski <dominik.adamski@amd.com>
Reapply with a fix for the case where an operand simplified back
to the original phi: We need to map this case to the new phi node.
-----
foldOpIntoPhi() currently only folds operations into the phi if all
but one operands constant-fold. The two exceptions to this are freeze
and select, where we allow more general simplification.
This patch makes foldOpIntoPhi() generally simplification based and
removes all the instruction-specific logic. We just try to simplify
the instruction for each operand, and for the (potentially) one
non-simplified operand, we move it into the new block with adjusted
operands.
This fixes https://github.com/llvm/llvm-project/issues/57448, which
was my original motivation for the change.
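A minimal sketch of the kind of fold this enables (hypothetical IR;
the old code bailed here because neither incoming value constant-folds):
%p = phi i32 [ %x, %bb0 ], [ 0, %bb1 ]
%r = or i32 %p, %x
; per-incoming simplification: 'or %x, %x' -> %x and 'or 0, %x' -> %x,
; so the op folds into the phi:
%r = phi i32 [ %x, %bb0 ], [ %x, %bb1 ]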
This currently does not make much of a difference (only one test is
affected), but it is helpful e.g. for the out-of-tree CHERI target,
where Builder.CreateMemCpy() can add attributes other than parameter
alignment.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D135075
The helpers in BuildLibCalls normally expect that the Value
arguments already have the correct type (matching the lib call
signature). An exception has been emitFPutC, which cast the Char
argument to 'int' using CreateIntCast. This patch moves the cast to
the caller instead of doing it inside emitFPutC.
I think it makes sense to make the BuildLibCalls APIs a bit more
consistent this way, despite the need to handle the int cast in two
different places now.
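A sketch of the resulting call-site shape (hypothetical IR, assuming a
32-bit 'int'): the caller now performs the promotion before invoking
the helper:
%c.int = sext i8 %c to i32   ; signed char-to-int promotion, done by the caller
%ret = call i32 @fputc(i32 %c.int, ptr %file)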
Differential Revision: https://reviews.llvm.org/D135066
Stop assuming that an 'int' is 32 bits in helpers that emit libcalls
to lib functions that have 'int' in the signature. For most targets
this is NFC. For a target with a 16-bit 'int' this could help detect
attempts to emit a libcall with an incorrect signature.
Similarly we now derive the type mapping to 'size_t' by asking TLI
about the size of 'size_t'. This should be NFC (at least for in-tree
targets) since getSizeTSize(), in TLI, is deriving the size in the
same way as DataLayout::getIntPtrType().
Differential Revision: https://reviews.llvm.org/D135065
Lots of BuildLibCalls helpers are using Builder::getInt32Ty to get a
type matching an 'int', and DataLayout::getIntPtrType to get a type
matching 'size_t'. The former is not correct for all targets, since an
'int' isn't always 32 bits. And the latter is a bit weird as well,
since the definition of DataLayout::getIntPtrType doesn't clearly map
it to 'size_t'.
This patch is not aiming at solving any such problems. It is merely
highlighting when a libcall is expecting to use 'int' and 'size_t'
by naming the types as IntTy and SizeTTy when preparing the type
signatures for the emitted libcalls.
Differential Revision: https://reviews.llvm.org/D135064
Use LoopAccessInfoManager directly instead of various GetLAA lambdas.
Depends on D134608.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D134609
If nonnull is already set, we currently skip setting both nonnull
and dereferenceable. Make these independent, to avoid regressions
when additional nonnull attributes are inferred earlier.
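A hypothetical before/after sketch (made-up callee) of what this
unblocks:
; before: nonnull was already present, so nothing else was added
%p = call nonnull ptr @get_buffer(i64 16)
; after: dereferenceable is now inferred independently
%p = call nonnull dereferenceable(16) ptr @get_buffer(i64 16)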
foldOpIntoPhi() currently only folds operations into the phi if all
but one operands constant-fold. The two exceptions to this are freeze
and select, where we allow more general simplification.
This patch makes foldOpIntoPhi() generally simplification based and
removes all the instruction-specific logic. We just try to simplify
the instruction for each operand, and for the (potentially) one
non-simplified operand, we move it into the new block with adjusted
operands.
This fixes https://github.com/llvm/llvm-project/issues/57448, which
was my original motivation for the change.
Simplify LoopAccessLegacyAnalysis by using LoopAccessInfoManager from
D134606. As a side-effect this also removes printing support from
LoopAccessLegacyAnalysis.
Depends on D134606.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D134608
One of the sources is the same size as the destination, so that
source doesn't overlap the destination register. By using the _TIED
form we avoid an early-clobber constraint for that source.
This matches what was already done for intrinsics. ConvertToThreeAddress
will fix it if it can't stay tied.