llvm-project

Commit Graph

Author	SHA1	Message	Date
Min-Yih Hsu	8dddc15297	[M68k](4/8) MC layer and object file support - Add the M68k-specific MC layer implementation - Add ELF support for M68k - Add M68k-specific CC and reloc TODO: Currently AsmParser and disassembler are not implemented yet. Please use this bug to track the status: https://bugs.llvm.org/show_bug.cgi?id=48976 Authors: myhsu, m4yers, glaubitz Differential Revision: https://reviews.llvm.org/D88390	2021-03-08 12:30:57 -08:00
Min-Yih Hsu	bec7b16692	[M68k](3/8) Skeleton and target description files - Infrastructure for the target (i.e. build files, target triple etc.) - All of the target description TableGen file Authors: myhsu, m4yers, glaubitz Differential Revision: https://reviews.llvm.org/D88389	2021-03-08 12:30:57 -08:00
Yuta Saito	aa0c571a5f	[WebAssembly] Add new relocation for location relative data This `R_WASM_MEMORY_ADDR_SELFREL_I32` relocation represents an offset between its relocating address and the symbol address. It's very similar to `R_X86_64_PC32` but restricted to data segments only. ``` S + A - P ``` A: Represents the addend used to compute the value of the relocatable field. P: Represents the place of the storage unit being relocated. S: Represents the value of the symbol whose index resides in the relocation entry. Proposal: https://github.com/WebAssembly/tool-conventions/issues/162 Differential Revision: https://reviews.llvm.org/D96659	2021-03-08 11:34:10 -08:00
Craig Topper	7a64cc4a76	[RISCV] Make use of DAG.getNeutralElement in lowerVECREDUCE to avoid repeating the same list of constants. NFC Reviewed By: frasercrmck, khchen Differential Revision: https://reviews.llvm.org/D98091	2021-03-08 09:11:10 -08:00
Craig Topper	a2651266c5	[RISCV] Add explicit i64 types to RV64 isel patterns to stop tablegen from generating unneeded i32 patterns for RV32 HwMode.	2021-03-08 09:06:56 -08:00
Nemanja Ivanovic	b0f0115308	[AIX][TLS] Generate 32-bit general-dynamic access code sequence Adds support for the TLS general dynamic access model to assembly files on AIX 32-bit. To generate the correct code sequence when accessing a TLS variable `v`, we first create two TOC entry nodes, one for the variable offset, one for the region handle. These nodes are followed by a `PPCISD::TLSGD_AIX` node (new node introduced by this patch). The `PPCISD::TLSGD_AIX` node (`TLSGDAIX` pseudo instruction) is expanded to 2 copies (to put the variable offset and region handle in the right registers) and a call to `__tls_get_addr`. This patch also changes the way TC entries are generated in asm files. If the generated TC entry is for the region handle of a TLS variable, we add the `@m` relocation and the `.` prefix to the entry name. For example: ``` L..C0: .tc .v[TC],v[TL]@m -> region handle L..C1: .tc v[TC],v[TL] -> variable offset ``` Reviewed By: nemanjai, sfertile Differential Revision: https://reviews.llvm.org/D97948	2021-03-08 09:30:19 -06:00
Anirudh Prasad	7a46d34a19	[SystemZ][z/OS] Add support to validate a HLASM Label. - This patch adds in support to determine whether a particular label is valid for the hlasm variant - The label syntax being checked is that of an ordinary HLASM symbol (Reference, Chapter 2 (Coding and Structure) - Terms, Literals and Expressions - Terms - Symbols - Ordinary Symbol) - To achieve this, the virtual function isLabel defined in MCTargetAsmParser.h is made use of - The isLabel function is overridden in SystemZAsmParser for the hlasm variant, and the syntax is checked appropriately - Things remain unchanged for the att variant - Further patches will add in support to emit the label. These future patches will make use of this isLabel function Reviewed By: uweigand, Kai Differential Revision: https://reviews.llvm.org/D97748	2021-03-08 09:55:39 -05:00
gbtozers	e5d958c456	[DebugInfo] Support DIArgList in DbgVariableIntrinsic This patch updates DbgVariableIntrinsics to support use of a DIArgList for the location operand, resulting in a significant change to its interface. This patch does not update all IR passes to support multiple location operands in a dbg.value; the only change is to update the DbgVariableIntrinsic interface and its uses. All code outside of the intrinsic classes assumes that an intrinsic will always have exactly one location operand; they will still support DIArgLists, but only if they contain exactly one Value. Among other changes, the setOperand and setArgOperand functions in DbgVariableIntrinsic have been made private. This is to prevent code from setting the operands of these intrinsics directly, which could easily result in incorrect/invalid operands being set. This does not prevent these functions from being called on a debug intrinsic at all, as they can still be called on any CallInst pointer; it is assumed that any code directly setting the operands on a generic call instruction is doing so safely. The intention for making these functions private is to prevent DIArgLists from being overwritten by code that's naively trying to replace one of the Values it points to, and also to fail fast if a DbgVariableIntrinsic is updated to use a DIArgList without a valid corresponding DIExpression.	2021-03-08 14:36:13 +00:00
Ahsan Saghir	acce401068	[PowerPC] Change target data layout for 16-byte stack alignment This changes the target data layout to align the stack to 16 bytes on Power10. Before this change, the stack was being aligned to 32 bytes. Reviewed By: #powerpc, nemanjai Differential Revision: https://reviews.llvm.org/D96265	2021-03-08 08:13:08 -06:00
Simon Pilgrim	f71cee136d	[X86] Break if-else chain. NFCI. Both if blocks affect control flow - we don't need the else. Fixes clang-tidy warning.	2021-03-08 11:44:31 +00:00
Fraser Cormack	18173c57bd	[RISCV] Add new entry points to getContainerForFixedLengthVector While working on adding fixed-length vectors to the calling convention, it was necessary to be able to query for a fixed-length vector container type without access to an instance of SelectionDAG. This patch modifies the "main" getContainerForFixedLengthVector function to use an instance of TargetLowering rather than SelectionDAG, and preserves the SelectionDAG overload as a wrapper. An additional non-static version of the function was also added to simplify the common case in RISCVTargetLowering. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D97925	2021-03-08 09:26:19 +00:00
Freddy Ye	5f9489b754	[X86] Refine "Support -march=alderlake" Compared with tremont, it includes 25 additional features. They are adx, aes, avx, avx2, avxvnni, bmi, bmi2, cldemote, f16c, fma, hreset, invpcid, kl, lzcnt, movdir64b, movdiri, pclmulqdq, pconfig, pku, serialize, shstk, vaes, vpclmulqdq, waitpkg, widekl. Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D97832	2021-03-08 13:17:18 +08:00
Craig Topper	c91b3c9e63	[RISCV] Fold (select_cc (setlt X, Y), 0, ne, trueV, falseV) -> (select_cc X, Y, lt, trueV, falseV) A setcc can be created during LegalizeDAG after select_cc has been created. This combine will enable us to fold these late setccs. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D98132	2021-03-07 09:44:56 -08:00
Craig Topper	fdbd5d3206	[RISCV] Fold (select_cc (xor X, Y), 0, eq/ne, trueV, falseV) -> (select_cc X, Y, eq/ne, trueV, falseV) This pattern occurs when lowering for overflow operations introduces an xor after select_cc has already been formed. I had to rework another combine that looked for select_cc of an xor with 1. That xor will now get combined away so we just need to look for the RHS of the select_cc being 1. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D98130	2021-03-07 09:29:55 -08:00
Simon Pilgrim	cd938ab162	[X86] canonicalizeShuffleWithBinOps - add X86ISD::PSHUFB handling.	2021-03-07 12:56:35 +00:00
Simon Pilgrim	772a501bf4	[X86] canonicalizeShuffleWithBinOps - shuffle oneuse constants. We can freely shuffle all ones/zeros constants but we can also freely shuffle other constants as long as they only have one use.	2021-03-07 11:17:03 +00:00
Sean Fertile	f0904a6208	[PowerPC][AIX] Handle variadic vector call operands. Patch adds support for passing vector call operands to variadic functions. Fixed arguments shadow GPRs and stack space even when they are passed in vector registers, while arguments passed through ellipses are passed in properly aligned GPRs if available and on the stack once all GPR argument registers are consumed. Differential Revision: https://reviews.llvm.org/D97956	2021-03-06 13:49:55 -05:00
Alexey Lapshin	cf7cdaff64	[X86][VARARG] Avoid spilling xmm registers for va_start. This review is extracted from D69372. It fixes bug https://bugs.llvm.org/show_bug.cgi?id=42219. For the noimplicitfloat mode, the compiler mustn't generate floating-point code if it was not asked directly to do so. This rule does not work with variable function arguments currently. Though the compiler correctly guards the block of code that copies xmm vararg parameters with a check for %al, it does not protect spills of xmm registers. Thus, such spills are generated in non-protected areas and could break code that does not expect floating-point data. The problem happens in -O0 optimization mode. At this optimization level the FastRegisterAllocator is used, which spills virtual registers at basic block boundaries. The register allocator does not protect spills with additional control-flow modifications. Thus, to resolve the problem, it is suggested not to copy incoming physical registers into virtual registers. Instead, store incoming physical xmm registers directly into memory. Differential Revision: https://reviews.llvm.org/D80163	2021-03-06 15:25:47 +03:00
Jay Foad	99682bc039	Revert "Revert "[AMDGPU] Restore the s_memtime instruction in gfx1030"" This reverts commit `e58d68fcd0`. This reinstates commit `fc28f600e5` with a fix to initialize HasShaderCyclesRegister. See https://reviews.llvm.org/D97928.	2021-03-06 09:00:01 +00:00
Fangrui Song	2d922de3af	[MC][RISCV] Support .reloc , BFD_RELOC_{NONE,32,64}, BFD_RELOC_NONE is useful for ld --gc-sections: it provides a generic way of indicating a dependency between two sections.	2021-03-05 21:45:11 -08:00
Fangrui Song	59ff9315fd	[MC][ARM] Support .reloc , BFD_RELOC_{NONE,8,16,32}, BFD_RELOC_NONE is useful for ld --gc-sections: it provides a generic way of indicating a dependency between two sections.	2021-03-05 21:39:16 -08:00
Fangrui Song	3110187f1f	[MC][PowerPC] Support .reloc , BFD_RELOC_{NONE,16,32,64}, BFD_RELOC_NONE is useful for ld --gc-sections: it provides a generic way of indicating a dependency between two sections.	2021-03-05 21:31:45 -08:00
Fangrui Song	aceea45d87	[MC][AArch64] Support .reloc , BFD_RELOC_{NONE,16,32,64}, BFD_RELOC_NONE is useful for ld --gc-sections: it provides a generic way of indicating a dependency between two sections.	2021-03-05 21:31:08 -08:00
Fangrui Song	4f7562d52f	[MC][X86] Support .reloc , BFD_RELOC_{NONE,8,16,32,64}, The names are unfortunate, but BFD_RELOC_NONE provides a generic way of indicating a dependency between two sections, which is useful for ld --gc-sections. See https://sourceware.org/bugzilla/show_bug.cgi?id=27530	2021-03-05 21:31:05 -08:00
Mitch Phillips	e58d68fcd0	Revert "[AMDGPU] Restore the s_memtime instruction in gfx1030" Broke the ASan/MSan buildbots. See more comments in the original patch, https://reviews.llvm.org/D97928. Build failure at http://lab.llvm.org:8011/#/builders/5/builds/5327 This reverts commit `fc28f600e5`.	2021-03-05 18:24:59 -08:00
Jay Foad	fc28f600e5	[AMDGPU] Restore the s_memtime instruction in gfx1030 gfx1030 added a new way to implement readcyclecounter using the SHADER_CYCLES hardware register, but the s_memtime instruction still exists, so the MC layer should still accept it and the llvm.amdgcn.s.memtime intrinsic should still work. Differential Revision: https://reviews.llvm.org/D97928	2021-03-05 20:19:11 +00:00
Zarko Todorovski	2b50ce1524	[PowerPC][AIX] Enable the default AltiVec ABI on AIX This patch adds support for the default AltiVec ABI for AIX. Vector registers 20 through 31 are marked as reserved and cannot be used in the default ABI. This patch adds handling for this case and also removes the default AltiVec ABI errors. Reviewed By: sfertile Differential Revision: https://reviews.llvm.org/D96351	2021-03-05 12:46:27 -05:00
RamNalamothu	3998a8e797	[AMDGPU] Do not attempt sgpr spills to vgpr, when it is disabled This covers a path missed in https://reviews.llvm.org/D95768. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D98013	2021-03-05 22:47:21 +05:30
Jinsong Ji	cc21de6789	[PowerPC] Update Copy/Paste encodings according to ISA3.1 Copy-paste P9 insns were added back in 2016; however, it looks like the opcodes have changed in ISA3.1. Reviewed By: #powerpc, nemanjai Differential Revision: https://reviews.llvm.org/D97416	2021-03-05 17:05:50 +00:00
Simon Pilgrim	87d5b34c24	[X86] X86ISelDAGToDAG.cpp - include cstdint instead of stdint.h NFCI. Fixes clang-tidy warning	2021-03-05 15:58:20 +00:00
Simon Pilgrim	f11f86c114	[X86] X86DAGToDAGISel::Select - merge X86::TEST load bitsize checks. NFCI.	2021-03-05 15:58:20 +00:00
LemonBoy	8725b24c6d	[AArch64] Legalize horizontal fmax/fmin reductions on f16 vectors Expand the horizontal reduction during the instruction selection phase, but only if the target doesn't support the full fp16 instruction set. Fixes https://bugs.llvm.org/show_bug.cgi?id=49401 Reviewed By: aemerson Differential Revision: https://reviews.llvm.org/D97840	2021-03-05 16:09:37 +01:00
Ilya Leoshkevich	a7137b238a	[BPF] Add support for floats and doubles Some BPF programs compiled on s390 fail to load, because s390 arch-specific linux headers contain float and double types. At the moment there is no BTF_KIND for floats and doubles, so the release version of LLVM ends up emitting type id 0 for them, which the in-kernel verifier does not accept. Introduce support for such types to libbpf by representing them using the new BTF_KIND_FLOAT. Reviewed By: yonghong-song Differential Revision: https://reviews.llvm.org/D83289	2021-03-05 15:10:11 +01:00
Stephen Tozer	f677413071	Reapply "[DebugInfo] Add new instruction and DIExpression operator for variadic debug values" Rewrites test to use correct architecture triple; fixes incorrect reference in SourceLevelDebugging doc; simplifies `spillReg` behaviour so as to not be dependent on changes elsewhere in the patch stack. This reverts commit `d2000b45d0`.	2021-03-05 12:32:05 +00:00
Sebastian Neubauer	e0e73714fb	[AMDGPU] Keep skip branch for ds instructions Same as other memory instructions, ds instructions add latency even if exec is zero. Jumping over them if exec=0 is cheaper than executing them. With this change, the branch instruction that skips over a basic block if exec=0 is not removed when the block contains a ds instruction. Differential Revision: https://reviews.llvm.org/D97922	2021-03-05 12:34:09 +01:00
Jingu Kang	9b302513f6	[AArch64] Add missing intrinsics for vrnd	2021-03-05 11:26:12 +00:00
Simon Pilgrim	3fd2fa1220	Revert rG8198d83965ba4b9db6922b44ef3041030b2bac39: "[X86] Pass to transform amx intrinsics to scalar operation." This reverts commit 8198d83965ba4b9db6922b44ef3041030b2bac39 due to buildbot breakages.	2021-03-05 11:09:14 +00:00
Simon Pilgrim	d7b8cb4d57	[X86] X86ISelLowering.cpp - try to use for-range loops. NFCI.	2021-03-05 11:09:14 +00:00
Petar Avramovic	36beaa3ba3	Reland AMDGPU/GlobalISel: Combine zext(trunc x) to x after RegBankSelect Recommit `bf5a582650`. Depends on `4c8fb7ddd6` which was reverted. RegBankSelect creates zext and trunc when it selects banks for uniform i1. Add zext_trunc_fold from generic combiner to post RegBankSelect combiner. Differential Revision: https://reviews.llvm.org/D95432	2021-03-05 11:05:37 +01:00
Luo, Yuanke	8198d83965	[X86] Pass to transform amx intrinsics to scalar operation. This pass runs in all situations, but we skip it when the optimization level is not O0 and the function doesn't have the optnone attribute. With -O0, the def of shape to amx intrinsics is near the amx intrinsics code. We are not able to find a point which post-dominates all the shapes and dominates all amx intrinsics. To decouple the dependency of the shape, we transform amx intrinsics to scalar operations, so that compiling doesn't fail. In the long term, we should improve fast register allocation to allocate amx registers. Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D93594	2021-03-05 16:02:02 +08:00
Luke	d28297ff68	[RISCV] Enable fixed-length vectorization of LoopVectorizer for RISC-V Vector By implementing the method "unsigned RISCVTTIImpl::getRegisterBitWidth(bool Vector)", fixed-length vectorization is enabled when possible. Without this method, the "#pragma clang loop" directive is needed to enable vectorization (or the cost model may inform LLVM that "Vectorization is possible but not beneficial"). Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D97549	2021-03-05 10:54:51 +08:00
Chen Zheng	87bbf3d1f8	[XCOFF][DebugInfo] support DWARF for XCOFF for assembly output. Reviewed By: jasonliu Differential Revision: https://reviews.llvm.org/D95518	2021-03-04 21:07:52 -05:00
Yonghong Song	9c0274cdea	BPF: permit type modifiers for __builtin_btf_type_id() relocation Lorenz Bauer from Cloudflare tried to use "const struct <name>" as the type for __builtin_btf_type_id(*(const struct <name>)0, 1) relocation and hit an LLVM BPF fatal error. https://lore.kernel.org/bpf/a3782f71-3f6b-1e75-17a9-1827822c2030@fb.com/ ... fatal error: error in backend: Empty type name for BTF_TYPE_ID_REMOTE reloc Currently, we require that the debuginfo type itself have a name. In this case, the debuginfo type is "const", which points to "struct <name>". The "const" type does not have a name, hence the above fatal error will be triggered. Let us permit "const" and "volatile" type modifiers. We skip modifiers in some other cases as well, like structure member type tracing. This can avoid the above fatal error. Differential Revision: https://reviews.llvm.org/D97986	2021-03-04 16:27:23 -08:00
David Blaikie	a2a55def35	Move llvm/Analysis/ObjCARCUtil.h to IR to fix layering. This is included from IR files, and IR doesn't/can't depend on Analysis (because Analysis depends on IR). Also fix the implementation - don't use non-member static in headers, as it leads to ODR violations, inaccurate "unused function" warnings, etc. And fix the header protection macro name (we don't generally include "LIB" in the names, so far as I can tell).	2021-03-04 16:14:53 -08:00
Amara Emerson	501f6a4e9e	[AArch64][GlobalISel][RegBankSelect] Improve rbs of G_BUILD_VECTOR when fed by fp values. This is actually two changes. One is to avoid copies when fp values are fed into a build_vector, without being able to tell from the opcode. The other is that build_vectors are also marked as only defining FP, since they produce vector results. Differential Revision: https://reviews.llvm.org/D97968	2021-03-04 15:09:05 -08:00
Heejin Ahn	2b957ed4ff	[WebAssembly] Fix ExceptionInfo grouping again This is a case D97677 missed. When taking out remaining BBs that are reachable from already-taken-out exceptions (because they are not subexceptions but unwind destinations), I assumed the remaining BBs are not EH pads, but they can be. For example, ``` try { try { throw 0; } catch (int) { // (a) } } catch (int) { // (b) } try { foo(); } catch (int) { // (c) } ``` In this code, (b) is the unwind destination of (a), so its exception is taken out of (a)'s exception. But even though the next try-catch is not inside the first two-level try-catches, because the first try always throws, its continuation BB is unreachable and the whole rest of the function is dominated by EH pad (a), including EH pad (c). So after we take (b)'s exception out of (a)'s, we also need to take (c)'s exception out of (a)'s, because (c) is reachable from (b). This adds one more step before what we did for remaining BBs in D97677; it traverses EH pads first to take subexceptions out of their incorrect parent exception. It's the same thing as D97677, but because we can do this before we add BBs to exceptions' sets, we don't need to fix sets and only need to fix parent exception pointers. Other changes are variable name changes (I changed `WE` -> `SrcWE`, `UnwindWE` -> `DstWE` for clarity), some comment changes, and a drive-by fix for a bug in a `LLVM_DEBUG` print statement. Fixes https://github.com/emscripten-core/emscripten/issues/13588. Reviewed By: dschuff Differential Revision: https://reviews.llvm.org/D97929	2021-03-04 15:05:13 -08:00
Heejin Ahn	561abd83ff	[WebAssembly] Disable uses of __clang_call_terminate Background: Wasm EH, while using Windows EH (catchpad/cleanuppad based) IR, uses Itanium-based libraries and ABIs with some modifications. `__clang_call_terminate` is a wrapper generated in Clang's Itanium C++ ABI implementation. It contains this code, in C-style pseudocode: ``` void __clang_call_terminate(void *exn) { __cxa_begin_catch(exn); std::terminate(); } ``` So this function is a wrapper to call `__cxa_begin_catch` on the exception pointer before termination. In Itanium ABI, this function is called when another exception is thrown while processing an exception. The pointer for this second, violating exception is passed as the argument of this `__clang_call_terminate`, which calls `__cxa_begin_catch` with that pointer and calls `std::terminate` to terminate the program. The spec (https://libcxxabi.llvm.org/spec.html) for `__cxa_begin_catch` says, ``` When the personality routine encounters a termination condition, it will call __cxa_begin_catch() to mark the exception as handled and then call terminate(), which shall not return to its caller. ``` In wasm EH's Clang implementation, this function is called from cleanuppads that terminate the program, which we also call terminate pads. Cleanuppads normally don't access the thrown exception and the wasm backend converts them to `catch_all` blocks. But because we need the exception pointer in this cleanuppad, we generate `wasm.get.exception` intrinsic (which will eventually be lowered to `catch` instruction) as we do in the catchpads. But because terminate pads are cleanup pads and should run even when a foreign exception is thrown, what we have been doing is: 1. In `WebAssemblyLateEHPrepare::ensureSingleBBTermPads()`, we make sure terminate pads are in this simple shape: ``` %exn = catch call @__clang_call_terminate(%exn) unreachable ``` 2.
In `WebAssemblyHandleEHTerminatePads` pass at the end of the pipeline, we attach a `catch_all` to terminate pads, so they will be in this form: ``` %exn = catch call @__clang_call_terminate(%exn) unreachable catch_all call @std::terminate() unreachable ``` In `catch_all` part, we don't have the exception pointer, so we call `std::terminate()` directly. The reason we ran HandleEHTerminatePads at the end of the pipeline, separate from LateEHPrepare, was it was convenient to assume there was only a single `catch` part per `try` during CFGSort and CFGStackify. --- Problem: While it thinks terminate pads could have been possibly split or calls to `__clang_call_terminate` could have been duplicated, `WebAssemblyLateEHPrepare::ensureSingleBBTermPads()` assumes terminate pads contain no more than calls to `__clang_call_terminate` and `unreachable` instruction. I assumed that because in LLVM very limited forms of transformations are done to catchpads and cleanuppads to maintain the scoping structure. But it turned out to be incorrect; passes can merge cleanuppads into one, including terminate pads, as long as the new code has a correct scoping structure. One pass that does this I observed was `SimplifyCFG`, but there can be more. After this transformation, a single cleanuppad can contain any number of other instructions with the call to `__clang_call_terminate` and can span many BBs. It wouldn't be practical to duplicate all these BBs within the cleanuppad to generate the equivalent `catch_all` blocks, only with calls to `__clang_call_terminate` replaced by calls to `std::terminate`. Unless we do more complicated transformation to split those calls to `__clang_call_terminate` into a separate cleanuppad, it is tricky to solve. --- Solution (?): This CL just disables the generation and use of `__clang_call_terminate` and calls `std::terminate()` directly in its place. 
The possible downside of this approach is that, while the Itanium ABI intended to "mark" the violating exception as handled, we don't do that anymore. What `__cxa_begin_catch` actually does is increment the exception's handler count and decrement the uncaught exception count, which in my opinion do not matter much given that we are about to terminate the program anyway. Also it does not affect info like stack traces that can be possibly shown to developers. And while we use a variant of Itanium EH ABI, we can make some deviations if we choose to; we are already different in that in the current version of the EH spec we don't support two-phase unwinding. We can possibly consider a more complicated transformation later to reenable this, but I don't think that has high priority. Changes in this CL include: - In Clang, we don't generate a call to `wasm.get.exception()` intrinsic and `__clang_call_terminate` function in terminate pads anymore; we simply generate calls to `std::terminate()`, which is the default implementation of `CGCXXABI::emitTerminateForUnexpectedException`. - Remove `WebAssembly::ensureSingleBBTermPads()` function and `WebAssemblyHandleEHTerminatePads` pass, because terminate pads are already `catch_all` now (because they don't need the exception pointer) and we don't need these transformations anymore. - Change tests to use `std::terminate` directly. Also remove tests that tested `LateEHPrepare::ensureSingleBBTermPads` and `HandleEHTerminatePads` pass. - Drive-by fix: Add some function attributes to EH intrinsic declarations Fixes https://github.com/emscripten-core/emscripten/issues/13582. Reviewed By: dschuff, tlively Differential Revision: https://reviews.llvm.org/D97834	2021-03-04 14:26:35 -08:00
Jay Foad	ed7458398a	[AMDGPU] Don't check for VMEM hazards on GFX10 The hazard where a VMEM reads an SGPR written by a VALU counts as a data dependency hazard, so no nops are required on GFX10. Tested with Vulkan CTS on GFX10.1 and GFX10.3. Differential Revision: https://reviews.llvm.org/D97926	2021-03-04 21:44:56 +00:00
Jinsong Ji	7967221a72	[PowerPC] Disable more extended mne on AIX To avoid assembler errors. Reviewed By: sfertile Differential Revision: https://reviews.llvm.org/D97418	2021-03-04 21:13:37 +00:00
Benjamin Kramer	e897feeb8a	[PPC] Silence unused variable warning in release builds. NFC.	2021-03-04 21:43:19 +01:00
Akira Hatanaka	1900503595	[ObjC][ARC] Use operand bundle 'clang.arc.attachedcall' instead of explicitly emitting retainRV or claimRV calls in the IR This reapplies `ed4718eccb`, which was reverted because it was causing a miscompile. The bug that was causing the miscompile has been fixed in `75805dce5f`. Original commit message: Background: This fixes a longstanding problem where llvm breaks ARC's autorelease optimization (see the link below) by separating calls from the marker instructions or retainRV/claimRV calls. The backend changes are in https://reviews.llvm.org/D92569. https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue What this patch does to fix the problem: - The front-end adds operand bundle "clang.arc.attachedcall" to calls, which indicates the call is implicitly followed by a marker instruction and an implicit retainRV/claimRV call that consumes the call result. In addition, it emits a call to @llvm.objc.clang.arc.noop.use, which consumes the call result, to prevent the middle-end passes from changing the return type of the called function. This is currently done only when the target is arm64 and the optimization level is higher than -O0. - ARC optimizer temporarily emits retainRV/claimRV calls after the calls with the operand bundle in the IR and removes the inserted calls after processing the function. - ARC contract pass emits retainRV/claimRV calls after the call with the operand bundle. It doesn't remove the operand bundle on the call since the backend needs it to emit the marker instruction. The retainRV and claimRV calls are emitted late in the pipeline to prevent optimization passes from transforming the IR in a way that makes it harder for the ARC middle-end passes to figure out the def-use relationship between the call and the retainRV/claimRV calls (which is the cause of PR31925). 
- The function inliner removes an autoreleaseRV call in the callee if nothing in the callee prevents it from being paired up with the retainRV/claimRV call in the caller. It then inserts a release call if claimRV is attached to the call since autoreleaseRV+claimRV is equivalent to a release. If it cannot find an autoreleaseRV call, it tries to transfer the operand bundle to a function call in the callee. This is important since the ARC optimizer can remove the autoreleaseRV returning the callee result, which makes it impossible to pair it up with the retainRV/claimRV call in the caller. If that fails, it simply emits a retain call in the IR if retainRV is attached to the call and does nothing if claimRV is attached to it. - SCCP refrains from replacing the return value of a call with a constant value if the call has the operand bundle. This ensures the call always has at least one user (the call to @llvm.objc.clang.arc.noop.use). - This patch also fixes a bug in replaceUsesOfNonProtoConstant where multiple operand bundles of the same kind were being added to a call. Future work: - Use the operand bundle on x86-64. - Fix the auto upgrader to convert call+retainRV/claimRV pairs into calls with the operand bundles. rdar://71443534 Differential Revision: https://reviews.llvm.org/D92808	2021-03-04 11:22:30 -08:00
Caroline Concatto	f2b749be15	[CostModel][SVE] Add cost model for shuffle reverse with i1 and scalable vector This patch adds the cost model for experimental.vector.reverse with scalable vector types: nxv16i1, nxv8i1, nxv4i1 and nxv2i1. These types are missing from the previous cost model patch D95603. The cost model for experimental.vector.reverse with 1 bit mask is used by loop vectorization in the patch D95363 Differential Revision: https://reviews.llvm.org/D97758	2021-03-04 18:52:59 +00:00
Sean Fertile	aaeffbe007	[PowerPC][AIX] Handle variadic vector formal arguments. Patch adds support for passing vector arguments to variadic functions. Fixed arguments shadow GPRs and stack space even when they are passed in vector registers, while arguments passed through ellipses are passed in properly aligned GPRs if available and on the stack once all GPR argument registers are consumed. Differential Revision: https://reviews.llvm.org/D97485	2021-03-04 10:56:53 -05:00
Nico Weber	e68de60bc4	Revert "AMDGPU/GlobalISel: Combine zext(trunc x) to x after RegBankSelect" This reverts commit `bf5a582650`. Also depends on now-reverted `4c8fb7ddd6`	2021-03-04 10:16:11 -05:00
Petar Avramovic	bf5a582650	AMDGPU/GlobalISel: Combine zext(trunc x) to x after RegBankSelect RegBankSelect creates zext and trunc when it selects banks for uniform i1. Add zext_trunc_fold from generic combiner to post RegBankSelect combiner. Differential Revision: https://reviews.llvm.org/D95432	2021-03-04 15:05:24 +01:00
Ayke van Laethem	a1155ae64d	[AVR] Fix liveness issues in the AVR backend This patch makes a large number of small changes that should hopefully not affect the generated machine code but are still important to get right so that the machine verifier won't complain about them. The llvm/test/CodeGen/AVR/pseudo/*.mir changes are also necessary because without the liveins the used registers are considered undefined by the machine verifier and it will complain about them. Differential Revision: https://reviews.llvm.org/D97172	2021-03-04 14:04:39 +01:00
Simon Pilgrim	7cbc5df438	[X86] X86TargetLowering::isSafeMemOpType - break if-else chain. NFCI. All if-else blocks return - fixes clang-tidy warning.	2021-03-04 12:15:08 +00:00
Stephen Tozer	d2000b45d0	Revert "[DebugInfo] Add new instruction and DIExpression operator for variadic debug values" This reverts commit `d07f106f4a`.	2021-03-04 11:59:21 +00:00
gbtozers	d07f106f4a	[DebugInfo] Add new instruction and DIExpression operator for variadic debug values This patch adds a new instruction that can represent variadic debug values, DBG_VALUE_VAR. This patch alone covers the addition of the instruction and a set of basic code changes in MachineInstr and a few adjacent areas, but does not correctly handle variadic debug values outside of these areas, nor does it generate them at any point. The new instruction is similar to the existing DBG_VALUE instruction, with the following differences: the operands are in a different order, any number of values may be used in the instruction following the Variable and Expression operands (these are referred to in code as “debug operands”) and are indexed from 0 so that getDebugOperand(X) == getOperand(X+2), and the Expression in a DBG_VALUE_VAR must use the DW_OP_LLVM_arg operator to pass arguments into the expression. The new DW_OP_LLVM_arg operator is only valid in expressions appearing in a DBG_VALUE_VAR; it takes a single argument and pushes the debug operand at the index given by the argument onto the Expression stack. For example the sub-expression `DW_OP_LLVM_arg, 0` has the meaning “Push the debug operand at index 0 onto the expression stack.” Differential Revision: https://reviews.llvm.org/D82363	2021-03-04 11:45:35 +00:00
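To make the operand indexing and the `DW_OP_LLVM_arg` semantics concrete, here is a minimal, hypothetical C++ model of a DBG_VALUE_VAR and a tiny expression-stack evaluator. `DbgValueVar`, `Op`, `OpKind`, and `evaluate` are illustrative names, not LLVM APIs; only `DW_OP_LLVM_arg` and a stand-in for `DW_OP_plus` are modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical model of a DBG_VALUE_VAR: operand 0 stands in for the
// Variable, operand 1 for the Expression, and every later operand is a
// "debug operand".
struct DbgValueVar {
  std::vector<int64_t> operands;

  int64_t getOperand(std::size_t i) const { return operands.at(i); }
  // Debug operands are indexed from 0: getDebugOperand(X) == getOperand(X + 2).
  int64_t getDebugOperand(std::size_t x) const { return getOperand(x + 2); }
};

// Only two operators are modeled: DW_OP_LLVM_arg and a stand-in for DW_OP_plus.
enum class OpKind { LLVMArg, Plus };

struct Op {
  OpKind kind;
  std::size_t arg = 0; // debug-operand index for LLVMArg, unused for Plus
};

// Evaluates `expr` on a value stack, reading debug operands from `mi`.
int64_t evaluate(const DbgValueVar &mi, const std::vector<Op> &expr) {
  std::vector<int64_t> stack;
  for (const Op &op : expr) {
    if (op.kind == OpKind::LLVMArg) {
      // Push the debug operand at index op.arg onto the expression stack.
      stack.push_back(mi.getDebugOperand(op.arg));
    } else {
      if (stack.size() < 2)
        throw std::runtime_error("stack underflow");
      int64_t rhs = stack.back();
      stack.pop_back();
      int64_t lhs = stack.back();
      stack.pop_back();
      stack.push_back(lhs + rhs);
    }
  }
  if (stack.size() != 1)
    throw std::runtime_error("expression must leave exactly one value");
  return stack.back();
}
```

With debug operands `10` and `32`, the sub-expression `DW_OP_LLVM_arg, 0` yields `10`, and `DW_OP_LLVM_arg, 0, DW_OP_LLVM_arg, 1, plus` combines both values into `42`.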
Oliver Stannard	aac056c528	[objdump][ARM] Use correct offset when printing ARM/Thumb branch targets llvm-objdump only uses one MCInstrAnalysis object, so if ARM and Thumb code is mixed in one object, or if an object is disassembled without explicitly setting the triple to match the ISA used, then branch and call targets will be printed incorrectly. This could be fixed by creating two MCInstrAnalysis objects in llvm-objdump, like we currently do for SubtargetInfo. However, I don't think there's any reason we need two separate sub-classes of MCInstrAnalysis, so instead these can be merged into one, and the ISA determined by checking the opcode of the instruction. Differential revision: https://reviews.llvm.org/D97766	2021-03-04 11:15:57 +00:00
Andrew Savonichev	d791695cb5	[MCA] Add support for in-order CPUs This patch adds a pipeline to support in-order CPUs such as ARM Cortex-A55. In-order pipeline implements a simplified version of Dispatch, Scheduler and Execute stages as a single stage. Entry and Retire stages are common for both in-order and out-of-order pipelines. Differential Revision: https://reviews.llvm.org/D94928	2021-03-04 14:08:19 +03:00
Simon Pilgrim	1584e55a26	[X86] canonicalizeShuffleWithBinOps - handle general unaryshuffle(binop(x,c)) patterns not just xor(x,-1) Generalize the shuffle(not(x)) -> not(shuffle(x)) fold to handle any binop with 0/-1. Hopefully we can further generalize to help push target unary/binary shuffles through binops similar to what we do in DAGCombiner::visitVECTOR_SHUFFLE	2021-03-04 10:44:38 +00:00
Fraser Cormack	8e7ceffd0b	[RISCV] Fix crash when inserting large fixed-length subvectors This patch addresses a compiler crash resulting from passing a fixed-length type to one that expects scalable vector types. An assertion was added to prevent this regressing in the future. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D97868	2021-03-04 09:27:16 +00:00
Fraser Cormack	d8e1d2ebf4	[RISCV] Preserve fixed-length VL on insert_vector_elt in more cases This patch fixes up one case where the fixed-length-vector VL was dropped (falling back to VLMAX) when inserting vector elements, as the code would lower via ISD::INSERT_VECTOR_ELT (at index 0) which loses the fixed-length vector information. To this end, a custom node, VMV_S_XF_VL, was introduced to carry the VL operand through to the final instruction. This node wraps the RVV vmv.s.x and vmv.s.f instructions, which were being selected by insert_vector_elt anyway. There should be no observable difference in scalable-vector codegen. There is still one outstanding drop from fixed-length VL to VLMAX, when an i64 element is inserted into a vector on RV32; the splat (which is custom legalized) has no notion of the original fixed-length vector type. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D97842	2021-03-04 09:21:10 +00:00
David Green	a968e7b82e	[ARM] KnownBits for CSINC/CSNEG/CSINV This adds some simple known bits handling for the three CSINC/NEG/INV instructions. From the operands' known bits we can compute the common bits of the first operand and the incremented/negated/inverted second operand. The first, especially CSINC ZR, ZR, comes up a fair amount in the tests. The others are rarer, so a unit test for them is added. Differential Revision: https://reviews.llvm.org/D97788	2021-03-04 08:40:20 +00:00
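A rough sketch of the idea, assuming a 32-bit known-bits pair of masks (`zero`, `one`). The `Known` struct and its helpers are hypothetical stand-ins, not `llvm::KnownBits`. Inversion simply swaps the masks; for increment and negate this sketch only handles a fully known constant and otherwise gives up, which is more conservative than a real implementation. The select result keeps only the bits known identically in both arms.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 32-bit known-bits pair: a bit set in `zero` is known 0,
// a bit set in `one` is known 1; a bit in neither mask is unknown.
struct Known {
  uint32_t zero = 0;
  uint32_t one = 0;
  bool isConstant() const { return (zero | one) == 0xFFFFFFFFu; }
};

Known knownConstant(uint32_t v) { return Known{~v, v}; }

// Bits known (with the same value) in both arms of a select.
Known intersect(Known a, Known b) { return Known{a.zero & b.zero, a.one & b.one}; }

// Bitwise NOT swaps which bits are known zero and known one.
Known knownNot(Known k) { return Known{k.one, k.zero}; }

// Conservative: only a fully known value is incremented/negated here.
Known knownInc(Known k) { return k.isConstant() ? knownConstant(k.one + 1u) : Known{}; }
Known knownNeg(Known k) { return k.isConstant() ? knownConstant(0u - k.one) : Known{}; }

enum class CondSel { CSINC, CSINV, CSNEG };

// Result = cond ? op1 : f(op2), so only bits common to op1 and f(op2) are known.
Known knownCondSel(CondSel kind, Known op1, Known op2) {
  Known second;
  switch (kind) {
  case CondSel::CSINC: second = knownInc(op2); break;
  case CondSel::CSINV: second = knownNot(op2); break;
  case CondSel::CSNEG: second = knownNeg(op2); break;
  }
  return intersect(op1, second);
}
```

For `CSINC ZR, ZR` the arms are 0 and 1, so bits 31..1 are known zero and only bit 0 is unknown.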
Florian Hahn	75805dce5f	[AArch64] Add implicit uses for operands when expanding BLR_RVMARKER. Make sure we preserve info about passed arguments as implicit uses, to make sure later passes still have access to this information. This fixes a mis-compile where the machine-combiner would pick an incorrect free register.	2021-03-03 21:56:05 +00:00
Jonas Paulsson	7334b3dc3e	[SystemZ] Reimplement the i8/i16 compare-and-swap logic. Even though the implementation in emitAtomicCmpSwapW() was correct, it made Valgrind report an error. Instead of using a RISBG on CmpVal, an LL[CH]R can be made on the OldVal, and the problem is avoided. Review: Ulrich Weigand Differential Revision: https://reviews.llvm.org/D97604	2021-03-03 14:04:32 -06:00
Florian Hahn	8c3a70a78f	[AArch64] Move CALL_RVMARKER definition after CALL. This is NFC with respect to the generated code, but it fixes a crash when using -debug: because of their position in the enum, CALL_RVMARKER nodes were treated as memops, which caused a crash when printing them.	2021-03-03 19:42:16 +00:00
Stanislav Mekhanoshin	b70c483e04	[AMDGPU] Exclude always_inline from max bb threshold Honor always_inline attribute when processing -amdgpu-inline-max-bb. It was lost during the ports of the heuristic. There is no reason to honor inline hint, but not always inline. Differential Revision: https://reviews.llvm.org/D97790	2021-03-03 10:21:56 -08:00
Simon Pilgrim	aa4afebbf9	[X86] Fold scalar_to_vector(x) -> extract_subvector(broadcast(x),0) iff broadcast(x) exists Add handling for reusing an existing broadcast(x) to a wider vector.	2021-03-03 15:50:37 +00:00
Hans Wennborg	0a5dd06718	Revert "[ObjC][ARC] Use operand bundle 'clang.arc.attachedcall' instead of explicitly emitting retainRV or claimRV calls in the IR" This caused miscompiles of Chromium tests for iOS due to clobbering of live registers. See discussion on the code review for details. > Background: > > This fixes a longstanding problem where llvm breaks ARC's autorelease > optimization (see the link below) by separating calls from the marker > instructions or retainRV/claimRV calls. The backend changes are in > https://reviews.llvm.org/D92569. > > https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue > > What this patch does to fix the problem: > > - The front-end adds operand bundle "clang.arc.attachedcall" to calls, > which indicates the call is implicitly followed by a marker > instruction and an implicit retainRV/claimRV call that consumes the > call result. In addition, it emits a call to > @llvm.objc.clang.arc.noop.use, which consumes the call result, to > prevent the middle-end passes from changing the return type of the > called function. This is currently done only when the target is arm64 > and the optimization level is higher than -O0. > > - ARC optimizer temporarily emits retainRV/claimRV calls after the calls > with the operand bundle in the IR and removes the inserted calls after > processing the function. > > - ARC contract pass emits retainRV/claimRV calls after the call with the > operand bundle. It doesn't remove the operand bundle on the call since > the backend needs it to emit the marker instruction. The retainRV and > claimRV calls are emitted late in the pipeline to prevent optimization > passes from transforming the IR in a way that makes it harder for the > ARC middle-end passes to figure out the def-use relationship between > the call and the retainRV/claimRV calls (which is the cause of > PR31925). 
> > - The function inliner removes an autoreleaseRV call in the callee if > nothing in the callee prevents it from being paired up with the > retainRV/claimRV call in the caller. It then inserts a release call if > claimRV is attached to the call since autoreleaseRV+claimRV is > equivalent to a release. If it cannot find an autoreleaseRV call, it > tries to transfer the operand bundle to a function call in the callee. > This is important since the ARC optimizer can remove the autoreleaseRV > returning the callee result, which makes it impossible to pair it up > with the retainRV/claimRV call in the caller. If that fails, it simply > emits a retain call in the IR if retainRV is attached to the call and > does nothing if claimRV is attached to it. > > - SCCP refrains from replacing the return value of a call with a > constant value if the call has the operand bundle. This ensures the > call always has at least one user (the call to > @llvm.objc.clang.arc.noop.use). > > - This patch also fixes a bug in replaceUsesOfNonProtoConstant where > multiple operand bundles of the same kind were being added to a call. > > Future work: > > - Use the operand bundle on x86-64. > > - Fix the auto upgrader to convert call+retainRV/claimRV pairs into > calls with the operand bundles. > > rdar://71443534 > > Differential Revision: https://reviews.llvm.org/D92808 This reverts commit `ed4718eccb`.	2021-03-03 15:51:40 +01:00
Ayke van Laethem	15f495c0bc	[AVR] Fix def state of operands Some instructions (especially mov+pop instructions) were setting the wrong operands. For example, the pop instruction had the register set as a source operand while it is a destination operand (the value is loaded into the register). I have found these issues using the machine verifier and using manual code inspection. Differential Revision: https://reviews.llvm.org/D97159	2021-03-03 15:36:05 +01:00
Ayke van Laethem	bbfef8ac95	[AVR] Fix expansion of NEGW The previous expansion used SBCI, which is incorrect because the NEGW pseudo instruction accepts a DREGS operand (2xGPR8) and SBCI only allows LD8 registers. One solution could be to correct the NEGW pseudo instruction, but another solution is to use a different instruction (sbc) that does accept a GPR8 register and therefore allows more freedom to the register allocator. The output now matches avr-gcc for the following code: `int foo(int n) { return -n; }` I've found this issue using the machine instruction verifier: it was complaining about the wrong register class in NEGWRd.mir. Differential Revision: https://reviews.llvm.org/D97131	2021-03-03 15:36:05 +01:00
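Why an `sbc`-based sequence is enough to negate a 16-bit value held in two 8-bit registers can be checked with a small simulation. This sketch assumes the shape `neg hi; neg lo; sbc hi, <zero register>` and relies on two AVR flag rules: `neg` sets the carry flag whenever its result is nonzero, and `sbc` subtracts that carry. `negwViaSbc` is a hypothetical name; this is not the backend's expansion code.

```cpp
#include <cassert>
#include <cstdint>

// Simulates neg hi / neg lo / sbc hi, zero on a 16-bit value split into
// two 8-bit registers and returns the recombined 16-bit result.
uint16_t negwViaSbc(uint16_t n) {
  uint8_t hi = static_cast<uint8_t>(n >> 8);
  uint8_t lo = static_cast<uint8_t>(n & 0xFF);

  hi = static_cast<uint8_t>(0u - hi);              // neg hi
  lo = static_cast<uint8_t>(0u - lo);              // neg lo
  bool carry = lo != 0;                            // neg sets C when its result is nonzero
  const uint8_t zeroReg = 0;                       // register that always holds 0
  hi = static_cast<uint8_t>(hi - zeroReg - carry); // sbc hi, zeroReg

  return static_cast<uint16_t>((hi << 8) | lo);
}
```

The borrow from the low byte is exactly what turns `-hi` into `-hi - 1` whenever `lo` is nonzero, which is the two's-complement identity for `-(hi * 256 + lo)`.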
Ayke van Laethem	4f6d7985d4	[AVR] Add register aliases XL, YH, etc These aliases are sometimes used in assembly code and make the code more readable. They are supported by avr-gcc too. Differential Revision: https://reviews.llvm.org/D96492	2021-03-03 15:36:05 +01:00
Matt Arsenault	78dcff4841	GlobalISel: Add default implementation of assignValueToReg Refactor insertion of the asserting ops. This enables using them for AMDGPU. This code should essentially be the same for every target. Mips, X86 and ARM all have different code there now, but this seems to be an accident. The assignment functions are called with different types than they would be in the DAG, so this is all likely an assortment of hacks to get around that.	2021-03-03 09:29:53 -05:00
Piotr Sobczak	4672bac177	[AMDGPU] Introduce Strict WQM mode * Add amdgcn_strict_wqm intrinsic. * Add a corresponding STRICT_WQM machine instruction. * The semantics are similar to amdgcn_strict_wwm, with the notable difference that not all threads are forcibly enabled during the computation of the intrinsic's argument, only all threads in quads that have at least one active thread. * The difference between amdgcn_wqm and amdgcn_strict_wqm is that in strict mode an inactive lane will always be enabled irrespective of control flow decisions. Reviewed By: critson Differential Revision: https://reviews.llvm.org/D96258	2021-03-03 14:19:16 +01:00
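The "whole quad" part of these semantics can be illustrated with a bit-manipulation sketch over a 64-lane exec mask: every quad (four consecutive lanes) with at least one active lane becomes fully active. `wholeQuadMask` is a hypothetical name, and the sketch models only the mask computation, not how the backend enables and later restores lanes.

```cpp
#include <cassert>
#include <cstdint>

// Returns `exec` with every quad (lanes 4k..4k+3) fully enabled if at least
// one lane of that quad is enabled; empty quads stay disabled.
uint64_t wholeQuadMask(uint64_t exec) {
  uint64_t result = 0;
  for (unsigned quad = 0; quad < 16; ++quad) {
    uint64_t quadBits = 0xFull << (4 * quad);
    if (exec & quadBits)
      result |= quadBits;
  }
  return result;
}
```

For example, lanes 5 and 8 active (`0x120`) expand to lanes 4 through 11 (`0xFF0`).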
Piotr Sobczak	c3ce7bae80	[AMDGPU] Rename amdgcn_wwm to amdgcn_strict_wwm * Introduce the new intrinsic amdgcn_strict_wwm * Deprecate the old intrinsic amdgcn_wwm The change is done for consistency as the "strict" prefix will become an important, distinguishing factor between amdgcn_wqm and amdgcn_strict_wqm in the future. The "strict" prefix indicates that inactive lanes do not take part in control flow, specifically an inactive lane enabled by a strict mode will always be enabled irrespective of control flow decisions. The amdgcn_wwm intrinsic will be removed, but doing so in two steps gives users time to switch to the new name at their own pace. Reviewed By: critson Differential Revision: https://reviews.llvm.org/D96257	2021-03-03 09:33:57 +01:00
Carl Ritson	2ddac69f98	[AMDGPU] Rename llvm.amdgcn.msaa.load to llvm.amdgcn.msaa.load.x While the underlying instruction is called image_msaa_load, the resource must be x component only. Rename the intrinsic for clarity. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D97829	2021-03-03 17:30:39 +09:00
David Green	ab280cbaa3	[ARM] Ensure undef is propagated to CBZ/CBNZ flags In some rare circumstances we can be using an undef register for a compare. When folded into a CBZ/CBNZ the undef flags are lost, leading to machine verifier problems. This propagates the existing flags to the new instruction.	2021-03-03 08:02:58 +00:00
Andy Wingo	4307069df4	[WebAssembly] Swap operand order of call_indirect in text format The WebAssembly text and binary formats have different operand orders for the "type" and "table" fields of call_indirect (and return_call_indirect). In LLVM we use the binary order for the MCInstr, but when we produce or consume the text format we should use the text order. For compilation units targeting WebAssembly 1.0 (without the reference types feature), we omit the table operand entirely. Differential Revision: https://reviews.llvm.org/D97761	2021-03-03 08:51:21 +01:00
Qiu Chaofan	72d4a41ba6	[PowerPC] Allow spilling GPR to VSR on AIX This patch enables spilling GPR to VSRs instead of stack under AIX ABI. Reviewed By: nemanjai Differential Revision: https://reviews.llvm.org/D97367	2021-03-03 13:32:39 +08:00
Victor Huang	1756b2adc9	[AIX][TLS] Generate TLS variables in assembly files This patch allows generating TLS variables in assembly files on AIX. Initialized and external uninitialized variables are generated with the .csect pseudo-op and local uninitialized variables are generated with the .comm/.lcomm pseudo-ops. The patch also adds a check to explicitly say that TLS is not yet supported on AIX. Reviewed by: daltenty, jasonliu, lei, nemanjai, sfertile Originally patched by: bsaleil Commandeered by: NeHuang Differential Revision: https://reviews.llvm.org/D96184	2021-03-02 18:22:48 -06:00
Matt Arsenault	fd82cbcf7d	GlobalISel: Merge and cleanup more AMDGPU call lowering code This merges more AMDGPU ABI lowering code into the generic call lowering. Start cleaning up by factoring away more of the pack/unpack logic into the buildCopy{To|From}Parts functions. These could use more improvement, and the SelectionDAG versions are significantly more complex, and we'll eventually have to emulate all of those cases too. This is mostly NFC, but does result in some minor instruction reordering. It also removes some of the limitations with mismatched sizes the old code had. However, similarly to the merge on the input, this is forcing gfx6/gfx7 to use the gfx8+ ABI (which is what we actually want, but SelectionDAG is stuck using the weird emergent ABI). This also changes the load/store size for stack passed EVTs for AArch64, which makes it consistent with the DAG behavior.	2021-03-02 17:31:13 -05:00
Heejin Ahn	4a58116b7e	[WebAssembly] Fix more ExceptionInfo grouping bugs This fixes two bugs in `WebAssemblyExceptionInfo` grouping, created by D97247. These two bugs are not easy to split into two different CLs, because tests that fail for one also tend to fail for the other. - In D97247, when fixing `ExceptionInfo` grouping by taking out the unwind destination's exception from the unwind src's exception, we just iterated the BBs in the function order, but this was incorrect; this changes it to dominator tree preorder. Please refer to the comments in the code for the reason and an example. - After this subexception-taking-out fix, there still can be remaining BBs we have to take out. When Exception B is taken out of Exception A (because EHPad B is the unwind destination of EHPad A), there can still be BBs within Exception A that are reachable from Exception B, which also should be taken out. Please refer to the comments in the code for a more detailed explanation of why this can happen. To make this possible, this splits `WebAssemblyException::addBlock` into two parts: adding to a set and adding to a vector. We need to iterate on BBs within a `WebAssemblyException` to fix this, so we add BBs to sets first. But we add BBs to vectors later after we fix all incorrectness because deleting BBs from vectors is expensive. I considered removing the vector from `WebAssemblyException`, but it was not easy because this class has to maintain a similar interface with `MachineLoop` to be wrapped into a single interface `SortRegion`, which is used in CFGSort. Other misc. drive-by fixes: - Make `WebAssemblyExceptionInfo` not run at all when wasm EH is not used or the function doesn't have any EH pads, to avoid wasting time - Add `LLVM_DEBUG` lines for easy debugging - Fix `preds` comments in cfg-stackify-eh.ll - Fix `__cxa_throw`'s signature in cfg-stackify-eh.ll Fixes https://github.com/emscripten-core/emscripten/issues/13554.
Reviewed By: dschuff, tlively Differential Revision: https://reviews.llvm.org/D97677	2021-03-02 13:44:09 -08:00
Yonghong Song	51cdb780db	BPF: Fix a bug in peephole TRUNC elimination optimization Andrei Matei reported a llvm11 core dump for his bpf program https://bugs.llvm.org/show_bug.cgi?id=48578 The core dump happens in LiveVariables analysis phase. #4 0x00007fce54356bb0 __restore_rt #5 0x00007fce4d51785e llvm::LiveVariables::HandleVirtRegUse(unsigned int, llvm::MachineBasicBlock*, llvm::MachineInstr&) #6 0x00007fce4d519abe llvm::LiveVariables::runOnInstr(llvm::MachineInstr&, llvm::SmallVectorImpl<unsigned int>&) #7 0x00007fce4d519ec6 llvm::LiveVariables::runOnBlock(llvm::MachineBasicBlock*, unsigned int) #8 0x00007fce4d51a4bf llvm::LiveVariables::runOnMachineFunction(llvm::MachineFunction&) The bug can be reproduced with llvm12 and latest trunk as well. Further analysis shows that there is a bug in BPF peephole TRUNC elimination optimization, which tries to remove unnecessary TRUNC operations (a <<= 32; a >>= 32). Specifically, the compiler did the wrong transformation for the following patterns: %1 = LDW ... %2 = SLL_ri %1, 32 %3 = SRL_ri %2, 32 ... %3 ... %4 = SRA_ri %2, 32 ... %4 ... The current transformation did not check how many uses %2 has and did a transformation like %1 = LDW ... ... %1 ... %4 = SRL_ri %2, 32 ... %4 ... and so pseudo register %2 is used but not defined, which caused the LiveVariables analysis core dump. To fix the issue, when traversing back from SRL_ri to SLL_ri, check to ensure SLL_ri has only one use. Otherwise, don't do the transformation. Differential Revision: https://reviews.llvm.org/D97792	2021-03-02 13:03:42 -08:00
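The fix amounts to a single-use check before folding. The sketch below models it on a toy instruction list. The `Inst` type, `countUses`, and `canEliminateTrunc` are hypothetical and do not mirror the BPF backend's data structures, and the shift amounts (32) are not modeled. The rewrite is refused when the `SLL_ri` result (`%2` in the example above) has any user besides the matching `SRL_ri`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy machine instruction: `def` is the virtual register it defines
// (0 = none), `uses` lists the virtual registers it reads.
struct Inst {
  std::string opcode;
  unsigned def;
  std::vector<unsigned> uses;
};

// Number of operands in the function that read register `reg`.
unsigned countUses(const std::vector<Inst> &insts, unsigned reg) {
  unsigned n = 0;
  for (const Inst &i : insts)
    for (unsigned u : i.uses)
      if (u == reg)
        ++n;
  return n;
}

// `srl` indexes an SRL_ri candidate. The shift pair may only be removed if
// its input is defined by an SLL_ri whose result has exactly one use (the
// SRL_ri itself); otherwise other users would read an undefined register.
bool canEliminateTrunc(const std::vector<Inst> &insts, std::size_t srl) {
  const Inst &s = insts.at(srl);
  if (s.opcode != "SRL_ri" || s.uses.size() != 1)
    return false;
  unsigned shlReg = s.uses[0];
  for (const Inst &i : insts)
    if (i.def == shlReg)
      return i.opcode == "SLL_ri" && countUses(insts, shlReg) == 1;
  return false;
}
```

On the reported pattern, the `SRA_ri` is a second user of `%2`, so the check rejects the transformation.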
Amara Emerson	8a316045ed	[AArch64][GlobalISel] Enable use of the optsize predicate in the selector. To do this while supporting the existing functionality in SelectionDAG of using PGO info, we add the ProfileSummaryInfo and LazyBlockFrequencyInfo analysis dependencies to the instruction selector pass. Then, use the predicate to generate constant pool loads for f32 materialization, if we're targeting optsize/minsize. Differential Revision: https://reviews.llvm.org/D97732	2021-03-02 12:55:51 -08:00
David Green	438c98515c	[ARM] Use 0, not ZR during ISel for CSINC/INV/NEG Instead of converting the 0 into a ZR reg during lowering, do that with tablegen by matching the zero immediate. This, when combined with other optimizations, is more likely to use ZR and helps keep the DAG more easily optimizable. It should not otherwise affect code generation.	2021-03-02 19:01:14 +00:00
Jonas Paulsson	52bbbf4d44	[SystemZ] Assign the full space for promoted and split outgoing args. When a large "irregular" (e.g. i96) integer call argument is converted to indirect, 64-bit parts are stored to the stack. The full stack space (e.g. i128) was not allocated prior to this patch, but rather just the exact space of the original type. This caused neighboring values on the stack to be overwritten. Thanks to Josh Stone for reporting this. Review: Ulrich Weigand Fixes https://bugs.llvm.org/show_bug.cgi?id=49322 Differential Revision: https://reviews.llvm.org/D97514	2021-03-02 12:56:47 -06:00
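The size arithmetic behind this fix can be sketched as rounding the argument up to whole 64-bit parts. This is a hedged illustration of the numbers in the message (an i96 needs an i128-sized slot), not the SystemZ lowering code. `indirectArgSlotBytes` and `exactTypeBytes` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Bytes needed when an integer of `bits` bits is stored as 64-bit parts:
// round up to a whole number of 8-byte parts.
uint64_t indirectArgSlotBytes(uint64_t bits) {
  uint64_t parts = (bits + 63) / 64;
  return parts * 8;
}

// Bytes covered by the original type alone (the pre-fix allocation).
uint64_t exactTypeBytes(uint64_t bits) { return (bits + 7) / 8; }
```

For i96, two 8-byte stores cover 16 bytes while only 12 were reserved, so the last 4 bytes landed on a neighboring stack value.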
Joe Nash	5531f24cc2	[AMDGPU] Make OMod explicit for V_CVT_{U,I}* Make OMod explicit instead of implied by HasModifiers in the operand list. Requires explicitly setting HasOMod=1 for irregular OMod usage in instruction V_CVT_{U,I}* Reviewed By: foad Differential Revision: https://reviews.llvm.org/D97587 Change-Id: I230e1476f529e816eec60e242531f23a99e3839f	2021-03-02 13:32:06 -05:00
Fraser Cormack	c1695ddf7d	[RISCV] Support fixed-length INSERT_VECTOR_ELT This patch enables support for lowering INSERT_VECTOR_ELT on fixed-length vector types. The strategy follows that for scalable vector types. This patch also includes a quick fix to prevent the compiler infinitely looping between lowering BUILD_VECTOR as VECTOR_SHUFFLE and back again. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D97698	2021-03-02 16:48:38 +00:00
Simon Pilgrim	25b788716b	[AMDGPU] Fix "initialization is never read" clang-tidy warnings. NFCI.	2021-03-02 12:06:24 +00:00
Fraser Cormack	de2b70010a	[RISCV] Lower CONCAT_VECTORS to INSERT_SUBVECTOR nodes The default expansion of CONCAT_VECTORS goes through the stack. This patch avoids that penalty by custom-lowering CONCAT_VECTORS to a series of INSERT_SUBVECTOR nodes. Further optimizations are possible, but this is a good start. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D97692	2021-03-02 11:13:59 +00:00
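The lowering idea can be modeled on plain element vectors: concatenating N equally sized operands is the same as inserting operand `i` at element offset `i * len` into an initially undefined result. `insertSubvector` and `concatViaInserts` are hypothetical scalar models, not SelectionDAG code, and zero-filling stands in for undef.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar model of INSERT_SUBVECTOR: copy `sub` into `vec` starting at `idx`.
std::vector<int> insertSubvector(std::vector<int> vec, const std::vector<int> &sub,
                                 std::size_t idx) {
  assert(idx + sub.size() <= vec.size());
  for (std::size_t i = 0; i < sub.size(); ++i)
    vec[idx + i] = sub[i];
  return vec;
}

// CONCAT_VECTORS(v0, v1, ...) as a chain of INSERT_SUBVECTORs into an
// "undef" (here zero-filled) vector, one insert per operand at i * len.
std::vector<int> concatViaInserts(const std::vector<std::vector<int>> &ops) {
  assert(!ops.empty());
  std::size_t len = ops.front().size();
  std::vector<int> result(len * ops.size());
  for (std::size_t i = 0; i < ops.size(); ++i) {
    assert(ops[i].size() == len); // CONCAT_VECTORS operands share one type
    result = insertSubvector(result, ops[i], i * len);
  }
  return result;
}
```

Each insert stays in vector registers on RVV, which is what avoids the stack round trip of the default expansion.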
Benjamin Kramer	10c256ccaf	Revert "[X86] Fold shuffle(not(x),undef) -> not(shuffle(x,undef))" This reverts commit `925093d88a`. Causes an infinite loop when compiling some shuffles: $ cat bugpoint-reduced-simplified.ll target triple = "x86_64-unknown-linux-gnu" define void @foo() { entry: %0 = load i8, i8* undef, align 1 %broadcast.splatinsert = insertelement <16 x i8> poison, i8 %0, i32 0 %1 = icmp ne <16 x i8> %broadcast.splatinsert, zeroinitializer %2 = shufflevector <16 x i1> %1, <16 x i1> undef, <16 x i32> zeroinitializer %wide.load = load <16 x i8>, <16 x i8>* undef, align 1 %3 = icmp ne <16 x i8> %wide.load, zeroinitializer %4 = and <16 x i1> %3, %2 %5 = zext <16 x i1> %4 to <16 x i8> store <16 x i8> %5, <16 x i8>* undef, align 1 ret void } $ llc < bugpoint-reduced-simplified.ll <timeout>	2021-03-02 11:24:07 +01:00
Dmitry Preobrazhensky	28f164bca7	[AMDGPU][MC][GFX9+] Corrected encoding of op_sel_hi for unused operands in VOP3P Corrected encoding of VOP3P op_sel_hi for unused operands. See bug 49363. Differential Revision: https://reviews.llvm.org/D97689	2021-03-02 13:02:25 +03:00
David Green	d6ba8ecb60	[ARM] Add handling of t2LDRSB/t2LDRSH in Constant Island Pass These constant pool loads should be treated similarly to t2LDRB/t2LDRH, acting on the same offset ranges. Add handling and a simple test.	2021-03-02 08:46:07 +00:00
Stanislav Mekhanoshin	7c724a896f	[AMDGPU] Do not check max-bb for a single block callee The -amdgpu-inline-max-bb option could lead to suboptimal codegen by preventing inlining of really simple functions, including pure wrapper calls. Relax the cutoff by allowing calls to a function with a single block, on the grounds that inlining it will not increase the total number of blocks. Differential Revision: https://reviews.llvm.org/D97744	2021-03-01 19:48:50 -08:00
Jian Cai	c35105055e	[ARM] support symbolic expressions as branch target in b.w Currently ARM backend validates the range of branch targets before the layout of fragments is finalized. This causes build failure if symbolic expressions are used, with the exception of a single symbolic value. For example, "b.w ." works but "b.w . + 2" currently fails to assemble. This fixes the issue by delaying this check (in ARMAsmParser::validateInstruction) of b.w instructions until the symbol expressions are resolved (in ARMAsmBackend::adjustFixupValue). Link: https://github.com/ClangBuiltLinux/linux/issues/1286 Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D97568	2021-03-01 17:41:35 -08:00
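This is why the check has to move: the offset of `b.w . + 2` is only known once the expression is resolved, and the backend can validate it at that point. The sketch shows such a late range check for a Thumb-2 `B.W` (encoding T4), whose signed, halfword-aligned offset spans -16777216 to +16777214 bytes, measured from the Thumb PC (instruction address + 4). The function names are hypothetical, and the sketch does not reproduce `ARMAsmBackend::adjustFixupValue`.

```cpp
#include <cassert>
#include <cstdint>

// Offset encoded by a Thumb-2 B.W (T4): target - (address + 4), where the
// +4 reflects the Thumb PC read-ahead.
int64_t thumbBranchOffset(uint64_t address, uint64_t target) {
  return static_cast<int64_t>(target) - static_cast<int64_t>(address + 4);
}

// The T4 immediate is a 25-bit signed, halfword-aligned byte offset.
bool isValidT4BranchOffset(int64_t offset) {
  return offset % 2 == 0 && offset >= -(int64_t(1) << 24) &&
         offset <= (int64_t(1) << 24) - 2;
}
```

For `b.w . + 2` at address `0x1000`, the resolved offset is `-2`, which is in range once it is known.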
Jessica Paquette	3e8223b165	[AArch64][GlobalISel] NFC: Remove dead G_BUILD_VECTOR legalization rule Remove a rule which allows larger scalar types than the destination vector element type. This appears to be irrelevant now that we have G_BUILD_VECTOR_TRUNC. Plus, making a G_BUILD_VECTOR which satisfies this introduces a verifier failure anyway. Differential Revision: https://reviews.llvm.org/D97727	2021-03-01 14:04:40 -08:00
David Green	e880f8b88a	[ARM] Rename pass to MVETPAndVPTOptimisationsPass This pass has for a while performed Tail predication as well as VPT block optimizations. Rename the pass to make that clear.	2021-03-01 21:57:19 +00:00
Amara Emerson	b783aa8979	[AArch64] Fix emitting an AdrpAddLdr LOH when there's a potential clobber of the def of the adrp before the ldr. Apparently this pass used to have liveness analysis but it was removed for compile time reasons. This workaround prevents the LOH from being emitted unless the ADD and LDR are adjacent. Fixes https://github.com/JuliaLang/julia/issues/39820 Differential Revision: https://reviews.llvm.org/D97571	2021-03-01 13:52:57 -08:00
Anirudh Prasad	5cb417527c	[SystemZ] Introduce distinction between the jg/jl family of mnemonics for GNU as vs HLASM - This patch adds in the distinction between jg[*] and jl[*] pc-relative mnemonics based on the variant/dialect. - Under the hlasm variant, we use the jl[*] family of mnemonics and under the att (GNU as) variant, we use the jg[*] family of mnemonics. - jgnop which was added in https://reviews.llvm.org/D92185, is now restricted to att variant. jlnop is introduced and restricted to hlasm variant. - The br[*]l additional mnemonics are mapped to either jl[*]/jg[*] based on the variant. Reviewed By: uweigand Differential Revision: https://reviews.llvm.org/D97581	2021-03-01 16:36:07 -05:00
Heejin Ahn	dcfec279d6	[WebAssembly] Handle empty cleanuppads when adding catch_all In `LateEHPrepare::addCatchAlls`, the current code tries to get the iterator's debug info even when it is `MachineBasicBlock::end()`. This fixes the bug by adding empty debug info instead in that case. Reviewed By: tlively Differential Revision: https://reviews.llvm.org/D97679	2021-03-01 10:07:05 -08:00
Andy Wingo	2632ba6a35	[WebAssembly] call_indirect issues table number relocs If the reference-types feature is enabled, call_indirect will explicitly reference its corresponding function table via TABLE_NUMBER relocations against a table symbol. Also, as before, address-taken functions can also cause the function table to be created, only with reference-types they additionally cause a symbol table entry to be emitted. Differential Revision: https://reviews.llvm.org/D90948	2021-03-01 16:49:00 +01:00
Simon Pilgrim	925093d88a	[X86] Fold shuffle(not(x),undef) -> not(shuffle(x,undef)) Move NOT out to expose more AND -> ANDN folds	2021-03-01 14:47:39 +00:00
Jay Foad	796a60d2ea	[AMDGPU] New intrinsic void llvm.amdgcn.s.sethalt(i32) The expected use case is for frontends to insert this into shaders that are to be run under a debugger. The shader can then be resumed or single stepped from the point of the call under debugger control. Differential Revision: https://reviews.llvm.org/D97670	2021-03-01 14:30:23 +00:00
Jay Foad	48ca5d3398	[AMDGPU] Simplify SITargetLowering::isSDNodeSourceOfDivergence. NFC. Check for read-modify-write AtomicSDNodes instead of using an exhaustive list of ISD opcodes. Differential Revision: https://reviews.llvm.org/D97671	2021-03-01 14:22:08 +00:00
Matt Arsenault	6c260d3bc0	GlobalISel: Move splitToValueTypes to generic code I copied the nearly identical function from AArch64 into AMDGPU, so fix this duplication. Mips and X86 have their own more exotic versions which should be removed. However replacing those is better left for a separate patch since it requires other changes to avoid regressions.	2021-03-01 08:58:18 -05:00
Matt Arsenault	b4bfe29415	AArch64/GlobalISel: Fix using wrong calling convention for calls This was reusing the parent function calling convention instead of the callee. I'm not sure if there's a case where there's an observable difference. I previously missed this in `b72a23650f`	2021-03-01 08:46:33 -05:00
David Green	7abf7dd5ef	[AArch64] Add combine for add(udot(0, x, y), z) -> udot(z, x, y). Given a zero input for a udot, an add can be folded in to take the place of the input, using the addition that the instruction naturally performs. Differential Revision: https://reviews.llvm.org/D97188	2021-03-01 12:53:34 +00:00
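The identity behind this combine is easy to check with a scalar model of one 32-bit lane of `udot`. The instruction adds the dot product of four unsigned byte pairs to its accumulator. Starting from 0 and adding `z` afterwards therefore gives the same result, modulo 2^32, as starting from `z`. `udotLane` is a hypothetical helper, not an intrinsic.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// One 32-bit lane of UDOT: acc plus the sum of four unsigned 8-bit products,
// wrapping modulo 2^32 like the hardware accumulator.
uint32_t udotLane(uint32_t acc, const std::array<uint8_t, 4> &x,
                  const std::array<uint8_t, 4> &y) {
  uint32_t sum = acc;
  for (int i = 0; i < 4; ++i)
    sum += static_cast<uint32_t>(x[i]) * static_cast<uint32_t>(y[i]);
  return sum;
}
```

Because unsigned addition is associative and commutative even with wraparound, `udotLane(0, x, y) + z == udotLane(z, x, y)` holds for every `z`.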
Fraser Cormack	3fea9226ee	[RISCV] Support INSERT_SUBVECTOR on vector masks Like with EXTRACT_SUBVECTOR, INSERT_SUBVECTOR poses a problem for vector masks as RVV isn't able to slide mask types around. We choose instead to bitcast to equivalently-sized i8 types where we can, else we zero-extend, perform the operation, and truncate back down. One test was left disabled due to a crash in the legalizer. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D97559	2021-03-01 12:04:11 +00:00
Fraser Cormack	e80ca3af82	[RISCV] Fix INSERT/EXTRACT_SUBVECTOR on fractional LMUL types This patch fixes a bug where the lowering for INSERT_SUBVECTOR and EXTRACT_SUBVECTOR would insist on first extracting a register-aligned LMUL1 vector type before performing the slide up/down. This happened even if the vector was a fractional LMUL type, in which case the aligned EXTRACT_SUBVECTOR was invalid. This issue only occurred for scalable vector types, but a variety of tests for both scalable and fixed-length vectors have been added to ensure this does not regress in the future. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D97556	2021-03-01 11:51:05 +00:00
Fraser Cormack	4ea734e6ec	[RISCV] Unify scalable- and fixed-vector INSERT_SUBVECTOR lowering This patch unifies the two disparate paths for lowering INSERT_SUBVECTOR operations under one roof. Consequently, with this patch it is possible to support any fixed-length subvector insertion, not just "cast-like" ones. As before, support for the insertion of mask vectors will come in a separate patch. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D97543	2021-03-01 11:38:47 +00:00
Fraser Cormack	bd4d421688	[RISCV] Support EXTRACT_SUBVECTOR on vector masks This patch adds support for extracting subvectors from vector masks. This can be either extracting a scalable vector from another, or a fixed-length vector from a fixed-length or scalable vector. Since RVV lacks a way to slide vector masks down on an element-wise basis and we don't know the true length of the vector registers, in many cases we must resort to using equivalently-sized i8 vectors to perform the operation. When this is not possible we fall back and extend to a suitable i8 vector. Support was also added for fixed-length truncation to mask types. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D97475	2021-03-01 11:20:09 +00:00
Fraser Cormack	6718fda6ad	[CodeGen] Fix issues with subvector intrinsic index types This patch addresses issues arising from the fact that the index type used for subvector insertion/extraction is inconsistent between the intrinsics and SDNodes. The intrinsic forms require i64 whereas the SDNodes use the type returned by SelectionDAG::getVectorIdxTy. Rather than update the intrinsic definitions to use an overloaded index type, this patch fixes the issue by transforming the index to the correct type as required. Any loss of index bits going from i64 to a smaller type is unexpected, and will be caught by an assertion in SelectionDAG::getVectorIdxConstant. The patch also updates the documentation for INSERT_SUBVECTOR and adds an assertion to its creation to bring it in line with EXTRACT_SUBVECTOR. This necessitated changes to AArch64 which was using i64 for EXTRACT_SUBVECTOR but i32 for INSERT_SUBVECTOR. Only one test changed its codegen after updating the backend accordingly. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D97459	2021-03-01 10:28:21 +00:00
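The "transform the index to the correct type, and assert on lost bits" approach can be sketched in plain C++. `narrowVectorIdx` is a hypothetical helper, not an LLVM API; it only mirrors the idea of the assertion the message attributes to SelectionDAG::getVectorIdxConstant.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: narrow an i64 subvector index to the width of the
// target's vector-index type. Losing index bits is unexpected, so it is
// treated as a programming error rather than silently truncated.
template <typename IdxT>
IdxT narrowVectorIdx(uint64_t idx) {
  static_assert(std::numeric_limits<IdxT>::is_integer &&
                    !std::numeric_limits<IdxT>::is_signed,
                "index type must be an unsigned integer type");
  assert(idx <= std::numeric_limits<IdxT>::max() && "index bits would be lost");
  return static_cast<IdxT>(idx);
}
```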
David Green	91ebc4e864	[ARM] VMOVN undef folding If we insert undef using a VMOVN, we can just use the original value in three out of the four possible combinations. Using VMOVT into a undef vector will still require the lanes to be moved, but otherwise the non-undef value can be used.	2021-02-28 14:44:45 +00:00
Simon Pilgrim	ab3ea27b6f	[X86][AVX] Reuse existing VBROADCAST(x) for SCALAR_TO_VECTOR(x) Similar to what we already do for BROADCASTs of different vector sizes - if we're going to broadcast it anyway might as well reuse it.	2021-02-28 11:37:27 +00:00
David Green	0fe64812d8	[ARM] VECTOR_REG_CAST undef -> undef Propagate undef through VECTOR_REG_CAST nodes, allowing extra simplification in some patterns.	2021-02-28 11:13:49 +00:00
Craig Topper	993f4d8ffa	[X86] Fix a couple comments that said LHS where they meant RHS. NFC	2021-02-27 17:14:17 -08:00
Wang, Pengfei	42e025f9de	[X86] Disable rematerialization for PTILELOADDV Per the discussion in D97453, we currently disable it because it's not a common scenario and the implementation has some problems. Differential Revision: https://reviews.llvm.org/D97453	2021-02-27 21:08:58 +08:00
Heejin Ahn	aa097ef8d4	[WebAssembly] Fix reverse mapping in WasmEHFuncInfo D97247 added the reverse mapping from unwind destination to their source, but it had a critical bug; sources can be multiple, because multiple BBs can have a single BB as their unwind destination. This changes `WasmEHFuncInfo::getUnwindSrc` to `getUnwindSrcs` and makes it return a vector rather than a single BB. It does not return the const reference to the existing vector but creates a new vector because `WasmEHFuncInfo` stores not `BasicBlock` or `MachineBasicBlock` but `PointerUnion` of them. Also I hoped to unify those methods for `BasicBlock` and `MachineBasicBlock` into one using templates to reduce duplication, but failed because various usages require `BasicBlock*` to be `const` but it's hard to make it `const` for `MachineBasicBlock` usages. Fixes https://github.com/emscripten-core/emscripten/issues/13514. (More precisely, fixes https://github.com/emscripten-core/emscripten/issues/13514#issuecomment-784708744) Reviewed By: dschuff, tlively Differential Revision: https://reviews.llvm.org/D97583	2021-02-26 17:12:10 -08:00
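The bug is that a single unwind destination can have several unwind sources, so a reverse map keyed by destination has to hold a list. Here is a minimal sketch of that shape, assuming blocks are plain int ids. `buildUnwindSrcs` and `getUnwindSrcs` are hypothetical; the real `WasmEHFuncInfo` stores `PointerUnion`s of `BasicBlock`/`MachineBasicBlock`.

```cpp
#include <map>
#include <vector>

// Hypothetical model: basic blocks are plain int ids.
// Forward mapping: EH pad -> its next unwind destination (one per pad).
using UnwindDestMap = std::map<int, int>;
// Reverse mapping: unwind destination -> every EH pad that unwinds to it.
using UnwindSrcMap = std::map<int, std::vector<int>>;

UnwindSrcMap buildUnwindSrcs(const UnwindDestMap &dests) {
  UnwindSrcMap srcs;
  for (const auto &entry : dests)
    srcs[entry.second].push_back(entry.first);  // a dest may gain many srcs
  return srcs;
}

// Returns a fresh (possibly empty) vector rather than a const reference.
std::vector<int> getUnwindSrcs(const UnwindSrcMap &srcs, int dest) {
  auto it = srcs.find(dest);
  if (it == srcs.end())
    return {};
  return it->second;
}
```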
Fangrui Song	47c5576d7d	ELF: Create unique SHF_GNU_RETAIN sections for llvm.used global objects If a global object is listed in `@llvm.used`, place it in a unique section with the `SHF_GNU_RETAIN` flag. The section is a GC root under `ld --gc-sections` with LLD>=13 or GNU ld>=2.36. For front ends which do not expect to see multiple sections of the same name, consider emitting `@llvm.compiler.used` instead of `@llvm.used`. SHF_GNU_RETAIN is restricted to ELFOSABI_GNU and ELFOSABI_FREEBSD in binutils. We don't do the restriction - see the rationale in D95749. The integrated assembler has supported SHF_GNU_RETAIN since D95730. GNU as>=2.36 supports section flag 'R'. We don't need to worry about GNU ld support because older GNU ld just ignores the unknown SHF_GNU_RETAIN. With this change, `__attribute__((retain))` functions/variables emitted by clang will get the SHF_GNU_RETAIN flag. Differential Revision: https://reviews.llvm.org/D97448	2021-02-26 16:38:44 -08:00
Jessica Paquette	f5d5a7d7ea	[AArch64][GlobalISel] Import FMOV patterns rather than manually selecting it There are existing patterns for FMOVHi, FMOVSi, and FMOVDi in AArch64InstrFormats.td. Importing these allows us to remove the manual selection code for FMOV. It also allows us to select FMOVHi for non-zero constants when we have full fp-16 support. Refactor some of the code in AArch64InstrFormats.td so that we can create equivalent custom renderers in GlobalISel. Differential Revision: https://reviews.llvm.org/D97511	2021-02-26 16:27:39 -08:00
Matt Arsenault	81b2c23b77	AMDGPU: Use kill instruction to hint soft clause live ranges Previously we would use a bundle to hint the register allocator to not overwrite the pointers in a sequence of loads to avoid breaking soft clauses. This bundling was based on a fuzzy register pressure heuristic, so we could not guarantee using more registers than are really available. This would result in register allocator failing on unsatisfiable bundles. Use a kill to artificially extend the live ranges, so we can always succeed at register allocation even if it means extra spills in the worst case. This seems to capture most of the benefit of the bundle while avoiding most of the risk presented by the bundle. However the lit tests do show a handful of regressions. In some cases with sequences of volatile loads, unused load components end up getting reallocated to the next load which forces a wait between. There are also a few small scheduling regressions where a hazard used to be avoided, and one spill torture test which for some reason nearly doubles the stack usage. There is also a bit of noise from leftover kills (it may make sense for post-RA pseudos to strip all of these out).	2021-02-26 18:26:40 -05:00
Dan Gohman	c62dabc3f5	[WebAssembly] Avoid `bit_cast` when printing f32 and f64 immediates Use `APInt` to convert a 32-bit or 64-bit immediate to an `APFloat` rather than `bit_cast` to a `float` or `double` to avoid going through host floating-point and potentially changing the bit pattern of NaNs. Differential Revision: https://reviews.llvm.org/D97490	2021-02-26 14:19:02 -08:00
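This sketch shows why working on the raw bits is safer. It formats a non-finite f32 immediate using integer operations only, so no host floating-point conversion is involved. The output mimics the WebAssembly text format's `nan:0x…` payload notation. The commit itself uses `APInt`/`APFloat`; plain `uint32_t` arithmetic is substituted here to keep the example self-contained, and `formatNonFiniteF32` is a hypothetical name.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Format a non-finite f32 straight from its bit pattern, so a signaling NaN
// payload is never quieted or rewritten by host floating-point code.
// Returns an empty string for finite values (out of scope for this sketch).
std::string formatNonFiniteF32(uint32_t bits) {
  const bool negative = (bits >> 31) != 0;
  const uint32_t exponent = (bits >> 23) & 0xFFu;
  const uint32_t mantissa = bits & 0x7FFFFFu;
  if (exponent != 0xFFu)
    return "";
  std::ostringstream os;
  if (negative)
    os << '-';
  if (mantissa == 0)
    os << "inf";
  else
    os << "nan:0x" << std::hex << mantissa;  // payload kept bit-exact
  return os.str();
}
```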
Heejin Ahn	d8b3dc5a68	[WebAssembly] Fix remapping branch dests in fixCatchUnwindMismatches This is a case D97178 tried to solve but missed. D97178 could not handle the case when multiple consecutive delegates are generated: - Before: ``` block br (a) try catch end_try end_block <- (a) ``` - After ``` block br (a) try ... try try catch end_try <- (a) delegate delegate end_block <- (b) ``` (The `br` should point to (b) now) D97178 assumed `end_block` exists two BBs later than `end_try`, because it assumed the order as `end_try` BB -> `delegate` BB -> `end_block` BB. But it turned out there can be multiple `delegate`s in between. This patch changes the logic so we just search from `end_try` BB until we find `end_block`. Fixes https://github.com/emscripten-core/emscripten/issues/13515. (More precisely, fixes https://github.com/emscripten-core/emscripten/issues/13515#issuecomment-784711318.) Reviewed By: dschuff, tlively Differential Revision: https://reviews.llvm.org/D97569	2021-02-26 13:38:13 -08:00
Stanislav Mekhanoshin	799c50fe93	[AMDGPU] Avoid second rescheduling for some regions If a region was not constrained by high register pressure and was not rescheduled without clustering, we can skip rescheduling it in the ClusteredLowOccupancyReschedule stage. This improves scheduling speed by 25% on some kernels. Differential Revision: https://reviews.llvm.org/D97506	2021-02-26 12:29:37 -08:00
Stanislav Mekhanoshin	635993f07b	[AMDGPU] Skip unclustered rescheduling w/o ld/st We attempt rescheduling without load/store clustering if occupancy limits were not met with clustering. Skip this for regions that do not have any loads or stores at all. In a set of kernels I am experimenting with, this improves scheduling time by ~30%. Differential Revision: https://reviews.llvm.org/D97342	2021-02-26 12:29:03 -08:00
Anirudh Prasad	bcc1aba6c4	[SystemZ] Introducing assembler dialects for the Z backend - This patch introduces a different assembler dialect ("hlasm") for z/OS. The default dialect has now been given the "att" dialect name. For this, appropriate changes have been added to SystemZ.td. - This patch also makes a few changes to SystemZInstrFormats.td which restrict a few condition code mnemonics to just the "att" dialect variant (he, le, lh, nhe, nle, nlh). These extended condition code mnemonics are not available in HLASM. - A new private function has been introduced in SystemZAsmParser.cpp to return the assembler dialect set in SystemZMCAsmInfo.cpp. The reason we couldn't/haven't explicitly queried the overridden getAssemblerDialect function from AsmParser is outlined in this thread here. This returned dialect is directly passed on to the relevant matcher functions, which take in a variantID, so that the matcher functions can appropriately choose an instruction based on the variant. Reviewed By: uweigand Differential Revision: https://reviews.llvm.org/D94250	2021-02-26 15:14:38 -05:00
James Y Knight	6de6455752	Use getAlign() on atomicrmw/cmpxchg instructions, now that it's available. These locations were missed as part of adding alignment to the instructions, and were still making their own alignment assumptions.	2021-02-26 15:06:15 -05:00
Craig Topper	b183cbfacd	[RISCV] Call SelectBaseAddr on the base pointer in the custom isel for vector loads and stores. This will allow FrameIndex as the base address instead of emitting a separate ADDI from isel. eliminateFrameIndex will likely turn it back into an ADDI, but this makes things consistent with the SDPatterns and VLPatterns. I only tested one case for simplicity. I can test more if reviewers want. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D97221	2021-02-26 11:38:23 -08:00
Jay Foad	dc2259537a	[AMDGPU] Add selection pattern for v_xnor_b32 This allows GlobalISel to use this instruction where available. I assume SelectionDAG always selects s_xnor_b32 so it isn't affected by this change. Differential Revision: https://reviews.llvm.org/D97560	2021-02-26 16:41:47 +00:00
Simon Pilgrim	ed1f45bce9	[X86][AVX] SimplifyDemandedBitsForTargetNode - add basic X86ISD::VBROADCAST handling. Simplify through to the scalar/vector source operand.	2021-02-26 16:13:14 +00:00
Jay Foad	3ad5216ed8	[AMDGPU] Better codegen for i64 bitreverse Differential Revision: https://reviews.llvm.org/D97547	2021-02-26 15:51:36 +00:00
Wang, Pengfei	ad9091c5fa	[X86] Allow PTILEZEROV and PTILELOADDV to be rematerializable Spilling and reloading AMX registers are expensive. We allow PTILEZEROV and PTILELOADDV to be rematerializable to avoid the register spilling. Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D97453	2021-02-26 21:55:59 +08:00
Simon Pilgrim	7ac4c956af	[X86] Remove unnecessary custom lowering of vXi1 SADDSAT/SSUBSAT/UADDSAT/USUBSAT As discussed on D97478. The removal of the custom tag causes some changes in the add/sub-overflow expansion as it no longer expands to sat-arith codegen.	2021-02-26 12:10:23 +00:00
Simon Pilgrim	aefe8f2f6c	[DAG] Fold vXi1 multiplies -> and This allows us to remove X86 custom lowering of vXi1 MUL, which helps simplify a load of mask math. Mentioned in D97478 post review.	2021-02-26 11:46:12 +00:00
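Why the fold is sound: i1 lanes wrap modulo 2, so a lane product is 1 only when both inputs are 1, which is exactly AND. Here is a minimal model, assuming a 16-lane i1 vector is packed into a `uint16_t` with one bit per lane (`mulI1Lanes` is a hypothetical name):

```cpp
#include <cstdint>

// Multiply two 16-lane i1 vectors lane by lane, wrapping each lane
// modulo 2. The result should always equal a & b.
uint16_t mulI1Lanes(uint16_t a, uint16_t b) {
  uint16_t result = 0;
  for (int lane = 0; lane < 16; ++lane) {
    uint32_t x = (a >> lane) & 1u;
    uint32_t y = (b >> lane) & 1u;
    result |= static_cast<uint16_t>(((x * y) & 1u) << lane);
  }
  return result;
}
```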
Simon Pilgrim	40b8b4a466	[X86] Remove unnecessary custom lowering of v16i1/v32i1 ADD/SUB These were missed in D97478	2021-02-26 11:46:11 +00:00
Fraser Cormack	37014db013	[RISCV] Use existing method for the LMUL1 type. NFCI. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D97467	2021-02-26 09:44:05 +00:00
Bill Wendling	a9f9ceb35f	[X86] Use correct padding when in 16-bit mode In 16-bit mode, some of the nop patterns used in 32-bit mode can end up mangling other instructions. For instance, an aligned "movz" instruction may have the 0x66 and 0x67 prefixes omitted, because the nop that's used messes things up. xorl %ebx, %ebx .p2align 4, 0x90 movzbl (%esi,%ebx), %ecx Use instead nop patterns we know 16-bit mode can handle. Differential Revision: https://reviews.llvm.org/D97268	2021-02-25 20:05:45 -08:00
Craig Topper	d7fca3f0bf	[RISCV] Support fixed vector extract_element for FP types.	2021-02-25 16:30:28 -08:00
Yonghong Song	6d102f15a3	BPF: Add LLVMTransformUtils in CMakefile LINK_COMPONENTS Commit `1959ead525` ("BPF: Implement TTI.getCmpSelInstrCost() properly") introduced a dependency on LLVMTransformUtils library. Let us encode this dependency explicitly in CMakefile to avoid build error.	2021-02-25 15:43:25 -08:00
Yonghong Song	1959ead525	BPF: Implement TTI.getCmpSelInstrCost() properly The Select insn in BPF is expensive as the BPF backend needs to resolve it with conditionals. This patch sets getCmpSelInstrCost() to SCEVCheapExpansionBudget for Select insns to prevent some Select insn related optimizations. This change was motivated by the bcc code review for https://github.com/iovisor/bcc/pull/3270, where IndVarSimplifyPass eventually caused generating the following asm code: ; for (i = 0; (i < VIRTIO_MAX_SGS) && (i < num); i++) { 14: 16 05 40 00 00 00 00 00 if w5 == 0 goto +64 <LBB0_6> 15: bc 51 00 00 00 00 00 00 w1 = w5 16: 04 01 00 00 ff ff ff ff w1 += -1 17: 67 05 00 00 20 00 00 00 r5 <<= 32 18: 77 05 00 00 20 00 00 00 r5 >>= 32 19: a6 01 01 00 05 00 00 00 if w1 < 5 goto +1 <LBB0_4> 20: b7 05 00 00 06 00 00 00 r5 = 6 00000000000000a8 <LBB0_4>: 21: b7 02 00 00 00 00 00 00 r2 = 0 22: b7 01 00 00 00 00 00 00 r1 = 0 ; for (i = 0; (i < VIRTIO_MAX_SGS) && (i < num); i++) { 23: 7b 1a e0 ff 00 00 00 00 *(u64 *)(r10 - 32) = r1 24: 7b 5a c0 ff 00 00 00 00 *(u64 *)(r10 - 64) = r5 Note that insn #15 has w1 = w5 and w1 is refined later but r5(w5) is eventually saved on stack at insn #24 for later use. This causes later verifier failures. With this change, IndVarSimplifyPass won't do the above transformation any more. Differential Revision: https://reviews.llvm.org/D97479	2021-02-25 14:48:53 -08:00
Craig Topper	ceaedfb5fc	[X86] Remove custom lowering of vXi1 ADD/SUB now that they are canonicalized to XOR in getNode. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D97478	2021-02-25 08:52:41 -08:00
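The XOR canonicalization holds because i1 lanes wrap modulo 2: 1 + 1 == 0 and 0 - 1 == 1, so both ADD and SUB reduce to XOR. Here is a minimal model, assuming a 16-lane i1 vector packed into a `uint16_t` with one bit per lane (`addI1Lanes` and `subI1Lanes` are hypothetical names):

```cpp
#include <cstdint>

// Lane-wise i1 add: each lane wraps modulo 2.
uint16_t addI1Lanes(uint16_t a, uint16_t b) {
  uint16_t r = 0;
  for (int lane = 0; lane < 16; ++lane) {
    uint32_t x = (a >> lane) & 1u, y = (b >> lane) & 1u;
    r |= static_cast<uint16_t>(((x + y) & 1u) << lane);
  }
  return r;
}

// Lane-wise i1 sub: unsigned wrap-around, then keep only the low bit.
uint16_t subI1Lanes(uint16_t a, uint16_t b) {
  uint16_t r = 0;
  for (int lane = 0; lane < 16; ++lane) {
    uint32_t x = (a >> lane) & 1u, y = (b >> lane) & 1u;
    r |= static_cast<uint16_t>(((x - y) & 1u) << lane);
  }
  return r;
}
```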
Craig Topper	95c6824995	[RISCV] Teach CleanupVSETVLI to remove 'vsetvli zero, zero, vtype' when the vtype matches the previous vsetvli or vsetivli Reviewed By: frasercrmck, arcbbb Differential Revision: https://reviews.llvm.org/D97408	2021-02-25 07:51:19 -08:00
Craig Topper	25c6b7ddd2	[RISCV] Add isel pattern to match X > -1 to bgez. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D97262	2021-02-25 07:42:22 -08:00
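For signed integers, x > -1 holds exactly when x >= 0. RISC-V branches compare two registers, so testing against zero can use x0 (`bgez rs, offset` is the pseudo-instruction for `bge rs, x0, offset`) instead of first materializing -1. A quick check of the identity in C++ (the function names are illustrative only):

```cpp
#include <cstdint>

bool greaterThanMinusOne(int32_t x) { return x > -1; }
bool nonNegative(int32_t x) { return x >= 0; }

// Exhaustive check over every 16-bit value, sign-extended to 32 bits.
bool identityHoldsForInt16() {
  for (int32_t v = INT16_MIN; v <= INT16_MAX; ++v)
    if (greaterThanMinusOne(v) != nonNegative(v))
      return false;
  return true;
}
```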
Fraser Cormack	0ad86f879f	[RISCV] Update RVV ISA section-header comments. NFC. Some of the section headers had become stale with the transition from RVV specification version 0.9 to 0.10. This patch brings them up to date.	2021-02-25 14:15:28 +00:00
Fraser Cormack	02f435db0b	[RISCV] Support fixed-length vector i2fp/fp2i conversions This patch extends the support for scalable-vector int->fp and fp->int conversions by additionally handling fixed-length vectors. The existing scalable-vector lowering re-expresses widening/narrowing by x4+ conversions as standard nodes. The fixed-length vector support slots in at "the end" of this process by lowering the now equally-sized and widening/narrowing by x2 nodes to our custom VL versions. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D97374	2021-02-25 13:47:58 +00:00
Fraser Cormack	9620ce90d7	[RISCV] Support fixed-length vector FP_ROUND & FP_EXTEND This patch extends the support for vector FP_ROUND and FP_EXTEND by including support for fixed-length vector types. Since fixed-length vectors use "VL" nodes and scalable vectors can use the standard nodes, there is slightly more to do in the fixed-length case. A helper function was introduced to try and reduce the divergent paths. It is expected that this function will similarly come in useful for lowering the int-to-fp and fp-to-int operations for fixed-length vectors. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D97301	2021-02-25 12:16:06 +00:00
Fraser Cormack	84413e1947	[RISCV] Support fixed-length vector truncates This patch extends support for our custom-lowering of scalable-vector truncates to include those of fixed-length vectors. It does this by co-opting the custom RISCVISD::TRUNCATE_VECTOR node and adding mask and VL operands. This avoids unnecessary duplication of patterns and inflation of the ISel table. Some truncates go through CONCAT_VECTORS which currently isn't efficiently handled, as it goes through the stack. This can be improved upon in the future. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D97202	2021-02-25 12:11:34 +00:00
Fraser Cormack	3bc5ed3875	[RISCV] Support fixed-length vector sign/zero extension This patch adds support for the custom lowering of sign- and zero-extension of fixed-length vector types. It does so through custom nodes. Since the source and destination types are (necessarily) of different sizes, it is possible that the source type is legal whilst the larger destination type isn't. In this case the legalization makes heavy use of EXTRACT_SUBVECTOR. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D97194	2021-02-25 12:05:17 +00:00
Fraser Cormack	821f8bb29a	[RISCV] Unify scalable- and fixed-vector EXTRACT_SUBVECTOR lowering This patch unifies the two disparate paths for lowering EXTRACT_SUBVECTOR operations under one roof. Consequently, with this patch it is possible to support any fixed-length subvector extraction, not just "cast-like" ones. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D97192	2021-02-25 11:46:57 +00:00
Simon Pilgrim	8b82669d56	[X86][SSE] Move unaryshuffle(xor(x,-1)) -> xor(unaryshuffle(x),-1) fold into helper. NFCI. We should be able to extend this "canonicalizeShuffleWithBinOps" to handle more generic binop cases where either/both operands can be cheaply shuffled.	2021-02-25 10:56:23 +00:00
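The fold works because a unary shuffle only moves lanes, and xor with -1 applies the same operation to every lane, so the two commute. This is a sketch on `std::array` with hypothetical helper names, not the X86 DAG combine itself:

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

template <std::size_t N>
using Vec = std::array<uint32_t, N>;

// Unary shuffle: result lane i takes source lane mask[i].
template <std::size_t N>
Vec<N> unaryShuffle(const Vec<N> &v, const std::array<std::size_t, N> &mask) {
  Vec<N> r{};
  for (std::size_t i = 0; i < N; ++i)
    r[i] = v[mask[i]];
  return r;
}

// xor(x, -1), applied lane by lane.
template <std::size_t N>
Vec<N> xorAllOnes(const Vec<N> &v) {
  Vec<N> r{};
  for (std::size_t i = 0; i < N; ++i)
    r[i] = v[i] ^ 0xFFFFFFFFu;
  return r;
}
```

This relies on -1 being a splat constant. With a non-splat constant operand, the constant would also have to be shuffled, which is the kind of generalization the message anticipates.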
Tim Northover	201ada80ee	AArch64: relax address-space assertion in FastISel. Some people are using alternative address spaces to track GC data, but otherwise they behave exactly the same. This is the only place in the backend we even try to care about it so it's really not achieving anything.	2021-02-25 10:15:55 +00:00
Stelios Ioannou	30cb9c03b5	[AArch64] Add abs intrinsic costs This patch adds cost-modelling for abs vector intrinsic. Change-Id: I89007971bfb15f5b4a02a2eadfd43018e9a73976	2021-02-25 09:31:52 +00:00
Craig Topper	159f78fc2f	[RISCV] Reuse existing SDLoc and XLenVT in the switch in RISCVISelDAGToDAG::Select. NFC A SDLoc and XLenVT were already created above the switch.	2021-02-24 21:39:00 -08:00
Liu, Chen3	4bc7c8631a	[X86] Support amx-bf16 intrinsic. Adding support for intrinsics of AMX-BF16. This patch also fixes a bug where AMX-INT8 instructions would be selected with the wrong predicate. Differential Revision: https://reviews.llvm.org/D97358	2021-02-25 09:06:48 +08:00
Craig Topper	efcdd598b7	[RISCV] Teach VSETVLI inserter to use VSETIVLI when possible. We always create the VL operand using a register, but if we can determine that it came from an ADDI X0, imm with a sufficiently small immediate, we can use VSETIVLI. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D97332	2021-02-24 16:07:33 -08:00
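This sketches the check described above. Per the RVV specification, `vsetivli` encodes the AVL as a 5-bit unsigned immediate (0 to 31). The `VLDef` struct and `tryGetVsetivliAVL` are hypothetical stand-ins for inspecting the VL operand's defining instruction; they are not the pass's real code.

```cpp
#include <cstdint>

// Hypothetical model of the instruction that defines the VL operand.
struct VLDef {
  bool isAddiFromX0;  // true if defined by "ADDI rd, X0, imm"
  int64_t imm;        // the ADDI immediate
};

// Returns true and sets `avl` when the immediate form (vsetivli) can be
// used; otherwise the register form (vsetvli) must be kept.
bool tryGetVsetivliAVL(const VLDef &def, uint32_t &avl) {
  if (!def.isAddiFromX0 || def.imm < 0 || def.imm > 31)
    return false;  // not a known constant, or it does not fit in uimm5
  avl = static_cast<uint32_t>(def.imm);
  return true;
}
```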
Craig Topper	9bde29629d	[RISCV] Use a ComplexPattern for zexti32 to match sexti32. We just started using a ComplexPattern for sexti32. This updates zexti32 to match. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D97231	2021-02-24 16:06:29 -08:00
Stefan Agner	a921aaf789	[MC][ARM] make Thumb function also if type attribute is set Make sure to set the bottom bit of the symbol even when the type attribute of a label is set after the label. GNU as sets the thumb state according to the thumb state of the label. If a .type directive is placed after the label, set the symbol's thumb state according to the thumb state of the .type directive. This matches GNU as in most cases. From: Stefan Agner <stefan@agner.ch> This fixes: https://bugs.llvm.org/show_bug.cgi?id=44860 https://github.com/ClangBuiltLinux/linux/issues/866 Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D74927	2021-02-24 14:08:56 -08:00
Michael Liao	0d4e12e3c1	[amdgpu] Atomic should be source of divergence. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D97392	2021-02-24 15:27:47 -05:00
Matt Arsenault	589223e044	AMDGPU: Remove special case in shouldCoalesce Unaligned registers are now constrained with classes, rather than specially reserving a subset of the whole class.	2021-02-24 14:49:44 -05:00
Matt Arsenault	78b6d73a93	AMDGPU: Add even aligned VGPR/AGPR register classes gfx90a operations require even aligned registers, but this was previously achieved by reserving registers inside the full class. Ideally this would be captured in the static instruction definitions for the operands, and we would have different instructions per subtarget. The hackiest part of this is we need to manually reassign AGPR register classes after instruction selection (we get away without this for VGPRs since those types are actually registered for legal types).	2021-02-24 14:49:37 -05:00
Jessica Paquette	e339bba637	[AArch64][GlobalISel] Fix manual selection for v4s16 and v8s8 G_DUP The manual G_DUP selection code would produce DUPv16i8 for v8s8s and DUPv8i16 for v4s16. This adds the missing cases to the manual selection code, and makes it return false when there is an unexpected size. Update select-dup.mir to reflect the change. Differential Revision: https://reviews.llvm.org/D97240	2021-02-24 10:23:06 -08:00
Craig Topper	086670d367	[RISCV] Support fixed vector extract element. Use VL=1 for scalable vector extract element. I've changed to use VL=1 for slidedown and shifts to avoid extra element processing that we don't need. The i64 fixed vector handling on i32 isn't great if the vector type isn't legal due to an ordering issue in type legalization. If the vector type isn't legal, we fall back to default legalization which will bitcast the vector to vXi32 and use two independent extracts. Doing better will require handling several different cases by manually inserting insert_subvector/extract_subvector to adjust the type to a legal vector before emitting custom nodes. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D97319	2021-02-24 10:17:00 -08:00
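The default fallback described for i64 elements on RV32 (bitcast to vXi32, then two independent extracts) can be modeled as follows. This assumes RISC-V's little-endian layout, where i64 element i occupies i32 lanes 2i (low half) and 2i + 1 (high half). The helper names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Model of bitcasting a vector of i64 to twice as many i32 lanes
// (little-endian: low half first).
std::vector<uint32_t> bitcastToI32(const std::vector<uint64_t> &v) {
  std::vector<uint32_t> r;
  r.reserve(v.size() * 2);
  for (uint64_t e : v) {
    r.push_back(static_cast<uint32_t>(e));        // low half
    r.push_back(static_cast<uint32_t>(e >> 32));  // high half
  }
  return r;
}

// Two independent i32 extracts, recombined into the i64 result.
uint64_t extractI64ViaI32(const std::vector<uint64_t> &v, std::size_t idx) {
  std::vector<uint32_t> halves = bitcastToI32(v);
  uint64_t lo = halves[2 * idx];
  uint64_t hi = halves[2 * idx + 1];
  return (hi << 32) | lo;
}
```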
Nick Desaulniers	404843a94d	[MC][ARM] add .w suffixes for BL (T1) and DBG F1.2 Standard assembler syntax fields describes .w and .n suffixes for wide and narrow encodings. arch/arm/probes/kprobes/test-thumb.c tests installing kprobes for certain instructions using inline asm. There's a few instructions we fail to assemble due to missing .w t2InstAliases. Adds .w suffixes for: * bl (F5.1.25 BL, BLX (immediate) T1) * dbg (F5.1.42 DBG T1) Reviewed By: DavidSpickett Differential Revision: https://reviews.llvm.org/D97236	2021-02-24 09:58:08 -08:00
Amara Emerson	0146d20631	[AArch64] Do not fold SP adjustments into pre-increment addr modes if it overflows the redzone. Instead of outright disabling this completely with the noredzone attribute, we only avoid doing the optimization if there are memory operations between the adjustment and the load/store that the adjustment would be folded into. This avoids the case of something like a stack cookie being corrupted if an exception happens before the pre-increment to the SP occurs. This also prevents the folding happening if we have a redzone, but the offset being folded is above the redzone amount (128 bytes in this case). rdar://73269336 Differential Revision: https://reviews.llvm.org/D95179	2021-02-24 09:55:48 -08:00
Jay Foad	aab709f090	[AMDGPU] Add more PAL metadata register names Add all the registers that are currently used by LLPC: https://github.com/GPUOpen-Drivers/llpc This only affects disassembly of PAL metadata generated by LLPC and similar frontends. Differential Revision: https://reviews.llvm.org/D95619	2021-02-24 13:37:05 +00:00
Jay Foad	67f0620831	[AMDGPU] Update s_sendmsg messages Update the list of s_sendmsg messages known to the assembler and disassembler and validate the ones that were added or removed in gfx9 and gfx10. Differential Revision: https://reviews.llvm.org/D97295	2021-02-24 13:07:00 +00:00
Florian Hahn	5c74c6be3c	[AArch64] Use CMTST for != 0 vector compares (vnot (CMEQz A)). (CMTST A, A) will only set elements to 0 if the element is 0 in A. Use it for != 0 compares, which currently use (vnot (CMEQz A)). This saves a mvn instruction. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D97303	2021-02-24 09:39:27 +00:00
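Per lane, CMTST sets all ones when (a & b) is nonzero. With both operands equal to A, that means "A != 0", which is the same as NOT(CMEQ A, #0). Here is an exhaustive check over 8-bit lanes, modeling each instruction's per-lane result in plain C++ (function names are illustrative):

```cpp
#include <cstdint>

// CMEQ #0: all ones if the lane is zero.
uint8_t cmeqz(uint8_t a) { return static_cast<uint8_t>(a == 0 ? 0xFF : 0x00); }
// MVN: bitwise NOT.
uint8_t mvn(uint8_t a) { return static_cast<uint8_t>(~a); }
// CMTST: all ones if (a & b) is nonzero.
uint8_t cmtst(uint8_t a, uint8_t b) {
  return static_cast<uint8_t>((a & b) != 0 ? 0xFF : 0x00);
}

// Check CMTST A, A == MVN (CMEQ A, #0) for every possible lane value.
bool cmtstMatchesNotCmeqz() {
  for (int v = 0; v <= 0xFF; ++v) {
    uint8_t a = static_cast<uint8_t>(v);
    if (cmtst(a, a) != mvn(cmeqz(a)))
      return false;
  }
  return true;
}
```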
David Green	03892a27d6	[ARM] Expand the range of allowed post-incs in load/store optimizer Currently the load/store optimizer will only fold in increments of the same size as the load/store. This patch expands that to any legal immediate for the post-inc instruction. This is a recommit of `3b34b06fc5` with correctness fixes and extra tests. Differential Revision: https://reviews.llvm.org/D95885	2021-02-24 08:46:15 +00:00
Amara Emerson	eb55203e00	[AArch64][GlobalISel][PostSelectOpt] Constrain reg operands after mutating instructions. The non-flag setting variants of instructions may have different regclass requirements. If so, we need to constrain them. Differential Revision: https://reviews.llvm.org/D97343	2021-02-23 19:32:18 -08:00
Jessica Paquette	daf7d7f0dc	[AArch64][GlobalISel] Correct function evaluation order in applyINS The order in which the nested calls to Builder.buildWhatever are evaluated differs between GCC and Clang. This caused a bot failure because the MIR in the testcase was coming out in a different order than expected. Rather than using nested calls, pull them out in order to fix the order of evaluation.	2021-02-23 16:21:11 -08:00
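C++ does not specify the order in which function arguments are evaluated (since C++17 they are indeterminately sequenced), so a call like `combine(r.build("lhs"), r.build("rhs"))` may run either build first depending on the compiler. Hoisting the calls into named locals fixes the order. `Recorder`, `build`, and `combine` are hypothetical stand-ins, not the MachineIRBuilder API.

```cpp
#include <string>
#include <vector>

// Records the order in which "build" calls happen.
struct Recorder {
  std::vector<std::string> log;
  int build(const std::string &name) {
    log.push_back(name);
    return static_cast<int>(log.size());
  }
};

int combine(int a, int b) { return a * 10 + b; }

// Each initialization is a full-expression, so "lhs" is always built first.
int buildSequenced(Recorder &r) {
  int lhs = r.build("lhs");
  int rhs = r.build("rhs");
  return combine(lhs, rhs);
}
```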
Heejin Ahn	ea8c6375e3	[WebAssembly] Fix incorrect grouping and sorting of exceptions This CL is not big but contains changes that span multiple analyses and passes. This description is very long because it tries to explain the basics of what each pass/analysis does and why we need this change on top of that. Please feel free to skip parts that are not necessary for your understanding. --- `WasmEHFuncInfo` contains the mapping of <EH pad, the EH pad's next unwind destination>. The value (unwind dest) here is where an exception should end up when it is not caught by the key (EH pad). We record this info in WasmEHPrepare to fix catch mismatches, because the CFG itself does not have this info. A CFG only contains BBs and predecessor-successor relationships between them, but in `WasmEHFuncInfo` the unwind destination BB is not necessarily a successor of the key EH pad BB. Their relationship can be intuitively explained by this C++ code snippet: ``` try { try { foo(); } catch (int) { // EH pad ... } } catch (...) { // unwind destination } ``` So when `foo()` throws, it goes to `catch (int)` first. But if it is not caught by it, it ends up in the next unwind destination `catch (...)`. This unwind destination is what you see in `catchswitch`'s `unwind label %bb` part. --- `WebAssemblyExceptionInfo` groups exceptions so that they can be sorted continuously together in CFGSort, as we do for loops. What this analysis does is very simple: it creates a single `WebAssemblyException` per EH pad, and all BBs that are dominated by that EH pad are included in this exception. We also identify subexception relationships in this way: if EHPad A dominates EHPad B, EHPad B's exception is a subexception of EHPad A's exception. This simple rule turns out to be incorrect in some cases. In `WasmEHFuncInfo`, if EHPad A's unwind destination is EHPad B, it means semantically EHPad B should not be included in EHPad A's exception, because it does not make sense to rethrow/delegate to an inner scope.
This is what happened in CFGStackify as a result of this: ``` try try catch ... <- %dest_bb is among here! end delegate %dest_bb ``` So this patch adds a phase in `WebAssemblyExceptionInfo::recalculate` to make sure exceptions' unwind destinations are not subexceptions of their unwind sources in `WasmEHFuncInfo`. But this alone does not prevent `dest_bb` in the example above from being sorted within the inner `catch`'s exception, even if its exception is not a subexception of that `catch`'s exception anymore, because of how CFGSort works, which will be explained below. --- CFGSort places BBs within the same `SortRegion` (loop or exception) continuously together so they can be demarcated with `loop`-`end_loop` or `catch`-`end_try` in CFGStackify. `SortRegion` is a wrapper for one of `MachineLoop` or `WebAssemblyException`. `SortRegionInfo` already does some complicated things because there are discrepancies between those two data structures. `WebAssemblyException` is what we control, and it is defined as an EH pad as its header and BBs dominated by the header as its BBs (with a newly added exception of unwind destinations explained in the previous paragraph). But `MachineLoop` is an LLVM data structure and uses the standard loop detection algorithm. So by the algorithm, a loop contains BBs that are 1. dominated by the loop header and 2. have a path back to its header. Because of the second condition, many BBs that are dominated by the loop header are not included in the loop. So BBs that contain `return` or branches to outside of the loop are not technically included in `MachineLoop`, but they can be sorted together with the loop with no problem. Maybe to relax the condition, in CFGSort, when we are in a `SortRegion` we allow sorting of not only BBs that belong to the current innermost region but also BBs that are dominated by the current region header. (This was written this way from the first version written by Dan, when only loops existed.)
But now, we have cases in exceptions where EHPad B is the unwind destination for EHPad A: even if EHPad B is dominated by EHPad A, it should not be included in EHPad A's exception, and should not be sorted within EHPad A. One way to make things work, at least correctly, is to change the `dominates` condition to a `contains` condition for `SortRegion` when sorting BBs, but this will change compilation results for existing non-EH code and I can't be sure it will not degrade performance or code size. I think it will degrade performance because it will force many BBs dominated by a loop, which don't have the path back to the header, to be placed after the loop, and it will likely create more branches and blocks. So this does a little hacky check when adding BBs to the `Preferred` list: (`Preferred` list is a ready list. CFGSort maintains the ready list in two priority queues: `Preferred` and `Ready`. I'm not very sure why, but it was written that way from the beginning. BBs are first added to the `Preferred` list and then some of them are pushed to the `Ready` list, so here we only need to guard the condition for the `Preferred` list.) When adding a BB to the `Preferred` list, we check if that BB is an unwind destination of another BB. To do this, this adds the reverse mapping, `UnwindDestToSrc`, and getter methods to `WasmEHFuncInfo`. And if the BB is an unwind destination, it checks if the current stack of regions (`Entries`) contains its source BB by traversing the stack backwards. If we find its unwind source in there, we add the BB to its `Deferred` list, to make sure that unwind destination BB is added to the `Preferred` list only after that region with the unwind source BB is sorted and popped from the stack. --- This does not contain a new test that crashes because of this bug, but this fix changes the result for one of the existing test cases. This test case didn't crash because it fortunately didn't contain `delegate` to the incorrectly placed unwind destination BB.
Fixes https://github.com/emscripten-core/emscripten/issues/13514. Reviewed By: dschuff, tlively Differential Revision: https://reviews.llvm.org/D97247	2021-02-23 14:54:55 -08:00
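The `Preferred`-list guard described in the CFGSort commit above can be sketched as a small Python model. This is a hypothetical illustration under stated assumptions, not the LLVM implementation: `SortRegion`, `add_to_preferred` and the dict standing in for `UnwindDestToSrc` are made-up names, a region is reduced to the set of BBs it contains, and the reverse mapping is assumed to map each unwind destination to a set of sources.

```python
from dataclasses import dataclass, field


@dataclass
class SortRegion:
    """Hypothetical stand-in for a loop or exception being sorted."""
    header: str
    blocks: set
    deferred: list = field(default_factory=list)


def add_to_preferred(bb, entries, unwind_dest_to_src, preferred):
    """Add bb to the Preferred list, or defer it if one of its unwind
    sources lives in a region that is still on the stack.

    entries: stack of open regions, outermost first (like CFGSort's Entries).
    unwind_dest_to_src: hypothetical reverse map, unwind dest -> set of sources.
    Returns True if bb was added to preferred, False if it was deferred.
    """
    sources = unwind_dest_to_src.get(bb, set())
    # Traverse the stack backwards, i.e. from the innermost region outwards.
    for region in reversed(entries):
        if sources & region.blocks:
            # Re-queue bb only once this region is sorted and popped.
            region.deferred.append(bb)
            return False
    preferred.append(bb)
    return True


# Hypothetical CFG: "B" is dominated by EH pad "A" but is the unwind
# destination of "X", which belongs to A's exception.
outer = SortRegion("L", {"L", "A", "X", "B", "Y"})
inner = SortRegion("A", {"A", "X"})
preferred = []
add_to_preferred("B", [outer, inner], {"B": {"X"}}, preferred)
add_to_preferred("Y", [outer, inner], {"B": {"X"}}, preferred)
```

Deferring to the innermost region that contains the source is enough here: once that region is popped, the destination can still be sorted inside any enclosing region it legitimately belongs to.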
David Green	f51b3de4e8	[AArch64] Introduce UDOT/SDOT DAG nodes This is used to lower UDOT/SDOT instructions, as opposed to relying on the intrinsic. Subsequent optimizations will be able to optimize them more cleanly based on these nodes.	2021-02-23 20:31:01 +00:00
Jessica Paquette	ef1f7f1d7d	Recommit "[AArch64][GlobalISel] Match G_SHUFFLE_VECTOR -> insert elt + extract elt" Attempted fix for the added test failing. https://lab.llvm.org/buildbot/#/builders/104/builds/2355/steps/5/logs/stdio I can't reproduce the failure anywhere, so I'm going to guess that passing a std::function as MatchInfo is sketchy in this context. Switch it to a std::tuple and hope for the best.	2021-02-23 11:55:16 -08:00
Amara Emerson	939b5ce734	[AArch64][GlobalISel] Lower G_USUBSAT and G_UADDSAT for scalars. We have some missing optimization counterparts to LowerXALUO, but it's a start.	2021-02-23 11:54:52 -08:00
Stanislav Mekhanoshin	d1b92c91af	[AMDGPU] Set threshold for regbanks reassign pass This is to limit compile time. I did experiments with some inputs and found that compile time stays reasonable for this pass if we have fewer than 100000 virtual registers and then starts to explode somewhere between 100000 and 150000. Differential Revision: https://reviews.llvm.org/D97218	2021-02-23 10:22:31 -08:00
Nick Desaulniers	1e204ac789	[THUMB2] add .w suffixes for ldr/str (immediate) T4 The Linux kernel when built with CONFIG_THUMB2_KERNEL makes use of these instructions with immediate operands and wide encodings. These are the T4 variants of the following sections from the Arm ARM: F5.1.72 LDR (immediate), F5.1.229 STR (immediate). I wasn't able to represent these simple aliases using t2InstAlias due to the Constraints on the non-suffixed existing instructions, which results in some manual parsing logic needing to be added. F1.2 Standard assembler syntax fields describes the use of the .w (wide) vs .n (narrow) encoding suffix. Link: https://bugs.llvm.org/show_bug.cgi?id=49118 Link: https://github.com/ClangBuiltLinux/linux/issues/1296 Reported-by: Stefan Agner <stefan@agner.ch> Reported-by: Arnd Bergmann <arnd@kernel.org> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed By: DavidSpickett Differential Revision: https://reviews.llvm.org/D96632	2021-02-23 09:25:40 -08:00
Nicolai Hähnle	52bc2e7577	[AMDGPU][SelectionDAG] Don't combine uniform multiplies to MUL_[UI]24 Prefer to keep uniform (non-divergent) multiplies on the scalar ALU when possible. This significantly improves some game cases by eliminating v_readfirstlane instructions when the result feeds into a scalar operation, like the address calculation for a scalar load or store. Since isDivergent is only an approximation of whether a value is in SGPRs, it can potentially regress some situations where a uniform value ends up in a VGPR. These should be rare in real code, although the test changes do contain a number of examples. Most of the test changes are just using s_mul instead of v_mul/mad, which is generally better for both register pressure and latency (at least on GFX10, where sgpr pressure doesn't affect occupancy and vector ALU instructions have significantly longer latency than scalar ALU). Some R600 tests now use MULLO_INT instead of MUL_UINT24. GlobalISel appears to handle more scenarios in the desirable way, although it can also be thrown off and fails to select the 24-bit multiplies in some cases. The alternative solution considered and rejected was to allow selecting MUL_[UI]24 to S_MUL_I32. I've rejected this because the definition of those SD operations is don't-care on the most significant 8 bits, and this fact is used in some combines via SimplifyDemandedBits. Based on a patch by Nicolai Hähnle. Differential Revision: https://reviews.llvm.org/D97063	2021-02-23 15:39:19 +00:00
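To see why selecting `MUL_[UI]24` directly to `S_MUL_I32` would be unsafe, here is a small Python model. It assumes `MUL_U24` reads only the low 24 bits of each operand and returns the low 32 bits of the product. Because the high 8 input bits are don't-care, a combine such as SimplifyDemandedBits may leave arbitrary values there, and a full 32-bit multiply would then pick them up.

```python
MASK24 = (1 << 24) - 1
MASK32 = (1 << 32) - 1


def mul_u24(a, b):
    """Model of MUL_U24: only bits [23:0] of each operand are read."""
    return ((a & MASK24) * (b & MASK24)) & MASK32


def mul_i32(a, b):
    """Model of a full 32-bit multiply (low 32 bits), like S_MUL_I32."""
    return (a * b) & MASK32


# With clean high bits the two agree.
print(hex(mul_u24(5, 3)), hex(mul_i32(5, 3)))  # 0xf 0xf
# If a combine left garbage in bits [31:24], they diverge.
dirty = 0x01000005
print(hex(mul_u24(dirty, 3)), hex(mul_i32(dirty, 3)))  # 0xf 0x300000f
```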
Sjoerd Meijer	e1c3bf6afe	[ARM] do not consider sp as deprecated for ldm/stm Early versions of the ARMv7 reference manuals considered the sp register as a deprecated register for the ldm/stm family of instructions. However, later versions such as ARM DDI 0406C.d added a note to the Appendix: D9.3 Use of the SP as a general-purpose register Most ARM instructions, unlike Thumb instructions, provide exactly the same access to the SP as to R0-R12. This means that it is possible to use the SP as a general-purpose register. Earlier issues of this manual deprecated the use of SP in an ARM instruction, in any way that is deprecated, not permitted, or not possible in the corresponding Thumb instruction. However, user feedback indicates a number of cases where these instructions are useful. Therefore, ARM no longer deprecates these instruction uses. Also Armv8 manuals no longer consider SP as a deprecated register for ldm/stm A32 instructions. Furthermore, GNU as also does not print a deprecation warning when using SP with those instructions. Drop deprecation warning for pop/ldm/push/stm instructions. Patch by: Stefan Agner. Differential Revision: https://reviews.llvm.org/D82692	2021-02-23 13:26:18 +00:00
David Green	dd2dbf7ee2	[TTI] Change getOperandsScalarizationOverhead to take Type args As a followup to D95291, getOperandsScalarizationOverhead was still using a VF as a vector factor if the arguments were scalar, and would assert on certain matrix intrinsics with differently sized vector arguments. This patch removes the VF arg, instead passing the Types through directly. This should allow it to more accurately compute the cost without having to guess at which operands will be vectorized, something difficult with more complex intrinsics. This adjusts one SVE test as it is now calling the wrong intrinsic vs veccall. Without invalid InstructCosts the cost of the scalarized intrinsic is too low. This should get fixed when the cost of scalarization is accounted for with scalable types. Differential Revision: https://reviews.llvm.org/D96287	2021-02-23 13:04:59 +00:00
David Green	bd4b61efbd	[CostModel] Remove VF from IntrinsicCostAttributes getIntrinsicInstrCost takes an IntrinsicCostAttributes holding various parameters of the intrinsic being costed. It can either be called with a scalar intrinsic (RetTy==Scalar, VF==1), with a vector instruction (RetTy==Vector, VF==1) or from the vectorizer with a scalar type and vector width (RetTy==Scalar, VF>1). A RetTy==Vector, VF>1 is considered an error. Both of the vector modes are expected to be treated the same, but because this is confusing many backends end up getting it wrong. Instead of trying to work with those two values separately, this removes the VF parameter, widening the RetTy/ArgTys by VF when called from the vectorizer. This keeps things simpler, but does require some other modifications to keep things consistent. For most backends this looks like an improvement (or they were not using getIntrinsicInstrCost). AMDGPU needed the most changes to keep the code from `c230965ccf` working. ARM removed the fix in `dfac521da1`, webassembly happens to get a fixup for an SLP cost issue and both X86 and AArch64 seem to now be using better costs from the vectorizer. Differential Revision: https://reviews.llvm.org/D95291	2021-02-23 13:03:26 +00:00
Hsiangkai Wang	53c4c2b9f7	[RISCV] vle1.v/vse1.v should be unmasked instructions. vle1.v/vse1.v should be unmasked instructions. The vm encoding is 1 for unmasked instructions. Differential Revision: https://reviews.llvm.org/D97237	2021-02-23 19:59:22 +08:00
Andy Wingo	7dc98adbb0	Revert "[WebAssembly] call_indirect issues table number relocs" This reverts commit `861dbe1a02`. It broke emscripten -- see https://reviews.llvm.org/D90948#2578843.	2021-02-23 11:48:08 +01:00
Fraser Cormack	dd68f3cf28	[RISCV] Support insertion of misaligned subvectors This patch extends the support for RVV INSERT_SUBVECTOR to cover those which don't align to a vector register boundary. Like the support for EXTRACT_SUBVECTOR in D96959, it accomplishes this by extracting the nearest register-sized subvector (a subregister operation), then sliding the vector down with VSLIDEDOWN, inserting the subvector to the first position, and sliding the vector back up again afterwards. Unlike subvector extraction, for vectors that occupy less than a full vector register we must preserve the untouched elements. We do this by lowering to an LMUL=1 INSERT_SUBVECTOR using the above method and lowering that to a VSLIDEUP with a zero offset. This uses a tail-undisturbed policy and so has the effect of "sliding in" the subvector elements while preserving the surrounding ones. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D96972	2021-02-23 10:31:06 +00:00
Liu, Chen3	f8b9035aae	[X86] Support amx-int8 intrinsic. Adding support for intrinsics of TDPBSUD/TDPBUSD/TDPBUUD. Differential Revision: https://reviews.llvm.org/D97259	2021-02-23 17:08:05 +08:00
Kazu Hirata	4ed47858ab	[llvm] Use llvm::drop_begin (NFC)	2021-02-22 20:17:16 -08:00
Jessica Paquette	662402a8b3	Revert "[AArch64][GlobalISel] Match G_SHUFFLE_VECTOR -> insert elt + extract elt" This reverts commit `867e379c0e`. For some reason this is upsetting Linux/Windows bots. Reverting while I try to reproduce.	2021-02-22 17:36:17 -08:00
Cassie Jones	8b10aa67ad	[AArch64][GlobalISel] Make overflow legalization use clampScalar Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D96674	2021-02-22 19:59:36 -05:00
Luo, Yuanke	8f48ddd193	[X86][AMX] Lower tile copy instruction. Since there is no tile copy instruction, we need to store tile register to stack and load from stack to another tile register. We need extra GR to hold the stride, and we need stack slot to hold the tile data register. We would run this pass after copy propagation, so that we don't miss copy optimization. And we would run this pass before prolog/epilog insertion, so that we can allocate stack slot. Differential Revision: https://reviews.llvm.org/D97112	2021-02-23 07:49:42 +08:00
Stanislav Mekhanoshin	bb16efe280	[AMDGPU] Move RPT::getLiveRegs() check under EXPENSIVE_CHECKS This is too expensive even for debug builds. It doubles scheduling time if enabled. Differential Revision: https://reviews.llvm.org/D97232	2021-02-22 15:21:59 -08:00
Craig Topper	3231607ce9	[RISCV] Have sexti32 also recognize AssertZExt from types smaller than i32. An i64 AssertZExt from a type smaller than i32 has at least 33 leading zeros, which means it has at least 33 sign bits. Since we have a couple of patterns that use two sexti32, I've switched to a ComplexPattern so tablegen doesn't have to generate 9 different permutations. As noted in the FIXME, maybe we should just call computeNumSignBits, but we don't have tests that benefit from that yet. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D97130	2021-02-22 14:56:22 -08:00
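The arithmetic behind the 33-sign-bit claim can be checked with a short Python helper (`num_sign_bits` is a hypothetical name, loosely mirroring what `ComputeNumSignBits` reports). A value zero-extended from k < 32 bits into i64 is at most 2^k - 1, so its top 64 - k >= 33 bits are zero, which is exactly the number of sign bits `sexti32` needs. A value zero-extended from i32 itself can have as few as 32.

```python
def num_sign_bits(x, width=64):
    """Count the leading bits of the width-bit pattern x that equal its
    sign bit (the result is always at least 1)."""
    x &= (1 << width) - 1
    sign = (x >> (width - 1)) & 1
    count = 0
    for i in range(width - 1, -1, -1):
        if ((x >> i) & 1) != sign:
            break
        count += 1
    return count


# The largest value an i64 AssertZExt from an iK type can hold is 2**K - 1.
for k in (8, 16, 31, 32):
    print(k, num_sign_bits((1 << k) - 1))
# 8 56
# 16 48
# 31 33
# 32 32
```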
Jessica Paquette	867e379c0e	[AArch64][GlobalISel] Match G_SHUFFLE_VECTOR -> insert elt + extract elt Match a G_SHUFFLE_VECTOR with a mask that allows it to be represented as a G_INSERT_VECTOR_ELT and a G_EXTRACT_VECTOR_ELT. This ports `isINSMask` from AArch64ISelLowering and the portion of `AArch64TargetLowering::LowerVECTOR_SHUFFLE` which handles the equivalent transformation. This provides more opportunities for matching DUP. We don't have all of the necessary combines to actually make DUP out of these yet, but this is better for size than the full TBL expansion for G_SHUFFLE_VECTOR. This is a -0.1% code size improvement on CTMark/Bullet at -Os. IR example: https://godbolt.org/z/sdcevT Differential Revision: https://reviews.llvm.org/D97214	2021-02-22 14:44:09 -08:00
Heejin Ahn	f47a654a39	[WebAssembly] Remap branch dests after fixCatchUnwindMismatches Fixing catch unwind mismatches can sometimes invalidate existing branch destinations. This CL remaps those destinations after placing try-delegates. Fixes https://github.com/emscripten-core/emscripten/issues/13515. Reviewed By: dschuff Differential Revision: https://reviews.llvm.org/D97178	2021-02-22 13:25:58 -08:00
Heejin Ahn	51fb5bf4d6	[WebAssembly] Support WasmEHFuncInfo serialization This adds support for serialization of `WasmEHFuncInfo`, in the form of <Source BB Number, Unwind destination BB number>. To make YAML mapping work, we needed to make a copy of the existing `SrcToUnwindDest` map within `yaml::WebAssemblyMachineFunctionInfo`. It was hard to add EH MIR tests for CFGStackify because `WasmEHFuncInfo` could not be read from test MIR files. This adds the serialization support for that to make EH MIR tests easier. Reviewed By: dschuff Differential Revision: https://reviews.llvm.org/D97174	2021-02-22 13:13:51 -08:00
Heejin Ahn	a08e609d2e	[WebAssembly] Rename methods in WasmEHFuncInfo (NFC) This renames variable and method names in `WasmEHFuncInfo` class to be simpler and clearer. For example, unwind destinations are EH pads by definition so it doesn't necessarily need to be included in every method name. Also I am planning to add the reverse mapping in a later CL, something like `UnwindDestToSrc`, so this renaming will make meanings clearer. Reviewed By: dschuff Differential Revision: https://reviews.llvm.org/D97173	2021-02-22 12:16:11 -08:00
Craig Topper	1cd2a5a7da	[RISCV] Add isel support for bitcasts between fixed vector types. This should fix the issue reported in D96972. I don't have a good test case for this without those changes. Differential Revision: https://reviews.llvm.org/D97082	2021-02-22 12:05:46 -08:00
Jessica Paquette	95d13c01ec	[AArch64][GlobalISel] Emit G_ASSERT_SEXT for SExt parameters in CallLowering Similar to how we emit G_ASSERT_ZEXT when we have CCValAssign::LocInfo::ZExt. This will allow us to combine away some redundant sign extends. Example: https://godbolt.org/z/cTbKvr Differential Revision: https://reviews.llvm.org/D96915	2021-02-22 10:14:43 -08:00
Craig Topper	1aeb927fed	[RISCV] Custom isel the rest of the vector load/store intrinsics. A previous patch moved the index versions. This moves the rest. I also removed the custom lowering for VLEFF since we can now do everything directly in the isel handling. I had to update getLMUL to handle mask registers to index the pseudo table correctly for VLE1/VSE1. This is good for another 15K reduction in llc size. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D97097	2021-02-22 09:53:46 -08:00
Ryan Santhiraraja	2c25efcbd3	[AArch64] Adding SHA3 Intrinsics support This patch adds the following SHA3 Intrinsics: vsha512hq_u64, vsha512h2q_u64, vsha512su0q_u64, vsha512su1q_u64 veor3q_u8 veor3q_u16 veor3q_u32 veor3q_u64 veor3q_s8 veor3q_s16 veor3q_s32 veor3q_s64 vrax1q_u64 vxarq_u64 vbcaxq_u8 vbcaxq_u16 vbcaxq_u32 vbcaxq_u64 vbcaxq_s8 vbcaxq_s16 vbcaxq_s32 vbcaxq_s64 Note need to include +sha3 and +crypto when building from the front-end Reviewed By: DavidSpickett Differential Revision: https://reviews.llvm.org/D96381	2021-02-22 12:09:20 +00:00
Dmitry Preobrazhensky	4813518092	[AMDGPU][MC] Corrected bound_ctrl for compatibility with sp3 Enabled "bound_ctrl:1" and disabled "bound_ctrl:-1" syntax. Corrected printer to output "bound_ctrl:1" instead of "bound_ctrl:0". See bug 35397 for detailed issue description. Differential Revision: https://reviews.llvm.org/D97048	2021-02-22 14:59:40 +03:00
David Green	188f15d973	[ARM] Remove dead lowering code. NFC Remove the unnecessary code from `21a4faab60`, left over from a different way of lowering.	2021-02-22 10:07:53 +00:00
David Green	21a4faab60	[ARM] Move double vector insert patterns using vins to DAG combine This removes the existing patterns for inserting two lanes into an f16/i16 vector register using VINS, instead using a DAG combine to pattern match the same code sequences. The tablegen patterns were already on the large side (foreach LANE = [0, 2, 4, 6]) and were not handling all the cases they could. Moving that to a DAG combine, whilst not less code, allows us to better control and expand the selection of VINSs. Additionally this allows us to remove the AddedComplexity on VCVTT. The extra trick that this has learned in the process is to move two adjacent lanes using a single f32 vmov, allowing some extra inefficiencies to be removed. Differential Revision: https://reviews.llvm.org/D96876	2021-02-22 09:29:47 +00:00
Andy Wingo	861dbe1a02	[WebAssembly] call_indirect issues table number relocs If the reference-types feature is enabled, call_indirect will explicitly reference its corresponding function table via `TABLE_NUMBER` relocations against a table symbol. Also, as before, address-taken functions can also cause the function table to be created, only with reference-types they additionally cause a symbol table entry to be emitted. We abuse the used-in-reloc flag on symbols to indicate which tables should end up in the symbol table. We do this because unfortunately older wasm-ld will carp if it sees a table symbol. Differential Revision: https://reviews.llvm.org/D90948	2021-02-22 10:13:36 +01:00
Amara Emerson	6ff09ce061	[AArch64][GlobalISel] Fix <16 x s8> G_DUP regbankselect to assign source to gpr. We can only select this type if the source is on GPR, not FPR.	2021-02-21 21:17:29 -08:00
Simon Pilgrim	b568d3d6c9	[X86] Add vector support to sub(C1, xor(X, C2)) -> add(xor(X, ~C2), C1+1) fold.	2021-02-21 21:51:27 +00:00
Simon Pilgrim	3ab32c94a4	[X86] Replace explicit constant handling in sub(C1, xor(X, C2)) -> add(xor(X, ~C2), C1+1) fold. NFCI. NFC cleanup before adding vector support - rely on the SelectionDAG to handle everything for us.	2021-02-21 21:40:32 +00:00
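The `sub(C1, xor(X, C2)) -> add(xor(X, ~C2), C1+1)` fold relies on the two's-complement identity -y = ~y + 1 together with ~(X ^ C2) = X ^ ~C2, so C1 - (X ^ C2) = (X ^ ~C2) + (C1 + 1) modulo 2^n. Because the identity holds independently in each element, it carries over lane-wise to vectors. The Python sketch below checks it exhaustively for a 4-bit element type and then applies it per lane to a made-up vector; the function names are illustrative only.

```python
def sub_form(x, c1, c2, bits):
    """sub(C1, xor(X, C2)) on a bits-wide integer."""
    mask = (1 << bits) - 1
    return (c1 - (x ^ c2)) & mask


def add_form(x, c1, c2, bits):
    """add(xor(X, ~C2), C1 + 1) on a bits-wide integer."""
    mask = (1 << bits) - 1
    return ((x ^ (~c2 & mask)) + ((c1 + 1) & mask)) & mask


BITS = 4
values = range(1 << BITS)
assert all(sub_form(x, c1, c2, BITS) == add_form(x, c1, c2, BITS)
           for x in values for c1 in values for c2 in values)

# Lane-wise application to a hypothetical <4 x i8> vector.
xs, c1s, c2s = [1, 200, 37, 255], [10, 3, 255, 0], [0xF0, 7, 0, 0xFF]
print([add_form(x, a, b, 8) for x, a, b in zip(xs, c1s, c2s)])  # [25, 52, 218, 0]
```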
Craig Topper	1a6c1ac686	[SelectionDAG][RISCV] Teach ComputeNumSignBits to handle SREM. This also removes a pattern from RISCV that is no longer needed since the sexti32 on the LHS of the srem in the pattern implies the result is sign extended so the sign_extend_inreg should be removed in DAG combine now. Reviewed By: luismarques, RKSimon Differential Revision: https://reviews.llvm.org/D97133	2021-02-21 11:13:36 -08:00
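One sound way to bound the sign bits of an `srem` result, sketched below in Python, is to reuse the numerator's count: a truncating remainder has the same sign as the numerator (or is zero) and no larger magnitude, so it can only have as many or more sign bits. This illustrates why such a rule is safe and checks it exhaustively on 8-bit values; the commit message does not spell out the exact rule `ComputeNumSignBits` now applies. The overflowing `INT_MIN srem -1` case, which is undefined in LLVM IR, simply yields 0 in this model.

```python
def num_sign_bits(x, width):
    """Leading bits of the width-bit pattern of x equal to its sign bit."""
    x &= (1 << width) - 1
    sign = (x >> (width - 1)) & 1
    count = 0
    for i in range(width - 1, -1, -1):
        if ((x >> i) & 1) != sign:
            break
        count += 1
    return count


def srem(a, b):
    """Truncating signed remainder: the result takes the sign of a."""
    r = abs(a) % abs(b)
    return -r if a < 0 else r


WIDTH = 8
lo, hi = -(1 << (WIDTH - 1)), 1 << (WIDTH - 1)
ok = all(
    num_sign_bits(srem(a, b), WIDTH) >= num_sign_bits(a, WIDTH)
    for a in range(lo, hi) for b in range(lo, hi) if b != 0
)
print(ok)  # True
```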
Simon Pilgrim	bae04a3e2d	[X86][AVX] canonicalizeLaneShuffleWithRepeatedOps - remove unnecessary BITCASTs. In conjunction with the 'vperm2x128(bitcast(x),bitcast(y),c) -> bitcast(vperm2x128(x,y,c))' fold in combineTargetShuffle, this should remove any unnecessary bitcasts around vperm2x128 lane shuffles.	2021-02-21 18:40:32 +00:00
Simon Pilgrim	a6a258f1da	[X86][AVX] Fold concat(extract_subvector(v0,c0), extract_subvector(v1,c1)) -> vperm2x128 Fixes regression exposed by removing bitcasts across logic-ops in D96206. Differential Revision: https://reviews.llvm.org/D96206	2021-02-21 14:50:43 +00:00
Simon Pilgrim	2885d1251f	[X86] Fold bitcast(logic(bitcast(X), Y)) --> logic'(X, bitcast(Y)) for int-int bitcasts Extend the existing combine that handles bitcasting for fp-logic ops to also help remove logic ops across bitcasts to/from the same integer types. This helps improve AVX512 predicate handling for D/Q logic ops and also allows DAGCombine's scalarizeExtractedBinop to remove some annoying gpr->simd->gpr transfers. The concat_vectors regression in pr40891.ll will be addressed in a followup commit on this patch. Differential Revision: https://reviews.llvm.org/D96206	2021-02-21 14:40:54 +00:00
Fraser Cormack	3e1317fd32	[RISCV] Support extraction of misaligned subvectors This patch extends the support for RVV EXTRACT_SUBVECTOR to cover those which don't align to a vector register boundary. It accomplishes this by extracting the nearest register-sized subvector (a subregister operation), then sliding the vector down with VSLIDEDOWN and extracting the subvector from the first position (a COPY operation). Since this procedure involves the use of VSCALE and multiplication, the handling of such operations is done during lowering to simplify the implementation and make use of DAG combining. This necessitated moving some helper functions from RISCVISelDAGToDAG to RISCVTargetLowering. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D96959	2021-02-20 15:43:54 +00:00
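The extraction sequence described above can be modeled on plain Python lists. This is a hypothetical sketch, not the RISC-V lowering code: `reg_elems` stands for the number of elements one vector register holds, the subvector is assumed not to straddle a register boundary, and the tail left behind by the slide-down is modeled as `None` (undefined).

```python
def extract_subvector(vec, idx, k, reg_elems):
    """Extract vec[idx:idx + k] via aligned chunk + slidedown + copy."""
    base = (idx // reg_elems) * reg_elems    # nearest register-aligned start
    offset = idx - base                      # misalignment within that register
    assert offset + k <= reg_elems, "subvector must fit in one register"
    chunk = vec[base:base + reg_elems]       # subregister extract
    slid = chunk[offset:] + [None] * offset  # VSLIDEDOWN by offset; tail undefined
    return slid[:k]                          # COPY from element 0


vec = list(range(16))
print(extract_subvector(vec, 10, 2, reg_elems=8))  # [10, 11]
```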
Fraser Cormack	9aa20caee6	[RISCV] Improve register allocation around vector masks With vector mask registers only allocatable to V0 (VMV0Regs) it is relatively simple to generate code which uses multiple masks and naively requires spilling. This patch aims to improve codegen in such cases by telling LLVM it can use VRRegs to hold masks. This will prevent spilling in many cases by having LLVM copy to an available VR register. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D97055	2021-02-20 14:47:51 +00:00
Simon Pilgrim	761bbed264	[DAG] foldSubToUSubSat - fold sub(a,trunc(umin(zext(a),b))) -> usubsat(a,trunc(umin(b,SatLimit))) This moves the last custom x86 USUBSAT fold to generic DAGCombine. Completes PR40111 Differential Revision: https://reviews.llvm.org/D96703	2021-02-20 12:02:07 +00:00
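A quick exhaustive check of the `foldSubToUSubSat` pattern, assuming `a` is an n-bit value, `b` is wider, and `SatLimit` is the n-bit unsigned maximum 2^n - 1. Since `umin(zext(a), b)` never exceeds `a`, the subtraction cannot wrap, and clamping `b` to `SatLimit` before truncating preserves the minimum. The function names are illustrative.

```python
def sub_trunc_umin(a, b, n):
    """sub(a, trunc(umin(zext(a), b))) with a: n bits, b: wider."""
    mask = (1 << n) - 1
    return (a - (min(a, b) & mask)) & mask


def usubsat_form(a, b, n):
    """usubsat(a, trunc(umin(b, SatLimit))) with SatLimit = 2**n - 1."""
    sat_limit = (1 << n) - 1
    y = min(b, sat_limit) & sat_limit
    return max(a - y, 0)


N, M = 4, 8  # a is i4, b is i8
assert all(sub_trunc_umin(a, b, N) == usubsat_form(a, b, N)
           for a in range(1 << N) for b in range(1 << M))

# Without the clamp, trunc(16) would be 0 in 4 bits and give the wrong answer.
print(sub_trunc_umin(5, 16, N), max(5 - (16 & 0xF), 0))  # 0 5
```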
Juneyoung Lee	e4d751c271	Update BPFAdjustOpt.cpp to accept select form of or as well This is a minor pattern-match update to BPFAdjustOpt.cpp to accept not only 'or i1 a, b' but also 'select i1 a, i1 true, i1 b'. This resolves a regression after SimplifyCFG started creating the select form of and/or instead (https://reviews.llvm.org/D95026). This is a small change, and currently such a select form isn't created or doesn't reach the late pipeline (because InstCombine eagerly folds it into and/or i1), so I chose to commit without a review process.	2021-02-20 18:29:58 +09:00
Amara Emerson	067ec53df1	[AArch64][GlobalISel] Add selection support for G_VECREDUCE of <2 x i32> This selects to a pairwise add and a subreg copy.	2021-02-20 00:39:38 -08:00
Craig Topper	71b68fe532	[RISCV] Teach our custom vector load/store intrinsic isel code to propagate memory operands if we have them. We don't currently create memory operands for these intrinsics, but there was a suggestion of using the indexed load/store intrinsics to implement isel for scalable vector gather/scatter. That may propagate the memory operand from the gather/scatter ISD nodes.	2021-02-19 19:12:20 -08:00
Jacques Pienaar	3bec7ed59e	Different fix for gcc bug Was still running into from definition of 'template<class T> struct llvm::DenseMapInfo' [-fpermissive] template <typename T> struct DenseMapInfo; ^	2021-02-19 16:41:00 -08:00
Yusra Syeda	b006f55544	[SystemZ/z/OS] Add XPLINK 64-bit calling convention to tablegen. This commit adds the initial changes to the SystemZ target description for the XPLINK 64-bit calling convention on z/OS. Additions include: - a new predicate IsTargetXPLINK64 - different register allocation order - generation of nopr after a call Reviewed-by: uweigand Differential Revision: https://reviews.llvm.org/D96887	2021-02-19 18:39:49 -05:00
Amara Emerson	27566e9c3e	[AArch64][GlobalISel] Make G_VECREDUCE_ADD of <2 x s32> legal.	2021-02-19 14:28:21 -08:00
Craig Topper	7e54d7304b	[RISCV] Remove VPatILoad and VPatIStore multiclasses that are no longer used. NFC	2021-02-19 13:23:08 -08:00
Craig Topper	e7c86f4ac4	[RISCV] Use inheritance to reduce some repeated code in tablegen. NFC The VLX and VSX searchable tables share the same format, so we can have a common base class for them.	2021-02-19 10:42:18 -08:00
Craig Topper	7f5b3886e4	[RISCV] Remove unneeded indexed segment load/store vector pseudo instruction. We had more combinations of data and index lmuls than we needed. Also add some asserts to verify that the IndexVT and data VT have the same element count when we isel these pseudo instructions.	2021-02-19 10:28:48 -08:00
Craig Topper	d056d5decf	[RISCV] Use custom isel for vector indexed load/store intrinsics. There are many legal combinations of index and data VTs supported for these intrinsics. This results in a lot of isel patterns in RISCVGenDAGISel.inc. By adding a separate table similar to what we use for segment load/stores, we can more efficiently manually select these intrinsics. We should also be able to reuse this table for scalable vector gather/scatter. This reduces the llc binary size by ~56K. Reviewed By: khchen Differential Revision: https://reviews.llvm.org/D97033	2021-02-19 10:10:06 -08:00
Craig Topper	dbf910f0d9	[RISCV] Prevent selecting a 0 VL to X0 for the segment load/store intrinsics. Just like we do for isel patterns, we need to call selectVLOp to prevent 0 from being selected to X0 by the default isel. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D97021	2021-02-19 10:07:12 -08:00
Craig Topper	98dff5e804	[RISCV] Move SHFLI matching to DAG combine. Add 32-bit support for RV64 We previously used isel patterns for this, but that used quite a bit of space in the isel table due to OR being associative and commutative. It also wouldn't handle shifts/ands being in reversed order. This generalizes the shift/and matching from GREVI to take the expected mask table as input so we can reuse it for SHFLI. There is no SHFLIW instruction, but we can promote a 32-bit SHFLI to i64 on RV64. As long as bit 4 of the control bit isn't set, a 64-bit SHFLI will preserve 33 sign bits if the input had at least 33 sign bits. ComputeNumSignBits has been updated to account for that to avoid sext.w in the tests. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D96661	2021-02-19 10:07:12 -08:00
Jessica Paquette	8d3442eddb	[AArch64][GlobalISel] Run redundant_sext_inreg in the post-legalizer combiner This is to ensure that we can eliminate G_ASSERT_SEXT. In a follow-up patch, I'm going to make CallLowering emit G_ASSERT_SEXT for signext parameters. Differential Revision: https://reviews.llvm.org/D96913	2021-02-19 09:34:47 -08:00
madhur13490	3c297a2564	Make fixed-abi default for AMD HSA OS fixed-abi uses pre-defined and predictable SGPR/VGPRs for passing arguments. This patch makes this scheme default when HSA OS is specified in triple. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D96340	2021-02-19 15:05:25 +00:00
David Green	a1c34a9d6a	[ARM] Correct vector predicate type in MVE getCmpSelInstrCost	2021-02-19 14:43:51 +00:00
David Green	7a5c26e99a	Revert "[ARM] Expand the range of allowed post-incs in load/store optimizer" This reverts commit `3b34b06fc5` as runtime errors were reported.	2021-02-19 13:15:10 +00:00
Fraser Cormack	d9531a3097	[RISCV] Address some clang-tidy warnings. NFCI.	2021-02-19 12:10:28 +00:00
Carl Ritson	8181dcd30f	[AMDGPU] WQM/WWM: Fix marking of partial definitions Track lanes when processing definitions for marking WQM/WWM. If all lanes have been defined then marking can stop. This prevents marking unnecessary instructions as WQM/WWM. In particular this fixes a bug where values passing through V_SET_INACTIVE would be marked as requiring WWM. Reviewed By: piotr Differential Revision: https://reviews.llvm.org/D95503	2021-02-19 20:45:24 +09:00
Simon Pilgrim	2258b367db	[X86][AVX] getFauxShuffleMask - decode VBROADCAST(EXTRACT_VECTOR_ELT(V,0)) Handle the case where we're broadcasting a scalar extracted from another vector.	2021-02-19 11:06:53 +00:00
Wang, Pengfei	c98644c2ec	[X86] Fix a codegen crash in getSetCCResultType This patch fixes some crashes coming from X86ISelLowering::getSetCCResultType, which would occasionally return an EVT constructed from an invalid MVT, which has a null Type pointer. This patch refers to D95434. Differential Revision: https://reviews.llvm.org/D97036	2021-02-19 17:30:10 +08:00
Sjoerd Meijer	260f90bb3d	[AArch64] Add some missing Neoverse features This enables AES fusion and the post RA scheduler for the Neoverse cores, and, while we are at it, also for the A55, which we had missed earlier. Differential Revision: https://reviews.llvm.org/D96866	2021-02-19 09:18:35 +00:00
Craig Topper	cd4051ac80	[RISCV] Prune unneeded indexed load/store pseudo instructions. We were creating more combinations of value and index lmul than we needed. I've copied the loop structure used here from VPseudoAMOEI with all data sew values instead of just 32/64. Similar can be done for segment loads/store. Reviewed By: khchen Differential Revision: https://reviews.llvm.org/D97008	2021-02-18 23:08:39 -08:00
Serge Pavlov	2c4f60e45b	[FPEnv][AArch64] Implement lowering of llvm.set.rounding Differential Revision: https://reviews.llvm.org/D96836	2021-02-19 13:16:51 +07:00
Craig Topper	8ed3bbbcc3	[RISCV] Split zvlsseg searchable table into 4 separate tables. Index by properties rather than intrinsic ID. Intrinsic ID is a 32-bit value which made each row of the table 4 byte aligned. The remaining fields used 5 bytes. This meant 3 bytes of padding per row. This patch breaks the table into 4 separate tables and indexes them by properties we know about the intrinsic: NF, masked, strided, ordered, etc. The indexed load/store tables have no padding in their rows now. All together this reduces the size of the llc binary by ~28K. I'm considering adding similar tables for isel of non-segment load/store as well to cut down the size of the isel table and probably improve our isel performance. Those tables would need to be indexed from intrinsics, IR loads/stores, gathers/scatters, and RISCVISD opcodes. So having a table that can be indexed without using intrinsic ID is more flexible. Reviewed By: HsiangKai Differential Revision: https://reviews.llvm.org/D96894	2021-02-18 19:00:49 -08:00
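The padding arithmetic in this message can be reproduced with `ctypes`, which lays out structures following the platform C ABI. The field layout here is a hypothetical stand-in (one 32-bit ID followed by five 1-byte fields, not the real table columns) and assumes a typical ABI where `uint32_t` is 4-byte aligned, as on x86-64 and AArch64.

```python
import ctypes


class RowWithIntrinsicID(ctypes.Structure):
    # The 4-byte key forces 4-byte alignment for the whole row.
    _fields_ = [
        ("intrinsic_id", ctypes.c_uint32),
        ("f0", ctypes.c_uint8), ("f1", ctypes.c_uint8), ("f2", ctypes.c_uint8),
        ("f3", ctypes.c_uint8), ("f4", ctypes.c_uint8),
    ]


class RowWithoutID(ctypes.Structure):
    # Only byte-sized fields: no alignment padding is needed.
    _fields_ = [
        ("f0", ctypes.c_uint8), ("f1", ctypes.c_uint8), ("f2", ctypes.c_uint8),
        ("f3", ctypes.c_uint8), ("f4", ctypes.c_uint8),
    ]


payload = 4 + 5  # bytes of actual data in a row with the ID
size = ctypes.sizeof(RowWithIntrinsicID)
print(size, size - payload)          # 12 3
print(ctypes.sizeof(RowWithoutID))   # 5
```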
Craig Topper	cf34559104	[RISCV] Enable PrimaryKeyEarlyOut on RISCVVPseudosTable. This table is queried in RISCVMCInstLower without knowing whether the instruction is a vector pseudo. Due to the way the binary search works, we have to do log2(tablesize) checks just to determine a non-vector instruction isn't in the table. Conveniently, all the vector pseudos are pretty tightly packed within the internal instruction enum. By enabling the PrimaryKeyEarlyOut, tablegen will emit a check against the beginning and end of the table before doing the binary search. This gives a quick early out on the search for the majority of non-vector instructions. Differential Revision: https://reviews.llvm.org/D97016	2021-02-18 18:59:32 -08:00
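The effect of `PrimaryKeyEarlyOut` can be illustrated with a binary search that first checks the key against the table's first and last entries. This is a hypothetical Python model, not TableGen's generated code: `lookup` and the opcode numbers are made up, and probes are counted only to show that a key outside the contiguous range of vector pseudos is rejected without the log2(tablesize) probes.

```python
def lookup(table, key, early_out=True):
    """Search a sorted list of primary keys.

    Returns (found, probes), where probes counts key comparisons.
    """
    probes = 0
    if early_out and table:
        probes += 2  # range check against the first and last keys
        if key < table[0] or key > table[-1]:
            return False, probes
    lo, hi = 0, len(table)
    while lo < hi:  # lower-bound binary search
        mid = (lo + hi) // 2
        probes += 1
        if table[mid] < key:
            lo = mid + 1
        else:
            hi = mid
    return lo < len(table) and table[lo] == key, probes


# Hypothetical: vector pseudos occupy opcodes 1000..2999 contiguously.
pseudos = list(range(1000, 3000))
print(lookup(pseudos, 42, early_out=False))  # (False, 11)
print(lookup(pseudos, 42, early_out=True))   # (False, 2)
```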
Leonard Chan	c77659e549	[llvm][IR] Do not place constants with static relocations in a mergeable section This patch provides two major changes: 1. Add getRelocationInfo to check if a constant will have static, dynamic, or no relocations. (Also rename the original needsRelocation to needsDynamicRelocation.) 2. Only allow a constant with no relocations (static or dynamic) to be placed in a mergeable section. This will allow unused symbols that contain static relocations and happen to fit in mergeable constant sections (.rodata.cstN) to instead be placed in unique-named sections if -fdata-sections is used and subsequently garbage collected by --gc-sections. See https://lists.llvm.org/pipermail/llvm-dev/2021-February/148281.html. Differential Revision: https://reviews.llvm.org/D95960	2021-02-18 15:39:00 -08:00
Matt Arsenault	62d946e133	GlobalISel: Merge some AMDGPU ABI lowering code to generic code AMDGPU currently has a lot of pre-processing code to pre-split argument types into 32-bit pieces before passing it to the generic code in handleAssignments. This is a bit sloppy and also requires some overly fancy iterator work when building the calls. It's better if all argument marshalling code is handled directly in handleAssignments. This handles more situations like decomposing large element vectors into sub-element sized pieces. This should mostly be NFC, but does change the generated code by shifting where the initial argument packing instructions are placed. I think this is nicer looking, since it now emits the packing code directly after the relevant copies, rather than after the copies for the remaining arguments. This doubles down on gfx6/gfx7 using the gfx8+ ABI for 16-bit types. This is ultimately the better option, but incompatible with the DAG. Fixing this requires more work, especially for f16.	2021-02-18 17:26:55 -05:00
Nikita Popov	70e3c9a8b6	[BasicAA] Always strip single-argument phi nodes We can always look through single-argument (LCSSA) phi nodes when performing alias analysis. getUnderlyingObject() already does this, but stripPointerCastsAndInvariantGroups() does not. We still look through these phi nodes with the usual aliasPhi() logic, but sometimes get sub-optimal results due to the restrictions on value equivalence when looking through arbitrary phi nodes. I think it's generally beneficial to keep the underlying object logic and the pointer cast stripping logic in sync, insofar as it is possible. With this patch we get marginally better results (statistic | before | after): aa.NumMayAlias | 5010069 | 5009861; aa.NumMustAlias | 347518 | 347674; aa.NumNoAlias | 27201336 | 27201528; ... licm.NumPromoted | 1293 | 1296. I've renamed the relevant strip method to stripPointerCastsForAliasAnalysis(), as we're past the point where we can explicitly spell out everything that's getting stripped. Differential Revision: https://reviews.llvm.org/D96668	2021-02-18 23:07:50 +01:00
Craig Topper	0db938312a	[RISCV] Simplify VPseudoAMOEI multiclass. NFC lmul was already iterated in one of the loops. We don't need to recreate it from a string.	2021-02-18 12:40:51 -08:00
Stanislav Mekhanoshin	5247a0d9e6	[AMDGPU] Correct gfx90c feature list Looks like we have forced FeatureXNACK and forgot FeatureMadMacF32Insts. Differential Revision: https://reviews.llvm.org/D96989	2021-02-18 12:40:27 -08:00
Jessica Clarke	74df1ffaad	[RISCV] Use XLenRI alias for RegInfoByHwMode instances This avoids tedious repetition and matches what we do for the ValueTypeByHwMode uses. Reviewed By: craig.topper, luismarques Differential Revision: https://reviews.llvm.org/D96649	2021-02-18 19:38:36 +00:00
Sean Fertile	bb260b1ca7	[PowerPC][AIX] Add support for vector arg passing on the stack. Enable passing more vector arguments than there are available vector argument passing registers. Differential Revision: https://reviews.llvm.org/D96415	2021-02-18 13:32:40 -05:00
Heejin Ahn	6f2999b36a	[WebAssembly] Handle multiple EH_LABELs in EH pad Usually `EH_LABEL`s are placed in - Before an `invoke` (which becomes calls in the backend) - After an `invoke` - At the start of an EH pad I don't know exactly why, but I noticed there are cases of multiple, not a single, `EH_LABEL` instructions in the beginning of an EH pad. In that case `global.set` instruction placed to restore `__stack_pointer` ended up between two `EH_LABEL` instructions before `CATCH`. It should follow after the `EH_LABEL`s and `CATCH`. This CL fixes that case. Reviewed By: dschuff Differential Revision: https://reviews.llvm.org/D96970	2021-02-18 10:18:00 -08:00
Craig Topper	156fc07e19	[RISCV] Add support for fixed vector MULHU/MULHS. This allows the division-by-constant optimization to use MULHU/MULHS. Reviewed By: frasercrmck, arcbbb Differential Revision: https://reviews.llvm.org/D96934	2021-02-18 09:15:08 -08:00
Craig Topper	792627be35	[RISCV] Add support for fixed vector sign/zero extend from mask types. Due to vXi64 on RV32, I've directly emitted this using _VL ISD opcodes. If it wasn't for that we could just use fixed vector BUILD_VECTOR and VSELECT and let those each be legalized. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D96910	2021-02-18 09:08:10 -08:00
Craig Topper	c7dd92e8a5	[RISCV] Support isel of scalable vector bitcasts These should be NOPs so we can just replace with the input. This matches what SVE does with isel patterns for all permutations. Custom isel saves us from having to list all permutations for all LMULs. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D96921	2021-02-18 09:01:13 -08:00
Bradley Smith	8bad8a43c3	[AArch64][SVE] Add patterns to generate FMLA/FMLS/FNMLA/FNMLS/FMAD Adjust generateFMAsInMachineCombiner to return false if SVE is present in order to combine fmul+fadd into fma. Also add new pseudo instructions so as to select the most appropriate of FMLA/FMAD depending on register allocation. Depends on D96599 Differential Revision: https://reviews.llvm.org/D96424	2021-02-18 16:55:16 +00:00
Bradley Smith	5b094bfeb3	[AArch64] Allow folding FMUL/FADD into FMA for FP16 types isFMAFasterThanFMulAndFAdd should return true for FP16 types when HasFullFP16 is present, since we have the instructions to handle it for both SVE and NEON. (SVE patterns and tests will follow). Differential Revision: https://reviews.llvm.org/D96599	2021-02-18 16:51:22 +00:00
Hsiangkai Wang	065a187f33	[RISCV] Fix typo. Use ValueType instead of LLVMType.	2021-02-18 23:21:27 +08:00
David Green	3b34b06fc5	[ARM] Expand the range of allowed post-incs in load/store optimizer Currently the load/store optimizer will only fold in increments of the same size as the load/store. This patch expands that to any legal immediate for the post-inc instruction. Differential Revision: https://reviews.llvm.org/D95885	2021-02-18 14:59:02 +00:00
Baptiste Saleil	34dc1ccb96	[PowerPC] Exploit the vinsw, vinsd, and vins[wd][lr]x instructions on P10 This patch generates the vinsw, vinsd, vinsblx, vinshlx, vinswlx, vinsdlx, vinsbrx, vinshrx, vinswrx and vinsdrx instructions for vector insertion on P10. Differential Revision: https://reviews.llvm.org/D94454	2021-02-18 14:17:47 +00:00
Hsiangkai Wang	f1efa8abaf	[RISCV] Fix bugs in pseudo instructions for masked segment load. For masked segment load, the destination register should not overlap with the mask register, so it cannot be V0. In the original implementation, there is no segment load/store register class without V0. In this patch, I added these register classes and modified `GetVRegNoV0` to get the correct one. Differential Revision: https://reviews.llvm.org/D96937	2021-02-18 22:17:00 +08:00
Hsiangkai Wang	b97d8b32c3	[NFC][RISCV] Use concise way to describe load/store instructions. Differential Revision: https://reviews.llvm.org/D96923	2021-02-18 22:17:00 +08:00
David Green	33ba220611	[ARM] Ensure types provided to getIntrinsicCost are valid It appears that pointer types were causing issues for the min/max cost code in getIntrinsicInstrCost. This makes sure that when matching icmp/select to a min/max, we only do that for normal int or float types.	2021-02-18 14:00:23 +00:00
Stefan Pintilie	b80357d46e	[PowerPC] Add option for ROP Protection Added -mrop-protection for Power PC to turn on codegen that provides some protection from ROP attacks. The option is off by default and can be turned on for Power 8, Power 9 and Power 10. This patch is for the option only. The feature will be implemented by a later patch. Reviewed By: amyk Differential Revision: https://reviews.llvm.org/D96512	2021-02-18 12:15:50 +00:00
David Green	1a6744e3dc	[ARM] Add larger than legal ICmp costs A v8i32 compare will produce a v8i1 predicate, but during codegen the v8i32 will be split into two v4i32, potentially requiring two v4i1 predicates to be merged into a single v8i1. Because this merging of two v4i1's into a v8i1 is very expensive, we need to make the cost of the compare equally high. This patch adds the cost of that to ARMTTIImpl::getCmpSelInstrCost. Because we don't know whether the user of the predicate can be split, and the cost model is mostly per-instruction, we may be pessimistic, but that should only be for larger than legal types. This also adds min/max detection to the costmodel where it can be detected, to keep those in line with the cost of simple min/max instructions. Otherwise for the most part, costs that were already expensive have become more expensive. Differential Revision: https://reviews.llvm.org/D96692	2021-02-18 11:42:17 +00:00
Benjamin Kramer	ae1e6c3557	[RISCV] Rewrite assert to not give unused variable warnings in Release builds NFCI	2021-02-18 11:42:36 +01:00
Fraser Cormack	d876214990	[RISCV] Begin to support more subvector inserts/extracts This patch adds support for INSERT_SUBVECTOR and EXTRACT_SUBVECTOR (nominally where both operands are scalable vector types) where the vector, subvector, and index align sufficiently to allow decomposition to subregister manipulation: * For extracts, the extracted subvector must correctly align with the lower elements of a vector register. * For inserts, the inserted subvector must be at least one full vector register, and correctly align as above. This approach should work for fixed-length vector insertion/extraction too, but that will come later. Reviewed By: craig.topper, khchen, arcbbb Differential Revision: https://reviews.llvm.org/D96873	2021-02-18 10:18:27 +00:00
Fraser Cormack	0176fecfbc	[SVE][CodeGen] Expand SVE MULH[SU] and [SU]MUL_LOHI nodes This patch fixes a codegen crash introduced in `fde2466171`, where the DAGCombiner started generating optimized MULH[SU] or [SU]MUL_LOHI nodes unless the target opted out. The AArch64 backend cannot currently select any of these nodes, so ensure that they are not generated in the first place. This issue was raised by @huihuiz in D94501. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D96849	2021-02-18 10:06:24 +00:00
Wang, Pengfei	e9c11c1934	[X86] Zero AMX config buffer for non AVX512 cases. Zero AMX config buffer for non AVX512 cases. Differential Revision: https://reviews.llvm.org/D96927	2021-02-18 13:26:09 +08:00
Craig Topper	016eca8f90	[RISCV] Guard LowerINSERT_VECTOR_ELT against fixed vectors. The type legalizer can call this code based on the scalar type so we need to verify the vector type is a scalable vector. I think due to how type legalization visits nodes, the vector type will have already been legalized so we don't have an issue with using MVT here like we did for EXTRACT_VECTOR_ELT. I've added a test just in case.	2021-02-17 19:27:08 -08:00
Craig Topper	00c4e0a8f6	[RISCV] Guard the ISD::EXTRACT_VECTOR_ELT handling in ReplaceNodeResults against fixed vectors and non-MVT types. The type legalizer is calling this code based on the scalar type so we need to verify the input type is a scalable vector. The vector type has also not been legalized yet when this is called so we need to use EVT for it.	2021-02-17 18:25:38 -08:00
Stanislav Mekhanoshin	75997e8407	[AMDGPU] Fixed msan build LoadStoreOptimizer was using uninitialized SCC value for instructions where it is unsupported.	2021-02-17 18:01:23 -08:00
Chen Zheng	5517923b1c	[XCOFF][NFC] make csect properties optional for getXCOFFSection We are going to support debug sections for XCOFF. So the csect properties are not necessary. This patch makes these properties optional. Reviewed By: hubert.reinterpretcast Differential Revision: https://reviews.llvm.org/D95931	2021-02-17 20:51:42 -05:00
Stanislav Mekhanoshin	48d2e04152	[AMDGPU] Mark SMRD atomics We did not have atomic flags on SMRD, did not copy TSFlags to real instructions, and did not have ret/noret atomic map. At the moment it is NFC, but needed for D96469. Differential Revision: https://reviews.llvm.org/D96823	2021-02-17 16:47:02 -08:00
Stanislav Mekhanoshin	a8d9d50762	[AMDGPU] gfx90a support Differential Revision: https://reviews.llvm.org/D96906	2021-02-17 16:01:32 -08:00
Yusra Syeda	8b624a3164	[SystemZ] Separate LoZ ELF specifics in tablegen. Separate the LoZ ELF calling convention in tablegen. This will make it easier to add the z/OS ABI in future patches. Reviewed By: uweigand Differential Revision: https://reviews.llvm.org/D96867	2021-02-17 16:11:58 -05:00
Heejin Ahn	da01a9db8b	[WebAssembly] Fix EHPadStack update in fixCallUnwindMismatches Updating `EHPadStack` with respect to `TRY` and `CATCH` instructions has to be done after checking all other conditions, not before. Because we did this before checking other conditions, when we encounter `TRY` and we want to record the current mismatching range, we have already popped the entry from `EHPadStack`, which we need to access to record the range. The `baz` call in the added test needs try-delegate because the previous TRY marker placement for `quux` was placed before `baz`, because `baz`'s return value was stackified in RegStackify. If this wasn't stackified this try-delegate is not strictly necessary, but at the moment it is not easy to identify cases like this. I plan to transfer `nounwind` attributes from the LLVM IR to prevent cases like this. The call in the test does not have `unwind` attribute in order to test this bug, but in many cases of this pattern the previous call has `nounwind` attribute. Reviewed By: tlively Differential Revision: https://reviews.llvm.org/D96711	2021-02-17 12:14:11 -08:00
Craig Topper	3bdd02735b	[RISCV] Localize RISCVZvlssegTable to RISCVISelDAGToDAG.cpp, the only place it is used.	2021-02-17 11:37:28 -08:00
Craig Topper	799f7865c8	[RISCV] Use bits<7> instead of bits<11> for the EEW field size in the RISCVZvlsseg searchable table. NFCI We only support 8, 16, 32, and 64 for EEW. These only need 7 bits to represent.	2021-02-17 11:12:36 -08:00
Heejin Ahn	7c594bab00	[WebAssembly] Change catch_all's opcode We decided to change `catch_all`'s opcode from 0x05, which is the same as `else`, to 0x19, to avoid some complicated handling in the tools. See: https://github.com/WebAssembly/exception-handling/issues/147 Reviewed By: sbc100 Differential Revision: https://reviews.llvm.org/D96863	2021-02-17 10:16:23 -08:00
Craig Topper	d4353a3101	[RISCV] Merge the handlers for masked and unmasked segment loads/stores. A lot of the code for the masked and unmasked is the same. This patch adds a boolean to handle the differences so we can share the code. Differential Revision: https://reviews.llvm.org/D96841	2021-02-17 10:08:33 -08:00
Craig Topper	6f30d0035a	[RISCV] Merge the vsetvli and vsetvlimax intrinsic selection These have very similar code just with a different number of operands and handling for vsetivl. Differential Revision: https://reviews.llvm.org/D96834	2021-02-17 10:08:33 -08:00
Sidharth Baveja	cb2876800c	[PowerPC][AIX] Enable Shrinkwrapping on 32 and 64 bit AIX. Summary: Currently Shrinkwrap is not enabled on AIX. This patch enables shrink wrap on 32 and 64 bit AIX, and 64 bit ELF. Reviewed By: sfertile, nemanjai Differential Revision: https://reviews.llvm.org/D95094	2021-02-17 14:54:57 +00:00
Sean Fertile	4e127bce2d	[PowerPC] Handle FP physical register in inline asm constraint. Do not defer to the base class when the register constraint is a physical fpr. The base class will select SPILLTOVSRRC as the register class and register allocation will fail on subtargets without VSX registers. Differential Revision: https://reviews.llvm.org/D91629	2021-02-17 09:27:03 -05:00
David Green	6d835c5fcd	[ARM] Add MVE abs costs Similar to min/max, this increases the accuracy of abs intrinsics costs under MVE.	2021-02-17 14:21:09 +00:00
Piotr Sobczak	c72a63b4b0	[AMDGPU] Add implicit vcc_lo on S_CBRANCH_VCCNZ in wave32 * Update skip-if-dead.ll with tests for wave32. * Fix the crash in verifier in one newly enabled test by adding missing fixImplicitOperands in branch insertion code. ``` * Bad machine code: Using an undefined physical register * - function: test_kill_divergent_loop - basic block: %bb.2 bb (0xad96308) - instruction: S_CBRANCH_VCCNZ %bb.1, implicit $vcc_lo - operand 1: implicit $vcc_lo LLVM ERROR: Found 1 machine code errors. ``` * Simplify "cbranch_kill" to not use interp instructions. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D96793	2021-02-17 15:14:57 +01:00
luxufan	709ea8bc87	[RISCV] Simplify BP initialisation We can re-use copyPhysReg rather than writing a specialised copy. Differential Revision: https://reviews.llvm.org/D95227	2021-02-17 20:33:20 +08:00
Simon Pilgrim	05c64ea672	[DAG] Fold shuffle(bop(shuffle(x,y),shuffle(z,w)),bop(shuffle(a,b),shuffle(c,d))) (REAPPLIED) Fold shuffle(bop(shuffle(x,y),shuffle(z,w)),bop(shuffle(a,b),shuffle(c,d))) -> bop(shuffle(x,y),shuffle(z,w)),bop(shuffle(a,b),shuffle(c,d)) Attempt to fold from a shuffle of a pair of binops to a binop of shuffles, as long as one/both of the binop sources are also shuffles that can be merged with the outer shuffle. This should guarantee that we remove one binop without introducing any additional shuffles. Technically there's potential for a merged shuffle's lowering to be poorer than the original shuffle, but it could also be better, and I'm not seeing any regressions as long as we keep the 'don't merge splats' rule already present in MergeInnerShuffle. This expands and generalizes an existing X86 combine and attempts to merge either of each binop's sources (with an on-the-fly commutation of the shuffle mask) - we couldn't do that in the x86 version as it had to stay in a form that DAGCombine's MergeInnerShuffle would still recognise. Fixes issue raised by @saugustine in rG5aa8f4c0843a where we were failing to replace null shuffle operands from MergeInnerShuffle to UNDEFs. Differential Revision: https://reviews.llvm.org/D96345	2021-02-17 11:42:43 +00:00
Jay Foad	c8be7e96bb	[AMDGPU] Rename simplifyI24 to simplifyMul24 Also simplify one of its call sites. NFC.	2021-02-17 11:33:49 +00:00
Piotr Sobczak	08131c7439	[AMDGPU] Fix a miscompile with S_ADD/S_SUB The helper function isBoolSGPR is too aggressive when determining when a v_cndmask can be skipped on a boolean value because the function does not check the operands of and/or/xor. This can be problematic for the Add/Sub combines that can leave bits set even for inactive lanes leading to wrong results. Fix this by inspecting the operands of and/or/xor recursively. Differential Revision: https://reviews.llvm.org/D86878	2021-02-17 12:24:58 +01:00
Fraser Cormack	d81161646a	[RISCV] Add support for fixed vector vselect This patch adds support for fixed-length vector vselect. It does so by lowering them to a custom unmasked VSELECT_VL node with a vector length operand. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D96768	2021-02-17 10:59:00 +00:00
Hsiangkai Wang	a3c783dbf2	[RISCV] Spilling for RISC-V V extension. (2nd version) Differential Revision: https://reviews.llvm.org/D95148	2021-02-17 14:05:19 +08:00
Hsiangkai Wang	5a31a67385	[RISCV] Frame handling for RISC-V V extension. This patch proposes how to deal with RISC-V vector frame objects. The layout of the RISC-V vector frame will look like:
|---------------------------------|
| scalar callee-saved registers   |
|---------------------------------|
| scalar local variables          |
|---------------------------------|
| scalar outgoing arguments       |
|---------------------------------|
| RVV local variables &&          |
| RVV outgoing arguments          |
|---------------------------------| <- end of frame (sp)
If there is realignment or a variable length array in the stack, we will use the frame pointer to access fixed objects and the stack pointer to access non-fixed objects.
|---------------------------------| <- frame pointer (fp)
| scalar callee-saved registers   |
|---------------------------------|
| scalar local variables          |
|---------------------------------|
| ///// realignment /////         |
|---------------------------------|
| scalar outgoing arguments       |
|---------------------------------|
| RVV local variables &&          |
| RVV outgoing arguments          |
|---------------------------------| <- end of frame (sp)
If there are both realignment and a variable length array in the stack, we will use the frame pointer to access fixed objects and the base pointer to access non-fixed objects.
|---------------------------------| <- frame pointer (fp)
| scalar callee-saved registers   |
|---------------------------------|
| scalar local variables          |
|---------------------------------|
| ///// realignment /////         |
|---------------------------------| <- base pointer (bp)
| RVV local variables &&          |
| RVV outgoing arguments          |
|---------------------------------|
| /////////////////////////////// |
| variable length array           |
| /////////////////////////////// |
|---------------------------------| <- end of frame (sp)
| scalar outgoing arguments       |
|---------------------------------|
In this version, we do not save the addresses of RVV objects in the stack. We access them directly through the polynomial expression (a x VLENB + b). We do not reserve the frame pointer when there is any RVV object in the stack, so we also access the scalar frame objects through the polynomial expression (a x VLENB + b) if the access crosses the RVV stack area. Differential Revision: https://reviews.llvm.org/D94465	2021-02-17 14:05:19 +08:00
Douglas Yung	0e3d7e6186	Fix gcc build after `de3a485d9` due to a gcc bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92598 This should fix gcc based builders such as http://lab.llvm.org:8011/#/builders/76/builds/1683	2021-02-16 21:57:12 -08:00
Tony Tye	c62b737ad6	[AMDGPU] Correct rmw atomics s_waitcnt generation The AMD GPU SIMemoryLegalizer was using the ordering address space rather than the instruction address space when determining the s_waitcnt to generate to ensure that a read-modify-write atomic has completed. This resulted in additional unnecessary counters being waited on. Differential Revision: https://reviews.llvm.org/D96743	2021-02-17 01:32:29 +00:00
Sriraman Tallam	d1a838babc	Basic block sections should enable function sections implicitly. Basic block sections enables function sections implicitly, this is not needed and is inefficient with "=list" option. We had basic block sections enable function sections implicitly in clang. This is particularly inefficient with "=list" option as it places functions that do not have any basic block sections in separate sections. This causes unnecessary object file overhead for large applications. This patch disables this implicit behavior. It only creates function sections for those functions that require basic block sections. Further, there was an inconsistent behavior with llc as llc was not turning on function sections by default. This patch makes llc and clang consistent and tests are added to check the new behavior. This is the first of two patches and this adds functionality in LLVM to create a new section for the entry block if function sections is not enabled. Differential Revision: https://reviews.llvm.org/D93876	2021-02-16 16:27:16 -08:00
Petr Hosek	16af973933	[MC][ELF] Support for zero flag section groups This change introduces support for zero flag ELF section groups to LLVM. LLVM already supports COMDAT sections, which in ELF are a special type of ELF section groups. These are generally useful to enable linker GC where you want a group of sections to always travel together, that is to be either retained or discarded as a whole, but without the COMDAT semantics. Other ELF assemblers already support zero flag ELF section groups and this change helps us reach feature parity. Differential Revision: https://reviews.llvm.org/D95851	2021-02-16 14:23:40 -08:00
Victor Huang	de3a485d9c	[NFC][PPC] Refactor TOC representation to allow several entries for the same symbol We currently represent TOC entries by an MCSymbol. This is not enough in some situations. For example, when accessing an initialized TLS variable v on AIX using the general dynamic model, we need to generate the two following entries for v: .tc .v[TC],v@m .tc v[TC],v One is for the region handle (with the @m relocation), the other is for the variable offset. This refactoring allows storing several entries for the same symbol with different VariantKind in the TOC. If the VariantKind is not specified, we default to VK_None. The AIX TLS implementation using this refactoring to generate the two entries will be posted in a subsequent patch. Patched By: bsaleil Reviewed By: sfertile Differential Revision: https://reviews.llvm.org/D96346	2021-02-16 21:32:16 +00:00
Sterling Augustine	5aa8f4c084	Revert "[DAG] Fold shuffle(bop(shuffle(x,y),shuffle(z,w)),bop(shuffle(a,b),shuffle(c,d)))" This reverts commit `5dfba562dd`. That commit causes an assertion failure with the following repro: typedef long b __attribute__((__vector_size__(16))); b d; b e; b __attribute__((__always_inline__)) c(b h, b i) { return (__attribute__((__vector_size__(8 * sizeof(short)))) short)h + i; } j() { b k, l, m, n, o[6], p, q; m = d[5]; b r = m; b s = f(r, 8); q = s; l = d[1]; p = l; t(q); n = c(m, l); o[1] = c(s, f(p, 8)); k = __builtin_shufflevector(n, o[1], 0, 2); e = __builtin_ia32_psrlwi128(k, j); } ./bin/clang -cc1 -triple x86_64-grtev4-linux-gnu -emit-obj -O1 -std=c99 test.c	2021-02-16 12:48:15 -08:00
Craig Topper	61a238e6e1	[RISCV] Add isel patterns for fixed vector fmsub/fnmadd/fnmsub.	2021-02-16 12:03:33 -08:00
Jessica Paquette	962b73dd0f	Revert "[AArch64][GlobalISel] Fold constants into G_GLOBAL_VALUE" This reverts commit `61b4702a40`. We were seeing some test failures in SPECINT2006 due to this change. Reverting to investigate.	2021-02-16 10:50:12 -08:00
Craig Topper	07ca13fe07	[RISCV] Add support for fixed vector mask logic operations. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D96741	2021-02-16 09:34:00 -08:00
Florian Hahn	211147c5ba	[AArch64] Convert CMP/SELECT sign patterns to OR & ASR. ICMP & SELECT patterns extracting the sign of a value can be simplified to OR & ASR (see https://alive2.llvm.org/ce/z/Xx4iZ0). This does not save any instructions in IR, but it is profitable on AArch64, because we need at least 2 extra instructions to materialize 1 and -1 for the SELECT. The improvements result in ~5% speedups on loops of the form static int sign_of(int x) { if (x < 0) return -1; return 1; } void foo(const int *x, int *res, int cnt) { for (int i=0;i<cnt;i++) res[i] = sign_of(x[i]); } Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D96596	2021-02-16 17:17:34 +00:00
David Green	1e007cf43c	[ARM] Use rGPR for writeback vldrs From what I can tell, a writeback is unpredictable with LR for both loads and stores. This changes the operand from a gprnopc to a rGPR in both cases (which I believe is essentially a NFC due to the tied-def already being a rGPR.) Differential Revision: https://reviews.llvm.org/D96723	2021-02-16 16:44:47 +00:00
Matt Arsenault	a7455d7b7c	AMDGPU: Remove kills following clusters of memory instruction In a future commit, soft clauses will be hinted with kill instructions rather than forced together with bundles. Look for kills that look like this, and erase them. I'm not sure if the check for specific uses is worthwhile, or if it would be better to just unconditionally erase kills. This reduces test churn in a future patch.	2021-02-16 10:49:28 -05:00
Simon Pilgrim	5dfba562dd	[DAG] Fold shuffle(bop(shuffle(x,y),shuffle(z,w)),bop(shuffle(a,b),shuffle(c,d))) Fold shuffle(bop(shuffle(x,y),shuffle(z,w)),bop(shuffle(a,b),shuffle(c,d))) -> bop(shuffle(x,y),shuffle(z,w)),bop(shuffle(a,b),shuffle(c,d)) Attempt to fold from a shuffle of a pair of binops to a binop of shuffles, as long as one/both of the binop sources are also shuffles that can be merged with the outer shuffle. This should guarantee that we remove one binop without introducing any additional shuffles. Technically there's potential for a merged shuffle's lowering to be poorer than the original shuffle, but it could also be better, and I'm not seeing any regressions as long as we keep the 'don't merge splats' rule already present in MergeInnerShuffle. This expands and generalizes an existing X86 combine and attempts to merge either of each binop's sources (with an on-the-fly commutation of the shuffle mask) - we couldn't do that in the x86 version as it had to stay in a form that DAGCombine's MergeInnerShuffle would still recognise. Differential Revision: https://reviews.llvm.org/D96345	2021-02-16 15:46:34 +00:00
Matt Arsenault	c320e8196a	AMDGPU: Fix debug info handling in post-RA bundler This was allowing debug instructions to break the bundling, which would change scheduling behavior. Bundle debug info / kills inside the bundle. This seems to work OK, although the asm printer doesn't understand these in a bundle. This implicitly expects the memory legalizer to unbundle. It would probably be slightly nicer to move these after. Rewrite the loop to be clearer and make sure we don't end a bundle on a meta instruction, only allow them in between other valid bundle instructions.	2021-02-16 10:42:06 -05:00
David Truby	e86f9ba15c	[llvm][Aarch64][SVE] Remove extra fmov instruction with certain literals When a literal that cannot fit in the immediate form of the fmov instruction is used to initialise an SVE vector, an extra unnecessary fmov is currently generated. This patch adds an extra codegen pattern preventing the extra instruction from being generated. Differential Revision: https://reviews.llvm.org/D96700 Co-Authored-By: Paul Walker <paul.walker@arm.com>	2021-02-16 14:16:33 +00:00
Kerry McLaughlin	ba1e150d03	[SVE] Add support for scalable vectorization of loops with int/fast FP reductions This patch enables scalable vectorization of loops with integer/fast reductions, e.g: ``` unsigned sum = 0; for (int i = 0; i < n; ++i) { sum += a[i]; } ``` A new TTI interface, isLegalToVectorizeReduction, has been added to prevent reductions which are not supported for scalable types from vectorizing. If the reduction is not supported for a given scalable VF, computeFeasibleMaxVF will fall back to using fixed-width vectorization. Reviewed By: david-arm, fhahn, dmgreen Differential Revision: https://reviews.llvm.org/D95245	2021-02-16 13:50:06 +00:00
Fraser Cormack	04977ce5ce	[RISCV] Fix a crash in fixed-length build_vector lowering Non-splatted non-integer build_vector nodes were mistakenly being lowered as VID expressions, which should not happen. VID can only be used to select integer build_vector nodes. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D96718	2021-02-16 10:25:15 +00:00
Fraser Cormack	b870199020	[RISCV] Add patterns for scalable-vector fabs & fcopysign The patterns mostly follow the scalar counterparts, save for some extra optimizations to match the vector/scalar forms. The patch adds a DAGCombine for ISD::FCOPYSIGN to try and reorder ISD::FNEG around any ISD::FP_EXTEND or ISD::FP_TRUNC of the second operand. This helps us achieve better codegen to match vfsgnjn. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D96028	2021-02-16 10:21:09 +00:00
Craig Topper	29b894a8d3	[RISCV] Add explicit i32/i64 types to RV32 or RV64 only isel patterns. NFC This stops tablegen from generating patterns with the opposite type in the opposite HwMode. This just adds wasted bytes to the isel table. This reduces the isel table by about 1800 bytes.	2021-02-15 14:36:05 -08:00
Matt Arsenault	392e0fcfd1	GlobalISel: Handle arguments partially passed on the stack The API is a bit awkward since you need to index into an array in the passed struct. I guess an alternative would be to pass all of the individual fields.	2021-02-15 17:06:14 -05:00
Craig Topper	7ba2e1c601	[RISCV] Add support for fixed vector floating point setcc. This is annoying because the condition code legalization belongs to LegalizeDAG, but our custom handler runs in Legalize vector ops which occurs earlier. This adds some of the mask binary operations so that we can combine multiple compares that we need for expansion. I've also fixed up RISCVISelDAGToDAG.cpp to handle copies of masks. This patch contains a subset of the integer setcc patch as well. That patch is dependent on the integer binary ops patch. I'll rebase based on what order the patches go in. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D96567	2021-02-15 12:52:25 -08:00
Duncan P. N. Exon Smith	22a52dfddc	TransformUtils: Fix metadata handling in CloneModule (and improve CloneFunctionInto) This commit fixes how metadata is handled in CloneModule to be sound, and improves how it's handled in CloneFunctionInto (although the latter is still awkward when called within a module). Ruiling Song pointed out in PR48841 that CloneModule was changed to unsoundly use the RF_ReuseAndMutateDistinctMDs flag (renamed in `fa35c1f80f` for clarity). This flag papered over a crash caused by other various changes made to CloneFunctionInto over the past few years that made it unsound to use cloning between different modules. (This commit partially addresses PR48841, fixing the repro from preprocessed source but not textual IR. MDNodeMapper::mapDistinctNode became unsound in `df763188c9` and this commit does not address that regression.) RF_ReuseAndMutateDistinctMDs is designed for the IRMover to use, avoiding unnecessary clones of all referenced metadata when linking between modules (with IRMover, the source module is discarded after linking). It never makes sense to use when you're not discarding the source. This commit drops its incorrect use in CloneModule. Sadly, the right thing to do with metadata when cloning a function is complicated, and this patch doesn't totally fix it. The first problem is that there are two different types of referenceable metadata and it's not obvious what to do with one of them when remapping. - `!0 = !{!1}` is metadata's version of a constant. Programmatically it's called "uniqued" (probably a better term would be "constant") because, like `ConstantArray`, it's stored in uniquing tables. Once it's constructed, it's illegal to change its arguments. - `!0 = distinct !{!1}` is a bit closer to a global variable. It's legal to change the operands after construction. What should be done with distinct metadata when cloning functions within the same module? - Should new, cloned nodes be created? 
- Should all references point to the same, old nodes? The answer depends on whether that metadata is effectively owned by a function. And that's the second problem. Referenceable metadata's ownership model is not clear or explicit. Technically, it's all stored on an LLVMContext. However, any metadata that is `distinct`, that transitively references a `distinct` node, or that transitively references a GlobalValue is specific to a Module and is effectively owned by it. More specifically, some metadata is effectively owned by a specific Function within a module. Effectively function-local metadata was introduced somewhere around `c10d0e5ccd`, which made it illegal for two functions to share a DISubprogram attachment. When cloning a function within a module, you need to clone the function-local debug info and suppress cloning of global debug info (the status quo suppresses cloning some global debug info but not all). When cloning a function to a new/different module, you need to clone all of the debug info. Here's what I think we should do (eventually? soon? not this patch though): - Distinguish explicitly (somehow) between pure constant metadata owned by the LLVMContext, global metadata owned by the Module, and local metadata owned by a GlobalValue (such as a function). - Update CloneFunctionInto to trigger cloning of all "local" metadata (only), perhaps by adding a bit to RemapFlag. Alternatively, split out a separate function CloneFunctionMetadataInto to prime the metadata map that callers are updated to call ahead of time as appropriate. Here's the somewhat more isolated fix in this patch: - Converted the `ModuleLevelChanges` parameter to `CloneFunctionInto` to an enum called `CloneFunctionChangeType` that is one of LocalChangesOnly, GlobalChanges, DifferentModule, and ClonedModule. - The code maintaining the "functions uniquely own subprograms" invariant is now only active in the first two cases, where a function is being cloned within a single module. 
That's necessary because this code inhibits cloning of (some) "global" metadata that's effectively owned by the module. - The code maintaining the "all compile units must be explicitly referenced by !llvm.dbg.cu" invariant is now only active in the DifferentModule case, where a function is being cloned into a new module in isolation. - CoroSplit.cpp's call to CloneFunctionInto in CoroCloner::create uses LocalChangesOnly, since `fa635d730f` only set `ModuleLevelChanges` to trigger cloning of local metadata. - CloneModule drops its unsound use of RF_ReuseAndMutateDistinctMDs and special handling of !llvm.dbg.cu. - Fixed some outdated header docs and left a couple of FIXMEs. Differential Revision: https://reviews.llvm.org/D96531	2021-02-15 11:56:00 -08:00
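The per-case behaviour described in 22a52dfddc can be summarized with a small Python model. It only restates the rules given in the message. `CloneFunctionChangeType` and its four values come from the commit, but the two predicate functions are hypothetical names, not LLVM APIs.

```python
from enum import Enum, auto

class CloneFunctionChangeType(Enum):
    """Python stand-in for the C++ enum named in the commit message."""
    LocalChangesOnly = auto()
    GlobalChanges = auto()
    DifferentModule = auto()
    ClonedModule = auto()

def maintains_unique_subprograms(kind):
    # "Functions uniquely own subprograms": only active when a function is
    # cloned within a single module (the first two cases).
    return kind in (CloneFunctionChangeType.LocalChangesOnly,
                    CloneFunctionChangeType.GlobalChanges)

def maintains_dbg_cu_list(kind):
    # "All compile units must be referenced by !llvm.dbg.cu": only active when
    # a function is cloned into a new module in isolation.
    return kind is CloneFunctionChangeType.DifferentModule

for kind in CloneFunctionChangeType:
    print(kind.name, maintains_unique_subprograms(kind), maintains_dbg_cu_list(kind))
```

Under these rules, ClonedModule is the only case in which neither invariant-maintaining path runs. That is consistent with CloneModule also dropping its special handling of !llvm.dbg.cu.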
Stanislav Mekhanoshin	5cf9292ce3	[AMDGPU] Add two TSFlags: IsAtomicNoRtn and IsAtomicRtn We are using the AtomicNoRet map in multiple places to determine whether an instruction is atomic and, if so, whether it is the returning (rtn) or non-returning (nortn) form. This does not always work, since some instructions have only an rtn or only a nortn version. One such instruction is ds_wrxchg_rtn_b32, which has no nortn version. This has caused changes in memory legalizer tests. Differential Revision: https://reviews.llvm.org/D96639	2021-02-15 11:27:59 -08:00
Florian Hahn	ca23b2c8ed	[AArch64] Move machine bundle unpacking to PreEmit2 phase. This patch adjusts the placement of the bundle unpacking to just before code emission. In particular, this means bundle unpacking happens AFTER the machine outliner. With the previous position, the machine outliner may outline parts of a bundle, which breaks them up. This is an issue for BLR_RVMARKER handling, as illustrated by the rvmarker-pseudo-expansion-and-outlining.mir test case. The machine outliner should not break up the bundles created during pseudo expansion. This should fix PR49082. Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D96294	2021-02-15 16:10:43 +00:00
David Green	0a98efb049	[ARM] Add some basic Min/Max costs This adds basic MVE costs for SMIN/SMAX/UMIN/UMAX, as well as MINNUM and MAXNUM representing fmin and fmax. It tightens up the costs, no longer using an ICmp+Select cost. Differential Revision: https://reviews.llvm.org/D96603	2021-02-15 15:06:19 +00:00
Caroline Concatto	b52e6c5891	[CostModel]Add cost model for experimental.vector.reverse This patch uses the function getShuffleCost with SK_Reverse to compute the cost for experimental.vector.reverse. For scalable vector types, it adds a table with the legal types to AArch64TTIImpl::getShuffleCost so that BasicTTIImpl::getShuffleCost does not assert; for fixed vectors, it relies on the existing cost model in BasicTTIImpl. Depends on D94883 Differential Revision: https://reviews.llvm.org/D95603	2021-02-15 14:23:57 +00:00
Caroline Concatto	2d728bbff5	[CodeGen][SelectionDAG]Add new intrinsic experimental.vector.reverse This patch adds a new intrinsic experimental.vector.reverse that takes a single vector and returns a vector of matching type but with the original lane order reversed. For example: ``` vector.reverse(<A,B,C,D>) ==> <D,C,B,A> ``` The new intrinsic supports fixed and scalable vector types. Fixed-width vectors rely on shufflevector to maintain existing behaviour. Scalable vectors use the new ISD node VECTOR_REVERSE. This new intrinsic is one of the named shufflevector intrinsics proposed on the mailing-list in the RFC at [1]. Patch by Paul Walker (@paulwalker-arm). [1] https://lists.llvm.org/pipermail/llvm-dev/2020-November/146864.html Differential Revision: https://reviews.llvm.org/D94883	2021-02-15 13:39:43 +00:00
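Here is a Python model of the fixed-width path in 2d728bbff5, where the reverse becomes a shufflevector with a descending constant mask. A constant mask like this cannot be written for a scalable vector, whose lane count is unknown at compile time, which is presumably why the scalable case uses the new VECTOR_REVERSE node instead. All function names are illustrative only.

```python
def reverse_mask(n):
    """Shuffle mask that reverses an n-lane vector: <n-1, ..., 1, 0>."""
    return list(range(n - 1, -1, -1))

def shufflevector(v1, v2, mask):
    """Model of a two-input shuffle: index i selects lane i of concat(v1, v2)."""
    lanes = list(v1) + list(v2)
    return [lanes[i] for i in mask]

def vector_reverse(v):
    # Fixed-width case: a single-source shuffle with a descending mask.
    return shufflevector(v, v, reverse_mask(len(v)))

print(reverse_mask(4))                       # [3, 2, 1, 0]
print(vector_reverse(["A", "B", "C", "D"]))  # ['D', 'C', 'B', 'A']
```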
David Green	a838a4f69f	[ARM] Extend search for increment in load/store optimizer Currently findIncDecAfter only looks at the next instruction for post-inc candidates in the load/store optimizer. This extends that to a search through the current BB, until an instruction that modifies or uses the increment reg is found. This allows more post-inc load/stores and ldm/stm's to be created, especially in cases where a schedule might move instructions further apart. We make sure not to look any further for an SP, as that might invalidate stack slots that are still in use. Differential Revision: https://reviews.llvm.org/D95881	2021-02-15 13:17:21 +00:00
Sjoerd Meijer	357237e93e	Recommit "[TTI] Unify FavorPostInc and FavorBackedgeIndex into getPreferredAddressingMode" This reverts commit `effc3b0799`, with the build problem fixed.	2021-02-15 11:33:00 +00:00
Sjoerd Meijer	effc3b0799	Revert "[TTI] Unify FavorPostInc and FavorBackedgeIndex into getPreferredAddressingMode" This reverts commit `cd6de0e8de`.	2021-02-15 11:01:23 +00:00
Sjoerd Meijer	cd6de0e8de	[TTI] Unify FavorPostInc and FavorBackedgeIndex into getPreferredAddressingMode This refactors shouldFavorPostInc() and shouldFavorBackedgeIndex() into getPreferredAddressingMode() so that we have one interface to steer LSR in generating the preferred addressing mode. Differential Revision: https://reviews.llvm.org/D96600	2021-02-15 10:44:15 +00:00
Fraser Cormack	4bd5bd4009	[RISCV] Convert VSLIDE(UP|DOWN) nodes to "VL" versions (NFC) This patch converts the RISCV VSLIDEUP and VSLIDEDOWN custom nodes into ones carrying additional mask and vector-length operands. This is primarily so they can be used by both systems. This also takes the opportunity to create some helper functions to deal with the common task of getting the default (unmasked) VL operands. Reviewed By: craig.topper, arcbbb Differential Revision: https://reviews.llvm.org/D96505	2021-02-15 10:32:56 +00:00
Arlo Siemsen	080866470d	Add ehcont section support In the future Windows will enable Control-flow Enforcement Technology (CET aka shadow stacks). To protect the path where the context is updated during exception handling, the binary is required to enumerate valid unwind entrypoints in a dedicated section, which is validated when the context is being set during exception handling. This change allows llvm to generate the section that contains the appropriate symbol references in the form expected by the msvc linker. This feature is enabled through a new module flag, ehcontguard, which was modelled on the cfguard flag. The change includes a test checking that, when the module flag is enabled, the section is correctly generated. The set of exception continuation information includes returns from exceptional control flow (catchret in llvm). In order to collect catchret we: 1) include an additional flag on machine basic blocks to indicate that the given block is the target of a catchret operation, 2) introduce a new machine function pass to insert and collect symbols at the start of each block, and 3) combine these targets with the other EHCont targets that were already being collected. Change originally authored by Daniel Frampton <dframpto@microsoft.com> For more details, see MSVC documentation for `/guard:ehcont` https://docs.microsoft.com/en-us/cpp/build/reference/guard-enable-eh-continuation-metadata Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D94835	2021-02-15 14:27:12 +08:00
Carl Ritson	aef781b47a	[AMDGPU] Add llvm.amdgcn.wqm.demote intrinsic Add an intrinsic which demotes all active lanes to helper lanes. This is used to implement the demote-to-helper Vulkan extension. In practice, demoting a lane to helper simply means removing it from the mask of live lanes used for WQM/WWM/Exact mode. Where the shader does not use WQM, demotes just become kills. Additionally, add the llvm.amdgcn.live.mask intrinsic to complement demote operations. In theory llvm.amdgcn.ps.live can be used to detect helper lanes; however, ps.live can be moved by LICM. The movement of ps.live cannot be remedied without changing its type signature, and such a change would require ps.live users to update as well. Reviewed By: piotr Differential Revision: https://reviews.llvm.org/D94747	2021-02-15 08:45:46 +09:00
Tony Tye	8a91b68b95	[AMDGPU] Limit memory scope for scratch, LDS and GDS Changes for AMD GPU SIMemoryLegalizer: - Limit the memory scope to maximum supported by the scratch, LDS and GDS address spaces. - Improve assertion checking. - Correct toSIAtomicScope argument name. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D96643	2021-02-14 17:34:12 +00:00
Kazu Hirata	b4c0d610a6	[AMDGPU] Fix build breakage	2021-02-14 09:02:55 -08:00
Kazu Hirata	910e2d1e57	[llvm] Use llvm::is_contained (NFC)	2021-02-14 08:36:20 -08:00
Ben Shi	efb1cb752b	[AVR] Fix a bug in 16-bit shifts Reviewed By: aykevl Differential Revision: https://reviews.llvm.org/D96590	2021-02-14 11:54:55 +08:00
Craig Topper	3520371ddb	[RISCV] Rename the RVVBaseAddr ComplexPattern to just BaseAddr and use it to merge some scalar load/store patterns too.	2021-02-13 12:01:51 -08:00
Heejin Ahn	35f5f797a6	[WebAssembly] Fix rethrow's argument computation Previously we assumed `rethrow`'s argument was always 0, but it turned out `rethrow` follows the same rule as `br` or `delegate`: https://github.com/WebAssembly/exception-handling/pull/137 https://github.com/WebAssembly/exception-handling/issues/146#issuecomment-777349038 Currently the `rethrow`s generated by our backend always rethrow the exception caught by the innermost enclosing catch, so this adds a function to compute that and replaces `rethrow`'s argument with its computed result. This also renames `EHPadStack` in `InstPrinter` to `TryStack`, because in CFGStackify we use `EHPadStack` to mean the range between `catch`~`end`, while in `InstPrinter` we used it to mean the range between `try`~`catch`, so different names are clearer. This makes no functional changes in `InstPrinter`. Reviewed By: dschuff Differential Revision: https://reviews.llvm.org/D96595	2021-02-13 03:43:15 -08:00
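The `br`-style rule that 35f5f797a6 adopts for `rethrow` can be sketched as a relative-depth lookup. Depth 0 is the innermost enclosing construct, and each label between the `rethrow` and its target adds one. Here is a minimal sketch under that assumption. `relative_depth` is a hypothetical helper, not the backend function the commit adds, and the labels are illustrative.

```python
def relative_depth(label_stack, target):
    """Hypothetical helper: depth of `target` counted from the innermost
    enclosing label (depth 0), the way `br` immediates are computed."""
    for depth, label in enumerate(reversed(label_stack)):
        if label == target:
            return depth
    raise ValueError(f"{target!r} is not an enclosing label")

# Enclosing constructs, outermost first: a rethrow inside try_A's catch,
# nested in a block and a loop.
stack = ["try_A", "block_x", "loop_y"]
print(relative_depth(stack, "loop_y"))  # 0
print(relative_depth(stack, "try_A"))   # 2
```

The second call shows why a hard-coded 0 is wrong. Once other blocks sit between the `rethrow` and the catch it refers to, the immediate must count them.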
Kazu Hirata	96c90a6d14	[AMDGPU] Drop unnecessary const from a return type (NFC) Identified with readability-const-return-type.	2021-02-12 23:44:32 -08:00
Serge Pavlov	816053bc71	[FPEnv][ARM] Implement lowering of llvm.set.rounding Differential Revision: https://reviews.llvm.org/D96501	2021-02-13 11:16:29 +07:00
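The commit message for 816053bc71 gives no details, so the following is only a sketch of the kind of mapping such a lowering needs. It assumes `llvm.set.rounding` uses LLVM's rounding-mode values (0 toward zero, 1 nearest ties-to-even, 2 toward +inf, 3 toward -inf). It also assumes ARM's FPSCR.RMode field sits at bits [23:22] and encodes 0b00 nearest, 0b01 +inf, 0b10 -inf, 0b11 zero. The `(mode - 1) & 3` rotation is one way to convert between the two; it is not necessarily what the patch emits, and the function names are hypothetical.

```python
# Assumed LLVM rounding-mode values accepted by llvm.set.rounding.
TOWARD_ZERO, NEAREST_EVEN, TOWARD_POS_INF, TOWARD_NEG_INF = 0, 1, 2, 3

# ARM FPSCR.RMode encodings, stored in FPSCR bits [23:22].
RN, RP, RM, RZ = 0b00, 0b01, 0b10, 0b11
RMODE_SHIFT = 22
RMODE_MASK = 0b11 << RMODE_SHIFT

def llvm_to_arm_rmode(mode):
    """Map 0..3 to RZ, RN, RP, RM with a rotate-by-one: (mode - 1) & 3."""
    if mode not in range(4):
        raise ValueError("only modes 0..3 have an FPSCR.RMode encoding")
    return (mode - 1) & 3

def fpscr_with_rounding(fpscr, mode):
    """Return fpscr with its RMode field replaced; all other bits are kept."""
    return (fpscr & ~RMODE_MASK) | (llvm_to_arm_rmode(mode) << RMODE_SHIFT)

print([llvm_to_arm_rmode(m) for m in range(4)])  # [3, 0, 1, 2]
print(hex(fpscr_with_rounding(0, TOWARD_ZERO)))  # 0xc00000
```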
Craig Topper	532d4bf025	[RISCV] Move riscv_vfmv_v_f_vl patterns to RISCVInstrInfoVVLPatterns.td for consistency with riscv_vmv_v_x_vl. NFC	2021-02-12 16:08:27 -08:00
Craig Topper	4220a81c84	[RISCV] Add support for fixed vector fabs	2021-02-12 15:33:36 -08:00
Craig Topper	36658376d5	[RISCV] Add support for fixed vector sqrt.	2021-02-12 15:33:29 -08:00
Jessica Paquette	61b4702a40	[AArch64][GlobalISel] Fold constants into G_GLOBAL_VALUE This is pretty much just a port of `performGlobalAddressCombine` from AArch64ISelLowering. (AArch64 doesn't use the generic DAG combine for this.) This adds a pre-legalize combine which looks for this pattern: ``` %g = G_GLOBAL_VALUE @x %ptr1 = G_PTR_ADD %g, cst1 %ptr2 = G_PTR_ADD %g, cst2 ... %ptrN = G_PTR_ADD %g, cstN ``` And then, if possible, transforms it like so: ``` %g = G_GLOBAL_VALUE @x + min(cst) %offset_g = G_PTR_ADD %g, -min(cst) %ptr1 = G_PTR_ADD %offset_g, cst1 %ptr2 = G_PTR_ADD %offset_g, cst2 ... %ptrN = G_PTR_ADD %offset_g, cstN ``` where min(cst) is the smallest of the G_PTR_ADD constants. This means we should save at least one G_PTR_ADD. This also updates code in the legalizer + selector which assumes that G_GLOBAL_VALUE will never have an offset and adds/updates relevant tests. Differential Revision: https://reviews.llvm.org/D96624	2021-02-12 14:55:15 -08:00
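Here is a small address-arithmetic model of the rewrite in 61b4702a40. It assumes, as the port of `performGlobalAddressCombine` and the note about G_GLOBAL_VALUE gaining an offset suggest, that the global itself is materialized with `+min(cst)`, so that adding `-min(cst)` recovers `@x`. Addresses are plain integers, and `fold_global_offset` is a hypothetical name.

```python
def fold_global_offset(x_addr, csts):
    """Model the rewrite on integers: fold min(cst) into the global, undo it
    once with a single negative G_PTR_ADD, then reuse the original constants."""
    m = min(csts)
    g = x_addr + m        # G_GLOBAL_VALUE @x + min(cst)
    offset_g = g - m      # G_PTR_ADD %g, -min(cst): equals @x again
    return g, [offset_g + c for c in csts]

x = 0x1000
csts = [8, 4, 12]
g, ptrs = fold_global_offset(x, csts)
print(hex(g))                  # 0x1004
print([hex(p) for p in ptrs])  # ['0x1008', '0x1004', '0x100c']
```

Every pointer keeps its original value. If later combines then fold each `G_PTR_ADD %offset_g, cst` into `%g + (cst - min(cst))`, all remaining offsets are non-negative, and the use whose constant equals the minimum needs no add at all. That is presumably where the saved G_PTR_ADD comes from.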
Craig Topper	d32ed9b27e	[RISCV] Use a ComplexPattern to merge the PatFrags for removing unneeded masks on shift amounts. Rather than having patterns with and without an AND, use a ComplexPattern to handle both cases. Reduces the isel table by about 700 bytes.	2021-02-12 14:03:23 -08:00
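The masks that d32ed9b27e strips are redundant because RV64 shift instructions read only the low log2(XLEN) = 6 bits of the shift amount. Here is a minimal Python model showing that an explicit `and amt, 63` cannot change the result. `rv64_sll` is a hypothetical helper, and the model ignores the word-sized (`*W`) shift variants, which use 5 bits.

```python
XLEN = 64
MASK = (1 << XLEN) - 1

def rv64_sll(x, amt):
    """Model of RV64 SLL: only the low 6 bits of the shift amount are used."""
    return (x << (amt & (XLEN - 1))) & MASK

# An explicit `and amt, 63` before the shift is redundant, so isel can drop it.
value = 0x0123456789ABCDEF
for amt in range(256):
    assert rv64_sll(value, amt & 63) == rv64_sll(value, amt)
print(rv64_sll(1, 65))  # 2
```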
Stanislav Mekhanoshin	c96e214b9c	[AMDGPU] Fix Windows build A trivial fix: a 64-bit constant must be written as 1ull, not 1ul, on Windows, where long is 32 bits. Fixed build broken by `c0d7a8bc62`.	2021-02-12 12:30:52 -08:00
Amara Emerson	5d6d9b63a3	[GlobalISel] Propagate extends through G_PHIs into the incoming value blocks. This combine tries to do inter-block hoisting of extends of G_PHIs into the originating blocks of the phi's incoming values. The idea is to expose further optimization opportunities that are normally obscured by the PHI. Some basic heuristics and a target hook for AArch64 are added to allow tuning. E.g. if the extend is used by a G_PTR_ADD, it doesn't perform this combine, since the extend may be folded into the addressing mode during selection. There are very minor code size improvements on AArch64 -Os, but the real benefit is that it unlocks optimizations like AArch64 conditional compares on some benchmarks. Differential Revision: https://reviews.llvm.org/D95703	2021-02-12 11:52:52 -08:00
David Green	875f0cbcc6	[ARM] Optimize fp store of extract to integer store if already available. Given a floating point store from an extracted vector, with an integer VGETLANE that already exists, storing the existing VGETLANEu directly can be better for performance. As the value is known to already be in an integer register, this can help reduce fp register pressure, removes the need for the fp extract and allows use of more integer post-inc stores not available with vstr. This can be a bit narrow in scope, but helps with certain biquad kernels that store shuffled vector elements. Differential Revision: https://reviews.llvm.org/D96159	2021-02-12 18:34:58 +00:00
Simon Pilgrim	4841a225b7	[DAG] Move basic USUBSAT pattern matches from X86 to DAGCombine Begin transitioning the X86 vector code to recognise sub(umax(a,b),b) or sub(a,umin(a,b)) USUBSAT patterns to make it more generic and available to all targets. This initial patch just moves the basic umin/umax patterns to DAG, removing some vector-only checks on the way - these are some of the patterns that the legalizer will try to expand back to, so we can be reasonably relaxed about matching these pre-legalization. We can handle the trunc(sub(..)) variants as well, which helps with patterns where we were promoting to a wider type to detect overflow/saturation. The remaining x86 code requires some cleanup first - some of it isn't actually tested etc. I also need to resurrect D25987. Differential Revision: https://reviews.llvm.org/D96413	2021-02-12 18:22:57 +00:00
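Both patterns that 4841a225b7 moves into DAGCombine compute an unsigned saturating subtract. This Python check confirms the identities exhaustively for 8-bit unsigned values. Both forms always produce a result in [0, a], so no wrap-around can occur, which is why plain integer arithmetic is a faithful model here. The function names are illustrative only.

```python
def usubsat(a, b):
    """Unsigned saturating subtract: a - b clamped at 0."""
    return a - b if a > b else 0

def sub_umax(a, b):
    return max(a, b) - b   # sub(umax(a,b),b)

def sub_umin(a, b):
    return a - min(a, b)   # sub(a,umin(a,b))

# Exhaustive check over all 8-bit unsigned pairs.
for a in range(256):
    for b in range(256):
        assert sub_umax(a, b) == usubsat(a, b) == sub_umin(a, b)
print(usubsat(3, 5), usubsat(5, 3))  # 0 2
```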
Stanislav Mekhanoshin	c0d7a8bc62	[AMDGPU] Allow accvgpr_read/write decode with opsel These two instructions are VOP3P and have op_sel_hi bits, but do not use them. It is recommended to set unused op_sel_hi bits to 1. However, we cannot decode both representations (with 1 and with 0) if the bits are set to the default value 1. If the bits are instead marked as ignored with the '?' initializer, encoding defaults them to 0. The patch is a hack to force ignored '?' bits to 1 on encoding for these instructions. Canonicalization still happens on disasm print if incoming values are non-default, so the disasm output does not match the binary input, but this is a pre-existing problem for all instructions with '?' bits. Fixes: SWDEV-272540 Differential Revision: https://reviews.llvm.org/D96543	2021-02-12 10:04:47 -08:00
Akira Hatanaka	ed4718eccb	[ObjC][ARC] Use operand bundle 'clang.arc.attachedcall' instead of explicitly emitting retainRV or claimRV calls in the IR Background: This fixes a longstanding problem where llvm breaks ARC's autorelease optimization (see the link below) by separating calls from the marker instructions or retainRV/claimRV calls. The backend changes are in https://reviews.llvm.org/D92569. https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue What this patch does to fix the problem: - The front-end adds operand bundle "clang.arc.attachedcall" to calls, which indicates the call is implicitly followed by a marker instruction and an implicit retainRV/claimRV call that consumes the call result. In addition, it emits a call to @llvm.objc.clang.arc.noop.use, which consumes the call result, to prevent the middle-end passes from changing the return type of the called function. This is currently done only when the target is arm64 and the optimization level is higher than -O0. - ARC optimizer temporarily emits retainRV/claimRV calls after the calls with the operand bundle in the IR and removes the inserted calls after processing the function. - ARC contract pass emits retainRV/claimRV calls after the call with the operand bundle. It doesn't remove the operand bundle on the call since the backend needs it to emit the marker instruction. The retainRV and claimRV calls are emitted late in the pipeline to prevent optimization passes from transforming the IR in a way that makes it harder for the ARC middle-end passes to figure out the def-use relationship between the call and the retainRV/claimRV calls (which is the cause of PR31925). - The function inliner removes an autoreleaseRV call in the callee if nothing in the callee prevents it from being paired up with the retainRV/claimRV call in the caller. It then inserts a release call if claimRV is attached to the call since autoreleaseRV+claimRV is equivalent to a release. 
If it cannot find an autoreleaseRV call, it tries to transfer the operand bundle to a function call in the callee. This is important since the ARC optimizer can remove the autoreleaseRV returning the callee result, which makes it impossible to pair it up with the retainRV/claimRV call in the caller. If that fails, it simply emits a retain call in the IR if retainRV is attached to the call and does nothing if claimRV is attached to it. - SCCP refrains from replacing the return value of a call with a constant value if the call has the operand bundle. This ensures the call always has at least one user (the call to @llvm.objc.clang.arc.noop.use). - This patch also fixes a bug in replaceUsesOfNonProtoConstant where multiple operand bundles of the same kind were being added to a call. Future work: - Use the operand bundle on x86-64. - Fix the auto upgrader to convert call+retainRV/claimRV pairs into calls with the operand bundles. rdar://71443534 Differential Revision: https://reviews.llvm.org/D92808	2021-02-12 09:51:57 -08:00
Craig Topper	1697cc78b1	[RISCV] Add support for integer fixed vector setcc I believe I've covered all orderings of splat operands here. Better canonicalization in lowering might help reduce this. I did not handle the immediate adjustments needed for set(u)gt/set(u)lt. Testing here is limited to byte types because the scalable vector type used for masks for the store is calculated assuming 8 byte elements. But for the setcc it's based on the element count of the container type for the setcc input. So they don't agree. We'll need to enhance D96352 to handle this, I think. Differential Revision: https://reviews.llvm.org/D96443	2021-02-12 09:29:41 -08:00
Craig Topper	875c76de2b	[RISCV] Add support for matching .vx and .vi forms of binary instructions for fixed vectors. Unlike scalable vectors, I'm only using a ComplexPattern for the immediate itself. The vmv_v_x is matched explicitly. We ignore the VL argument when matching a binary operator, but we do check it when matching a splat directly. I left out tests for vXi64 as they fail on rv32 right now. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D96365	2021-02-12 09:18:10 -08:00
Jay Foad	7e9ceed9a2	[TableGen][GlobalISel] Allow duplicate RendererFns Allow different GICustomOperandRenderers to use the same RendererFn. This avoids the need for targets to define a bunch of identical C++ renderer functions with different names. Without this fix TableGen would have emitted code that tried to define the GICR enumeration with duplicate enumerators. Differential Revision: https://reviews.llvm.org/D96587	2021-02-12 15:05:32 +00:00
David Green	541828e35d	[ARM] Single source VMOVNT Our current lowering of VMOVNT goes via a shuffle vector of the form <0, N, 2, N+2, 4, N+4, ..>. That can of course also be a single input shuffle of the form <0, 0, 2, 2, 4, 4, ..>, where we use a VMOVNT to insert a vector into the top lanes of itself. This adds lowering of that case, re-using the existing isVMOVNMask. Differential Revision: https://reviews.llvm.org/D96065	2021-02-12 14:28:57 +00:00
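The two shuffle masks in 541828e35d can be compared directly. This Python model builds both masks and shows that, when both shuffle operands are the same vector, they select the same lanes. `vmovnt_mask` and `shufflevector` are illustrative helpers, not the `isVMOVNMask` implementation.

```python
def vmovnt_mask(n, single_source=False):
    """Masks from the commit: <0, N, 2, N+2, ...> or, single-source, <0, 0, 2, 2, ...>."""
    mask = []
    for i in range(0, n, 2):
        mask += [i, i if single_source else n + i]
    return mask

def shufflevector(v1, v2, mask):
    """Two-input shuffle: index i selects lane i of concat(v1, v2)."""
    lanes = list(v1) + list(v2)
    return [lanes[i] for i in mask]

a = ["a0", "a1", "a2", "a3"]
print(vmovnt_mask(4))        # [0, 4, 2, 6]
print(vmovnt_mask(4, True))  # [0, 0, 2, 2]
# With both operands equal, the two masks select the same lanes.
assert shufflevector(a, a, vmovnt_mask(4)) == shufflevector(a, a, vmovnt_mask(4, True))
```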
Sanjay Patel	79b1b4a581	[Vectorizers][TTI] remove option to bypass creation of vector reduction intrinsics The vector reduction intrinsics started life as experimental ops, so backend support was lacking. As part of promoting them to 1st-class intrinsics, however, codegen support was added/improved: D58015 D90247 So I think it is safe to now remove this complication from IR. Note that we still have an IR-level codegen expansion pass for these as discussed in D95690. Removing that is another step in simplifying the logic. Also note that x86 was already unconditionally forming reductions in IR, so there should be no difference for x86. I spot checked a couple of the tests here by running them through opt+llc and did not see any asm diffs. If we do find functional differences for other targets, it should be possible to (at least temporarily) restore the shuffle IR with the ExpandReductions IR pass. Differential Revision: https://reviews.llvm.org/D96552	2021-02-12 08:13:50 -05:00
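To show what 79b1b4a581 stops emitting, here is a Python sketch of the usual log2 shuffle-and-add reduction that can be written as plain IR instead of a reduction intrinsic. The halving shape is an assumption about that IR, shown for an integer add reduction only, and `shuffle_reduce_add` is a hypothetical name.

```python
def shuffle_reduce_add(v):
    """Log2 shuffle reduction: repeatedly add the upper half onto the lower half."""
    v = list(v)
    n = len(v)
    if n == 0 or n & (n - 1):
        raise ValueError("lane count must be a power of two")
    while n > 1:
        half = n // 2
        # One step: shuffle the upper half down, then a lane-wise add.
        v = [v[i] + v[i + half] for i in range(half)]
        n = half
    return v[0]

print(shuffle_reduce_add([1, 2, 3, 4, 5, 6, 7, 8]))  # 36
```

Each step halves the live lanes, so an N-lane reduction takes log2(N) shuffle+add pairs. A single reduction intrinsic keeps that choice in the backend instead.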
luxufan	feaf1d81e3	[RISCV] Change parseVTypeI function Change the parseVTypeI function so that the added vset instruction test cases report more specific error messages. Differential Revision: https://reviews.llvm.org/D96218	2021-02-12 19:38:34 +08:00
Fraser Cormack	e88da1d677	[RISCV] Add support for integer fixed min/max This patch extends the initial fixed-length vector support to include smin, smax, umin, and umax. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D96491	2021-02-12 09:19:45 +00:00
Heejin Ahn	2968611fda	[WebAssembly] Fix delegate's argument computation I previously assumed `delegate`'s immediate argument computation followed a different rule than that of branches, but we agreed to make it the same (https://github.com/WebAssembly/exception-handling/issues/146). This removes the need for a separate `DelegateStack` in both CFGStackify and InstPrinter. When computing the immediate argument, we use a different function for `delegate` because in MIR a `DELEGATE` instruction's destination is the destination catch BB or delegate BB, and when it is a catch BB, we need an additional step of getting its corresponding `end` marker. Reviewed By: tlively, dschuff Differential Revision: https://reviews.llvm.org/D96525	2021-02-11 21:57:28 -08:00
Craig Topper	7a7836b4d8	[RISCV] Add a pattern for a scalable vector mask vnot. We can use a vmnand.mm with the same register for both inputs. This avoids materializing an all-ones constant with vmset.mm.	2021-02-11 15:34:58 -08:00
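The identity behind 7a7836b4d8 is that NAND of a value with itself is its complement. The RVV specification defines the `vmnot.m vd, vs` pseudoinstruction as `vmnand.mm vd, vs, vs`. This Python model treats a mask register as an integer bitmask of `vl` bits and ignores tail handling. It checks the result against the XOR-with-all-ones form that would need `vmset.mm`. All helper names are illustrative only.

```python
def vmnand(a, b, vl):
    """Model of vmnand.mm on an integer bitmask with vl active bits."""
    return ~(a & b) & ((1 << vl) - 1)

def vmxor(a, b, vl):
    return (a ^ b) & ((1 << vl) - 1)

def vmnot(a, vl):
    # vmnot.m vd, vs == vmnand.mm vd, vs, vs: no all-ones operand needed.
    return vmnand(a, a, vl)

vl = 8
m = 0b10110010
all_ones = (1 << vl) - 1   # what vmset.mm would have to materialize
print(bin(vmnot(m, vl)))   # 0b1001101
assert vmnot(m, vl) == vmxor(m, all_ones, vl)
```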

61980 Commits