llvm-project

Commit Graph

Author	SHA1	Message	Date
Heejin Ahn	dcfec279d6	[WebAssembly] Handle empty cleanuppads when adding catch_all In `LateEHPrepare::addCatchAlls`, the current code tries to get the iterator's debug info even when it is `MachineBasicBlock::end()`. This fixes the bug by adding empty debug info instead in that case. Reviewed By: tlively Differential Revision: https://reviews.llvm.org/D97679	2021-03-01 10:07:05 -08:00
Andy Wingo	2632ba6a35	[WebAssembly] call_indirect issues table number relocs If the reference-types feature is enabled, call_indirect will explicitly reference its corresponding function table via TABLE_NUMBER relocations against a table symbol. Also, as before, address-taken functions can also cause the function table to be created, only with reference-types they additionally cause a symbol table entry to be emitted. Differential Revision: https://reviews.llvm.org/D90948	2021-03-01 16:49:00 +01:00
Simon Pilgrim	925093d88a	[X86] Fold shuffle(not(x),undef) -> not(shuffle(x,undef)) Move NOT out to expose more AND -> ANDN folds	2021-03-01 14:47:39 +00:00
Jay Foad	796a60d2ea	[AMDGPU] New intrinsic void llvm.amdgcn.s.sethalt(i32) The expected use case is for frontends to insert this into shaders that are to be run under a debugger. The shader can then be resumed or single stepped from the point of the call under debugger control. Differential Revision: https://reviews.llvm.org/D97670	2021-03-01 14:30:23 +00:00
Jay Foad	48ca5d3398	[AMDGPU] Simplify SITargetLowering::isSDNodeSourceOfDivergence. NFC. Check for read-modify-write AtomicSDNodes instead of using an exhaustive list of ISD opcodes. Differential Revision: https://reviews.llvm.org/D97671	2021-03-01 14:22:08 +00:00
Matt Arsenault	6c260d3bc0	GlobalISel: Move splitToValueTypes to generic code I copied the nearly identical function from AArch64 into AMDGPU, so fix this duplication. Mips and X86 have their own more exotic versions which should be removed. However replacing those is better left for a separate patch since it requires other changes to avoid regressions.	2021-03-01 08:58:18 -05:00
Matt Arsenault	b4bfe29415	AArch64/GlobalISel: Fix using wrong calling convention for calls This was reusing the parent function calling convention instead of the callee. I'm not sure if there's a case where there's an observable difference. I previously missed this in `b72a23650f`	2021-03-01 08:46:33 -05:00
David Green	7abf7dd5ef	[AArch64] Add combine for add(udot(0, x, y), z) -> udot(z, x, y). Given a zero input for a udot, an add can be folded in to take the place of the input, using the addition that the instruction naturally performs. Differential Revision: https://reviews.llvm.org/D97188	2021-03-01 12:53:34 +00:00
Fraser Cormack	3fea9226ee	[RISCV] Support INSERT_SUBVECTOR on vector masks Like with EXTRACT_SUBVECTOR, INSERT_SUBVECTOR poses a problem for vector masks as RVV isn't able to slide mask types around. We choose instead to bitcast to equivalently-sized i8 types where we can, else we zero-extend, perform the operation, and truncate back down. One test was left disabled due to a crash in the legalizer. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D97559	2021-03-01 12:04:11 +00:00
Fraser Cormack	e80ca3af82	[RISCV] Fix INSERT/EXTRACT_SUBVECTOR on fractional LMUL types This patch fixes a bug where the lowering for INSERT_SUBVECTOR and EXTRACT_SUBVECTOR would insist on first extracting a register-aligned LMUL1 vector type before performing the slide up/down. This was even if the vector was a fractional LMUL type, in which case the aligned EXTRACT_SUBVECTOR was invalid. This issue only occurred for scalable vector types, but a variety of tests for both scalable and fixed-length vectors have been added to ensure this does not regress in the future. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D97556	2021-03-01 11:51:05 +00:00
Fraser Cormack	4ea734e6ec	[RISCV] Unify scalable- and fixed-vector INSERT_SUBVECTOR lowering This patch unifies the two disparate paths for lowering INSERT_SUBVECTOR operations under one roof. Consequently, with this patch it is possible to support any fixed-length subvector insertion, not just "cast-like" ones. As before, support for the insertion of mask vectors will come in a separate patch. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D97543	2021-03-01 11:38:47 +00:00
Fraser Cormack	bd4d421688	[RISCV] Support EXTRACT_SUBVECTOR on vector masks This patch adds support for extracting subvectors from vector masks. This can be either extracting a scalable vector from another, or a fixed-length vector from a fixed-length or scalable vector. Since RVV lacks a way to slide vector masks down on an element-wise basis and we don't know the true length of the vector registers, in many cases we must resort to using equivalently-sized i8 vectors to perform the operation. When this is not possible we fall back and extend to a suitable i8 vector. Support was also added for fixed-length truncation to mask types. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D97475	2021-03-01 11:20:09 +00:00
Fraser Cormack	6718fda6ad	[CodeGen] Fix issues with subvector intrinsic index types This patch addresses issues arising from the fact that the index type used for subvector insertion/extraction is inconsistent between the intrinsics and SDNodes. The intrinsic forms require i64 whereas the SDNodes use the type returned by SelectionDAG::getVectorIdxTy. Rather than update the intrinsic definitions to use an overloaded index type, this patch fixes the issue by transforming the index to the correct type as required. Any loss of index bits going from i64 to a smaller type is unexpected, and will be caught by an assertion in SelectionDAG::getVectorIdxConstant. The patch also updates the documentation for INSERT_SUBVECTOR and adds an assertion to its creation to bring it in line with EXTRACT_SUBVECTOR. This necessitated changes to AArch64 which was using i64 for EXTRACT_SUBVECTOR but i32 for INSERT_SUBVECTOR. Only one test changed its codegen after updating the backend accordingly. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D97459	2021-03-01 10:28:21 +00:00
David Green	91ebc4e864	[ARM] VMOVN undef folding If we insert undef using a VMOVN, we can just use the original value in three out of the four possible combinations. Using VMOVT into an undef vector will still require the lanes to be moved, but otherwise the non-undef value can be used.	2021-02-28 14:44:45 +00:00
Simon Pilgrim	ab3ea27b6f	[X86][AVX] Reuse existing VBROADCAST(x) for SCALAR_TO_VECTOR(x) Similar to what we already do for BROADCASTs of different vector sizes - if we're going to broadcast it anyway might as well reuse it.	2021-02-28 11:37:27 +00:00
David Green	0fe64812d8	[ARM] VECTOR_REG_CAST undef -> undef Propagate undef through VECTOR_REG_CAST nodes, allowing extra simplification in some patterns.	2021-02-28 11:13:49 +00:00
Craig Topper	993f4d8ffa	[X86] Fix a couple comments that said LHS where they meant RHS. NFC	2021-02-27 17:14:17 -08:00
Wang, Pengfei	42e025f9de	[X86] Disable rematerialization for PTILELOADDV Per the discussion in D97453. We currently disable it because it is not a common scenario and has some implementation problems. Differential Revision: https://reviews.llvm.org/D97453	2021-02-27 21:08:58 +08:00
Heejin Ahn	aa097ef8d4	[WebAssembly] Fix reverse mapping in WasmEHFuncInfo D97247 added the reverse mapping from unwind destination to their source, but it had a critical bug; sources can be multiple, because multiple BBs can have a single BB as their unwind destination. This changes `WasmEHFuncInfo::getUnwindSrc` to `getUnwindSrcs` and makes it return a vector rather than a single BB. It does not return the const reference to the existing vector but creates a new vector because `WasmEHFuncInfo` stores not `BasicBlock` or `MachineBasicBlock` but `PointerUnion` of them. Also I hoped to unify those methods for `BasicBlock` and `MachineBasicBlock` into one using templates to reduce duplication, but failed because various usages require `BasicBlock*` to be `const` but it's hard to make it `const` for `MachineBasicBlock` usages. Fixes https://github.com/emscripten-core/emscripten/issues/13514. (More precisely, fixes https://github.com/emscripten-core/emscripten/issues/13514#issuecomment-784708744) Reviewed By: dschuff, tlively Differential Revision: https://reviews.llvm.org/D97583	2021-02-26 17:12:10 -08:00
Fangrui Song	47c5576d7d	ELF: Create unique SHF_GNU_RETAIN sections for llvm.used global objects If a global object is listed in `@llvm.used`, place it in a unique section with the `SHF_GNU_RETAIN` flag. The section is a GC root under `ld --gc-sections` with LLD>=13 or GNU ld>=2.36. For front ends which do not expect to see multiple sections of the same name, consider emitting `@llvm.compiler.used` instead of `@llvm.used`. SHF_GNU_RETAIN is restricted to ELFOSABI_GNU and ELFOSABI_FREEBSD in binutils. We don't do the restriction - see the rationale in D95749. The integrated assembler has supported SHF_GNU_RETAIN since D95730. GNU as>=2.36 supports section flag 'R'. We don't need to worry about GNU ld support because older GNU ld just ignores the unknown SHF_GNU_RETAIN. With this change, `__attribute__((retain))` functions/variables emitted by clang will get the SHF_GNU_RETAIN flag. Differential Revision: https://reviews.llvm.org/D97448	2021-02-26 16:38:44 -08:00
Jessica Paquette	f5d5a7d7ea	[AArch64][GlobalISel] Import FMOV patterns rather than manually selecting it There are existing patterns for FMOVHi, FMOVSi, and FMOVDi in AArch64InstrFormats.td. Importing these allows us to remove the manual selection code for FMOV. It also allows us to select FMOVHi for non-zero constants when we have full fp-16 support. Refactor some of the code in AArch64InstrFormats.td so that we can create equivalent custom renderers in GlobalISel. Differential Revision: https://reviews.llvm.org/D97511	2021-02-26 16:27:39 -08:00
Matt Arsenault	81b2c23b77	AMDGPU: Use kill instruction to hint soft clause live ranges Previously we would use a bundle to hint the register allocator to not overwrite the pointers in a sequence of loads to avoid breaking soft clauses. This bundling was based on a fuzzy register pressure heuristic, so we could not guarantee using more registers than are really available. This would result in register allocator failing on unsatisfiable bundles. Use a kill to artificially extend the live ranges, so we can always succeed at register allocation even if it means extra spills in the worst case. This seems to capture most of the benefit of the bundle while avoiding most of the risk presented by the bundle. However the lit tests do show a handful of regressions. In some cases with sequences of volatile loads, unused load components end up getting reallocated to the next load which forces a wait between. There are also a few small scheduling regressions where a hazard used to be avoided, and one spill torture test which for some reason nearly doubles the stack usage. There is also a bit of noise from leftover kills (it may make sense for post-RA pseudos to strip all of these out).	2021-02-26 18:26:40 -05:00
Dan Gohman	c62dabc3f5	[WebAssembly] Avoid `bit_cast` when printing f32 and f64 immediates Use `APInt` to convert a 32-bit or 64-bit immediate to an `APFloat` rather than `bit_cast` to a `float` or `double` to avoid going through host floating-point and potentially changing the bit pattern of NaNs. Differential Revision: https://reviews.llvm.org/D97490	2021-02-26 14:19:02 -08:00
Heejin Ahn	d8b3dc5a68	[WebAssembly] Fix remapping branch dests in fixCatchUnwindMismatches This is a case D97178 tried to solve but missed. D97178 could not handle the case when multiple consecutive delegates are generated: - Before: ``` block br (a) try catch end_try end_block <- (a) ``` - After ``` block br (a) try ... try try catch end_try <- (a) delegate delegate end_block <- (b) ``` (The `br` should point to (b) now) D97178 assumed `end_block` exists two BBs later than `end_try`, because it assumed the order as `end_try` BB -> `delegate` BB -> `end_block` BB. But it turned out there can be multiple `delegate`s in between. This patch changes the logic so we just search from `end_try` BB until we find `end_block`. Fixes https://github.com/emscripten-core/emscripten/issues/13515. (More precisely, fixes https://github.com/emscripten-core/emscripten/issues/13515#issuecomment-784711318.) Reviewed By: dschuff, tlively Differential Revision: https://reviews.llvm.org/D97569	2021-02-26 13:38:13 -08:00
Stanislav Mekhanoshin	799c50fe93	[AMDGPU] Avoid second rescheduling for some regions If a region was not constrained by a high register pressure and was not rescheduled without clustering, we can skip rescheduling it in the ClusteredLowOccupancyReschedule stage. This improves scheduling speed by 25% on some kernels. Differential Revision: https://reviews.llvm.org/D97506	2021-02-26 12:29:37 -08:00
Stanislav Mekhanoshin	635993f07b	[AMDGPU] Skip unclustered rescheduling w/o ld/st We are attempting rescheduling without load store clustering if occupancy limits were not met with clustering. Skip this for regions which do not have any loads or stores at all. In a set of kernels I am experimenting with this improves scheduling time by ~30%. Differential Revision: https://reviews.llvm.org/D97342	2021-02-26 12:29:03 -08:00
Anirudh Prasad	bcc1aba6c4	[SystemZ] Introducing assembler dialects for the Z backend - This patch introduces a different assembler dialect ("hlasm") for z/OS. The default dialect has now been given the "att" dialect name. For this appropriate changes have been added to SystemZ.td. - This patch also makes a few changes to SystemZInstrFormats.td which restrict a few condition code mnemonics to just the "att" dialect variant (he, le, lh, nhe, nle, nlh). These extended condition code mnemonics are not available in HLASM. - A new private function has been introduced in SystemZAsmParser.cpp to return the assembler dialect set in SystemZMCAsmInfo.cpp. The reason we couldn't/haven't explicitly queried the overridden getAssemblerDialect function from AsmParser is outlined in this thread here. This returned dialect is directly passed onto the relevant matcher functions which take in a variantID, so that the matcher functions can appropriately choose an instruction based on the variant. Reviewed By: uweigand Differential Revision: https://reviews.llvm.org/D94250	2021-02-26 15:14:38 -05:00
James Y Knight	6de6455752	Use getAlign() on atomicrmw/cmpxchg instructions, now that it's available. These locations were missed as part of adding alignment to the instructions, and were still making their own alignment assumptions.	2021-02-26 15:06:15 -05:00
Craig Topper	b183cbfacd	[RISCV] Call SelectBaseAddr on the base pointer in the custom isel for vector loads and stores. This will allow FrameIndex as the base address instead of emitting a separate ADDI from isel. eliminateFrameIndex will likely turn it back into an ADDI, but this makes things consistent with the SDPatterns and VLPatterns. I only tested one case for simplicity. I can test more if reviewers want. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D97221	2021-02-26 11:38:23 -08:00
Jay Foad	dc2259537a	[AMDGPU] Add selection pattern for v_xnor_b32 This allows GlobalISel to use this instruction where available. I assume SelectionDAG always selects s_xnor_b32 so it isn't affected by this change. Differential Revision: https://reviews.llvm.org/D97560	2021-02-26 16:41:47 +00:00
Simon Pilgrim	ed1f45bce9	[X86][AVX] SimplifyDemandedBitsForTargetNode - add basic X86ISD::VBROADCAST handling. Simplify through to the scalar/vector source operand.	2021-02-26 16:13:14 +00:00
Jay Foad	3ad5216ed8	[AMDGPU] Better codegen for i64 bitreverse Differential Revision: https://reviews.llvm.org/D97547	2021-02-26 15:51:36 +00:00
Wang, Pengfei	ad9091c5fa	[X86] Allow PTILEZEROV and PTILELOADDV to be rematerializable Spilling and reloading AMX registers are expensive. We allow PTILEZEROV and PTILELOADDV to be rematerializable to avoid the register spilling. Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D97453	2021-02-26 21:55:59 +08:00
Simon Pilgrim	7ac4c956af	[X86] Remove unnecessary custom lowering of vXi1 SADDSAT/SSUBSAT/UADDSAT/USUBSAT As discussed on D97478. The removal of the custom tag causes some changes in the add/sub-overflow expansion as it no longer expands to sat-arith codegen.	2021-02-26 12:10:23 +00:00
Simon Pilgrim	aefe8f2f6c	[DAG] Fold vXi1 multiplies -> and This allows us to remove X86 custom lowering of vXi1 MUL, which helps simplify a load of mask math. Mentioned in D97478 post review.	2021-02-26 11:46:12 +00:00
Simon Pilgrim	40b8b4a466	[X86] Remove unnecessary custom lowering of v16i1/v32i1 ADD/SUB These were missed in D97478	2021-02-26 11:46:11 +00:00
Fraser Cormack	37014db013	[RISCV] Use existing method for the LMUL1 type. NFCI. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D97467	2021-02-26 09:44:05 +00:00
Bill Wendling	a9f9ceb35f	[X86] Use correct padding when in 16-bit mode In 16-bit mode, some of the nop patterns used in 32-bit mode can end up mangling other instructions. For instance, an aligned "movz" instruction may have the 0x66 and 0x67 prefixes omitted, because the nop that's used messes things up. xorl %ebx, %ebx .p2align 4, 0x90 movzbl (%esi,%ebx), %ecx Use instead nop patterns we know 16-bit mode can handle. Differential Revision: https://reviews.llvm.org/D97268	2021-02-25 20:05:45 -08:00
Craig Topper	d7fca3f0bf	[RISCV] Support fixed vector extract_element for FP types.	2021-02-25 16:30:28 -08:00
Yonghong Song	6d102f15a3	BPF: Add LLVMTransformUtils in CMakefile LINK_COMPONENTS Commit `1959ead525` ("BPF: Implement TTI.getCmpSelInstrCost() properly") introduced a dependency on LLVMTransformUtils library. Let us encode this dependency explicitly in CMakefile to avoid build error.	2021-02-25 15:43:25 -08:00
Yonghong Song	1959ead525	BPF: Implement TTI.getCmpSelInstrCost() properly The Select insn in BPF is expensive as BPF backend needs to resolve with conditionals. This patch sets the getCmpSelInstrCost() to SCEVCheapExpansionBudget for Select insn to prevent some Select insn related optimizations. This change was motivated by the bcc code review for https://github.com/iovisor/bcc/pull/3270 where IndVarSimplifyPass eventually caused the following asm code to be generated: ; for (i = 0; (i < VIRTIO_MAX_SGS) && (i < num); i++) { 14: 16 05 40 00 00 00 00 00 if w5 == 0 goto +64 <LBB0_6> 15: bc 51 00 00 00 00 00 00 w1 = w5 16: 04 01 00 00 ff ff ff ff w1 += -1 17: 67 05 00 00 20 00 00 00 r5 <<= 32 18: 77 05 00 00 20 00 00 00 r5 >>= 32 19: a6 01 01 00 05 00 00 00 if w1 < 5 goto +1 <LBB0_4> 20: b7 05 00 00 06 00 00 00 r5 = 6 00000000000000a8 <LBB0_4>: 21: b7 02 00 00 00 00 00 00 r2 = 0 22: b7 01 00 00 00 00 00 00 r1 = 0 ; for (i = 0; (i < VIRTIO_MAX_SGS) && (i < num); i++) { 23: 7b 1a e0 ff 00 00 00 00 *(u64 *)(r10 - 32) = r1 24: 7b 5a c0 ff 00 00 00 00 *(u64 *)(r10 - 64) = r5 Note that insn #15 has w1 = w5 and w1 is refined later but r5(w5) is eventually saved on stack at insn #24 for later use. This causes later verifier failures. With this change, IndVarSimplifyPass won't do the above transformation any more. Differential Revision: https://reviews.llvm.org/D97479	2021-02-25 14:48:53 -08:00
Craig Topper	ceaedfb5fc	[X86] Remove custom lowering of vXi1 ADD/SUB now that they are canonicalized to XOR in getNode. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D97478	2021-02-25 08:52:41 -08:00
Craig Topper	95c6824995	[RISCV] Teach CleanupVSETVLI to remove 'vsetvli zero, zero, vtype' when the vtype matches the previous vsetvli or vsetivli Reviewed By: frasercrmck, arcbbb Differential Revision: https://reviews.llvm.org/D97408	2021-02-25 07:51:19 -08:00
Craig Topper	25c6b7ddd2	[RISCV] Add isel pattern to match X > -1 to bgez. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D97262	2021-02-25 07:42:22 -08:00
Fraser Cormack	0ad86f879f	[RISCV] Update RVV ISA section-header comments. NFC. Some of the section headers had become stale with the transition from RVV specification version 0.9 to 0.10. This patch brings them up to date.	2021-02-25 14:15:28 +00:00
Fraser Cormack	02f435db0b	[RISCV] Support fixed-length vector i2fp/fp2i conversions This patch extends the support for scalable-vector int->fp and fp->int conversions by additionally handling fixed-length vectors. The existing scalable-vector lowering re-expresses widening/narrowing by x4+ conversions as standard nodes. The fixed-length vector support slots in at "the end" of this process by lowering the now equally-sized and widening/narrowing by x2 nodes to our custom VL versions. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D97374	2021-02-25 13:47:58 +00:00
Fraser Cormack	9620ce90d7	[RISCV] Support fixed-length vector FP_ROUND & FP_EXTEND This patch extends the support for vector FP_ROUND and FP_EXTEND by including support for fixed-length vector types. Since fixed-length vectors use "VL" nodes and scalable vectors can use the standard nodes, there is slightly more to do in the fixed-length case. A helper function was introduced to try and reduce the divergent paths. It is expected that this function will similarly come in useful for lowering the int-to-fp and fp-to-int operations for fixed-length vectors. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D97301	2021-02-25 12:16:06 +00:00
Fraser Cormack	84413e1947	[RISCV] Support fixed-length vector truncates This patch extends support for our custom-lowering of scalable-vector truncates to include those of fixed-length vectors. It does this by co-opting the custom RISCVISD::TRUNCATE_VECTOR node and adding mask and VL operands. This avoids unnecessary duplication of patterns and inflation of the ISel table. Some truncates go through CONCAT_VECTORS which currently isn't efficiently handled, as it goes through the stack. This can be improved upon in the future. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D97202	2021-02-25 12:11:34 +00:00
Fraser Cormack	3bc5ed3875	[RISCV] Support fixed-length vector sign/zero extension This patch adds support for the custom lowering sign- and zero-extension of fixed-length vector types. It does so through custom nodes. Since the source and destination types are (necessarily) of different sizes, it is possible that the source type is legal whilst the larger destination type isn't. In this case the legalization makes heavy use of EXTRACT_SUBVECTOR. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D97194	2021-02-25 12:05:17 +00:00
Fraser Cormack	821f8bb29a	[RISCV] Unify scalable- and fixed-vector EXTRACT_SUBVECTOR lowering This patch unifies the two disparate paths for lowering EXTRACT_SUBVECTOR operations under one roof. Consequently, with this patch it is possible to support any fixed-length subvector extraction, not just "cast-like" ones. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D97192	2021-02-25 11:46:57 +00:00
Simon Pilgrim	8b82669d56	[X86][SSE] Move unaryshuffle(xor(x,-1)) -> xor(unaryshuffle(x),-1) fold into helper. NFCI. We should be able to extend this "canonicalizeShuffleWithBinOps" to handle more generic binop cases where either/both operands can be cheaply shuffled.	2021-02-25 10:56:23 +00:00
Tim Northover	201ada80ee	AArch64: relax address-space assertion in FastISel. Some people are using alternative address spaces to track GC data, but otherwise they behave exactly the same. This is the only place in the backend we even try to care about it so it's really not achieving anything.	2021-02-25 10:15:55 +00:00
Stelios Ioannou	30cb9c03b5	[AArch64] Add abs intrinsic costs This patch adds cost-modelling for abs vector intrinsic. Change-Id: I89007971bfb15f5b4a02a2eadfd43018e9a73976	2021-02-25 09:31:52 +00:00
Craig Topper	159f78fc2f	[RISCV] Reuse existing SDLoc and XLenVT in the switch in RISCVISelDAGToDAG::Select. NFC A SDLoc and XLenVT were already created above the switch.	2021-02-24 21:39:00 -08:00
Liu, Chen3	4bc7c8631a	[X86] Support amx-bf16 intrinsic. Adding support for intrinsics of AMX-BF16. This patch also fixes a bug where AMX-INT8 instructions would be selected with the wrong predicate. Differential Revision: https://reviews.llvm.org/D97358	2021-02-25 09:06:48 +08:00
Craig Topper	efcdd598b7	[RISCV] Teach VSETVLI inserter to use VSETIVLI when possible. We always create the VL operand using a register, but if we can determine that it came from an ADDI X0, imm with a sufficiently small immediate, we can use VSETIVLI. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D97332	2021-02-24 16:07:33 -08:00
Craig Topper	9bde29629d	[RISCV] Use a ComplexPattern for zexti32 to match sexti32. We just started using a ComplexPattern for sexti32. This updates zexti32 to match. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D97231	2021-02-24 16:06:29 -08:00
Stefan Agner	a921aaf789	[MC][ARM] make Thumb function also if type attribute is set Make sure to set the bottom bit of the symbol even when the type attribute of a label is set after the label. GNU as sets the thumb state according to the thumb state of the label. If a .type directive is placed after the label, set the symbol's thumb state according to the thumb state of the .type directive. This matches GNU as in most cases. From: Stefan Agner <stefan@agner.ch> This fixes: https://bugs.llvm.org/show_bug.cgi?id=44860 https://github.com/ClangBuiltLinux/linux/issues/866 Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D74927	2021-02-24 14:08:56 -08:00
Michael Liao	0d4e12e3c1	[amdgpu] Atomic should be source of divergence. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D97392	2021-02-24 15:27:47 -05:00
Matt Arsenault	589223e044	AMDGPU: Remove special case in shouldCoalesce Unaligned registers are now constrained with classes, rather than specially reserving a subset of the whole class.	2021-02-24 14:49:44 -05:00
Matt Arsenault	78b6d73a93	AMDGPU: Add even aligned VGPR/AGPR register classes gfx90a operations require even aligned registers, but this was previously achieved by reserving registers inside the full class. Ideally this would be captured in the static instruction definitions for the operands, and we would have different instructions per subtarget. The hackiest part of this is we need to manually reassign AGPR register classes after instruction selection (we get away without this for VGPRs since those types are actually registered for legal types).	2021-02-24 14:49:37 -05:00
Jessica Paquette	e339bba637	[AArch64][GlobalISel] Fix manual selection for v4s16 and v8s8 G_DUP The manual G_DUP selection code would produce DUPv16i8 for v8s8s and DUPv8i16 for v4s16. This adds the missing cases to the manual selection code, and makes it return false when there is an unexpected size. Update select-dup.mir to reflect the change. Differential Revision: https://reviews.llvm.org/D97240	2021-02-24 10:23:06 -08:00
Craig Topper	086670d367	[RISCV] Support fixed vector extract element. Use VL=1 for scalable vector extract element. I've changed to use VL=1 for slidedown and shifts to avoid extra element processing that we don't need. The i64 fixed vector handling on i32 isn't great if the vector type isn't legal due to an ordering issue in type legalization. If the vector type isn't legal, we fall back to default legalization which will bitcast the vector to vXi32 and use two independent extracts. Doing better will require handling several different cases by manually inserting insert_subvector/extract_subvector to adjust the type to a legal vector before emitting custom nodes. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D97319	2021-02-24 10:17:00 -08:00
Nick Desaulniers	404843a94d	[MC][ARM] add .w suffixes for BL (T1) and DBG F1.2 Standard assembler syntax fields describes .w and .n suffixes for wide and narrow encodings. arch/arm/probes/kprobes/test-thumb.c tests installing kprobes for certain instructions using inline asm. There's a few instructions we fail to assemble due to missing .w t2InstAliases. Adds .w suffixes for: * bl (F5.1.25 BL, BLX (immediate) T1) * dbg (F5.1.42 DBG T1) Reviewed By: DavidSpickett Differential Revision: https://reviews.llvm.org/D97236	2021-02-24 09:58:08 -08:00
Amara Emerson	0146d20631	[AArch64] Do not fold SP adjustments into pre-increment addr modes if it overflows the redzone. Instead of outright disabling this completely with the noredzone attribute, we only avoid doing the optimization if there are memory operations between the adjustment and the load/store that the adjustment would be folded into. This avoids the case of something like a stack cookie being corrupted if an exception happens before the pre-increment to the SP occurs. This also prevents the folding happening if we have a redzone, but the offset being folded is above the redzone amount (128 bytes in this case). rdar://73269336 Differential Revision: https://reviews.llvm.org/D95179	2021-02-24 09:55:48 -08:00
Jay Foad	aab709f090	[AMDGPU] Add more PAL metadata register names Add all the registers that are currently used by LLPC: https://github.com/GPUOpen-Drivers/llpc This only affects disassembly of PAL metadata generated by LLPC and similar frontends. Differential Revision: https://reviews.llvm.org/D95619	2021-02-24 13:37:05 +00:00
Jay Foad	67f0620831	[AMDGPU] Update s_sendmsg messages Update the list of s_sendmsg messages known to the assembler and disassembler and validate the ones that were added or removed in gfx9 and gfx10. Differential Revision: https://reviews.llvm.org/D97295	2021-02-24 13:07:00 +00:00
Florian Hahn	5c74c6be3c	[AArch64] Use CMTST for != 0 vector compares (vnot (CMEQz A)). (CMTST A, A) will only set elements to 0 if the element is 0 in A. Use it for != 0 compares, which currently use (vnot (CMEQz A)). This saves a mvn instruction. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D97303	2021-02-24 09:39:27 +00:00
David Green	03892a27d6	[ARM] Expand the range of allowed post-incs in load/store optimizer Currently the load/store optimizer will only fold in increments of the same size as the load/store. This patch expands that to any legal immediate for the post-inc instruction. This is a recommit of `3b34b06fc5` with correctness fixes and extra tests. Differential Revision: https://reviews.llvm.org/D95885	2021-02-24 08:46:15 +00:00
Amara Emerson	eb55203e00	[AArch64][GlobalISel][PostSelectOpt] Constrain reg operands after mutating instructions. The non-flag setting variants of instructions may have different regclass requirements. If so, we need to constrain them. Differential Revision: https://reviews.llvm.org/D97343	2021-02-23 19:32:18 -08:00
Jessica Paquette	daf7d7f0dc	[AArch64][GlobalISel] Correct function evaluation order in applyINS The order in which the nested calls to Builder.buildWhatever are evaluated in differs between GCC and Clang. This caused a bot failure because the MIR in the testcase was coming out in a different order than expected. Rather than using nested calls, pull them out in order to fix the order of evaluation.	2021-02-23 16:21:11 -08:00
Heejin Ahn	ea8c6375e3	[WebAssembly] Fix incorrect grouping and sorting of exceptions This CL is not big but contains changes that span multiple analyses and passes. This description is very long because it tries to explain basics on what each pass/analysis does and why we need this change on top of that. Please feel free to skip parts that are not necessary for your understanding. --- `WasmEHFuncInfo` contains the mapping of <EH pad, the EH pad's next unwind destination>. The value (unwind dest) here is where an exception should end up when it is not caught by the key (EH pad). We record this info in WasmEHPrepare to fix catch mismatches, because the CFG itself does not have this info. A CFG only contains BBs and predecessor-successor relationship between them, but in `WasmEHFuncInfo` the unwind destination BB is not necessarily a successor of the key EH pad BB. Their relationship can be intuitively explained by this C++ code snippet: ``` try { try { foo(); } catch (int) { // EH pad ... } } catch (...) { // unwind destination } ``` So when `foo()` throws, it goes to `catch (int)` first. But if it is not caught by it, it ends up in the next unwind destination `catch (...)`. This unwind destination is what you see in `catchswitch`'s `unwind label %bb` part. --- `WebAssemblyExceptionInfo` groups exceptions so that they can be sorted continuously together in CFGSort, as we do for loops. What this analysis does is very simple: it creates a single `WebAssemblyException` per EH pad, and all BBs that are dominated by that EH pad are included in this exception. We also identify subexception relationship in this way: if EHPad A dominates EHPad B, EHPad B's exception is a subexception of EHPad A's exception. This simple rule turns out to be incorrect in some cases. In `WasmEHFuncInfo`, if EHPad A's unwind destination is EHPad B, it means semantically EHPad B should not be included in EHPad A's exception, because it does not make sense to rethrow/delegate to an inner scope. 
This is what happened in CFGStackify as a result of this: ``` try try catch ... <- %dest_bb is among here! end delegate %dest_bb ``` So this patch adds a phase in `WebAssemblyExceptionInfo::recalculate` to make sure exceptions' unwind destinations are not subexceptions of their unwind sources in `WasmEHFuncInfo`. But this alone does not prevent `dest_bb` in the example above from being sorted within the inner `catch`'s exception, even if its exception is not a subexception of that `catch`'s exception anymore, because of how CFGSort works, which will be explained below. --- CFGSort places BBs within the same `SortRegion` (loop or exception) continuously together so they can be demarcated with `loop`-`end_loop` or `catch`-`end_try` in CFGStackify. `SortRegion` is a wrapper for one of `MachineLoop` or `WebAssemblyException`. `SortRegionInfo` already does some complicated things because there are discrepancies between those two data structures. `WebAssemblyException` is what we control, and it is defined as an EH pad as its header and BBs dominated by the header as its BBs (with a newly added exception of unwind destinations explained in the previous paragraph). But `MachineLoop` is an LLVM data structure and uses the standard loop detection algorithm. So by the algorithm, a loop consists of BBs that are 1. dominated by the loop header and 2. have a path back to its header. Because of the second condition, many BBs that are dominated by the loop header are not included in the loop. So BBs that contain `return` or branches to outside of the loop are not technically included in `MachineLoop`, but they can be sorted together with the loop with no problem. Maybe to relax the condition, in CFGSort, when we are in a `SortRegion` we allow sorting of not only BBs that belong to the current innermost region but also BBs that are dominated by the current region header. (This was written this way from the first version written by Dan, when only loops existed.) 
But now, we have cases in exceptions when EHPad B is the unwind destination for EHPad A, even if EHPad B is dominated by EHPad A it should not be included in EHPad A's exception, and should not be sorted within EHPad A. One way to make things work, at least correctly, is to change the `dominates` condition to a `contains` condition for `SortRegion` when sorting BBs, but this will change compilation results for existing non-EH code and I can't be sure it will not degrade performance or code size. I think it will degrade performance because it will force many BBs dominated by a loop, which don't have the path back to the header, to be placed after the loop and it will likely create more branches and blocks. So this does a little hacky check when adding BBs to `Preferred` list: (`Preferred` list is a ready list. CFGSort maintains ready list in two priority queues: `Preferred` and `Ready`. I'm not very sure why, but it was written that way from the beginning. BBs are first added to `Preferred` list and then some of them are pushed to `Ready` list, so here we only need to guard condition for `Preferred` list.) When adding a BB to `Preferred` list, we check if that BB is an unwind destination of another BB. To do this, this adds the reverse mapping, `UnwindDestToSrc`, and getter methods to `WasmEHFuncInfo`. And if the BB is an unwind destination, it checks if the current stack of regions (`Entries`) contains its source BB by traversing the stack backwards. If we find its unwind source in there, we add the BB to its `Deferred` list, to make sure that unwind destination BB is added to `Preferred` list only after that region with the unwind source BB is sorted and popped from the stack. --- This does not contain a new test that crashes because of this bug, but this fix changes the result for one of the existing test cases. This test case didn't crash because it fortunately didn't contain `delegate` to the incorrectly placed unwind destination BB. 
Fixes https://github.com/emscripten-core/emscripten/issues/13514. Reviewed By: dschuff, tlively Differential Revision: https://reviews.llvm.org/D97247	2021-02-23 14:54:55 -08:00
David Green	f51b3de4e8	[AArch64] Introduce UDOT/SDOT DAG nodes This is used to lower UDOT/SDOT instructions, as opposed to relying on the intrinsic. Subsequent optimizations will be able to optimize them more cleanly based on these nodes.	2021-02-23 20:31:01 +00:00
Jessica Paquette	ef1f7f1d7d	Recommit "[AArch64][GlobalISel] Match G_SHUFFLE_VECTOR -> insert elt + extract elt" Attempted fix for the added test failing. https://lab.llvm.org/buildbot/#/builders/104/builds/2355/steps/5/logs/stdio I can't reproduce the failure anywhere, so I'm going to guess that passing a std::function as MatchInfo is sketchy in this context. Switch it to a std::tuple and hope for the best.	2021-02-23 11:55:16 -08:00
Amara Emerson	939b5ce734	[AArch64][GlobalISel] Lower G_USUBSAT and G_UADDSAT for scalars. We have some missing optimization counterparts to LowerXALUO, but it's a start.	2021-02-23 11:54:52 -08:00
Stanislav Mekhanoshin	d1b92c91af	[AMDGPU] Set threshold for regbanks reassign pass This is to limit compile time. I did experiments with some inputs and found that compile time stays reasonable for this pass if we have less than 100000 virtual registers and then starts to explode somewhere between 100000 and 150000. Differential Revision: https://reviews.llvm.org/D97218	2021-02-23 10:22:31 -08:00
Nick Desaulniers	1e204ac789	[THUMB2] add .w suffixes for ldr/str (immediate) T4 The Linux kernel when built with CONFIG_THUMB2_KERNEL makes use of these instructions with immediate operands and wide encodings. These are the T4 variants of the following sections from the Arm ARM. F5.1.72 LDR (immediate) F5.1.229 STR (immediate) I wasn't able to represent these simple aliases using t2InstAlias due to the Constraints on the non-suffixed existing instructions, which results in some manual parsing logic needing to be added. F1.2 Standard assembler syntax fields describes the use of the .w (wide) vs .n (narrow) encoding suffix. Link: https://bugs.llvm.org/show_bug.cgi?id=49118 Link: https://github.com/ClangBuiltLinux/linux/issues/1296 Reported-by: Stefan Agner <stefan@agner.ch> Reported-by: Arnd Bergmann <arnd@kernel.org> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed By: DavidSpickett Differential Revision: https://reviews.llvm.org/D96632	2021-02-23 09:25:40 -08:00
Nicolai Hähnle	52bc2e7577	[AMDGPU][SelectionDAG] Don't combine uniform multiplies to MUL_[UI]24 Prefer to keep uniform (non-divergent) multiplies on the scalar ALU when possible. This significantly improves some game cases by eliminating v_readfirstlane instructions when the result feeds into a scalar operation, like the address calculation for a scalar load or store. Since isDivergent is only an approximation of whether a value is in SGPRs, it can potentially regress some situations where a uniform value ends up in a VGPR. These should be rare in real code, although the test changes do contain a number of examples. Most of the test changes are just using s_mul instead of v_mul/mad which is generally better for both register pressure and latency (at least on GFX10 where sgpr pressure doesn't affect occupancy and vector ALU instructions have significantly longer latency than scalar ALU). Some R600 tests now use MULLO_INT instead of MUL_UINT24. GlobalISel appears to handle more scenarios in the desirable way, although it can also be thrown off and fails to select the 24-bit multiplies in some cases. Alternative solution considered and rejected was to allow selecting MUL_[UI]24 to S_MUL_I32. I've rejected this because the definition of those SD operations is don't-care on the most significant 8 bits, and this fact is used in some combines via SimplifyDemandedBits. Based on a patch by Nicolai Hähnle. Differential Revision: https://reviews.llvm.org/D97063	2021-02-23 15:39:19 +00:00
Sjoerd Meijer	e1c3bf6afe	[ARM] do not consider sp as deprecated for ldm/stm Early versions of the ARMv7 reference manuals considered the sp register as a deprecated register for ldm/stm family of instructions. However, later versions such as ARM DDI 0406C.d added a note to the Appendix: D9.3 Use of the SP as a general-purpose register Most ARM instructions, unlike Thumb instructions, provide exactly the same access to the SP as to R0-R12. This means that it is possible to use the SP as a general-purpose register. Earlier issues of this manual deprecated the use of SP in an ARM instruction, in any way that is deprecated, not permitted, or not possible in the corresponding Thumb instruction. However, user feedback indicates a number of cases where these instructions are useful. Therefore, ARM no longer deprecates these instruction uses. Also Armv8 manuals no longer consider SP as deprecated register for ldm/stm A32 instructions. Furthermore, GNU as also does not print a deprecated warning when using SP with those instructions. Drop deprecation warning for pop/ldm/push/stm instructions. Patch by: Stefan Agner. Differential Revision: https://reviews.llvm.org/D82692	2021-02-23 13:26:18 +00:00
David Green	dd2dbf7ee2	[TTI] Change getOperandsScalarizationOverhead to take Type args As a followup to D95291, getOperandsScalarizationOverhead was still using a VF as a vector factor if the arguments were scalar, and would assert on certain matrix intrinsics with differently sized vector arguments. This patch removes the VF arg, instead passing the Types through directly. This should allow it to more accurately compute the cost without having to guess at which operands will be vectorized, something difficult with more complex intrinsics. This adjusts one SVE test as it is now calling the wrong intrinsic vs veccall. Without invalid InstructCosts the cost of the scalarized intrinsic is too low. This should get fixed when the cost of scalarization is accounted for with scalable types. Differential Revision: https://reviews.llvm.org/D96287	2021-02-23 13:04:59 +00:00
David Green	bd4b61efbd	[CostModel] Remove VF from IntrinsicCostAttributes getIntrinsicInstrCost takes a IntrinsicCostAttributes holding various parameters of the intrinsic being costed. It can either be called with a scalar intrinsic (RetTy==Scalar, VF==1), with a vector instruction (RetTy==Vector, VF==1) or from the vectorizer with a scalar type and vector width (RetTy==Scalar, VF>1). A RetTy==Vector, VF>1 is considered an error. Both of the vector modes are expected to be treated the same, but because this is confusing many backends end up getting it wrong. Instead of trying to work with those two values separately this removes the VF parameter, widening the RetTy/ArgTys by VF when called from the vectorizer. This keeps things simpler, but does require some other modifications to keep things consistent. Most backends look like this will be an improvement (or were not using getIntrinsicInstrCost). AMDGPU needed the most changes to keep the code from `c230965ccf` working. ARM removed the fix in `dfac521da1`, webassembly happens to get a fixup for an SLP cost issue and both X86 and AArch64 seem to now be using better costs from the vectorizer. Differential Revision: https://reviews.llvm.org/D95291	2021-02-23 13:03:26 +00:00
Hsiangkai Wang	53c4c2b9f7	[RISCV] vle1.v/vse1.v should be unmasked instructions. vle1.v/vse1.v should be unmasked instructions. The vm encoding is 1 for unmasked instructions. Differential Revision: https://reviews.llvm.org/D97237	2021-02-23 19:59:22 +08:00
Andy Wingo	7dc98adbb0	Revert "[WebAssembly] call_indirect issues table number relocs" This reverts commit `861dbe1a02`. It broke emscripten -- see https://reviews.llvm.org/D90948#2578843.	2021-02-23 11:48:08 +01:00
Fraser Cormack	dd68f3cf28	[RISCV] Support insertion of misaligned subvectors This patch extends the support for RVV INSERT_SUBVECTOR to cover those which don't align to a vector register boundary. Like the support for EXTRACT_SUBVECTOR in D96959, it accomplishes this by extracting the nearest register-sized subvector (a subregister operation), then sliding the vector down with VSLIDEDOWN, inserting the subvector to the first position, and sliding the vector back up again afterwards. Unlike subvector extraction, for vectors that occupy less than a full vector register we must preserve the untouched elements. We do this by lowering to an LMUL=1 INSERT_SUBVECTOR using the above method and lowering that to a VSLIDEUP with a zero offset. This uses a tail-undisturbed policy and so has the effect of "sliding in" the subvector elements while preserving the surrounding ones. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D96972	2021-02-23 10:31:06 +00:00
Liu, Chen3	f8b9035aae	[X86] Support amx-int8 intrinsic. Adding support for intrinsics of TDPBSUD/TDPBUSD/TDPBUUD. Differential Revision: https://reviews.llvm.org/D97259	2021-02-23 17:08:05 +08:00
Kazu Hirata	4ed47858ab	[llvm] Use llvm::drop_begin (NFC)	2021-02-22 20:17:16 -08:00
Jessica Paquette	662402a8b3	Revert "[AArch64][GlobalISel] Match G_SHUFFLE_VECTOR -> insert elt + extract elt" This reverts commit `867e379c0e`. For some reason this is upsetting Linux/Windows bots. Reverting while I try to reproduce.	2021-02-22 17:36:17 -08:00
Cassie Jones	8b10aa67ad	[AArch64][GlobalISel] Make overflow legalization use clampScalar Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D96674	2021-02-22 19:59:36 -05:00
Luo, Yuanke	8f48ddd193	[X86][AMX] Lower tile copy instruction. Since there is no tile copy instruction, we need to store tile register to stack and load from stack to another tile register. We need extra GR to hold the stride, and we need stack slot to hold the tile data register. We would run this pass after copy propagation, so that we don't miss copy optimization. And we would run this pass before prolog/epilog insertion, so that we can allocate stack slot. Differential Revision: https://reviews.llvm.org/D97112	2021-02-23 07:49:42 +08:00
Stanislav Mekhanoshin	bb16efe280	[AMDGPU] Move RPT::getLiveRegs() check under EXPENSIVE_CHECKS This is too expensive even for debug builds. It doubles scheduling time if enabled. Differential Revision: https://reviews.llvm.org/D97232	2021-02-22 15:21:59 -08:00
Craig Topper	3231607ce9	[RISCV] Have sexti32 also recognize AssertZExt from types smaller than i32. An i64 AssertZExt from a type smaller than i32 has at least 33 leading zeros which mean it has at least 33 sign bits. Since we have a couple patterns that use two sexti32, I've switched to a ComplexPattern so tablegen didn't have to generate 9 different permutations. As noted in the FIXME, maybe we should just call computeNumSignBits, but we don't have tests that benefit from that yet. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D97130	2021-02-22 14:56:22 -08:00
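The sign-bit argument in the commit above can be checked with plain integer arithmetic. The sketch below is illustrative only and does not use any LLVM API: `zeroExtendFromBits` and `isSignExtendedFromI32` are hypothetical helper names. A 64-bit value zero-extended from fewer than 32 bits has bits 63..31 all zero, which is 33 identical top bits, so it already equals the sign extension of its low 32 bits. The same does not hold for a full 32-bit source.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: keep the low Bits bits of Raw and clear the rest,
// modelling a zero extension from a Bits-wide type to 64 bits (Bits < 64).
uint64_t zeroExtendFromBits(uint64_t Raw, unsigned Bits) {
  return Raw & ((uint64_t{1} << Bits) - 1);
}

// Hypothetical helper: true if bits 63..31 of V are all equal, i.e. V has
// at least 33 sign bits and is the sign extension of its low 32 bits.
bool isSignExtendedFromI32(uint64_t V) {
  uint64_t Top = V >> 31; // the upper 33 bits
  return Top == 0 || Top == 0x1FFFFFFFFull;
}

// Checks every source width below 32 bits using the all-ones pattern,
// which is the worst case for leading zeros.
bool holdsForAllNarrowWidths() {
  for (unsigned Bits = 1; Bits < 32; ++Bits)
    if (!isSignExtendedFromI32(zeroExtendFromBits(~uint64_t{0}, Bits)))
      return false;
  return true;
}
```

With a 32-bit source, the value `0xFFFFFFFF` has bit 31 set and bits 63..32 clear, so the check fails, which is why the commit restricts the rule to types smaller than i32.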
Jessica Paquette	867e379c0e	[AArch64][GlobalISel] Match G_SHUFFLE_VECTOR -> insert elt + extract elt Match a G_SHUFFLE_VECTOR with a mask that allows it to be represented as a G_INSERT_VECTOR_ELT and a G_EXTRACT_VECTOR_ELT. This ports `isINSMask` from AArch64ISelLowering and the portion of `AArch64TargetLowering::LowerVECTOR_SHUFFLE` which handles the equivalent transformation. This provides more opportunities for matching DUP. We don't have all of the necessary combines to actually make DUP out of these yet, but this is better for size than the full TBL expansion for G_SHUFFLE_VECTOR. This is a -0.1% code size improvement on CTMark/Bullet at -Os. IR example: https://godbolt.org/z/sdcevT Differential Revision: https://reviews.llvm.org/D97214	2021-02-22 14:44:09 -08:00
Heejin Ahn	f47a654a39	[WebAssembly] Remap branch dests after fixCatchUnwindMismatches Fixing catch unwind mismatches can sometimes invalidate existing branch destinations. This CL remaps those destinations after placing try-delegates. Fixes https://github.com/emscripten-core/emscripten/issues/13515. Reviewed By: dschuff Differential Revision: https://reviews.llvm.org/D97178	2021-02-22 13:25:58 -08:00
Heejin Ahn	51fb5bf4d6	[WebAssembly] Support WasmEHFuncInfo serialization This adds support for serialization of `WasmEHFuncInfo`, in the form of <Source BB Number, Unwind destination BB number>. To make YAML mapping work, we needed to make a copy of the existing `SrcToUnwindDest` map within `yaml::WebAssemblyMachineFunctionInfo`. It was hard to add EH MIR tests for CFGStackify because `WasmEHFuncInfo` could not be read from test MIR files. This adds the serialization support for that to make EH MIR tests easier. Reviewed By: dschuff Differential Revision: https://reviews.llvm.org/D97174	2021-02-22 13:13:51 -08:00
Heejin Ahn	a08e609d2e	[WebAssembly] Rename methods in WasmEHFuncInfo (NFC) This renames variable and method names in `WasmEHFuncInfo` class to be simpler and clearer. For example, unwind destinations are EH pads by definition so it doesn't necessarily need to be included in every method name. Also I am planning to add the reverse mapping in a later CL, something like `UnwindDestToSrc`, so this renaming will make meanings clearer. Reviewed By: dschuff Differential Revision: https://reviews.llvm.org/D97173	2021-02-22 12:16:11 -08:00
Craig Topper	1cd2a5a7da	[RISCV] Add isel support for bitcasts between fixed vector types. This should fix the issue reported in D96972. I don't have a good test case for this without those changes. Differential Revision: https://reviews.llvm.org/D97082	2021-02-22 12:05:46 -08:00
Jessica Paquette	95d13c01ec	[AArch64][GlobalISel] Emit G_ASSERT_SEXT for SExt parameters in CallLowering Similar to how we emit G_ASSERT_ZEXT when we have CCValAssign::LocInfo::ZExt. This will allow us to combine away some redundant sign extends. Example: https://godbolt.org/z/cTbKvr Differential Revision: https://reviews.llvm.org/D96915	2021-02-22 10:14:43 -08:00
Craig Topper	1aeb927fed	[RISCV] Custom isel the rest of the vector load/store intrinsics. A previous patch moved the index versions. This moves the rest. I also removed the custom lowering for VLEFF since we can now do everything directly in the isel handling. I had to update getLMUL to handle mask registers to index the pseudo table correctly for VLE1/VSE1. This is good for another 15K reduction in llc size. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D97097	2021-02-22 09:53:46 -08:00
Ryan Santhiraraja	2c25efcbd3	[AArch64] Adding SHA3 Intrinsics support This patch adds the following SHA3 Intrinsics: vsha512hq_u64, vsha512h2q_u64, vsha512su0q_u64, vsha512su1q_u64 veor3q_u8 veor3q_u16 veor3q_u32 veor3q_u64 veor3q_s8 veor3q_s16 veor3q_s32 veor3q_s64 vrax1q_u64 vxarq_u64 vbcaxq_u8 vbcaxq_u16 vbcaxq_u32 vbcaxq_u64 vbcaxq_s8 vbcaxq_s16 vbcaxq_s32 vbcaxq_s64 Note need to include +sha3 and +crypto when building from the front-end Reviewed By: DavidSpickett Differential Revision: https://reviews.llvm.org/D96381	2021-02-22 12:09:20 +00:00
Dmitry Preobrazhensky	4813518092	[AMDGPU][MC] Corrected bound_ctrl for compatibility with sp3 Enabled "bound_ctrl:1" and disabled "bound_ctrl:-1" syntax. Corrected printer to output "bound_ctrl:1" instead of "bound_ctrl:0". See bug 35397 for detailed issue description. Differential Revision: https://reviews.llvm.org/D97048	2021-02-22 14:59:40 +03:00
David Green	188f15d973	[ARM] Remove dead lowering code. NFC Remove the unnecessary code from `21a4faab60`, left over from a different way of lowering.	2021-02-22 10:07:53 +00:00
David Green	21a4faab60	[ARM] Move double vector insert patterns using vins to DAG combine This removes the existing patterns for inserting two lanes into an f16/i16 vector register using VINS, instead using a DAG combine to pattern match the same code sequences. The tablegen patterns were already on the large side (foreach LANE = [0, 2, 4, 6]) and were not handling all the cases they could. Moving that to a DAG combine, whilst not less code, allows us to better control and expand the selection of VINSs. Additionally this allows us to remove the AddedComplexity on VCVTT. The extra trick that this has learned in the process is to move two adjacent lanes using a single f32 vmov, allowing some extra inefficiencies to be removed. Differential Revision: https://reviews.llvm.org/D96876	2021-02-22 09:29:47 +00:00
Andy Wingo	861dbe1a02	[WebAssembly] call_indirect issues table number relocs If the reference-types feature is enabled, call_indirect will explicitly reference its corresponding function table via `TABLE_NUMBER` relocations against a table symbol. Also, as before, address-taken functions can also cause the function table to be created, only with reference-types they additionally cause a symbol table entry to be emitted. We abuse the used-in-reloc flag on symbols to indicate which tables should end up in the symbol table. We do this because unfortunately older wasm-ld will carp if it sees a table symbol. Differential Revision: https://reviews.llvm.org/D90948	2021-02-22 10:13:36 +01:00
Amara Emerson	6ff09ce061	[AArch64][GlobalISel] Fix <16 x s8> G_DUP regbankselect to assign source to gpr. We can only select this type if the source is on GPR, not FPR.	2021-02-21 21:17:29 -08:00
Simon Pilgrim	b568d3d6c9	[X86] Add vector support to sub(C1, xor(X, C2)) -> add(xor(X, ~C2), C1+1) fold.	2021-02-21 21:51:27 +00:00
Simon Pilgrim	3ab32c94a4	[X86] Replace explicit constant handling in sub(C1, xor(X, C2)) -> add(xor(X, ~C2), C1+1) fold. NFCI. NFC cleanup before adding vector support - rely on the SelectionDAG to handle everything for us.	2021-02-21 21:40:32 +00:00
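The fold named in the two commits above rests on a two's-complement identity: `X ^ ~C2` equals `~(X ^ C2)`, and `~V` equals `-V - 1`, so `(X ^ ~C2) + (C1 + 1)` equals `C1 - (X ^ C2)` under wrapping arithmetic. The following sketch checks that identity on scalars. The function names are made up for illustration and this is not the SelectionDAG code.

```cpp
#include <cassert>
#include <cstdint>

// Original form: sub(C1, xor(X, C2)), with wrapping 32-bit arithmetic.
uint32_t subOfXor(uint32_t C1, uint32_t X, uint32_t C2) {
  return C1 - (X ^ C2);
}

// Folded form: add(xor(X, ~C2), C1 + 1).
uint32_t addOfXorNot(uint32_t C1, uint32_t X, uint32_t C2) {
  return (X ^ ~C2) + (C1 + 1u);
}

// Compares both forms on a small grid of values, including edge cases
// where the subtraction or the C1 + 1 increment wraps around.
bool formsAgreeOnSamples() {
  const uint32_t Samples[] = {0u, 1u, 2u, 0x7FFFFFFFu, 0x80000000u,
                              0xFFFFFFFEu, 0xFFFFFFFFu, 0x12345678u};
  for (uint32_t C1 : Samples)
    for (uint32_t X : Samples)
      for (uint32_t C2 : Samples)
        if (subOfXor(C1, X, C2) != addOfXorNot(C1, X, C2))
          return false;
  return true;
}
```

Because unsigned arithmetic in C++ wraps modulo 2^32, the check also covers the overflow cases.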
Craig Topper	1a6c1ac686	[SelectionDAG][RISCV] Teach ComputeNumSignBits to handle SREM. This also removes a pattern from RISCV that is no longer needed since the sexti32 on the LHS of the srem in the pattern implies the result is sign extended so the sign_extend_inreg should be removed in DAG combine now. Reviewed By: luismarques, RKSimon Differential Revision: https://reviews.llvm.org/D97133	2021-02-21 11:13:36 -08:00
Simon Pilgrim	bae04a3e2d	[X86][AVX] canonicalizeLaneShuffleWithRepeatedOps - remove unnecessary BITCASTs. In conjunction with the 'vperm2x128(bitcast(x),bitcast(y),c) -> bitcast(vperm2x128(x,y,c))' fold in combineTargetShuffle, this should remove any unnecessary bitcasts around vperm2x128 lane shuffles.	2021-02-21 18:40:32 +00:00
Simon Pilgrim	a6a258f1da	[X86][AVX] Fold concat(extract_subvector(v0,c0), extract_subvector(v1,c1)) -> vperm2x128 Fixes regression exposed by removing bitcasts across logic-ops in D96206. Differential Revision: https://reviews.llvm.org/D96206	2021-02-21 14:50:43 +00:00
Simon Pilgrim	2885d1251f	[X86] Fold bitcast(logic(bitcast(X), Y)) --> logic'(X, bitcast(Y)) for int-int bitcasts Extend the existing combine that handles bitcasting for fp-logic ops to also help remove logic ops across bitcasts to/from the same integer types. This helps improve AVX512 predicate handling for D/Q logic ops and also allows DAGCombine's scalarizeExtractedBinop to remove some annoying gpr->simd->gpr transfers. The concat_vectors regression in pr40891.ll will be addressed in a followup commit on this patch. Differential Revision: https://reviews.llvm.org/D96206	2021-02-21 14:40:54 +00:00
Fraser Cormack	3e1317fd32	[RISCV] Support extraction of misaligned subvectors This patch extends the support for RVV EXTRACT_SUBVECTOR to cover those which don't align to a vector register boundary. It accomplishes this by extracting the nearest register-sized subvector (a subregister operation), then sliding the vector down with VSLIDEDOWN and extracting the subvector from the first position (a COPY operation). Since this procedure involves the use of VSCALE and multiplication, the handling of such operations is done during lowering to simplify the implementation and make use of DAG combining. This necessitated moving some helper functions from RISCVISelDAGToDAG to RISCVTargetLowering. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D96959	2021-02-20 15:43:54 +00:00
Fraser Cormack	9aa20caee6	[RISCV] Improve register allocation around vector masks With vector mask registers only allocatable to V0 (VMV0Regs) it is relatively simple to generate code which uses multiple masks and naively requires spilling. This patch aims to improve codegen in such cases by telling LLVM it can use VRRegs to hold masks. This will prevent spilling in many cases by having LLVM copy to an available VR register. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D97055	2021-02-20 14:47:51 +00:00
Simon Pilgrim	761bbed264	[DAG] foldSubToUSubSat - fold sub(a,trunc(umin(zext(a),b))) -> usubsat(a,trunc(umin(b,SatLimit))) This moves the last custom x86 USUBSAT fold to generic DAGCombine. Completes PR40111 Differential Revision: https://reviews.llvm.org/D96703	2021-02-20 12:02:07 +00:00
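The USUBSAT fold in the commit above can be sanity-checked exhaustively on small types. The sketch below is an assumption-laden model, not the DAGCombine code: it picks an 8-bit `a` and a 16-bit `b`, takes `SatLimit` to be the largest value of the narrow type (0xFF), and uses hypothetical function names. If `b` is at least `a`, both forms give 0. Otherwise `b` fits in the narrow type and both give `a - b`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Unsigned saturating subtraction on 8-bit values.
uint8_t usubsat8(uint8_t A, uint8_t B) {
  return A > B ? static_cast<uint8_t>(A - B) : uint8_t{0};
}

// Original form: sub(a, trunc(umin(zext(a), b))).
uint8_t originalForm(uint8_t A, uint16_t B) {
  uint16_t Min = std::min<uint16_t>(A, B); // umin(zext(a), b)
  return static_cast<uint8_t>(A - static_cast<uint8_t>(Min));
}

// Folded form: usubsat(a, trunc(umin(b, SatLimit))), SatLimit assumed 0xFF.
uint8_t foldedForm(uint8_t A, uint16_t B) {
  uint16_t Clamped = std::min<uint16_t>(B, 0xFF);
  return usubsat8(A, static_cast<uint8_t>(Clamped));
}

// Compares the two forms for every 8-bit a and every 16-bit b.
bool foldHoldsForAllInputs() {
  for (unsigned A = 0; A <= 0xFF; ++A)
    for (unsigned B = 0; B <= 0xFFFF; ++B)
      if (originalForm(static_cast<uint8_t>(A), static_cast<uint16_t>(B)) !=
          foldedForm(static_cast<uint8_t>(A), static_cast<uint16_t>(B)))
        return false;
  return true;
}
```

Clamping `b` to the narrow maximum before truncating is what keeps the truncation from wrapping a large `b` into a small value.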
Juneyoung Lee	e4d751c271	Update BPFAdjustOpt.cpp to accept select form of or as well This is a minor pattern-match update to BPFAdjustOpt.cpp to accept not only 'or i1 a, b' but also 'select i1 a, i1 true, i1 b'. This resolves regression after SimplifyCFG's creating select form of and/or instead (https://reviews.llvm.org/D95026). This is a small change, and currently such select form isn't created or doesn't reach to the late pipeline (because InstCombine eagerly folds it into and/or i1), so I chose to commit without a review process.	2021-02-20 18:29:58 +09:00
Amara Emerson	067ec53df1	[AArch64][GlobalISel] Add selection support for G_VECREDUCE of <2 x i32> This selects to a pairwise add and a subreg copy.	2021-02-20 00:39:38 -08:00
Craig Topper	71b68fe532	[RISCV] Teach our custom vector load/store intrinsic isel code to propagate memory operands if we have them. We don't currently create memory operands for these intrinsics, but there was a suggestion of using the indexed load/store intrinsics to implement isel for scalable vector gather/scatter. That may propagate the memory operand from the gather/scatter ISD nodes.	2021-02-19 19:12:20 -08:00
Jacques Pienaar	3bec7ed59e	Different fix for gcc bug Was still running into from definition of 'template<class T> struct llvm::DenseMapInfo' [-fpermissive] template <typename T> struct DenseMapInfo; ^	2021-02-19 16:41:00 -08:00
Yusra Syeda	b006f55544	[SystemZ/z/OS] Add XPLINK 64-bit calling convention to tablegen. This commit adds the initial changes to the SystemZ target description for the XPLINK 64-bit calling convention on z/OS. Additions include: - a new predicate IsTargetXPLINK64 - different register allocation order - generation of nopr after a call Reviewed-by: uweigand Differential Revision: https://reviews.llvm.org/D96887	2021-02-19 18:39:49 -05:00
Amara Emerson	27566e9c3e	[AArch64][GlobalISel] Make G_VECREDUCE_ADD of <2 x s32> legal.	2021-02-19 14:28:21 -08:00
Craig Topper	7e54d7304b	[RISCV] Remove VPatILoad and VPatIStore multiclasses that are no longer used. NFC	2021-02-19 13:23:08 -08:00
Craig Topper	e7c86f4ac4	[RISCV] Use inheritance to reduce some repeated code in tablegen. NFC The VLX and VSX searchable tables, share the same format so we can have a common base class for them.	2021-02-19 10:42:18 -08:00
Craig Topper	7f5b3886e4	[RISCV] Remove unneeded indexed segment load/store vector pseudo instruction. We had more combinations of data and index lmuls than we needed. Also add some asserts to verify that the IndexVT and data VT have the same element count when we isel these pseudo instructions.	2021-02-19 10:28:48 -08:00
Craig Topper	d056d5decf	[RISCV] Use custom isel for vector indexed load/store intrinsics. There are many legal combinations of index and data VTs supported for these intrinsics. This results in a lot of isel patterns in RISCVGenDAGISel.inc. By adding a separate table similar to what we use for segment load/stores, we can more efficiently manually select these intrinsics. We should also be able to reuse this table for scalable vector gather/scatter. This reduces the llc binary size by ~56K. Reviewed By: khchen Differential Revision: https://reviews.llvm.org/D97033	2021-02-19 10:10:06 -08:00
Craig Topper	dbf910f0d9	[RISCV] Prevent selecting a 0 VL to X0 for the segment load/store intrinsics. Just like we do for isel patterns, we need to call selectVLOp to prevent 0 from being selected to X0 by the default isel. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D97021	2021-02-19 10:07:12 -08:00
Craig Topper	98dff5e804	[RISCV] Move SHFLI matching to DAG combine. Add 32-bit support for RV64 We previously used isel patterns for this, but that used quite a bit of space in the isel table due to OR being associative and commutative. It also wouldn't handle shifts/ands being in reversed order. This generalizes the shift/and matching from GREVI to take the expected mask table as input so we can reuse it for SHFLI. There is no SHFLIW instruction, but we can promote a 32-bit SHFLI to i64 on RV64. As long as bit 4 of the control bit isn't set, a 64-bit SHFLI will preserve 33 sign bits if the input had at least 33 sign bits. ComputeNumSignBits has been updated to account for that to avoid sext.w in the tests. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D96661	2021-02-19 10:07:12 -08:00
Jessica Paquette	8d3442eddb	[AArch64][GlobalISel] Run redundant_sext_inreg in the post-legalizer combiner This is to ensure that we can eliminate G_ASSERT_SEXT. In a follow-up patch, I'm going to make CallLowering emit G_ASSERT_SEXT for signext parameters. Differential Revision: https://reviews.llvm.org/D96913	2021-02-19 09:34:47 -08:00
madhur13490	3c297a2564	Make fixed-abi default for AMD HSA OS fixed-abi uses pre-defined and predictable SGPR/VGPRs for passing arguments. This patch makes this scheme default when HSA OS is specified in triple. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D96340	2021-02-19 15:05:25 +00:00
David Green	a1c34a9d6a	[ARM] Correct vector predicate type in MVE getCmpSelInstrCost	2021-02-19 14:43:51 +00:00
David Green	7a5c26e99a	Revert "[ARM] Expand the range of allowed post-incs in load/store optimizer" This reverts commit `3b34b06fc5` as runtime errors were reported.	2021-02-19 13:15:10 +00:00
Fraser Cormack	d9531a3097	[RISCV] Address some clang-tidy warnings. NFCI.	2021-02-19 12:10:28 +00:00
Carl Ritson	8181dcd30f	[AMDGPU] WQM/WWM: Fix marking of partial definitions Track lanes when processing definitions for marking WQM/WWM. If all lanes have been defined then marking can stop. This prevents marking unnecessary instructions as WQM/WWM. In particular this fixes a bug where values passing through V_SET_INACTIVE would be marked as requiring WWM. Reviewed By: piotr Differential Revision: https://reviews.llvm.org/D95503	2021-02-19 20:45:24 +09:00
Simon Pilgrim	2258b367db	[X86][AVX] getFauxShuffleMask - decode VBROADCAST(EXTRACT_VECTOR_ELT(V,0)) Handle the case where we're broadcasting a scalar extracted from another vector.	2021-02-19 11:06:53 +00:00
Wang, Pengfei	c98644c2ec	[X86] Fix a codegen crash in getSetCCResultType This patch fixes some crashes coming from X86ISelLowering::getSetCCResultType, which would occasionally return an EVT constructed from an invalid MVT, which has a null Type pointer. This patch refers to D95434. Differential Revision: https://reviews.llvm.org/D97036	2021-02-19 17:30:10 +08:00
Sjoerd Meijer	260f90bb3d	[AArch64] Add some missing Neoverse features This enables AES fusion and the post RA scheduler for the Neoverse cores. While we are at it, this also enables them for the A55, which we had missed earlier. Differential Revision: https://reviews.llvm.org/D96866	2021-02-19 09:18:35 +00:00
Craig Topper	cd4051ac80	[RISCV] Prune unneeded indexed load/store pseudo instructions. We were creating more combinations of value and index lmul than we needed. I've copied the loop structure used here from VPseudoAMOEI with all data sew values instead of just 32/64. Similar can be done for segment loads/store. Reviewed By: khchen Differential Revision: https://reviews.llvm.org/D97008	2021-02-18 23:08:39 -08:00
Serge Pavlov	2c4f60e45b	[FPEnv][AArch64] Implement lowering of llvm.set.rounding Differential Revision: https://reviews.llvm.org/D96836	2021-02-19 13:16:51 +07:00
Craig Topper	8ed3bbbcc3	[RISCV] Split zvlsseg searchable table into 4 separate tables. Index by properties rather than intrinsic ID. Intrinsic ID is a 32-bit value which made each row of the table 4 byte aligned. The remaining fields used 5 bytes. This meant 3 bytes of padding per row. This patch breaks the table into 4 separate tables and indexes them by properties we know about the intrinsic. NF, masked, strided, ordered, etc. The indexed load/store tables have no padding in their rows now. All together this reduces the size of llc binary by ~28K. I'm considering adding similar tables for isel of non-segment load/store as well to cut down the size of the isel table and probably improve our isel performance. Those tables would need to be indexed from intrinsics, IR loads/stores, gathers/scatters, and RISCVISD opcodes. So having a table that can be indexed without using intrinsic ID is more flexible. Reviewed By: HsiangKai Differential Revision: https://reviews.llvm.org/D96894	2021-02-18 19:00:49 -08:00
Craig Topper	cf34559104	[RISCV] Enable PrimaryKeyEarlyOut on RISCVVPseudosTable. This table is queried in RISCVMCInstLower without knowing whether the instruction is a vector pseudo. Due to the way the binary search works, we have to do log2(tablesize) checks just to determine a non-vector instruction isn't in the table. Conveniently, all the vector pseudos are pretty tightly packed within the internal instruction enum. By enabling the PrimaryKeyEarlyOut, tablegen will emit a check against the beginning and end of the table before doing the binary search. This gives a quick early out on the search for the majority of non-vector instructions. Differential Revision: https://reviews.llvm.org/D97016	2021-02-18 18:59:32 -08:00
Leonard Chan	c77659e549	[llvm][IR] Do not place constants with static relocations in a mergeable section This patch provides two major changes: 1. Add getRelocationInfo to check if a constant will have static, dynamic, or no relocations. (Also rename the original needsRelocation to needsDynamicRelocation.) 2. Only allow a constant with no relocations (static or dynamic) to be placed in a mergeable section. This will allow unused symbols that contain static relocations and happen to fit in mergeable constant sections (.rodata.cstN) to instead be placed in unique-named sections if -fdata-sections is used and subsequently garbage collected by --gc-sections. See https://lists.llvm.org/pipermail/llvm-dev/2021-February/148281.html. Differential Revision: https://reviews.llvm.org/D95960	2021-02-18 15:39:00 -08:00
Matt Arsenault	62d946e133	GlobalISel: Merge some AMDGPU ABI lowering code to generic code AMDGPU currently has a lot of pre-processing code to pre-split argument types into 32-bit pieces before passing it to the generic code in handleAssignments. This is a bit sloppy and also requires some overly fancy iterator work when building the calls. It's better if all argument marshalling code is handled directly in handleAssignments. This handles more situations like decomposing large element vectors into sub-element sized pieces. This should mostly be NFC, but does change the generated code by shifting where the initial argument packing instructions are placed. I think this is nicer looking, since it now emits the packing code directly after the relevant copies, rather than after the copies for the remaining arguments. This doubles down on gfx6/gfx7 using the gfx8+ ABI for 16-bit types. This is ultimately the better option, but incompatible with the DAG. Fixing this requires more work, especially for f16.	2021-02-18 17:26:55 -05:00
Nikita Popov	70e3c9a8b6	[BasicAA] Always strip single-argument phi nodes We can always look through single-argument (LCSSA) phi nodes when performing alias analysis. getUnderlyingObject() already does this, but stripPointerCastsAndInvariantGroups() does not. We still look through these phi nodes with the usual aliasPhi() logic, but sometimes get sub-optimal results due to the restrictions on value equivalence when looking through arbitrary phi nodes. I think it's generally beneficial to keep the underlying object logic and the pointer cast stripping logic in sync, insofar as it is possible. With this patch we get marginally better results: aa.NumMayAlias \| 5010069 \| 5009861 aa.NumMustAlias \| 347518 \| 347674 aa.NumNoAlias \| 27201336 \| 27201528 ... licm.NumPromoted \| 1293 \| 1296 I've renamed the relevant strip method to stripPointerCastsForAliasAnalysis(), as we're past the point where we can explicitly spell out everything that's getting stripped. Differential Revision: https://reviews.llvm.org/D96668	2021-02-18 23:07:50 +01:00
Craig Topper	0db938312a	[RISCV] Simplify VPseudoAMOEI multiclass. NFC lmul was already iterated in one of the loops. We don't need to recreate it from a string.	2021-02-18 12:40:51 -08:00
Stanislav Mekhanoshin	5247a0d9e6	[AMDGPU] Correct gfx90c feature list Looks like we have forced FeatureXNACK and forgot FeatureMadMacF32Insts. Differential Revision: https://reviews.llvm.org/D96989	2021-02-18 12:40:27 -08:00
Jessica Clarke	74df1ffaad	[RISCV] Use XLenRI alias for RegInfoByHwMode instances This avoids tedious repetition and matches what we do for the ValueTypeByHwMode uses. Reviewed By: craig.topper, luismarques Differential Revision: https://reviews.llvm.org/D96649	2021-02-18 19:38:36 +00:00
Sean Fertile	bb260b1ca7	[PowerPC][AIX] Add support for vector arg passing on the stack. Enable passing more vector arguments than available vector argument passing registers. Differential Revision: https://reviews.llvm.org/D96415	2021-02-18 13:32:40 -05:00
Heejin Ahn	6f2999b36a	[WebAssembly] Handle multiple EH_LABELs in EH pad Usually `EH_LABEL`s are placed in - Before an `invoke` (which becomes calls in the backend) - After an `invoke` - At the start of an EH pad I don't know exactly why, but I noticed there are cases of multiple, not a single, `EH_LABEL` instructions in the beginning of an EH pad. In that case `global.set` instruction placed to restore `__stack_pointer` ended up between two `EH_LABEL` instructions before `CATCH`. It should follow after the `EH_LABEL`s and `CATCH`. This CL fixes that case. Reviewed By: dschuff Differential Revision: https://reviews.llvm.org/D96970	2021-02-18 10:18:00 -08:00
Craig Topper	156fc07e19	[RISCV] Add support for fixed vector MULHU/MULHS. This uses the division by constant optimization to use MULHU/MULHS. Reviewed By: frasercrmck, arcbbb Differential Revision: https://reviews.llvm.org/D96934	2021-02-18 09:15:08 -08:00
Craig Topper	792627be35	[RISCV] Add support for fixed vector sign/zero extend from mask types. Due to vXi64 on RV32, I've directly emitted this using _VL ISD opcodes. If it wasn't for that we could just use fixed vector BUILD_VECTOR and VSELECT and let those each be legalized. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D96910	2021-02-18 09:08:10 -08:00
Craig Topper	c7dd92e8a5	[RISCV] Support isel of scalable vector bitcasts These should be NOPs so we can just replace with the input. This matches what SVE does with isel patterns for all permutations. Custom isel saves us from having to list all permutations for all LMULs. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D96921	2021-02-18 09:01:13 -08:00
Bradley Smith	8bad8a43c3	[AArch64][SVE] Add patterns to generate FMLA/FMLS/FNMLA/FNMLS/FMAD Adjust generateFMAsInMachineCombiner to return false if SVE is present in order to combine fmul+fadd into fma. Also add new pseudo instructions so as to select the most appropriate of FMLA/FMAD depending on register allocation. Depends on D96599 Differential Revision: https://reviews.llvm.org/D96424	2021-02-18 16:55:16 +00:00
Bradley Smith	5b094bfeb3	[AArch64] Allow folding FMUL/FADD into FMA for FP16 types isFMAFasterThanFMulAndFAdd should return true for FP16 types when HasFullFP16 is present, since we have the instructions to handle it for both SVE and NEON. (SVE patterns and tests will follow). Differential Revision: https://reviews.llvm.org/D96599	2021-02-18 16:51:22 +00:00
Hsiangkai Wang	065a187f33	[RISCV] Fix typo. Use ValueType instead of LLVMType.	2021-02-18 23:21:27 +08:00
David Green	3b34b06fc5	[ARM] Expand the range of allowed post-incs in load/store optimizer Currently the load/store optimizer will only fold in increments of the same size as the load/store. This patch expands that to any legal immediate for the post-inc instruction. Differential Revision: https://reviews.llvm.org/D95885	2021-02-18 14:59:02 +00:00
Baptiste Saleil	34dc1ccb96	[PowerPC] Exploit the vinsw, vinsd, and vins[wd][lr]x instructions on P10 This patch generates the vinsw, vinsd, vinsblx, vinshlx, vinswlx, vinsdlx, vinsbrx, vinshrx, vinswrx and vinsdrx instructions for vector insertion on P10. Differential Revision: https://reviews.llvm.org/D94454	2021-02-18 14:17:47 +00:00
Hsiangkai Wang	f1efa8abaf	[RISCV] Fix bugs in pseudo instructions for masked segment load. For masked segment load, the destination register should not overlap with mask register. It could not be V0. In the original implementation, there is no segment load/store register class without V0. In this patch, I added these register classes and modify `GetVRegNoV0` to get the correct one. Differential Revision: https://reviews.llvm.org/D96937	2021-02-18 22:17:00 +08:00
Hsiangkai Wang	b97d8b32c3	[NFC][RISCV] Use concise way to describe load/store instructions. Differential Revision: https://reviews.llvm.org/D96923	2021-02-18 22:17:00 +08:00
David Green	33ba220611	[ARM] Ensure types provided to getIntrinsicCost are valid It appears that pointer types were causing issues for the min/max cost code in getIntrinsicInstrCost. This makes sure that when matching icmp/select to a min/max, we only do that for normal int or float types.	2021-02-18 14:00:23 +00:00
Stefan Pintilie	b80357d46e	[PowerPC] Add option for ROP Protection Added -mrop-protection for Power PC to turn on codegen that provides some protection from ROP attacks. The option is off by default and can be turned on for Power 8, Power 9 and Power 10. This patch is for the option only. The feature will be implemented by a later patch. Reviewed By: amyk Differential Revision: https://reviews.llvm.org/D96512	2021-02-18 12:15:50 +00:00
David Green	1a6744e3dc	[ARM] Add larger than legal ICmp costs A v8i32 compare will produce a v8i1 predicate, but during codegen the v8i32 will be split into two v4i32, potentially requiring two v4i1 predicates to be merged into a single v8i1. Because this merging of two v4i1's into a v8i1 is very expensive, we need to make the cost of the compare equally high. This patch adds the cost of that to ARMTTIImpl::getCmpSelInstrCost. Because we don't know whether the user of the predicate can be split, and the cost model is mostly per-instruction, we may be pessimistic but that should only be for larger than legal types. This also adds min/max detection to the costmodel where it can be detected, to keep those in line with the cost of simple min/max instructions. Otherwise for the most part, costs that were already expensive have become more expensive. Differential Revision: https://reviews.llvm.org/D96692	2021-02-18 11:42:17 +00:00
Benjamin Kramer	ae1e6c3557	[RISCV] Rewrite assert to not give unused variable warnings in Release builds NFCI	2021-02-18 11:42:36 +01:00
Fraser Cormack	d876214990	[RISCV] Begin to support more subvector inserts/extracts This patch adds support for INSERT_SUBVECTOR and EXTRACT_SUBVECTOR (nominally where both operands are scalable vector types) where the vector, subvector, and index align sufficiently to allow decomposition to subregister manipulation: * For extracts, the extracted subvector must correctly align with the lower elements of a vector register. * For inserts, the inserted subvector must be at least one full vector register, and correctly align as above. This approach should work for fixed-length vector insertion/extraction too, but that will come later. Reviewed By: craig.topper, khchen, arcbbb Differential Revision: https://reviews.llvm.org/D96873	2021-02-18 10:18:27 +00:00
Fraser Cormack	0176fecfbc	[SVE][CodeGen] Expand SVE MULH[SU] and [SU]MUL_LOHI nodes This patch fixes a codegen crash introduced in `fde2466171`, where the DAGCombiner started generating optimized MULH[SU] or [SU]MUL_LOHI nodes unless the target opted out. The AArch64 backend cannot currently select any of these nodes, so ensure that they are not generated in the first place. This issue was raised by @huihuiz in D94501. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D96849	2021-02-18 10:06:24 +00:00
Wang, Pengfei	e9c11c1934	[X86] Zero AMX config buffer for non AVX512 cases. Zero AMX config buffer for non AVX512 cases. Differential Revision: https://reviews.llvm.org/D96927	2021-02-18 13:26:09 +08:00
Craig Topper	016eca8f90	[RISCV] Guard LowerINSERT_VECTOR_ELT against fixed vectors. The type legalizer can call this code based on the scalar type so we need to verify the vector type is a scalable vector. I think due to how type legalization visits nodes, the vector type will have already been legalized so we don't have an issue with using MVT here like we did for EXTRACT_VECTOR_ELT. I've added a test just in case.	2021-02-17 19:27:08 -08:00
Craig Topper	00c4e0a8f6	[RISCV] Guard the ISD::EXTRACT_VECTOR_ELT handling in ReplaceNodeResults against fixed vectors and non-MVT types. The type legalizer is calling this code based on the scalar type so we need to verify the input type is a scalable vector. The vector type has also not been legalized yet when this is called so we need to use EVT for it.	2021-02-17 18:25:38 -08:00
Stanislav Mekhanoshin	75997e8407	[AMDGPU] Fixed msan build LoadStoreOptimizer was using uninitialized SCC value for instructions where it is unsupported.	2021-02-17 18:01:23 -08:00
Chen Zheng	5517923b1c	[XCOFF][NFC] make csect properties optional for getXCOFFSection We are going to support debug sections for XCOFF. So the csect properties are not necessary. This patch makes these properties optional. Reviewed By: hubert.reinterpretcast Differential Revision: https://reviews.llvm.org/D95931	2021-02-17 20:51:42 -05:00
Stanislav Mekhanoshin	48d2e04152	[AMDGPU] Mark SMRD atomics We did not have atomic flags on SMRD, did not copy TSFlags to real instructions, and did not have ret/noret atomic map. At the moment it is NFC, but needed for D96469. Differential Revision: https://reviews.llvm.org/D96823	2021-02-17 16:47:02 -08:00
Stanislav Mekhanoshin	a8d9d50762	[AMDGPU] gfx90a support Differential Revision: https://reviews.llvm.org/D96906	2021-02-17 16:01:32 -08:00
Yusra Syeda	8b624a3164	[SystemZ] Separate LoZ ELF specifics in tablegen. Separate the LoZ ELF calling convention in tablegen. This will make it easier to add the z/OS ABI in future patches. Reviewed By: uweigand Differential Revision: https://reviews.llvm.org/D96867	2021-02-17 16:11:58 -05:00
Heejin Ahn	da01a9db8b	[WebAssembly] Fix EHPadStack update in fixCallUnwindMismatches Updating `EHPadStack` with respect to `TRY` and `CATCH` instructions has to be done after checking all other conditions, not before. Because we did this before checking other conditions, when we encounter `TRY` and we want to record the current mismatching range, we have already popped the entry from `EHPadStack`, which we need to access to record the range. The `baz` call in the added test needs try-delegate because the previous TRY marker placement for `quux` was placed before `baz`, because `baz`'s return value was stackified in RegStackify. If this wasn't stackified this try-delegate is not strictly necessary, but at the moment it is not easy to identify cases like this. I plan to transfer `nounwind` attributes from the LLVM IR to prevent cases like this. The call in the test does not have `unwind` attribute in order to test this bug, but in many cases of this pattern the previous call has `nounwind` attribute. Reviewed By: tlively Differential Revision: https://reviews.llvm.org/D96711	2021-02-17 12:14:11 -08:00
Craig Topper	3bdd02735b	[RISCV] Localize RISCVZvlssegTable to RISCVISelDAGToDAG.cpp, the only place it is used.	2021-02-17 11:37:28 -08:00
Craig Topper	799f7865c8	[RISCV] Use bits<7> instead of bits<11> for the EEW field size in the RISCVZvlsseg searchable table. NFCI We only support 8, 16, 32, and 64 for EEW. These only need 7 bits to represent.	2021-02-17 11:12:36 -08:00
Heejin Ahn	7c594bab00	[WebAssembly] Change catch_all's opcode We decided to change `catch_all`'s opcode from 0x05, which is the same as `else`, to 0x19, to avoid some complicated handling in the tools. See: https://github.com/WebAssembly/exception-handling/issues/147 Reviewed By: sbc100 Differential Revision: https://reviews.llvm.org/D96863	2021-02-17 10:16:23 -08:00
Craig Topper	d4353a3101	[RISCV] Merge the handlers for masked and unmasked segment loads/stores. A lot of the code for the masked and unmasked is the same. This patch adds a boolean to handle the differences so we can share the code. Differential Revision: https://reviews.llvm.org/D96841	2021-02-17 10:08:33 -08:00
Craig Topper	6f30d0035a	[RISCV] Merge the vsetvli and vsetvlimax intrinsic selection These have very similar code just with a different number of operands and handling for vsetivl. Differential Revision: https://reviews.llvm.org/D96834	2021-02-17 10:08:33 -08:00
Sidharth Baveja	cb2876800c	[PowerPC][AIX] Enable Shrinkwrapping on 32 and 64 bit AIX. Summary: Currently Shrinkwrap is not enabled on AIX. This patch enables shrink wrap on 32 and 64 bit AIX, and 64 bit ELF. Reviewed By: sfertile, nemanjai Differential Revision: https://reviews.llvm.org/D95094	2021-02-17 14:54:57 +00:00
Sean Fertile	4e127bce2d	[PowerPC] Handle FP physical register in inline asm constraint. Do not defer to the base class when the register constraint is a physical fpr. The base class will select SPILLTOVSRRC as the register class and register allocation will fail on subtargets without VSX registers. Differential Revision: https://reviews.llvm.org/D91629	2021-02-17 09:27:03 -05:00
David Green	6d835c5fcd	[ARM] Add MVE abs costs Similar to min/max, this increases the accuracy of abs intrinsics costs under MVE.	2021-02-17 14:21:09 +00:00
Piotr Sobczak	c72a63b4b0	[AMDGPU] Add implicit vcc_lo on S_CBRANCH_VCCNZ in wave32 * Update skip-if-dead.ll with tests for wave32. * Fix the crash in verifier in one newly enabled test by adding missing fixImplicitOperands in branch insertion code. ``` * Bad machine code: Using an undefined physical register * - function: test_kill_divergent_loop - basic block: %bb.2 bb (0xad96308) - instruction: S_CBRANCH_VCCNZ %bb.1, implicit $vcc_lo - operand 1: implicit $vcc_lo LLVM ERROR: Found 1 machine code errors. ``` * Simplify "cbranch_kill" to not use interp instructions. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D96793	2021-02-17 15:14:57 +01:00
luxufan	709ea8bc87	[RISCV] Simplify BP initialisation We can re-use copyPhysReg rather than writing a specialised copy. Differential Revision: https://reviews.llvm.org/D95227	2021-02-17 20:33:20 +08:00
Simon Pilgrim	05c64ea672	[DAG] Fold shuffle(bop(shuffle(x,y),shuffle(z,w)),bop(shuffle(a,b),shuffle(c,d))) (REAPPLIED) Fold shuffle(bop(shuffle(x,y),shuffle(z,w)),bop(shuffle(a,b),shuffle(c,d))) -> bop(shuffle(x,y),shuffle(z,w)),bop(shuffle(a,b),shuffle(c,d)) Attempt to fold from a shuffle of a pair of binops to a binop of shuffles, as long as one/both of the binop sources are also shuffles that can be merged with the outer shuffle. This should guarantee that we remove one binop without introducing any additional shuffles. Technically there's potential for a merged shuffle's lowering to be poorer than the original shuffle, but it could also be better, and I'm not seeing any regressions as long as we keep the 'don't merge splats' rule already present in MergeInnerShuffle. This expands and generalizes an existing X86 combine and attempts to merge either of each binop's sources (with an on-the-fly commutation of the shuffle mask) - we couldn't do that in the x86 version as it had to stay in a form that DAGCombine's MergeInnerShuffle would still recognise. Fixes issue raised by @saugustine in rG5aa8f4c0843a where we were failing to replace null shuffle operands from MergeInnerShuffle to UNDEFs. Differential Revision: https://reviews.llvm.org/D96345	2021-02-17 11:42:43 +00:00
Jay Foad	c8be7e96bb	[AMDGPU] Rename simplifyI24 to simplifyMul24 Also simplify one of its call sites. NFC.	2021-02-17 11:33:49 +00:00
Piotr Sobczak	08131c7439	[AMDGPU] Fix a miscompile with S_ADD/S_SUB The helper function isBoolSGPR is too aggressive when determining when a v_cndmask can be skipped on a boolean value because the function does not check the operands of and/or/xor. This can be problematic for the Add/Sub combines that can leave bits set even for inactive lanes leading to wrong results. Fix this by inspecting the operands of and/or/xor recursively. Differential Revision: https://reviews.llvm.org/D86878	2021-02-17 12:24:58 +01:00
Fraser Cormack	d81161646a	[RISCV] Add support for fixed vector vselect This patch adds support for fixed-length vector vselect. It does so by lowering them to a custom unmasked VSELECT_VL node with a vector length operand. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D96768	2021-02-17 10:59:00 +00:00
Hsiangkai Wang	a3c783dbf2	[RISCV] Spilling for RISC-V V extension. (2nd version) Differential Revision: https://reviews.llvm.org/D95148	2021-02-17 14:05:19 +08:00
Hsiangkai Wang	5a31a67385	[RISCV] Frame handling for RISC-V V extension. This patch proposes how to deal with RISC-V vector frame objects. The layout of RISC-V vector frame will look like \|---------------------------------\| \| scalar callee-saved registers \| \|---------------------------------\| \| scalar local variables \| \|---------------------------------\| \| scalar outgoing arguments \| \|---------------------------------\| \| RVV local variables && \| \| RVV outgoing arguments \| \|---------------------------------\| <- end of frame (sp) If there is realignment or variable length array in the stack, we will use frame pointer to access fixed objects and stack pointer to access non-fixed objects. \|---------------------------------\| <- frame pointer (fp) \| scalar callee-saved registers \| \|---------------------------------\| \| scalar local variables \| \|---------------------------------\| \| ///// realignment ///// \| \|---------------------------------\| \| scalar outgoing arguments \| \|---------------------------------\| \| RVV local variables && \| \| RVV outgoing arguments \| \|---------------------------------\| <- end of frame (sp) If there are both realignment and variable length array in the stack, we will use frame pointer to access fixed objects and base pointer to access non-fixed objects. 
\|---------------------------------\| <- frame pointer (fp) \| scalar callee-saved registers \| \|---------------------------------\| \| scalar local variables \| \|---------------------------------\| \| ///// realignment ///// \| \|---------------------------------\| <- base pointer (bp) \| RVV local variables && \| \| RVV outgoing arguments \| \|---------------------------------\| \| /////////////////////////////// \| \| variable length array \| \| /////////////////////////////// \| \|---------------------------------\| <- end of frame (sp) \| scalar outgoing arguments \| \|---------------------------------\| In this version, we do not save the addresses of RVV objects in the stack. We access them directly through the polynomial expression (a x VLENB + b). We do not reserve frame pointer when there is any RVV object in the stack. So, we also access the scalar frame objects through the polynomial expression (a x VLENB + b) if the access across RVV stack area. Differential Revision: https://reviews.llvm.org/D94465	2021-02-17 14:05:19 +08:00
Douglas Yung	0e3d7e6186	Fix gcc build after `de3a485d9` due to a gcc bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92598 This should fix gcc based builders such as http://lab.llvm.org:8011/#/builders/76/builds/1683	2021-02-16 21:57:12 -08:00
Tony Tye	c62b737ad6	[AMDGPU] Correct rmw atomics s_waitcnt generation The AMD GPU SIMemoryLegalizer was using the ordering address space rather than the instruction address space when determining the s_waitcnt to generate to ensure that a read-modify-write atomic has completed. This resulted in additional unnecessary counters being waited on. Differential Revision: https://reviews.llvm.org/D96743	2021-02-17 01:32:29 +00:00
Sriraman Tallam	d1a838babc	Basic block sections should enable function sections implicitly. Basic block sections enables function sections implicitly, this is not needed and is inefficient with "=list" option. We had basic block sections enable function sections implicitly in clang. This is particularly inefficient with "=list" option as it places functions that do not have any basic block sections in separate sections. This causes unnecessary object file overhead for large applications. This patch disables this implicit behavior. It only creates function sections for those functions that require basic block sections. Further, there was an inconsistent behavior with llc as llc was not turning on function sections by default. This patch makes llc and clang consistent and tests are added to check the new behavior. This is the first of two patches and this adds functionality in LLVM to create a new section for the entry block if function sections is not enabled. Differential Revision: https://reviews.llvm.org/D93876	2021-02-16 16:27:16 -08:00
Petr Hosek	16af973933	[MC][ELF] Support for zero flag section groups This change introduces support for zero flag ELF section groups to LLVM. LLVM already supports COMDAT sections, which in ELF are a special type of ELF section groups. These are generally useful to enable linker GC where you want a group of sections to always travel together, that is to be either retained or discarded as a whole, but without the COMDAT semantics. Other ELF assemblers already support zero flag ELF section groups and this change helps us reach feature parity. Differential Revision: https://reviews.llvm.org/D95851	2021-02-16 14:23:40 -08:00
Victor Huang	de3a485d9c	[NFC][PPC] Refactor TOC representation to allow several entries for the same symbol We currently represent TOC entries by an MCSymbol. This is not enough in some situations. For example, when accessing an initialized TLS variable v on AIX using the general dynamic model, we need to generate the two following entries for v: .tc .v[TC],v@m .tc v[TC],v One is for the region handle (with the @m relocation), the other is for the variable offset. This refactoring allows storing several entries for the same symbol with different VariantKind in the TOC. If the VariantKind is not specified, we default to VK_None. The AIX TLS implementation using this refactoring to generate the two entries will be posted in a subsequent patch. Patched By: bsaleil Reviewed By: sfertile Differential Revision: https://reviews.llvm.org/D96346	2021-02-16 21:32:16 +00:00
Sterling Augustine	5aa8f4c084	Revert "[DAG] Fold shuffle(bop(shuffle(x,y),shuffle(z,w)),bop(shuffle(a,b),shuffle(c,d)))" This reverts commit `5dfba562dd`. That commit causes an assertion failure with the following repro: typedef long b __attribute__((__vector_size__(16))); b d; b e; b __attribute__((__always_inline__)) c(b h, b i) { return (__attribute__((__vector_size__(8 * sizeof(short)))) short)h + i; } j() { b k, l, m, n, o[6], p, q; m = d[5]; b r = m; b s = f(r, 8); q = s; l = d[1]; p = l; t(q); n = c(m, l); o[1] = c(s, f(p, 8)); k = __builtin_shufflevector(n, o[1], 0, 2); e = __builtin_ia32_psrlwi128(k, j); } ./bin/clang -cc1 -triple x86_64-grtev4-linux-gnu -emit-obj -O1 -std=c99 test.c	2021-02-16 12:48:15 -08:00
Craig Topper	61a238e6e1	[RISCV] Add isel patterns for fixed vector fmsub/fnmadd/fnmsub.	2021-02-16 12:03:33 -08:00
Jessica Paquette	962b73dd0f	Revert "[AArch64][GlobalISel] Fold constants into G_GLOBAL_VALUE" This reverts commit `61b4702a40`. We were seeing some test failures in SPECINT2006 due to this change. Reverting to investigate.	2021-02-16 10:50:12 -08:00
Craig Topper	07ca13fe07	[RISCV] Add support for fixed vector mask logic operations. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D96741	2021-02-16 09:34:00 -08:00
Florian Hahn	211147c5ba	[AArch64] Convert CMP/SELECT sign patterns to OR & ASR. ICMP & SELECT patterns extracting the sign of a value can be simplified to OR & ASR (see https://alive2.llvm.org/ce/z/Xx4iZ0). This does not save any instructions in IR, but it is profitable on AArch64, because we need at least 2 extra instructions to materialize 1 and -1 for the SELECT. The improvements result in ~5% speedups on loops of the form static int sign_of(int x) { if (x < 0) return -1; return 1; } void foo(const int *x, int *res, int cnt) { for (int i=0;i<cnt;i++) res[i] = sign_of(x[i]); } Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D96596	2021-02-16 17:17:34 +00:00
David Green	1e007cf43c	[ARM] Use rGPR for writeback vldrs From what I can tell, a writeback is unpredictable with LR for both loads and stores. This changes the operand from a gprnopc to a rGPR in both cases (which I believe is essentially a NFC due to the tied-def already being a rGPR.) Differential Revision: https://reviews.llvm.org/D96723	2021-02-16 16:44:47 +00:00
Matt Arsenault	a7455d7b7c	AMDGPU: Remove kills following clusters of memory instruction In a future commit, soft clauses will be hinted with kill instructions rather than forced together with bundles. Look for kills that look like this, and erase them. I'm not sure if the check for specific uses is worthwhile, or if it would be better to just unconditionally erase kills. This reduces test churn in a future patch.	2021-02-16 10:49:28 -05:00
Simon Pilgrim	5dfba562dd	[DAG] Fold shuffle(bop(shuffle(x,y),shuffle(z,w)),bop(shuffle(a,b),shuffle(c,d))) Fold shuffle(bop(shuffle(x,y),shuffle(z,w)),bop(shuffle(a,b),shuffle(c,d))) -> bop(shuffle(x,y),shuffle(z,w)),bop(shuffle(a,b),shuffle(c,d)) Attempt to fold from a shuffle of a pair of binops to a binop of shuffles, as long as one/both of the binop sources are also shuffles that can be merged with the outer shuffle. This should guarantee that we remove one binop without introducing any additional shuffles. Technically there's potential for a merged shuffle's lowering to be poorer than the original shuffle, but it could also be better, and I'm not seeing any regressions as long as we keep the 'don't merge splats' rule already present in MergeInnerShuffle. This expands and generalizes an existing X86 combine and attempts to merge either of each binop's sources (with an on-the-fly commutation of the shuffle mask) - we couldn't do that in the x86 version as it had to stay in a form that DAGCombine's MergeInnerShuffle would still recognise. Differential Revision: https://reviews.llvm.org/D96345	2021-02-16 15:46:34 +00:00
Matt Arsenault	c320e8196a	AMDGPU: Fix debug info handling in post-RA bundler This was allowing debug instructions to break the bundling, which would change scheduling behavior. Bundle debug info / kills inside the bundle. This seems to work OK, although the asm printer doesn't understand these in a bundle. This implicitly expects the memory legalizer to unbundle. It would probably be slightly nicer to move these after. Rewrite the loop to be clearer and make sure we don't end a bundle on a meta instruction, only allow them in between other valid bundle instructions.	2021-02-16 10:42:06 -05:00
David Truby	e86f9ba15c	[llvm][Aarch64][SVE] Remove extra fmov instruction with certain literals When a literal that cannot fit in the immediate form of the fmov instruction is used to initialise an SVE vector, an extra unnecessary fmov is currently generated. This patch adds an extra codegen pattern preventing the extra instruction from being generated. Differential Revision: https://reviews.llvm.org/D96700 Co-Authored-By: Paul Walker <paul.walker@arm.com>	2021-02-16 14:16:33 +00:00
Kerry McLaughlin	ba1e150d03	[SVE] Add support for scalable vectorization of loops with int/fast FP reductions This patch enables scalable vectorization of loops with integer/fast reductions, e.g: ``` unsigned sum = 0; for (int i = 0; i < n; ++i) { sum += a[i]; } ``` A new TTI interface, isLegalToVectorizeReduction, has been added to prevent reductions which are not supported for scalable types from vectorizing. If the reduction is not supported for a given scalable VF, computeFeasibleMaxVF will fall back to using fixed-width vectorization. Reviewed By: david-arm, fhahn, dmgreen Differential Revision: https://reviews.llvm.org/D95245	2021-02-16 13:50:06 +00:00
Fraser Cormack	04977ce5ce	[RISCV] Fix a crash in fixed-length build_vector lowering Non-splatted non-integer build_vector nodes were mistakenly being lowered as VID expressions, which should not happen. VID can only be used to select integer build_vector nodes. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D96718	2021-02-16 10:25:15 +00:00
Fraser Cormack	b870199020	[RISCV] Add patterns for scalable-vector fabs & fcopysign The patterns mostly follow the scalar counterparts, save for some extra optimizations to match the vector/scalar forms. The patch adds a DAGCombine for ISD::FCOPYSIGN to try and reorder ISD::FNEG around any ISD::FP_EXTEND or ISD::FP_TRUNC of the second operand. This helps us achieve better codegen to match vfsgnjn. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D96028	2021-02-16 10:21:09 +00:00
Craig Topper	29b894a8d3	[RISCV] Add explicit i32/i64 types to RV32 or RV64 only isel patterns. NFC This stops tablegen from generating patterns with the opposite type in the opposite HwMode. Those patterns just add wasted bytes to the isel table. This change reduces the isel table by about 1800 bytes.	2021-02-15 14:36:05 -08:00
Matt Arsenault	392e0fcfd1	GlobalISel: Handle arguments partially passed on the stack The API is a bit awkward since you need to index into an array in the passed struct. I guess an alternative would be to pass all of the individual fields.	2021-02-15 17:06:14 -05:00
Craig Topper	7ba2e1c601	[RISCV] Add support for fixed vector floating point setcc. This is annoying because the condition code legalization belongs to LegalizeDAG, but our custom handler runs in Legalize vector ops which occurs earlier. This adds some of the mask binary operations so that we can combine multiple compares that we need for expansion. I've also fixed up RISCVISelDAGToDAG.cpp to handle copies of masks. This patch contains a subset of the integer setcc patch as well. That patch is dependent on the integer binary ops patch. I'll rebase based on what order the patches go in. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D96567	2021-02-15 12:52:25 -08:00
Duncan P. N. Exon Smith	22a52dfddc	TransformUtils: Fix metadata handling in CloneModule (and improve CloneFunctionInto) This commit fixes how metadata is handled in CloneModule to be sound, and improves how it's handled in CloneFunctionInto (although the latter is still awkward when called within a module). Ruiling Song pointed out in PR48841 that CloneModule was changed to unsoundly use the RF_ReuseAndMutateDistinctMDs flag (renamed in `fa35c1f80f` for clarity). This flag papered over a crash caused by other various changes made to CloneFunctionInto over the past few years that made it unsound to use cloning between different modules. (This commit partially addresses PR48841, fixing the repro from preprocessed source but not textual IR. MDNodeMapper::mapDistinctNode became unsound in `df763188c9` and this commit does not address that regression.) RF_ReuseAndMutateDistinctMDs is designed for the IRMover to use, avoiding unnecessary clones of all referenced metadata when linking between modules (with IRMover, the source module is discarded after linking). It never makes sense to use when you're not discarding the source. This commit drops its incorrect use in CloneModule. Sadly, the right thing to do with metadata when cloning a function is complicated, and this patch doesn't totally fix it. The first problem is that there are two different types of referenceable metadata and it's not obvious what to do with one of them when remapping. - `!0 = !{!1}` is metadata's version of a constant. Programmatically it's called "uniqued" (probably a better term would be "constant") because, like `ConstantArray`, it's stored in uniquing tables. Once it's constructed, it's illegal to change its arguments. - `!0 = distinct !{!1}` is a bit closer to a global variable. It's legal to change the operands after construction. What should be done with distinct metadata when cloning functions within the same module? - Should new, cloned nodes be created? 
- Should all references point to the same, old nodes? The answer depends on whether that metadata is effectively owned by a function. And that's the second problem. Referenceable metadata's ownership model is not clear or explicit. Technically, it's all stored on an LLVMContext. However, any metadata that is `distinct`, that transitively references a `distinct` node, or that transitively references a GlobalValue is specific to a Module and is effectively owned by it. More specifically, some metadata is effectively owned by a specific Function within a module. Effectively function-local metadata was introduced somewhere around `c10d0e5ccd`, which made it illegal for two functions to share a DISubprogram attachment. When cloning a function within a module, you need to clone the function-local debug info and suppress cloning of global debug info (the status quo suppresses cloning some global debug info but not all). When cloning a function to a new/different module, you need to clone all of the debug info. Here's what I think we should do (eventually? soon? not this patch though): - Distinguish explicitly (somehow) between pure constant metadata owned by the LLVMContext, global metadata owned by the Module, and local metadata owned by a GlobalValue (such as a function). - Update CloneFunctionInto to trigger cloning of all "local" metadata (only), perhaps by adding a bit to RemapFlag. Alternatively, split out a separate function CloneFunctionMetadataInto to prime the metadata map that callers are updated to call ahead of time as appropriate. Here's the somewhat more isolated fix in this patch: - Converted the `ModuleLevelChanges` parameter to `CloneFunctionInto` to an enum called `CloneFunctionChangeType` that is one of LocalChangesOnly, GlobalChanges, DifferentModule, and ClonedModule. - The code maintaining the "functions uniquely own subprograms" invariant is now only active in the first two cases, where a function is being cloned within a single module. 
That's necessary because this code inhibits cloning of (some) "global" metadata that's effectively owned by the module. - The code maintaining the "all compile units must be explicitly referenced by !llvm.dbg.cu" invariant is now only active in the DifferentModule case, where a function is being cloned into a new module in isolation. - CoroSplit.cpp's call to CloneFunctionInto in CoroCloner::create uses LocalChangesOnly, since `fa635d730f` only set `ModuleLevelChanges` to trigger cloning of local metadata. - CloneModule drops its unsound use of RF_ReuseAndMutateDistinctMDs and special handling of !llvm.dbg.cu. - Fixed some outdated header docs and left a couple of FIXMEs. Differential Revision: https://reviews.llvm.org/D96531	2021-02-15 11:56:00 -08:00
Stanislav Mekhanoshin	5cf9292ce3	[AMDGPU] Add two TSFlags: IsAtomicNoRtn and IsAtomicRtn We use the AtomicNoRet map in multiple places to determine whether an instruction is atomic, and whether it is an rtn or nortn atomic. This method does not always work, since some instructions have only an rtn or only a nortn version. One such instruction is ds_wrxchg_rtn_b32, which does not have a nortn version. This has caused changes in memory legalizer tests. Differential Revision: https://reviews.llvm.org/D96639	2021-02-15 11:27:59 -08:00
Florian Hahn	ca23b2c8ed	[AArch64] Move machine bundle unpacking to PreEmit2 phase. This patch adjusts the placement of the bundle unpacking to just before code emission. In particular, this means bundle unpacking happens AFTER the machine outliner. With the previous position, the machine outliner may outline parts of a bundle, which breaks them up. This is an issue for BLR_RVMARKER handling, as illustrated by the rvmarker-pseudo-expansion-and-outlining.mir test case. The machine outliner should not break up the bundles created during pseudo expansion. This should fix PR49082. Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D96294	2021-02-15 16:10:43 +00:00
David Green	0a98efb049	[ARM] Add some basic Min/Max costs This adds basic MVE costs for SMIN/SMAX/UMIN/UMAX, as well as MINNUM and MAXNUM representing fmin and fmax. It tightens up the costs, not using a ICmp+Select cost. Differential Revision: https://reviews.llvm.org/D96603	2021-02-15 15:06:19 +00:00
Caroline Concatto	b52e6c5891	[CostModel] Add cost model for experimental.vector.reverse This patch uses the function getShuffleCost with SK_Reverse to compute the cost for experimental.vector.reverse. For scalable vector types, it adds a table with the legal types to AArch64TTIImpl::getShuffleCost so as not to assert in BasicTTIImpl::getShuffleCost; for fixed vectors, it relies on the existing cost model in BasicTTIImpl. Depends on D94883 Differential Revision: https://reviews.llvm.org/D95603	2021-02-15 14:23:57 +00:00
Caroline Concatto	2d728bbff5	[CodeGen][SelectionDAG] Add new intrinsic experimental.vector.reverse This patch adds a new intrinsic experimental.vector.reverse that takes a single vector and returns a vector of matching type but with the original lane order reversed. For example: ``` vector.reverse(<A,B,C,D>) ==> <D,C,B,A> ``` The new intrinsic supports fixed and scalable vector types. The fixed-width vector relies on shufflevector to maintain existing behaviour. Scalable vector uses the new ISD node - VECTOR_REVERSE. This new intrinsic is one of the named shufflevector intrinsics proposed on the mailing-list in the RFC at [1]. Patch by Paul Walker (@paulwalker-arm). [1] https://lists.llvm.org/pipermail/llvm-dev/2020-November/146864.html Differential Revision: https://reviews.llvm.org/D94883	2021-02-15 13:39:43 +00:00
David Green	a838a4f69f	[ARM] Extend search for increment in load/store optimizer Currently the findIncDecAfter will only look at the next instruction for post-inc candidates in the load/store optimizer. This extends that to a search through the current BB, until an instruction that modifies or uses the increment reg is found. This allows more post-inc load/stores and ldm/stm's to be created, especially in cases where a schedule might move instructions further apart. We make sure not to look any further for an SP, as that might invalidate stack slots that are still in use. Differential Revision: https://reviews.llvm.org/D95881	2021-02-15 13:17:21 +00:00
Sjoerd Meijer	357237e93e	Recommit "[TTI] Unify FavorPostInc and FavorBackedgeIndex into getPreferredAddressingMode" This reverts commit `effc3b0799`, with the build problem fixed.	2021-02-15 11:33:00 +00:00
Sjoerd Meijer	effc3b0799	Revert "[TTI] Unify FavorPostInc and FavorBackedgeIndex into getPreferredAddressingMode" This reverts commit `cd6de0e8de`.	2021-02-15 11:01:23 +00:00
Sjoerd Meijer	cd6de0e8de	[TTI] Unify FavorPostInc and FavorBackedgeIndex into getPreferredAddressingMode This refactors shouldFavorPostInc() and shouldFavorBackedgeIndex() into getPreferredAddressingMode() so that we have one interface to steer LSR in generating the preferred addressing mode. Differential Revision: https://reviews.llvm.org/D96600	2021-02-15 10:44:15 +00:00
Fraser Cormack	4bd5bd4009	[RISCV] Convert VSLIDE(UP|DOWN) nodes to "VL" versions (NFC) This patch converts the RISCV VSLIDEUP and VSLIDEDOWN custom nodes to ones carrying additional mask and vector-length operands. This is primarily so they can be used by both systems. This also takes the opportunity to create some helper functions to deal with the common task of getting the default (unmasked) VL operands. Reviewed By: craig.topper, arcbbb Differential Revision: https://reviews.llvm.org/D96505	2021-02-15 10:32:56 +00:00
Arlo Siemsen	080866470d	Add ehcont section support In the future Windows will enable Control-flow Enforcement Technology (CET aka shadow stacks). To protect the path where the context is updated during exception handling, the binary is required to enumerate valid unwind entrypoints in a dedicated section which is validated when the context is being set during exception handling. This change allows llvm to generate the section that contains the appropriate symbol references in the form expected by the msvc linker. This feature is enabled through a new module flag, ehcontguard, which was modelled on the cfguard flag. The change includes a test that when the module flag is enabled the section is correctly generated. The set of exception continuation information includes returns from exceptional control flow (catchret in llvm). In order to collect catchret targets, this change: 1) includes an additional flag on machine basic blocks to indicate that the given block is the target of a catchret operation, 2) introduces a new machine function pass to insert and collect symbols at the start of each block, and 3) combines these targets with the other EHCont targets that were already being collected. Change originally authored by Daniel Frampton <dframpto@microsoft.com> For more details, see MSVC documentation for `/guard:ehcont` https://docs.microsoft.com/en-us/cpp/build/reference/guard-enable-eh-continuation-metadata Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D94835	2021-02-15 14:27:12 +08:00
Carl Ritson	aef781b47a	[AMDGPU] Add llvm.amdgcn.wqm.demote intrinsic Add intrinsic which demotes all active lanes to helper lanes. This is used to implement demote to helper Vulkan extension. In practice demoting a lane to helper simply means removing it from the mask of live lanes used for WQM/WWM/Exact mode. Where the shader does not use WQM, demotes just become kills. Additionally add llvm.amdgcn.live.mask intrinsic to complement demote operations. In theory llvm.amdgcn.ps.live can be used to detect helper lanes; however, ps.live can be moved by LICM. The movement of ps.live cannot be remedied without changing its type signature and such a change would require ps.live users to update as well. Reviewed By: piotr Differential Revision: https://reviews.llvm.org/D94747	2021-02-15 08:45:46 +09:00
Tony Tye	8a91b68b95	[AMDGPU] Limit memory scope for scratch, LDS and GDS Changes for AMD GPU SIMemoryLegalizer: - Limit the memory scope to maximum supported by the scratch, LDS and GDS address spaces. - Improve assertion checking. - Correct toSIAtomicScope argument name. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D96643	2021-02-14 17:34:12 +00:00
Kazu Hirata	b4c0d610a6	[AMDGPU] Fix build breakage	2021-02-14 09:02:55 -08:00
Kazu Hirata	910e2d1e57	[llvm] Use llvm::is_contained (NFC)	2021-02-14 08:36:20 -08:00
Ben Shi	efb1cb752b	[AVR] Fix a bug in 16-bit shifts Reviewed By: aykevl Differential Revision: https://reviews.llvm.org/D96590	2021-02-14 11:54:55 +08:00
Craig Topper	3520371ddb	[RISCV] Rename the RVVBaseAddr ComplexPattern to just BaseAddr and use it to merge some scalar load/store patterns too.	2021-02-13 12:01:51 -08:00
Heejin Ahn	35f5f797a6	[WebAssembly] Fix rethrow's argument computation Previously we assumed `rethrow`'s argument was always 0, but it turned out `rethrow` follows the same rule as `br` or `delegate`: https://github.com/WebAssembly/exception-handling/pull/137 https://github.com/WebAssembly/exception-handling/issues/146#issuecomment-777349038 Currently `rethrow`s generated by our backend always rethrow the exception caught by the innermost enclosing catch, so this adds a function to compute that and replaces `rethrow`'s argument with its computed result. This also renames `EHPadStack` in `InstPrinter` to `TryStack`, because in CFGStackify we use `EHPadStack` to mean the range between `catch`~`end`, while in `InstPrinter` we used it to mean the range between `try`~`catch`, so choosing different names would look clearer. Doesn't contain any functional changes in `InstPrinter`. Reviewed By: dschuff Differential Revision: https://reviews.llvm.org/D96595	2021-02-13 03:43:15 -08:00
Kazu Hirata	96c90a6d14	[AMDGPU] Drop unnecessary const from a return type (NFC) Identified with readability-const-return-type.	2021-02-12 23:44:32 -08:00
Serge Pavlov	816053bc71	[FPEnv][ARM] Implement lowering of llvm.set.rounding Differential Revision: https://reviews.llvm.org/D96501	2021-02-13 11:16:29 +07:00
Craig Topper	532d4bf025	[RISCV] Move riscv_vfmv_v_f_vl patterns to RISCVInstrInfoVVLPatterns.td for consistency with riscv_vmv_v_x_vl. NFC	2021-02-12 16:08:27 -08:00
Craig Topper	4220a81c84	[RISCV] Add support for fixed vector fabs	2021-02-12 15:33:36 -08:00
Craig Topper	36658376d5	[RISCV] Add support for fixed vector sqrt.	2021-02-12 15:33:29 -08:00
Jessica Paquette	61b4702a40	[AArch64][GlobalISel] Fold constants into G_GLOBAL_VALUE This pretty much just ports `performGlobalAddressCombine` from AArch64ISelLowering. (AArch64 doesn't use the generic DAG combine for this.) This adds a pre-legalize combine which looks for this pattern: ``` %g = G_GLOBAL_VALUE @x %ptr1 = G_PTR_ADD %g, cst1 %ptr2 = G_PTR_ADD %g, cst2 ... %ptrN = G_PTR_ADD %g, cstN ``` And then, if possible, transforms it like so: ``` %g = G_GLOBAL_VALUE @x %offset_g = G_PTR_ADD %g, -min(cst) %ptr1 = G_PTR_ADD %offset_g, cst1 %ptr2 = G_PTR_ADD %offset_g, cst2 ... %ptrN = G_PTR_ADD %offset_g, cstN ``` Where min(cst) is the smallest out of the G_PTR_ADD constants. This means we should save at least one G_PTR_ADD. This also updates code in the legalizer + selector which assumes that G_GLOBAL_VALUE will never have an offset and adds/updates relevant tests. Differential Revision: https://reviews.llvm.org/D96624	2021-02-12 14:55:15 -08:00
Craig Topper	d32ed9b27e	[RISCV] Use a ComplexPattern to merge the PatFrags for removing unneeded masks on shift amounts. Rather than having patterns with and without an AND, use a ComplexPattern to handle both cases. Reduces the isel table by about 700 bytes.	2021-02-12 14:03:23 -08:00
Stanislav Mekhanoshin	c96e214b9c	[AMDGPU] Fix Windows build A trivial fix, 64 bit constant is 1ull, not 1ul on Windows. Fixed build broken by `c0d7a8bc62`.	2021-02-12 12:30:52 -08:00
Amara Emerson	5d6d9b63a3	[GlobalISel] Propagate extends through G_PHIs into the incoming value blocks. This combine tries to do inter-block hoisting of extends of G_PHIs, into the originating blocks of the phi's incoming value. The idea is to expose further optimization opportunities that are normally obscured by the PHI. Some basic heuristics and a target hook for AArch64 are added to allow tuning. E.g. if the extend is used by a G_PTR_ADD, it doesn't perform this combine since it may be folded into the addressing mode during selection. There are very minor code size improvements on AArch64 -Os, but the real benefit is that it unlocks optimizations like AArch64 conditional compares on some benchmarks. Differential Revision: https://reviews.llvm.org/D95703	2021-02-12 11:52:52 -08:00
David Green	875f0cbcc6	[ARM] Optimize fp store of extract to integer store if already available. Given a floating point store from an extracted vector, with an integer VGETLANE that already exists, storing the existing VGETLANEu directly can be better for performance. As the value is known to already be in an integer register, this can help reduce fp register pressure, removes the need for the fp extract, and allows use of more integer post-inc stores not available with vstr. This can be a bit narrow in scope, but helps with certain biquad kernels that store shuffled vector elements. Differential Revision: https://reviews.llvm.org/D96159	2021-02-12 18:34:58 +00:00
Simon Pilgrim	4841a225b7	[DAG] Move basic USUBSAT pattern matches from X86 to DAGCombine Begin transitioning the X86 vector code to recognise sub(umax(a,b) ,b) or sub(a,umin(a,b)) USUBSAT patterns to make it more generic and available to all targets. This initial patch just moves the basic umin/umax patterns to DAG, removing some vector-only checks on the way - these are some of the patterns that the legalizer will try to expand back to so we can be reasonably relaxed about matching these pre-legalization. We can handle the trunc(sub(..))) variants as well, which helps with patterns where we were promoting to a wider type to detect overflow/saturation. The remaining x86 code requires some cleanup first - some of it isn't actually tested etc. I also need to resurrect D25987. Differential Revision: https://reviews.llvm.org/D96413	2021-02-12 18:22:57 +00:00
Stanislav Mekhanoshin	c0d7a8bc62	[AMDGPU] Allow accvgpr_read/write decode with opsel These two instructions are VOP3P and have op_sel_hi bits, but do not use op_sel_hi. It is recommended to set unused op_sel_hi bits to 1. However, we cannot decode both representations with 1 and 0 if bits are set to default value 1. If bits are set to be ignored with '?' initializer then encoding defaults them to 0. The patch is a hack to force ignored '?' bits to 1 on encoding for these instructions. Canonicalization still happens on disasm print if incoming values are non-default, so that disasm output does not match binary input, but this is a pre-existing problem for all instructions with '?' bits. Fixes: SWDEV-272540 Differential Revision: https://reviews.llvm.org/D96543	2021-02-12 10:04:47 -08:00
Akira Hatanaka	ed4718eccb	[ObjC][ARC] Use operand bundle 'clang.arc.attachedcall' instead of explicitly emitting retainRV or claimRV calls in the IR Background: This fixes a longstanding problem where llvm breaks ARC's autorelease optimization (see the link below) by separating calls from the marker instructions or retainRV/claimRV calls. The backend changes are in https://reviews.llvm.org/D92569. https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue What this patch does to fix the problem: - The front-end adds operand bundle "clang.arc.attachedcall" to calls, which indicates the call is implicitly followed by a marker instruction and an implicit retainRV/claimRV call that consumes the call result. In addition, it emits a call to @llvm.objc.clang.arc.noop.use, which consumes the call result, to prevent the middle-end passes from changing the return type of the called function. This is currently done only when the target is arm64 and the optimization level is higher than -O0. - ARC optimizer temporarily emits retainRV/claimRV calls after the calls with the operand bundle in the IR and removes the inserted calls after processing the function. - ARC contract pass emits retainRV/claimRV calls after the call with the operand bundle. It doesn't remove the operand bundle on the call since the backend needs it to emit the marker instruction. The retainRV and claimRV calls are emitted late in the pipeline to prevent optimization passes from transforming the IR in a way that makes it harder for the ARC middle-end passes to figure out the def-use relationship between the call and the retainRV/claimRV calls (which is the cause of PR31925). - The function inliner removes an autoreleaseRV call in the callee if nothing in the callee prevents it from being paired up with the retainRV/claimRV call in the caller. It then inserts a release call if claimRV is attached to the call since autoreleaseRV+claimRV is equivalent to a release. 
If it cannot find an autoreleaseRV call, it tries to transfer the operand bundle to a function call in the callee. This is important since the ARC optimizer can remove the autoreleaseRV returning the callee result, which makes it impossible to pair it up with the retainRV/claimRV call in the caller. If that fails, it simply emits a retain call in the IR if retainRV is attached to the call and does nothing if claimRV is attached to it. - SCCP refrains from replacing the return value of a call with a constant value if the call has the operand bundle. This ensures the call always has at least one user (the call to @llvm.objc.clang.arc.noop.use). - This patch also fixes a bug in replaceUsesOfNonProtoConstant where multiple operand bundles of the same kind were being added to a call. Future work: - Use the operand bundle on x86-64. - Fix the auto upgrader to convert call+retainRV/claimRV pairs into calls with the operand bundles. rdar://71443534 Differential Revision: https://reviews.llvm.org/D92808	2021-02-12 09:51:57 -08:00
Craig Topper	1697cc78b1	[RISCV] Add support for integer fixed vector setcc I believe I've covered all orderings of splat operands here. Better canonicalization in lowering might help reduce this. I did not handle the immediate adjustments needed for set(u)gt/set(u)lt. Testing here is limited to byte types because the scalable vector type used for masks for the store is calculated assuming 8 byte elements. But for the setcc it's based on the element count of the container type for the setcc input. So they don't agree. We'll need to enhance D96352 to handle this I think. Differential Revision: https://reviews.llvm.org/D96443	2021-02-12 09:29:41 -08:00
Craig Topper	875c76de2b	[RISCV] Add support for matching .vx and .vi forms of binary instructions for fixed vectors. Unlike scalable vectors, I'm only using a ComplexPattern for the immediate itself. The vmv_v_x is matched explicitly. We ignore the VL argument when matching a binary operator, but we do check it when matching splat directly. I left out tests for vXi64 as they fail on rv32 right now. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D96365	2021-02-12 09:18:10 -08:00
Jay Foad	7e9ceed9a2	[TableGen][GlobalISel] Allow duplicate RendererFns Allow different GICustomOperandRenderers to use the same RendererFn. This avoids the need for targets to define a bunch of identical C++ renderer functions with different names. Without this fix TableGen would have emitted code that tried to define the GICR enumeration with duplicate enumerators. Differential Revision: https://reviews.llvm.org/D96587	2021-02-12 15:05:32 +00:00
David Green	541828e35d	[ARM] Single source VMOVNT Our current lowering of VMOVNT goes via a shuffle vector of the form <0, N, 2, N+2, 4, N+4, ..>. That can of course also be a single input shuffle of the form <0, 0, 2, 2, 4, 4, ..>, where we use a VMOVNT to insert a vector into the top lanes of itself. This adds lowering of that case, re-using the existing isVMOVNMask. Differential Revision: https://reviews.llvm.org/D96065	2021-02-12 14:28:57 +00:00
Sanjay Patel	79b1b4a581	[Vectorizers][TTI] remove option to bypass creation of vector reduction intrinsics The vector reduction intrinsics started life as experimental ops, so backend support was lacking. As part of promoting them to 1st-class intrinsics, however, codegen support was added/improved: D58015 D90247 So I think it is safe to now remove this complication from IR. Note that we still have an IR-level codegen expansion pass for these as discussed in D95690. Removing that is another step in simplifying the logic. Also note that x86 was already unconditionally forming reductions in IR, so there should be no difference for x86. I spot checked a couple of the tests here by running them through opt+llc and did not see any asm diffs. If we do find functional differences for other targets, it should be possible to (at least temporarily) restore the shuffle IR with the ExpandReductions IR pass. Differential Revision: https://reviews.llvm.org/D96552	2021-02-12 08:13:50 -05:00
luxufan	feaf1d81e3	[RISCV] Change parseVTypeI function Change the parseVTypeI function so that the added vset instruction test cases report more concrete error messages. Differential Revision: https://reviews.llvm.org/D96218	2021-02-12 19:38:34 +08:00
Fraser Cormack	e88da1d677	[RISCV] Add support for integer fixed min/max This patch extends the initial fixed-length vector support to include smin, smax, umin, and umax. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D96491	2021-02-12 09:19:45 +00:00
Heejin Ahn	2968611fda	[WebAssembly] Fix delegate's argument computation I previously assumed `delegate`'s immediate argument computation followed a different rule than that of branches, but we agreed to make it the same (https://github.com/WebAssembly/exception-handling/issues/146). This removes the need for a separate `DelegateStack` in both CFGStackify and InstPrinter. When computing the immediate argument, we use a different function for `delegate` computation because in MIR `DELEGATE`'s instruction's destination is the destination catch BB or delegate BB, and when it is a catch BB, we need an additional step of getting its corresponding `end` marker. Reviewed By: tlively, dschuff Differential Revision: https://reviews.llvm.org/D96525	2021-02-11 21:57:28 -08:00
Craig Topper	7a7836b4d8	[RISCV] Add a pattern for a scalable vector mask vnot. We can use a vnand.mm with the same register for both inputs. This avoids materializing an all ones constant with vmset.mm.	2021-02-11 15:34:58 -08:00
ShihPo Hung	9e62c9146d	[RISCV] Initial support for insert/extract subvector This patch handles cast-like insert_subvector & extract_subvector in which case: 1. index starts from 0. 2. inserting a fixed-width vector into a scalable vector, or extracting a fixed-width vector from a scalable vector. Reviewed By: craig.topper, frasercrmck Differential Revision: https://reviews.llvm.org/D96352	2021-02-11 14:35:49 -08:00
Pengxuan Zheng	61cca0f2e5	[AArch64] Adding Neon Sm3 & Sm4 Intrinsics This adds SM3 and SM4 Intrinsics support for AArch64, specifically: vsm3ss1q_u32 vsm3tt1aq_u32 vsm3tt1bq_u32 vsm3tt2aq_u32 vsm3tt2bq_u32 vsm3partw1q_u32 vsm3partw2q_u32 vsm4eq_u32 vsm4ekeyq_u32 Reviewed By: labrinea Differential Revision: https://reviews.llvm.org/D95655	2021-02-11 14:20:20 -08:00
Stanislav Mekhanoshin	cb41ee92da	[AMDGPU] Fix promote alloca with double use in a same insn If an instruction has more than one pointer operand derived from the same promoted alloca, we fix it for one argument but do not fix the second use, considering this user done. Fix this by deferring processing of memory intrinsics until all potential operands are replaced. Fixes: SWDEV-271358 Differential Revision: https://reviews.llvm.org/D96386	2021-02-11 11:42:25 -08:00
Matt Arsenault	e3c6fa3611	AMDGPU: Restrict soft clause bundling at half of the available regs Fixes a testcase that was overcommitting large register tuples to a bundle, which the register allocator could not possibly satisfy. This was producing a bundle which used nearly all of the available SGPRs with a series of 16-dword loads (not all of which are freely available to use). This is a quick hack for some deeper issues with how the clause bundler tracks register pressure. Overall the pressure tracking used here doesn't make sense and is too imprecise for what it needs to avoid the allocator failing. The pressure estimate does not account for the alignment requirements of large SGPR tuples, so this was really underestimating the pressure impact. This also ignores the impact of the extended live range of the use registers after the bundle is introduced. Additionally, it didn't account for some wide tuples not being available due to reserved registers. This regresses a few cases. These end up introducing more spilling. This is also a function of the global pressure being used in the decision to bundle, not the local pressure impact of the bundle itself.	2021-02-11 14:08:59 -05:00
Yonghong Song	74975d35b4	BPF: Add LLVMAnalysis in CMakefile LINK_COMPONENTS buildbot reported a build error like below: BPFTargetMachine.cpp:(.text._ZN4llvm19TargetTransformInfo5ModelINS_10BPFTTIImplEED2Ev [_ZN4llvm19TargetTransformInfo5ModelINS_10BPFTTIImplEED2Ev]+0x14): undefined reference to `llvm::TargetTransformInfo::Concept::~Concept()' lib/Target/BPF/CMakeFiles/LLVMBPFCodeGen.dir/BPFTargetMachine.cpp.o: In function `llvm::TargetTransformInfo::Model<llvm::BPFTTIImpl>::~Model()': Commit `a260ae7160` ("BPF: Implement TTI.IntImmCost() properly") added TargetTransformInfo to BPF, which requires LLVMAnalysis dependence. In certain cmake configurations, lacking explicit LLVMAnalysis dependency may cause compilation error. Similar to other targets, this patch added LLVMAnalysis in CMakefile LINK_COMPONENTS explicitly.	2021-02-11 10:24:22 -08:00
Jay Foad	23db2d363f	[AMDGPU] Better selection of base offset when merging DS reads/writes When merging a pair of DS reads or writes needs to materialize the base offset in a vgpr, choose a value that is aligned to as high a power of two as possible. This maximises the chance that different pairs can use the same base offset, in which case the base offset registers can be commoned up by MachineCSE. Differential Revision: https://reviews.llvm.org/D96421	2021-02-11 17:46:09 +00:00
Craig Topper	5744502a13	[TargetLowering][RISCV][AArch64][PowerPC] Enable BuildUDIV/BuildSDIV on illegal types before type legalization if we can find a larger legal type that supports MUL. If we wait until the type is legalized, we'll lose information about the original type and need to use larger magic constants. This gets especially bad on RISCV64 where i64 is the only legal type. I've limited this to simple scalar types so it only works for i8/i16/i32 which are most likely to occur. For more odd types we might want to do a small promotion to a type where MULH is legal instead. Unfortunately, this does prevent some urem/srem+seteq matching since that still requires legal types. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D96210	2021-02-11 09:43:13 -08:00
Craig Topper	033b1bd185	[RISCV] Add support for loads, stores, and splats of vXi1 fixed vectors. This refines how we determine which mask types are legal and adds support for loads, stores, and all ones/zeros splats. I left a fixme in store handling where I think we need to zero extra bits if the type isn't a multiple of a byte. If I remember right from X86 there was some case we could have a store of a 1, 2, or 4 bit mask and have a scalar zextload that then expected the bits to be 0. It's tricky to zero the bits with RVV. We need to do something like round VL up, zero a register, lower the VL back down, then do a tail undisturbed move into the zero register. Another option might be to generate a mask of 1/2/4 bits set with a VL of 8 and use that to mask off the bits. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D96468	2021-02-11 09:13:16 -08:00
Yonghong Song	a260ae7160	BPF: Implement TTI.IntImmCost() properly This patch implemented TTI.IntImmCost() properly. Each BPF insn has 32bit immediate space, so for any immediate which can be represented as 32bit signed int, the cost is technically free. If an int cannot be represented as a 32bit signed int, a ld_imm64 instruction is needed and a TCC_Basic is returned. This change was motivated by the observation that several bpf selftests failed with latest llvm trunk, e.g., #10/16 strobemeta.o:FAIL #10/17 strobemeta_nounroll1.o:FAIL #10/18 strobemeta_nounroll2.o:FAIL #10/19 strobemeta_subprogs.o:FAIL #96 snprintf_btf:FAIL The failure occurs because SpeculateAroundPHIsPass did an aggressive transformation that alters control flow in a way the verifier currently cannot handle well. In llvm12, SpeculateAroundPHIsPass is not called. SpeculateAroundPHIsPass relied on TTI.getIntImmCost() and TTI.getIntImmCostInst() for profitability analysis. This patch implemented TTI.getIntImmCost() properly for BPF backend which also prevented transformation which caused the above test failures. Differential Revision: https://reviews.llvm.org/D96448	2021-02-11 08:35:25 -08:00
David Green	b1ef919aad	[ARM] Add CostKind to getMVEVectorCostFactor. This adds the CostKind to getMVEVectorCostFactor, so that it can automatically account for CodeSize costs, where it returns a cost of 1 not the MVEFactor used for Throughput/Latency. This helps simplify the caller code and allows us to get the codesize cost more correct in more cases.	2021-02-11 15:33:59 +00:00
Simon Tatham	69f1a7ad82	[ARM] Copy-paste error in ARMv87a architecture definition. In the tablegen architecture definition, the Name field for the ARMv87a record read "ARMv86a". All the other records contain their own names. Corrected it to "ARMv87a", and added the necessary value in ARMArchEnum for that to refer to. Reviewed By: pratlucas Differential Revision: https://reviews.llvm.org/D96493	2021-02-11 13:35:56 +00:00
David Green	e771614bae	[ARM] Change getScalarizationOverhead overload used in gather costs. NFC This changes which of the getScalarizationOverhead overloads is used in the gather/scatter cost to use the base variant directly, not relying on the version using heuristics on the number of args with no args provided. It should still produce the same costs for scalarized gathers/scatters.	2021-02-11 11:58:55 +00:00
Carl Ritson	c16f776028	[AMDGPU] Move kill lowering to WQM pass and add live mask tracking Move implementation of kill intrinsics to WQM pass. Add live lane tracking by updating a stored exec mask when lanes are killed. Use live lane tracking to enable early termination of shader at any point in control flow. Reviewed By: piotr Differential Revision: https://reviews.llvm.org/D94746	2021-02-11 20:31:29 +09:00
David Green	7786ac8377	[ARM] Remove dead mov's in preheader of tail predicated loops With t2DoLoopDec we can be left with some extra MOV's in the preheaders of tail predicated loops. This removes them, in the same way we remove other dead variables. Differential Revision: https://reviews.llvm.org/D91857	2021-02-11 10:48:20 +00:00
Sander de Smalen	703130fb01	[TTI] Change TargetTransformInfo::getMinimumVF to return ElementCount This will be needed in the loop-vectorizer where the minimum VF requested may be a scalable VF. getMinimumVF now takes an additional operand 'IsScalableVF' that indicates whether a scalable VF is required. Reviewed By: kparzysz, rampitec Differential Revision: https://reviews.llvm.org/D96020	2021-02-11 09:08:48 +00:00
David Green	1db7b9ceaa	[ARM] Make a BE predicate bitcast consistent with the rest of llvm We were storing predicate registers, such as a <8 x i1>, in the opposite order to how the rest of llvm expects. This actually turns out to be correct for the one place that usually uses it - the ScalarizeMaskedMemIntrin pass, but only because the pass was incorrect itself. This fixes the order so that bits are stored in the opposite order and bitcasts work as expected. This allows the Scalarization pass to be fixed, as in https://reviews.llvm.org/D94765. Differential Revision: https://reviews.llvm.org/D94867	2021-02-11 08:59:52 +00:00
Sander de Smalen	3b4f706ae1	[AArch64][SVE] Asm: Fix supported immediates for DUP/CPY This patch fixes an issue in the implementation of DUP/CPY where certain immediates were not accepted. Immediates should be interpreted as a two's complement encoding of a value that fits the number of bits of the element type. mov z0.b, p0/z, #127 <=> mov z0.b, p0/z, #-129 <=> mov z0.b, p0/z, #0xffffffffffffff7f This behaviour is in line with the GNU assembler. Reviewed By: c-rhodes Differential Revision: https://reviews.llvm.org/D94776	2021-02-11 08:14:15 +00:00
Carl Ritson	e5b0b434f6	[AMDGPU] Refactor MIMG tables to better handle hardware variants Add mimgopc object to represent the opcode allowing different opcodes for different hardware variants. This enables image_atomic_fcmpswap, image_atomic_fmin, and image_atomic_fmax on GFX10 Reviewed By: foad, rampitec Differential Revision: https://reviews.llvm.org/D96309	2021-02-11 13:22:41 +09:00
Craig Topper	5189c5b940	[X86] Simplify patterns for avx512 vpcmp. NFC This removes the commuted PatFrags that only existed to carry an SDNodeXForm in its OperandTransform field. We know all the places that need to use the commuted SDNodeXForm and there is one transform shared by signed and unsigned compares. So just hardcode the SDNodeXForm where it is needed and use the non commuted PatFrag in the pattern. I think when I wrote this I thought the SDNodeXForm name had to match what is in the PatFrag that is being used. But that's not true. The OperandTransform is only used when the PatFrag is used in an instruction pattern and not a separate Pat pattern. All the commuted cases are Pat patterns.	2021-02-10 19:24:27 -08:00
Jessica Clarke	ca606dc988	[RISCV] More whitespace and comment typo fixes in RISCVInstrInfoC.td	2021-02-11 02:32:36 +00:00
Jessica Clarke	0973ce8596	[RISCV] Fix whitespace in RISCVInstrInfoC.td	2021-02-11 02:23:09 +00:00
Craig Topper	350ab4e617	[RISCV] Use OperandTransform field of ImmLeaf to slightly simplify a couple bitmanip patterns. NFC This binds the SDNodeXForm to the ImmLeaf so we only need to mention the ImmLeaf in both the input and output pattern.	2021-02-10 17:52:07 -08:00
Jessica Paquette	1514f3b2c8	[AArch64][GlobalISel] Don't perform the mul const combine with G_PTR_ADD A G_MUL + G_PTR_ADD can also be folded into a madd. So, conservatively, we shouldn't combine when the G_MUL is used by a G_PTR_ADD either. Differential Revision: https://reviews.llvm.org/D96457	2021-02-10 15:30:45 -08:00
Jessica Paquette	5f7a4d8d05	[AArch64][GlobalISel] Perform load/store extended reg folding with optsize GlobalISel was only doing this with minsize. SDAG does this with optsize. (See: `SelectionDAG::shouldOptForSize()`) This is a 0.3% code size improvement for CTMark at -Os. (Best: 1.1% improvements on lencod + pairlocalalign) Differential Revision: https://reviews.llvm.org/D96451	2021-02-10 14:42:25 -08:00
Jessica Paquette	9283058abb	[AArch64][GlobalISel] Fold G_ADD into the cset for G_ICMP When we have a G_ADD which is fed by a G_ICMP on one side, we can fold it into the cset for the G_ICMP. e.g. Given ``` %cmp = G_ICMP ... %x, %y %add = G_ADD %cmp, %z ``` We would normally emit a cmp, cset, and add. However, `%add` is either `%z` or `%z + 1`. So, we can just use `%z` as the source of the cset rather than wzr, saving an instruction. This would probably be cleaner in AArch64PostLegalizerLowering, but we'd need to change the way we represent G_ICMP to do that, I think. For now, it's easiest to implement in selection. This is a 0.1% code size improvement on CTMark/pairlocalalign at -Os. Example: https://godbolt.org/z/7KdrP8 Differential Revision: https://reviews.llvm.org/D96388	2021-02-10 13:28:01 -08:00
Craig Topper	fc4d780eaf	[RISCV] Remove superfluous semicolon. NFC	2021-02-10 11:20:29 -08:00
Nick Desaulniers	68945a8686	[Thumb2] support `movs pc, lr` alias for `subs pc, lr, #0`/`eret` This is used by the Linux kernel built with CONFIG_THUMB2_KERNEL. Because different operands are not permitted to `movs`, the diagnostics now provide multiple suggestions along the lines of using a non-pc destination operand or lr source operand. Forked from D95586. Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed By: DavidSpickett Differential Revision: https://reviews.llvm.org/D96304	2021-02-10 11:00:42 -08:00
Craig Topper	cb161b3a88	[RISCV] Add support for matching .vf forms of fadd/fsub/fmul/fdiv/fma for fixed vectors. fma+neg will come in a different patch since I haven't done it for .vv yet either. Differential Revision: https://reviews.llvm.org/D96375	2021-02-10 10:16:27 -08:00
Craig Topper	0c254b4a69	[RISCV] Add support for selecting vrgather.vx/vi for fixed vector splat shuffles. The test cases extract a fixed element from a vector and splat it into a vector. This gets DAG combined into a splat shuffle. I've used some very wide vectors in the test to make sure we have at least a couple tests where the element doesn't fit into the uimm5 immediate of vrgather.vi so we fall back to vrgather.vx. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D96186	2021-02-10 10:01:56 -08:00
Jay Foad	2114b458b0	[AMDGPU] Fix comments in SILoadStoreOptimizer::offsetsCanBeCombined	2021-02-10 14:49:33 +00:00
Daniel Cederman	ad3b023c88	[Sparc] Support relocatable expressions in the assembler Allow assembler expressions to start with an identifier. This allows for expressions such as ``` b symbol + 4 ``` and ``` mov symEnd - symStart, %g1 ``` The patch builds upon https://reviews.llvm.org/D47136. Reviewed By: joerg Differential Revision: https://reviews.llvm.org/D47458	2021-02-10 14:52:44 +01:00
Fraser Cormack	a3c74d6d53	[RISCV] Add support for selecting vid.v from build_vector This patch optimizes a build_vector "index sequence" and lowers it to the existing custom RISCVISD::VID node. This pattern is common in autovectorized code. The custom node was updated to allow it to be used by both scalable and fixed-length vectors, thus avoiding pattern duplication. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D96332	2021-02-10 10:58:40 +00:00
Simon Pilgrim	eb31c3c5cb	Revert rGe1172959226689a "[X86][AVX] canonicalizeLaneShuffleWithRepeatedOps - merge VPERMILPD ops with different low/high masks." Revert this while I investigate a downstream breakage report.	2021-02-10 10:26:44 +00:00
Sam Parker	9d81ccc02f	[WebAssembly] Enable loop unrolling Enable partial and runtime unrolling with a threshold of 30, which was derived from a large number of kernels running on node and wasmtime for amd64 and aarch64. Unrolling is enabled by default at -O2 and -O3 and is disabled at -Oz and -Os. Compiling with -Os is recommended if the wasm binary size is the most important factor. Differential Revision: https://reviews.llvm.org/D95125	2021-02-10 08:25:46 +00:00
Jessica Paquette	7eee858585	[AArch64][GlobalISel] Fold selects fed by G_PTR_ADD Similar to the case for G_ADD. There was a function in CTMark/pairlocalalign which was missing this case, causing GlobalISel to emit a add + csel when a csinc is all that is necessary. https://godbolt.org/z/ax69E9 Minor code size improvements on CTMark at -Os. Differential Revision: https://reviews.llvm.org/D96390	2021-02-10 00:03:13 -08:00
Jessica Paquette	0e85d63486	[AArch64][GlobalISel] Allow vector load legalization into 128-bit-wide types Similar to `3d25fdc5c2` This fixes bad codegen in cases like so: https://godbolt.org/z/hePhz1 Differential Revision: https://reviews.llvm.org/D96296	2021-02-09 13:35:59 -08:00
Artem Belevich	2aa01ccec3	[CUDA, NVPTX] Allow targeting sm_86 GPUs. The patch only plumbs through the option necessary for targeting sm_86 GPUs w/o adding any new functionality. Differential Revision: https://reviews.llvm.org/D95974	2021-02-09 11:01:10 -08:00
Matt Arsenault	f4ca6d8289	AMDGPU: Fix verifier error with argument passed in CSR SGPR We need to avoid setting the kill flag on the CSR spill if there's an additional use of the register after the spill. This does rely on consistency between the entry block liveins and the MRI's function live ins, which is not something the verifier checks now.	2021-02-09 13:49:44 -05:00
Matt Arsenault	b72a23650f	GlobalISel: Fix using wrong calling convention for callees This was taking the calling convention from the parent function, instead of the callee. Avoids regressions in a future patch when the caller and callee have different type breakdowns. For some reason AArch64's lowerFormalArguments seems to intentionally ignore the parent isVarArg.	2021-02-09 13:48:56 -05:00
Craig Topper	18ff7e045a	[RISCV] Make the min and max vector width command line options more consistent and check their relationship to each other.	2021-02-09 10:47:23 -08:00
Craig Topper	fd5adae02c	[RISCV] Remove SRO* and SLO* instructions from bitmanip. As of the current draft these are no longer being considered for the bitmanip spec. It wasn't clear what sub extension they belonged in in the 0.93 spec. So remove them. They can always be added back if something changes. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D96157	2021-02-09 09:35:05 -08:00
Nico Weber	de1966e542	Revert "[ObjC][ARC] Use operand bundle 'clang.arc.rv' instead of explicitly" This reverts commit `4a64d8fe39`. Makes clang crash when building trivial iOS programs, see comment after https://reviews.llvm.org/D92808#2551401	2021-02-09 11:06:32 -05:00
Simon Pilgrim	89d9ff8229	[X86][SSE] foldShuffleOfHorizOp - add SHUFPS v4f32 handling Fold shufps(hop(x,y),hop(z,w)) -> permute(hop(x,z)) - this is very similar to the equivalent unpack fold. I did start trying to convert foldShuffleOfHorizOp to handle generic shuffle masks but we're relying on a lot of special cases at the moment.	2021-02-09 14:18:45 +00:00
Nemanja Ivanovic	a5222aa085	[DAGCombine] Do not remove masking argument to FP16_TO_FP for some targets As of commit `284f2bffc9`, the DAG Combiner gets rid of the masking of the input to this node if the mask only keeps the bottom 16 bits. This is because the underlying library function does not use the high order bits. However, on PowerPC's ELFv2 ABI, it is the caller that is responsible for clearing the bits from the register. Therefore, the library implementation of __gnu_h2f_ieee will return an incorrect result if the bits aren't cleared. This combine is desired for ARM (and possibly other targets) so this patch adds a query to Target Lowering to check if this zeroing needs to be kept. Fixes: https://bugs.llvm.org/show_bug.cgi?id=49092 Differential revision: https://reviews.llvm.org/D96283	2021-02-09 06:33:48 -06:00
Nemanja Ivanovic	f6e4b9fc06	[RISCV] Fix shared libs build Commit `a2d19bad07` introduced a dependency in the RISCV disassembler on two additional libraries (MC, RISCVDesc) which wasn't added to the CMakeLists.txt. This causes shared library builds to break. This patch just adds them to fix failures seen on some bots, such as the PPC64LE Multistage.	2021-02-09 06:14:25 -06:00
Dylan McKay	2ccb941740	[AVR] Fix global references to function symbols References to functions are in program memory and need a `pm()` fixup. This should fix trait objects for Rust on AVR. Differential Revision: https://reviews.llvm.org/D87631 Patch by Alex Mikhalev.	2021-02-10 00:40:49 +13:00
Hsiangkai Wang	a2d19bad07	[RISCV] Use whole register load/store for generic load/store. In vector v0.10, there are whole vector register load/store instructions. I suggest using the whole register load/store instructions for generic load/store for scalable vector types. This could save the vset{i}vl{i} for these loads/stores. For fractional LMUL, I keep using vle{eew}.v/vse{eew}.v instructions to load/store partial vector registers. Differential Revision: https://reviews.llvm.org/D95853	2021-02-09 15:52:04 +08:00
Jinsong Ji	9202806241	Revert "[CostModel] Remove VF from IntrinsicCostAttributes" This reverts commit `502a67dd7f`. This exposes a failure in the test-suite build on PowerPC; revert to unblock the buildbot first. Dave will re-commit in https://reviews.llvm.org/D96287. Thanks Dave.	2021-02-09 02:14:14 +00:00
LemonBoy	45e33e8ba9	[SPARC] Recognize and handle the %lm(sym) operator Reviewed By: joerg Differential Revision: https://reviews.llvm.org/D77737	2021-02-08 19:25:33 -05:00
Hsiangkai Wang	a5b07a221a	[RISCV] Initial support of LoopVectorizer for RISC-V Vector. Define an option -riscv-vector-bits-max to specify the maximum vector bits for the vectorizer. The loop vectorizer will use the value to check if it is safe to use the whole vector registers to vectorize the loop. It is not the optimum solution for loop vectorizing for scalable vectors. It assumes the whole vector registers will be used to vectorize the code. If possible, we should configure vl to vectorize instead of using whole vector registers. We only consider LMUL = 1 in this patch. This patch is just initial work on the loop vectorizer for RISC-V Vector. Differential Revision: https://reviews.llvm.org/D95659	2021-02-09 06:32:18 +08:00
Matt Arsenault	bcf723b2fd	AMDGPU: Stop adding stack passed wide arguments to call conv handler The generated calling convention code shouldn't see these types since we split large types into 32-bit chunks before the calling convention code is triggered. GlobalISel ends up directly calling the generated CC code before checking for the register count breakdown. Arguably this difference is a bug, but this was dead code for the DAG anyway.	2021-02-08 17:09:28 -05:00
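
Note on commit 3b4f706ae1 ([AArch64][SVE] Asm: Fix supported immediates for DUP/CPY): the message says the three immediates #127, #-129 and #0xffffffffffffff7f are equivalent for a byte element because each is read as a two's complement encoding that fits the element width. The sketch below only checks that arithmetic. It is not the assembler's code, the helper name `truncate_to_element` is hypothetical, and whatever range checks the real assembler applies are not modelled.

```python
def truncate_to_element(imm: int, element_bits: int) -> int:
    """Keep only the low element_bits bits of imm (two's complement wraparound)."""
    return imm & ((1 << element_bits) - 1)


# The three immediates quoted in the commit message, for a byte (.b) element.
immediates = [127, -129, 0xFFFFFFFFFFFFFF7F]
encoded = [truncate_to_element(imm, 8) for imm in immediates]

# All three collapse to the same 8-bit pattern.
print([hex(value) for value in encoded])
```

Masking with `(1 << element_bits) - 1` works for negative Python integers because `&` treats them as infinitely sign-extended two's complement values.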


61829 Commits