llvm-project

Commit Graph

Author	SHA1	Message	Date
Eli Friedman	6c04b7dd4f	[AArch64] Optimize overflow checks for [s\|u]mul.with.overflow.i32. Saves one instruction for signed, uses a cheaper instruction for unsigned. Differential Revision: https://reviews.llvm.org/D105770	2021-07-12 15:30:42 -07:00
Eli Friedman	ec1cdee6aa	[SelectionDAG][RISCV] Support @llvm.vscale.i64() on 32-bit targets. Not really useful on its own, but D105673 depends on it. Differential Revision: https://reviews.llvm.org/D105840	2021-07-12 14:53:42 -07:00
Amy Kwan	35909ff6cf	[PowerPC] Fix the splat immediate in PPCMIPeephole depending on if we have an Altivec and VSX splat instruction. An assertion of the following can occur because Altivec and VSX splats use a different operand number for the immediate: ``` int64_t llvm::MachineOperand::getImm() const: Assertion `isImm() && "Wrong MachineOperand accessor"' failed. ``` This patch updates PPCMIPeephole.cpp assign the correct splat immediate. Differential Revision: https://reviews.llvm.org/D105790	2021-07-12 16:20:11 -05:00
Jinsong Ji	2377eca93c	[PowerPC] Custom Lowering BUILD_VECTOR for v2i64 for P7 as well The lowering for v2i64 is now guarded with hasDirectMove, however, the current lowering can handle the pattern correctly, only lowering it when there is efficient patterns and corresponding instructions. The original guard was added in D21135, and was for Legal action. The code has evloved now, this guard is not necessary anymore. Reviewed By: #powerpc, nemanjai Differential Revision: https://reviews.llvm.org/D105596	2021-07-12 17:56:10 +00:00
Thomas Lively	cbabfc63b1	[WebAssembly] Custom combines for f32x4.demote_zero_f64x2 Replace the clang builtin function and LLVM intrinsic for f32x4.demote_zero_f64x2 with combines from normal SDNodes. Also add missing combines for i32x4.trunc_sat_zero_f64x2_{s,u}, which share the same pattern. Differential Revision: https://reviews.llvm.org/D105755	2021-07-12 10:32:18 -07:00
Craig Topper	d5c97f4bf0	[X86] Teach X86FloatingPoint's handleCall to only erase the FP stack if there is a regmask operand that clobbers the FP stack. There are some calls to functions like `__alloca` that are missing a regmask operand. Lack of a regmask operand means that all registers that aren't mentioned by def operands are preserved. __alloca only updates EAX and ESP and has def operands for them so this is ok. Because there is no regmask the register allocator won't spill the FP registers across the call. Assuming we want to keep the FP stack untoched across these calls, we need to handle this is in the FP stackifier. We might want to add a proper regmask operand to the code that creates these calls to indicate all registers are preserved, but we'd still need this change to the FP stackifier to know to preserve the FP stack for such a regmask. The test is kind of long, but bugpoint wasn't able to reduce it any further. Fixes PR50782 Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D105762	2021-07-12 10:15:38 -07:00
Jinsong Ji	28fb69e00a	[AIX] Emit version string in .file directive AIX .file directive support including compiler version string. https://www.ibm.com/docs/en/aix/7.2?topic=ops-file-pseudo-op This patch adds the support so that it will be easier to identify build compiler in objects. Reviewed By: #powerpc, shchenz Differential Revision: https://reviews.llvm.org/D105743	2021-07-12 17:03:52 +00:00
David Green	af6f136a8c	[ARM] Expand types in VQDMULH tests. NFC	2021-07-12 17:56:11 +01:00
Albion Fung	ef49d925e2	[PowerPC] Implement trap and conversion builtins for XL compatibility This patch implements trap and FP to and from double conversions. The builtins generate code that mirror what is generated from the XL compiler. Intrinsics are named conventionally with builtin_ppc, but are aliased to provide the same builtin names as the XL compiler. Differential Revision: https://reviews.llvm.org/D103668	2021-07-12 11:04:17 -05:00
Bradley Smith	112c09039b	[SelectionDAG] Simplify PromoteIntRes_INSERT_SUBVECTOR to only handle result Let other parts of legalization handle the rest of the node, this allows re-use of existing optimizations elsewhere. Differential Revision: https://reviews.llvm.org/D105624	2021-07-12 15:20:44 +00:00
Jonas Paulsson	96421af5f8	[SystemZ] Bugfix for the 'N' code for inline asm operand. Don't use a local MachineOperand copy in SystemZAsmPrinter::PrintAsmOperand() and change the register as it may break the MRI tracking of register uses. Use an MCOperand instead. Review: Ulrich Weigand Differential Revision: https://reviews.llvm.org/D105757	2021-07-12 15:04:08 +02:00
David Truby	c305557acd	[llvm][sve] Lowering for VLS truncating stores This adds custom lowering for truncating stores when operating on fixed length vectors in SVE. It also includes a DAG combine to fold extends followed by truncating stores into non-truncating stores in order to prevent this pattern appearing once truncating stores are supported. Currently truncating stores are not used in certain cases where the size of the vector is larger than the target vector width. Differential Revision: https://reviews.llvm.org/D104471	2021-07-12 11:14:17 +01:00
Simon Pilgrim	e4aa6ad132	[X86][SSE] X86ISD::FSETCC nodes (cmpss/cmpsd) return a 0/-1 allbits signbits result Annoyingly, i686 cmpsd handling still fails to remove the unnecessary neg(and(x,1))	2021-07-12 09:56:59 +01:00
Simon Pilgrim	99718d5377	[X86][SSE] Add signbit tests to show cmpss/cmpsd ops not recognised as 'allbits' results.	2021-07-12 09:41:10 +01:00
Craig Topper	f0393deb33	[RISCV] Add tests for suboptimal handling of negative constants for i32 uaddo/usubo on RV64. NFC We end up zero extending constants when we promote to i64. We should sign extend instead to allow use of addiw or improve constant materialization.	2021-07-11 12:38:51 -07:00
Craig Topper	6644a61121	[RISCV] Add tests for suboptimal handling of negative constants on the LHS of i32 shifts/rotates/subtracts on RV64. NFC The constants end up getting zero extended to i64, but sign extend would be better for constant materialization. We're using W instructions so either behavior is correct since the upper bits aren't read.	2021-07-11 11:54:34 -07:00
Craig Topper	1410aab622	[RISCV] Remove stale FIXME from a test. NFC sext has been used for sltu/sltiu since `e0e62e97`.	2021-07-11 10:25:07 -07:00
Daniel Egger	98c2e4115d	[ARM] Add lowering of uadd_sat to uq{add\|sub}8 and uq{add\|sub}16 This follow the lead of https://reviews.llvm.org/D68974 to add lowering of unsigned saturated addition/subtraction. Differential Revision: https://reviews.llvm.org/D105413	2021-07-11 15:58:11 +01:00
David Green	dc0bbc9d89	[IfCvt] Don't use pristine register for counting liveins for predicated instructions. The test case here hits machine verifier problems. There are volatile long loads that the results of do not get used, loading into two dead registers. IfCvt will predicate them and as it does will add implicit uses of the predicating registers due to thinking they are live in. As nothing has used the register, the machine verifier disagrees that they are really live and we end up with a failure. The registers come from Pristine regs that LivePhysRegs counts as live. This patch adds a addLiveInsNoPristines method to be used instead in IfCvt, so that only really live in regs need to be added as implicit operands. Differential Revision: https://reviews.llvm.org/D90965	2021-07-11 14:45:54 +01:00
Craig Topper	99b8c46828	[RISCV] Restore non-constant srem test I accidentally deleted. NFC	2021-07-10 18:02:13 -07:00
Craig Topper	86109fa9e8	[RISCV] Add test cases for div/rem with constant left hand side. NFC Some of these would produce better code if we used W instructions, but constant LHS currently prevents that.	2021-07-10 17:22:40 -07:00
David Green	a6470408cf	[ARM] Extra widening and narrowing combinations tests. NFC	2021-07-10 22:08:30 +01:00
Simon Pilgrim	a328ee6577	[X86] Add tests from D93707 for fsub_strict(x,fneg(y)) -> fadd_strict(x,y) folds. Also, add matching i686 coverage to strict-fadd-combines.ll and regenerate checks.	2021-07-10 15:08:58 +01:00
Amara Emerson	97c426394a	[AArch64][GlobalISel] Implement moreElements legalization for G_SHUFFLE_VECTOR. Differential Revision: https://reviews.llvm.org/D103301	2021-07-10 00:25:26 -07:00
Amara Emerson	58a2cb5143	[GlobalISel] Add a new artifact combiner for unmerge which looks through general artifact expressions. The original motivation for this was to implement moreElementsVector of shuffles on AArch64, which resulted in complex sequences of artifacts like unmerge(unmerge(concat...)) which the combiner couldn't handle. It seemed here that the better option, instead of writing ever-more-complex combines, was to have a way to find the original "non-artifact" source registers for a given definition, walking through arbitrary expressions of unmerge/concat/insert. As long as the bits aren't extended or truncated, this is a pretty simple algorithm that avoids the need for lots of combines and instead jumps straight to the final result we want. I've only used this new technique in 2 places within tryCombineUnmerge, using it in more general situations resulted in infinite loops in AMDGPU. So for now it's used when we would otherwise fail to combine and that seems to work. In order to support looking through G_INSERTs, I also had to add it as an artifact in isArtifact(), which caused a whole lot of issues in tests. AMDGPU started infinite looping since full legalization of G_INSERT doensn't seem to be there. To work around this, I've temporarily added a CLI option to use the old behaviour so that the MIR tests will still run and terminate. Other minor changes include no longer making >128b G_MERGE/UNMERGE legal. We never had isel support for that anyway and it was a remnant of the legacy legalizer rules. However being legal prevented the combiner from checking if it was dead and deleting them. Differential Revision: https://reviews.llvm.org/D104355	2021-07-09 22:35:00 -07:00
Thomas Lively	e5220104d0	[WebAssembly] Custom combines for f64x2.promote_low_f32x4 Replace the clang builtin function and LLVM intrinsic previously used to select the f64x2.promote_low_f32x4 instruction with custom combines from standard SelectionDAG nodes. Implement the new combines to share code with the similar combines for f64x2.convert_low_i32x4_{s,u}. Resolves PR50232. Differential Revision: https://reviews.llvm.org/D105675	2021-07-09 18:59:29 -07:00
Derek Schuff	ac02baab48	WebAssembly: Update datalayout to match fp128 ABI change This fix goes along with `d1a96e906c` and makes the fp128 alignment match clang's long double alignment. Differential Revision: https://reviews.llvm.org/D105749	2021-07-09 16:51:36 -07:00
Jessica Paquette	47aeeffc8f	[GlobalISel] Use GCDTy when extracting GCD ty from leftover regs in insertParts `LegalizerHelper::insertParts` uses `extractGCDType` on registers split into a desired type and a smaller leftover type. This is used to populate a list of registers. Each register in the list will have the same type as returned by `extractGCDType`. If we have - `ResultTy` = s792 - `PartTy` = s64 - `LeftoverTy` = s24 When we call `extractGCDType`, we'll end up with two different types appended to the list: Part: gcd(792, 64, 24) => s8 Leftover: gcd(792, 24, 24) => s24 When this happens, we'll hit an assert while trying to build a G_MERGE_VALUES. This patch changes the code for the leftover type so that we reuse the GCD from the desired type. e.g. Leftover: gcd(792, 8, 24) => s8 https://llvm.godbolt.org/z/137Kqxj6j Differential Revision: https://reviews.llvm.org/D105674	2021-07-09 14:15:44 -07:00
Wouter van Oortmerssen	9647a6f719	[WebAssembly] Added initial type checker to MC Assembler This to protect against non-sensical instruction sequences being assembled, which would either cause asserts/crashes further down, or a Wasm module being output that doesn't validate. Unlike a validator, this type checker is able to give type-errors as part of the parsing process, which makes the assembler much friendlier to be used by humans writing manual input. Because the MC system is single pass (instructions aren't even stored in MC format, they are directly output) the type checker has to be single pass as well, which means that from now on .globaltype and .functype decls must come before their use. An extra pass is added to Codegen to collect information for this purpose, since AsmPrinter is normally single pass / streaming as well, and would otherwise generate this information on the fly. A `-no-type-check` flag was added to llvm-mc (and any other tools that take asm input) that surpresses type errors, as a quick escape hatch for tests that were not intended to be type correct. This is a first version of the type checker that ignores control flow, i.e. it checks that types are correct along the linear path, but not the branch path. This will still catch most errors. Branch checking could be added in the future. Differential Revision: https://reviews.llvm.org/D104945	2021-07-09 14:07:25 -07:00
Stanislav Mekhanoshin	3e97d11df8	[AMDGPU] Added v_accvgpr_read_b32 rematerialization test. NFC.	2021-07-09 12:59:02 -07:00
Stanislav Mekhanoshin	4a3b055653	[AMDGPU] Fix flags of V_MOV_B64_PSEUDO In particular it was not rematerializable. Differential Revision: https://reviews.llvm.org/D105724	2021-07-09 12:49:28 -07:00
Stanislav Mekhanoshin	b379ab4193	[AMDGPU] Add VOP rematerialization test. NFC.	2021-07-09 12:16:08 -07:00
Nikita Popov	ff8b1b1b9c	Reapply [IR] Don't mark mustprogress as type attribute Reapply with fixes for clang tests. ----- This is a simple enum attribute. Test changes are because enum attributes are sorted before type attributes, so mustprogress is now in a different position.	2021-07-09 20:57:44 +02:00
Jeremy Morse	30cce54dad	[X86] Return src/dest register from stack spill/restore recogniser LLVM provides target hooks to recognise stack spill and restore instructions, such as isLoadFromStackSlot, and it also provides post frame elimination versions such as isLoadFromStackSlotPostFE. These are supposed to return the store-source and load-destination registers; unfortunately on X86, the PostFE recognisers just return "1", apparently to signify "yes it's a spill/load". This patch alters the hooks to correctly return the store-source and load-destination registers: This is really useful for debug-info as we it helps follow variable values as they move on/off the stack. There should be no codegen changes: the only other users of these PostFE target hooks are MachineInstr::getRestoreSize and MachineInstr::getSpillSize, which don't attempt to interpret the returned register location. While we're here, delete the (InstrRef) LiveDebugValues heuristic that tries to find the spill source register by looking for a killed reg -- we should be able to rely on the target hooks for that. This involves temporarily turning off a n InstrRef LivedDebugValues test on aarch64 (patch to re-enable it is in D104521). Differential Revision: https://reviews.llvm.org/D105428	2021-07-09 18:12:30 +01:00
Nikita Popov	23dd750279	Revert "[IR] Don't mark mustprogress as type attribute" This reverts commit `84ed3a794b`. A number of clang tests are also affected by this change. Revert until I can update them.	2021-07-09 18:46:00 +02:00
Nikita Popov	84ed3a794b	[IR] Don't mark mustprogress as type attribute This is a simple enum attribute. Test changes are because enum attributes are sorted before type attributes.	2021-07-09 18:24:16 +02:00
Nico Weber	97c675d3d4	Revert "Revert "Temporarily do not drop volatile stores before unreachable"" This reverts commit `52aeacfbf5`. There isn't full agreement on a path forward yet, but there is agreement that this shouldn't land as-is. See discussion on https://reviews.llvm.org/D105338 Also reverts unreviewed "[clang] Improve `-Wnull-dereference` diag to be more in-line with reality" This reverts commit `f4877c78c0`. And all the related changes to tests: This reverts commit `9a0152799f`. This reverts commit `3f7c9cc274`. This reverts commit `329f8197ef`. This reverts commit `aa9f58cc2c`. This reverts commit `2df37d5ddd`. This reverts commit `a72a441812`.	2021-07-09 11:44:34 -04:00
zhijian	841077a7e9	[AIX][XCOFF] Use bit order of has_vec and longtbtable bits as defined in AIX header debug.h Summary: The bit order of the has_vec and longtbtable bits in the traceback table generated by the XL compiler flipped at some point after v12.1. This is different from the definition is the AIX header debug.h. The change in the XL compiler that caused the deviation from the OS header definition was unintentional. Since both orderings are extant and the XL compiler runtime also expects the ordering defined by the OS, we will correct the output from LLVM to match the defined ordering given by the OS (which is also consistent with the Assembler Language Reference). Mitigation for traceback tables encoded with the wrong ordering is required for either ordering. Reviewers: XingXue, HubertTong Differential Revision: https://reviews.llvm.org/D105487	2021-07-09 11:06:46 -04:00
Roman Lebedev	52aeacfbf5	Revert "Temporarily do not drop volatile stores before unreachable" This reverts commit `4e413e1621`, which landed almost 10 months ago under premise that the original behavior didn't match reality and was breaking users, even though it was correct as per the LangRef. But the LangRef change still hasn't appeared, which might suggest that the affected parties aren't really worried about this problem. Please refer to discussion in: * https://reviews.llvm.org/D87399 (`Revert "[InstCombine] erase instructions leading up to unreachable"`) * https://reviews.llvm.org/D53184 (`[LangRef] Clarify semantics of volatile operations.`) * https://reviews.llvm.org/D87149 (`[InstCombine] erase instructions leading up to unreachable`) clang has `-Wnull-dereference` which will diagnose the obvious cases of null dereference, it was adjusted in `f4877c78c0`, but it will only catch the cases where the pointer is a null literal, it will not catch the cases where an arbitrary store is expected to trap. Differential Revision: https://reviews.llvm.org/D105338	2021-07-09 14:16:54 +03:00
Simon Pilgrim	9dbeac16ba	[X86] ReplaceNodeResults - fp_to_sint/uint - manually widen v2i32 results to let us add AssertSext/AssertZext Its proving tricky to move this to the generic legalizer code, so manually insert the v2i32 subvector into v4i32, insert the AssertSext/AssertZext node, then extract the subvector again. This avoids masks in the truncation/pack code, which means we avoid a PSHUFB in the fp_to_sint/uint code for sub-128 bit types (specific targets can still combine the packs to a pshufb if they have fast variable per-lane shuffles). This was noticed when I was trying to improve fp_to_sint/uint costs with D103695 (and some targets had very high fp_to_sint costs due to the PSHUFB), so we can then update the fp_to_uint codegen from D89697.	2021-07-09 12:07:33 +01:00
Roman Lebedev	2df37d5ddd	[NFC][Codegen] Harden a few tests to not rely that volatile store to null isn't erased	2021-07-09 13:30:42 +03:00
Kai Luo	55bd12d4b7	[PowerPC] Remove implicit use register after transformToImmFormFedByLI() When the instruction has imm form and fed by LI, we can remove the redundat LI instruction. Below is an example: ``` renamable $x5 = LI8 2 renamable $x4 = exact SRD killed renamable $x4, killed renamable $r5, implicit $x5 ``` will be converted to: ``` renamable $x5 = LI8 2 renamable $x4 = exact RLDICL killed renamable $x4, 62, 2, implicit killed $x5 ``` But when we do this optimization, we forget to remove implicit killed $x5 This bug has caused a lnt case error. This patch is to fix above bug. Reviewed By: #powerpc, shchenz Differential Revision: https://reviews.llvm.org/D85288	2021-07-09 04:42:54 +00:00
Muhammad Omair Javaid	932e3d9960	Revert "GlobalISel/AArch64: don't optimize away redundant branches at -O0" This reverts commit `458c230b5e`. This broke LLDB buildbot testcase where breakpoint set at start of loop failed to hit. https://lab.llvm.org/buildbot/#/builders/96/builds/9404 https://github.com/llvm/llvm-project/blob/main/lldb/test/API/commands/process/attach/main.cpp#L15 Differential Revision: https://reviews.llvm.org/D105238	2021-07-09 08:23:36 +05:00
Ben Shi	ed102ce20a	[RISCV][test] Add new tests for mul optimization in the zba extension with SH*ADD This patch will show the following optimization by future patches. (mul x imm) -> (SH1ADD x, (SLLI x, bits)) when imm = 2^n + 2. (mul x imm) -> (SH2ADD x, (SLLI x, bits)) when imm = 2^n + 4. (mul x imm) -> (SH3ADD x, (SLLI x, bits)) when imm = 2^n + 8. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D105614	2021-07-09 09:48:23 +08:00
Stanislav Mekhanoshin	e5b0fe1b83	[AMDGPU] Mark more SOP instructions as rematerializable The rest of the SOP instructions implicitly set SCC and not suitable for the rematerialization. Differential Revision: https://reviews.llvm.org/D105670	2021-07-08 16:00:45 -07:00
Thomas Lively	3dd75f5371	[WebAssembly] Scalarize extract_vector_elt of binops Override the `shouldScalarizeBinop` target lowering hook using the same implementation used in the x86 backend. This causes `extract_vector_elt`s of vector binary ops to be scalarized if the scalarized version would be supported. Differential Revision: https://reviews.llvm.org/D105646	2021-07-08 14:31:53 -07:00
David Green	e2bc88f175	[ARM] Extra v8i16 -> i64 reduction tests with loads. NFC	2021-07-08 22:27:23 +01:00
Alexey Bataev	0d74fd3fdf	[SLP][COST][X86]Improve cost model for masked gather. Revived D101297 in its original form + added some changes in X86 legalization cehcking for masked gathers. This solution is the most stable and the most correct one. We have to check the legality before trying to build the masked gather in SLP. Without this check we have incorrect cost (for SLP) in case if the masked gather is not legal/slower than the gather. And we're missing some vectorization opportunities. This can be fixed in the cost model, but in this case we need to add special checks for the cost of GEPs for ScatterVectorize node, add special check for small trees, etc., i.e. there are a lot of corner cases here and there, which insrease code base and make it harder to maintain the code. > Can't we rely on cost model to deal with this? This can be profitable for futher vectorization, when we can start from such gather loads as seed. The question from D101297. Actually, no, it can't. Actually, simple gather may give us better result, especially after we started vectorization of insertelements. Plus, like I said before, the cost for non-legal masked gathers leads to missed vectorization opportunities. Differential Revision: https://reviews.llvm.org/D105042	2021-07-08 11:53:30 -07:00
Alexey Bataev	9d826fdb28	[X86][NFC]Add run lines for AVX512VL for masked gather test, NFC.	2021-07-08 11:30:31 -07:00
Stanislav Mekhanoshin	de5582be26	[AMDGPU] Fix more indention in llc-pipeline test. NFC.	2021-07-08 11:20:00 -07:00

1 2 3 4 5 ...

39539 Commits