llvm-project

Commit Graph

Author	SHA1	Message	Date
Craig Topper	78abad569c	[RISCV] Add missing SEW=64 tests to vmslt-rv32.ll. NFC	2021-04-20 18:31:36 -07:00
Adrian Prantl	81cad0be68	Make sure PHIElimination doesn't copy debug locations across basic blocks. PHIElimination may insert copy instructions in multiple basic blocks. Moving debug locations across basic block boundaries would be misleading as illustrated by the test case. rdar://75463656 Differential Revision: https://reviews.llvm.org/D100886	2021-04-20 17:03:29 -07:00
Thomas Lively	693d767c60	[WebAssembly] More codegen for f64x2.convert_low_i32x4_{s,u} `af7925b4dd` added a custom DAG combine for recognizing fp-to-ints of extract_subvectors that could be lowered to f64x2.convert_low_i32x4_{s,u} instructions. This commit extends the combines to recognize equivalent extract_subvectors of fp-to-ints as well. Differential Revision: https://reviews.llvm.org/D100790	2021-04-20 12:37:13 -07:00
Simon Pilgrim	2a419a0b99	[X86][SSE] combineX86ShuffleChain - check if we're blending with zero into already zero elements Add a SelectionDAG::MaskedElementsAreZero helper that wraps SelectionDAG::MaskedValueIsZero testing for entirely zero vector elements	2021-04-20 17:09:49 +01:00
Jay Foad	ec8c61efdf	[AMDGPU] Allow multiple uses of the same literal In GFX10 VOP3 can have a literal, which opens up the possibility of two operands using the same literal value, which is allowed and only counts as one use of the constant bus. AMDGPUAsmParser::validateConstantBusLimitations already knew about this but SIInstrInfo::verifyInstruction did not. Differential Revision: https://reviews.llvm.org/D100770	2021-04-20 16:44:01 +01:00
David Green	21a8b9d9e9	[ARM] Limit PerformExtractEltToVMOVRRD to when f64 is legal. The generic SoftFloatVectorExtract.ll test was failing when run on arm machines, as it tries to create a f64 under soft float. Limit the transform to when f64 is legal. Also add a missing override, as reported in D100244.	2021-04-20 16:24:36 +01:00
Matt Arsenault	1cb8a9d595	AMDGPU/GlobalISel: Fix uitofp/sitofp with non-power-of-2 integers	2021-04-20 11:13:29 -04:00
Matt Arsenault	83a25a1010	GlobalISel: Restrict narrow scalar for fptoui/fptosi results This practically only works for the f16 case AMDGPU uses, not wider types. Fixes bug 49710 by failing legalization.	2021-04-20 10:54:40 -04:00
Bradley Smith	b8b075d8d7	[AArch64][SVE] Lower MULHU/MULHS nodes to umulh/smulh instructions Mark MULHS/MULHU nodes as legal for both scalable and fixed SVE types, and lower them to the appropriate SVE instructions. Additionally now that the MULH nodes are legal, integer divides can be expanded into a more performant code sequence. Differential Revision: https://reviews.llvm.org/D100487	2021-04-20 15:18:06 +01:00
David Green	48cef1fa8e	[ARM] Create VMOVRRD from adjacent vector extracts This adds a combine for extract(x, n); extract(x, n+1) -> VMOVRRD(extract x, n/2). This allows two vector lanes to be moved at the same time in a single instruction, and thanks to the other VMOVRRD folds we have added recently can help reduce the amount of executed instructions. Floating point types are very similar, but will include a bitcast to an integer type. This also adds a shouldRewriteCopySrc, to prevent copy propagation from DPR to SPR, which can break as not all DPR regs can be extracted from directly. Otherwise the machine verifier is unhappy. Differential Revision: https://reviews.llvm.org/D100244	2021-04-20 15:15:43 +01:00
Fraser Cormack	60622b82a7	[RISCV][NFC] Add tests for scalable-vector DAGCombiner improvements These will all be improved by future patches.	2021-04-20 14:26:26 +01:00
Nemanja Ivanovic	03e7fefff8	[PowerPC] Canonicalize shuffles on big endian targets as well Extend shuffle canonicalization and conversion of shuffles fed by vectorized scalars to big endian subtargets. For big endian subtargets, loads and direct moves of scalars into vector registers put the data in the correct element for SCALAR_TO_VECTOR if the data type is 8 bytes wide. However, if the data type is narrower, the value still ends up in the wrong place - althouth a different wrong place than on little endian targets. This patch extends the combine that keeps values where they are if they feed a shuffle to big endian targets. Differential revision: https://reviews.llvm.org/D100478	2021-04-20 07:29:47 -05:00
David Green	806b47ade3	[ARM] Regenerate a couple of tests. NFC	2021-04-20 10:54:41 +01:00
Serguei Katkov	70193bdfc0	Re-land [GreedyRA ORE] Add Cost of spill locations into remark Re-land the patch with a fix of clang test. Cost of spill location is computed basing on relative branch frequency where corresponding spill/reload/copy are located. While the number itself is highly depends on incoming IR, the total cost can be used when do some changes in RA. Revert "Revert "[GreedyRA ORE] Add Cost of spill locations into remark"" This reverts commit `680f3d6de7`.	2021-04-20 16:21:07 +07:00
Fraser Cormack	b4a358a7ba	[RISCV] Fix missing emergency slots for scalable stack offsets This patch adds an additional emergency spill slot to RVV code. This is required as RVV stack offsets may require an additional register to compute. This patch includes an optimization by @HsiangKai <kai.wang@sifive.com> to reduce the number of registers required for the computation of stack offsets from 3 to 2. Otherwise we'd need two additional emergency spill slots. Reviewed By: HsiangKai Differential Revision: https://reviews.llvm.org/D100574	2021-04-20 09:59:41 +01:00
Qiu Chaofan	2432d80d3b	[PowerPC] Use mtvsrdd to put callee-saved GPR into VSR This patch exploits mtvsrdd instruction (available in ISA3.0+) to save two callee-saved GPR registers into a single VSR, making it more efficient. Reviewed By: jsji, nemanjai Differential Revision: https://reviews.llvm.org/D62565	2021-04-20 16:43:24 +08:00
Jun Ma	1ef5699d1a	[DAGCombiner] Support fold zero scalar vector. This patch changes ISD::isBuildVectorAllZeros to ISD::isConstantSplatVectorAllZeros which handles zero sclar vector. TestPlan: check-llvm Differential Revision: https://reviews.llvm.org/D100813	2021-04-20 16:28:43 +08:00
Jay Foad	b22721f01a	[AMDGPU] GCNDPPCombine: don't shrink V_ADD_CO_U32 if carry out is used Don't shrink VOP3 instructions if there are any uses of a carry-out operand, because the shrunken form of the instruction would write the carry-out to vcc instead of to a virtual register. Differential Revision: https://reviews.llvm.org/D100760	2021-04-20 09:17:52 +01:00
Qiu Chaofan	b820339752	[PowerPC] Support f128 under VSX This patch is the last one in backend to support fp128 type in pre-POWER9 subtargets with VSX, removing temporary option and updating remaining tests. Reviewed By: steven.zhang Differential Revision: https://reviews.llvm.org/D92374	2021-04-20 15:49:52 +08:00
Fraser Cormack	457da7f298	[SelectionDAG] Relax constraints on STEP_VECTOR step operand This patch relaxes the requirement that the STEP_VECTOR step constant must be of a type at least as large as the vector element type. This does not permit its use on targets which have legal vector element types larger than the largest legal scalar type, such as i64 vectors on RV32. As such, the requirement has been loosened so that the step operand must be any scalar type so long as the constant immediate is non-negative and the value fits inside the vector element type. This limits combining optimizations in certain circumstances but in practice it's unlikely to be a hindrance. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D100660	2021-04-20 08:41:42 +01:00
Ben Shi	b7249bf3b5	[RISCV][test] Add a new test of addition Reviewed by: craig.topper Differential Revision: https://reviews.llvm.org/D100767	2021-04-20 12:11:56 +08:00
Serguei Katkov	680f3d6de7	Revert "[GreedyRA ORE] Add Cost of spill locations into remark" This reverts commit `328377307a`. This commit causes buildbot failures due to some clang tests are not updated. Temporary revert to fix clang tests.	2021-04-20 11:08:24 +07:00
Serguei Katkov	328377307a	[GreedyRA ORE] Add Cost of spill locations into remark Cost of spill location is computed basing on relative branch frequency where corresponding spill/reload/copy are located. While the number itself is highly depends on incoming IR, the total cost can be used when do some changes in RA. Reviewers: reames, MatzeB, anemet, thegameg Reviewed By: reames Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D100020	2021-04-20 10:47:22 +07:00
Jun Ma	5c6ac3b4a2	[AArch64][SVE] Combine add and index_vector This patch tries to combine pattern add(index_vector(zero, step), dup(X)) into index_vector(X, step) TestPlan: check-llvm Differential Revision: https://reviews.llvm.org/D100107	2021-04-20 11:38:37 +08:00
David Penry	ca8eef7e3d	[CodeGen] Use ProcResGroup information in SchedBoundary When the ProcResGroup has BufferSize=0, 1. if there is a subunit in the list of write resources for the scheduling class, do not attempt to schedule the ProcResGroup. 2. if there is not a subunit in the list of write resources for the scheduling class, choose a subunit to use instead of the ProcResGroup. 3. having both the ProcResGroup and any of its subunits in the resources implied by a InstRW is not supported. Used to model parallel uses from a pool of resources. Differential Revision: https://reviews.llvm.org/D98976	2021-04-19 21:27:45 +01:00
David Penry	78a871abf7	[ARM] Use ProcResGroup in Cortex-M7 scheduling model Used to model structural hazards on FP issue, where some instructions take up 2 issue slots and others one as well as similar structural hazards on load issue, where some instructions take up two load lanes and others one. Differential Revision: https://reviews.llvm.org/D98977	2021-04-19 21:23:05 +01:00
Thomas Lively	e657c84fa1	[WebAssembly] Use v128.const instead of splats for constants We previously used splats instead of v128.const to materialize vector constants because V8 did not support v128.const. Now that V8 supports v128.const, we can use v128.const instead. Although this increases code size, it should also increase performance (or at least require fewer engine-side optimizations), so it is an appropriate change to make. Differential Revision: https://reviews.llvm.org/D100716	2021-04-19 12:43:59 -07:00
madhur13490	6a4d9cb7e0	[AMDGPU] Remove error check for indirect calls and add missing queue-ptr This patch removes -fixed-abi check for indirect calls and also adds queue-ptr which is required for indirect calls to work. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D100633	2021-04-20 00:35:17 +05:30
Pavel Iliin	2ec16103c6	[AArch64] Peephole rule to remove redundant cmp after cset. Comparisons to zero or one after cset instructions can be safely removed in examples like: cset w9, eq cset w9, eq cmp w9, #1 ---> <removed> b.ne .L1 b.ne .L1 cset w9, eq cset w9, eq cmp w9, #0 ---> <removed> b.ne .L1 b.eq .L1 Peephole optimization to detect suitable cases and get rid of that comparisons added. Differential Revision: https://reviews.llvm.org/D98564	2021-04-19 19:58:38 +01:00
Craig Topper	7ed01a420a	[RISCV] Pad v4i1/v2i1/v1i1 stores with 0s to make a full byte. As noted in the FIXME there's a sort of agreement that the any extra bits stored will be 0. The generated code is pretty terrible. I was really hoping we could use a tail undisturbed trick, but tail undisturbed no longer applies to masked destinations in the current draft spec. Fingers crossed that it isn't common to do this. I doubt IR from clang or the vectorizer would ever create this kind of store. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D100618	2021-04-19 11:05:18 -07:00
Jessica Paquette	65f257a215	[AArch64][GlobalISel] Implement custom legalization for s32 and s64 G_CTPOP This is a partial port of AArch64TargetLowering::LowerCTPOP. This custom lowering tries to uses NEON instructions to give a more efficient CTPOP lowering when possible. In the non-NEON/noimplicitfloat case, this should use the generic lowering (see: https://godbolt.org/z/GcaPvWe4x). I think that's worth implementing after implementing the widening code for s16/s8 though. Differential Revision: https://reviews.llvm.org/D100399	2021-04-19 10:56:02 -07:00
Jessica Paquette	91bbb914e0	[AArch64][GlobalISel] Regbankselect + select @llvm.aarch64.neon.uaddlv It turns out we actually import a bunch of selection code for intrinsics. The imported code checks that the register banks on the G_INTRINSIC instruction are correct. If so, it goes ahead and selects it. This adds code to AArch64RegisterBankInfo to allow us to correctly determine register banks on intrinsics which have known register bank constraints. For now, this only handles @llvm.aarch64.neon.uaddlv. This is necessary for porting AArch64TargetLowering::LowerCTPOP. Also add a utility for getting the intrinsic ID from a G_INTRINSIC instruction. This seems a little nicer than having to know about how intrinsic instructions are structured. Differential Revision: https://reviews.llvm.org/D100398	2021-04-19 10:47:49 -07:00
Jay Foad	ef443390a9	[AMDGPU] Remove MachineDCE after SIFoldOperands Remove the MachineDCE pass after the first SIFoldOperands pass now that SIFoldOperands deletes its own dead instructions. Reapply after fixing dependent change D100188. Differential Revision: https://reviews.llvm.org/D100189	2021-04-19 12:08:02 +01:00
Fraser Cormack	c9a93c3e01	[RISCV] Lower vector shuffles to vrgather operations This patch extends the lowering of RVV fixed-length vector shuffles to avoid the default stack expansion and instead lower to vrgather instructions. For "permute"-style shuffles where one vector is swizzled, we can lower to one vrgather. For shuffles involving two vector operands, we lower to one unmasked vrgather (or splat, where appropriate) followed by a masked vrgather which blends in the second half. On occasion, when it's not possible to create a legal BUILD_VECTOR for the indices, we use vrgatherei16 instructions with 16-bit index types. For 8-bit element vectors where we may have indices over 255, we have a fairly blunt fallback to the stack expansion to avoid custom-splitting of the vector types. To enable the selection of masked vrgather instructions, this patch extends the various RISCVISD::VRGATHER nodes to take a passthru operand. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D100549	2021-04-19 11:13:13 +01:00
David Sherwood	83f5fa519e	[CodeGen] Improve code generation for clamping of constant indices with scalable vectors When trying to clamp a constant index into a scalable vector we can test if the index is less than the minimum number of elements in the vector. If so, we can simply return the index because we know it is guaranteed to fit inside the vector. Differential Revision: https://reviews.llvm.org/D100639	2021-04-19 08:34:17 +01:00
Serguei Katkov	9f33943ee0	[GreedyRA ORE] Add stats for copy of virtual registers. Greedy RA adds copies of virtual registers when splitting live interval. This stat might be useful. Reviewers: reames, MatzeB, anemet, thegameg Reviewed By: reames Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D100017	2021-04-19 12:43:44 +07:00
Serguei Katkov	61d22f2e4e	[Greedy RA] Add a check to MachineVerifier If Virtual Register is alive in landing pad its def must be before the call causing the exception or it should be statepoint instruction itself and in this case def actually means the relocation of gc pointer and is alive in landing pad. The test shows the triggering this check for an option under development use-registers-for-gc-values-in-landing-pad which is off by default until it is functionally correct. Reviewers: reames, void, jyknight, nickdesaulniers, efriedma, arsenm, rnk Reviewed By: rnk Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D100525	2021-04-19 12:31:18 +07:00
Nemanja Ivanovic	ff769dd111	[PowerPC] Minor improvement for insert_vector_elt codegen For v2f64, all VSX subtargets can insert an element with a single XXPERMDI.	2021-04-16 18:52:37 -05:00
Philip Reames	f549176ad9	[funcattrs] Add the maximal set of implied attributes to definitions Have funcattrs expand all implied attributes into the IR. This expands the infrastructure from D100400, but for definitions not declarations this time. Somewhat subtly, this mostly isn't semantic. Because the accessors did the inference, any client which used the accessor was already getting the stronger result. Clients that directly checked presence of attributes (there are some), will see a stronger result now. The old behavior can end up quite confusing for two reasons: * Without this change, we have situations where function-attrs appears to fail when inferring an attribute (as seen by a human reading IR), but that consuming code will see that it should have been implied. As a human trying to sanity check test results and study IR for optimization possibilities, this is exceeding error prone and confusing. (I'll note that I wasted several hours recently because of this.) * We can have transforms which trigger without the IR appearing (on inspection) to meet the preconditions. This change doesn't prevent this from happening (as the accessors still involve multiple checks), but it should make it less frequent. I'd argue in favor of deleting the extra checks out of the accessors after this lands, but I want that in it's own review as a) it's purely stylistic, and b) I already know there's some disagreement. Once this lands, I'm also going to do a cleanup change which will delete some now redundant duplicate predicates in the inference code, but again, that deserves to be a change of it's own. Differential Revision: https://reviews.llvm.org/D100226	2021-04-16 14:22:19 -07:00
Thomas Lively	5c729750a6	[WebAssembly] Remove saturating fp-to-int target intrinsics Use the target-independent @llvm.fptosi and @llvm.fptoui intrinsics instead. This includes removing the instrinsics for i32x4.trunc_sat_zero_f64x2_{s,u}, which are now represented in IR as a saturating truncation to a v2i32 followed by a concatenation with a zero vector. Differential Revision: https://reviews.llvm.org/D100596	2021-04-16 12:11:20 -07:00
Malhar Jajoo	093f1828e5	[ARM] Prevent phi-node-elimination from generating copy above t2WhileLoopStartLR This patch prevents phi-node-elimination from generating a COPY operation for the register defined by t2WhileLoopStartLR, as it is a terminator that defines a value. This happens because of the presence of phi-nodes in the loop body (the Preheader of which is the block containing the t2WhileLoopStartLR). If this is not done, the COPY is generated above/before the terminator (t2WhileLoopStartLR here), and since it uses the value defined by t2WhileLoopStartLR, MachineVerifier throws a 'use before define' error. This essentially adds on to the change in differential D91887/D97729. Differential Revision: https://reviews.llvm.org/D100376	2021-04-16 16:45:07 +01:00
Nigel Perks	23f8993f32	Restore lit feature object-emission. Omit DebugInfo/Generic on XCore. D73568 removed the lit feature object-emission, because it was introduced for a target which did not support the integrated assembler, and that target no longer required the feature. XCore still does not support the integrated assembler, so a build with XCore as the default target fails tests requiring object-emission. This issue was not publicly visible because there was not a buildbot for XCore as the default target. We fixed the failures downstream. We now have builder clang-xcore-ubuntu-20-x64 on the staging buildmaster, which shows the failures. We would like to make upstream build green. Omit DebugInfo/Generic on XCore to avoid annotating 70 separate files. Differential Revision: https://reviews.llvm.org/D98508	2021-04-16 13:02:14 +01:00
Caroline Concatto	394eb91854	[NFC][AArch64][SVE] Move select-sve.ll tests to sve-select.ll This patch merges the two select tests: select-sve.ll and sve-select.ll into sve-select.ll as they are both testing SELECT instruction	2021-04-16 11:59:53 +01:00
David Green	00a6045473	[ARM] Combine sub 0, csinc X, Y, CC -> csinv -X, Y, CC Combine sub 0, csinc X, Y, CC to csinv -X, Y, CC providing that the negation of X is cheap, currently just handling constants. This comes up during the splat of an i1 to a predicate, where we now generate csetm, as opposed to cset; rsb. Differential Revision: https://reviews.llvm.org/D99940	2021-04-16 11:52:31 +01:00
Fraser Cormack	ec0f7c6923	[RISCV] Rerun stack test through update_llc_test_checks.py Adjusts formatting of comments only. Just to reduce diffs in future patches.	2021-04-16 11:08:58 +01:00
Nick Desaulniers	bb7016f8f5	[Aarch64] handle "o" inline asm memory constraints This Linux kernel is making use of this inline asm constraint which is causing an ICE. PR49956 Link: https://github.com/ClangBuiltLinux/linux/issues/1348 Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D100412	2021-04-15 23:36:21 -07:00
Jim Lin	2893570e86	[RISCV] Don't emit save-restore call if function is a interrupt handler It has to save all caller-saved registers before a call in the handler. So don't emit a call that save/restore registers. Reviewed By: simoncook, luismarques, asb Differential Revision: https://reviews.llvm.org/D100532	2021-04-16 12:54:47 +08:00
Momchil Velikov	f9d932e673	[clang][AArch64] Correctly align HFA arguments when passed on the stack When we pass a AArch64 Homogeneous Floating-Point Aggregate (HFA) argument with increased alignment requirements, for example struct S { __attribute__ ((__aligned__(16))) double v[4]; }; Clang uses `[4 x double]` for the parameter, which is passed on the stack at alignment 8, whereas it should be at alignment 16, following Rule C.4 in AAPCS (https://github.com/ARM-software/abi-aa/blob/master/aapcs64/aapcs64.rst#642parameter-passing-rules) Currently we don't have a way to express in LLVM IR the alignment requirements of the function arguments. The align attribute is applicable to pointers only, and only for some special ways of passing arguments (e..g byval). When implementing AAPCS32/AAPCS64, clang resorts to dubious hacks of coercing to types, which naturally have the needed alignment. We don't have enough types to cover all the cases, though. This patch introduces a new use of the stackalign attribute to control stack slot alignment, when and if an argument is passed in memory. The attribute align is left as an optimizer hint - it still applies to pointer types only and pertains to the content of the pointer, whereas the alignment of the pointer itself is determined by the stackalign attribute. For byval arguments, the stackalign attribute assumes the role, previously perfomed by align, falling back to align if stackalign` is absent. On the clang side, when passing arguments using the "direct" style (cf. `ABIArgInfo::Kind`), now we can optionally specify an alignment, which is emitted as the new `stackalign` attribute. Patch by Momchil Velikov and Lucas Prates. Differential Revision: https://reviews.llvm.org/D98794	2021-04-15 22:58:14 +01:00
Krzysztof Parzyszek	280678122d	[Hexagon] Avoid infinite loops in type legalization when lowering SETCC Only widen SETCC if the operands can be widened. Not checking that caused infinite widen-split loops in legalization.	2021-04-15 13:34:37 -05:00
Stelios Ioannou	bf147c4653	[LSR] Fix for pre-indexed generated constant offset This patch changed the isLegalUse check to ensure that LSRInstance::GenerateConstantOffsetsImpl generates an offset that results in a legal addressing mode and formula. The check is changed to look similar to the assert check used for illegal formulas. Differential Revision: https://reviews.llvm.org/D100383 Change-Id: Iffb9e32d59df96b8f072c00f6c339108159a009a	2021-04-15 16:44:42 +01:00

1 2 3 4 5 ...

38554 Commits