llvm-project

Commit Graph

Author	SHA1	Message	Date
Amara Emerson	95ac3d15e9	[AArch64][GlobalISel] Add G_VECREDUCE fewerElements support for full scalarization. For some reductions like G_VECREDUCE_OR on AArch64, we need to scalarize completely if the source is <= 64b. This change adds support for that in the legalizer. If the source has a pow-2 num elements, then we can do a tree reduction using the scalar operation in the individual elements. Otherwise, we just create a sequential chain of operations. For AArch64, we only need to scalarize if the input is <64b. If it's great than 64b then we can first do a fewElements step to 64b, taking advantage of vector instructions until we reach the point of scalarization. I also had to relax the verifier checks for reductions because the intrinsics support <1 x EltTy> types, which we lower to scalars for GlobalISel. Differential Revision: https://reviews.llvm.org/D108276	2021-08-19 16:38:52 -07:00
Amara Emerson	a0051f7149	[AArch64][GlobalISel] Fix miscompile of <16 x s8> G_EXTRACT_VECTOR_ELT. When support for copying vector s8 lanes was added recently, this also had the side effect of fixing a fallback for <16 x s8> extracts since both used the same helper. However, there was a bug in another helper to get the regclass for a specific FPR-native type, which was assigning FPR16 to s8 instead of FPR8.	2021-08-19 16:22:32 -07:00
Thomas Lively	fd0557dbf1	[WebAssembly] More convert_low and promote_low codegen The convert_low and promote_low instructions can widen the lower two lanes of a four-lane vector, but we were previously scalarizing patterns that widened lanes besides the low two lanes. The commit adds a shuffle to move the widened lanes into the low lane positions so the convert_low and promote_low instructions can be used instead of scalarizing. Depends on D108266. Differential Revision: https://reviews.llvm.org/D108341	2021-08-19 15:37:12 -07:00
Thomas Lively	b311a040ef	[WebAssembly] Pattern match SIMD convert_low and promote_low during ISel Since the simplest DAG patterns for convert_low and promote_low instructions involved v2i32, v2f32, v4i64, and v4f64 types, which are not legal in the WebAssembly backend and would be eliminated by type legalization, we were previously matching those patterns in a DAG combine before the type legalization stage. However in cases where the vectors were wider than 128 bits, the patterns we matched were not created until the type legalization stage when the wide vectors were split up. Type legalization would continue to eliminate the illegal types we were matching as well, so the code ended up scalarized. To make the ISel for these instructions more robust, match the scalarized patterns rather than the patterns containing illegal types. Add tests with double-wide vectors to show that this works as intended. Fixes PR51098. Depends on D107502. Differential Revision: https://reviews.llvm.org/D108266	2021-08-19 15:24:28 -07:00
Thomas Lively	b69374ca58	[WebAssembly] Legalize vector types by widening The default legalization of unsupported vector types is to promote the integers in each lane, which leads to extra sign or zero extending and masking when moving data into and out of vectors. Switch our preferred type legalization from the default to vector widening, which keeps the data in the low lanes of the vector rather than in the low bits of each lane. The unused high lanes can be ignored. Half-wide vectors are now loaded from memory into the low 64 bits of the v128 rather than spread out among the lanes. As a result, v128.load64_splat is a much more common operation, so add new patterns to support it. Differential Revision: https://reviews.llvm.org/D107502	2021-08-19 12:07:33 -07:00
Stanislav Mekhanoshin	8d7d89b081	[AMDGPU] Add alias.scope metadata to lowered LDS struct Alias analysis is unable to disambiguate accesses to the structure fields without it unlike distinct variables. As a result we cannot combine ds_read and ds_write operations in a case of any store in between which always considered clobbering. Differential Revision: https://reviews.llvm.org/D108315	2021-08-19 11:40:30 -07:00
Tim Northover	edab411ee6	AArch64: copy all parts of the mem operand across when combining a store In particular we were dropping volatility, which can lead to unwanted transformations.	2021-08-19 18:26:39 +01:00
Owen Anderson	06a4c85890	Use v16i8 rather than v2i64 as the VT for memset expansion on AArch64. This allows the instruction selector to realize that it can directly broadcast the low byte of the memset value, rather than replicating it to a 64-bit GPR before broadcasting. This fixes PR50985. Differential Revision: https://reviews.llvm.org/D108354	2021-08-19 16:54:07 +00:00
Thomas Preud'homme	9d476f0af9	Fix CodeGen/X86/fsafdo_test2.ll fail in release Require debug build for CodeGen/X86/fsafdo_test2.ll since it checks for messages only printed in debug mode. Reviewed By: wenlei, hoy Differential Revision: https://reviews.llvm.org/D108364	2021-08-19 16:54:04 +01:00
David Green	d10f23a25d	[ISel] Expand saddsat and ssubsat via asr and xor This changes the lowering of saddsat and ssubsat so that instead of using: r,o = saddo x, y c = setcc r < 0 s = c ? INTMAX : INTMIN ret o ? s : r into using asr and xor to materialize the INTMAX/INTMIN constants: r,o = saddo x, y s = ashr r, BW-1 x = xor s, INTMIN ret o ? x : r https://alive2.llvm.org/ce/z/TYufgD This seems to reduce the instruction count in most testcases across most architectures. X86 has some custom lowering added to compensate for cases where it can increase instruction count. Differential Revision: https://reviews.llvm.org/D105853	2021-08-19 16:08:07 +01:00
David Green	765a421276	[ARM] Add MVE min/max intrinsic tests. NFC	2021-08-19 14:33:34 +01:00
Ben Shi	b10e74389e	[RISCV][test] Improve tests for (add (mul x, c1), c2) Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D107710	2021-08-19 21:04:35 +08:00
Fraser Cormack	e6b1ac8546	[LegalizeTypes][VP] Add widening support for binary VP ops This patch adds the beginnings of more thorough support in the legalizers for vector-predicated (VP) operations. The first step is the ability to widen illegal vectors. The more complicated scenario in which the result/operands need widening but the mask doesn't has not been handled here. That would require a lot of code without an in-tree target on which to test it. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D107904	2021-08-19 13:08:47 +01:00
Ben Shi	9e40a32620	[RISCV][test] Add new tests for add optimization in the zba extension Reviewed By: asb Differential Revision: https://reviews.llvm.org/D108188	2021-08-19 19:59:23 +08:00
Simon Pilgrim	c1d9c2fb87	[X86] Regenerate store_op_load_fold.ll test checks	2021-08-19 12:42:09 +01:00
Rong Xu	5fdaaf7fd8	[SampleFDO] Flow Sensitive Sample FDO (FSAFDO) profile loader This patch implements Flow Sensitive Sample FDO (FSAFDO) profile loader. We have two profile loaders for FS profile, one before RegAlloc and one before BlockPlacement. To enable it, when -fprofile-sample-use=<profile> is specified, add "-enable-fs-discriminator=true \ -disable-ra-fsprofile-loader=false \ -disable-layout-fsprofile-loader=false" to turn on the FS profile loaders. Differential Revision: https://reviews.llvm.org/D107878	2021-08-18 18:37:35 -07:00
Jessica Paquette	3d91d5b757	[AArch64][GlobalISel] Mark G_FMINNUM/G_FMAXNUM as floating point opcodes We need to ensure that these end up on FPR to allow imported patterns to select them. This will also ensure that we get good regbank selection when dealing with instructions like G_PHI/G_LOAD/G_STORE which deduce their banks from their uses/users. Differential Revision: https://reviews.llvm.org/D108260	2021-08-18 13:32:19 -07:00
Jessica Paquette	45e1a6bd25	[AArch64][GlobalISel] Legalize scalar G_FMINNUM + G_FMAXNUM For subtargets with full FP16, this is legal for s16, s32, and s64. Without full FP16, it's legal for s32 and s64. For s128, this is a libcall. We also support some vector types, but for now, let's just support scalars. Differential Revision: https://reviews.llvm.org/D108259	2021-08-18 13:30:03 -07:00
Simon Pilgrim	ba1f6ffb8d	[PowerPC] Regenerate 2007-09-08-unaligned.ll test checks	2021-08-18 19:54:11 +01:00
Joe Nash	9dbc968ed9	[AMDGPU] Fix atomic float max/min intrinsics Hooked up raw.buffer.atomic.fmin/max.f64 This instruction should be available on GFX6, GFX7, and GFX10. It was implemented for GFX90a with a different name. Added intrinsic def for image_atomic_fmin/fmax; the instruction defs were already there. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D108208 Change-Id: I473f98d28b2afbeeb2c27822d9686b5e86634e2f	2021-08-18 14:12:42 -04:00
Craig Topper	3f9b37ccb1	[RISCV] Remove sext_inreg+add/sub/mul/shl isel patterns. Let the sext_inreg be selected to sext.w. Remove unneeded sext.w during PostProcessISelDAG. This gives opportunities for some other isel patterns to match like the ADDIPair or matching mul with immediate to shXadd. This becomes possible after D107658 started selecting W instructions based on users. The sext.w will be considered a W user so isel will often select a W instruction for the sext.w input and we can just remove the sext.w. Otherwise we can combine the sext.w with a ADD/SUB/MUL/SLLI to create a new W instruction in parallel to the the original instruction. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D107708	2021-08-18 11:07:11 -07:00
Jessica Paquette	791006fb8c	[GlobalISel] Implement lowering for G_ISNAN + use it in AArch64 GlobalISel equivalent to `TargetLowering::expandISNAN`. Use it in AArch64 and add a testcase. Differential Revision: https://reviews.llvm.org/D108227	2021-08-18 10:54:25 -07:00
Jessica Paquette	d9873711cb	[GlobalISel] Add IRTranslator support for G_ISNAN Translate the `@llvm.isnan` intrinsic to G_ISNAN when we see it. This is pretty much the same as the associated SelectionDAGBuilder code. Main difference is that we don't expand it here. It makes more sense to do that during legalization in GlobalISel. GlobalISel will just legalize the generated illegal types. Differential Revision: https://reviews.llvm.org/D108226	2021-08-18 10:48:10 -07:00
Craig Topper	6d7ea597ef	[RISCV] Insert sext_inreg when type legalizing add/sub/mul with constant LHS. We already do this for non-constants RHS. This just removes the special case. I believe the special case may have been needed because the ANY_EXTEND of a constant used to create zero extended constants, but we recently changed that to produce sign extended constants. D107658 is needed to prevent some regressions. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D107697	2021-08-18 10:44:25 -07:00
Craig Topper	20e6265873	[RISCV] Improve constant materialization for stores of i16 or i32 negative constants. DAGCombiner::visitStore can clear the upper bits of constants used by stores. This leads prevents them from being recognized as sign extended negative values making them more expensive to materialize. This patch uses the hasAllNBitUsers method from D107658 to make a negative constant if none of the users care about the upper bits. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D108052	2021-08-18 10:25:12 -07:00
Craig Topper	d9ba1a9c5c	[RISCV] Teach isel to select ADDW/SUBW/MULW/SLLIW when only the lower 32-bits are used. We normally select these when the root node is a sext_inreg, but SimplifyDemandedBits can sometimes bypass the sext_inreg for some users. This can create situation where sext_inreg+add/sub/mul/shl is selected to a W instruction, and then the add/sub/mul/shl is separately selected to a non-W instruction with the same inputs. This patch tries to detect when it would still be ok to use a W instruction without the sext_inreg by checking the direct users. This can allow the W instruction to CSE with one created for a sext_inreg+add/sub/mul/shl. To minimize complexity and cost of checking, we make no attempt to determine if the CSE will happen and just always use a W instruction when we can. Differential Revision: https://reviews.llvm.org/D107658	2021-08-18 10:22:00 -07:00
Simon Pilgrim	6cc11090a1	[X86] avx512bw-intrinsics-upgrade.ll - cleanup whitespace and use nounwind to avoid unnecessary cfi tags. NFCI.	2021-08-18 17:53:55 +01:00
Roman Lebedev	df1033d8db	[NFC][X86][Codegen] Add exhaustive test coverage for PR50971 Produced via https://godbolt.org/z/5hEdGY5x3	2021-08-18 15:02:52 +03:00
Qiu Chaofan	2e5e33807e	Pre-commit frem test in PowerPC	2021-08-18 17:52:53 +08:00
Amara Emerson	284006079e	[AArch64][GlobalISel] Add support for selection of s8:fpr = G_UNMERGE <8 x s8>	2021-08-18 00:34:06 -07:00
Petr Hosek	2d4470ab89	Revert "Allow rematerialization of virtual reg uses" This reverts commit `877572cc19` which introduced PR51516.	2021-08-18 00:12:41 -07:00
Thomas Lively	4ade3af133	[WebAssembly] Autogenerate checks for simd-conversions.ll In preparation for adding more tests more simply. Differential Revision: https://reviews.llvm.org/D108264	2021-08-17 21:35:23 -07:00
jacquesguan	a7ebc4d145	[DAGCombiner] Teach isKnownToBeAPowerOfTwo handle SPLAT_VECTOR Make DAGCombine turn mul by power of 2 into shl for scalable vector. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D107883	2021-08-18 10:10:40 +08:00
Wang, Pengfei	2379949aad	[X86] AVX512FP16 instructions enabling 3/6 Enable FP16 conversion instructions. Ref.: https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D105265	2021-08-18 09:03:41 +08:00
Simon Pilgrim	fb81271e8b	[AMDGPU] Fix lowering of AMDGPU::G_CTTZ_ZERO_UNDEF to AMDGPU::G_AMDGPU_FFBL_B32 As mentioned on D107474, there was a copy+paste typo repeating G_CTLZ_ZERO_UNDEF that coverity reported as dead code. Differential Revision: https://reviews.llvm.org/D108210	2021-08-17 18:09:57 +01:00
Fraser Cormack	f3e9047249	[VP] Add vector-predicated reduction intrinsics This patch adds vector-predicated ("VP") reduction intrinsics corresponding to each of the existing unpredicated `llvm.vector.reduce.*` versions. Unlike the unpredicated reductions, all VP reductions have a start value. This start value is returned when the no vector element is active. Support for expansion on targets without native vector-predication support is included. This patch is based on the ["reduction slice"](https://reviews.llvm.org/D57504#1732277) of the LLVM-VP reference patch (https://reviews.llvm.org/D57504). Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D104308	2021-08-17 17:56:35 +01:00
Roman Lebedev	2078c4ecfd	[X86] Lower insertions into upper half of an 256-bit vector as broadcast+blend (PR50971) Broadcast is not worse than extract+insert of subvector. https://godbolt.org/z/aPq98G6Yh Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D105390	2021-08-17 18:45:10 +03:00
David Green	52e0cf9d61	[ARM] Enable subreg liveness This enables subreg liveness in the arm backend when MVE is present, which allows the register allocator to detect when subregister are alive/dead, compared to only acting on full registers. This can helps produce better code on MVE with the way MQPR registers are made up of SPR registers, but is especially helpful for MQQPR and MQQQQPR registers, where there are very few "registers" available and being able to split them up into subregs can help produce much better code. Differential Revision: https://reviews.llvm.org/D107642	2021-08-17 14:10:33 +01:00
Sebastian Neubauer	fbae34635d	[GlobalISel] Add combine for PTR_ADD with regbanks Combine two G_PTR_ADDs, but keep the register bank of the constant. That way, the combine can be used in post-regbank-select combines. Introduce two helper methods in CombinerHelper, getRegBank and setRegBank that get and set an optional register bank to a register. That way, they can be used before and after register bank selection. Differential Revision: https://reviews.llvm.org/D103326	2021-08-17 13:58:16 +02:00
Bing1 Yu	bcec4ccd04	[X86] [AMX] Replace bitcast with specific AMX intrinsics with X86 specific cast. There is some discussion on the bitcast for vector and x86_amx at https://reviews.llvm.org/D99152. This patch is to introduce a x86 specific cast for vector and x86_amx, so that it can avoid some unnecessary optimization by middle-end. On the other way, we have to optimize the x86 specific cast by ourselves. This patch also optimize the cast operation to eliminate redundant code. Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D107544	2021-08-17 17:04:26 +08:00
Yunde Zhong	9790a2a72f	[tests] precommit tests for D107692	2021-08-17 13:05:02 +08:00
Christudasan Devadasan	686607676f	[AMDGPU] Skip pseudo MIs in hazard recognizer Instructions like WAVE_BARRIER and SI_MASKED_UNREACHABLE are only placeholders to prevent certain unwanted transformations and will get discarded during assembly emission. They should not be counted during nop insertion. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D108022	2021-08-16 23:11:14 -04:00
David Green	9236dea255	[ARM] Create MQQPR and MQQQQPR register classes Similar to the MQPR register class as the MVE equivalent to QPR, this adds MQQPR and MQQQQPR register classes for the MVE equivalents of QQPR and QQQQPR registers. The MVE MQPR seemed have worked out quite well, and adding MQQPR and MQQQQPR allows us to a little more accurately specify the number of registers, calculating register pressure limits a little better. Differential Revision: https://reviews.llvm.org/D107463	2021-08-16 22:58:12 +01:00
Anshil Gandhi	f22ba51873	[Remarks] Emit optimization remarks for atomics generating CAS loop Implements ORE in AtomicExpand pass to report atomics generating a compare and swap loop. Differential Revision: https://reviews.llvm.org/D106891	2021-08-16 14:56:01 -06:00
Stanislav Mekhanoshin	877572cc19	Allow rematerialization of virtual reg uses Currently isReallyTriviallyReMaterializableGeneric() implementation prevents rematerialization on any virtual register use on the grounds that is not a trivial rematerialization and that we do not want to extend liveranges. It appears that LRE logic does not attempt to extend a liverange of a source register for rematerialization so that is not an issue. That is checked in the LiveRangeEdit::allUsesAvailableAt(). The only non-trivial aspect of it is accounting for tied-defs which normally represent a read-modify-write operation and not rematerializable. The test for a tied-def situation already exists in the /CodeGen/AMDGPU/remat-vop.mir, test_no_remat_v_cvt_f32_i32_sdwa_dst_unused_preserve. The change has affected ARM/Thumb, Mips, RISCV, and x86. For the targets where I more or less understand the asm it seems to reduce spilling (as expected) or be neutral. However, it needs a review by all targets' specialists. Differential Revision: https://reviews.llvm.org/D106408	2021-08-16 12:42:42 -07:00
Stanislav Mekhanoshin	b9e433b02a	Prevent machine licm if remattable with a vreg use Check if a remateralizable nstruction does not have any virtual register uses. Even though rematerializable RA might not actually rematerialize it in this scenario. In that case we do not want to hoist such instruction out of the loop in a believe RA will sink it back if needed. This already has impact on AMDGPU target which does not check for this condition in its isTriviallyReMaterializable implementation and have instructions with virtual register uses enabled. The other targets are not impacted at this point although will be when D106408 lands. Differential Revision: https://reviews.llvm.org/D107677	2021-08-16 12:09:00 -07:00
Nikita Popov	735a590471	[MemorySSA] Remove -enable-mssa-loop-dependency option This option has been enabled by default for quite a while now. The practical impact of removing the option is that MSSA use cannot be disabled in default pipelines (both LPM and NPM) and in manual LPM invocations. NPM can still choose to enable/disable MSSA using loop vs loop-mssa. The next step will be to require MSSA for LICM and drop the AST-based implementation entirely. Differential Revision: https://reviews.llvm.org/D108075	2021-08-16 20:59:37 +02:00
Simon Pilgrim	778440f199	[X86] Add i128 funnel shift tests Test coverage for D108058	2021-08-16 17:31:17 +01:00
Simon Pilgrim	d6fe8d37c6	[DAG] Fold concat_vectors(concat_vectors(x,y),concat_vectors(a,b)) -> concat_vectors(x,y,a,b) Follow-up to D107068, attempt to fold nested concat_vectors/undefs, as long as both the vector and inner subvector types are legal. This exposed the same issue in ARM's MVE LowerCONCAT_VECTORS_i1 (raised as PR51365) and AArch64's performConcatVectorsCombine which both assumed concat_vectors only took 2 subvector operands. Differential Revision: https://reviews.llvm.org/D107597	2021-08-16 16:06:54 +01:00
Simon Pilgrim	2c5c06c5cf	[X86] Add PR46315 test case	2021-08-16 13:13:56 +01:00

1 2 3 4 5 ...

40081 Commits