Change relative branch instructions to use R_VE_SREL32 instead of
R_VE_PC_LO32 so that the ranges of relative branch instructions are
checked correctly at link time.
Reviewed By: simoll
Differential Revision: https://reviews.llvm.org/D115097
The v8.1-M ARMARM uses the vrinta.f16.f16 names, as opposed to
vrinta.f16. This adds an alias for it, in the same way that we already
have for f32 and f64.
Differential Revision: https://reviews.llvm.org/D68127
This patch implements the intrinsic for ref.null.
In the process of implementing the int_wasm_ref_null_func() and
int_wasm_ref_null_extern() intrinsics, it removes the redundant
HeapType.
This also causes the textual assembler syntax for ref.null to
change. Instead of taking a `func` or `extern` argument, the
instruction mnemonic is now either ref.null_func or ref.null_extern,
with no further operand needed.
Reviewed By: tlively
Differential Revision: https://reviews.llvm.org/D114979
Add all the CompressPat patterns to map instructions between 16-bit and 32-bit using the CompressInstEmitter infra.
Although this is only used in the asm printer, also enable it in the asm parser to help debug the mapping when -enable-csky-asm-compressed-inst is on.
Differential Revision: https://reviews.llvm.org/D115026
Implement 'back-to-back' FX fusion according to the Power10 User Manual,
section '19.1.5.4 Fusion'. It is not enabled by default.
Reviewed By: nemanjai
Differential Revision: https://reviews.llvm.org/D114345
Expanding on D109750.
Since `DBG_VALUE` instructions have their final register validity determined
in `LDVImpl::handleDebugValue`, there is no apparent reason to immediately
prune unused register operands as their defs are erased. Consequently, this
renders `MachineInstr::eraseFromParentAndMarkDBGValuesForRemoval` moot,
gaining a substantial performance improvement.
The only necessary changes involve making the relevant passes consider
invalid DBG_VALUE vreg uses as valid.
Reviewed By: MatzeB
Differential Revision: https://reviews.llvm.org/D112852
As an extension to D111976, this converts a fptosi clamped between
0 and (2^n)-1 to a fptoui.sat. This can greatly help on targets with
conversions that naturally saturate, such as Arm.
X86 disables the transform as some of the test cases increase in size.
Without native support, a fptoui.sat requires an fp clamp, so there is
little use in converting if the instruction is just going to be
expanded.
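For illustration, a rough IR-level sketch of the kind of pattern involved
(function names are hypothetical):

  declare i32 @llvm.smax.i32(i32, i32)
  declare i32 @llvm.smin.i32(i32, i32)
  declare i8 @llvm.fptoui.sat.i8.f32(float)

  ; fptosi clamped to [0, (2^8)-1], then narrowed to i8.
  define i8 @clamped_fptosi(float %x) {
    %conv = fptosi float %x to i32
    %lo = call i32 @llvm.smax.i32(i32 %conv, i32 0)
    %hi = call i32 @llvm.smin.i32(i32 %lo, i32 255)
    %res = trunc i32 %hi to i8
    ret i8 %res
  }

  ; Equivalent saturating form; it additionally gives defined results for
  ; NaN and out-of-range inputs, which the fptosi form leaves as poison.
  define i8 @saturated_fptoui(float %x) {
    %res = call i8 @llvm.fptoui.sat.i8.f32(float %x)
    ret i8 %res
  }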
Differential Revision: https://reviews.llvm.org/D112428
Code using indirect calls is broken without this, and there isn't
really much value in supporting the old attempt to vary the argument
placement based on uses. This resulted in more argument shuffling code
anyway.
Also make the option stop implying that all inputs need to be passed. This
now relies on the amdgpu-no-* attributes to avoid passing
unnecessary values.
Previously we would require adding an attribute to kernels to enable
the inputs passed in the kernarg segment, accessed by
llvm.amdgcn.implicitarg.ptr. This violates the principle of being
correct by default. Some OpenMP testcases were broken recently since
it wasn't correctly setting this attribute, and no known frontends are
setting this to anything other than the maximum.
Most of the test changes are from load widening of argument loads
since there are now more implied dereferenceable bytes.
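For illustration, a minimal IR sketch of the new default behaviour
(function names are hypothetical; the example assumes opaque pointers):

  declare ptr addrspace(4) @llvm.amdgcn.implicitarg.ptr()

  ; With this change the implicit kernarg inputs are available by default,
  ; so a function like this needs no enabling attribute on its callers.
  define i32 @reads_implicit_args() {
    %p = call ptr addrspace(4) @llvm.amdgcn.implicitarg.ptr()
    %v = load i32, ptr addrspace(4) %p, align 4
    ret i32 %v
  }

  ; Functions known not to need the pointer opt out via the attribute.
  define void @no_implicit_args() #0 {
    ret void
  }

  attributes #0 = { "amdgpu-no-implicitarg-ptr" }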
The ray_origin, ray_dir and ray_inv_dir arguments should all be vec3 to
match how the hardware instruction works.
Don't change the API of the corresponding OpenCL builtins.
Differential Revision: https://reviews.llvm.org/D115032
The two-address pass runs right before RA, and if an immediate
was folded into an instruction there is nothing left to remove
the dead def. We end up with something like:
  v_mov_b32_e32 v14, 0xc1700000
  v_mov_b32_e32 v14, 0x41200000
  v_fmaak_f32 v51, s67, v19, 0xc1700000
  v_fmaak_f32 v38, v51, v19, 0x41200000
The patch kills the dead move instruction right in the folding.
Differential Revision: https://reviews.llvm.org/D114999
This adjusts all the MVE and CDE intrinsics now that v2i1 is a legal
type, to use a <2 x i1> as opposed to emulating the predicate with a
<4 x i1>. The v4i1 workarounds have been removed leaving the natural
v2i1 types, notably in vctp64 which now generates a v2i1 type.
AutoUpgrade code has been added to upgrade old IR, which needs to
convert the old v4i1 to a v2i1 by converting it back and forth to an
integer with the arm.mve.v2i and arm.mve.i2v intrinsics. These should be
optimized away in the final assembly.
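As a sketch of the round trip the upgrade inserts (the full
llvm.arm.mve.pred.* spellings and overload suffixes here are assumptions
based on the names above):

  declare i32 @llvm.arm.mve.pred.v2i.v4i1(<4 x i1>)
  declare <2 x i1> @llvm.arm.mve.pred.i2v.v2i1(i32)

  ; Convert an old-style <4 x i1> predicate to the new <2 x i1> form by
  ; going through the underlying VPR.P0 integer value.
  define <2 x i1> @upgrade_predicate(<4 x i1> %old) {
    %bits = call i32 @llvm.arm.mve.pred.v2i.v4i1(<4 x i1> %old)
    %new = call <2 x i1> @llvm.arm.mve.pred.i2v.v2i1(i32 %bits)
    ret <2 x i1> %new
  }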
Differential Revision: https://reviews.llvm.org/D114455
The Power ISA defined l[bhwdq]arx as both base and
extended mnemonics. The base mnemonic takes the EH
bit as an operand and the extended mnemonic omits
it, making it implicitly zero. The existing
implementation only handles the base mnemonic when
EH is 1 and internally produces a different
instruction. There are historical reasons for this.
This patch simply removes the limitation introduced
by this implementation that disallows the base
mnemonic with EH = 0 in the ASM parser.
This resolves an issue that prevented some files
in the Linux kernel from being built with
-fintegrated-as.
Also fix a crash if the value is not an integer immediate.
MVE can treat v16i1, v8i1, v4i1 and v2i1 as different views onto the
same 16-bit VPR.P0 register, with v2i1 holding two 8-bit values for the
two halves. This was never treated as a legal type in LLVM in the past,
as there are not many 64-bit instructions and no 64-bit compares. There
are a few instructions that could use it though, notably a VSELECT (as
it can handle any size using the underlying v16i8 VPSEL), AND/OR/XOR for
similar reasons, some gathers/scatters, long multiplies and VCTP64
instructions.
This patch goes through and makes v2i1 a legal type, handling all the
cases that fall out of that. It also makes VSELECT legal for v2i64 as a
side benefit. A lot of the codegen changes as a result - usually in a way
that is a little better or a little worse, but still expensive. Costs
can change a little too in the process, again in a way that keeps
expensive things expensive. A lot of the tests that changed are mainly to
ensure correctness - the code can hopefully be improved in the future
as it comes up in practice.
The intrinsics currently keep using the v4i1 they previously used to
emulate a v2i1. This will be changed in a followup patch, but this one
was already large enough.
Differential Revision: https://reviews.llvm.org/D114449
Add clamp combine. Source is fminnum(fmaxnum(Val, 0.0), 1.0) or
fmaxnum(fminnum(Val, 1.0), 0.0) or fmed3 intrinsic with 0.0 and
1.0 as two out of three operands.
Differential Revision: https://reviews.llvm.org/D90052
Add floating point version of med3 combine.
Source is fminnum(fmaxnum(Val, K0), K1) or fmaxnum(fminnum(Val, K1), K0)
where K0 and K1 are constants and K0 <= K1.
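Roughly, in IR terms (the combine itself operates on the generic machine
instructions; the constants and function names are just for illustration):

  declare float @llvm.maxnum.f32(float, float)
  declare float @llvm.minnum.f32(float, float)
  declare float @llvm.amdgcn.fmed3.f32(float, float, float)

  ; With K0 = 2.0 and K1 = 4.0 (K0 <= K1), this clamp picks the median of
  ; {%val, K0, K1} ...
  define float @clamp_pattern(float %val) {
    %max = call float @llvm.maxnum.f32(float %val, float 2.0)
    %min = call float @llvm.minnum.f32(float %max, float 4.0)
    ret float %min
  }

  ; ... and so can be folded into a single med3.
  define float @combined(float %val) {
    %med = call float @llvm.amdgcn.fmed3.f32(float %val, float 2.0, float 4.0)
    ret float %med
  }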
Differential Revision: https://reviews.llvm.org/D90051
Recognize a constant splat padded with undef in isCanonicalized.
The fcanonicalize will be removed by RemoveFcanonicalize in the
post-legalizer combiner. We treat undef as a value that will result in
a splat in the clamp combine after regbankselect.
Differential Revision: https://reviews.llvm.org/D104408
When used as a non-leaf node, TableGen does not currently use the type
of a ComplexPattern for type inference, which also means it does not
check it doesn't conflict with the use. This differs from when used as a
leaf value, where the type is used for inference. Fixing that
discrepancy is something I intend to upstream as a subsequent review.
AArch64 currently has several ComplexPatterns that are used in contexts
where they're expected to be an iPTR. The cases that lead to type
contradictions are separated out in D108759, but there are additional
differences to the TableGen output when using my locally-patched
TableGen. None of these appear to matter, at least for passing all the
CodeGen tests, but it's safer to avoid such changes (and similar changes
were causing issues on some AMDGPU tests, leading to failures to select).
Changing these additional ComplexPatterns to use iPTR rather than i64
ensures that the TableGen output remains bit-for-bit identical (compared
to without having this patch and my TableGen patch, as well as the
intermediate state of having this patch but not my TableGen patch), and
more accurately captures the higher-level meaning of these patterns.
Reviewed By: david-arm
Differential Revision: https://reviews.llvm.org/D109034
When used as a non-leaf node, TableGen does not currently use the type
of a ComplexPattern for type inference, which also means it does not
check it doesn't conflict with the use. This differs from when used as a
leaf value, where the type is used for inference. Fixing that
discrepancy is something I intend to upstream as a subsequent review.
AMDGPU currently has several ComplexPatterns that are used in contexts
where they're expected to be an iPTR, and where using an iPTR instead of
a fixed-width integer type matters. With my locally-patched TableGen,
none of these mismatches result in type contradictions, but do change
the patterns and cause various failures to select. These changes to the
ComplexPatterns' types reflect how they are actually used, result in
bit-for-bit identical TableGen output (without my local TableGen patch),
and ensure that with improved type inference AMDGPU's backend will
continue to work.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D109032
When used as a non-leaf node, TableGen does not currently use the type
of a ComplexPattern for type inference, which also means it does not
check it doesn't conflict with the use. This differs from when used as a
leaf value, where the type is used for inference. Fixing that
discrepancy is something I intend to upstream as a subsequent review,
but these are all the type conflicts found (all legitimate) by my
locally-patched TableGen.
Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D108759
Changed registers to R10 and R11 because PLT resolution clobbers them. Also changed the implementation to use R11 instead of RCX, which saves a push/pop.
Reviewed By: vitalybuka
Differential Revision: https://reviews.llvm.org/D115002
1. Fixed an inconsistency in vector instruction cost estimation - removed the
different logic for "not simple types", since it biases costs for these types.
2. Fixed the legalization penalty for vectors too big for the target: changed
from overwriting the default legalization cost estimate to adding a penalty.
3. Fixed a few typos in tests.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D114893
Do not infer amdgpu-no-implicitarg-ptr for sanitized functions. If a
function is explicitly marked amdgpu-no-implicitarg-ptr and
sanitize_address, infer that it is required.
The load/store infrastructure previously made the incorrect assumption that
whenever it is used with a load/store intrinsic on Power10, those intrinsics
would automatically be the lxvp/stxvp intrinsics introduced in Power10.
However, this is obviously not the case, as there are multiple instances of
pre-P10 intrinsics that use the refactored load/store implementation.
This patch corrects this assumption and produces the expected intrinsic on pre-P10.
Differential Revision: https://reviews.llvm.org/D114978
Some instructions with i8 immediate ranges can only hold negative values
(like t2LDRHi8), can only hold positive values (like t2STRT), or can hold
either depending on the U bit (like the pre/post-inc instructions, e.g.
t2LDRH_POST). This patch splits AddrModeT2_i8 into AddrModeT2_i8,
AddrModeT2_i8pos and AddrModeT2_i8neg to make this clear.
This allows us to get the offset ranges of t2LDRHi8 correct in the
load/store optimizer, fixing issues where we could end up creating
instructions with positive offsets (which may then be encoded as ldrht).
Differential Revision: https://reviews.llvm.org/D114638
BVH instructions should be handled separately from VMEM and VMEM-with-sampler
instructions for waitcnt handling.
Differential Revision: https://reviews.llvm.org/D114794
The ranges in isLegalAddressImm were off by one, not allowing the
maximum values for unscaled offsets.
Differential Revision: https://reviews.llvm.org/D114636
There is custom expansion code for packed VFMK Pseudos in the VE
backend. This code erased the Pseudo without telling
ExpandPostRAPseudos about it, causing the generic expansion function to
access the erased Pseudo. This bug triggered in the
test/CodeGen/VE/VELIntrinsics/vfmk.ll test with asan-enabled builds.
Detected by:
sanitizer-x86_64-linux-fast
(https://lab.llvm.org/buildbot/#/builders/5/builds/15393)
Given a min(max(fptosi, INT_MIN), INT_MAX) with the correct constants,
we can now generate a fptosi.sat. But in the Arm backend, the constant
can be treated as high cost, pulling it out of the basic block so that
the DAG combine can no longer see it. This teaches the backend again
that it is a low cost constant, not worth hoisting out.
Recommitted from 0e98659ea1 with a fix for APInt comparison.
Differential Revision: https://reviews.llvm.org/D114380
Using a BufferSize of one for memory ProcResources will result in better
ILP since it more accurately models the dependencies between memory ops
and their consumers on an in-order processor. After this change, the
scheduler will treat the data edges from loads as blocking so that
stalls are guaranteed when waiting for data to be retrieved from memory.
Since we don't actually track waitcnt here, this should do a better job
at modeling their behavior.
Practically, this means that the scheduler will trigger the 'STALL'
heuristic more often.
This type of change needs to be evaluated experimentally. Preliminary
results are positive.
Fixes: SWDEV-282962
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D114777
We have reasonably fast sqrt and accurate rsqrt for the half type due to
its limited fraction bits. So we need neither multi-step refinement for
rsqrt nor to replace sqrt with rsqrt.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D114844
Along with vector RC flags, this scalar flag will
make various regclass queries like `isVGPR` more
accurate.
Regclasses other than vectors are currently set
with the new flag even though certain unallocatable
classes aren't truly scalars. That is ok as long
as they remain unallocatable.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D110053
Detected on targets older than gfx10 (e.g. gfx9) for constants that are
too large to be inlined (constants are sgpr by default).
In the med3 combine it is expected that regbankselect maps all operands of
the min/max we try to match to vgpr. However, constants are mapped to sgpr
and there will be a sgpr-to-vgpr copy. Matchers look through sgpr-to-vgpr
copies and return the sgpr, and these break the constant bus restriction.
Build the med3 with all vgpr operands. Use the existing sgpr-to-vgpr copies
for matched sgprs. If there is no such copy (not expected), build one.
Differential Revision: https://reviews.llvm.org/D114700
This prevents scalarization of fixed vector operations or crashes
on scalable vectors.
We don't have direct support for these operations. To emulate
ftrunc we can convert to a same-sized integer and back to fp using
round to zero. We don't need to do the convert if the value is large
enough to have no fractional bits or is a NaN.
The ceil and floor lowering would be better if we changed FRM, but
we don't model FRM correctly yet. So I've used the trunc lowering
with a conditional add or subtract of 1.0 if the truncate rounded
in the wrong direction.
There are also missed opportunities to use masked instructions.
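As a rough scalar sketch of the ftrunc emulation described above (the real
lowering operates on RVV vector values, and this simplified version does
not preserve the sign of zero):

  declare float @llvm.fabs.f32(float)

  define float @emulated_ftrunc(float %x) {
    %abs = call float @llvm.fabs.f32(float %x)
    ; Values with |x| >= 2^23 already have no fractional bits, and the
    ; unordered compare also catches NaN, so those skip the conversion.
    %skip = fcmp uge float %abs, 8388608.0
    ; Round-to-zero conversion to a same-sized integer and back.
    %int = fptosi float %x to i32
    %rtz = sitofp i32 %int to float
    %res = select i1 %skip, float %x, float %rtz
    ret float %res
  }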
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D113543
First call getOperand, then erase the MachineInstr. Not the other way
round.
Expected to fix test/CodeGen/VE/VELIntrinsics/lvm.ll
Detected by asan buildbot:
sanitizer-x86_64-linux-fast
(https://lab.llvm.org/buildbot/#/builders/5/builds/15384)
combinePMULH currently only truncates vXi32/vXi64 multiplies to PMULHW/PMULHUW if the source operands are SEXT/ZEXT instructions, for a 'free' truncation.
But we can generalize this to any source operand with sufficient leading sign/zero bits that would allow PACKSS/PACKUS to be used as a 'cheap' truncation.
This helps us avoid the wider multiplies, in exchange for truncation on both source operands instead of the result.
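For example, roughly in IR terms (the combine itself runs on the DAG),
operands that are not literal sign extensions but still have enough known
sign bits, such as these ashr-by-16 values, can now be truncated with
PACKSS and multiplied with PMULHW:

  ; Both multiply operands are ashr-by-16 values, so each has at least 17
  ; known sign bits even though there is no explicit sext.
  define <8 x i16> @mulh_from_ashr(<8 x i32> %x, <8 x i32> %y) {
    %a = ashr <8 x i32> %x, <i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16>
    %b = ashr <8 x i32> %y, <i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16>
    %mul = mul <8 x i32> %a, %b
    %hi = lshr <8 x i32> %mul, <i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16>
    %res = trunc <8 x i32> %hi to <8 x i16>
    ret <8 x i16> %res
  }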
Differential Revision: https://reviews.llvm.org/D113371
The patch expands the existing 32-bit toc-data attribute support to 64-bit.
In both 32-bit and 64-bit mode, it is supported only for the small code model.
Differential Revision: https://reviews.llvm.org/D114654