This patch fixes the visibility and linkage information of symbols
referring to IR globals.
Emission of external declarations is now done in the first execution
of emitConstantPool rather than in emitLinkage (and a few other
places). This is the point where we have already gathered information
about used symbols (by running the MC Lower PrePass) but have not yet
started emitting any functions, so any declarations that need to be
emitted end up at the top of the file, before any functions.
This changes the order of a few directives in the final asm file which
required an update to a few tests.
Reviewed By: sbc100
Differential Revision: https://reviews.llvm.org/D118122
The current cost-model overestimates the cost of vector compares &
selects for ordered floating point compares. This patch fixes that by
extending the existing logic for integer predicates.
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D118256
masked.atomicrmw.*.i32 intrinsics access an i32 (and then possibly
mask it), so hardcode MVT::i32 as the access type here, rather than
determining it from the pointer element type.
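A rough scalar model of the access pattern may help; the helper name and the use of a CAS loop below are illustrative assumptions, not the actual intrinsic lowering:

```
#include <atomic>
#include <cstdint>

// Illustrative model only: the hardware access is always a full i32 read
// and write; the mask merely selects which bits within it are updated.
uint32_t maskedAtomicAdd(std::atomic<uint32_t> &Word, uint32_t Val,
                         uint32_t Mask) {
  uint32_t Old = Word.load();
  while (!Word.compare_exchange_weak(Old, (Old & ~Mask) | ((Old + Val) & Mask)))
    ;
  return Old; // the access type is i32 regardless of the mask
}
```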
Differential Revision: https://reviews.llvm.org/D118336
This is a slight change because I'm using the ANY_EXTEND result
instead of the original operand, but getNode should constant fold.
While there, add a comment about why the code specifically checks
for a ConstantSDNode.
We can use the RISCVISD::GREV encoding that swaps the bits in
each byte. This allows it to use the existing computeKnownBits
support for RISCVISD::GREV.
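For reference, a scalar sketch of the byte-wise bit swap this GREV encoding performs (the specific control value is assumed here purely for illustration):

```
#include <cstdint>

// Swap the bits within each byte of X, leaving the bytes themselves in place.
uint32_t reverseBitsInEachByte(uint32_t X) {
  X = ((X & 0x55555555u) << 1) | ((X >> 1) & 0x55555555u); // swap adjacent bits
  X = ((X & 0x33333333u) << 2) | ((X >> 2) & 0x33333333u); // swap bit pairs
  X = ((X & 0x0F0F0F0Fu) << 4) | ((X >> 4) & 0x0F0F0F0Fu); // swap nibbles
  return X;
}
```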
Limit this to SSE41-AVX1 targets to avoid UNPCKL(PSHUFB,PSHUFB); pre-SSE41 we don't have PACKUSDW/BLENDW, and with AVX2 we can perform this as PERMQ(PSHUFB()).
Allows pow2 mask tests to avoid an unnecessary constant load.
Noticed while investigating how to extend MatchVectorAllZeroTest to support more allof/anyof patterns.
We already have an ISD opcode for the more general GREV/GREVI
instruction. We can just use it with the encoding that corresponds
to the behavior of brev8. This is similar to what we do for orc.b
where we use the GORC ISD opcode.
Without AVX512 (which can efficiently extend/truncate to vXi16/vXi32), unpacking/packing to vXi16 is more efficient than relying on the (uops-heavy) PBLENDV shift expansion.
AArch64ISD::PFALSE does not provide any value; in fact, it can
prevent common combines from firing. We only needed to lower
to PFALSE until ISD::SPLAT_VECTOR became generally available.
Differential Revision: https://reviews.llvm.org/D118469
This refactors some code dealing with setting Wasm symbol types.
Some of the code dealing with types was moved from
`WebAssemblyUtilities` to `WebAssemblyTypeUtilities`.
Reviewed By: sbc100
Differential Revision: https://reviews.llvm.org/D118121
Currently the backend promotes the mask vector to an i8 vector and extracts an element from that. Instead, we can bitcast to a vector with wider elements, extract from it into a GPR, and then use scalar integer instructions to extract the desired bit.
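A scalar sketch of the idea (the helper name and the container width are illustrative assumptions): once the mask bits are in a GPR, the wanted bit is just a shift and mask.

```
#include <cstdint>

// Extract the Idx-th mask bit from mask bits that have been moved into a
// scalar register via a wider-element bitcast and extract.
bool extractMaskBit(uint64_t MaskBits, unsigned Idx) {
  return (MaskBits >> Idx) & 1;
}
```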
Differential Revision: https://reviews.llvm.org/D117389
Currently, the clang.arc.attachedcall bundle takes an optional function
argument. Depending on whether the argument is present, calls with this
bundle have the following semantics:
- on x86, with the argument present, the call is lowered to:
call _target
mov rax, rdi
call _objc_retainAutoreleasedReturnValue
- on AArch64, without the argument, the call is lowered to:
bl _target
mov x29, x29
and the objc runtime call is expected to be emitted separately.
That's because, on x86, the objc runtime checks for both the mov and
the call, and treats the combination as the ARC autorelease elision
marker.
But on AArch64, it only checks for the dedicated NOP marker, as that's
historically been sufficiently unique. Thanks to that, the runtime call
wasn't required to be adjacent to the NOP marker, so it wasn't emitted
as part of the bundle sequence.
This patch unifies both architectures: on AArch64, we now emit all
3 instructions for the bundle. This guarantees that the runtime call
is adjacent to the marker in the sequence, and that's information the
runtime can use to further optimize this.
This helps simplify some of the handling, in particular
BundledRetainClaimRVs, which no longer needs to know whether the bundle
is sufficient or not: it now always should be.
Note that this does not include an AutoUpgrade for the nullary bundles,
as they are only produced in ObjCContract as part of the obj/asm emission
pipeline, and are not expected to be in bitcode.
Differential Revision: https://reviews.llvm.org/D118214
We were creating a truncate with the default for the type, but for
VP intrinsics we have a VL that we should use.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D118406
Fix the pseudos to have the correct size in the MCInstrDesc description.
Inspired by D118009 and D117970.
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D118175
`__gnu_h2f_ieee` and `__gnu_f2h_ieee` were introduced by ARM and are set as
the default names for fp16/fp32 conversions in LLVM.
However, RISC-V GCC uses the generic naming scheme for those conversions,
`__extendhfsf2` and `__truncsfhf2`, which causes a runtime ABI
incompatibility.
Although we don't have a formal runtime ABI spec to specify those naming
conventions yet, I think it would be great to fix the incompatibility
first.
I plan to create a runtime ABI spec under the psABI spec this year.
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D118207
extract_vec_elt (load X), C --> scalar load (X+C)
As noted in the comment, DAGCombiner has this fold -- and the code in this
patch is adapted from DAGCombiner::scalarizeExtractedVectorLoad() -- but
x86 should benefit even if the loaded vector has other uses as long as we
apply some other x86-specific conditions. The motivating example from #50310
is shown in vec_int_to_fp.ll.
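In source terms, the effect of the fold is roughly the following (a hand-written illustration, not the actual test from vec_int_to_fp.ll):

```
// Before the fold: load the whole <4 x float> at X, then extract lane 2.
// After the fold:  a single scalar load from X + 2.
float extractLane2(const float *X) {
  return X[2];
}
```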
Fixes #50310
Differential Revision: https://reviews.llvm.org/D118376
Where the instruction mnemonic contains a dot, we name the corresponding
instruction in the .td file using a _ in the place of the dot. e.g. LR_W
rather than LRW. This commit updates RISCVInstrInfoZb.td to follow that
convention.
This patch updates how splat loads are handled and is an extension of D106555.
In particular, v2i64/v4f32/v4i32 types are updated to handle only
non-extending loads, while v8i16/v16i8 types are updated to handle extending
loads only if the memory VT is the same as the vector element VT.
A test case has been added to illustrate a scenario where a PPCISD::LD_SPLAT
node should not be produced. The test shows the following f64 extending load
used in a v2f64 build vector, where the extending load is also used in places
other than the build vector (such as in t12 and t16).
```
Type-legalized selection DAG: %bb.0 'test:entry'
SelectionDAG has 20 nodes:
t0: ch = EntryToken
t4: i64,ch = CopyFromReg t0, Register:i64 %1
t6: i64,ch = CopyFromReg t0, Register:i64 %2
t11: f64,ch = load<(load (s64) from %ir.b, !tbaa !7)> t0, t4, undef:i64
t16: f64 = fadd t31, t37
t34: ch = store<(store (s64) into %ir.c, !tbaa !7)> t31:1, t16, t6, undef:i64
t36: ch = TokenFactor t34, t37:1
t27: v2f64 = BUILD_VECTOR t37, t37
t22: ch,glue = CopyToReg t36, Register:v2f64 $v2, t27
t12: f64 = fadd t11, t37
t28: ch = store<(store (s64) into %ir.b, !tbaa !7)> t11:1, t12, t4, undef:i64
t31: f64,ch = load<(load (s64) from %ir.c, !tbaa !7)> t28, t6, undef:i64
t2: i64,ch = CopyFromReg t0, Register:i64 %0
t37: f64,ch = load<(load (s32) from %ir.a, !tbaa !3), anyext from f32> t0, t2, undef:i64
t23: ch = PPCISD::RET_FLAG t22, Register:v2f64 $v2, t22:1
```
Differential Revision: https://reviews.llvm.org/D117803
When a comparison is extended and it would be free to extend the
arguments to that comparison, we can propagate the extend into those arguments.
This prevents extra instructions from being generated to extend the result of the
comparison, which is not free to extend.
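As an example of the kind of code that benefits (my own illustration, not taken from the patch's tests): on AArch64 the narrow loads below can be sign-extended for free, so the extend can be pushed into the compare operands instead of extending the compare result.

```
#include <cstdint>

int64_t compareAndExtend(const int32_t *A, const int32_t *B) {
  // The i32 compare result is extended to i64; with the fold, the operands
  // are extended instead and the compare is done at 64 bits.
  return static_cast<int64_t>(*A > *B);
}
```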
This is a resubmission of D116812 with fixes that need another review.
Differential Revision: https://reviews.llvm.org/D118139
Adds patterns of the form "(and a, (not b)) -> bic".
NOTE: With this support I'm inclined to remove AArch64ISD::BIC,
but will leave that investigation for another time.
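For illustration, the scalar form of the pattern (the new patterns operate on SVE vectors, but the identity is the same):

```
#include <cstdint>

// and a, (not b)  ->  bic a, b
uint32_t andNot(uint32_t A, uint32_t B) {
  return A & ~B;
}
```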
Differential Revision: https://reviews.llvm.org/D118365
Previous folds by combineSetCCMOVMSK might have converted these to CMP when changing the bitwidth, and the CMP->SUB fold might not have happened yet (or might never happen).
rG9103b73fe052 assumed that we could OR/AND with the source vector, but that will fail on float/double vectors without bitcasting. It also missed the case where any_of checks might be testing fewer than all of the source elements.
`instrprof-icall-promo.test` `FAIL`s on Solaris/sparcv9:
Profile-sparc :: instrprof-icall-promo.test
Profile-sparcv9 :: instrprof-icall-promo.test
when compiling `compiler-rt/test/profile/Inputs/instrprof-icall-promo_2.cpp` with
fatal error: error in backend: Relocation for CG Profile could not be created: unknown relocation name
This happens because the Sparc backend doesn't implement `BFD_RELOC_NONE`.
This patch fixes that, following what X86 does.
Tested on `sparcv9-sun-solaris2.11`.
Differential Revision: https://reviews.llvm.org/D118136
Inspecting the pointer element type here is incompatible with
opaque pointers, and doesn't seem necessary to me. I think the
intention might have been to check the type of load/store pointer
arguments, but I believe those should get checked through their
return type or value operand anyway. I don't get any test failures
if I simply drop this.
Differential Revision: https://reviews.llvm.org/D118353
This is similar to D116619, but now it handles `invoke`s. The reason we
didn't handle `invoke`s back then was we didn't support Wasm EH + Wasm
SjLj together, and the only case SjLj transformation will see `invoke`s
is when we are using Wasm EH. (In Emscripten EH, they would have been
transformed to `call`s to invoke wrappers.)
But after D117610 we support Wasm EH + Wasm SjLj together and we can
nullify `invoke`s to `setjmp` when there are no other longjmpable calls
within the function. Actually this is very unlikely to happen in
practice, because we treat destructors as longjmpable and also treat
`__cxa_end_catch` as longjmpable even if it is not.
Reviewed By: dschuff
Differential Revision: https://reviews.llvm.org/D118408
Wasm SjLj converts longjmpable calls into `invoke`s that unwind to
`%catch.dispatch.longjmp` BB, from where we check if the thrown
exception is a `longjmp`. But in case a call already has a `funclet`
attribute, i.e., it is within a catch scope, we have to unwind to its
unwind destination first to preserve the scoping structure. That will
eventually unwind to `%catch.dispatch.longjmp`, because all
`catchswitch` and `cleanupret` that unwind to caller are redirected to
`%catch.dispatch.longjmp` during Wasm SjLj transformation.
But the previous code assumed a `cleanuppad`'s parent pad was always an
instruction, and didn't handle the case where a `cleanuppad`'s parent is `none`.
This CL handles that case, and makes the `while` loop more intuitive by
removing the `FromPad` condition and explicitly inserting `break`s.
Reviewed By: dschuff
Differential Revision: https://reviews.llvm.org/D118407
Wasm EH, used with either of Emscripten SjLj or Wasm SjLj, does not
allow `setjmp` calls to be placed within a `catch` clause, because we
don't support jumping into a `catch` block. Emscripten EH does not have
this restriction. But I think this restriction wouldn't prevent most
use cases. This CL errors out with a clear message for this case.
Reviewed By: dschuff, kripken
Differential Revision: https://reviews.llvm.org/D118286
When we create an invoke wrapper call, if the original call instruction
has a `noreturn` attribute, we shouldn't copy it, because we expect
invoke wrapper calls to return. This generated an incorrect `free` call
before an invoke wrapper call that calls `__cxa_throw`, because
`__cxa_throw` has the `noreturn` attribute.
Reviewed By: dschuff
Differential Revision: https://reviews.llvm.org/D118274
In D117610 we treated `__cxa_end_catch` as longjmpable, even though it is
not, in order to make unwind destination relationships correct. But we only
need to do this in Wasm SjLj; doing it in Emscripten SjLj does not make the
code incorrect but adds unnecessary invokes. This CL treats
`__cxa_end_catch` as longjmpable only in Wasm SjLj.
Reviewed By: dschuff
Differential Revision: https://reviews.llvm.org/D117943
According to riscv-v-spec-1.0, the widening signed(vs2)-unsigned integer multiply is:
vwmulsu.vv vd, vs2, vs1, vm # vector-vector
vwmulsu.vx vd, vs2, rs1, vm # vector-scalar
It is worth noting that only vs2 is the signed operand.
For vwmulsu.vv we can swap the two operands, since it does not matter which one
is the sign-extended input, but for vwmulsu.vx the sign-extended operand cannot
be the vector extended from the scalar (rs1).
I specifically added two functions ending with _swap to the test case.
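A per-element model (SEW=32 assumed for illustration) of why swapping is fine for the .vv form: only the value roles matter, not which source register they came from, whereas in the .vx form the scalar rs1 is always the zero-extended operand.

```
#include <cstdint>

// vwmulsu per-element semantics: sign-extend vs2, zero-extend vs1/rs1,
// multiply into the double-width result.
int64_t vwmulsuElem(int32_t Vs2, uint32_t Vs1) {
  return static_cast<int64_t>(Vs2) * static_cast<int64_t>(Vs1);
}
```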
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D118215
Return false from ShouldShrinkFPConstant(), so that these constants are stored
at their full size in the constant pool, even if they could have been shrunk
and used with an extending load.
This is better since LD is faster than LDE, and it also enables reg/mem opcodes.
Review: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D117927
By reordering the objects on the stack frame after looking at the users,
displacement operands can be utilized better. This means fewer Load Address
instructions are needed to access these objects.
This is important for very large functions, where otherwise small changes
could cause many more (or fewer) accesses to go out of range.
Note: this is not yet enabled for SystemZXPLINKFrameLowering, but should be.
Review: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D115690
Since its introduction, the qrdmlah has been represented as a qrdmulh
and a sadd_sat. This doesn't produce the same result for all input
values though. This patch fixes that by introducing a qrdmlah (and
qrdmlsh) intrinsic specifically for the vqrdmlah and sqrdmlah
instructions. The old test cases will now produce a qrdmulh and sqadd,
as expected.
Fixes #53120, #50905 and #51761.
Differential Revision: https://reviews.llvm.org/D117592
Rejecting AGPR DS_WRITE instructions before adding them to any mergeable
list seems cleaner than adding them to the list and rejecting them
later.
Differential Revision: https://reviews.llvm.org/D118368
Using separate lists for AGPR and non-AGPR instructions seems like a
cleaner solution than putting them all in the same list and then later
refusing to merge instructions of different AGPR-ness.
Differential Revision: https://reviews.llvm.org/D118367
Change CombineInfo::setMI to take a reference to the
SILoadStoreOptimizer instance, for easy access to common fields like
TII and STM.
Differential Revision: https://reviews.llvm.org/D118366
This adds the following changes:
* Fold: vselect(<all active predicate>, x, y) => x
* Extend isAllActivePredicate to take vscale_range into account, e.g.
isAllActivePredicate(vl16) for nxv16i1 and vscale == 1 => true.
isAllActivePredicate(vl32) for nxv16i1 and vscale == 2 => true.
Differential Revision: https://reviews.llvm.org/D118147
InstCombine performs this more generally with SimplifyUsingDistributiveLaws, but we don't need anything that complex here - this is mainly to fix up cases where logic ops get created late during lowering, often in conjunction with sext/zext ops for type legalization.
https://alive2.llvm.org/ce/z/gGpY5v
Swizzled accesses are not merged, but there is no particular reason not
to merge two instructions if any of the intervening instructions happens
to be a swizzled access.
This moves the check for swizzled accesses out of checkAndPrepareMerge
into collectMergeableInsts where I think it makes more sense.
Differential Revision: https://reviews.llvm.org/D118267
The ISel patterns for PFALSE help recognise the instructions as being
free of side-effects, which helps MachineCSE remove redundant
PFALSE instructions.
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D118054
SILoadStoreOptimizer::collectMergeableInsts already ends the current
block if it sees a volatile (or ordered) memory access, so there is no
need to check for them again when scanning the instructions between two
pairing candidates in a block.
Differential Revision: https://reviews.llvm.org/D118266
This patch adds support for expanding VP_MERGE through a sequence of
vector operations producing a full-length mask setting up the elements
past EVL/pivot to be false, combining this with the original mask, and
culminating in a full-length vector select.
This expansion should work for any data type, though the only use for
RVV is for boolean vectors, which themselves rely on an expansion for
the VSELECT.
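A scalar sketch of the expansion described above (the lane count, element type, and helper name are illustrative assumptions):

```
#include <cstdint>
#include <vector>

// vp.merge(Mask, OnTrue, OnFalse, EVL): lanes past EVL take OnFalse.
// The expansion builds an "index < EVL" mask, ANDs it with the original
// mask, and finishes with a full-length select.
std::vector<int32_t> vpMerge(const std::vector<bool> &Mask,
                             const std::vector<int32_t> &OnTrue,
                             const std::vector<int32_t> &OnFalse,
                             unsigned EVL) {
  std::vector<int32_t> Result(OnTrue.size());
  for (unsigned I = 0; I < OnTrue.size(); ++I) {
    bool Lane = Mask[I] && I < EVL;            // combined mask
    Result[I] = Lane ? OnTrue[I] : OnFalse[I]; // full-length select
  }
  return Result;
}
```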
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D118058
The CSKY arch has multiple FPU instruction versions, such as FPU, FPUv2 and FPUv3, to implement floating-point operations.
For now, we only support FPUv2 and FPUv3.
It includes the encoding, asm parsing of instructions and codegen of DAG nodes.
This matches what the spec uses for the vncvt.x.x.w assembly
pseudoinstruction.
Reviewed By: kito-cheng
Differential Revision: https://reviews.llvm.org/D118295
From the MAI spec: It's OK for Src_C and vDst to be the exact same VGPRs,
or for Src_C and vDst to be completely separate. The case where Src_C and vDst
overlap should be avoided, as a new value could be written to the accumulator
input before it gets read.
Note that this inevitably increases register pressure to the point where
some programs will become uncompilable.
This patch separates MAC and FMA versions of MFMA instructions using either
tied dst and src2 or earlyclobber dst.
Fixes: SWDEV-318900
Differential Revision: https://reviews.llvm.org/D117844
Most importantly, fixes constant bus errors in the 64-bit cases. It's
surprising to me these were even passing the selection test using
SReg_* sources. Also fixes pattern matching in the 32-bit cases, with
simple operands.
These patterns aren't working in a few cases, like with mixed SGPR
inputs. The patterns aren't looking through the SGPR->VGPR copies like
they need to. The vector cases also have some unmerges of build_vector
which are obscuring the inputs.
For Zba/Zbb/Zbc/Zbs I've removed the 'B' completely and used the
extension names as presented at the start of Chapter 1 of the
1.0.0 Bitmanipulation spec.
For the unratified extensions, I've replaced 'B' with 'Zb' and
otherwise left them unchanged.
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D117822
This reverts commit ef82063207.
- It conflicts with the existing llvm::size in STLExtras, which will now
never be called.
- Calling it without llvm:: breaks C++17 compat
In commit 1674d9b6b2, I fixed the bug where we didn't consider
both words of the result of the comparison. However, the logic
needs to be different for eq and ne.
Namely, for eq we need both words of the doubleword to be equal, so it
is an AND. OTOH, for ne we need either word to be unequal, so it
is an OR.
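In scalar terms, the required combination is the following (an illustration of the logic, not the actual PPC lowering code):

```
#include <cstdint>

// eq: both 32-bit words must match, so AND the per-word results.
bool eq64(uint32_t ALo, uint32_t AHi, uint32_t BLo, uint32_t BHi) {
  return (ALo == BLo) && (AHi == BHi);
}

// ne: either word differing is enough, so OR the per-word results.
bool ne64(uint32_t ALo, uint32_t AHi, uint32_t BLo, uint32_t BHi) {
  return (ALo != BLo) || (AHi != BHi);
}
```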
AMDGPUHSAMetadataStreamer currently assumes that pointer arguments
without align attribute have ABI alignment of the pointee type.
This is incompatible with opaque pointers, but also plain incorrect:
Pointer arguments without explicit alignment have alignment 1. It is
the responsibility of the frontend to add correct align annotations.
Differential Revision: https://reviews.llvm.org/D118229
We have some bitcasts which we know will be simplified,
so their cost is zero.
Reviewed By: david-arm, sdesmalen
Differential Revision: https://reviews.llvm.org/D118019
Packed-mode broadcast of f32/i32 requires the subregister to be
replicated to the full I64 register first. Add repl_i32 and repl_f32 to
facilitate this.
Reviewed By: kaz7
Differential Revision: https://reviews.llvm.org/D117878
Currently the not (xor_one_use) pattern is always selected to S_XNOR irrespective of the node divergence.
This relies on a further custom selection pass which converts to VALU if necessary and replaces it with V_NOT_B32 (V_XOR_B32)
on those targets which have no V_XNOR.
This change enables the patterns which explicitly select the not (xor_one_use) to the appropriate form.
We assume that xor (not) has already been turned into not (xor) by the combiner.
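The identity behind that canonicalization, written out as a scalar sketch:

```
#include <cstdint>

// xor with a negated operand is the same as negating the xor, which is
// what the not (xor_one_use) pattern (and ultimately XNOR) matches.
uint32_t xnorFromXorNot(uint32_t A, uint32_t B) { return (~A) ^ B; }
uint32_t xnorFromNotXor(uint32_t A, uint32_t B) { return ~(A ^ B); }
```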
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D116270
NOTE: Only considers i64 based vectors at this time because smaller
element types require extra isel operand parsing.
Differential Revision: https://reviews.llvm.org/D118040
When wider vectors are used, for example with fixed-width SVE,
there are no patterns to select AArch64ISD::LD1LANEpost
nodes, so we should do an early exit.
Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D117674
In the Zve* extensions, the vlen could be 64. This patch changes the lower bound of the vlen constraint to 64.
Differential Revision: https://reviews.llvm.org/D118217
According to GNU as documentation, PowerPC supports some .gnu_attribute
tags to represent the vector and float ABI type in the object file.
Some linkers like GNU ld respect the attribute and will prevent objects
with conflicting ABIs from being linked.
This patch emits the gnu_attribute value in the assembly printer according to
the float-abi metadata. More attributes for soft-fp, hard single/double
and even vector ABI need to be supported in the future.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D117193