llvm-project


Author	SHA1	Message	Date
Liu, Chen3	57e8f840b6	[X86][FP16] Fix a bug when Combine the FADD(A, FMA(B, C, 0)) to FMA(B, C, A). This bug was introduced by D109953. The operand order of generated FMA is wrong. Differential Revision: https://reviews.llvm.org/D110606	2021-09-28 11:38:53 +08:00
Jozef Lawrynowicz	6cfb4d46ba	[llvm-readobj] Support dumping of MSP430 ELF attributes The MSP430 ABI supports build attributes for specifying the ISA, code model, data model and enum size in ELF object files. Differential Revision: https://reviews.llvm.org/D107969	2021-09-28 00:56:11 +03:00
Roman Lebedev	2a7a768dad	[X86][Costmodel] Load/store i16 Stride=4 VF=32 interleaving costs The only sched models that for cpu's that support avx2 but not avx512 are: haswell, broadwell, skylake, zen1-3 For this tuple, measuring becomes problematic since there's a lot of spilling going on, but apparently all these memory ops do not affect worst-case estimate at all here. For load we have: https://godbolt.org/z/zP4hd8MT6 - for intels `Block RThroughput: =150.0`; for ryzens, `Block RThroughput: <=59` So pick cost of `150`. For store we have: https://godbolt.org/z/vKb8zTK8E - for intels `Block RThroughput: =32.0`; for ryzens, `Block RThroughput: <=24.0` So pick cost of `64`. I'm directly using the shuffling asm the llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D110548	2021-09-27 22:20:01 +03:00
Roman Lebedev	ee5a050e2e	[X86][Costmodel] Load/store i16 Stride=4 VF=16 interleaving costs The only sched models that for cpu's that support avx2 but not avx512 are: haswell, broadwell, skylake, zen1-3 For load we have: https://godbolt.org/z/Wd9cKab83 - for intels `Block RThroughput: =75.0`; for ryzens, `Block RThroughput: <=29.5` So pick cost of `75`. (note that `# 32-byte Reload` does not affect throughput there.) For store we have: https://godbolt.org/z/Wd9cKab83 - for intels `Block RThroughput: =32.0`; for ryzens, `Block RThroughput: <=12.0` So pick cost of `32`. I'm directly using the shuffling asm the llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D110543	2021-09-27 22:20:01 +03:00
Roman Lebedev	5615d6a6dd	[X86][Costmodel] Load/store i16 Stride=4 VF=8 interleaving costs The only sched models that for cpu's that support avx2 but not avx512 are: haswell, broadwell, skylake, zen1-3 For load we have: https://godbolt.org/z/dd8T5P471 - for intels `Block RThroughput: =33.0`; for ryzens, `Block RThroughput: <=14.5` So pick cost of `33`. For store we have: https://godbolt.org/z/zPxcKWhn4 - for intels `Block RThroughput: =10.0`; for ryzens, `Block RThroughput: <=6.0` So pick cost of `10`. I'm directly using the shuffling asm the llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D110541	2021-09-27 22:20:01 +03:00
Roman Lebedev	df2b42d12e	[X86][Costmodel] Load/store i16 Stride=4 VF=4 interleaving costs The only sched models that for cpu's that support avx2 but not avx512 are: haswell, broadwell, skylake, zen1-3 For load we have: https://godbolt.org/z/rnsf639Wh - for intels `Block RThroughput: =17.0`; for ryzens, `Block RThroughput: <=7.5` So pick cost of `17`. For store we have: https://godbolt.org/z/565KKrcY6 - for intels `Block RThroughput: =6.0`; for ryzens, `Block RThroughput: =2.0` So pick cost of `6`. I'm directly using the shuffling asm the llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D110537	2021-09-27 22:20:01 +03:00
Roman Lebedev	45caac91c4	[X86][Costmodel] Load/store i16 Stride=4 VF=2 interleaving costs The only sched models that for cpu's that support avx2 but not avx512 are: haswell, broadwell, skylake, zen1-3 For load we have: https://godbolt.org/z/5EYc6r9nh - for intels `Block RThroughput: =6.0`; for ryzens, `Block RThroughput: <=3.0` So pick cost of `6`. For store we have: https://godbolt.org/z/z61e5d6GE - for intels `Block RThroughput: =2.0`; for ryzens, `Block RThroughput: <=1.0` So pick cost of `2`. I'm directly using the shuffling asm the llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D110536	2021-09-27 22:20:01 +03:00
Praveen Velliengiri	e90b512c4d	[AMDGPU] Change ASAN init/fini kernels linkage to external. HSA runtime fails to find the symbols for Init and Fini kernels as they mark with internal linkage, changing the linkage to external to fix those errors. Differential Revision: https://reviews.llvm.org/D110054	2021-09-27 11:50:37 -06:00
Quinn Pham	682e15f371	[PowerPC] Fix td pattern for P10 VSLDBI and VSRDBI This patch fixes the pattern for the P10 instructions Vector Shift Left Double by Bit Immediate VN-form and Vector Shift Right Double by Bit Immediate VN-form. The third argument should be a target constant (`timm`) instead of an `i32` because an immediate is expected. Reviewed By: lei Differential Revision: https://reviews.llvm.org/D109920	2021-09-27 12:36:18 -05:00
Craig Topper	a2a07e8db3	[RISCV] Fold store of vmv.x.s to a vse with VL=1. This can avoid a loss of decoupling with the scalar unit on cores with decoupled scalar and vector units. We should support FP too, but those use extract_element and not a custom ISD node so it is a little different. I also left a FIXME in the test for i64 extract and store on RV32. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D109482	2021-09-27 09:54:46 -07:00
Craig Topper	933182e948	[RISCV] Improve support for forming widening multiplies when one input is a scalar splat. If one input of a fixed vector multiply is a sign/zero extend and the other operand is a splat of a scalar, we can use a widening multiply if the scalar value has sufficient sign/zero bits. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D110028	2021-09-27 09:37:07 -07:00
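A minimal Python sketch of the legality condition in the commit above. It assumes the signed (sext) case with i8 lanes widened to i16, and the function names are illustrative rather than LLVM or RVV APIs. Multiplying sign-extended lanes by a splat gives the same result as a widening multiply (in the style of `vwmul.vx`) of the narrow lanes by the truncated scalar, provided the scalar has enough sign bits to survive truncation:

```python
def to_signed(value: int, bits: int) -> int:
    """Reduce value modulo 2**bits and reinterpret it as a signed integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def fits_signed(value: int, bits: int) -> bool:
    """True if value survives truncation to `bits` and sign-extension back."""
    return to_signed(value, bits) == value


def mul_sext_splat(lanes, scalar, narrow_bits=8):
    """Reference IR semantics: sext each narrow lane to 2*narrow_bits and
    multiply by a splat of `scalar`, wrapping at the wide width."""
    wide_bits = 2 * narrow_bits
    return [to_signed(a * scalar, wide_bits) for a in lanes]


def widening_mul_splat(lanes, scalar, narrow_bits=8):
    """Hypothetical widening multiply: the scalar is truncated to the narrow
    width, then each product is formed exactly at 2*narrow_bits."""
    narrow_scalar = to_signed(scalar, narrow_bits)
    return [a * narrow_scalar for a in lanes]
```

The check matters for a scalar such as 200 with i8 lanes: truncating it to i8 gives -56, so the two functions disagree and the widening form must not be used.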
Kazu Hirata	b68a62b3a9	[Lanai] Remove redundant declaration getTheLanaiTarget (NFC) Note that getTheLanaiTarget is declared in TargetInfo/LanaiTargetInfo.h, which LanaiDisassembler.cpp includes. Identified with readability-redundant-declaration.	2021-09-27 08:58:27 -07:00
Sebastian Neubauer	bf980930e5	[AMDGPU] Ignore KILLs when forming clauses KILL instructions are sometimes present and prevented hard clauses from being formed. Fix this by ignoring all meta instructions in clauses. Differential Revision: https://reviews.llvm.org/D106042	2021-09-27 16:33:52 +02:00
Roman Lebedev	7424deb743	[X86][Costmodel] Load/store i16 Stride=2 VF=32 interleaving costs The only sched models that for cpu's that support avx2 but not avx512 are: haswell, broadwell, skylake, zen1-3 For load we have: https://godbolt.org/z/q6GbK89br - for intels `Block RThroughput: =18.0`; for ryzens, `Block RThroughput: <=7.0` So pick cost of `18`. For store we have: https://godbolt.org/z/Yzfoo5TnW - for intels `Block RThroughput: =8.0`; for ryzens, `Block RThroughput: <=4.0` So pick cost of `8`. I'm directly using the shuffling asm the llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D110507	2021-09-27 14:21:12 +03:00
Roman Lebedev	a5113e9445	[X86][Costmodel] Load/store i16 Stride=2 VF=16 interleaving costs The only sched models that for cpu's that support avx2 but not avx512 are: haswell, broadwell, skylake, zen1-3 For load we have: https://godbolt.org/z/Y1E7qnjz8 - for intels `Block RThroughput: =9.0`; for ryzens, `Block RThroughput: <=3.5` So pick cost of `9`. For store we have: https://godbolt.org/z/Y1E7qnjz8 - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: <=2.0` So pick cost of `4`. I'm directly using the shuffling asm the llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D110506	2021-09-27 14:20:11 +03:00
Roman Lebedev	70c90cc5bd	[X86][Costmodel] Load/store i16 Stride=2 VF=8 interleaving costs The only sched models that for cpu's that support avx2 but not avx512 are: haswell, broadwell, skylake, zen1-3 For load we have: https://godbolt.org/z/e5YE99a4P - for intels `Block RThroughput: =6.0`; for ryzens, `Block RThroughput: =2.0` So pick cost of `6`. For store we have: https://godbolt.org/z/3vM4KsE1n - for intels `Block RThroughput: =3.0`; for ryzens, `Block RThroughput: <=2.0` So pick cost of `3`. I'm directly using the shuffling asm the llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D110505	2021-09-27 14:18:29 +03:00
Roman Lebedev	49e532aa52	[X86][Costmodel] Load/store i16 Stride=2 VF=4 interleaving costs The only sched models that for cpu's that support avx2 but not avx512 are: haswell, broadwell, skylake, zen1-3 For load we have: https://godbolt.org/z/1j3nf3dro - for intels `Block RThroughput: =2.0`; for ryzens, `Block RThroughput: <=1.0` So pick cost of `2`. For store we have: https://godbolt.org/z/4n1zvP37j - for intels `Block RThroughput: =1.0`; for ryzens, `Block RThroughput: <=0.5` So pick cost of `1`. I'm directly using the shuffling asm the llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D110504	2021-09-27 14:15:25 +03:00
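The cost selection in the four i16 Stride=2 messages above appears to take the worst (largest) reported Block RThroughput across the Intel and Ryzen sched models. The sketch below reproduces those four picks from the quoted numbers. Rounding up is an assumption (every Intel value here is already whole), and the rule is only inferred: for example, the Stride=4 VF=32 store message quotes `=32.0` yet picks `64`.

```python
import math

# (VF, Intel Block RThroughput, Ryzen Block RThroughput) as quoted in the
# i16 Stride=2 commit messages; most Ryzen values are "<=" upper bounds.
STRIDE2_LOAD = [(4, 2.0, 1.0), (8, 6.0, 2.0), (16, 9.0, 3.5), (32, 18.0, 7.0)]
STRIDE2_STORE = [(4, 1.0, 0.5), (8, 3.0, 2.0), (16, 4.0, 2.0), (32, 8.0, 4.0)]


def worst_case_cost(*throughputs: float) -> int:
    """Pick the slowest (largest) reported throughput, rounded up (assumed)."""
    return math.ceil(max(throughputs))


def cost_table(rows):
    """Map VF -> picked cost for one load or store series."""
    return {vf: worst_case_cost(intel, ryzen) for vf, intel, ryzen in rows}


print(cost_table(STRIDE2_LOAD))   # {4: 2, 8: 6, 16: 9, 32: 18}
print(cost_table(STRIDE2_STORE))  # {4: 1, 8: 3, 16: 4, 32: 8}
```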
David Green	bb2d23dcd4	[ARM] Improve detection of fallthough when aligning blocks We align non-fallthrough branches under Cortex-M at O3 to lead to fewer instruction fetches. This improves that for the block after a LE or LETP. These blocks will still have terminating branches until the LowOverheadLoops pass is run (as they are not handled by analyzeBranch, the branch is not removed until later), so canFallThrough will return false. These extra branches will eventually be removed, leaving a fallthrough, so treat them as such and don't add unnecessary alignments. Differential Revision: https://reviews.llvm.org/D107810	2021-09-27 11:21:21 +01:00
Simon Pilgrim	468ff703e1	[X86] combineVectorHADDSUB - remove the broken HOP(x,x) merging code (PR51974) This intention of this code turns out to be superfluous as we can handle this with shuffle combining, and it has a critical flaw in that it doesn't check for dependencies. Fixes PR51974	2021-09-27 10:41:22 +01:00
Fraser Cormack	d48f6df1f8	[RISCV] Create the correct mask type when lowering EXTRACT_VECTOR_ELT This particular case was creating a `VMSET_VL` using the old fixed-length type in order to pass a mask to other custom nodes operating on the scalable container type. This kind of thing wasn't caught for us; I only noticed when experimenting with odd-length vectors, where it was trying to generate an invalid `v3i1` MVT. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D110420	2021-09-27 09:43:40 +01:00
Freddy Ye	902ec6142a	[X86][ISel] Lowering FROUND(f16) and FROUNDEVEN(f16) When AVX512FP16 is enabled, FROUND(f16) cannot be dealt with TypeLegalize, and no libcall in libm is ready for fround(f16) now. FROUNDEVEN(f16) has related instruction in AVX512FP16. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D110312	2021-09-27 13:35:03 +08:00
Simon Pilgrim	c0eff50fc5	[X86][SSE] combineMulToPMADDWD - enable sext_extend_vector_inreg(vXi16) -> zext_extend_vector_inreg(vXi16) fold The plan is to allow combineMulToPMADDWD to match illegal vector types (as long as they're still pow2), which should allow us to start removing the 128-bit limit on more of the PMADDWD combines.	2021-09-26 19:37:23 +01:00
Simon Pilgrim	ed3e4917b3	[X86] Fold PACK(_EXTEND_VECTOR_INREG, UNDEF) -> _EXTEND_VECTOR_INREG For 128-bit vectors, we can remove a PACK of a EXTEND_VECTOR_INREG node and just create a smaller extension to the result/packed type.	2021-09-26 19:37:22 +01:00
Simon Pilgrim	3fe9767204	[X86] Fold ADD(VPMADDWD(X,Y),VPMADDWD(Z,W)) -> VPMADDWD(SHUFFLE(X,Z), SHUFFLE(Y,W)) Merge addition of VPMADDWD nodes if each element pair doesn't use the upper element in each pair (i.e. its zero) - we can generalize this to either element in the pair if we one day create VPMADDWD with zero lower elements. There are still a number of issues with extending/shuffling with 256/512-bit VPMADDWD nodes so this initially only works for v2i32/v4i32 cases - I'm working on removing all these limitations but there's still a bit of yak shaving to go.....	2021-09-26 18:08:29 +01:00
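The fold above can be checked with a small Python model of PMADDWD (multiply signed i16 lanes, then add each adjacent pair into one i32 lane). The sketch assumes the upper element of every pair in X and Z is zero, so only the lower products contribute; `interleave_lower` is a hypothetical stand-in for whatever shuffle the combine actually builds.

```python
def wrap_i32(v: int) -> int:
    """Wrap an integer to a signed 32-bit lane."""
    v &= 0xFFFFFFFF
    return v - (1 << 32) if v & 0x80000000 else v


def pmaddwd(a, b):
    """PMADDWD on lists of signed i16 lanes: multiply lane-wise, then add
    adjacent pairs into i32 lanes (wrapping, as the instruction does)."""
    assert len(a) == len(b) and len(a) % 2 == 0
    return [wrap_i32(a[2 * i] * b[2 * i] + a[2 * i + 1] * b[2 * i + 1])
            for i in range(len(a) // 2)]


def interleave_lower(p, q):
    """Hypothetical shuffle: pair the lower element of each i16 pair of p
    with the lower element of the matching pair of q."""
    out = []
    for i in range(0, len(p), 2):
        out += [p[i], q[i]]
    return out


def add_i32(u, v):
    """Lane-wise i32 addition with wraparound."""
    return [wrap_i32(x + y) for x, y in zip(u, v)]


# Upper element of each pair is zero in X and Z.
X = [3, 0, -7, 0, 100, 0, -32768, 0]
Y = [5, 9, 2, -4, -1, 7, -32768, 11]
Z = [-2, 0, 4, 0, 32767, 0, -32768, 0]
W = [6, -1, -3, 8, 2, 5, -32768, -9]
lhs = add_i32(pmaddwd(X, Y), pmaddwd(Z, W))
rhs = pmaddwd(interleave_lower(X, Z), interleave_lower(Y, W))
print(lhs == rhs)  # True
```

One merged PMADDWD replaces two PMADDWDs plus an add; the last lane shows that both forms wrap identically when the sum overflows i32.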
Kazu Hirata	c4ae4a745d	[RISCV] Remove redundant declaration RISCVMnemonicSpellCheck (NFC) Note that RISCVMnemonicSpellCheck is defined in RISCVGenAsmMatcher.inc, which RISCVAsmParser.cpp includes. Identified with readability-redundant-declaration.	2021-09-26 09:26:57 -07:00
Roman Lebedev	d9413f46b3	[X86][Costmodel] Load/store i16 VF=2 interleaving costs The only sched models that for cpu's that support avx2 but not avx512 are: haswell, broadwell, skylake, zen1-3 For load we have: https://godbolt.org/z/M8vEKs5jY - for intels `Block RThroughput: =2.0`; for ryzens, `Block RThroughput: <=1.0` So pick cost of `2`. For store we have: https://godbolt.org/z/Kx1nKz7je - for intels `Block RThroughput: =1.0`; for ryzens, `Block RThroughput: <=0.5` So pick cost of `1`. I'm directly using the shuffling asm the llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D103144	2021-09-26 19:13:23 +03:00
Simon Pilgrim	3538ee763d	[CostModel][X86] Improve AVX1/AVX2 v16i32->v16i16/v16i8 truncation costs (PR51972) Based off worst case btver2 (AVX1) and haswell (AVX2) llvm-mca reports	2021-09-26 13:43:46 +01:00
Simon Pilgrim	8c83bd3bd4	[CostModel][X86] Adjust vXi32 multiply costs if it can be performed using PMADDWD Update the costs to match the codegen from combineMulToPMADDWD - not only can we use PMADDWD is its zero-extended, but also if its a constant or sign-extended from a vXi16 (which can be replaced with a zero-extension).	2021-09-25 16:28:48 +01:00
Simon Pilgrim	eb7c78c2c5	[X86][SSE] combineMulToPMADDWD - mask off upper bits of sign-extended vXi32 constants If we are multiplying by a sign-extended vXi32 constant, then we can mask off the upper 16 bits to allow folding to PMADDWD and make use of its implicit sign-extension from i16	2021-09-25 15:50:45 +01:00
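A Python sketch of why masking helps. PMADDWD reads each i32 lane as two signed i16 halves, so a negative constant sign-extended to i32 has a high half of -1 that would contribute to the sum. Masking with `0xFFFF` leaves the high half zero while the low half still reads back as the original signed value. The sketch assumes the other operand is already zero-extended from i16; helper names are illustrative.

```python
def to_signed(value: int, bits: int) -> int:
    """Reduce value modulo 2**bits and reinterpret it as a signed integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def i16_halves(lane32: int):
    """(low, high) i16 halves of an i32 lane, each read as signed, which is
    how PMADDWD interprets its inputs."""
    return to_signed(lane32, 16), to_signed(lane32 >> 16, 16)


def pmaddwd_lane(x32: int, y32: int) -> int:
    """One PMADDWD result lane: lo*lo + hi*hi, wrapped to i32."""
    xl, xh = i16_halves(x32)
    yl, yh = i16_halves(y32)
    return to_signed(xl * yl + xh * yh, 32)


c = -3                          # i32 constant sign-extended from i16
print(i16_halves(c))            # (-3, -1): high half is not zero
print(i16_halves(c & 0xFFFF))   # (-3, 0): masked, low half still reads as -3
a = 1234                        # other operand, zero-extended from i16
print(pmaddwd_lane(a & 0xFFFF, c & 0xFFFF))  # -3702, i.e. a * c
```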
Simon Pilgrim	2a4fa0c27c	[X86][SSE] combineMulToPMADDWD - enable sext(v8i16) -> zext(v8i16) fold on sub-128 bit vectors	2021-09-25 15:50:45 +01:00
Kazu Hirata	44c401bdc3	[Mips] Remove redundant declarations (NFC) Note that identical declarations immediately precede what's being removed in this patch. Identified with readability-redundant-declaration.	2021-09-25 07:41:11 -07:00
Simon Pilgrim	f5a26ccae2	[X86][SSE] combineMulToPMADDWD - enable sext(v8i16) -> zext(v8i16) fold on pre-SSE41 targets We already do this on SSE41 targets where we have sext/zext instructions, now that combineShiftToPMULH handles SSE2 targets, we can enable this here as well.	2021-09-25 14:35:31 +01:00
Simon Pilgrim	4c72b10f0a	[X86] X86FastISel::fastMaterializeConstant - break if-else chain to fix llvm-else-after-return warning. NFCI All previous if-else cases return	2021-09-25 14:31:14 +01:00
Simon Pilgrim	a25f25c3b7	[X86] combineShiftToPMULH - relax from ISA from SSE41 to SSE2 With improved shuffle combines (in particular canonicalizeShuffleWithBinOps), we can now usefully perform this on any SSE2+ target. We should be able to remove this entirely and just use DAGCombiner's combineShiftToMULH if we can someday get it to support illegal (pre-widened) types.	2021-09-25 14:08:03 +01:00
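The identity behind combineShiftToPMULH can be sketched in Python: truncating the high half of a widened 16x16 multiply is exactly what SSE2's PMULHW (signed) and PMULHUW (unsigned) compute. This only models the arithmetic, not the DAG matching, and the function names are illustrative.

```python
def to_signed(value: int, bits: int) -> int:
    """Reduce value modulo 2**bits and reinterpret it as a signed integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def pmulhw_lane(a: int, b: int) -> int:
    """PMULHW model: bits [16, 32) of the signed 32-bit product."""
    product = to_signed(a * b, 32)
    return to_signed((product & 0xFFFFFFFF) >> 16, 16)


def trunc_sra_mul_sext(a: int, b: int) -> int:
    """IR pattern: trunc(sra(mul(sext a, sext b), 16)) to i16."""
    wide = to_signed(to_signed(a, 16) * to_signed(b, 16), 32)
    return to_signed(wide >> 16, 16)  # Python >> on ints is arithmetic


def pmulhuw_lane(a: int, b: int) -> int:
    """PMULHUW model: high 16 bits of the unsigned 32-bit product."""
    return ((a & 0xFFFF) * (b & 0xFFFF)) >> 16


def trunc_srl_mul_zext(a: int, b: int) -> int:
    """IR pattern: trunc(srl(mul(zext a, zext b), 16)) to i16."""
    wide = ((a & 0xFFFF) * (b & 0xFFFF)) & 0xFFFFFFFF
    return (wide >> 16) & 0xFFFF
```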
David Green	883758ed48	[ARM] Fix Arm block placement creating branches after jump tables. Given: - A jump table - Which jumps to the next block - The next block ends in a WLS - Where the WLS conditionally jumps to block earlier in the program. The Arm block placement pass would attempt to move the block containing the WLS earlier, as the WLS instruction can only branch forward. In doing so it would add a branch from the jumptable block to the WLS block, thinking it previously fell-through. This in itself would be fine, if a little inefficient, but the constant island pass expects all instructions after a jump-table branch to have been removed by analyzeBranch. So it gets confused and can assign the same labels to multiple jump table blocks. I've changed the condition to the same as used in analyzeBranch.	2021-09-25 11:32:25 +01:00
Jim Lin	ed687c0211	[RISCV] Fix incorrect operand type of inst alias for InstR4 Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D110381	2021-09-25 11:25:12 +08:00
Craig Topper	715cf6ffb9	[RISCV] Add another isel optimization for (and (shl X, c2), c1). Where c1 is a shifted mask with 32-c2 leading zeros and c3 trailing zeros and c3>c2. We can select it as (slli (srliw X, c3-c2), c3).	2021-09-24 15:10:25 -07:00
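The rewrite above can be checked exhaustively over the shift amounts with a Python model of the RV64 operations. SRLIW shifts the low 32 bits right logically and sign-extends the 32-bit result; because `c3 - c2 >= 1` here, bit 31 of that result is always zero, so the sign-extension never fires. The sketch assumes `0 <= c2 < 32` and `c2 < c3 < 32 + c2` (a non-empty mask); helper names are illustrative.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def sext32(v: int) -> int:
    """Sign-extend a 32-bit value to a 64-bit register image."""
    v &= MASK32
    return (v - (1 << 32)) & MASK64 if v & (1 << 31) else v


def slli(x: int, sh: int) -> int:
    return (x << sh) & MASK64


def srliw(x: int, sh: int) -> int:
    """RV64I SRLIW: logical right shift of the low 32 bits, then sign-extend."""
    return sext32((x & MASK32) >> sh)


def shifted_mask(c2: int, c3: int) -> int:
    """c1 with 32-c2 leading zeros and c3 trailing zeros: ones in [c3, 32+c2)."""
    return ((1 << (32 + c2)) - 1) & ~((1 << c3) - 1)


def original(x: int, c2: int, c1: int) -> int:
    """(and (shl X, c2), c1)"""
    return slli(x, c2) & c1


def optimized(x: int, c2: int, c3: int) -> int:
    """(slli (srliw X, c3-c2), c3)"""
    return slli(srliw(x, c3 - c2), c3)
```

Both forms move bits `[c3-c2, 32)` of X to positions `[c3, 32+c2)` and clear everything else.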
Stanislav Mekhanoshin	cf74ef134c	[AMDGPU] Limit promote alloca max size in functions Non-entry functions have 32 caller saved VGPRs available. If we promote alloca to consume more registers we will have to spill CSRs. There is no reason to eliminate scratch access to get another scratch access instead. Differential Revision: https://reviews.llvm.org/D110372	2021-09-24 13:38:39 -07:00
Anirudh Prasad	a9ae2436fc	[SystemZ][z/OS] Introduce the GOFFMCAsmInfo Interface for z/OS - This patch adds in the GOFFMCAsmInfo interfaces for the z/OS target. - This patch decouples the previously existing SystemZMCAsmInfo interface for the ELF target and the z/OS target. - This patch also removes a small test in the SystemZAsmLexerTest.cpp. The reason for this is because, the test is set up for the s390x-ibm-linux (SystemZ ELF triple), and the test checks a function which is overridden only for the z/OS target. The reason we can't change the test to use a z/OS triple outright is because there is still missing support which prevents the successful running of a test (assert in AsmParser.cpp due to missing GOFFAsmParser support) Reviewed By: uweigand, abhina.sreeskantharajan Differential Revision: https://reviews.llvm.org/D110077	2021-09-24 16:25:41 -04:00
Anirudh Prasad	ebe06910ce	[NFC] Replace hard-coded usages of SystemZ::R15D with SpecialRegisters API This patch changes hard-coded usages of SystemZ::R15D with calls to the getStackPointerRegister function. Uses in the LowerCall function are avoided to avoid merge conflicts with an expected upcoming patch. Reviewed By: uweigand Differential Revision: https://reviews.llvm.org/D109702	2021-09-24 15:20:57 -04:00
Anirudh Prasad	e09a1dc475	[SystemZ][z/OS] Add GOFF Support to the DataLayout - This patch adds in the GOFF mangling support to the LLVM data layout string. A corresponding additional line has been added into the data layout section in the language reference documentation. - Furthermore, this patch also sets the right data layout string for the z/OS target in the SystemZ backend. Reviewed By: uweigand, Kai, abhina.sreeskantharajan, MaskRay Differential Revision: https://reviews.llvm.org/D109362	2021-09-24 14:09:01 -04:00
Victor Huang	6e1aaf18af	[PowerPC] Mark splat immediate instructions as rematerializable This patch marks splat immediate instructions XXSPLTIW and XXSPLTIDP as rematerializable to prevent MachineLICM from moving them out of loops. Reviewed By: lei, amy Differential revision: https://reviews.llvm.org/D108823	2021-09-24 12:03:34 -05:00
Stanislav Mekhanoshin	082e22f3d7	[AMDGPU] Always reserve flat scratch SGPR for architected flat scratch With architected flat scratch it becomes readonly. We must always reserve SGPR pair for it even if we do not use scratch at all since an attempt to write to SGPRs mapped to FLAT_SCRATCH results in memory violation. This is not needed since GFX10 with architected flat scratch though since special SGPRs are not carving space from normal SGPRs. Differential Revision: https://reviews.llvm.org/D110376	2021-09-24 09:46:31 -07:00
Simon Pilgrim	d8fc9f8727	[X86][SSE] combineMulToPMADDWD - replace sext(v8i16) -> zext(v8i16) As suggested on D108522, if we're sign extending a v4i16 source before multiplying as a v4i32, then we can replace that with a zero extension and rely on the implicit sign-extension of PMADDWD.	2021-09-24 16:42:01 +01:00
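A per-lane Python sketch of the replacement described above. Zero-extending an i16 into an i32 lane leaves its high half zero, so PMADDWD computes `a*b + 0*0` with `a` and `b` read as signed i16, which equals the v4i32 multiply of the sign-extended inputs. Feeding the sign-extended lanes to PMADDWD directly would be wrong, because their high halves are -1 for negative inputs. Helper names are illustrative.

```python
def to_signed(value: int, bits: int) -> int:
    """Reduce value modulo 2**bits and reinterpret it as a signed integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def sext16(a: int) -> int:
    """i16 -> i32 sign extension (result as a signed Python int)."""
    return to_signed(a, 16)


def zext16(a: int) -> int:
    """i16 -> i32 zero extension (upper 16 bits of the lane are zero)."""
    return a & 0xFFFF


def pmaddwd_lane(x32: int, y32: int) -> int:
    """One PMADDWD lane: split each i32 lane into signed i16 halves and
    return lo*lo + hi*hi wrapped to i32."""
    xl, xh = to_signed(x32, 16), to_signed(x32 >> 16, 16)
    yl, yh = to_signed(y32, 16), to_signed(y32 >> 16, 16)
    return to_signed(xl * yl + xh * yh, 32)


def mul_sext(a: int, b: int) -> int:
    """Reference: i32 multiply of two sign-extended i16 lanes."""
    return to_signed(sext16(a) * sext16(b), 32)


print(mul_sext(-1, -1))                          # 1
print(pmaddwd_lane(zext16(-1), zext16(-1)))      # 1: zext inputs match
print(pmaddwd_lane(sext16(-1), sext16(-1)))      # 2: sext inputs do not
```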
Sanjay Patel	09e71c367a	[x86] convert logic-of-FP-compares to FP logic-of-vector-compares This is motivated by the examples and discussion in: https://llvm.org/PR51245 ...and related bugs. By using vector compares and vector logic, we can convert 2 'set' instructions into 1 'movd' or 'movmsk' and generally improve throughput/reduce instructions. Unfortunately, we don't have a complete vector compare ISA before AVX, so I left SSE-only out of this patch. Ie, we'd need extra logic ops to simulate the missing predicates for SSE 'cmpp*', so it's not as clearly a win. Differential Revision: https://reviews.llvm.org/D110342	2021-09-24 11:38:19 -04:00
Hsiangkai Wang	7d39a8a921	[RISCV] (1/2) Add the tail policy argument to builtins/intrinsics. Add the tail policy argument to LLVM IR intrinsics. There are two policies for tail elements. Tail agnostic means users do not care about the values in the tail elements and tail undisturbed means the values in the tail elements need to be kept after the operation. In order to let users control the tail policy, we add an additional argument at the end of the argument list. For unmasked operations, we have no maskedoff and the tail policy is always tail agnostic. If users want to keep tail elements under unmasked operations, they could use all one mask in the masked operations to do it. So, we only add the additional argument for masked operations for most cases. There are exceptions listed below. In this patch, we do not handle the following cases to reduce the complexity of the patch. There could be two separate patches for them. * Use dest argument to control tail policy vmerge.vvm/vmerge.vxm/vmerge.vim (add _t builtins with additional dest argument) vfmerge.vfm (add _t builtins with additional dest argument) vmv.v.v (add _t builtins with additional dest argument) vmv.v.x (add _t builtins with additional dest argument) vmv.v.i (add _t builtins with additional dest argument) vfmv.v.f (add _t builtins with additional dest argument) vadc.vvm/vadc.vxm/vadc.vim (add _t builtins with additional dest argument) vsbc.vvm/vsbc.vxm (add _t builtins with additional dest argument) * Always has tail argument for masked/unmasked intrinsics Vector Single-Width Integer Multiply-Add Instructions (add _t and _mt builtins) Vector Widening Integer Multiply-Add Instructions (add _t and _mt builtins) Vector Single-Width Floating-Point Fused Multiply-Add Instructions (add _t and _mt builtins) Vector Widening Floating-Point Fused Multiply-Add Instructions (add _t and _mt builtins) Vector Reduction Operations (add _t and _mt builtins) Vector Slideup Instructions (add _t and _mt builtins) Vector Slidedown Instructions (add _t and _mt builtins) Discussion: https://github.com/riscv/rvv-intrinsic-doc/pull/101 Differential Revision: https://reviews.llvm.org/D105092	2021-09-24 17:09:50 +08:00
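A behavioural Python sketch of the two tail policies described in the commit above, for a masked add. It is a model only: the function name, SEW=8, and the choice to keep inactive body lanes from `maskedoff` (mask-undisturbed) are assumptions, and it says nothing about how the policy operand is encoded in the intrinsics. The RVV specification lets a tail-agnostic implementation either leave tail elements unchanged or overwrite them with all 1s; the sketch writes all 1s so the difference is visible.

```python
def masked_vadd_e8(maskedoff, a, b, mask, vl, tail_agnostic):
    """Hypothetical model of a masked vadd.vv with SEW=8.

    VLMAX is len(maskedoff). Body lanes [0, vl): active lanes get a+b
    (mod 256), inactive lanes keep maskedoff. Tail lanes [vl, VLMAX): kept
    from maskedoff when tail-undisturbed; when tail-agnostic this sketch
    writes all-ones (keeping them would also be allowed).
    """
    out = list(maskedoff)
    for i in range(vl):
        if mask[i]:
            out[i] = (a[i] + b[i]) & 0xFF
    if tail_agnostic:
        for i in range(vl, len(out)):
            out[i] = 0xFF  # all-ones for an 8-bit element
    return out


maskedoff = [9] * 8
a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [10] * 8
mask = [1, 0, 1, 0, 1, 0, 1, 0]
print(masked_vadd_e8(maskedoff, a, b, mask, vl=5, tail_agnostic=False))
# [11, 9, 13, 9, 15, 9, 9, 9]
print(masked_vadd_e8(maskedoff, a, b, mask, vl=5, tail_agnostic=True))
# [11, 9, 13, 9, 15, 255, 255, 255]
```

Under tail-undisturbed the caller's tail values survive, which is why the policy argument matters when the destination is reused.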
Simon Pilgrim	dade83c02a	[X86][SLM] Fix ADDQ/SUBQ/CMPEQQ throughput to account for running on either port. Testing on a SLM box suggests these can run on either port, but the throughput is 4cy on either (inc MMX versions). Confirmed with Intel AoM / Agner / InstLatX64.	2021-09-24 10:06:14 +01:00
Jonas Paulsson	ea92283449	[SystemZ] Implement ISD::BITCAST for fp128 -> i128. The type legalizer has by default no method of doing this bitcast other than storing and reloading the value from stack. This patch implements a custom lowering of this operation using extractions of subregs (z13 and earlier using FP128 register pairs), or of vector elements (with 'vector enhancements 1' using VR128 FP registers). Review: Ulrich Weigand Differential Revision: https://reviews.llvm.org/D110346	2021-09-24 10:26:45 +02:00
Amara Emerson	661ab70314	[AArch64][GlobalISel] Fix crash in the extend(extract_vector_elt) optimization. It was assuming that GPR extends could only have destination sizes of 32 or 64 bits, but for AArch64 we allow < 32 bits even without matching size physregs.	2021-09-23 23:07:16 -07:00
Christudasan Devadasan	7a62a5b56d	[AMDGPU] Legalize initialized LDS variables We don't allow an initializer for LDS variables and there is an early abort during instruction selection. This patch legalizes them by ignoring the init values. During assembly emission, proper error reporting already exists for such instances. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D109901	2021-09-23 22:53:20 -04:00


64315 Commits