llvm-project

Commit Graph

Author	SHA1	Message	Date
Fangrui Song	8805e5d1b7	[Hexagon] Fix -Wunused-variable in non-assertion builds after `f6e7ad5604`	2022-09-21 14:14:45 -07:00
Jay Foad	5c7ee894f8	AMDGPU: Stop validating earlyclobber operands in assembler This validation was introduced in D34003 for v_qsad/v_mqsad instructions but it applies to all instructions with earlyclobber operands, which now includes v_mad_i64/v_mad_u64. In all these cases I do not think there is documentation saying that the destination must not overlap the sources. Rather there are some cases where the instruction may not function correctly if there is an overlap, and we are using earlyclobber as a conservative way of preventing codegen from generating those cases. I think it is unhelpful for the assembler to enforce the earlyclobber restriction because it prevents assembling cases where the programmer knows that in fact the overlap is safe. See also: https://github.com/llvm/llvm-project/issues/57610 Differential Revision: https://reviews.llvm.org/D134272	2022-09-21 21:46:59 +01:00
Scott Linder	552539bdac	Revert "[NFC][AMDGPU] Refactor AMDGPUDisassembler" This reverts commit `f583151461`.	2022-09-21 18:48:42 +00:00
Krzysztof Parzyszek	f6e7ad5604	[Hexagon] Revamp type legalization of ext/trunc/sat in HVX Resizing operations (e.g. sign extension) in DAG can go from any width to any other width, e.g. i8 -> i32. If the input and the result differ by a factor larger than 2, the operation cannot be legal in HVX, since the only two legal vector sizes in HVX are a single vector and a pair of vectors. To simplify the legalization, such operations are expanded into steps that only double/halve the type size, so that each such step can be fully legalized on its own. The complication is that DAG will automatically fold these steps back into one, e.g. sext(sext) -> sext. To prevent that new HexagonISD nodes are introduced: TL_EXTEND and TL_TRUNCATE. Once legalized, these nodes are replaced with the original opcodes. The type legalization is now common to aext/sext/zext/trunc and Hexagon- specific ssat/usat nodes.	2022-09-21 11:25:27 -07:00
Florian Hahn	ac434afed8	[AArch64] Try to fold shuffle (tbl2, tbl2) to tbl4. shuffle (tbl2, tbl2) can be folded into a single tbl4 if the mask for the selected elements is constant. Reviewed By: t.p.northover Differential Revision: https://reviews.llvm.org/D133491	2022-09-21 19:15:56 +01:00
Simon Pilgrim	839ba13c3e	[CostModel][X86] Add vbmi2 costs for funnelshift/rotate intrinsics Add costs for the funnel shift instructions - fixes some discrepancies I was hitting with costs numbers from the 'cost-tables vs llvm-mca' script D103695	2022-09-21 13:48:22 +01:00
Kazushi (Jam) Marukawa	eaa263485d	[VE] Remove obsolete ANDrm patterns Remove obsolete ANDrm patterns for MIMM operands. We added these translations to optimize commonly used cast operations before we supported MIMM operands directly by each instruction. Such translations are obsolete now. Reviewed By: efocht Differential Revision: https://reviews.llvm.org/D134341	2022-09-21 19:23:34 +09:00
David Green	4f78e022ee	[AArch64] Lower scalar sqxtn intrinsics to use fp registers The llvm.aarch64.neon.scalar.sqxtn.i32.i64 intrinsics take and return integer types, but operate on fp registers. This can create some inefficiencies in their lowering, where the registers are converted to fp a little too late. This patch adds lowering for the intrinsics, creating bitcasts to/from fp types to allow nicer folding later when the instructions are selected, especially around insert/extracts. Differential Revision: https://reviews.llvm.org/D134024	2022-09-21 10:46:43 +01:00
Kazushi (Jam) Marukawa	021d05a1ab	[VE][NFC] Change to use l2i/i2l to simplify code We previously added l2i/i2l macros to simplify EXTRACT_SUBREG/INSERT_SUBREG conversions. This patch changes VEInstrInfo.td to use such macros to simplify existing code. Reviewed By: efocht Differential Revision: https://reviews.llvm.org/D134118	2022-09-21 18:04:29 +09:00
Kazushi (Jam) Marukawa	337e54ec95	[VE] Add maxnum and minnum Add maxnum and minnum for float and double. Lowering is already implemented, so this patch changes them to legal and adds regression tests. Reviewed By: efocht Differential Revision: https://reviews.llvm.org/D134108	2022-09-21 18:03:49 +09:00
Kazushi (Jam) Marukawa	3ee64ea5cf	[VE] Change to expand FMA VE has fused multiply-add instructions only for vector calculations. This patch forces scalar FMA to be expanded into multiply and add instructions. This patch also adds a regression test. Reviewed By: efocht Differential Revision: https://reviews.llvm.org/D134107	2022-09-21 18:02:55 +09:00
David Green	9a20596f48	[AArch64] Insert/Extract of bitcast patterns This adds some quick tablegen patterns for vector_insert(bitcast(..)) and bitcast(vector_extract(..)), allowing us to avoid a round-trip through GPRs. Differential Revision: https://reviews.llvm.org/D134022	2022-09-21 09:54:17 +01:00
David Sherwood	64bef3d568	[AArch64][SME] Disable inlining when SME attributes require smstart/smstop or lazy-save. Inlining must be disabled when the call-site needs to toggle PSTATE.SM or when the callee's function body is executed in a different streaming mode than its caller. This is needed because function calls are the boundaries for streaming mode changes. More details about the SME attributes and design can be found in D131562. Differential Revision: https://reviews.llvm.org/D131581	2022-09-21 09:35:47 +01:00
Craig Topper	70a64fe7b1	[RISCV] Remove support for the unratified Zbt extension. This extension does not appear to be on its way to ratification. Out of the unratified bitmanip extensions, this one had the largest impact on the compiler. Posting this patch to start a discussion about whether we should remove these extensions. We'll talk more at the RISC-V sync meeting this Thursday. Reviewed By: asb, reames Differential Revision: https://reviews.llvm.org/D133834	2022-09-20 20:26:48 -07:00
jacquesguan	1cbf44bd50	[RISCV] Support peephole optimization to fold vmerge.vvm that has tail agnostic policy and unmasked intrinsics. This patch supports the tail agnostic part of D130442. Reviewed By: fakepaper56 Differential Revision: https://reviews.llvm.org/D132923	2022-09-21 10:56:37 +08:00
Changpeng Fang	3ae4c3589e	AMDGPU: Implicit kernel arguments related optimization when uniform-workgroup-size=true Summary: Under code object version 5, ockl_get_local_size returns the value computed by the expression: workgroup_id < hidden_block_count ? hidden_group_size : hidden_remainder For functions with the attribute uniform-work-group-size=true, we can evaluate workgroup_id < hidden_block_count as true, and thus hidden_group_size is returned for ockl_get_local_size. With uniform-workgroup-size=true, this work also sets all remainders to zero, and if there is reqd_work_group_size, we also set work-group-size to the required value from the metadata. Reviewers: arsenm and bcahoon Differential Revision: https://reviews.llvm.org/D131276	2022-09-20 17:25:52 -07:00
Scott Linder	f583151461	[NFC][AMDGPU] Refactor AMDGPUDisassembler Clean up ahead of a patch to fix bugs in the AMDGPUDisassembler. Use lit.local.cfg substitutions and more idiomatic use of split-file to simplify and extend existing kernel-descriptor disassembly tests. Add a comment to AMDHSAKernelDescriptor.h, as at least one small step towards keeping all kernel-descriptor sensitive code in sync. Reviewed By: kzhuravl, arsenm Differential Revision: https://reviews.llvm.org/D130105	2022-09-20 20:37:19 +00:00
Anshil Gandhi	a0c53524a5	[AMDGPU] Fix size of SOPK instructions to 4 bytes Instructions in SOPK format may not have 32-bit literal constants following the instruction. Differential Revision: https://reviews.llvm.org/D133972	2022-09-20 14:27:09 -06:00
Matt Arsenault	28e03692ae	AMDGPU: Fix expansion of 16-bit atomicrmw Fixes issue 57830	2022-09-20 14:47:40 -04:00
Anton Sidorenko	3cd503f181	[NFC][RISCV] Move calculations of SDNode policy operand idx to a separate function Since there is no guaranteed correspondence of SDNode and MI operands, we need getters similar to RISCVII::get*OpNum for SDNodes. More uses of getVecPolicyOpIdx will be added in D130895. Reviewed By: craig.topper, arcbbb Differential Revision: https://reviews.llvm.org/D134179	2022-09-20 10:36:47 -07:00
Philip Reames	eda2af575f	[RISCV][MC] Add support for experimental Zawrs extension This implements experimental support for the Zawrs extension as specified here: https://github.com/riscv/riscv-zawrs/releases/download/V1.0-rc3/Zawrs.pdf. Despite the 1.0 version name, this has not been ratified and there was a major change to proposed specification between rc2 and rc3. Once this is ratified, it'll move out of experimental status. This change adds assembly support, but does not include C language or IR intrinsics. We can decide if we want them, and handle that in a separate patch. Differential Revision: https://reviews.llvm.org/D133443	2022-09-20 10:15:11 -07:00
Jay Foad	f19cc793d2	[AMDGPU] Disable fp atomic to s_denorm_mode hazard for GFX11 This hazard only exists on GFX10. Differential Revision: https://reviews.llvm.org/D134276	2022-09-20 17:40:49 +01:00
David Green	cb375e8c1f	[AArch64] Enable LSLFast for modern OoO cpus This patch enables the LSLFast feature for Cortex-A76, Cortex-A77, Cortex-A78, Cortex-A78C, Cortex-A710, Cortex-X1, Cortex-X2, Neoverse N1, Neoverse N2, Neoverse V1 and the Neoverse 512TVB pseudo-cpu, in-line with the software optimization guides for those CPUs. Differential Revision: https://reviews.llvm.org/D134273	2022-09-20 17:09:14 +01:00
Joe Nash	b982ba2a6e	[AMDGPU][GFX11] Use VGPR_32_Lo128 for VOP1,2,C Due to the encoding changes in GFX11, we had a hack in place that disables the use of VGPRs above 128. This patch removes the need for that hack. We introduce a new register class VGPR_32_Lo128 which is used for 16-bit operands of VOP1, VOP2, and VOPC instructions. This register class only has the low 128 VGPRs, but is otherwise identical to VGPR_32. Therefore, 16-bit VOP1, VOP2, and VOPC instructions are correctly limited to use the first 128 VGPRs, while the other instructions can freely use all 256. We introduce new pseudo-instructions used on GFX11 which have the suffix t16 (True 16) to use the VGPR_32_Lo128 register class. Reviewed By: foad, rampitec, #amdgpu Differential Revision: https://reviews.llvm.org/D133723	2022-09-20 09:56:28 -04:00
Simon Pilgrim	0015edeefd	Fix MSVC "result of 32-bit shift implicitly converted to 64 bits" warning. NFCI.	2022-09-20 14:24:07 +01:00
Caroline Concatto	d32b8fdbdb	[LLVM][AArch64] Replace aarch64.sve.ld by aarch64.sve.ldN.sret This patch removes the intrinsic aarch64.sve.ldN from tablegen in favour of using aarch64.sve.ldN.sret. Depends on: D133023 Differential Revision: https://reviews.llvm.org/D133025	2022-09-20 13:15:07 +01:00
gonglingqin	7328ff75ba	[LoongArch] Add codegen support for fmaxnum_ieee and fminnum_ieee Thanks for @xry111's previous bug fixes. See https://github.com/loongson/llvm-project/pull/1 for more details. Differential Revision: https://reviews.llvm.org/D133478	2022-09-20 19:22:32 +08:00
Simon Pilgrim	70582bc4d3	Fix MSVC "result of 32-bit shift implicitly converted to 64 bits" warnings. NFCI.	2022-09-20 10:35:32 +01:00
Serge Pavlov	181279ffcd	[X86][GlobalISel] Add support for sret demotion The change adds support for the cases when the return value is passed in memory rather than in registers. Differential Revision: https://reviews.llvm.org/D134181	2022-09-20 11:47:53 +07:00
Craig Topper	94049db913	[RISCV] Make computeIncomingVLVTYPE more conservative when merging predecessor state. If we have already calculated the incoming state before, use that as our starting point to ensure we are conservative. This fixes an infinite loop found in our downstream where we allowed two waves of updates to propagate through a loop and the merge points allowed us to toggle back and forth between states. No small reproducer right now. Differential Revision: https://reviews.llvm.org/D134229	2022-09-19 15:57:55 -07:00
Alexander Timofeev	2e8817b90a	[AMDGPU] SIFixSGPRCopies reworking to use one pass over the MIR for analysis and lowering. This change finalizes the series of patches aiming to replace the old strategy of VGPR to SGPR copy lowering. # Following the https://reviews.llvm.org/D128252 and https://reviews.llvm.org/D130367 code parts that are no longer used were removed. # The first pass over the MachineFunction collects all the necessary information. # Lowering is done in 3 phases: - VGPR to SGPR copies analysis lowering - REG_SEQUENCE, PHIs, and SGPR to VGPR copies lowering - SCC copies lowering is done in a separate pass over the Machine Function Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D131246	2022-09-19 23:31:45 +02:00
Craig Topper	0cec96ab25	[RISCV] Manage the InQueue flag in insertvli correctly. We were only setting this flag the first time we added the blocks not when we mark them for revisiting. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D134193	2022-09-19 14:28:22 -07:00
Haojian Wu	eec19987c0	Fix one more unused warning in release build, NFC	2022-09-19 20:56:39 +02:00
Haojian Wu	20822e2d42	Fix an unused warning in release build, NFC	2022-09-19 20:45:51 +02:00
Krzysztof Parzyszek	94a71361d6	[Hexagon] Implement [SU]INT_TO_FP and FP_TO_[SU]INT for HVX	2022-09-19 11:11:20 -07:00
Krzysztof Parzyszek	ec51e38062	[Hexagon] Add HVX patterns for ISD::ABS	2022-09-19 10:12:15 -07:00
Krzysztof Parzyszek	3eee45cdc8	[Hexagon] Rework SplitHvxPairOp to be a general vector splitting utility Enable creating an idiom: V -> opJoin(SplitVectorOp(V))	2022-09-19 09:42:13 -07:00
Simon Pilgrim	6b4d409f69	[CostModel][X86] Add CostKinds handling for CTLZ_ZERO_UNDEF/CTTZ_ZERO_UNDEF instructions This was achieved with the 'cost-tables vs llvm-mca' script D103695	2022-09-19 17:37:58 +01:00
Krzysztof Parzyszek	e5844462f6	[Hexagon] Use proper output chain when widening HVX loads	2022-09-19 09:04:13 -07:00
Simon Pilgrim	135c9b2c4b	[CostModel][X86] Add CostKinds handling for vector ctlz instructions This was achieved with the 'cost-tables vs llvm-mca' script D103695	2022-09-19 16:44:09 +01:00
Simon Pilgrim	2538adde5c	[CostModel][X86] Add CostKinds handling for cttz This was achieved with the 'cost-tables vs llvm-mca' script D103695	2022-09-19 15:57:03 +01:00
Simon Pilgrim	d90a42d64c	[CostModel][X86] Add CTLZ_ZERO_UNDEF/CTTZ_ZERO_UNDEF cost handling Without LZCNT/BMI, the *_ZERO_UNDEF costs are cheaper as they can avoid the zero handling.	2022-09-19 14:06:33 +01:00
David Green	908b3b6ccb	[AArch64] Use fast-math-flags in isAssociativeAndCommutative Previously only using the UnsafeFPMath option, this now looks for the fast math flags on the instructions, using the same flags as other backends.	2022-09-19 11:34:00 +01:00
LiaoChunyu	2e74157ad4	[RISCV]Preserve (and X, 0xffff) in targetShrinkDemandedConstant shrinkdemandedconstant does some optimizations, but is not very friendly to riscv, so use targetShrinkDemandedConstant to limit the damage. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D134155	2022-09-19 14:19:38 +08:00
LiaoChunyu	8fee91c435	[RISCV][NFC]Remove outdated comment from targetShrinkDemandedConstant Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D134154	2022-09-19 10:23:06 +08:00
Kazu Hirata	cf07277fb4	[X86] Fix the LEA optimization pass The LEA optimization pass visits each basic block of a given machine function. In each basic block, for each pair of LEAs that differ only in their displacement fields, we replace all uses of the second LEA with the first LEA while adjusting the displacement. Now, without this patch, after all the replacements are made, the following assert triggers: assert(MRI->use_empty(LastVReg) && "The LEA's def register must have no uses"); The replacement loop uses: for (MachineOperand &MO : llvm::make_early_inc_range(MRI->use_operands(LastVReg))) { which is equivalent to: for (auto UI = MRI->use_begin(LastVReg), UE = MRI->use_end(); UI != UE;) { MachineOperand &MO = *UI++; // <-- Look! That is, immediately after the post increment, make_early_inc_range already has the iterator for the next iteration in its mind. The problem is that in one iteration of the loop, we could replace two uses in a debug instruction like: DBG_VALUE_LIST !"r", !DIExpression(DW_OP_LLVM_arg, 0), %0:gr64, %0:gr64, ... So, the iterator for the next iteration becomes invalid. We end up traversing a garbage use list from that point on. In turn, we don't get to visit remaining uses. The patch fixes the problem by switching to a "draining" while loop: while (!MRI->use_empty(LastVReg)) { MachineOperand &MO = *MRI->use_begin(LastVReg); MachineInstr &MI = *MO.getParent(); The credit goes to Simon Pilgrim for reducing the test case. Fixes https://github.com/llvm/llvm-project/issues/57673 Differential Revision: https://reviews.llvm.org/D133631	2022-09-18 17:50:17 -07:00
Carl Ritson	930315f6aa	[AMDGPU] Fix isSGPRReg for special registers Special registers, e.g. MODE, do not have register classes so will cause null pointer exception if passed to isSGPRReg. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D134025	2022-09-19 08:49:43 +09:00
Kazu Hirata	20d764aff0	[llvm] Don't including SetVector.h (NFC) llvm/lib/ProfileData/RawMemProfReader.cpp uses SetVector without including SetVector.h, so this patch adds an appropriate #include there.	2022-09-17 12:36:43 -07:00
Sander de Smalen	bed214cf0f	[AArch64][SME] Add intrinsics for enabling/disabling ZA. This adds the intrinsics: * void @llvm.aarch64.sme.za.enable() -> smstart za * void @llvm.aarch64.sme.za.disable() -> smstop za Reviewed By: aemerson Differential Revision: https://reviews.llvm.org/D133894	2022-09-17 16:41:42 +00:00
Sander de Smalen	5fae000f36	[AArch64][SME] Disable tail-call optimization when streaming mode change or lazy-save may be required. When a streaming mode change is (or may be) required for a call, it will need to restore the original mode after the call, which prevents the use of tail-call optimization. The same holds true for a call that requires the lazy-save mechanism to be set up before the call, and possibly restored after. More details about the SME attributes and design can be found in D131562. Reviewed By: aemerson Differential Revision: https://reviews.llvm.org/D131579	2022-09-17 16:15:07 +00:00
Jessica Paquette	1076b31da8	[GlobalISel] Combine select + fcmp to fminnum/fmaxnum/fminimum/fmaximum This is a partial port of the code used by the SelectionDAGBuilder to translate selects. In particular, see matchSelectPattern in ValueTracking.cpp. This is a GISel-equivalent of the portion which handles fminnum/fmaxnum/fminimum/fmaximum. I tried to set it up so it'd be easy to add the non-FP cases. Those are simpler. On the AArch64-end, it seems like the FP cases are more important for perf right now, so I bit the bullet and went at the more complicated problem. :) I elected to do this as a post-legalize combine rather than in the IRTranslator because: (1) deciding which fmax/fmin to use can depend on legalization rules; (2) philosophically-speaking (TM), putting it in a combine just feels cleaner; (3) being able to enable/disable the combine is handy. Another option would be to use the ValueTracking code in the IRTranslator and match what SelectionDAGBuilder::visitSelect does. I think that may be somewhat annoying since we'd need to write lowerings back into the selects in the legalizer. I'm not strongly opposed to the approach. We'd also want to be careful with vector selects once that's implemented, which explicitly check if a vector select is legal on the target. That'd probably need a hook. From what I can tell, doing this as a combine is probably a cleaner option long-term. Differential Revision: https://reviews.llvm.org/D116702	2022-09-16 13:35:46 -07:00
Craig Topper	61595c45af	[RISCV] Simplify some code in vector fp<->int handling. NFC We changed the way container types are selected since this code was written. We no longer need to use the largest type.	2022-09-16 12:56:42 -07:00
David Majnemer	8a868d8859	Revert "Revert "[clang, llvm] Add __declspec(safebuffers), support it in CodeView"" This reverts commit `cd20a18286` and adds a "let Heading" to NoStackProtectorDocs.	2022-09-16 19:39:48 +00:00
Simon Pilgrim	23cb1c42cd	[CostModel][X86] Update throughput costs for CTLZ ops This was achieved with an updated version of the 'cost-tables vs llvm-mca' script D103695 (and recent fixes to the bdver2 + alderlake models) Adding full CostKinds costs are affecting some other tests as they make assumptions about SizeLatency costs, so they need addressing first	2022-09-16 16:56:49 +01:00
Dmitry Preobrazhensky	ef8feb6359	[AMDGPU][MC][NFC] Correct error message Differential Revision: https://reviews.llvm.org/D134028	2022-09-16 18:22:08 +03:00
Sander de Smalen	bd4935c175	[AArch64][SME] Implement ABI for calls from streaming-compatible functions. When a function is streaming-compatible and calls a function with a normal or streaming interface, it may need to enable/disable streaming mode before the call, and needs to restore PSTATE.SM after the call. This patch implements this with a Pseudo node that gets expanded to a conditional branch and smstart/smstop node. More details about the SME attributes and design can be found in D131562. Reviewed By: aemerson Differential Revision: https://reviews.llvm.org/D131578	2022-09-16 14:48:37 +00:00
Simon Pilgrim	89e4cb603d	[X86] Add missing (unsupported) zmm vector move classes Although unsupported on HSW, we reuse this model for KNL which does require them Noticed when running the cost model fuzz script from D103695 with -mcpu=knl	2022-09-16 15:31:26 +01:00
Sander de Smalen	b00c36c295	[AArch64][SME] Implement ABI for calls to/from streaming functions. This patch implements the ABI for calls from: Normal -> Streaming Normal -> Streaming-compatible Streaming -> Normal Streaming -> Streaming-compatible Streaming -> Streaming The compiler inserts SMSTART/SMSTOP instructions before and after the call, depending on the required transition. More details about the SME attributes and design can be found in D131562. Reviewed By: aemerson Differential Revision: https://reviews.llvm.org/D131576	2022-09-16 14:07:47 +00:00
Florian Hahn	6b86b481e3	[AArch64] Use tbl for truncating vector FPtoUI conversions. On AArch64, doing the vector truncate separately after the fptoui conversion can be lowered more efficiently using tbl.4, building on D133495. https://alive2.llvm.org/ce/z/T538CC Depends on D133495 Reviewed By: t.p.northover Differential Revision: https://reviews.llvm.org/D133496	2022-09-16 14:57:43 +01:00
Simon Pilgrim	f8fa04295f	[CostModel][X86] Add CostKinds handling for vector integer comparisons These were based off a mixture of vector integer add/sub costs and the numbers from the 'cost-tables vs llvm-mca' script from D103695 - the extra costs for different predicates are still proving tricky to implement, but I've gotten most costs to within +/-1 now - the AVX512 are tricky as we still don't handle predicate results properly, so most of these were done by hand.	2022-09-16 13:03:41 +01:00
Florian Hahn	8491d01cc3	[AArch64] Lower vector trunc using tbl. Similar to using tbl to lower vector ZExts, tbl4 can be used to lower vector truncates. The initial version supports i32->i8 conversions. Depends on D120571 Reviewed By: t.p.northover Differential Revision: https://reviews.llvm.org/D133495	2022-09-16 12:42:49 +01:00
Florian Hahn	5871f18827	[AArch64] Lower extending uitofp using tbl. On AArch64, doing the zero-extend separately first can be lowered more efficiently using tbl, building on D120571. https://alive2.llvm.org/ce/z/8Je595 Depends on D120571 Reviewed By: t.p.northover Differential Revision: https://reviews.llvm.org/D133494	2022-09-16 10:20:25 +01:00
Philip Reames	fdff1bb103	[RISCV] Verify merge operand is tied properly Differential Revision: https://reviews.llvm.org/D133957	2022-09-15 13:06:52 -07:00
Philip Reames	32cfafddb1	[RISCV] Verify VL operand on instructions if present These should only be immediate values or GPR registers. Differential Revision: https://reviews.llvm.org/D133953	2022-09-15 13:06:52 -07:00
Alexander Timofeev	fbdea5a2e9	[AMDGPU] Always select s_cselect_b32 for uniform 'select' SDNode This patch contains changes necessary to carry physical condition register (SCC) dependencies through the SDNode scheduler. It adds the edge in the SDNodeScheduler dependency graph instead of inserting the SCC copy between each definition and use. This approach lets the scheduler place instructions in an optimal way placing the copy only when the dependency cannot be resolved. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D133593	2022-09-15 22:03:56 +02:00
Florian Hahn	81a11da762	[CGP,AArch64] Replace zexts with shuffle that can be lowered using tbl. This patch extends CodeGenPrepare to lower zext v16i8 -> v16i32 in loops using a wide shuffle creating a v64i8 vector, selecting groups of 3 zero elements and an element from the input. This is profitable on AArch64 where such shuffles can be lowered to tbl instructions, but only in loops, because it requires materializing 4 masks, which can be done in the loop preheader. This is the only reason the transform is part of CGP. If there's a better alternative I missed, please let me know. The same goes for the shouldReplaceZExtWithShuffle hook which guards this. I am not sure if this transform will be beneficial on other targets, but it seems like there is no other convenient way. This improves the generated code for loops like the one below in combination with D96522. int foo(uint8_t *p, int N) { unsigned long long sum = 0; for (int i = 0; i < N ; i++, p++) { unsigned int v = *p; sum += (v < 127) ? v : 256 - v; } return sum; } https://clang.godbolt.org/z/Wco866MjY Reviewed By: t.p.northover Differential Revision: https://reviews.llvm.org/D120571	2022-09-15 19:18:13 +01:00
Sergei Barannikov	c6acb4eb0f	[SDAG] Add `getCALLSEQ_END` overload taking `uint64_t`s All in-tree targets pass pointer-sized ConstantSDNodes to the method. This overload reduced amount of boilerplate code a bit. This also makes getCALLSEQ_END consistent with getCALLSEQ_START, which already takes uint64_ts.	2022-09-15 14:02:12 -04:00
Simon Pilgrim	94620e4fc3	[CostModel][X86] Add CostKinds handling for vector shift by generic/non-uniform shift amounts These are the worst case generic vector shift costs, where nothing is known about the shift amounts - in particular this should stop us using the default sizelatency cost of 1 for so many pre-AVX2 vector shifts that can often actually expand during lowering to +20 uops, just for 128-bit vectors, resulting in some horrible inline/unroll decisions. This was achieved with an updated version of the 'cost-tables vs llvm-mca' script D103695 (I'll update the patch soon for reference)	2022-09-15 16:51:58 +01:00
Jay Foad	3822a01e0b	[AMDGPU] Add GFX11 ds_bvh_stack_rtn_b32 instruction Differential Revision: https://reviews.llvm.org/D133928	2022-09-15 16:46:14 +01:00
Matt Arsenault	69153d6c0a	AMDGPU: Use GlobalPriority for largest register tuples Only do this for 16 and 32 register tuples, although we might want to extend to 8 tuples. It's incredibly expensive to spill these, and doing so majorly interferes with the ability to allocate anything else in the function. The lit tests show mostly sizeable improvements with a handful of tiny regressions with large vectors.	2022-09-15 11:45:02 -04:00
Sander de Smalen	45d28779c5	[AArch64][SME] Fix lowering of llvm.aarch64.get.pstatesm() A thread may not have access to SME or TPIDR2_EL0, so in order to safely query PSTATE.SM in a streaming-compatible function, the code should call `__arm_sme_state()`, as described in the ABI: `c2bb09c4d4` This means that the value of pstate.sm is: * 0 if the function is non-streaming. * 1 if the function has `arm_streaming` or `arm_locally_streaming`. * evaluated at runtime by a call to __arm_sme_state() otherwise. This patch also adds a calling convention for calls to SME support routines. At some point we can remove the need for the llvm.aarch64.get.pstatesm() intrinsic and use function calls (with the corresponding cc) directly instead. Reviewed By: aemerson Differential Revision: https://reviews.llvm.org/D131571	2022-09-15 15:14:13 +00:00
Matt Arsenault	63d1d37d35	RegAllocGreedy: Avoid overflowing priority bitfields The class priority is expected to be at most 5 bits before it starts clobbering bits used for other fields. Also clamp the instruction distance in case we have millions of instructions. AMDGPU was accidentally overflowing into the global priority bit in some cases. I think in principle we would have wanted this, but in the cases I've looked at, it had the counter intuitive effect and de-prioritized the large register tuple. Avoid using weird bit hack PPC uses for global priority. The AllocationPriority field is really 5 bits, and PPC was relying on overflowing this to 6-bits to forcibly set the global priority bit. Split this out as a separate flag to avoid having magic behavior for values above 31.	2022-09-15 10:38:40 -04:00
Dmitry Preobrazhensky	0e868aff43	[AMDGPU][MC][GFX11] Add validation of constant bus limitations for VOPD Differential Revision: https://reviews.llvm.org/D133881	2022-09-15 16:36:19 +03:00
Dmitry Preobrazhensky	c89e60bf1f	[AMDGPU][MC][GFX11] Add VOPD literals validation Differential Revision: https://reviews.llvm.org/D133864	2022-09-15 16:29:53 +03:00
Dmitry Preobrazhensky	8bb5c89205	[AMDGPU][MC][NFC] Refactor AMDGPUAsmParser::validateVOPLiteral Differential Revision: https://reviews.llvm.org/D133861	2022-09-15 16:26:14 +03:00
Simon Pilgrim	0ec028fe10	[CostModel][X86] Add CostKinds handling for vector shift by uniform/constuniform ops Vector shift by const uniform is the cheapest shift instruction we have, non-const uniform have a marginally higher cost - some targets 'splat' the amount internally to use the shift-per-element instruction, others see a higher cost for the explicit zeroing of the upper bits for the (64-bit) shift amount. This was achieved with an updated version of the 'cost-tables vs llvm-mca' script D103695 (I'll update the patch soon for reference)	2022-09-15 14:05:30 +01:00
wanglei	a65557d4b3	[LoongArch] Fixup value adjustment in applyFixup A complete implementation of `applyFixup` for D132323. Makes `LoongArchAsmBackend::shouldForceRelocation` to determine if the relocation types must be forced. This patch also adds range and alignment checks for `b*` instructions' operands, at which point the offset to a label is known. Differential Revision: https://reviews.llvm.org/D132818	2022-09-15 21:00:22 +08:00
Ivan Kosarev	693f816288	[AMDGPU][SILoadStoreOptimizer] Merge SGPR_IMM scalar buffer loads. Reviewed By: foad, rampitec Differential Revision: https://reviews.llvm.org/D133787	2022-09-15 13:48:51 +01:00
Ilia Diachkov	3544d200d9	[SPIRV] add IR regularization pass The patch adds the regularization pass that prepares LLVM IR for the IR translation. It also contains following changes: - reduce indentation, make getNonParametrizedType, getSamplerType, getPipeType, getImageType, getSampledImageType static in SPIRVBuiltins, - rename mayBeOclOrSpirvBuiltin to getOclOrSpirvBuiltinDemangledName, - move isOpenCLBuiltinType, isSPIRVBuiltinType, isSpecialType from SPIRVGlobalRegistry.cpp to SPIRVUtils.cpp, renaming isSpecialType to isSpecialOpaqueType, - implement getTgtMemIntrinsic() in SPIRVISelLowering, - add hasSideEffects = 0 in Pseudo (SPIRVInstrFormats.td), - add legalization rule for G_MEMSET, correct G_BRCOND rule, - add capability processing for OpBuildNDRange in SPIRVModuleAnalysis, - don't correct types of registers holding constants and used in G_ADDRSPACE_CAST (SPIRVPreLegalizer.cpp), - lower memset/bswap intrinsics to functions in SPIRVPrepareFunctions, - change TargetLoweringObjectFileELF to SPIRVTargetObjectFile in SPIRVTargetMachine.cpp, - correct comments. 5 LIT tests are added to show the improvement. Differential Revision: https://reviews.llvm.org/D133253 Co-authored-by: Aleksandr Bezzubikov <zuban32s@gmail.com> Co-authored-by: Michal Paszkowski <michal.paszkowski@outlook.com> Co-authored-by: Andrey Tretyakov <andrey1.tretyakov@intel.com> Co-authored-by: Konrad Trifunovic <konrad.trifunovic@intel.com>	2022-09-15 15:53:44 +03:00
esmeyi	6e0e926c2f	[PowerPC] Converts to comparison against zero even when the optimization doesn't happen in peephole optimizer. Summary: Converting a comparison against 1 or -1 into a comparison against 0 can exploit record-form instructions for comparison optimization. The conversion will happen only when a record-form instruction can be used to replace the comparison during the peephole optimizer (see function optimizeCompareInstr). In post-RA, we also want to optimize the comparison by using the record form (see D131873) and it requires additional dataflow analysis to reliably find uses of the CR register set. It's reasonable to common the conversion for both peephole optimizer and post-RA optimizer. Converting to comparison against zero even when the optimization doesn't happen in peephole optimizer may create additional opportunities for the post-RA optimization. Reviewed By: nemanjai Differential Revision: https://reviews.llvm.org/D131374	2022-09-15 06:06:25 -04:00
Marco Elver	72e7575ffe	[GlobalISel][AArch64] Fix pcsections for expanded atomics and add more tests Add fix for propagation of !pcsections metadata for expanded atomics, together with more tests for interesting atomic instructions (based on llvm/test/CodeGen/AArch64/GlobalISel/arm64-atomic.ll). Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D133710	2022-09-15 10:36:11 +02:00
Sheng	bea33f75e2	[M68k] Fix the crash of fast register allocator `MOVEM` is used to spill the register, which will cause problem with 1 byte data, since it only supports word (2 bytes) and long (4 bytes) size. We change to use the normal `move` instruction to spill 1 byte data. Fixes #57660 Reviewed By: myhsu Differential Revision: https://reviews.llvm.org/D133636	2022-09-15 09:24:22 +08:00
Craig Topper	5888c157a7	[RISCV] Simplify some code in RISCVInstrInfo::verifyInstruction. NFCI This code was written as if it lived in the MC layer instead of the CodeGen layer. We get the MCInstrDesc directly from MachineInstr. And we can use RISCVSubtarget::is64Bit instead of going to the Triple. Differential Revision: https://reviews.llvm.org/D133905	2022-09-14 17:07:21 -07:00
Philip Reames	e395915ac0	[RISCV] Verify SEW/VecPolicy immediate values Copy the asserts from the printing code, and turn them into actual verifier rules. Doing this revealed an existing bug - see `0a14551`. Differential Revision: https://reviews.llvm.org/D133869	2022-09-14 14:45:16 -07:00
Philip Reames	0a145516a2	[RISCV] Fix a silent miscompile in copyPhysReg Found this when adding verifier rules. The case which arises is that we have a DefMBBI which has a VecPolicy operand. The code was not expecting this, and the unconditional copy of the last two operands resulted in the SEW and VecPolicy fields being added to the VMV_V_V as AVL and SEW respectively. Oddly, this appears to be silent in practice. There's no test change despite verifier changes proving that we definitely hit this in existing tests. Differential Revision: https://reviews.llvm.org/D133868	2022-09-14 14:45:01 -07:00
Piotr Sobczak	abd927e5a8	[AMDGPU] Check for num elts in SelectVOP3PMods The rest of the code section assumes there are exactly two elements in the vector (Lo, Hi), so add the check before entering the section. Differential Revision: https://reviews.llvm.org/D133852	2022-09-14 20:00:19 +02:00
David Spickett	3acaf04033	[LLVM][AArch64] Don't warn about clobbering X16 when Speculative Load Hardening is used SLH will fall back to a different technique if X16 is being used, so there is no need to warn for inline asm use. Only prevent other codegen from using it. Reviewed By: kristof.beyls Differential Revision: https://reviews.llvm.org/D133766	2022-09-14 15:19:53 +00:00
Zain Jaffal	d1dec04d76	[AArch64] Disable nontemporal load for Big Endian The current code for generating nontemporal loads outputs the wrong assembly for big-endian architectures. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D133789	2022-09-14 14:49:55 +01:00
Simon Pilgrim	854a4595b6	[CostModel][X86] getArithmeticInstrCost - move GLM/SLM custom costs AFTER constant shift -> multiply canonicalization Corrects the shift by constant costs to better account for them being converted to multiples for lowering - which demonstrates that we should probably be trying harder NOT to convert these to multiplies for some CPUs (v4i32 in particular).	2022-09-14 11:46:26 +01:00
Simon Pilgrim	40ab7875f8	[CostModel][X86] Fix throughput costs for AVX512BW v32i16 shifts Fixes regression from `a931dbfbd3`	2022-09-14 11:18:23 +01:00
Jon Chesterfield	cdb9738963	[amdgpu] Expand all ConstantExpr users of LDS variables in instructions Bug noted in D112717 can be sidestepped with this change. Expanding all ConstantExpr involved with LDS up front makes the variable specialisation simpler. Excludes ConstantExpr that don't access LDS to avoid disturbing codegen elsewhere. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D133422	2022-09-14 07:55:46 +01:00
Ruiling Song	0404aafbe3	AMDGPU: Factor out hasDivergentBranch(). NFC This is helpful for detecting whether a block ends with divergent branch in passes before lowering the pseudo control flow instructions. Differential Revision: https://reviews.llvm.org/D133184	2022-09-14 13:27:21 +08:00
Antonio Frighetto	c63e05dc07	[AArch64InstPrinter] Introduce register markup tags emission AArch64 assembly syntax emission now leverages markup tags for registers, if enabled. Reviewed By: MaskRay, david-arm Differential Revision: https://reviews.llvm.org/D129870	2022-09-13 20:52:02 -07:00
jacquesguan	ecf327f154	[RISCV] Add cost model for vector insert/extract element. This patch adds cost model for vector insert/extract element instructions. In RVV, we could use vector scalar move instruction to insert or extract the first element, and use vslide to move it. But for mask vector or i64 vector in i32 target, we need special instructions to make it. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D133007	2022-09-14 11:10:18 +08:00
Shivam Gupta	e2632fbcdd	[NVPTX] Use MBB.begin() instead of MBB.front() in NVPTXFrameLowering::emitPrologue The second argument of `NVPTXFrameLowering::emitPrologue(MachineFunction &MF, MachineBasicBlock &MBB)` is the first MBB of the MF. In that function, it assumes the first MBB always contains instructions, so it gets the first instruction by MachineInstr *MI = &MBB.front();. However, with the reproducer/test case attached, all instructions in the first MBB are cleared in a previous pass for stack coloring. As a consequence, MBB.front() triggers the assertion that the first node is actually a sentinel node. Hence we are using MachineBasicBlock::iterator to iterate over MBB. Fix #52623. Differential Revision: https://reviews.llvm.org/D132663	2022-09-14 08:30:55 +05:30
Yeting Kuo	1b56b2b267	[RISCV] Transform VMERGE_VVM_<LMUL>_TU with all ones mask to VADD_VI_<LMUL>_TU. The transformation is beneficial because vmerge.vvm always needs a mask operand but vadd.vi may not. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D133255	2022-09-14 10:01:37 +08:00
Han-Kuan Chen	dd53a0bb30	[RISCV] Lower BUILD_VECTOR to RISCVISD::VID_VL if it is floating-point type. Differential Revision: https://reviews.llvm.org/D133688	2022-09-13 18:50:20 -07:00
gonglingqin	7fd7d48b4b	[LoongArch] Categorize code by function. NFC. Differential Revision: https://reviews.llvm.org/D133754	2022-09-14 09:48:26 +08:00
Fangrui Song	ab1c259613	[RISCV] Assemble `call foo` to R_RISCV_CALL_PLT R_RISCV_CALL/R_RISCV_CALL_PLT distinction isn't necessary. R_RISCV_CALL has been deprecated as a resolution to https://github.com/riscv-non-isa/riscv-elf-psabi-doc/issues/98 . ld.lld and mold treat the two relocation types the same. GNU ld has a custom handling for undefined weak functions which is unnecessary: calling an unresolved undefined weak function is UB and GNU ld can handle the case without a relocation error (such a function call is usually guarded by a zero value check and should be allowed). This patch assembles `call foo` to use R_RISCV_CALL_PLT instead of the deprecated R_RISCV_CALL. Note: the code generator still differentiates `call foo` and (maybe preemptible) `call foo@plt`, but the difference is purely aesthetic. Note: D105429 does not support R_RISCV_CALL_PLT correctly. Changed the test to force R_RISCV_CALL for now. Reviewed By: kito-cheng Differential Revision: https://reviews.llvm.org/D132530	2022-09-13 18:47:55 -07:00
Fanchen Kong	28557e8c98	[WebAssembly] Improve codegen for shuffles with undefined lane indices For undefined lane indices, fill the mask with {0..N} instead of zeros to allow further reduction to word/dword shuffle on the VM. Reviewed By: tlively, penzn Differential Revision: https://reviews.llvm.org/D133473	2022-09-13 16:03:18 -07:00
Philip Reames	09d73fe8cd	[RISCV] Add MIR comments for VecPolicy operands Analogous to what we already do for SEW operands, aimed at making the resulting MIR readable by a human.	2022-09-13 15:36:33 -07:00
Philip Reames	cc45687e1c	[RISCV] Simplify operand index calculation in createMIROperandComment [nfc]	2022-09-13 15:06:40 -07:00
Xiang Li	1e3d4c4344	[DirectX backend] Remove Attribute not for DXIL on CallInst Remove Attribute on CallInst which is not for DXIL when prepare for DXIL. Reviewed By: beanz Differential Revision: https://reviews.llvm.org/D133279	2022-09-13 13:45:19 -07:00
Jay Foad	2e8863b6a1	[AMDGPU] Don't shrink VOP3 instructions pre-RA on GFX10+ In GFX10, there is no advantage to shrinking these instructions pre-RA, so this just saves a bit of work. In GFX11 there is an advantage to not shrinking them pre-RA, because the register classes for 16-bit operands are less restrictive in the VOP3 form than in the shrunk form. This patch is a prerequisite for actually setting up those register classes correctly for 16-bit vs non-16-bit operands. Differential Revision: https://reviews.llvm.org/D133769	2022-09-13 20:26:08 +01:00
Craig Topper	8d7e73effe	[RISCV] Teach lowerVECTOR_SHUFFLE to recognize some shuffles as vnsrl. Unary shuffles such as <0,2,4,6,8,10,12,14> or <1,3,5,7,9,11,13,15> where half the elements are returned, can be lowered using vnsrl. SelectionDAGBuilder lowers such shuffles as a build_vector of extract_elements since the mask has fewer elements than the source. To fix this, I've enabled the extractSubvectorIsCheapHook to allow DAGCombine to rebuild the shuffle using 2 extract_subvectors preceding the shuffle. I've gone very conservative on extractSubvectorIsCheapHook to minimize test impact and match what we have test coverage for. This can be improved in the future. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D133736	2022-09-13 11:07:11 -07:00
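The equivalence behind the vnsrl lowering described in `8d7e73effe` can be sketched in a few lines of Python. This is only an illustrative model, not code from the patch: the function names are hypothetical, and it assumes that when adjacent elements are viewed as one double-width element, the lower-indexed element occupies the low bits. Under that assumption, a narrowing logical right shift by 0 keeps the even-indexed elements and a shift by one element width keeps the odd-indexed ones.

```python
def narrowing_shift_right(src, shift_elems, elem_bits=8):
    """Model a narrowing logical right shift over pairs of elements.

    src: list of unsigned elem_bits-wide integers (even length).
    shift_elems: 0 keeps even-indexed elements, 1 keeps odd-indexed ones.
    Assumption: the lower-indexed element of each pair sits in the low bits.
    """
    mask = (1 << elem_bits) - 1
    out = []
    for i in range(0, len(src), 2):
        # View two adjacent narrow elements as one double-width element.
        wide = src[i] | (src[i + 1] << elem_bits)
        # Shift right, then truncate back to the narrow element width.
        out.append((wide >> (shift_elems * elem_bits)) & mask)
    return out


def shuffle(src, indices):
    """Reference semantics of a unary shuffle with the given mask."""
    return [src[i] for i in indices]


src = list(range(100, 116))  # 16 arbitrary 8-bit elements
evens = narrowing_shift_right(src, 0)
odds = narrowing_shift_right(src, 1)
```

Here `evens` matches the shuffle mask <0,2,4,...,14> and `odds` matches <1,3,5,...,15>.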
Alex Bradbury	c44c1e9d3e	[RISCV] Implement isMaskAndCmp0FoldingBeneficial hook This hook is currently only used by CodeGenPrepare, which will sink and duplicate an 'and' into a block that has an 'icmp 0' user of it if the hook returns true. This hook is less useful for RISC-V than for targets like AArch64 that have a TBZ (test bit and branch if zero instruction), but may still be profitable if Zbs is available and a BEXTI can be selected. Conservatively, we return false even if Zbs is enabled for any masks that fit in the ANDI immediate because it's possible the only use is a branch on the result, and ANDI+BNEZ => BEXTI+BNEZ isn't a profitable transformation. Differential Revision: https://reviews.llvm.org/D131492	2022-09-13 18:54:00 +01:00
Alex Bradbury	547160848c	[RISCV] Return true in hasBitTest when Zbs is enabled and update BEXTI pattern for resulting canonicalisation As the Zbs extension includes bext[i] for bit extract, we can unconditionally return true from this hook. This hook causes the DAG combiner to perform the following canonicalisation: and (not (srl X, C)), 1 --> (and X, 1<<C) == 0 and (srl (not X), C)), 1 --> (and X, 1<<C) == 0 As simply changing the hook causes a codegen regression, this patch also modifies a BEXTI pattern to match this canonicalised form. As BSETINVMask is now used for BEXT as well as BSET and BINV, it has been renamed to the more generic SingleBitSetMask. There is one codegen change in bittest.ll for bittest_31_i64 (NOT+BEXTI rather than NOT+SRLIW). This is neutral in terms of code quality. Differential Revision: https://reviews.llvm.org/D131482	2022-09-13 16:51:47 +01:00
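The two canonicalisations quoted in `547160848c` can be checked exhaustively for a small bit width. The sketch below is a standalone Python model, not LLVM code; the 8-bit width and the function names are assumptions chosen for illustration.

```python
WIDTH = 8                 # illustrative bit width (assumption)
MASK = (1 << WIDTH) - 1


def form_a(x, c):
    """and (not (srl X, C)), 1"""
    return (~(x >> c)) & 1


def form_b(x, c):
    """and (srl (not X), C), 1, with 'not' taken at WIDTH bits"""
    return ((~x & MASK) >> c) & 1


def canonical(x, c):
    """(and X, 1<<C) == 0, returned as 0 or 1"""
    return int((x & (1 << c)) == 0)


# Collect any (x, c) pair where the forms disagree; the list should be empty.
mismatches = [
    (x, c)
    for x in range(1 << WIDTH)
    for c in range(WIDTH)
    if not (form_a(x, c) == form_b(x, c) == canonical(x, c))
]
```

All three forms test whether bit C of X is clear, which is why a single bit-extract pattern can cover them.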
Craig Topper	5224bae613	[RISCV] Fix a bug in i32 FP_TO_UINT_SAT lowering on RV64. We use the saturating behavior of fcvt.wu.h/s/d but forgot to take into account that fcvt.wu will sign extend the saturated result. According to computeKnownBits a promoted FP_TO_UINT_SAT is expected to zero extend the saturated value. In many cases the upper bits aren't demanded so this wouldn't be an issue. But if computeKnownBits caused an AND to be removed it would be a bug. This patch inserts an AND to zero the upper bits. Unfortunately, this pessimizes code if we aren't able to tell if the upper bits are demanded. To fix that we could custom type promote the FP_TO_UINT_SAT with SEXT_INREG after it, but I'll leave that for future work. I haven't found a failure from this, I was revisiting the code to add vector support and spotted it. Differential Revision: https://reviews.llvm.org/D133746	2022-09-13 08:41:32 -07:00
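The mismatch described in `5224bae613` can be illustrated with a small Python model. This is a sketch under stated assumptions, not the patch itself: `convert_sign_extended` is a hypothetical stand-in for a conversion that saturates to the unsigned 32-bit range and then sign-extends the 32-bit result into a 64-bit register, and NaN inputs are deliberately left out of scope.

```python
U32_MAX = 0xFFFFFFFF
U64_MASK = 0xFFFFFFFFFFFFFFFF


def sign_extend_32_to_64(v):
    """Replicate bit 31 of a 32-bit value into bits 32..63."""
    v &= U32_MAX
    if v & 0x80000000:
        return v | (U64_MASK & ~U32_MAX)
    return v


def convert_sign_extended(x):
    """Hypothetical model: saturate to [0, 2**32 - 1], then sign-extend.

    NaN is not handled here (assumption: callers pass ordinary numbers
    or infinities).
    """
    if x <= 0.0:
        sat = 0
    elif x >= float(U32_MAX):
        sat = U32_MAX
    else:
        sat = int(x)  # truncate toward zero
    return sign_extend_32_to_64(sat)


def convert_zero_extended(x):
    """Same conversion followed by an AND that clears the upper 32 bits."""
    return convert_sign_extended(x) & U32_MAX


raw = convert_sign_extended(5e9)    # saturates, upper 32 bits become ones
fixed = convert_zero_extended(5e9)  # upper 32 bits cleared by the AND
```

In this model `raw` has all 64 bits set while `fixed` is the zero-extended value that a consumer relying on known-zero upper bits would expect.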
David Green	993b203b6a	[AArch64] Sink splat(s/zext(..)) to uses If the Shuffle is a splat and the operand is a zext/sext, sinking the operand and the s/zext can help create indexed s/umull. This is especially useful to prevent i64 mul being scalarized. Differential Revision: https://reviews.llvm.org/D133355	2022-09-13 15:47:41 +01:00
David Spickett	0b8a44388e	[llvm][AArch64] Explain why certain registers are reserved on Arm64EC This extends `4658366d95` to add a note explaining why the register is reserved. note: x13 is clobbered by asynchronous signals when using Arm64EC. I've added testing for w/x registers and v/q/s/d and h floating point registers. llvm will accept, but silently do nothing with, b registers. So they are not tested here (clang rejects them so at least for C you're safe anyway). Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D133701	2022-09-13 10:13:06 +00:00
Dmitry Preobrazhensky	a80116efec	[AMDGPU][MC][GFX11] Add a helper function for identification of VOPD instructions Differential Revision: https://reviews.llvm.org/D133608	2022-09-13 12:41:39 +03:00
Dmitry Preobrazhensky	815ba49068	[AMDGPU][MC] Add detection of mandatory literals in parser Differential Revision: https://reviews.llvm.org/D133606	2022-09-13 12:37:30 +03:00
Sylvestre Ledru	cd20a18286	Revert "[clang, llvm] Add __declspec(safebuffers), support it in CodeView" Causing: https://github.com/llvm/llvm-project/issues/57709 This reverts commit `ab56719acd`.	2022-09-13 10:53:59 +02:00
Zi Xuan Wu (Zeson)	955e6ac499	[CSKY] Fix the Predicates of instruction selection Some select node Pattern with register cmp instruction should be guarded by iHas2E3.	2022-09-13 15:02:22 +08:00
Haojian Wu	7ed68182d7	Fix a -Wswitch warning.	2022-09-13 08:57:43 +02:00
jacquesguan	b98b4fae75	[RISCV] Add cost model for compare and select instructions. This patch adds cost model for vector compare and select instructions. For vector FP compare instructions, it only adds the comparisons supported natively. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D132296	2022-09-13 14:44:46 +08:00
Yeting Kuo	5fcb5d7759	[RISCV] Add assertion of hasVecPolicyOp to catch masked intrinsic without policy operand. The original code may have incorrect result if there is a masked instruction without policy operand to make us set its policy to TUMU. The patch adds an assertion to catch the instruction. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D133302	2022-09-13 10:09:49 +08:00
gonglingqin	3de3439bd7	[LoongArch] Add codegen support for ISD::FMA Differential Revision: https://reviews.llvm.org/D133281	2022-09-13 10:04:41 +08:00
David Majnemer	ab56719acd	[clang, llvm] Add __declspec(safebuffers), support it in CodeView __declspec(safebuffers) is equivalent to __attribute__((no_stack_protector)). This information is recorded in CodeView. While we are here, add support for strict_gs_check.	2022-09-12 21:15:34 +00:00
Kazu Hirata	9606608474	[llvm] Use x.empty() instead of llvm::empty(x) (NFC) I'm planning to deprecate and eventually remove llvm::empty. I thought about replacing llvm::empty(x) with std::empty(x), but it turns out that all uses can be converted to x.empty(). That is, no use requires the ability of std::empty to accept C arrays and std::initializer_list. Differential Revision: https://reviews.llvm.org/D133677	2022-09-12 13:34:35 -07:00
Craig Topper	38ffa2bb96	[LegalizeTypes] Improve splitting for urem/udiv by constant for some constants. For remainder: If (1 << (Bitwidth / 2)) % Divisor == 1, we can add the high and low halves together and use a (Bitwidth / 2) urem. If (BitWidth /2) is a legal integer type, this urem will be expanded by DAGCombiner using multiply by magic constant. We do have to take into account that adding high and low together can produce a carry, making it a (BitWidth / 2)+1 bit number. So we need to also add back in the carry from the first addition. For division: We can use the above trick to compute the remainder, subtract that remainder from the dividend, then multiply by the multiplicative inverse of the Divisor modulo (1 << BitWidth). This is based on the section "Remainder by Summing Digits" in Hacker's Delight. The remainder trick is similar to a trick you may have learned for determining if a decimal number is divisible by 3. You can add all the digits together and see if the sum is divisible by 3. If you're not sure if the sum is divisible by 3, you can add its digits together. This can be repeated until you have a single decimal digit. If that digit is 3, 6, or 9, then the original number is divisible by 3. This works because 10 % 3 == 1. gcc already does this same trick. There are additional tricks gcc does for urem as well as srem, udiv, and sdiv that I plan to add in future patches. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D130862	2022-09-12 10:34:52 -07:00
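The remainder and division tricks described in `38ffa2bb96` can be worked through numerically. The following Python sketch is an illustration only, not the LegalizeTypes implementation: it fixes BitWidth at 64, and the function names are hypothetical. It relies on `pow(d, -1, m)` for the modular inverse, which requires Python 3.8 or later.

```python
BITS = 64
HALF = BITS // 2
HALF_MASK = (1 << HALF) - 1
FULL_MASK = (1 << BITS) - 1


def urem_by_summing_halves(x, d):
    """x % d for divisors where (1 << HALF) % d == 1."""
    assert (1 << HALF) % d == 1, "trick only applies to these divisors"
    lo = x & HALF_MASK
    hi = x >> HALF
    s = lo + hi                    # may need HALF + 1 bits
    carry = s >> HALF
    s = (s & HALF_MASK) + carry    # add the carry back in; fits in HALF bits
    return s % d                   # a HALF-bit urem


def udiv_via_remainder(x, d):
    """x // d: subtract the remainder, multiply by the inverse of d."""
    r = urem_by_summing_halves(x, d)
    inv = pow(d, -1, 1 << BITS)    # multiplicative inverse modulo 2**BITS
    return ((x - r) * inv) & FULL_MASK


samples = [0, 1, 2, 0xFFFFFFFF, 0x100000000, 0x123456789ABCDEF0, FULL_MASK]
divisors = [3, 5, 17, 257, 65537]
results = [
    (urem_by_summing_halves(x, d), udiv_via_remainder(x, d), x % d, x // d)
    for x in samples
    for d in divisors
]
```

Any divisor satisfying the precondition divides 2**32 - 1 and is therefore odd, which is what guarantees that the inverse modulo 2**64 exists.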
Craig Topper	d49280e0a4	[RISCV] Rename WriteFALU* and ReadFALU* to WriteFAdd/ReadFAdd. ALU seems a little vague. FAdd felt more precise even though it also include FSUB instructions. Reviewed By: monkchiang Differential Revision: https://reviews.llvm.org/D133632	2022-09-12 09:37:28 -07:00
Craig Topper	4186a49d79	[RISCV] Custom type legalize i32 loads by sign extending. The default is to use extload which can become a zextload or sextload if it is followed by an 'and' or sext_inreg. Sometimes type legalization will introduce an 'and' from promoting something like 'srl X, C' and a sext_inreg from a setcc. The 'and' could be freely folded with the promoted 'srl' by using srliw, but the sext_inreg can't be folded into a compare. DAG combiner will see both of these choices and may decide to fold the 'and' instead of the 'sext_inreg'. This forces the sext_inreg to become a sext.w. By picking sextload in the type legalizer we take this choice away. Looking at spec2006 compiled with Zba and Zbb this appeared to be a net reduction in lines of code in the objdump disassembly output. This is similar to what we do with i32 add/sub/mul/shl in type legalization where we always emit a sext_inreg. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D130397	2022-09-12 09:13:07 -07:00
Matthias Gehre	c1502425ba	Move TargetTransformInfo::maxLegalDivRemBitWidth -> TargetLowering::maxSupportedDivRemBitWidth Also remove new-pass-manager version of ExpandLargeDivRem because there is no way yet to access TargetLowering in the new pass manager. Differential Revision: https://reviews.llvm.org/D133691	2022-09-12 17:06:16 +01:00
Simon Pilgrim	20ad05f9b4	[CostModel][X86] Add CostKinds handling for abs ops This was achieved with an updated version of the 'cost-tables vs llvm-mca' script D103695	2022-09-12 16:34:37 +01:00
Matt Arsenault	7834194837	TableGen: Introduce generated getSubRegisterClass function Currently there isn't a generic way to get a smaller register class that can be produced from a subregister of a larger class. Replaces a manually implemented version for AMDGPU. This will be used to improve subregister support in the allocator.	2022-09-12 09:03:37 -04:00
Sander de Smalen	cf72dddaef	[AArch64][SME] Add utility class for handling SME attributes. This patch adds a utility class that will be used in subsequent patches for parsing the function/callsite attributes and determining whether changes to PSTATE.SM are needed, or whether a lazy-save mechanism is required. It also implements some of the restrictions on the SME attributes in the IR Verifier pass. More details about the SME attributes and design can be found in D131562. Reviewed By: david-arm, aemerson Differential Revision: https://reviews.llvm.org/D131570	2022-09-12 12:41:30 +00:00
Simon Pilgrim	bd0109f392	[CostModel][X86] Move AVX512/AVX2 uniform shift costs into the generic uniform cost tables They shouldn't be happening after XOP shift costs - AVX2 shift supports takes preference over XOP for everything but vXi8 shifts - the improvement is pretty limited as it only affects bdver4 targets but it does help clean up a fraction of the messy shift cost logic....	2022-09-12 12:08:42 +01:00
David Spickett	739b69e655	[LLVM][AArch64] Explain that X19 is used as the frame base pointer register Fixes #50098 LLVM uses X19 as the frame base pointer, if it needs to. Meaning you can get warnings if you clobber that with inline asm. However, it doesn't explain why. The frame base register is not part of the ABI so it's pretty confusing why you get that warning out of the blue. This adds a method to explain a reserved register with X19 as the first one. The logic is the same as getReservedRegs. I could have added a return parameter to isASMClobberable and friends but found that there's a lot of things that call isReservedReg in various ways. So while one more method on the pile isn't great design, it is simpler right now to do it this way and only pay the cost if you are actually using a reserved register. Reviewed By: lenary Differential Revision: https://reviews.llvm.org/D133213	2022-09-12 09:18:09 +00:00
Johannes Doerfert	c922cac868	Revert "[Attributor] AAPointerInfo should allow "harmless" uses" Revert "[Attributor] Teach AAPointerInfo to look into aggregates" This reverts commit `844f6c5d03` and `4ed0a88cd8` as they broke the buildbots that run openmp/libomptarget/test/offloading/bug49021.cpp.	2022-09-11 21:37:54 -07:00
Johannes Doerfert	4ed0a88cd8	[Attributor] Teach AAPointerInfo to look into aggregates If we have a constant aggregate, e.g., as an initializer, we usually failed to extract the proper value/type from it. This patch provides the size and offset information necessary to extract the right part of the constant.	2022-09-11 20:16:11 -07:00
Simon Pilgrim	a931dbfbd3	[CostModel][X86] Merge AVX512BW vXi8/vXi16 shifts into default AVX512BW cost table We only need to handle the uniform cases early	2022-09-10 18:18:42 +01:00
Simon Pilgrim	10edf88458	[CostModel][X86] Update CTPOP costs With the bdver2 model updates, many of the AVX1 costs were far too high - it also helped expose some costs mismatches for Atom/Silvermont	2022-09-10 17:57:20 +01:00
Simon Pilgrim	4994f87ca1	[X86] Fix bdver2 128-bit shuffles throughputs Noticed while trying to get vector ctpop/ctlz/cttz costs fixed using the script from D103695 - all of these are full-rate but the throughput costs were weirdly high for bdver2 Matches AMD 15h SoG, Agner and instlatx64	2022-09-10 17:34:40 +01:00
Simon Pilgrim	7785bd34e7	[X86] Fix bdver2 128-bit ALU/logic/shift throughputs Noticed while trying to get vector shifts costs fixed using the script from D103695 - all of these are full-rate but the throughput costs were weirdly high for bdver2 Matches AMD 15h SoG, Agner and instlatx64	2022-09-10 16:23:29 +01:00
Mingming Liu	8aa800614b	[AArch64][CostModel] Detects that {extract,insert}-element at lane 0 has the same cost as the other lane for vector instructions in the IR. Currently, {extract,insert}-element has zero cost at lane 0 [1]. However, there is a cost (by fmov instruction [2], or ext/ins instruction) to move values from SIMD registers to GPR registers, when the element is used explicitly as integers. See https://godbolt.org/z/faPE1nTn8, when fmov is generated for d* register -> x* register conversion. Implementation-wise, add a private method `AArch64TTIImpl::getVectorInstrCostHelper` as a helper function. This way, instruction-based method could share the core logic (e.g., returning zero cost if type is legalized to scalar). [1] `2cf320d41e/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp (L1853)` [2] `2cf320d41e/llvm/lib/Target/AArch64/AArch64InstrInfo.td (L8150-L8157)` Differential Revision: https://reviews.llvm.org/D128302	2022-09-09 09:47:30 -07:00
zhongyunde	7a81782585	[AArch64][CodeGen]Fold the mov and lsl into ubfiz Fix the issue exposed by D132322, depends on D132939 Reviewed By: efriedma, paulwalker-arm Differential Revision: https://reviews.llvm.org/D132325	2022-09-09 23:50:29 +08:00
Jay Foad	8901f7cebc	[AMDGPU] Fix crash legalizing G_EXTRACT_VECTOR_ELT with negative index Fixes https://github.com/llvm/llvm-project/issues/57408 Differential Revision: https://reviews.llvm.org/D132938	2022-09-09 15:53:34 +01:00
Simon Pilgrim	05f56f10ed	[X86] Fix VPPERM load folding latency Noticed while investigating BITREVERSE cost numbers with the D103695 script - VPPERM folded loads was using the WriteVarShuffleX defaults and was missing an override like the VPPERM reg-reg variants	2022-09-09 13:57:39 +01:00
Dmitry Preobrazhensky	6b79610fd5	[AMDGPU][MC][GFX11][NFC] Correct VOPD parsing Differential Revision: https://reviews.llvm.org/D133492	2022-09-09 13:03:29 +03:00
Simon Pilgrim	55b78e28d8	[CostModel][X86] Add missing i8 throughput cost	2022-09-09 10:58:51 +01:00
gonglingqin	da8c9521ee	[LoongArch] Add codegen support for frint According to the revised description in `LoongArch Reference Manual v1.02`, frint.[s/d] does not judge whether floating-point inexact exceptions are allowed indicated by FCSR, i.e. always executes roundToIntegralExact(x). What's more, the manual also specifically defines that frint.s/d is only necessary to be defined in LA64. So ISD::FRINT is legal for LA64. Differential Revision: https://reviews.llvm.org/D133337	2022-09-09 14:25:34 +08:00
Sheng	88bdc4687d	[NFC][M68k] Correct debug message.	2022-09-09 10:58:37 +08:00
Alex Bradbury	51ae462447	[RISCV] Add the GlobalMerge pass (disabled by default) Split out from D129178, this just adds the GlobalMerge tests (other than global-merge-minsize.ll which is testing a specific configuration of the pass when it's enabled) and exposes `-riscv-enable-global-merge` and //doesn't enable it by default//. Note that the comment "// FIXME: Unify control over GlobalMerge." is copied from the Arm and AArch64 backends, which expose the same flag. Presumably the author is imagining some later refactoring that provides a target-independent flag. Reviewed By: craig.topper, reames, hiraditya Differential Revision: https://reviews.llvm.org/D130481	2022-09-08 18:40:38 -07:00
Fangrui Song	f9b5924975	[AArch64] Fix -Wunused-variable. NFC	2022-09-08 18:27:16 -07:00
zhongyunde	b6655333c2	[Peephole] rewrite INSERT_SUBREG to SUBREG_TO_REG if upper bits zero Restrict this to the 32-bit form of integer instructions, as too many test cases would be clobbered when the register numbers are updated. From %reg = INSERT_SUBREG %reg, %subreg, subidx To %reg:subidx = SUBREG_TO_REG 0, %subreg, subidx Try to prefix the redundant mov instruction at D132325 as the SUBREG_TO_REG should not generate code. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D132939	2022-09-09 09:00:54 +08:00
Craig Topper	5f3a8b585b	[RISCV] Add RecurKind::FMulAdd to isLegalToVectorizeReduction for scalable vectors. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D133511	2022-09-08 12:34:59 -07:00
David Green	3875c38adf	[AArch64] Fix formatting of the Shuffle Cost tables. NFC	2022-09-08 19:54:12 +01:00
Jay Foad	afa0ed33df	[AMDGPU] Fix shrinking of F16 FMA on newer subtargets D125803 introduced shrinking of F16 FMA to FMAAK/FMAMK in SIShrinkInstructions (useful on GFX10+ where VOP3 instructions may have a literal operand) but failed to handle the V_FMA_F16_gfx9_e64 form of the opcode which is used on GFX9+. Differential Revision: https://reviews.llvm.org/D133489	2022-09-08 16:41:04 +01:00
Jonas Paulsson	de0e3117d4	[SystemZ] Improve handling of vector alignments. Make the DataLayout string always hold a vector alignment of 8 bytes, regardless of the vector ABI. This makes the datalayout depend only on the target triple which is the general expectation (in assertions). On older architectures where vectors use the natural alignment (16 bytes), the front end will maintain the same behavior and produce an overalignment compared to the datalayout. Reviewed By: uweigand Differential Revision: https://reviews.llvm.org/D131158	2022-09-08 17:33:05 +02:00
Thomas Lively	ac3b8df8f2	[WebAssembly] Prototype `f32x4.relaxed_dot_bf16x8_add_f32` As proposed in https://github.com/WebAssembly/relaxed-simd/issues/77. Only an LLVM intrinsic and a clang builtin are implemented. Since there is no bfloat16 type, use u16 to represent the bfloats in the builtin function arguments. Differential Revision: https://reviews.llvm.org/D133428	2022-09-08 08:07:49 -07:00
Joe Loser	5e96cea1db	[llvm] Use std::size instead of llvm::array_lengthof LLVM contains a helpful function for getting the size of a C-style array: `llvm::array_lengthof`. This is useful prior to C++17, but not as helpful for C++17 or later: `std::size` already has support for C-style arrays. Change call sites to use `std::size` instead. Differential Revision: https://reviews.llvm.org/D133429	2022-09-08 09:01:53 -06:00
Krzysztof Parzyszek	3c817574c2	[Hexagon] Handle shifts of short vectors of i8	2022-09-08 07:52:16 -07:00
Ivan Kosarev	57c943d581	[AMDGPU] Only raise wave priority if there is a long enough sequence of VALU instructions. Reviewed By: nhaehnle Differential Revision: https://reviews.llvm.org/D124671	2022-09-08 15:21:30 +01:00
liqinweng	723245bfac	[AARCH64][COST] Improve cost of reverse shuffles for AArch64 Update the comments for reverse shuffles and add tests Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D132730	2022-09-08 18:55:49 +08:00
liqinweng	9b4e75ee76	[RISCV][COST] Add cost model for mask vector select instruction when its condition is a scalar type Reviewed By: jacquesguan Differential Revision: https://reviews.llvm.org/D132992	2022-09-08 18:55:49 +08:00
David Spickett	e428baf001	[LLVM][ARM] Remove options for armv2, 2A, 3 and 3M Fixes #57486 These pre v4 architectures are not specifically supported by codegen. As demonstrated in the linked issue. GCC has not supported 3M since GCC 9 and presumably 2 and 2A earlier than that. So we are aligned in that sense. (see https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2abd6e34fcf3bd9f9ffafcaa47cdc3ed443f9add) This removes the options and associated testing. The Pre_v4 build attribute remains mainly because its absence would be more confusing. It will not be used other than to complete the list of build attributes as shown in the ABI. https://github.com/ARM-software/abi-aa/blob/main/addenda32/addenda32.rst#3352the-target-related-attributes Reviewed By: nickdesaulniers, peter.smith, rengolin Differential Revision: https://reviews.llvm.org/D133109	2022-09-08 09:49:48 +00:00
gonglingqin	d5f7a2182d	[LoongArch] Add codegen support for atomicrmw xchg operation on LA32 Depends on D131228 Differential Revision: https://reviews.llvm.org/D131229	2022-09-08 13:57:53 +08:00
gonglingqin	b60f801607	[LoongArch] Add codegen support for atomicrmw xchg operation on LA64 In order to avoid the patch being too large, the atomicrmw xchg operation on LA32 will be added later Differential Revision: https://reviews.llvm.org/D131228	2022-09-08 13:57:26 +08:00
Justin Bogner	a81c7dbf0d	[AMDGPU] Drop _oneuse checks from med3 patterns We use _oneuse checks to make sure combines won't accidentally increase code size, but this prevents the optimization in cases where we happen to want to clamp multiple values to the same range It's safe to drop these checks for two reasons: 1. The pattern of max/min operations for med3 is complicated enough it's unlikely to come up by accident, so this will still only fire when appropriate to do so 2. Even if every intermediate is used and we don't save a single operation, we still won't end up with more operations since the med3 replaces the final max/min. In pathological cases we could potentially end up with a larger encoding size or possibly slightly increased vgpr pressure, but the risk of that is low, especially considering the upside. Differential Revision: https://reviews.llvm.org/D132621	2022-09-07 16:31:49 -07:00
Stanislav Mekhanoshin	fb28bf3fb4	[AMDGPU] Fix liveness verifier error in hazard recognizer After D133067 we are inserting swaps to use a new physical register. I have noticed verifier errors about undefined physical register uses if we are tracking liveness post RA. We have no access to LIS at this point, so mark new register uses as undef to calm down the verifier. Liveness should not matter at this point anyway. Note the description of the RegState::Undef: "Value of the register doesn't matter." I.e. it does not say it is strictly undefined. In fact that is what we really need: this value does not matter. I also had to modify the test a bit since with tracking enabled it does not pass verification even before the recognizer. Differential Revision: https://reviews.llvm.org/D133459	2022-09-07 16:30:36 -07:00
Krzysztof Parzyszek	c37acb6426	[Hexagon] Move vectorization checks from subtarget to TTI	2022-09-07 14:47:24 -07:00
Stanislav Mekhanoshin	95d497ff2a	[AMDGPU] W/a hazard if 64 bit shift amount is a highest allocated VGPR In this case gfx90a uses v0 instead of the correct register. Swap the value temporarily with a lower register and then swap it back. Unfortunately hazard recognizer works after wait count insertion, so we cannot simply reuse an arbitrary register, hence w/a also includes a full waitcount. This can be avoided if we run it from expandPostRAPseudo, but that is a complete misplacement. Differential Revision: https://reviews.llvm.org/D133067	2022-09-07 14:23:49 -07:00
Jon Chesterfield	23f6c8d635	[amdgpu] Always, instead of mostly, remove unused LDS symbols Currently LDS variables are removed by the lower module pass if they have a use which is caught by the replace with struct control flow. This makes tests brittle to changes to that control flow which induces noise when trying to improve lowering. Some tests already check that variables are removed, while others checked that they are not removed. LDS variables are not (currently) externally accessible, and if that changes the machinery which makes them externally accessible will look like a use. This change therefore breaks no applications. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D133028	2022-09-07 18:28:16 +01:00
Philip Reames	a4a29438f4	[RISCV][MC] Add minimal support for Ztso extension This is a minimalist implementation which simply adds the extension (in the experimental namespace since it's not ratified), and wires up the setting of the required ELF header flag. Future changes will include codegen changes to exploit the stronger memory model. This is intended to implement v0.1 of the proposed specification which can be found in Chapter 25 of https://github.com/riscv/riscv-isa-manual/releases/download/draft-20220723-10eea63/riscv-spec.pdf. Differential Revision: https://reviews.llvm.org/D133239	2022-09-07 09:30:57 -07:00
Simon Pilgrim	e74102a963	[CostModel][X86] Merge getTypeBasedIntrinsicInstrCost into getIntrinsicInstrCost For the few non type based intrinsic cases we can just check for !isTypeBasedOnly() to access the args directly. I don't think we have a need to keep getTypeBasedIntrinsicInstrCost in BasicTTIImpl.h any more and can do a similar merge there as well - but it's a messier refactor and will take a while.	2022-09-07 12:04:09 +01:00
Jay Foad	96dfa523c2	[AMDGPU] Refactor SIFoldOperands. NFC. Refactor static functions into class methods so they have access to TII, MRI etc.	2022-09-07 11:05:01 +01:00
Marco Elver	31a548021b	[GlobalISel] Propagate PCSections metadata to MachineInstr Propagate (most) PC sections metadata to MachineInstr when GlobalISel is doing instruction selection. This change results in support for architectures using GlobalISel (such as -O0 with AArch64). Not all instructions may be supported yet, and requires further target-specific handling (such as done for AArch64 pseudo-atomics). Expanding supported instructions is planned on a case-by-case basis and new use cases for PC sections metadata. Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D130886	2022-09-07 11:36:02 +02:00
Marco Elver	0ba8886af5	[FastISel] Propagate PCSections metadata to MachineInstr Propagate PC sections metadata to MachineInstr when FastISel is doing instruction selection. Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D130884	2022-09-07 11:36:01 +02:00
Jay Foad	5291c3dd36	[AMDGPU] Simplify mad/mac patterns. NFC. Simplify instruction selection patterns for mad/mac: - Use any_fmad consistently to make it clear that all patterns treat fmad and AMDGPUfmad_ftz identically. - For mad, put the patterns on the instruction definitions. For mac the patterns are still out-of-line because we want to set AddedComplexity and to have special handling of the source modifiers. Differential Revision: https://reviews.llvm.org/D133305	2022-09-07 09:58:28 +01:00
Zi Xuan Wu (Zeson)	162131257f	[CSKY] Fix the compiling error about missing Log2 function with Log2_64	2022-09-07 14:49:40 +08:00
Xiang1 Zhang	c836ddaf72	[X86][NFC] Refine load/store reg to StackSlot for extensibility Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D133078	2022-09-07 14:35:42 +08:00
Simon Pilgrim	648e182d92	[CostModel][X86] getIntrinsicInstrCost - convert to CostKindTblEntry Begin the refactoring to use CostKindTblEntry and return real latency/codesize/sizelatency costs instead of reusing the throughput numbers This should allow us to merge getTypeBasedIntrinsicInstrCost into getIntrinsicInstrCost and remove all remaining references	2022-09-06 22:05:32 +01:00
raghavmedicherla	57f01fee1e	[AMDGPU/Metadata] Rename HSAMD::MetadataStreamer classes Renamed all HSAMD::MetadataStreamer classes to improve readability of the code. Differential Revision: https://reviews.llvm.org/D133156	2022-09-06 16:46:37 -04:00
Alexander Shaposhnikov	6a2442e9be	[AArch64] Increase AddedComplexity of BIC This diff adjusts AddedComplexity of BIC to bump its position in the list of patterns to make LLVM pick it instead of MVN + AND. MVN + AND requires 2 cycles, so does e.g. MOV + BIC, but the latter outperforms the former if the instructions producing the operands of BIC can be issued in parallel. One may consider the following example: ldur x15, [x0, #2] # 4 cycles mvn x10, x15 # 1 cycle (depends on ldur) and x9, x10, #0x8080808080808080 vs. ldur x15, [x0, #2] # 4 cycles mov x9, #0x8080808080808080 # 1 cycle (can be executed in parallel with ldur) bic x9, x9, x15. # 1 cycle Test plan: ninja check-all Differential revision: https://reviews.llvm.org/D133345	2022-09-06 20:31:24 +00:00
Markus Böck	f049b2c3fc	[MC] Emit Stackmaps before debug info This patch is essentially an alternative to https://reviews.llvm.org/D75836 and was mentioned by @lhames in a comment. The gist of the issue is that Mach-O has restrictions on which kind of sections are allowed after debug info has been emitted, which is also properly asserted within LLVM. Problem is that stack maps are currently emitted as one of the last sections in each target-specific AsmPrinter so far, which would cause the assertion to trigger. The current approach of special casing for the `__LLVM_STACKMAPS` section is not viable either, as downstream users can overwrite the stackmap format using plugins, which may want to use different sections. This patch fixes the issue by emitting the stack map earlier, right before debug info is emitted. The way this is implemented is by taking the choice when to emit the StackMap away from the target AsmPrinter and doing so in the base class. The only disadvantage of this approach is that the `StackMaps` member is now part of the base class, even for targets that do not support them. This is functionally not a problem however, as emitting an empty `StackMaps` is a no-op. Differential Revision: https://reviews.llvm.org/D132708	2022-09-06 20:20:56 +02:00
Guozhi Wei	3cf4ab5447	[AArch64] Add an option to reserve physical registers from RA This patch adds an option --reserve-regs-for-regalloc, so we can reserve a list of physical registers. These registers will not be used by register allocator, but can still be used as ABI requests such as passing arguments to function call. Its main purpose is simulating high register pressure by reserving many physical registers. So it will be much easier to test and debug register allocation changes. Differential Revision: https://reviews.llvm.org/D132717	2022-09-06 17:18:01 +00:00
Craig Topper	5d30565d80	[RISCV] Improve vector fround lowering by changing FRM. This is a follow up to D133238 which did this for ceil/floor. Reviewed By: arcbbb, frasercrmck Differential Revision: https://reviews.llvm.org/D133335	2022-09-06 09:33:13 -07:00
Simon Pilgrim	10e0f3e948	[CostModel][X86] Add CostKinds handling for ctpop ops This was achieved with an updated version of the 'cost-tables vs llvm-mca' script D103695 (although it still struggles with avx512 predicate numbers which had to be done manually) Some of the pre-AVX values still aren't great - atom/slm worst case numbers for ctpop expansion really affect these (especially throughput/latency), so we need to clean them up in a more consistent way - it's a pity we don't have models for more older cpus (merom/nehalem etc.) as other examples.	2022-09-06 17:27:24 +01:00
Matthias Gehre	2090e85fee	[llvm/CodeGen] Enable the ExpandLargeDivRem pass for X86, Arm and AArch64 This adds the ExpandLargeDivRem to the default pass pipeline. The limit at which it expands div/rem instructions is configured via a new TargetTransformInfo hook (default: no expansion) X86, Arm and AArch64 backends implement this hook to expand div/rem instructions with more than 128 bits. Differential Revision: https://reviews.llvm.org/D130076	2022-09-06 15:32:04 +01:00
Simon Pilgrim	83552e8c72	[CostModel][X86] Add CostKinds handling for SSE FCMP_ONE/FCMP_UEQ predicates These require special handling to account for their expansion in lowering. I'm trying very hard not to have to add predicate specific costs - but it might be inevitable.....	2022-09-06 12:05:22 +01:00
John Brawn	e26cadcc32	[ARM] Constant pools need 4-byte alignment if we only have tADR When the only ADR instruction we have is the 16-bit thumb one then all constant pool entries need to be 4-byte aligned, as tADR has an offset that's a multiple of 4. It looks like previously there happened to be no situations in which we encountered a constant pool entry with alignment less than 4, so failing to do this didn't cause any problems, but the expansion of cttz to a table added by D128911 does use a constant pool with alignment 1, so we now need to handle it correctly. Differential Revision: https://reviews.llvm.org/D133199	2022-09-06 11:36:12 +01:00
Simon Pilgrim	c1b5e36d74	[CostModel][X86] Add CostKinds handling for fcmp ops This was achieved with an updated version of the 'cost-tables vs llvm-mca' script D103695 (although it still struggles with avx512 predicate numbers which had to be done manually) SSE numbers are still too low for FCMP_ONE/FCMP_UEQ cases which expand to a more complex sequence than the existing 'ExtraCost' system can manage.	2022-09-06 10:34:53 +01:00
Freddy Ye	d5fa8b1c2c	[X86] Support SAE for VCVTPS2PH from intrinsic. For now, clang and gcc both failed to generate sae version from _mm512_cvt_roundps_ph: https://godbolt.org/z/oh7eTGY5z. Intrinsic guide description is also wrong, which will be updated soon. Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D132641	2022-09-06 11:28:12 +08:00
Craig Topper	f0332d12ae	[RISCV] Improve vector fceil/ffloor lowering by changing FRM. This adds new VFCVT pseudoinstructions that take a rounding mode operand. A custom inserter is used to insert additional instructions to change FRM around the VFCVT. Some of this is borrowed from D122860, but takes a somewhat different direction. We may migrate to that patch, but for now I was trying to keep this as independent from RVV intrinsics as I could. A followup patch will use this approach for FROUND too. Still need to fix the cost model. Reviewed By: arcbbb Differential Revision: https://reviews.llvm.org/D133238	2022-09-05 19:03:44 -07:00
gonglingqin	067aab0a85	[LoongArch] Fix annotations not matching predicates. NFC.	2022-09-06 09:14:20 +08:00
Eli Friedman	2b9cec6244	[ARM64EC 5/?] Fix names of __chkstk and __security_check_cookie. Part of initial Arm64EC patchset. Arm64EC code needs to use functions with a different name, to avoid using the x64 versions. Differential Revision: https://reviews.llvm.org/D125417	2022-09-05 13:19:54 -07:00
Eli Friedman	5637ec0983	[ARM64EC 4/?] Add LLVM support for varargs calling convention. Part of patchset to add initial support for ARM64EC. The ARM64EC calling convention is the same as ARM64 for non-varargs functions, but for varargs, the convention is significantly different. Basically, only x0-x3 registers are used for passing arguments, and x4 and x5 describe the address/size of the arguments passed in memory. (See https://docs.microsoft.com/en-us/windows/uwp/porting/arm64ec-abi for more details; see https://docs.microsoft.com/en-us/cpp/build/x64-calling-convention for the x64 calling convention rules, which this convention needs to match.) Note that this currently doesn't handle i128 arguments correctly; as noted in review, that's sort of complicated to handle, so I'm leaving it for a followup. Differential Revision: https://reviews.llvm.org/D125415	2022-09-05 13:05:48 -07:00
Eli Friedman	4658366d95	[ARM64EC 3/?] Mark reserved registers specific to ARM64EC ABI. Part of patchset to add initial support for ARM64EC. I'm not completely sure I understand the reason for this restriction, but Microsoft documentation says that asynchronous signals clobber these registers, so we can't ever use them. As far as I know, none of these registers have any hardcoded meaning, so reserving them shouldn't have any significant side-effects. Differential Revision: https://reviews.llvm.org/D125413	2022-09-05 12:59:39 -07:00
Eli Friedman	63335afb4e	[ARM64EC 2/?] Add target triple, and allow targeting it. Part of patchset to add initial support for ARM64EC. Per discussion on review, using the triple arm64ec-pc-windows-msvc. The parsing works the same way as Apple's alternate Arm ABI "arm64e". Differential Revision: https://reviews.llvm.org/D125412	2022-09-05 12:27:10 -07:00
David Green	3c6edc0b2f	[AArch64][GlobalISel] Recognise some CCMPri This is a simple addition to emitConditionalComparison, to match CCMP with immediates using getIConstantVRegValWithLookThrough, letting it select the CCMPri variants of the instructions. Differential Revision: https://reviews.llvm.org/D131073	2022-09-05 19:43:23 +01:00
Ivan Kosarev	5db8d6fd2b	[AMDGPU][CodeGen] Support (base \| offset) SMEM loads. Prevents generation of unnecessary s_or_b32 instructions. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D132552	2022-09-05 14:22:06 +01:00
Simon Pilgrim	bd0801cddf	[X86] Cleanup SLM SSE shift and CMPGTQ scheduler model numbers These were causing weird mismatches for the D103695 script report as I'm trying to enable cost kinds support for vector shift and integer comparisons. The SSE shifts by (non-constant) scalar are half-rate but still only 1uop and PCMPGT is half-rate and only on Pipe0 (although not as slow as PCMPEQQ which we already handle).	2022-09-05 13:44:05 +01:00
Ivan Kosarev	f33645301e	[AMDGPU][CodeGen] Support (soffset + offset) s_buffer_load's. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D130263	2022-09-05 12:53:05 +01:00
wanglei	bf47954703	[LoongArch] Add more fixups and relocations This patch makes the assembler support all modifiers defined in gnu-as. Also changes some diagnostic information. Differential Revision: https://reviews.llvm.org/D132633	2022-09-05 14:55:18 +08:00
gonglingqin	bc743bf666	[LoongArch] Add codegen support for fcopysign Differential Revision: https://reviews.llvm.org/D133185	2022-09-05 11:03:54 +08:00
Vitaly Buka	6c52736e02	Revert "[llvm] Use range-based for loops (NFC)" range-based loop should not be used here, as fixupImmediateBr push_backs into the container. http://lab.llvm.org/buildbot/#/builders/168 http://lab.llvm.org/buildbot/#/builders/74 http://lab.llvm.org/buildbot/#/builders/5 http://lab.llvm.org/buildbot/#/builders/239 http://lab.llvm.org/buildbot/#/builders/237 http://lab.llvm.org/buildbot/#/builders/236 This reverts commit `fedc59734a`.	2022-09-04 15:28:53 -07:00
Simon Pilgrim	8534f51474	[CostModel][X86] Add CostKinds handling for sqrt intrinsics This was achieved using the 'cost-tables vs llvm-mca' script from D103695 Some of the znver1/znver2 latency/throughput numbers were really weird (some copy+paste afaict) - I've used the numbers from the AMD SoG, which roughly match the 'worst case' range value from Agner	2022-09-04 18:39:21 +01:00
Simon Pilgrim	626a84db47	[CostModel][X86] getTypeBasedIntrinsicInstrCost - convert to CostKindTblEntry Begin the refactoring to use CostKindTblEntry and return real latency/codesize/sizelatency costs instead of reusing the throughput numbers	2022-09-04 17:59:08 +01:00
Simon Pilgrim	80d4b3a275	Revert rG06e73626cf0fc33b025a0f98f1eee4a302279982 "[CostModel][X86] getTypeBasedIntrinsicInstrCost - convert to CostKindTblEntry" Some arm buildbots are complaining about a phase ordering test failure in unsigned-multiply-overflow-check.ll - I guess this test needs making x86 specific first	2022-09-04 17:51:11 +01:00

69035 Commits