llvm-project

Commit Graph

Author	SHA1	Message	Date
Philip Reames	f8c63a7fbf	[SDAG] Allow scalable vectors in ComputeNumSignBits This is a continuation of the series of patches adding lane wise support for scalable vectors in various knownbit-esq routines. The basic idea here is that we track a single lane for scalable vectors which corresponds to an unknown number of lanes at runtime. This is enough for us to perform lane wise reasoning on many arithmetic operations. Differential Revision: https://reviews.llvm.org/D137141	2022-11-18 10:50:06 -08:00
Matt Arsenault	08ec15e44b	AMDGPU/GlobalISel: Fix strictfp fmul	2022-11-18 08:53:49 -08:00
Philip Reames	bc0fea0d55	[SDAG] Allow scalable vectors in ComputeKnownBits his is the SelectionDAG equivalent of D136470, and is thus an alternate patch to D128159. The basic idea here is that we track a single lane for scalable vectors which corresponds to an unknown number of lanes at runtime. This is enough for us to perform lane wise reasoning on many arithmetic operations. This patch also includes an implementation for SPLAT_VECTOR as without it, the lane wise reasoning has no base case. The original patch which inspired this (D128159), also included STEP_VECTOR. I plan to do that as a separate patch. Differential Revision: https://reviews.llvm.org/D137140	2022-11-18 07:40:32 -08:00
Alexander Timofeev	32bd75716c	PEI should be able to use backward walk in replaceFrameIndicesBackward. The backward register scavenger has correct register liveness information. PEI should leverage the backward register scavenger. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D137574	2022-11-18 15:57:34 +01:00
Phoebe Wang	d558255650	[X86] Use lock add/sub for cases that we only care about the EFLAGS This fixes #36373, #36905 and partial of #58685. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D137711	2022-11-18 21:43:47 +08:00
Benjamin Maxwell	34d88cf6cf	[DAG] Allow folding AND of anyext masked_load with >1 user to zext version This now allows folding an AND of a anyext masked_load to a zext_masked_load even if the masked load has multiple users. Doing is eliminates some redundant ANDs/MOVs for certain AArch64 SVE code. I'm not sure if there's any cases where doing this could negatively the other users of the masked_load. Looking at other optimizations of masked loads, most don't apply if the load is used more than once, so it doesn't look like this would interfere. Reviewed By: c-rhodes Differential Revision: https://reviews.llvm.org/D137844	2022-11-18 10:38:09 +00:00
luxufan	18c5f3c35d	[RegisterScavenger][RISCV] Don't search for FrameSetup instrs if we were searching from Non-FrameSetup instrs Otherwise, the spill position may point to position where before FrameSetup instructions. In which case, the spill instruction may store to caller's frame since the stack pointer has not been adjustted. Fixes https://github.com/llvm/llvm-project/issues/58286 Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D135693	2022-11-18 15:13:52 +08:00
Matt Arsenault	fe5b9a6a11	AMDGPU/GlobalISel: Make strict fadd, fmul and fma legal	2022-11-17 20:50:04 -08:00
YingChi Long	7a715bf317	[VP] Add support for vp.inttoptr & vp.ptrtoint Add vp.inttoptr & vp.ptrtoint support by lowering them into vp.zext / vp.truncate with in SelectionDAGBuilder. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D137169	2022-11-18 10:42:24 +08:00
Stanislav Mekhanoshin	bcaf31ec3f	[AMDGPU] Allow finer grain control of an unaligned access speed A target can return if a misaligned access is 'fast' as defined by the target or not. In reality there can be different levels of 'fast' and 'slow'. This patch changes the boolean 'Fast' argument of the allowsMisalignedMemoryAccesses family of functions to an unsigned representing its speed. A target can still define it as it wants and the direct translation of the current code uses 0 and 1 for current false and true. This makes the change an NFC. Subsequent patch will start using an actual value of speed in the load/store vectorizer to compare if a vectorized access going to be not just fast, but not slower than before. Differential Revision: https://reviews.llvm.org/D124217	2022-11-17 09:23:53 -08:00
Philip Reames	4105794e66	[SDAG] Assert we don't see scalable VECTOR_SHUFFLES It was pointed out in review of D137140 that this case should be impossible. This patch converts an existing bailout into an assert instead.	2022-11-17 08:18:51 -08:00
Alex Richardson	754d25844a	[CGP] Update MemIntrinsic alignment if possible Previously it was only being done if shouldAlignPointerArgs() returned true, which right now is only true for ARM targets. Updating the argument alignment attributes of memcpy/memset intrinsics if the underlying object has larger alignment can be beneficial even when CGP didn't increase alignment (as can be seen from the test changes), so invert the loop and if condition. Differential Revision: https://reviews.llvm.org/D134281	2022-11-17 11:59:35 +00:00
Anton Sidorenko	b6c790736e	[MachineCombiner][RISCV] Add fmadd/fmsub/fnmsub instructions patterns This patch adds tranformation of fmul+fadd/fsub chains to fused multiply instructions: * fmul+fadd->fmadd * fmul+fsub->fmsub/fnmsub We also will try to combine these instructions if the fmul has more than one use and cannot be deleted. However, removing the dependence between fmul and fadd can still be profitable, and we rely on machine combiner approximations of scheduling. Differential Revision: https://reviews.llvm.org/D136764	2022-11-17 13:24:04 +03:00
Jay Foad	96a661de4b	[GlobalISel] Better verification of G_UNMERGE_VALUES Verify three cases of G_UNMERGE_VALUES separately: 1. Splitting a vector into subvectors (the converse of G_CONCAT_VECTORS). 2. Splitting a vector into its elements (the converse of G_BUILD_VECTOR). 3. Splitting a scalar into smaller scalars (the converse of G_MERGE_VALUES). Previously #1 allowed strange combinations like this: %1:_(<2 x s16>),%2:_(<2 x s16>) = G_UNMERGE_VALUES %0(<2 x s32>) This has been tightened up to check that the source and destination element types match, and some MIR test cases updated accordingly. Differential Revision: https://reviews.llvm.org/D111132	2022-11-17 08:19:57 +00:00
Sinan Lin	4ad8952d2d	[CodeGen][BasicBlockSections] Fix wrong alignment directive placement in basic block section cases MachineBlockPlacement pass sets an alignment attribute to the loop header MBB and this attribute will lead to an alignment directive during emitting asm. In the case of the basic block section, the alignment directive is put before the section label, and thus the alignment is set to the predecessor of the loop header, which is not what we expect and increases the code size (both inserting nop and set section alignment). Reviewed By: rahmanl Differential Revision: https://reviews.llvm.org/D137535	2022-11-17 15:01:57 +08:00
zhongyunde	8fbb6f8678	[NFC] Fix typo in comment Address comment in https://reviews.llvm.org/D137936 Differential Revision: https://reviews.llvm.org/D138124	2022-11-16 23:35:53 +08:00
David Green	71609871dd	[AArch64][MachineCombiner] Use MIMetadata to copy pcsections metadata to reassociated instructions. D134260/D138107 exposed that the MachineCombiner was not copying pcsections metadata where it should. This patch switches the MIBuild methods to use MIMetadata that can copy the debug loc and pcsections at the same time. Differential Revision: https://reviews.llvm.org/D138112	2022-11-16 13:22:48 +00:00
Simon Pilgrim	a92f5a08a1	[DAG] simplifySelect - add support for vselect(0, T, F) -> F fold We still need to add handling for the non-zero T fold (which requires getBooleanContents handling)	2022-11-16 13:11:14 +00:00
OCHyams	a1ac6efcb0	[NFC][SelectionDAG][DebugInfo] Refactor DanglingDebugInfo class Hide the underlying DbgValueInst by adding methods to extract the necessary information and by adding a raw_ostream &operator<< overload to print it. Remove the DebugLoc field as this is always the same as the DbgValueInst's DebugLoc (see D136247). Reviewed By: StephenTozer Differential Revision: https://reviews.llvm.org/D136249	2022-11-16 10:10:24 +00:00
OCHyams	9792744650	[NFC][SelectionDAG][DebugInfo] Remove duplicate parameter from handleDebugValue handleDebugValue has two DebugLoc parameters that appear to always take the same value. Remove one of the duplicate parameters. See phabricator review for more detail. Reviewed By: StephenTozer Differential Revision: https://reviews.llvm.org/D136247	2022-11-16 09:59:35 +00:00
Matt Arsenault	116c894d72	DAG: Fix assert on load casted to vector with attached range metadata AMDGPU legalizes i64 loads to loads of <2 x i32>, leaving the i64 MMO with attached range metadata alone. The known bit width was using the scalar element type, and asserting on a mismatch.	2022-11-15 23:28:55 -08:00
Yeting Kuo	ed9638c44b	[VP][RISCV] Add vp.nearbyint and RISC-V support. nearbyint has the property to execute without exception. For not modifying fflags, the patch added new machine opcode PseudoVFROUND_NOEXCEPT_V that expands vfcvt.x.f.v and vfcvt.f.x.v between a pair of frflags and fsflags. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D137685	2022-11-16 14:05:35 +08:00
Yeting Kuo	5c3ca10b09	[VP][RISCV] Add vp.bswap and RISC-V support. The patch also added function expandVPBSWAP to expand ISD::VP_BSWAP nodes. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D137928	2022-11-16 11:36:38 +08:00
Craig Topper	f387918dd8	[TargetLowering][RISCV][ARM][AArch64][Mips] Reduce the number of AND mask constants used by BSWAP expansion. We can reuse constants if we use SRL followed by AND and AND followed by SHL. Similar was done to bitreverse previously. Differential Revision: https://reviews.llvm.org/D138045	2022-11-15 14:36:01 -08:00
Fangrui Song	6c7666a408	Revert D137574 "PEI should be able to use backward walk in replaceFrameIndicesBackward." This reverts commit `e05ce03cfa`. Caused asan use-after-poison to 4 DebugInfo/AMDGPU/ tests. Triggered in PEI::replaceFrameIndicesBackward called llvm::MachineInstr::getNumOperands	2022-11-15 19:19:46 +00:00
Sanjay Patel	fe05a0a3dd	[SDAG] avoid udiv/urem transform for vector/scalar type mismatches This solves the crashing from issue #58994. I don't know anything about VE, so I don't know if the output is as expected or even correct.	2022-11-15 11:01:18 -05:00
Alexander Timofeev	e05ce03cfa	PEI should be able to use backward walk in replaceFrameIndicesBackward. The backward register scavenger has correct register liveness information. PEI should leverage the backward register scavenger. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D137574	2022-11-15 15:20:25 +01:00
Serge Pavlov	ec893da990	[GlobalISel] Remove semantic operand of G_IS_FPCLASS Instruction G_IS_FPCLASS had an operand that represented floating-point semantics of its first operand. It allowed types that have the same length, like `bfloat16` and `half`, to be distinguished. Unfortunately, it is not sufficient, as other operation still cannot distinguish such types. Solution of this problem must be more general, so now this operand is removed. Differential Revision: https://reviews.llvm.org/D138004	2022-11-15 15:48:05 +07:00
Guozhi Wei	11e86868c1	[MachineCSE] Allow CSE for instructions with ignorable operands Ignorable operands don't impact instruction's behavior, we can safely do CSE on the instruction. It is split from D130919. It has big impact to some AMDGPU test cases. For example in atomic_optimizations_raw_buffer.ll, when trying to check if the following instruction can be CSEed %37:vgpr_32 = V_MOV_B32_e32 0, implicit $exec Function isCallerPreservedOrConstPhysReg is called on operand "implicit $exec", this function is implemented as - return TRI.isCallerPreservedPhysReg(Reg, MF) \|\| + return TRI.isCallerPreservedPhysReg(Reg, MF) \|\| TII.isIgnorableUse(MO) \|\| (MRI.reservedRegsFrozen() && MRI.isConstantPhysReg(Reg)); Both TRI.isCallerPreservedPhysReg and MRI.isConstantPhysReg return false on this operand, so isCallerPreservedOrConstPhysReg is also false, it causes LLVM failed to CSE this instruction. With this patch TII.isIgnorableUse returns true for the operand $exec, so isCallerPreservedOrConstPhysReg also returns true, it causes this instruction to be CSEed with previous instruction %14:vgpr_32 = V_MOV_B32_e32 0, implicit $exec So I got different result from here. AMDGPU's implementation of isIgnorableUse is bool SIInstrInfo::isIgnorableUse(const MachineOperand &MO) const { // Any implicit use of exec by VALU is not a real register read. return MO.getReg() == AMDGPU::EXEC && MO.isImplicit() && isVALU(MO.getParent()) && !resultDependsOnExec(MO.getParent()); } Since the operand $exec is not a real register read, my understanding is it's reasonable to do CSE on such instructions. Because more instructions are CSEed, so I get less instructions generated for these tests. Differential Revision: https://reviews.llvm.org/D137222	2022-11-14 19:34:59 +00:00
Nicholas Guy	d52e2839f3	[ARM][CodeGen] Add support for complex deinterleaving Adds the Complex Deinterleaving Pass implementing support for complex numbers in a target-independent manner, deferring to the TargetLowering for the given target to create a target-specific intrinsic. Differential Revision: https://reviews.llvm.org/D114174	2022-11-14 14:02:27 +00:00
Nikita Popov	feda983ff8	[TableGen] Use MemoryEffects to represent intrinsic memory effects (NFCI) The TableGen implementation was using a homegrown implementation of FunctionModRefInfo. This switches it to use MemoryEffects instead. This makes the code simpler, and will allow exposing the full representational power of MemoryEffects in the future. Among other things, this will allow us to map IntrHasSideEffects to an inaccessiblemem readwrite, rather than just ignoring it entirely in most cases. To avoid layering issues, this moves the ModRef.h header from IR to Support, so that it can be included in the TableGen layer. Differential Revision: https://reviews.llvm.org/D137641	2022-11-14 10:52:04 +01:00
chenglin.bi	8482247900	[GlobalISel] Correct constant type in matchReassocConstantInnerLHS When we match a pattern from m_GCst, the register type could be different from original op. So we can't replace the original op to vreg direct. This code create a new constant with original op type then replace the original op. Fix #58906 Reviewed By: arsenm, aemerson Differential Revision: https://reviews.llvm.org/D137778	2022-11-13 19:20:07 +08:00
Matt Arsenault	3cfa03856f	AtomicExpand: Support cmpxchg expansion for small FP types Handles f16 atomics for AMDGPU.	2022-11-10 22:16:11 -08:00
Nick Desaulniers	f2981a3bc9	[SelectDagISEL] refactor HandlePHINodesInSuccessorBlocks NFC. While working on this code to support outputs from callbr along indirect branches, I kept making these changes again and again. Precommit these. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D137445	2022-11-10 14:34:23 -08:00
Alexander Timofeev	27091e6227	[PEI][NFC] Refactoring of the debug instructions frame index replacement This is required for the upcoming backward PEI::replaceFrameIndices version. Both forward and backward versions will use same code for debug instruction processing. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D137741	2022-11-10 14:02:03 +01:00
wlei	47b0758049	[SampleFDO] Persist profile staleness metrics into binary With https://reviews.llvm.org/D136627, now we have the metrics for profile staleness based on profile statistics, monitoring the profile staleness in real-time can help user quickly identify performance issues. For a production scenario, the build is usually incremental and if we want the real-time metrics, we should store/cache all the old object's metrics somewhere and pull them in a post-build time. To make it more convenient, this patch add an option to persist them into the object binary, the metrics can be reported right away by decoding the binary rather than polling the previous stdout/stderrs from a cache system. For implementation, it writes the statistics first into a new metadata section(llvm.stats) then encode into a special ELF `.llvm_stats` section. The section data is formatted as a list of key/value pair so that future statistics can be easily extended. This is also under a new switch(`-persist-profile-staleness`) In terms of size overhead, the metrics are computed at module level, so the size overhead should be small, measured on one of our internal service, it costs less than < 1MB for a 10GB+ binary. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D136698	2022-11-09 22:34:33 -08:00
chenglin.bi	597f444092	[TypePromotion] Replace Zext to Truncate for the case src bitwidth is larger Fix: https://github.com/llvm/llvm-project/issues/58843 Reviewed By: samtebbs Differential Revision: https://reviews.llvm.org/D137613	2022-11-09 05:08:01 +08:00
Nathan James	6aa050a690	Reland "[llvm][NFC] Use c++17 style variable type traits" This reverts commit `632a389f96`. This relands commit `1834a310d0`. Differential Revision: https://reviews.llvm.org/D137493	2022-11-08 14:15:15 +00:00
Nathan James	632a389f96	Revert "[llvm][NFC] Use c++17 style variable type traits" This reverts commit `1834a310d0`.	2022-11-08 13:11:41 +00:00
Nathan James	1834a310d0	[llvm][NFC] Use c++17 style variable type traits This was done as a test for D137302 and it makes sense to push these changes Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D137493	2022-11-08 12:22:52 +00:00
Petar Avramovic	838d5d371a	AMDGPU/GlobalISel: Fix combine crash because LI is not set in prelegalizer Caused by legacy min/max combines (select + cmp) asking for legalizer info in prelegalizer (D135047 added combine to all_combines). Combine still does not work for AMDGPU since destination opcode is custom, not legal. Similar combine works on DAG since it asks for legal or custom. Differential Revision: https://reviews.llvm.org/D137274	2022-11-08 12:46:16 +01:00
Tobias Hieta	aa99b607b5	[clang][pdb] Don't include -fmessage-length in PDB buildinfo As discussed in https://reviews.llvm.org/D136474 -fmessage-length creates problems with reproduciability in the PDB files. This patch just drops that argument when writing the PDB file. Reviewed By: hans Differential Revision: https://reviews.llvm.org/D137322	2022-11-08 10:05:59 +01:00
Mingming Liu	36e8e19337	[NFC][BlockPlacement]Add an option to renumber blocks based on function layout order. Use case: - When block layout is visualized after MBP pass, the basic blocks are labeled in layout order; meanwhile blocks could be numbered in a different order. - As a result, it's hard to map between the graph and pass output. With this option on, the basic blocks are renumbered in function layout order. This option is only useful when a function is to be visualized (i.e., when view options are on) to make it debugging only. Use https://godbolt.org/z/5WTW36bMr as an example: - As MBP pass output (shown in godbolt output window), `func2` is in a basic block numbered `2` (`bb.2`), and `func1` is in a basic block numbered `3` (`bb.3`); `bb.3` is a block with higher block frequency than `bb.2`, and `bb.3` is placed before `bb.2` in the functin layout. - Use [1] to get the dot graph (graph uploaded in [2]), the blocks are re-numbered. - `func1` is in 'if.end' block, and labeled `1` in visualized dot; `func2` is in 'if.then' blocks, and labeled `3` --> the labeled number and bb number won't map. - [[ `b5626ae975/llvm/lib/CodeGen/MachineBlockFrequencyInfo.cpp (L127)` \| DOTGraphTraits<MachineBlockFrequencyInfo *>::getNodeLabel ]] is where labeled numbers are based on function layout number, and [[ `a8d93783f3/llvm/include/llvm/Support/GraphWriter.h (L209)` \| called by graph writer ]]. So call 'MachineFunction::RenumberBlocks' would make labeled number (in dot graph) and block number (in pass output) consistent with each other. [1] `./bin/clang++ -O3 -S -mllvm -view-block-layout-with-bfi=count -mllvm -view-bfi-func-name=_Z9func_loopv -mllvm -print-after=block-placement -mllvm -filter-print-funcs=_Z9func_loopv test.c` [2] {F25201785} Reviewed By: davidxl Differential Revision: https://reviews.llvm.org/D137467	2022-11-07 07:52:45 -08:00
Thomas Preud'homme	c8be35293c	[SWP] Recognize mem carried dep with different base The loop-carried dependency detection logic in isLoopCarriedDep relies on the load and store using the same definition for the base register. This misses the case of post-increment loads and stores whose base register are different PHI initialized from the same initial value. This commit extends the logic to accept the load and store having different PHI base address provided that they had the same initial value when entering the loop and are incremented by the same amount in each loop. Reviewed By: bcahoon Differential Revision: https://reviews.llvm.org/D136463	2022-11-07 09:53:41 +00:00
Matt Arsenault	162d9030ab	GlobalISel: Pass through AA metadata for target memory intrinsics The corresponding change for the DAG was done in `fa4aac7335`	2022-11-06 22:14:12 -08:00
Amaury Séchet	82209fd96e	[NFC] Refactor DAGCombiner::foldSelectOfConstants to reduce nesting 2.0	2022-11-05 17:10:06 +00:00
Amaury Séchet	7c05f092c9	[NFC] Refactor DAGCombiner::foldSelectOfConstants to reduce nesting	2022-11-05 16:17:58 +00:00
Shilei Tian	1186e9d59f	[LLVM][AMDGPU] Specialize 32-bit atomic fadd instruction for generic address space The 32-bit floating-point atomic add instructions on AMDGPUs does not support a "flat" or "generic" address space. So, if the address space cannot be determined statically, the AMDGPU backend will fall back to a CAS loop (which does support "flat" addressing). Instead, this patch emits runtime address-space checks to allow native FP atomic add instructions for global and LDS memory (and non-atomic FP add instructions for private/scratch memory). In order to do that, this patch introduces a new interface function `emitExpandAtomicRMW`. It is expected to be called when a common atomic expand doesn't work for a specific target, such as the case we discussed here. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D129690	2022-11-04 14:11:05 -04:00
Nikita Popov	304f1d59ca	[IR] Switch everything to use memory attribute This switches everything to use the memory attribute proposed in https://discourse.llvm.org/t/rfc-unify-memory-effect-attributes/65579. The old argmemonly, inaccessiblememonly and inaccessiblemem_or_argmemonly attributes are dropped. The readnone, readonly and writeonly attributes are restricted to parameters only. The old attributes are auto-upgraded both in bitcode and IR. The bitcode upgrade is a policy requirement that has to be retained indefinitely. The IR upgrade is mainly there so it's not necessary to update all tests using memory attributes in this patch, which is already large enough. We could drop that part after migrating tests, or retain it longer term, to make it easier to import IR from older LLVM versions. High-level Function/CallBase APIs like doesNotAccessMemory() or setDoesNotAccessMemory() are mapped transparently to the memory attribute. Code that directly manipulates attributes (e.g. via AttributeList) on the other hand needs to switch to working with the memory attribute instead. Differential Revision: https://reviews.llvm.org/D135780	2022-11-04 10:21:38 +01:00
Haohai Wen	e419620fc2	[CodeGenPrep] Change ValueToSExts from DeseMap to MapVector mergeSExts iterates throught ValueToSExts. Using DenseMap result in unstable optimization path so that output IR may vary even if the input IR is same. Reviewed By: wxiao3 Differential Revision: https://reviews.llvm.org/D137234	2022-11-04 11:15:18 +08:00

1 2 3 4 5 ...

33175 Commits