Commit Graph

33175 Commits

Author SHA1 Message Date
Philip Reames f8c63a7fbf [SDAG] Allow scalable vectors in ComputeNumSignBits
This is a continuation of the series of patches adding lane wise support for scalable vectors in various knownbit-esq routines.

The basic idea here is that we track a single lane for scalable vectors which corresponds to an unknown number of lanes at runtime. This is enough for us to perform lane wise reasoning on many arithmetic operations.

Differential Revision: https://reviews.llvm.org/D137141
2022-11-18 10:50:06 -08:00
Matt Arsenault 08ec15e44b AMDGPU/GlobalISel: Fix strictfp fmul 2022-11-18 08:53:49 -08:00
Philip Reames bc0fea0d55 [SDAG] Allow scalable vectors in ComputeKnownBits
his is the SelectionDAG equivalent of D136470, and is thus an alternate patch to D128159.

The basic idea here is that we track a single lane for scalable vectors which corresponds to an unknown number of lanes at runtime. This is enough for us to perform lane wise reasoning on many arithmetic operations.

This patch also includes an implementation for SPLAT_VECTOR as without it, the lane wise reasoning has no base case. The original patch which inspired this (D128159), also included STEP_VECTOR. I plan to do that as a separate patch.

Differential Revision: https://reviews.llvm.org/D137140
2022-11-18 07:40:32 -08:00
Alexander Timofeev 32bd75716c PEI should be able to use backward walk in replaceFrameIndicesBackward.
The backward register scavenger has correct register
liveness information. PEI should leverage the backward register scavenger.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D137574
2022-11-18 15:57:34 +01:00
Phoebe Wang d558255650 [X86] Use lock add/sub for cases that we only care about the EFLAGS
This fixes #36373, #36905 and partial of #58685.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D137711
2022-11-18 21:43:47 +08:00
Benjamin Maxwell 34d88cf6cf [DAG] Allow folding AND of anyext masked_load with >1 user to zext version
This now allows folding an AND of a anyext masked_load to a
zext_masked_load even if the masked load has multiple users.  Doing is
eliminates some redundant ANDs/MOVs for certain AArch64 SVE code.

I'm not sure if there's any cases where doing this could negatively the
other users of the masked_load.  Looking at other optimizations of
masked loads, most don't apply if the load is used more than once, so it
doesn't look like this would interfere.

Reviewed By: c-rhodes

Differential Revision: https://reviews.llvm.org/D137844
2022-11-18 10:38:09 +00:00
luxufan 18c5f3c35d [RegisterScavenger][RISCV] Don't search for FrameSetup instrs if we were searching from Non-FrameSetup instrs
Otherwise, the spill position may point to position where before
FrameSetup instructions. In which case, the spill instruction may store
to caller's frame since the stack pointer has not been adjustted.

Fixes https://github.com/llvm/llvm-project/issues/58286

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D135693
2022-11-18 15:13:52 +08:00
Matt Arsenault fe5b9a6a11 AMDGPU/GlobalISel: Make strict fadd, fmul and fma legal 2022-11-17 20:50:04 -08:00
YingChi Long 7a715bf317
[VP] Add support for vp.inttoptr & vp.ptrtoint
Add vp.inttoptr & vp.ptrtoint support by lowering them into
vp.zext / vp.truncate with in SelectionDAGBuilder.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D137169
2022-11-18 10:42:24 +08:00
Stanislav Mekhanoshin bcaf31ec3f [AMDGPU] Allow finer grain control of an unaligned access speed
A target can return if a misaligned access is 'fast' as defined
by the target or not. In reality there can be different levels
of 'fast' and 'slow'. This patch changes the boolean 'Fast'
argument of the allowsMisalignedMemoryAccesses family of functions
to an unsigned representing its speed.

A target can still define it as it wants and the direct translation
of the current code uses 0 and 1 for current false and true. This
makes the change an NFC.

Subsequent patch will start using an actual value of speed in
the load/store vectorizer to compare if a vectorized access going
to be not just fast, but not slower than before.

Differential Revision: https://reviews.llvm.org/D124217
2022-11-17 09:23:53 -08:00
Philip Reames 4105794e66 [SDAG] Assert we don't see scalable VECTOR_SHUFFLES
It was pointed out in review of D137140 that this case should be impossible.  This patch converts an existing bailout into an assert instead.
2022-11-17 08:18:51 -08:00
Alex Richardson 754d25844a [CGP] Update MemIntrinsic alignment if possible
Previously it was only being done if shouldAlignPointerArgs() returned
true, which right now is only true for ARM targets.
Updating the argument alignment attributes of memcpy/memset intrinsics
if the underlying object has larger alignment can be beneficial even
when CGP didn't increase alignment (as can be seen from the test changes),
so invert the loop and if condition.

Differential Revision: https://reviews.llvm.org/D134281
2022-11-17 11:59:35 +00:00
Anton Sidorenko b6c790736e [MachineCombiner][RISCV] Add fmadd/fmsub/fnmsub instructions patterns
This patch adds tranformation of fmul+fadd/fsub chains to fused multiply
instructions:
  * fmul+fadd->fmadd
  * fmul+fsub->fmsub/fnmsub

We also will try to combine these instructions if the fmul has more than one use
and cannot be deleted. However, removing the dependence between fmul and fadd can
still be profitable, and we rely on machine combiner approximations of scheduling.

Differential Revision: https://reviews.llvm.org/D136764
2022-11-17 13:24:04 +03:00
Jay Foad 96a661de4b [GlobalISel] Better verification of G_UNMERGE_VALUES
Verify three cases of G_UNMERGE_VALUES separately:

1. Splitting a vector into subvectors (the converse of
   G_CONCAT_VECTORS).
2. Splitting a vector into its elements (the converse of
   G_BUILD_VECTOR).
3. Splitting a scalar into smaller scalars (the converse of
   G_MERGE_VALUES).

Previously #1 allowed strange combinations like this:
  %1:_(<2 x s16>),%2:_(<2 x s16>) = G_UNMERGE_VALUES %0(<2 x s32>)
This has been tightened up to check that the source and destination
element types match, and some MIR test cases updated accordingly.

Differential Revision: https://reviews.llvm.org/D111132
2022-11-17 08:19:57 +00:00
Sinan Lin 4ad8952d2d [CodeGen][BasicBlockSections] Fix wrong alignment directive placement in
basic block section cases

MachineBlockPlacement pass sets an alignment attribute to the loop
header MBB and this attribute will lead to an alignment directive during
emitting asm. In the case of the basic block section, the alignment
directive is put before the section label, and thus the alignment is set
to the predecessor of the loop header, which is not what we expect and
increases the code size (both inserting nop and set section alignment).

Reviewed By: rahmanl

Differential Revision: https://reviews.llvm.org/D137535
2022-11-17 15:01:57 +08:00
zhongyunde 8fbb6f8678 [NFC] Fix typo in comment
Address comment in https://reviews.llvm.org/D137936

Differential Revision: https://reviews.llvm.org/D138124
2022-11-16 23:35:53 +08:00
David Green 71609871dd [AArch64][MachineCombiner] Use MIMetadata to copy pcsections metadata to reassociated instructions.
D134260/D138107 exposed that the MachineCombiner was not copying
pcsections metadata where it should. This patch switches the MIBuild
methods to use MIMetadata that can copy the debug loc and pcsections at
the same time.

Differential Revision: https://reviews.llvm.org/D138112
2022-11-16 13:22:48 +00:00
Simon Pilgrim a92f5a08a1 [DAG] simplifySelect - add support for vselect(0, T, F) -> F fold
We still need to add handling for the non-zero T fold (which requires getBooleanContents handling)
2022-11-16 13:11:14 +00:00
OCHyams a1ac6efcb0 [NFC][SelectionDAG][DebugInfo] Refactor DanglingDebugInfo class
Hide the underlying DbgValueInst by adding methods to extract the necessary
information and by adding a raw_ostream &operator<< overload to print it.

Remove the DebugLoc field as this is always the same as the DbgValueInst's
DebugLoc (see D136247).

Reviewed By: StephenTozer

Differential Revision: https://reviews.llvm.org/D136249
2022-11-16 10:10:24 +00:00
OCHyams 9792744650 [NFC][SelectionDAG][DebugInfo] Remove duplicate parameter from handleDebugValue
handleDebugValue has two DebugLoc parameters that appear to always take the
same value. Remove one of the duplicate parameters. See phabricator review for
more detail.

Reviewed By: StephenTozer

Differential Revision: https://reviews.llvm.org/D136247
2022-11-16 09:59:35 +00:00
Matt Arsenault 116c894d72 DAG: Fix assert on load casted to vector with attached range metadata
AMDGPU legalizes i64 loads to loads of <2 x i32>, leaving the
i64 MMO with attached range metadata alone. The known bit width
was using the scalar element type, and asserting on a mismatch.
2022-11-15 23:28:55 -08:00
Yeting Kuo ed9638c44b [VP][RISCV] Add vp.nearbyint and RISC-V support.
nearbyint has the property to execute without exception.
For not modifying fflags, the patch added new machine opcode
PseudoVFROUND_NOEXCEPT_V that expands vfcvt.x.f.v and vfcvt.f.x.v between a pair
of frflags and fsflags.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D137685
2022-11-16 14:05:35 +08:00
Yeting Kuo 5c3ca10b09 [VP][RISCV] Add vp.bswap and RISC-V support.
The patch also added function expandVPBSWAP to expand ISD::VP_BSWAP nodes.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D137928
2022-11-16 11:36:38 +08:00
Craig Topper f387918dd8 [TargetLowering][RISCV][ARM][AArch64][Mips] Reduce the number of AND mask constants used by BSWAP expansion.
We can reuse constants if we use SRL followed by AND and AND followed by SHL.
Similar was done to bitreverse previously.

Differential Revision: https://reviews.llvm.org/D138045
2022-11-15 14:36:01 -08:00
Fangrui Song 6c7666a408 Revert D137574 "PEI should be able to use backward walk in replaceFrameIndicesBackward."
This reverts commit e05ce03cfa.

Caused asan use-after-poison to 4 DebugInfo/AMDGPU/ tests.
Triggered in PEI::replaceFrameIndicesBackward called llvm::MachineInstr::getNumOperands
2022-11-15 19:19:46 +00:00
Sanjay Patel fe05a0a3dd [SDAG] avoid udiv/urem transform for vector/scalar type mismatches
This solves the crashing from issue #58994.
I don't know anything about VE, so I don't know if the output
is as expected or even correct.
2022-11-15 11:01:18 -05:00
Alexander Timofeev e05ce03cfa PEI should be able to use backward walk in replaceFrameIndicesBackward.
The backward register scavenger has correct register
liveness information. PEI should leverage the backward register scavenger.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D137574
2022-11-15 15:20:25 +01:00
Serge Pavlov ec893da990 [GlobalISel] Remove semantic operand of G_IS_FPCLASS
Instruction G_IS_FPCLASS had an operand that represented floating-point
semantics of its first operand. It allowed types that have the same length,
like `bfloat16` and `half`, to be distinguished. Unfortunately, it is
not sufficient, as other operation still cannot distinguish such types.
Solution of this problem must be more general, so now this operand is removed.

Differential Revision: https://reviews.llvm.org/D138004
2022-11-15 15:48:05 +07:00
Guozhi Wei 11e86868c1 [MachineCSE] Allow CSE for instructions with ignorable operands
Ignorable operands don't impact instruction's behavior, we can safely do CSE on
the instruction.

It is split from D130919. It has big impact to some AMDGPU test cases.
For example in atomic_optimizations_raw_buffer.ll, when trying to check if the
following instruction can be CSEed

  %37:vgpr_32 = V_MOV_B32_e32 0, implicit $exec

Function isCallerPreservedOrConstPhysReg is called on operand "implicit $exec",
this function is implemented as

  -  return TRI.isCallerPreservedPhysReg(Reg, MF) ||
  +  return TRI.isCallerPreservedPhysReg(Reg, MF) || TII.isIgnorableUse(MO) ||
            (MRI.reservedRegsFrozen() && MRI.isConstantPhysReg(Reg));

Both TRI.isCallerPreservedPhysReg and MRI.isConstantPhysReg return false on this
operand, so isCallerPreservedOrConstPhysReg is also false, it causes LLVM failed
to CSE this instruction.

With this patch TII.isIgnorableUse returns true for the operand $exec, so
isCallerPreservedOrConstPhysReg also returns true, it causes this instruction to
be CSEed with previous instruction

  %14:vgpr_32 = V_MOV_B32_e32 0, implicit $exec

So I got different result from here. AMDGPU's implementation of isIgnorableUse
is

  bool SIInstrInfo::isIgnorableUse(const MachineOperand &MO) const {
    // Any implicit use of exec by VALU is not a real register read.
    return MO.getReg() == AMDGPU::EXEC && MO.isImplicit() &&
           isVALU(*MO.getParent()) && !resultDependsOnExec(*MO.getParent());
  }

Since the operand $exec is not a real register read, my understanding is it's
reasonable to do CSE on such instructions.

Because more instructions are CSEed, so I get less instructions generated for
these tests.

Differential Revision: https://reviews.llvm.org/D137222
2022-11-14 19:34:59 +00:00
Nicholas Guy d52e2839f3 [ARM][CodeGen] Add support for complex deinterleaving
Adds the Complex Deinterleaving Pass implementing support for complex numbers in a target-independent manner, deferring to the TargetLowering for the given target to create a target-specific intrinsic.

Differential Revision: https://reviews.llvm.org/D114174
2022-11-14 14:02:27 +00:00
Nikita Popov feda983ff8 [TableGen] Use MemoryEffects to represent intrinsic memory effects (NFCI)
The TableGen implementation was using a homegrown implementation of
FunctionModRefInfo. This switches it to use MemoryEffects instead.
This makes the code simpler, and will allow exposing the full
representational power of MemoryEffects in the future. Among other
things, this will allow us to map IntrHasSideEffects to an
inaccessiblemem readwrite, rather than just ignoring it entirely
in most cases.

To avoid layering issues, this moves the ModRef.h header from IR
to Support, so that it can be included in the TableGen layer.

Differential Revision: https://reviews.llvm.org/D137641
2022-11-14 10:52:04 +01:00
chenglin.bi 8482247900 [GlobalISel] Correct constant type in matchReassocConstantInnerLHS
When we match a pattern from m_GCst, the register type could be different from original op. So we can't replace the original op to vreg direct.
This code create a new constant with original op type then replace the original op.

Fix #58906

Reviewed By: arsenm, aemerson

Differential Revision: https://reviews.llvm.org/D137778
2022-11-13 19:20:07 +08:00
Matt Arsenault 3cfa03856f AtomicExpand: Support cmpxchg expansion for small FP types
Handles f16 atomics for AMDGPU.
2022-11-10 22:16:11 -08:00
Nick Desaulniers f2981a3bc9 [SelectDagISEL] refactor HandlePHINodesInSuccessorBlocks NFC.
While working on this code to support outputs from callbr along indirect
branches, I kept making these changes again and again. Precommit these.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D137445
2022-11-10 14:34:23 -08:00
Alexander Timofeev 27091e6227 [PEI][NFC] Refactoring of the debug instructions frame index replacement
This is required for the upcoming backward PEI::replaceFrameIndices version.
Both forward and backward versions will use same code for debug instruction processing.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D137741
2022-11-10 14:02:03 +01:00
wlei 47b0758049 [SampleFDO] Persist profile staleness metrics into binary
With https://reviews.llvm.org/D136627, now we have the metrics for profile staleness based on profile statistics, monitoring the profile staleness in real-time can help user quickly identify performance issues. For a production scenario, the build is usually incremental and if we want the real-time metrics, we should store/cache all the old object's metrics somewhere and pull them in a post-build time. To make it more convenient, this patch add an option to persist them into the object binary, the metrics can be reported right away by decoding the binary rather than polling the previous stdout/stderrs from a cache system.

For implementation, it writes the statistics first into a new metadata section(llvm.stats) then encode into a special ELF `.llvm_stats` section. The section data is formatted as a list of key/value pair so that future statistics can be easily extended. This is also under a new switch(`-persist-profile-staleness`)

In terms of size overhead, the metrics are computed at module level, so the size overhead should be small, measured on one of our internal service, it costs less than < 1MB for a 10GB+ binary.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D136698
2022-11-09 22:34:33 -08:00
chenglin.bi 597f444092 [TypePromotion] Replace Zext to Truncate for the case src bitwidth is larger
Fix: https://github.com/llvm/llvm-project/issues/58843

Reviewed By: samtebbs

Differential Revision: https://reviews.llvm.org/D137613
2022-11-09 05:08:01 +08:00
Nathan James 6aa050a690 Reland "[llvm][NFC] Use c++17 style variable type traits"
This reverts commit 632a389f96.

This relands commit
1834a310d0.

Differential Revision: https://reviews.llvm.org/D137493
2022-11-08 14:15:15 +00:00
Nathan James 632a389f96 Revert "[llvm][NFC] Use c++17 style variable type traits"
This reverts commit 1834a310d0.
2022-11-08 13:11:41 +00:00
Nathan James 1834a310d0
[llvm][NFC] Use c++17 style variable type traits
This was done as a test for D137302 and it makes sense to push these changes

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D137493
2022-11-08 12:22:52 +00:00
Petar Avramovic 838d5d371a AMDGPU/GlobalISel: Fix combine crash because LI is not set in prelegalizer
Caused by legacy min/max combines (select + cmp) asking for legalizer info
in prelegalizer (D135047 added combine to all_combines).
Combine still does not work for AMDGPU since destination opcode is custom,
not legal. Similar combine works on DAG since it asks for legal or custom.

Differential Revision: https://reviews.llvm.org/D137274
2022-11-08 12:46:16 +01:00
Tobias Hieta aa99b607b5
[clang][pdb] Don't include -fmessage-length in PDB buildinfo
As discussed in https://reviews.llvm.org/D136474 -fmessage-length
creates problems with reproduciability in the PDB files.

This patch just drops that argument when writing the PDB file.

Reviewed By: hans

Differential Revision: https://reviews.llvm.org/D137322
2022-11-08 10:05:59 +01:00
Mingming Liu 36e8e19337 [NFC][BlockPlacement]Add an option to renumber blocks based on function layout order.
Use case:
- When block layout is visualized after MBP pass, the basic blocks are labeled in layout order; meanwhile blocks could be numbered in a different order.
- As a result, it's hard to map between the graph and pass output. With this option on, the basic blocks are renumbered in function layout order.

This option is only useful when a function is to be visualized (i.e., when view options are on) to make it debugging only.

Use https://godbolt.org/z/5WTW36bMr as an example:
- As MBP pass output (shown in godbolt output window), `func2` is in a basic block numbered `2` (`bb.2`), and `func1` is in a basic block numbered `3` (`bb.3`);
    `bb.3` is a block with higher block frequency than `bb.2`, and `bb.3` is placed before `bb.2` in the functin layout.
- Use [1] to get the dot graph (graph uploaded in [2]), the blocks are re-numbered.
   - `func1` is in 'if.end' block, and labeled `1` in visualized dot; `func2` is in 'if.then' blocks, and labeled `3` --> the labeled number and bb number won't map.
   - [[ b5626ae975/llvm/lib/CodeGen/MachineBlockFrequencyInfo.cpp (L127) | DOTGraphTraits<MachineBlockFrequencyInfo *>::getNodeLabel ]] is where labeled numbers are based on function layout number, and [[ a8d93783f3/llvm/include/llvm/Support/GraphWriter.h (L209)
 | called by graph writer ]].
        So call 'MachineFunction::RenumberBlocks' would make labeled number (in dot graph) and block number (in pass output) consistent with each other.

[1] `./bin/clang++ -O3 -S -mllvm -view-block-layout-with-bfi=count -mllvm -view-bfi-func-name=_Z9func_loopv -mllvm -print-after=block-placement -mllvm  -filter-print-funcs=_Z9func_loopv test.c`

[2] {F25201785}

Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D137467
2022-11-07 07:52:45 -08:00
Thomas Preud'homme c8be35293c [SWP] Recognize mem carried dep with different base
The loop-carried dependency detection logic in isLoopCarriedDep relies
on the load and store using the same definition for the base register.
This misses the case of post-increment loads and stores whose base
register are different PHI initialized from the same initial value.

This commit extends the logic to accept the load and store having
different PHI base address provided that they had the same initial value
when entering the loop and are incremented by the same amount in each
loop.

Reviewed By: bcahoon

Differential Revision: https://reviews.llvm.org/D136463
2022-11-07 09:53:41 +00:00
Matt Arsenault 162d9030ab GlobalISel: Pass through AA metadata for target memory intrinsics
The corresponding change for the DAG was done in fa4aac7335
2022-11-06 22:14:12 -08:00
Amaury Séchet 82209fd96e [NFC] Refactor DAGCombiner::foldSelectOfConstants to reduce nesting 2.0 2022-11-05 17:10:06 +00:00
Amaury Séchet 7c05f092c9 [NFC] Refactor DAGCombiner::foldSelectOfConstants to reduce nesting 2022-11-05 16:17:58 +00:00
Shilei Tian 1186e9d59f [LLVM][AMDGPU] Specialize 32-bit atomic fadd instruction for generic address space
The 32-bit floating-point atomic add instructions on AMDGPUs does not support a
"flat" or "generic" address space. So, if the address space cannot be determined
statically, the AMDGPU backend will fall back to a CAS loop (which does support
"flat" addressing). Instead, this patch emits runtime address-space checks to
allow native FP atomic add instructions for global and LDS memory (and non-atomic
FP add instructions for private/scratch memory).

In order to do that, this patch introduces a new interface function
`emitExpandAtomicRMW`. It is expected to be called when a common atomic expand
doesn't work for a specific target, such as the case we discussed here.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D129690
2022-11-04 14:11:05 -04:00
Nikita Popov 304f1d59ca [IR] Switch everything to use memory attribute
This switches everything to use the memory attribute proposed in
https://discourse.llvm.org/t/rfc-unify-memory-effect-attributes/65579.
The old argmemonly, inaccessiblememonly and inaccessiblemem_or_argmemonly
attributes are dropped. The readnone, readonly and writeonly attributes
are restricted to parameters only.

The old attributes are auto-upgraded both in bitcode and IR.
The bitcode upgrade is a policy requirement that has to be retained
indefinitely. The IR upgrade is mainly there so it's not necessary
to update all tests using memory attributes in this patch, which
is already large enough. We could drop that part after migrating
tests, or retain it longer term, to make it easier to import IR
from older LLVM versions.

High-level Function/CallBase APIs like doesNotAccessMemory() or
setDoesNotAccessMemory() are mapped transparently to the memory
attribute. Code that directly manipulates attributes (e.g. via
AttributeList) on the other hand needs to switch to working with
the memory attribute instead.

Differential Revision: https://reviews.llvm.org/D135780
2022-11-04 10:21:38 +01:00
Haohai Wen e419620fc2 [CodeGenPrep] Change ValueToSExts from DeseMap to MapVector
mergeSExts iterates throught ValueToSExts. Using DenseMap result in
unstable optimization path so that output IR may vary even if the input
IR is same.

Reviewed By: wxiao3

Differential Revision: https://reviews.llvm.org/D137234
2022-11-04 11:15:18 +08:00