Currently per-function metadata consists of:
(start-pc, size, features)
This adds a new UAR feature and if it's set an additional element:
(start-pc, size, features, stack-args-size)
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D136078
I had reverted this before the holiday week because a problem was reported with a related change (D137140 - scalable vector known bits in DAG). I had initially confused the two patches, and then decided to leave this reverted out an abundance of caution. Now that we're through the holiday week, reapplying.
I also roled in fixes for several post commit review comments that hadn't landed with the original change.
Original commit message
This is a continuation of the series of patches adding lane wise support for scalable vectors in various knownbit-esq routines.
The basic idea here is that we track a single lane for scalable vectors which corresponds to an unknown number of lanes at runtime. This is enough for us to perform lane wise reasoning on many arithmetic operations.
Differential Revision: https://reviews.llvm.org/D137141
As discussed on Issue #59217, under certain circumstances the DAG can generate duplicate MUL and MUL_LOHI nodes, often during MULO legalization.
This patch attempts to replace MUL nodes with additional uses of the LO result from the MUL_LOHI node
Differential Revision: https://reviews.llvm.org/D138790
class support and introduce GlobalISel implementation for AMDGPU
Uses existing SelectionDAG lowering of the llvm.amdgcn.class intrinsic
for llvm.is.fpclass
This allows DemandedBits to see that the SVE count intrinsics (CNTB,
CNTH, CNTW, CNTD) sans multiplier will only ever produce small
positive integers. The maximum value you could get here is 256, which
is CNTB on a machine with a 2048bit vector size (the maximum for SVE).
Using this various redundant operations (zexts, sexts, ands, ors, etc)
can be eliminated.
Differential Revision: https://reviews.llvm.org/D138424
When a module contains more than one function, we should update debugify metadata
by increasing the number of variables in the function rather than overwritting it.
Previous revert issue is fixed: I forgot to strip all x86-related info from the
test.
Differential Revision: https://reviews.llvm.org/D136949
When a module contains more than one function, we should update debugify metadata
by increasing the number of variables in the function rather than overwritting it.
Differential Revision: https://reviews.llvm.org/D136949
This patch makes code less readable but it will clean itself after all functions are converted.
Differential Revision: https://reviews.llvm.org/D138665
A `select i1 %c, i1 true, i1 %d` is just an or and a `select i1 %c, i1 %d, i1 false`
is just an and. There are better treated as such in the logic of SelectOpt, allowing
the backend to optimize them to and/or directly.
Differential Revision: https://reviews.llvm.org/D138490
This patch replaces those occurrences of NoneType that would trigger
an error if the definition of NoneType were missing in None.h.
To keep this patch focused, I am deliberately not replacing None with
std::nullopt in this patch or updating comments. They will be
addressed in subsequent patches.
This is part of an effort to migrate from llvm::Optional to
std::optional:
https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
Differential Revision: https://reviews.llvm.org/D138539
or (xor x, y), x --> or x, y
or (xor x, y), y --> or x, y
or (xor x, y), (and x, y) --> or x, y
or (xor x, y), (or x, y) --> or x, y
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D138401
The existing behaviors and callbacks were overlapping and had very
confusing semantics: beginBasicBlock/endBasicBlock were not always
called, beginFragment/endFragment seemed like they were meant to mean
the same thing, but were slightly different, etc. This resulted in
confusing semantics, virtual method overloads, and control flow.
Remove the above, and replace with new beginBasicBlockSection and
endBasicBlockSection callbacks. And document them.
These are always called before the first and after the last blocks in
a function, even when basic-block-sections are disabled.
This is some quick debug messages for the SelectOptimize pass, adding
some information for the costs that are measured from getInstructionCost
calls, and re-using the existing optimization remarks to print some
information about if transforms were performed or not.
Differential Revision: https://reviews.llvm.org/D138108
This patch replaces:
return Optional<T>();
with:
return None;
to make the migration from llvm::Optional to std::optional easier.
Specifically, I can deprecate None (in my source tree, that is) to
identify all the instances of None that should be replaced with
std::nullopt.
Note that "return None" far outnumbers "return Optional<T>();". There
are more than 2000 instances of "return None" in our source tree.
All of the instances in this patch come from functions that return
Optional<T> except Archive::findSym and ASTNodeImporter::import, where
we return Expected<Optional<T>>. Note that we can construct
Expected<Optional<T>> from any parameter convertible to Optional<T>,
which None certainly is.
This is part of an effort to migrate from llvm::Optional to
std::optional:
https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
Differential Revision: https://reviews.llvm.org/D138464
This patch is an alternative of D100091. It solved the problems in `f80` type lowering.
Reviewed By: LuoYuanke
Differential Revision: https://reviews.llvm.org/D137946
The Assignment Tracking debug-info feature is outlined in this RFC:
https://discourse.llvm.org/t/
rfc-assignment-tracking-a-better-way-of-specifying-variable-locations-in-ir
Reviewed By: StephenTozer
Differential Revision: https://reviews.llvm.org/D136255
We generate erroneous trace for a basic block if it does not have at least one
predecessor when MinInstr strategy is used. Currently only this strategy is
implemented, so we always have a wrong trace for any entry block. This results in
wrong instructions heights calculation and also leads to wrong critical path.
The described behavior is demonstrated on a simple test. It shows that early
if-conv pass makes wrong decisions due to incorrectly calculated critical path
lenght.
Differential Revision: https://reviews.llvm.org/D138272
This patch replaces NoneType() and NoneType::None with None in
preparation for migration from llvm::Optional to std::optional.
In the std::optional world, we are not guranteed to be able to
default-construct std::nullopt_t or peek what's inside it, so neither
NoneType() nor NoneType::None has a corresponding expression in the
std::optional world.
Once we consistently use None, we should even be able to replace the
contents of llvm/include/llvm/ADT/None.h with something like:
using NoneType = std::nullopt_t;
inline constexpr std::nullopt_t None = std::nullopt;
to ease the migration from llvm::Optional to std::optional.
Differential Revision: https://reviews.llvm.org/D138376
Previously, we used SHA-1 for hashing the CodeView type records.
SHA-1 in `GloballyHashedType::hashType()` is coming top in the profiles. By simply replacing with BLAKE3, the link time is reduced in our case from 15 sec to 13 sec. I am only using MSVC .OBJs in this case. As a reference, the resulting .PDB is approx 2.1GiB and .EXE is approx 250MiB.
Differential Revision: https://reviews.llvm.org/D137101
If LogicNonShiftReg is the same to Shift1Base, and shift1 const is the same to MatchInfo.Shift2 const, CSEMIRBuilder will reuse the old shift1 when build shift2.
So, if we erase MatchInfo.Shift2 at the end, actually we remove old shift1. And it will cause crash later.
Solution for this issue is just erase it earlier to avoid the crash.
Fix#58423
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D138187
The major change is falling through to ComputeKnownBits when we don't have an implementation of ComputeNumSignBits due to conservatism over scalable vectors. Right now, we're mostly conservative in the same cases, but this allows our results to improve when we change ComputeKnownBits without also needing to improve ComputeNumSignBits at the same time.
This is a continuation of the series of patches adding lane wise support for scalable vectors in various knownbit-esq routines.
The basic idea here is that we track a single lane for scalable vectors which corresponds to an unknown number of lanes at runtime. This is enough for us to perform lane wise reasoning on many arithmetic operations.
Differential Revision: https://reviews.llvm.org/D137141
his is the SelectionDAG equivalent of D136470, and is thus an alternate patch to D128159.
The basic idea here is that we track a single lane for scalable vectors which corresponds to an unknown number of lanes at runtime. This is enough for us to perform lane wise reasoning on many arithmetic operations.
This patch also includes an implementation for SPLAT_VECTOR as without it, the lane wise reasoning has no base case. The original patch which inspired this (D128159), also included STEP_VECTOR. I plan to do that as a separate patch.
Differential Revision: https://reviews.llvm.org/D137140
This now allows folding an AND of a anyext masked_load to a
zext_masked_load even if the masked load has multiple users. Doing is
eliminates some redundant ANDs/MOVs for certain AArch64 SVE code.
I'm not sure if there's any cases where doing this could negatively the
other users of the masked_load. Looking at other optimizations of
masked loads, most don't apply if the load is used more than once, so it
doesn't look like this would interfere.
Reviewed By: c-rhodes
Differential Revision: https://reviews.llvm.org/D137844
Otherwise, the spill position may point to position where before
FrameSetup instructions. In which case, the spill instruction may store
to caller's frame since the stack pointer has not been adjustted.
Fixes https://github.com/llvm/llvm-project/issues/58286
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D135693
Add vp.inttoptr & vp.ptrtoint support by lowering them into
vp.zext / vp.truncate with in SelectionDAGBuilder.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D137169
A target can return if a misaligned access is 'fast' as defined
by the target or not. In reality there can be different levels
of 'fast' and 'slow'. This patch changes the boolean 'Fast'
argument of the allowsMisalignedMemoryAccesses family of functions
to an unsigned representing its speed.
A target can still define it as it wants and the direct translation
of the current code uses 0 and 1 for current false and true. This
makes the change an NFC.
Subsequent patch will start using an actual value of speed in
the load/store vectorizer to compare if a vectorized access going
to be not just fast, but not slower than before.
Differential Revision: https://reviews.llvm.org/D124217
Previously it was only being done if shouldAlignPointerArgs() returned
true, which right now is only true for ARM targets.
Updating the argument alignment attributes of memcpy/memset intrinsics
if the underlying object has larger alignment can be beneficial even
when CGP didn't increase alignment (as can be seen from the test changes),
so invert the loop and if condition.
Differential Revision: https://reviews.llvm.org/D134281
This patch adds tranformation of fmul+fadd/fsub chains to fused multiply
instructions:
* fmul+fadd->fmadd
* fmul+fsub->fmsub/fnmsub
We also will try to combine these instructions if the fmul has more than one use
and cannot be deleted. However, removing the dependence between fmul and fadd can
still be profitable, and we rely on machine combiner approximations of scheduling.
Differential Revision: https://reviews.llvm.org/D136764
Verify three cases of G_UNMERGE_VALUES separately:
1. Splitting a vector into subvectors (the converse of
G_CONCAT_VECTORS).
2. Splitting a vector into its elements (the converse of
G_BUILD_VECTOR).
3. Splitting a scalar into smaller scalars (the converse of
G_MERGE_VALUES).
Previously #1 allowed strange combinations like this:
%1:_(<2 x s16>),%2:_(<2 x s16>) = G_UNMERGE_VALUES %0(<2 x s32>)
This has been tightened up to check that the source and destination
element types match, and some MIR test cases updated accordingly.
Differential Revision: https://reviews.llvm.org/D111132
basic block section cases
MachineBlockPlacement pass sets an alignment attribute to the loop
header MBB and this attribute will lead to an alignment directive during
emitting asm. In the case of the basic block section, the alignment
directive is put before the section label, and thus the alignment is set
to the predecessor of the loop header, which is not what we expect and
increases the code size (both inserting nop and set section alignment).
Reviewed By: rahmanl
Differential Revision: https://reviews.llvm.org/D137535
D134260/D138107 exposed that the MachineCombiner was not copying
pcsections metadata where it should. This patch switches the MIBuild
methods to use MIMetadata that can copy the debug loc and pcsections at
the same time.
Differential Revision: https://reviews.llvm.org/D138112
Hide the underlying DbgValueInst by adding methods to extract the necessary
information and by adding a raw_ostream &operator<< overload to print it.
Remove the DebugLoc field as this is always the same as the DbgValueInst's
DebugLoc (see D136247).
Reviewed By: StephenTozer
Differential Revision: https://reviews.llvm.org/D136249
handleDebugValue has two DebugLoc parameters that appear to always take the
same value. Remove one of the duplicate parameters. See phabricator review for
more detail.
Reviewed By: StephenTozer
Differential Revision: https://reviews.llvm.org/D136247
AMDGPU legalizes i64 loads to loads of <2 x i32>, leaving the
i64 MMO with attached range metadata alone. The known bit width
was using the scalar element type, and asserting on a mismatch.
nearbyint has the property to execute without exception.
For not modifying fflags, the patch added new machine opcode
PseudoVFROUND_NOEXCEPT_V that expands vfcvt.x.f.v and vfcvt.f.x.v between a pair
of frflags and fsflags.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D137685
The patch also added function expandVPBSWAP to expand ISD::VP_BSWAP nodes.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D137928
We can reuse constants if we use SRL followed by AND and AND followed by SHL.
Similar was done to bitreverse previously.
Differential Revision: https://reviews.llvm.org/D138045
This reverts commit e05ce03cfa.
Caused asan use-after-poison to 4 DebugInfo/AMDGPU/ tests.
Triggered in PEI::replaceFrameIndicesBackward called llvm::MachineInstr::getNumOperands
Instruction G_IS_FPCLASS had an operand that represented floating-point
semantics of its first operand. It allowed types that have the same length,
like `bfloat16` and `half`, to be distinguished. Unfortunately, it is
not sufficient, as other operation still cannot distinguish such types.
Solution of this problem must be more general, so now this operand is removed.
Differential Revision: https://reviews.llvm.org/D138004
Ignorable operands don't impact instruction's behavior, we can safely do CSE on
the instruction.
It is split from D130919. It has big impact to some AMDGPU test cases.
For example in atomic_optimizations_raw_buffer.ll, when trying to check if the
following instruction can be CSEed
%37:vgpr_32 = V_MOV_B32_e32 0, implicit $exec
Function isCallerPreservedOrConstPhysReg is called on operand "implicit $exec",
this function is implemented as
- return TRI.isCallerPreservedPhysReg(Reg, MF) ||
+ return TRI.isCallerPreservedPhysReg(Reg, MF) || TII.isIgnorableUse(MO) ||
(MRI.reservedRegsFrozen() && MRI.isConstantPhysReg(Reg));
Both TRI.isCallerPreservedPhysReg and MRI.isConstantPhysReg return false on this
operand, so isCallerPreservedOrConstPhysReg is also false, it causes LLVM failed
to CSE this instruction.
With this patch TII.isIgnorableUse returns true for the operand $exec, so
isCallerPreservedOrConstPhysReg also returns true, it causes this instruction to
be CSEed with previous instruction
%14:vgpr_32 = V_MOV_B32_e32 0, implicit $exec
So I got different result from here. AMDGPU's implementation of isIgnorableUse
is
bool SIInstrInfo::isIgnorableUse(const MachineOperand &MO) const {
// Any implicit use of exec by VALU is not a real register read.
return MO.getReg() == AMDGPU::EXEC && MO.isImplicit() &&
isVALU(*MO.getParent()) && !resultDependsOnExec(*MO.getParent());
}
Since the operand $exec is not a real register read, my understanding is it's
reasonable to do CSE on such instructions.
Because more instructions are CSEed, so I get less instructions generated for
these tests.
Differential Revision: https://reviews.llvm.org/D137222
Adds the Complex Deinterleaving Pass implementing support for complex numbers in a target-independent manner, deferring to the TargetLowering for the given target to create a target-specific intrinsic.
Differential Revision: https://reviews.llvm.org/D114174
The TableGen implementation was using a homegrown implementation of
FunctionModRefInfo. This switches it to use MemoryEffects instead.
This makes the code simpler, and will allow exposing the full
representational power of MemoryEffects in the future. Among other
things, this will allow us to map IntrHasSideEffects to an
inaccessiblemem readwrite, rather than just ignoring it entirely
in most cases.
To avoid layering issues, this moves the ModRef.h header from IR
to Support, so that it can be included in the TableGen layer.
Differential Revision: https://reviews.llvm.org/D137641
When we match a pattern from m_GCst, the register type could be different from original op. So we can't replace the original op to vreg direct.
This code create a new constant with original op type then replace the original op.
Fix#58906
Reviewed By: arsenm, aemerson
Differential Revision: https://reviews.llvm.org/D137778
While working on this code to support outputs from callbr along indirect
branches, I kept making these changes again and again. Precommit these.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D137445
This is required for the upcoming backward PEI::replaceFrameIndices version.
Both forward and backward versions will use same code for debug instruction processing.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D137741
With https://reviews.llvm.org/D136627, now we have the metrics for profile staleness based on profile statistics, monitoring the profile staleness in real-time can help user quickly identify performance issues. For a production scenario, the build is usually incremental and if we want the real-time metrics, we should store/cache all the old object's metrics somewhere and pull them in a post-build time. To make it more convenient, this patch add an option to persist them into the object binary, the metrics can be reported right away by decoding the binary rather than polling the previous stdout/stderrs from a cache system.
For implementation, it writes the statistics first into a new metadata section(llvm.stats) then encode into a special ELF `.llvm_stats` section. The section data is formatted as a list of key/value pair so that future statistics can be easily extended. This is also under a new switch(`-persist-profile-staleness`)
In terms of size overhead, the metrics are computed at module level, so the size overhead should be small, measured on one of our internal service, it costs less than < 1MB for a 10GB+ binary.
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D136698
This was done as a test for D137302 and it makes sense to push these changes
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D137493
Caused by legacy min/max combines (select + cmp) asking for legalizer info
in prelegalizer (D135047 added combine to all_combines).
Combine still does not work for AMDGPU since destination opcode is custom,
not legal. Similar combine works on DAG since it asks for legal or custom.
Differential Revision: https://reviews.llvm.org/D137274
As discussed in https://reviews.llvm.org/D136474 -fmessage-length
creates problems with reproduciability in the PDB files.
This patch just drops that argument when writing the PDB file.
Reviewed By: hans
Differential Revision: https://reviews.llvm.org/D137322