We added a new post-isel CTRLoop pass in D122125. That pass will expand
the hardware loop related intrinsic to CTR loop or normal loop based
on the loop context. So we don't need to conservatively check the CTR
clobber now on the IR level.
Reviewed By: lkail
Differential Revision: https://reviews.llvm.org/D135847
There are a few issues with the code we generate for atomic operations and the way we generate it:
- Hard coded CR0 for compares
- Order of operands for compares not conducive to
emitting compare-immediate or for CSE of compares
- Missing MachineMemOperand for st[bhwd]cx intrinsics
- Missing intrinsic properties for the same
- Unnecessary blocks with store conditional
instructions to clear reservation (which ends
up hindering performance)
- Move from CR instructions just to compare the
result of a store conditional with zero (even
though it is a record-form)
This patch aims to resolve all of those issues.
Differential revision: https://reviews.llvm.org/D134783
This has the effect of exposing the power-of-two property for use in memory op costing, but no target actually uses it yet. The main point of this change is simple consistency with the recently changes getArithmeticInstrCost, and to remove the last (interface) use of OperandValueKind.
This change completes the process of replacing OperandValueKind and OperandValueProperties which were previously passed independently in this API with a single container class which contains both.
This is the change which motivated the whole sequence which preceeded it. In an original spike version of this change, I'd noticed a nasty bug: I'd changed the signature without changing names, and as result, we silently passed additional information through a callsite which previously dropped the power-of-two fact. This might be harmless in most cases, but at least a couple clearly dependend for correctness on not passing that property through.
I did my best to split off prior changes which reduced the scope of this one, and which made it possible to use compiler assistance. For instance, every parameter which changes type in this change also changes name. This was intentional to make sure that every call site possible effected must show up in the diff. This let me audit each one closely.
Defaults to TCK_RecipThroughput - as most explicit calls were assuming TCK_RecipThroughput (vectorizers) or was just doing a before-vs-after comparison (vectorcombiner). Calls via getInstructionCost were just dropping the CostKind, so again there should be no change at this time (as getShuffleCost and its expansions don't use CostKind yet) - but it will make it easier for us to better account for size/latency shuffle costs in inline/unroll passes in the future.
Differential Revision: https://reviews.llvm.org/D132287
* Replace getUserCost with getInstructionCost, covering all cost kinds.
* Remove getInstructionLatency, it's not implemented by any backends, and we should fold the functionality into getUserCost (now getInstructionCost) to make it easier for targets to handle the cost kinds with their existing cost callbacks.
Original Patch by @samparker (Sam Parker)
Differential Revision: https://reviews.llvm.org/D79483
TragetLowering had two last InstructionCost related `getTypeLegalizationCost()`
and `getScalingFactorCost()` members, but all other costs are processed in TTI.
E.g. it is not comfortable to use other TTI members in these two functions
overrided in a target.
Minor refactoring: `getTypeLegalizationCost()` now doesn't need DataLayout
parameter - it was always passed from TTI.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D117723
In this patch we replace common code patterns with the use of utility
functions for dealing with profiling metadata. There should be no change
in functionality, as the existing checks should be preserved in all
cases.
Reviewed By: bogner, davidxl
Differential Revision: https://reviews.llvm.org/D128860
In this patch we replace common code patterns with the use of utility
functions for dealing with profiling metadata. There should be no change
in functionality, as the existing checks should be preserved in all
cases.
Reviewed By: bogner, davidxl
Differential Revision: https://reviews.llvm.org/D128860
In some passes we need a valid number of cache line size to do analysis or
transformation, e.g., loop cache analysis and loop date prefetch. However,
for some backend targets, `TTIImpl->getCacheLineSize()` is not implemented
and hence 'TTI.getCacheLineSize()' would just return 0 which eventually might
produce invalid result.
In this patch we add a user-specified opt/llc option for cache line size.
If the option is specified by users we use the value supplied, otherwise we
fall-back to the default value obtained from `TTIImpl->->getCacheLineSize()`.
The powerpc target already has such an option, this patch generalizes
this option to TargetTransformInfo.cpp.
Reviewed By: bmahjour, #loopoptwg
Differential Revision: https://reviews.llvm.org/D127342
This fixes bug 55463, similar to D78668. This is a temporary fix since
we will switch to post-isel CTR loop determination in the future.
Reviewed By: dim, shchenz
Differential Revision: https://reviews.llvm.org/D125746
The areFunctionArgsABICompatible() hook currently accepts a list of
pointer arguments, though what we're actually interested in is the
ABI compatibility after these pointer arguments have been converted
into value arguments.
This means that a) the current API is incompatible with opaque
pointers (because it requires inspection of pointee types) and
b) it can only be used in the specific context of ArgPromotion.
I would like to reuse the API when inspecting calls during inlining.
This patch converts it into an areTypesABICompatible() hook, which
accepts a list of types. This makes the method more generally usable,
and compatible with opaque pointers from an API perspective (the
actual usage in ArgPromotion/Attributor is still incompatible,
I'll follow up on that in separate patches).
Differential Revision: https://reviews.llvm.org/D116031
Commit 150681f increases
cost of producing MMA types (vector pair and quad).
However, it increases the cost for getUserCost() which is
used in unrolling. As a result, loops that contain these
types already (from the user code) cannot be unrolled
(even with the user's unroll pragma). This was an unintended
sideeffect. Reverting that portion of the commit to allow
unrolling such loops.
Differential revision: https://reviews.llvm.org/D115424
This patch prevents the optimizer from producing wide vectors in the IR,
specifically the MMA types (v256i1, v512i1). The idea is that on Power, we only
want to be producing these types only if the vector_pair and vector_quad types
are used specifically.
To prevent the optimizer from producing these types in the IR,
vectorCostAdjustmentFactor() is updated to return an instruction cost factor or
an invalid instruction cost if the current type is that of an MMA type. An
invalid instruction cost returned by this function signifies to other cost
computing functions to return the maximum instruction cost to inform the
optimizer that producing these types within the IR is expensive, and should not
be produced in the first place.
This issue was first seen in the test case included within this patch.
Differential Revision: https://reviews.llvm.org/D113900
It is possible to generate the llvm.fmuladd.ppcf128 intrinsic, and there is no actual
FMA instruction that corresponds to this intrinsic call for ppcf128. Thus, this
intrinsic needs to remain as a call as it cannot be lowered to any instruction, which
also means we need to disable CTR loop generation for fma involving the ppcf128 type.
This patch accomplishes this behaviour.
Differential Revision: https://reviews.llvm.org/D107914
I'm not sure this is the best way to approach this,
but the situation is rather not very detectable unless we explicitly call it out when refusing to advise to unroll.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D107271
This is a mechanical change. This actually also renames the
similarly named methods in the SmallString class, however these
methods don't seem to be used outside of the llvm subproject, so
this doesn't break building of the rest of the monorepo.
Added cost estimation for switch instruction, updated costs of branches, fixed
phi cost.
Had to increase `-amdgpu-unroll-threshold-if` default value since conditional
branch cost (size) was corrected to higher value.
Test renamed to "control-flow.ll".
Removed redundant code in `X86TTIImpl::getCFInstrCost()` and
`PPCTTIImpl::getCFInstrCost()`.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D96805
This patch changes the interface to take a RegisterKind, to indicate
whether the register bitwidth of a scalar register, fixed-width vector
register, or scalable vector register must be returned.
Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D98874