Soft deprecate isNullValue/isAllOnesValue and update in-tree
callers. This matches the changes to the APInt interface from
D109483.
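A minimal sketch of what an updated caller looks like, assuming the new spellings mirror the APInt names from D109483 (isZero/isAllOnes); APInt is used here purely for illustration:

    #include "llvm/ADT/APInt.h"

    // Previously spelled V.isNullValue() and V.isAllOnesValue().
    static bool isZeroOrAllOnes(const llvm::APInt &V) {
      return V.isZero() || V.isAllOnes();
    }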
Reviewed By: lattner
Differential Revision: https://reviews.llvm.org/D109535
Added support to check whether the architecture supports s_mulhi, which is
used as part of the decision whether or not to use a VALU 24-bit mul (if the
mulhi gets transformed to a VALU op anyway, then we may as well use it).
This is an extension of the work in D97063
Differential Revision: https://reviews.llvm.org/D103321
Change-Id: I80b1323de640a52623d69ac005a97d06a5d42a14
Adds legalizer, register bank select, and instruction
select support for G_SBFX and G_UBFX. These opcodes generate
scalar or vector ALU bitfield extract instructions for
AMDGPU. The instructions allow either constant or register
values for the offset and width operands.
The 32-bit scalar version is expanded to a sequence that
combines the offset and width into a single register.
There are no 64-bit vgpr bitfield extract instructions, so the
operations are expanded to a sequence of instructions that
implement the operation. If the width is a constant,
then the 32-bit bitfield extract instructions are used.
Moved the AArch64 specific code for creating G_SBFX to
CombinerHelper.cpp so that it can be used by other targets.
Only bitfield extracts with constant offset and width values
are handled currently.
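For reference, a sketch of the semantics being selected (a plain C++ model, not the backend code); the width is assumed to be at least 1:

    #include <cstdint>

    // G_UBFX dst, src, offset, width: extract the field and zero-extend it.
    static uint32_t ubfx32(uint32_t Src, unsigned Offset, unsigned Width) {
      uint32_t Mask = Width < 32 ? (1u << Width) - 1u : ~0u;
      return (Src >> Offset) & Mask;
    }

    // G_SBFX dst, src, offset, width: the same field, sign-extended.
    static int32_t sbfx32(uint32_t Src, unsigned Offset, unsigned Width) {
      uint32_t Field = ubfx32(Src, Offset, Width);
      uint32_t Sign = 1u << (Width - 1);
      return (int32_t)((Field ^ Sign) - Sign);
    }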
Differential Revision: https://reviews.llvm.org/D100149
Add SReg_224, VReg_224, AReg_224, etc.
Link 224-bit types with v7i32/v7f32.
Link existing 192-bit types to newly added v3i64/v3f64/v6i32/v6f32.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D104622
- Take the same principle as the conversion from f64 to i64, with the extra
  pre- and post-processing that is necessary. This reduces the conversion
  sequence to about half the length of the legacy one.
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D104427
We were not reporting isFNegFree for v2f32, although it is effectively
free after legalization. The generic combine was pulling fneg out of
the fma source operands, and the AMDGPU combine was doing the
opposite.
In this particular example, we had a crash when compiling it
for several architectures. This patch extends the legalization
of extract_subvector to avoid this problem.
Differential Revision: https://reviews.llvm.org/D103344
When flat scratch is used, the stack pointer needs to be added when
writing arguments to the stack.
For buffer instructions, this is done in SelectMUBUFScratchOffen
and SelectMUBUFScratchOffset.
Move that to call argument lowering, like it is done in GlobalISel.
Differential Revision: https://reviews.llvm.org/D103166
Accesses to the global module LDS variable start from null,
but the kernel also thinks its own variables' start address is
null. Fixed by not using null as an address.
Differential Revision: https://reviews.llvm.org/D102882
Improve the code generation of fp_to_sint
and fp_to_uint for 16-bit integers.
Differential Revision: https://reviews.llvm.org/D101481
Patch by Julien Pagès!
Prefer to keep uniform (non-divergent) multiplies on the scalar ALU when
possible. This significantly improves some game cases by eliminating
v_readfirstlane instructions when the result feeds into a scalar
operation, like the address calculation for a scalar load or store.
Since isDivergent is only an approximation of whether a value is in
SGPRs, it can potentially regress some situations where a uniform value
ends up in a VGPR. These should be rare in real code, although the test
changes do contain a number of examples.
Most of the test changes are just using s_mul instead of v_mul/mad which
is generally better for both register pressure and latency (at least on
GFX10 where sgpr pressure doesn't affect occupancy and vector ALU
instructions have significantly longer latency than scalar ALU). Some
R600 tests now use MULLO_INT instead of MUL_UINT24.
GlobalISel appears to handle more scenarios in the desirable way,
although it can also be thrown off and fails to select the 24-bit
multiplies in some cases.
An alternative solution, considered and rejected, was to allow selecting
MUL_[UI]24 to S_MUL_I32. I rejected this because those SD operations are
defined with the most significant 8 bits of their operands as don't-care,
and this fact is used in some combines via SimplifyDemandedBits.
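A sketch of the relevant semantics as I understand them (reference model only): MUL_U24 reads just the low 24 bits of each operand, while S_MUL_I32 reads all 32, so the two agree only when the high operand bits are already clean:

    #include <cstdint>

    // MUL_U24-style reference: the top 8 bits of each operand are don't-care.
    static uint32_t mul_u24_ref(uint32_t A, uint32_t B) {
      return (A & 0xffffffu) * (B & 0xffffffu);
    }
    // S_MUL_I32 would compute A * B with all 32 operand bits, which differs
    // whenever SimplifyDemandedBits has left garbage in bits [31:24].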
Based on a patch by Nicolai Hähnle.
Differential Revision: https://reviews.llvm.org/D97063
Add a calling convention called amdgpu_gfx for real function calls
within graphics shaders. For the moment, this uses the same calling
convention as other calls in amdgpu, with registers excluded for return
address, stack pointer and stack buffer descriptor.
Differential Revision: https://reviews.llvm.org/D88540
This commit marks i16 MULH as expand in the AMDGPU backend,
which is necessary after the refactoring in D80485.
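For reference, what the expanded i16 MULH computes, modelled in plain C++ (illustrative, not the backend code):

    #include <cstdint>

    // mulhs on i16: widen with sign, multiply, keep the high 16 bits.
    static int16_t mulhs_i16(int16_t A, int16_t B) {
      return (int16_t)(((int32_t)A * (int32_t)B) >> 16);
    }

    // mulhu on i16: same with zero-extension.
    static uint16_t mulhu_i16(uint16_t A, uint16_t B) {
      return (uint16_t)(((uint32_t)A * (uint32_t)B) >> 16);
    }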
Differential Revision: https://reviews.llvm.org/D89965
These were introduced in r279902 on the grounds that using separate
MUL_U24/MUL_I24 and MULHI_U24/MULHI_I24 nodes would introduce multiple
uses of the operands, which would prevent SimplifyDemandedBits from
simplifying the operands.
This has since been fixed by D24672 "AMDGPU/SI: Use new SimplifyDemandedBits helper for multi-use operations".
No functional change intended. At least it has no effect on lit tests.
Differential Revision: https://reviews.llvm.org/D89706
In most of lib/Target we know that we are not dealing with scalable
types so it's perfectly fine to replace TypeSize comparison operators
with their fixed width equivalents, making use of getFixedSize()
and so on.
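A minimal sketch of the kind of replacement involved (illustrative code, not lifted from the patch):

    #include "llvm/Support/TypeSize.h"
    using llvm::TypeSize;

    // With fixed-width types the implicit TypeSize comparison can be replaced
    // by an explicit comparison of the known-fixed sizes.
    static bool narrowerThan(TypeSize A, TypeSize B) {
      return A.getFixedSize() < B.getFixedSize();  // was: A < B
    }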
Differential Revision: https://reviews.llvm.org/D89101
The versions that take 'unsigned' will be removed in the future.
I tried to use getOriginalAlign instead of getAlign in some
places. getAlign factors in the minimum alignment implied by
the offset in the pointer info. Since we're also passing the
pointer info we can use the original alignment.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D87592
From the code after the 'break', they are processing 64-bit scalar and
vector bitcasts. So I think the break condition should be (cond1 || cond2),
which means we only execute the following code if (64-bit and dest-is-vector).
Also remove a previous fix which is not needed with this new fix.
(introduced in: 1349a04ef5)
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D85804
Changes the Offset arguments to both functions from int64_t to TypeSize
& updates all uses of the functions to create the offset using TypeSize::Fixed()
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D85220
This patch stops unconditionally transforming FSUB(-0,X) into an FNEG(X) while building the DAG. There is also one small change to handle the new FSUB(-0,X) similarly to FNEG(X) in the AMDGPU backend.
Differential Revision: https://reviews.llvm.org/D84056
The range that f16 can represent fits into i32.
Lower as f16->i32->i64 instead of f16->f32->i64,
since f32->i64 has a long expansion.
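A sketch of the reasoning (illustrative C++): the largest finite f16 magnitude is 65504, so converting through i32 first loses nothing before widening:

    #include <cstdint>

    // fp_to_sint f16 -> i64, lowered via i32. The f16 value is shown as a
    // float here only for illustration.
    static int64_t f16_to_i64(float H) {
      int32_t Narrow = (int32_t)H;  // fp_to_sint f16 -> i32
      return (int64_t)Narrow;       // sext i32 -> i64
    }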
Differential Revision: https://reviews.llvm.org/D84166
These are treated identically to value aggregates placed in the kernel
argument list. A %struct.foo or %struct.foo addrspace(4)*
byref(sizeof(%struct.foo)) align(alignof(%struct.foo)) argument should
produce the same offsets and argument metadata.
This handles all 3 kernel ABI implementations, and the two HSA
metadata emission paths.
Fix two obvious errors in the code and also update the test check.
Also add one test to catch the failure.
Patch by Ruiling Song!
Differential Revision: https://reviews.llvm.org/D83280
This function is deceptive at best: it doesn't return what you'd expect.
If you have an arbitrary GlobalValue and you want to determine the
alignment of that pointer, Value::getPointerAlignment() returns the
correct value. If you want the actual declared alignment of a function
or variable, GlobalObject::getAlignment() returns that.
This patch switches all the users of GlobalValue::getAlignment to an
appropriate alternative.
Differential Revision: https://reviews.llvm.org/D80368
Implement them on top of sdiv/udiv, similar to what we do for integer
types.
Potential future work: implementing i8/i16 srem/urem, optimizations for
constant divisors, optimizing the mul+sub to mls.
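A sketch of the standard expansion this relies on (reference semantics, not the actual lowering code):

    #include <cstdint>

    // srem expanded on top of sdiv: rem = a - (a / b) * b.
    // The trailing mul+sub is the part that could later fold into mls.
    static int32_t srem_via_sdiv(int32_t A, int32_t B) {
      int32_t Q = A / B;  // sdiv
      return A - Q * B;   // mul + sub
    }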
Differential Revision: https://reviews.llvm.org/D81511
This was promoting booleans to i32 to perform a comparison against
them to feed to a select condition. Just use the booleans
directly. This produces the same final code, since the combiner is
unable to undo the mess this creates. I untangled this logic when I
ported this code to GlobalISel, so port the cleanups back.
Tweak a few constant expressions involving numbers::pi etc to avoid
rounding errors. NFCI though it's possible some of these will now be
more accurate in the last bit.
We have getNegatibleCost/getNegatedExpression to evaluate the cost of negating an expression and to build the negated expression.
However, during the negation the cost might change, since we are changing the DAG,
and we then hit an assertion if we negated the wrong expression, because the cost is no longer trustworthy.
This patch removes getNegatibleCost, to avoid it getting out of sync with getNegatedExpression,
and instead checks the cost while negating the expression. It also reduces the code duplicated between
getNegatibleCost and getNegatedExpression, and fixes the crash for the test in D76638.
Reviewed By: RKSimon, spatel
Differential Revision: https://reviews.llvm.org/D77319
We need to use it to handle <16 x double> indirect indexes
in the AMDGPU BE.
The only visible change from adding it is in the ARM cost model,
which looks reasonable to me. Previously, doubling the vector size
quadrupled the cost up to size 8 and then only doubled it. Now it
also quadruples beyond that, which seems like a logical progression
to me.
Actual AMDGPU code is to follow; this is the common part, plus
load/store legalization in the AMDGPU BE so as not to break what
works now.
Differential Revision: https://reviews.llvm.org/D79952
We can produce such vectors in the Promote Alloca pass,
but we are unable to use movrel to operate on them and instead
lower them via scratch. Making the type legal makes SI_INDIRECT
patterns work.
There is more work to do in subsequent changes:
1. We initialize m0 twice to access each dword. It should
be possible to do it only once and increment the base register
number instead.
2. We also need v16i64/v16f64 but these first need to be
added to tablegen.
Differential Revision: https://reviews.llvm.org/D79808
We have getNegatibleCost/getNegatedExpression to evaluate the cost of negating an expression and to build the negated expression.
However, during the negation the cost might change, since we are changing the DAG,
and we then hit an assertion if we negated the wrong expression, because the cost is no longer trustworthy.
This patch removes getNegatibleCost, to avoid it getting out of sync with getNegatedExpression,
and instead checks the cost while negating the expression. It also reduces the code duplicated between
getNegatibleCost and getNegatedExpression, and fixes the crash for the test in D76638.
Reviewed By: RKSimon, spatel
Differential Revision: https://reviews.llvm.org/D77319
This patch allows ISD::FSHR(i32) patterns to lower to ALIGNBIT instructions.
This improves test coverage of ISD::FSHR matching - x86 has both FSHL/FSHR instructions and we prefer FSHL by default.
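For reference, a sketch of why the lowering is valid (a plain C++ model of i32 fshr, with the shift amount taken modulo 32):

    #include <cstdint>

    // fshr(a, b, s) on i32: shift the 64-bit concatenation {a:b} right by
    // s mod 32 and keep the low 32 bits, which is the same thing a 32-bit
    // align-bit style instruction computes from two registers and a shift.
    static uint32_t fshr32(uint32_t A, uint32_t B, uint32_t S) {
      uint64_t Concat = ((uint64_t)A << 32) | B;
      return (uint32_t)(Concat >> (S & 31));
    }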
Differential Revision: https://reviews.llvm.org/D76070
Instead, emit a trap and a warning. We force inlining of this
situation, so any function where this happens should be dead as
indirect or external calls are not yet supported. This should avoid
erroring on dead code.
Interpret these as extending to the next multiple of 32 bits. This had
no effect with i48 for example, which is really split into {i32, i16},
which should extend the high part.
GetDemandedBits mostly just calls SimplifyMultipleUseDemandedBits now, but it does a very blunt constant simplification that SimplifyMultipleUseDemandedBits avoids.
If we need to demand bits from constants we should handle this through ShrinkDemandedConstant/targetShrinkDemandedConstant.
@arsenm confirmed that the sign extended immediates are better for code size.
Differential Revision: https://reviews.llvm.org/D74857
We probably want this, and I've meant to turn this on for a long
time. SC actually emits a special case to early-out for a 1
denominator, which perhaps should also be considered.
The isNegatibleForFree/getNegatedExpression methods currently rely on a raw char value to indicate whether a negation is beneficial or not.
This patch replaces the char return value with an NegatibleCost enum to more clearly demonstrate what is implied.
It also renames isNegatibleForFree to getNegatibleCost to more accurately reflect what's going on.
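A sketch of the shape of the new return type as described (the enumerator names here are assumed, not copied from the patch):

    // An enum makes the intent explicit where the raw char value did not.
    enum class NegatibleCost {
      Cheaper,   // the negated form is cheaper than the original
      Neutral,   // the negated form costs the same
      Expensive  // the negated form is more expensive
    };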
Differential Revision: https://reviews.llvm.org/D74221
https://reviews.llvm.org/D72312 introduced an infinite loop which involves
DAGCombiner::visitFMA and AMDGPUTargetLowering::performFNegCombine.
fma( a, fneg(b), fneg(c) ) => fneg( fma (a, b, c) ) => fma( a, fneg(b), fneg(c) ) ...
This only breaks with types where 'isFNegFree' returns false, e.g. v4f32.
Reproducing the issue also requires the attribute 'no-signed-zeros-fp-math',
and that no source modifiers are allowed on one of the users of the Op.
This fix indicates that it is not free to negate an fma if it has users
with source modifiers.
Differential Revision: https://reviews.llvm.org/D73939
Prepare to accurately track the future denormal-fp-math attribute
changes. The way to actually set these separately is not wired in yet.
This is just a mechanical change, and mostly still assumes the input
and output mode match. This should be refined for some cases. For
example, fcanonicalize lowering should use the flushing variant if
either input or output flushing is enabled.
Manually select this as a tablegen workaround. Both SelectionDAG
and GlobalISel end up misplacing the copy to m0 when both instructions
in the output need it. Neither considers that both output instructions
depend on m0. I don't know of any other pattern where we need to handle
this case, so it's less effort to just work around it for now.
Currently the BE allows only a little load narrowing, for
fear that it will produce sub-dword extending loads. However,
we can always allow narrowing if we are shrinking one
multi-dword load to another multi-dword load.
In particular, we were unable to reduce s_load_dwordx8 to
s_load_dwordx4 if an identity shuffle was used to extract the
low 4 dwords.
Differential Revision: https://reviews.llvm.org/D73133
I'm mildly worried about potentially reordering exp/exp_done with
IntrWriteMem on the intrinsic.
Requires hacking out the illegal type on SI, so manually select that
case during lowering.
This allows us to clean up some places that were peeking through
the MERGE_VALUES node after the call. By returning the SDValues
directly, we can clean that up.
Unfortunately, there are several call sites in AMDGPU that wanted
the MERGE_VALUES and now need to create their own.
Summary:
At present, the code calculating known bits of AMDGPU MUL_I24 confuses the concepts of "non-negative number" and "positive number".
In some situations, it results in incorrect code. I have a case where the optimizer replaces the result of calculating MUL_I24(-5, 0) with -8.
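A reference model of MUL_I24 in plain C++ (a sketch for illustration): the operands are sign-extended from 24 bits before the multiply, so MUL_I24(-5, 0) is 0; the product of a negative and a non-negative value is not necessarily negative.

    #include <cassert>
    #include <cstdint>

    // Sign-extend the low 24 bits of X.
    static int32_t signext24(uint32_t X) {
      X &= 0xffffffu;
      if (X & 0x800000u)
        X |= 0xff000000u;
      return (int32_t)X;
    }

    // Reference semantics for MUL_I24 (low 32 bits of the product).
    static int32_t mul_i24_ref(uint32_t A, uint32_t B) {
      return signext24(A) * signext24(B);
    }

    int main() {
      assert(mul_i24_ref((uint32_t)-5, 0) == 0);  // non-negative, not positive
      return 0;
    }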
Reviewers: foad, arsenm
Reviewed By: arsenm
Subscribers: foad, arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Patch by Eugene Kuznetsov.
Differential Revision: https://reviews.llvm.org/D70367
Summary:
The use of a boolean isInteger flag (generally initialized using
VT.isInteger()) caused errors in our out-of-tree CHERI backend
(https://github.com/CTSRD-CHERI/llvm-project).
In our backend, pointers use a separate ValueType (iFATPTR) and therefore
.isInteger() returns false. This meant that getSetCCInverse() was using the
floating-point variant and generated incorrect code for us:
`(void *)0x12033091e < (void *)0xffffffffffffffff` would return false.
Committing this change will significantly reduce our merge conflicts
for each upstream merge.
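To illustrate the underlying rule (a sketch, not the patch itself): for integers the inverse of "less than" is exactly "greater or equal", while for floating point the inverse of an ordered compare must also cover the unordered (NaN) case, which is why picking the wrong variant changes results.

    #include <cmath>

    // Integer: !(a < b) is exactly (a >= b).
    static bool int_inverse_lt(long A, long B) { return A >= B; }

    // Floating point: !(a < b) is (a >= b) OR "either input is NaN".
    static bool fp_inverse_olt(double A, double B) {
      return A >= B || std::isnan(A) || std::isnan(B);
    }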
Reviewers: spatel, bogner
Reviewed By: bogner
Subscribers: wuzish, arsenm, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, jsji, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70917
Start moving towards treating this as a property of the calling
convention, and not the subtarget. The default denormal mode should
not be part of the subtarget, and be moved into a separate function
attribute.
This patch is still NFC. The denormal mode remains as a subtarget
feature for now, but make the necessary changes to switch to using an
attribute.
The usage of target boolean checks is overly inflexible, since sext
and zext of a compare are equally cheap. The choice is arbitrary, but
using 0/1 to some degree is the path of least resistance since
that's what most targets use. This enables a few combines that don't
bother to support ZeroOrNegativeOneBooleanContent.
Rename the old function to explicitly show that it cares only about
alignment. The new allowsMemoryAccess calls the alignment-related
function by default and can be overridden by a target to report whether
the memory access is legal or not.
Differential Revision: https://reviews.llvm.org/D67121
llvm-svn: 372935
The static analyzer is warning about a potential null dereference, but we should be able to use cast<LoadSDNode> directly, and if not, the assert will fire for us.
llvm-svn: 372528
* Reordered MVT simple types to group scalable vector types
together.
* New range functions in MachineValueType.h to only iterate over
the fixed-length int/fp vector types.
* Stopped backends which don't support scalable vector types from
iterating over scalable types.
Reviewers: sdesmalen, greened
Reviewed By: greened
Differential Revision: https://reviews.llvm.org/D66339
llvm-svn: 372099
The same stack is loaded for each workitem ID, and each use. Nothing
prevents you from creating multiple fixed stack objects with the same
offsets, so this was creating a load for each unique frame index,
despite them being the same offset. Re-use the same frame index so the
loads are CSEable.
llvm-svn: 371148