Use GCNHazardRecognizer in postra sched.
Updated tests for the new schedules.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D109536
Change-Id: Ia86ba2ae168f12fb34b4d8efdab491f84d936cde
Previously we assumed all callable functions did not need any
implicitly passed inputs, and added attributes to functions to
indicate when they were necessary. Requiring attributes for
correctness is pretty ugly, and it makes supporting indirect and
external calls more complicated.
This inverts the direction of the attributes, so an undecorated
function is assumed to need all implicit imputs. This enables
AMDGPUAttributor by default to mark when functions are proven to not
need a given input. This strips the equivalent functionality from the
legacy AMDGPUAnnotateKernelFeatures pass.
However, AMDGPUAnnotateKernelFeatures is not fully removed at this
point although it should be in the future. It is still necessary for
the two hacky amdgpu-calls and amdgpu-stack-objects attributes, which
would be better served by a trivial analysis on the IR during
selection. Additionally, AMDGPUAnnotateKernelFeatures still
redundantly handles the uniform-work-group-size attribute to be
removed in a future commit.
At this point when not using -amdgpu-fixed-function-abi, we are still
modifying the ABI based on these newly negated attributes. In the
future, this option will be removed and the locations for implicit
inputs will always be fixed. We will then use the new attributes to
avoid passing the values when unnecessary.
This allows to lower an LDS variable into a kernel structure
even if there is a constant expression used from different
kernels.
Differential Revision: https://reviews.llvm.org/D103655
Before packing LDS globals into a sorted structure, make sure that
their alignment is properly updated based on their size. This will make
sure that the members of sorted structure are properly aligned, and
hence it will further reduce the probability of unaligned LDS access.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D103261
Before packing LDS globals into a sorted structure, make sure that
their alignment is properly updated based on their size. This will make
sure that the members of sorted structure are properly aligned, and
hence it will further reduce the probability of unaligned LDS access.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D103261
There is a trivial but severe bug in the recent code collecting
LDS globals used by kernel. It aborts scan on the first constant
without scanning further uses. That leads to LDS overallocation
with multiple kernels in certain cases.
Differential Revision: https://reviews.llvm.org/D103190
Support for XNACK and SRAMECC is not static on some GPUs. We must be able
to differentiate between different scenarios for these dynamic subtarget
features.
The possible settings are:
- Unsupported: The GPU has no support for XNACK/SRAMECC.
- Any: Preference is unspecified. Use conservative settings that can run anywhere.
- Off: Request support for XNACK/SRAMECC Off
- On: Request support for XNACK/SRAMECC On
GCNSubtarget will track the four options based on the following criteria. If
the subtarget does not support XNACK/SRAMECC we say the setting is
"Unsupported". If no subtarget features for XNACK/SRAMECC are requested we
must support "Any" mode. If the subtarget features XNACK/SRAMECC exist in the
feature string when initializing the subtarget, the settings are "On/Off".
The defaults are updated to be conservatively correct, meaning if no setting
for XNACK or SRAMECC is explicitly requested, defaults will be used which
generate code that can be run anywhere. This corresponds to the "Any" setting.
Differential Revision: https://reviews.llvm.org/D85882
These instructions use a scaled offset. We were wrongly selecting them
even when the required offset was not a multiple of the scale factor.
Differential Revision: https://reviews.llvm.org/D90607
Alignment requirements for ds_read/write_b96/b128 for gfx9 and onward are
now the same as for other GCN subtargets. This way we can avoid any
unintentional use of these instructions on systems that do not support dword
alignment and instead require natural alignment.
This also makes 'SH_MEM_CONFIG.alignment_mode == STRICT' the default.
Differential Revision: https://reviews.llvm.org/D87821
This would assert with unaligned DS access enabled. The offset may not
be aligned. Theoretically the pattern predicate should check the
memory alignment, although it is possible to have the memory be
aligned but not the immediate offset.
In this case I would expect it to use ds_{read|write}_b64 with
unaligned access, but am not clear if there's a reason it doesn't.
Do not break down local loads and stores so ds_read/write_b96/b128 in
ISelLowering can be selected on subtargets that support them and if align
requirements allow them.
Differential Revision: https://reviews.llvm.org/D84403
Adjust alignment requirements for ds_read/write_b96/b128.
GFX9 and onwards allow misaligned access for reads and writes but only if
SH_MEM_CONFIG.alignment_mode allows it.
UnalignedDSAccess is set on GCN subtargets from GFX9 onward to let us know if we
can relax alignment requirements.
UnalignedAccessMode acts similary to UnalignedBufferAccess for DS instructions
but only from GFX9 onward and is supposed to match alignment_mode. By default
alignment of 4 is required.
Differential Revision: https://reviews.llvm.org/D82788
The previous implementation was incorrect, and based off incorrect
instruction definitions. Unfortunately we can't match natural
addressing in a lot of cases due to the shift/scale applied in
getelementptrs. This relies on reducing the 64-bit shift to 32-bits.
Summary:
Mem op clustering adds a weak edge in the DAG between two loads or
stores that should be clustered, but the direction of this edge is
pretty arbitrary (it depends on the sort order of MemOpInfo, which
represents the operands of a load or store). This often means that two
loads or stores will get reordered even if they would naturally have
been scheduled together anyway, which leads to test case churn and goes
against the scheduler's "do no harm" philosophy.
The fix makes sure that the direction of the edge always matches the
original code order of the instructions.
Reviewers: atrick, MatzeB, arsenm, rampitec, t.p.northover
Subscribers: jvesely, wdng, nhaehnle, kristof.beyls, hiraditya, javed.absar, arphaman, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72706
Summary:
The symbols use the processor-specific SHN_AMDGPU_LDS section index
introduced with a previous change. The linker is then expected to resolve
relocations, which are also emitted.
Initially disabled for HSA and PAL environments until they have caught up
in terms of linker and runtime loader.
Some notes:
- The llvm.amdgcn.groupstaticsize intrinsics can no longer be lowered
to a constant at compile times, which means some tests can no longer
be applied.
The current "solution" is a terrible hack, but the intrinsic isn't
used by Mesa, so we can keep it for now.
- We no longer know the full LDS size per kernel at compile time, which
means that we can no longer generate a relevant error message at
compile time. It would be possible to add a check for the size of
individual variables, but ultimately the linker will have to perform
the final check.
Change-Id: If66dbf33fccfbf3609aefefa2558ac0850d42275
Reviewers: arsenm, rampitec, t-tye, b-sumner, jsjodin
Subscribers: qcolombet, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61494
llvm-svn: 364297
Summary:
This handles def-after-use of physregs, and allows us to merge loads and
stores even across some physreg defs (typically M0 defs).
Change-Id: I076484b2bda27c2cf46013c845a0380c5b89b67b
Reviewers: arsenm, mareko, rampitec
Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D42647
llvm-svn: 325882
GFX9 stopped using m0 for most DS instructions. Select
a different instruction without the use. I think this will
be less error prone than trying to manually maintain m0
uses as needed.
llvm-svn: 319270
Currently the default C calling convention functions are treated
the same as compute kernels. Make this explicit so the default
calling convention can be changed to a non-kernel.
Converted with perl -pi -e 's/define void/define amdgpu_kernel void/'
on the relevant test directories (and undoing in one place that actually
wanted a non-kernel).
llvm-svn: 298444
hange explores the fact that LDS reads may be reordered even if access
the same location.
Prior the change, algorithm immediately stops as soon as any memory
access encountered between loads that are expected to be merged
together. Although, Read-After-Read conflict cannot affect execution
correctness.
Improves hcBLAS CGEMM manually loop-unrolled kernels performance by 44%.
Also improvement expected on any massive sequences of reads from LDS.
Differential Revision: https://reviews.llvm.org/D25944
llvm-svn: 285919
noduplicate prevents unrolling of small loops that happen to have
barriers in them. If a loop has a barrier in it, it is OK to duplicate
it for the unroll.
llvm-svn: 256075
This basic combine was surprisingly missing.
AMDGPU legalizes many operations in terms of 32-bit vector components,
so not doing this results in many extra copies and subregister extracts
that need to be cleaned up later.
InstCombine already does this for the hasOneUse case. The target hook
is to fix a handful of tests which break (e.g. ARM/vmov.ll) which turn
from a vector materialize repeated immediate instruction to a constant
vector load with more scalar copies from it.
llvm-svn: 250129
If the read2 produced was supposed to be writing into a
super register, it would use the wrong subregister indices.
Fix this by inserting copies, so we only ever write to a vreg_64.
Run the register coalescer again to clean this up, although this
isn't ideal and often does result in an extra move.
Also remove the assert that offset1 > offset0.
There isn't a real reason to not allow this other than a minor
convenience in the compiler, and it doesn't seem worth the effort
of avoiding it.
llvm-svn: 242174