This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated. The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.
This is part of an effort to migrate from llvm::Optional to
std::optional:
https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
Make MIMG NSA minimum addresses threshold an attribute that can
be set on a function or configured via command line.
This enables frontend tuning which allows increased NSA usage
where beneficial.
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D134780
Implement an intrinsic for use lowering LDS variables to different
addresses from different kernels. This will allow kernels that cannot
reach an LDS variable to avoid wasting space for it.
There are a number of implicit arguments accessed by intrinsic already
so this implementation closely follows the existing handling. It is slightly
novel in that this SGPR is written by the kernel prologue.
It is necessary in the general case to put variables at different addresses
such that they can be compactly allocated and thus necessary for an
indirect function call to have some means of determining where a
given variable was allocated. Claiming an arbitrary SGPR into which
an integer can be written by the kernel, in this implementation based
on metadata associated with that kernel, which is then passed on to
indirect call sites is sufficient to determine the variable address.
The intent is to emit a __const array of LDS addresses and index into it.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D125060
`llvm::max(Align, MaybeAlign)` and `llvm::max(MaybeAlign, Align)` are
not used often enough to be required. They also make the code more opaque.
Differential Revision: https://reviews.llvm.org/D128121
This patch includes MC layer support for VOP3 encoded instructions and generic VOP support
classes.
Some VOP1 and VOP2 instructions which share an encoding with gfx10 and are using
the AssemblerPredicate = isGFX10Plus are also enabled. That predicate
will be changed to isGFX10Only in a later patch.
Patch 15/N for upstreaming of AMDGPU gfx11 architecture.
Depends on D126468
Reviewed By: dp
Differential Revision: https://reviews.llvm.org/D126475
Use default member initializers in AMDGPUSubtarget and subclasses. This
is to guard against adding a new feature boolean in AMDGPUSubtarget.h
but forgetting to initialize it to false in AMDGPUSubtarget.cpp.
This was mostly autogenerated by:
clang-tidy -checks=-*,cppcoreguidelines-prefer-member-initializer,modernize-use-default-member-init -header-filter=Subtarget -fix lib/Target/AMDGPU/*Subtarget.cpp
Differential Revision: https://reviews.llvm.org/D123613
Summary:
If implicitarg_ptr intrinsic is not used, set implicit kernarg size to 0, otherwise
set it to 256 bytes for code object version 5 (and beyond).
Reviewers: arsenm
Differential Revision: https://reviews.llvm.org/D123262
Use a subtarget feature instead of a command line argument to reduce
global state.
We want to enable flat scratch for graphics in some cases and this
doesn't work well with command line options.
Differential Revision: https://reviews.llvm.org/D119425
This was approximating the entry point logic for flat_scratch_init,
which is not really the point. We need to account for whether we need
to reserve the SGPR pair used for flat_scratch, not whether we needed
the initialization kernel argument. If this was an arbitrary function,
we would end up over-reporting the number of potentially free
SGPRs. The logic for architected flat scratch also only applies to the
initialization in the kernel, not the reserved registers at the end.
Avoids compile failures in a future patch from allocating more SGPRs
than the subtarget supports.
Pseudos like V_MAD_U16 and V_FMA_F16 map down to what GFX9 calls
v_mad_legacy_u16 and v_fma_legacy_f16, which are documented to have the
same zeroing behaviour as on GFX8.
Differential Revision: https://reviews.llvm.org/D115729
Previously we would require adding an attribute to kernels to enable
the inputs passed in the kernarg segment, accessed by
llvm.amdgcn.implicitarg.ptr. This violates the principle of being
correct by default. Some OpenMP testcases were broken recently since
it wasn't correctly setting this attribute, and no known frontends are
setting this to anything other than the maximum.
Most of the test changes are from load widening of argument loads
since there now more implied dereferenceable bytes.
If a kernel had no formal arguments but did have the implicit
arguments, we were reporting a required kernarg alignment of 4. For
some reason we require an 8-byte alignment for this, even though
there's no real advantage and I don't see where this is documented in
the ABI.
The code object header code also claims the minimum alignment is 16,
which is what I thought you always got at runtime anyway so I don't
know why this matters.
As described in the comment, the way we change vcc to vcc_lo in these
operands confuses addPhysRegDataDeps into treating them as implicit
pseudo operands. Fix this by setting the correct latency from the
SchedModel after addPhysRegDataDeps wrongly set it to 0.
Differential Revision: https://reviews.llvm.org/D112317
With architected flat scratch it becomes readonly. We must always
reserve SGPR pair for it even if we do not use scratch at all since
an attempt to write to SGPRs mapped to FLAT_SCRATCH results in
memory violation.
This is not needed since GFX10 with architected flat scratch though
since special SGPRs are not carving space from normal SGPRs.
Differential Revision: https://reviews.llvm.org/D110376
Add an overload to pass the flat workgroup range in separately. This
will allow the attributor to use the assumed value for
amdgpu-flat-workgroup-sizes when inferring amdgpu-waves-per-eu.
Use GCNHazardRecognizer in postra sched.
Updated tests for the new schedules.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D109536
Change-Id: Ia86ba2ae168f12fb34b4d8efdab491f84d936cde
1. Splitted out some parts of R600 target to separate modules/headers.
2. Reduced some include lists in headers.
3. Found and fixed issue with override `GCNTargetMachine::getSubtargetImpl()`
and `R600TargetMachine::getSubtargetImpl()` had different return value type
than base class.
4. Minor forward declarations cleanup.
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D108596
Add maximum NSA size limit as an ISA feature.
Use this to reduce NSA usage on GFX10.1 to avoid stability issues
with 4 and 5 dwords NSA instructions.
Maintain use of longer NSA instructions on GFX10.3.
Note: this also contains some minor fixes for GlobalISel which
did not work correctly with non-NSA form instructions on GFX10.
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D103348
Added support to check if architecture supports s_mulhi which is used as part of
the decision whether or not to use valu 24 bit mul (if the mulhi gets
transformed to a valu op anyway, then may as well use it).
This is an extension of the work in D97063
Differential Revision: https://reviews.llvm.org/D103321
Change-Id: I80b1323de640a52623d69ac005a97d06a5d42a14
This is a followup patch on D103636 where
it seemed checking on amdgpu-calls and
amdgpu-stack-objects is unnecessary. Removing these
checks didn't regress any tests functionally.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D104513
This is a mechanical change. This actually also renames the
similarly named methods in the SmallString class, however these
methods don't seem to be used outside of the llvm subproject, so
this doesn't break building of the rest of the monorepo.
These used to consistently be zeroed pre-gfx9, but gfx9 made the
situation complicated since now some still do and some don't. This
also manages to pick up a few cases that the pattern fails to optimize
away.
We handle some cases with instruction patterns, but some get
through. In particular this improves the integer cases.
This patch computes max SGPRs and VGPRs used by module
in presence of indirect calls and makes that
as register requirement for functions/kernels
which makes indirect calls.
This patch also refactors code AMDGPUSubTarget.cpp
which add a "base" variants of getMaxNumSGPRs which
is used by MachineFunction and new Function version.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D103636
gfx9 does not work with negative offsets, gfx10 works only with
aligned negative offsets, but not with unaligned negative offsets.
This is slightly more conservative than needed, gfx9 does support
negative offsets when a VGPR address is used and gfx10 supports
negative, unaligned offsets when an SGPR address is used, but we
do not make use of that with this patch.
Differential Revision: https://reviews.llvm.org/D101292
Split out some of the instructions predicated on the dot2-insts target
feature into a new dot7-insts, in preparation for subtargets that have
some but not all of these instructions. NFCI.
Differential Revision: https://reviews.llvm.org/D98717
Support for XNACK and SRAMECC is not static on some GPUs. We must be able
to differentiate between different scenarios for these dynamic subtarget
features.
The possible settings are:
- Unsupported: The GPU has no support for XNACK/SRAMECC.
- Any: Preference is unspecified. Use conservative settings that can run anywhere.
- Off: Request support for XNACK/SRAMECC Off
- On: Request support for XNACK/SRAMECC On
GCNSubtarget will track the four options based on the following criteria. If
the subtarget does not support XNACK/SRAMECC we say the setting is
"Unsupported". If no subtarget features for XNACK/SRAMECC are requested we
must support "Any" mode. If the subtarget features XNACK/SRAMECC exist in the
feature string when initializing the subtarget, the settings are "On/Off".
The defaults are updated to be conservatively correct, meaning if no setting
for XNACK or SRAMECC is explicitly requested, defaults will be used which
generate code that can be run anywhere. This corresponds to the "Any" setting.
Differential Revision: https://reviews.llvm.org/D85882
Previously, instructions which could be
expressed as VOP3 in addition to another
encoding had a _e64 suffix on the tablegen
record name, while those
only available as VOP3 did not. With this
patch, all VOP3s will have the _e64 suffix.
The assembly does not change, only the mir.
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D94341
Change-Id: Ia8ec8890d47f8f94bbbdac43745b4e9dd2b03423
Currently, the compiler crashes in instruction selection of global
load/stores in gfx600 due to the lack of FLAT instructions. This patch
fix the crash by selecting MUBUF instructions for global load/stores
in gfx600.
Authored-by: Praveen Velliengiri <Praveen.Velliengiri@amd.com>
Reviewed by: t-tye
Differential revision: https://reviews.llvm.org/D92483