Commit Graph

7 Commits

Author SHA1 Message Date
Sebastian Neubauer 96e1fcb1e0 [AMDGPU] Use s_add_i32 for address additions
This allows to convert the add instruction to s_addk_i32 and
v_add_nc_u32 instead of needing v_add_co_u32 when converting to a VALU
instruction.

Differential Revision: https://reviews.llvm.org/D103322
2021-06-07 16:09:48 +02:00
Austin Kerbow f5199d7ae0 [AMDGPU] Revise handling of preexisting waitcnt
Preexisting waitcnt may not update the scoreboard if the instruction
being examined needed to wait on fewer counters than what was encoded in
the old waitcnt instruction. Fixing this results in the elimination of
some redudnat waitcnt.

These changes also enable combining consecutive waitcnt into a single
S_WAITCNT or S_WAITCNT_VSCNT instruction.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D100281
2021-05-05 17:21:33 -07:00
Jessica Paquette 145549ff89 [GlobalISel] Combine (x + 0) -> x, G_PTR_ADD edition
Add it to right_identity_zero.

Differential Revision: https://reviews.llvm.org/D96621
2021-02-12 12:09:48 -08:00
Austin Kerbow 2291bd137d [AMDGPU] Update subtarget features for new target ID support
Support for XNACK and SRAMECC is not static on some GPUs. We must be able
to differentiate between different scenarios for these dynamic subtarget
features.

The possible settings are:

- Unsupported: The GPU has no support for XNACK/SRAMECC.
- Any: Preference is unspecified. Use conservative settings that can run anywhere.
- Off: Request support for XNACK/SRAMECC Off
- On: Request support for XNACK/SRAMECC On

GCNSubtarget will track the four options based on the following criteria. If
the subtarget does not support XNACK/SRAMECC we say the setting is
"Unsupported". If no subtarget features for XNACK/SRAMECC are requested we
must support "Any" mode. If the subtarget features XNACK/SRAMECC exist in the
feature string when initializing the subtarget, the settings are "On/Off".

The defaults are updated to be conservatively correct, meaning if no setting
for XNACK or SRAMECC is explicitly requested, defaults will be used which
generate code that can be run anywhere. This corresponds to the "Any" setting.

Differential Revision: https://reviews.llvm.org/D85882
2021-01-26 11:25:51 -08:00
Tony 2f499b9aff [AMDGPU] Add volatile support to SIMemoryLegalizer
Treat a non-atomic volatile load and store as a relaxed atomic at
system scope for the address spaces accessed. This will ensure all
relevant caches will be bypassed.

A volatile atomic is not changed and still only bypasses caches upto
the level specified by the SyncScope operand.

Differential Revision: https://reviews.llvm.org/D94214
2021-01-09 00:52:33 +00:00
Sebastian Neubauer 221fdedc69 [AMDGPU][GlobalISel] Fold flat vgpr + constant addresses
Use getPtrBaseWithConstantOffset in selectFlatOffsetImpl to fold more
vgpr+constant addresses.

Differential Revision: https://reviews.llvm.org/D93692
2020-12-23 10:40:30 +01:00
Stanislav Mekhanoshin d15119a02d [AMDGPU][GlobalISel] GlobalISel for flat scratch
It does not seem to fold offsets but this is not specific
to the flat scratch as getPtrBaseWithConstantOffset() does
not return the split for these tests unlike its SDag
counterpart.

Differential Revision: https://reviews.llvm.org/D93670
2020-12-22 16:33:06 -08:00