Commit Graph

9 Commits

Author SHA1 Message Date
Matthias Braun 6ad3d05b68 MachineScheduler: Fully compare top/bottom candidates
In bidirectional scheduling this gives more stable results than just
comparing the "reason" fields of the top/bottom node because the reason
field may be higher depending on what other nodes are in the queue.

Differential Revision: http://reviews.llvm.org/D19401

llvm-svn: 273755
2016-06-25 00:23:00 +00:00
Matthias Braun 1e374a7aa6 AMDGPU: Define a schedule class for COPY.
COPY was lacking a scheduling class, define it to avoid regressions in
the upcoming change to the bidirectional MachineScheduler. Approved by
tstellar on IRC.

Differential Revision: http://reviews.llvm.org/D21540

llvm-svn: 273751
2016-06-24 23:52:11 +00:00
Matt Arsenault f42c69206d AMDGPU: Run pointer optimization passes
llvm-svn: 272736
2016-06-15 00:11:01 +00:00
Changpeng Fang 71369b3a39 AMDGPU/SI: Enable load-store-opt by default.
Summary: Enable load-store-opt by default, and update LIT tests.

Reviewers: arsenm

Differential Revision: http://reviews.llvm.org/D20694

llvm-svn: 270894
2016-05-26 19:35:29 +00:00
Tom Stellard 1d5e6d4bdc AMDGPU/SI: Improve MachineSchedModel definition
This patch contains a few improvements to the model, including:

- Using a single resource with a defined buffers size for each memory unit.
- Setting the IssueWidth correctly.
- Fixing latency values for memory instructions.

shader-db stats:

16429 shaders in 3231 tests
Totals:
SGPRS: 318232 -> 312328 (-1.86 %)
VGPRS: 208996 -> 209346 (0.17 %)
Code Size: 7147044 -> 7166440 (0.27 %) bytes
LDS: 83 -> 83 (0.00 %) blocks
Scratch: 1862656 -> 1459200 (-21.66 %) bytes per wave
Max Waves: 49182 -> 49243 (0.12 %)
Wait states: 0 -> 0 (0.00 %)A

Differential Revision: http://reviews.llvm.org/D18453

llvm-svn: 264877
2016-03-30 16:35:13 +00:00
Matt Arsenault 26f8f3db39 AMDGPU: Rework how private buffer passed for HSA
If we know we have stack objects, we reserve the registers
that the private buffer resource and wave offset are passed
and use them directly.

If not, reserve the last 5 SGPRs just in case we need to spill.
After register allocation, try to pick the next available registers
instead of the last SGPRs, and then insert copies from the inputs
to the reserved registers in the progloue.

This also only selectively enables all of the input registers
which are really required instead of always enabling them.

llvm-svn: 254331
2015-11-30 21:16:03 +00:00
Matt Arsenault e4d0c142e8 AMDGPU: Add sdst operand to VOP2b instructions
The VOP3 encoding of these allows any SGPR pair for the i1
output, but this was forced before to always use vcc.
This doesn't yet try to use this, but does add the operand
to the definitions so the main change is adding vcc to the
output of the VOP2 encoding.

llvm-svn: 246358
2015-08-29 07:16:50 +00:00
Eric Christopher 572e03a396 Fix "the the" in comments.
llvm-svn: 240112
2015-06-19 01:53:21 +00:00
Tom Stellard 45bb48ea19 R600 -> AMDGPU rename
llvm-svn: 239657
2015-06-13 03:28:10 +00:00