Commit Graph

980 Commits

Author SHA1 Message Date
Yaxun Liu 5d977f8ed4 CodeGen: Let frame index value type match alloca addr space
Recently alloca address space has been added to data layout. Due to this
change, pointer returned by alloca may have different size as pointer in
address space 0.

However, currently the value type of frame index is assumed to be of the
same size as pointer in address space 0.

This patch fixes that.

Most targets assume alloca returning pointer in address space 0, which
is the default alloca address space. Therefore it is NFC for them.

AMDGCN target with amdgiz environment requires this change since it
assumes alloca returning pointer to addr space 5 and its size is 32,
which is different from the size of pointer in addr space 0 which is 64.

Differential Revision: https://reviews.llvm.org/D32021

llvm-svn: 300864
2017-04-20 18:15:34 +00:00
Matt Arsenault 4a48623e4f AMDGPU: Custom lower illegal small select types
Promote them to i32 vectors to avoid unpacking and re-packing
the vectors.

llvm-svn: 300754
2017-04-19 20:53:07 +00:00
Matt Arsenault 021a218dd2 AMDGPU: Don't emit amd_kernel_code_t for callable functions
This is inserted directly in the text section. The relocation
for the function ends up resolving to the beginning of the
amd_kernel_code_t header rather than the actual function
entry point.

Also skip some of the comments for initialization
that only makes sense for kernels.

llvm-svn: 300736
2017-04-19 19:38:10 +00:00
Matt Arsenault d3406bc45c StructurizeCFG: Directly invert cmp instructions
The most common case for a branch condition is
a single use compare. Directly invert the branch
predicate rather than adding a lot of xor i1 true
which the DAG will have to fold later.

This produces nicer to read structurizer output.

This produces some random changes in codegen
due to the DAG swapping branch conditions itself,
and then does a poor job of dealing with those
inverts.

llvm-svn: 300732
2017-04-19 18:29:07 +00:00
Matt Arsenault 6cb7b8a42f AMDGPU: Don't align callable functions to 256
llvm-svn: 300720
2017-04-19 17:42:39 +00:00
Matt Arsenault 4c1ecded63 AMDGPU: Change DivergenceAnalysis for function arguments
Stop assuming all functions are kernels.

llvm-svn: 300719
2017-04-19 17:42:34 +00:00
Matt Arsenault a3566f2149 AMDGPU: Use MachineRegisterInfo to find max used register
Avoid looping through program to determine register counts.
This avoids needing to look at regmask operands.

Also fixes some counting errors with flat_scr when there
are no stack objects.

llvm-svn: 300482
2017-04-17 19:48:30 +00:00
Stanislav Mekhanoshin eff0bc7839 [AMDGPU] set read_only access qualifier for pointers
If a kernel's pointer argument is known to be readonly
set access qualifier accordingly. This allows RT not to
flush caches before dispatches.

Differential Revision: https://reviews.llvm.org/D32091

llvm-svn: 300362
2017-04-14 19:11:40 +00:00
Stanislav Mekhanoshin 86b0a5465b [AMDGPU] added SIInstrInfo::getAddNoCarry() helper
Addressed rest of post submit comments from D31993.

Differential Revision: https://reviews.llvm.org/D32057

llvm-svn: 300288
2017-04-14 00:33:44 +00:00
Konstantin Zhuravlyov d24aeb20fc AMDGPU/GFX9: Do not use v_pack_b32_f16 when packing
Differential Revision: https://reviews.llvm.org/D31819

llvm-svn: 300275
2017-04-13 23:17:00 +00:00
Stanislav Mekhanoshin d026f79bd3 [AMDGPU] Combine DS operations with offsets bigger than byte
In many cases ds operations can be combined even if offsets do not
fit into 8 bit encoding. What it takes is to adjust base address.

Differential Revision: https://reviews.llvm.org/D31993

llvm-svn: 300227
2017-04-13 17:53:07 +00:00
Wei Ding 74da350b85 AMDGPU : Fix common dominator of two incoming blocks terminates with uniform branch issue.
Differential Revision: http://reviews.llvm.org/D31350

llvm-svn: 300142
2017-04-12 23:51:47 +00:00
Matt Arsenault 0d0d6c2f25 AMDGPU: Fix invalid copies when copying i1 to phys reg
Insert a VReg_1 virtual register so the i1 workaround pass
can handle it.

llvm-svn: 300113
2017-04-12 21:58:23 +00:00
Stanislav Mekhanoshin c90347d760 [AMDGPU] Generate range metadata for workitem id
If workgroup size is known inform llvm about range returned by local
id  and local size queries.

Differential Revision: https://reviews.llvm.org/D31804

llvm-svn: 300102
2017-04-12 20:48:56 +00:00
Sam Kolton aff8341da2 [AMDGPU] SDWA: make pass global
Summary: Remove checks for basic blocks.

Reviewers: vpykhtin, rampitec, arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye

Differential Revision: https://reviews.llvm.org/D31935

llvm-svn: 300040
2017-04-12 09:36:05 +00:00
Kannan Narayanan acb089e12a [AMDGPU] Add a new pass to insert waitcnts. Leave under an option for testing.
Based on comments in https://reviews.llvm.org/D31161.

llvm-svn: 300023
2017-04-12 03:25:12 +00:00
Matt Arsenault 9ac40026dd AMDGPU: Insert wait at start of callee functions
llvm-svn: 300000
2017-04-11 22:29:31 +00:00
Matt Arsenault efa9f4b210 AMDGPU: Refactor SIMachineFunctionInfo slightly
Prepare for handling non-entry functions.

llvm-svn: 299999
2017-04-11 22:29:28 +00:00
Matt Arsenault e622dc3803 AMDGPU: Refactor argument lowering
Split into smaller functions and prepare for handling
non-entry functions.

llvm-svn: 299998
2017-04-11 22:29:24 +00:00
Matt Arsenault fe78ffba92 AMDGPU: Fix folding reg_sequence into copy to phys reg
This was producing an illegal reg_sequence defining
a physical register with virtual register inputs.

llvm-svn: 299997
2017-04-11 22:29:19 +00:00
Yaxun Liu e95df719e1 [AMDGPU] Add A5 to data layout for amdgiz environment
Differential Revision: https://reviews.llvm.org/D31589

llvm-svn: 299964
2017-04-11 17:18:13 +00:00
Matt Arsenault f10061ec70 Add address space mangling to lifetime intrinsics
In preparation for allowing allocas to have non-0 addrspace.

llvm-svn: 299876
2017-04-10 20:18:21 +00:00
Matt Arsenault dd8fd9dcfd AMDGPU: Actually write nops for writeNopData
Before this was just writing 0s, which ends up looking like a
v_cndmask_b32 v0, s0, v0, vcc. Write out an encoded s_nop instead.

llvm-svn: 299816
2017-04-08 21:28:38 +00:00
Stanislav Mekhanoshin 478b81982f [AMDGPU] Unroll more to eliminate phis and conditions
Increase threshold to unroll a loop which contains an "if" statement
whose condition defined by a PHI belonging to the loop. This may help
to eliminate if region and potentially even PHI itself, saving on
both divergence and registers used for the PHI.

Add a small bonus for each of such "if" statements.

Differential Revision: https://reviews.llvm.org/D31693

llvm-svn: 299779
2017-04-07 16:26:28 +00:00
Sam Kolton 6e79529db4 [AMDGPU] Move SiShrinkInstruction and SDWAPeephole to SSAOptimization passes
Summary:
Difference beetween PreRegAlloc() and MachineSSAOptimization() are that the former is run despite of -O0 optimization level. In my undestanding SiShrinkInstructions and SDWAPeephole shouldn't run when optimizations are disabled.
With this change order of passes will not change.

Reviewers: arsenm, vpykhtin, rampitec

Subscribers: qcolombet, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye

Differential Revision: https://reviews.llvm.org/D31705

llvm-svn: 299757
2017-04-07 10:53:12 +00:00
Konstantin Zhuravlyov 4b3847e865 AMDGPU/GFX9: Fix shared and private aperture queries
Differential Revision: https://reviews.llvm.org/D31786

llvm-svn: 299727
2017-04-06 23:02:33 +00:00
Eli Friedman 5fba1e53f2 Turn on -addr-sink-using-gep by default.
The new codepath has been in the tree for years, and there isn't any
reason to use two codepaths here.

Differential Revision: https://reviews.llvm.org/D30596

llvm-svn: 299723
2017-04-06 22:42:18 +00:00
Matt Arsenault 21a438255d AMDGPU: Diagnose illegal SGPR to VGPR copies
This is possible in ways that are not compiler bugs,
so stop asserting on them.

This emits an extra error when emitting objects when it
can't encode the new pseudo, but I'm not sure that matters.

llvm-svn: 299712
2017-04-06 21:09:53 +00:00
Matt Arsenault 5cf4271883 AMDGPU: Replace fp16SrcZerosHighBits with a whitelist
FCOPYSIGN is lowered to bit operations which don't clear the high
bits.

llvm-svn: 299708
2017-04-06 20:58:30 +00:00
Stanislav Mekhanoshin ea57c38521 [AMDGPU] Eliminate barrier if workgroup size is not greater than wavefront size
If a workgroup size is known to be not greater than wavefront size
the s_barrier instruction is not needed since all threads are guarantied
to come to the same point at the same time.

Differential Revision: https://reviews.llvm.org/D31731

llvm-svn: 299659
2017-04-06 16:48:30 +00:00
Sam Kolton 9fa169601f [AMDGPU] Resubmit SDWA peephole: enable by default
Reviewers: vpykhtin, rampitec, arsenm

Subscribers: qcolombet, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye

Differential Revision: https://reviews.llvm.org/D31671

llvm-svn: 299654
2017-04-06 15:03:28 +00:00
Ivan Krasin d4f70c70b9 Revert r299536. [AMDGPU] SDWA peephole: enable by default.
Reason: breaks multiple bots:

http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/3988
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-bootstrap/builds/1173

Original Review URL: https://reviews.llvm.org/D31671

llvm-svn: 299583
2017-04-05 19:58:12 +00:00
Sam Kolton 34e29784fb [AMDGPU] SDWA peephole: enable by default
Reviewers: vpykhtin, rampitec, arsenm

Subscribers: qcolombet, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye

Differential Revision: https://reviews.llvm.org/D31671

llvm-svn: 299536
2017-04-05 12:00:45 +00:00
Matt Arsenault 3333968771 Verifier: Check some amdgpu calling convention restrictions
llvm-svn: 299457
2017-04-04 18:43:11 +00:00
Matt Arsenault 3e90f84806 AMDGPU: Remove legacy export intrinsic
llvm-svn: 299444
2017-04-04 16:34:39 +00:00
Matt Arsenault 236da200f1 AMDGPU: Remove legacy image intrinsics
llvm-svn: 299443
2017-04-04 16:34:35 +00:00
Matt Arsenault b600e138cc AMDGPU: Remove llvm.SI.vs.load.input
llvm-svn: 299391
2017-04-03 21:45:13 +00:00
Matt Arsenault c82768290d DAG: Fix missing legalization for any_extend_vector_inreg operands
llvm-svn: 299389
2017-04-03 21:28:13 +00:00
Matt Arsenault 754dd3eaef AMDGPU: Remove legacy bfe intrinsics
llvm-svn: 299372
2017-04-03 18:08:08 +00:00
Matt Arsenault 8edfaee7be AMDGPU: Remove unnecessary ands when f16 is legal
Add a new node to act as a fancy bitcast from f16 operations to
i32 that implicitly zero the high 16-bits of the result.

Alternatively could try making v2f16 legal and canonicalizing
on build_vectors.

llvm-svn: 299246
2017-03-31 19:53:03 +00:00
Jan Vesely 3c99441ef4 AMDGPU/R600: Fix amdgpu alias analysis pass.
R600 uses higher AS number to access kernel parameters

Fixes: r298846
Differential Revision: https://reviews.llvm.org/D31520

llvm-svn: 299245
2017-03-31 19:26:23 +00:00
Sam Kolton 27e0f8bc72 [AMDGPU] SDWA Peephole: improve search for immediates in SDWA patterns
Previously compiler often extracted common immediates into specific register, e.g.:
```
%vreg0 = S_MOV_B32 0xff;
%vreg2 = V_AND_B32_e32 %vreg0, %vreg1
%vreg4 = V_AND_B32_e32 %vreg0, %vreg3
```
Because of this SDWA peephole failed to find SDWA convertible pattern. E.g. in previous example this could be converted into 2 SDWA src operands:
```
SDWA src: %vreg2 src_sel:BYTE_0
SDWA src: %vreg4 src_sel:BYTE_0
```
With this change peephole check if operand is either immediate or register that is copy of immediate.

llvm-svn: 299202
2017-03-31 11:42:43 +00:00
Matt Arsenault 79f837c254 AMDGPU: Add all atomicrmw fields to atomic.inc/dec
Add scope, order, isVolatile

llvm-svn: 299122
2017-03-30 22:21:40 +00:00
Stanislav Mekhanoshin 89653dfd2a [AMDGPU] Add GlobalOpt parameter to Always Inliner pass
If set to false it does not remove global aliases. With this parameter
set to false it should be safe to run the pass before link.

Differential Revision: https://reviews.llvm.org/D31489

llvm-svn: 299108
2017-03-30 20:16:02 +00:00
Stanislav Mekhanoshin baf31ac7c8 [AMDGPU] Boost unroll threshold for loops reading local memory
This is less important than increase threshold for private memory,
but still brings performance improvements in a wide range of tests.
Unrolling more for local memory serves three purposes: it allows
to combine ds operations if offset becomes static, saves registers
used for offsets in case of static offsets, and allows better lds
latency hiding.

Differential Revision: https://reviews.llvm.org/D31412

llvm-svn: 298948
2017-03-28 22:13:51 +00:00
Stanislav Mekhanoshin 9053f22eeb [AMDGPU] Split -amdgpu-early-inline-all option
Previously it was covered by the internalization. It turns out we cannot
run internalizer in FE, it break separate compilation tests. Thus early
inliner gets its own option.

Differential Revision: https://reviews.llvm.org/D31429

llvm-svn: 298935
2017-03-28 18:23:24 +00:00
Yaxun Liu 14834c3e3d [AMDGPU] Switch data layout by triple environment amdgiz
Switch data layout by target triple environment amdgiz and amdgizcl indicating using of an address space mapping in which generic address space is 0.

amdgiz is for non-OpenCL environment where generic address space is 0.

amdgizcl is for OpenCL environment where generic address space is 0.

Differential Revision: https://reviews.llvm.org/D31211

llvm-svn: 298758
2017-03-25 02:05:44 +00:00
Matt Arsenault 0607a4427b AMDGPU: Fix annotating loops with nested loop conditions
If the branch condition for a loop was a phi which itself
was fed from a phi from a loop, it isn't safe to try
to delete the phi until after the loop is handled.

llvm-svn: 298737
2017-03-24 20:57:10 +00:00
Matt Arsenault b5d23271e2 AMDGPU: Implement f16 fround
llvm-svn: 298730
2017-03-24 20:04:18 +00:00
Matt Arsenault b8f8dbc227 AMDGPU: Unify divergent function exits.
StructurizeCFG can't handle cases with multiple
returns creating regions with multiple exits.
Create a copy of UnifyFunctionExitNodes that only
unifies exit nodes that skips exit nodes
with uniform branch sources.

llvm-svn: 298729
2017-03-24 19:52:05 +00:00