Recently alloca address space has been added to data layout. Due to this
change, pointer returned by alloca may have different size as pointer in
address space 0.
However, currently the value type of frame index is assumed to be of the
same size as pointer in address space 0.
This patch fixes that.
Most targets assume alloca returning pointer in address space 0, which
is the default alloca address space. Therefore it is NFC for them.
AMDGCN target with amdgiz environment requires this change since it
assumes alloca returning pointer to addr space 5 and its size is 32,
which is different from the size of pointer in addr space 0 which is 64.
Differential Revision: https://reviews.llvm.org/D32021
llvm-svn: 300864
This is inserted directly in the text section. The relocation
for the function ends up resolving to the beginning of the
amd_kernel_code_t header rather than the actual function
entry point.
Also skip some of the comments for initialization
that only makes sense for kernels.
llvm-svn: 300736
The most common case for a branch condition is
a single use compare. Directly invert the branch
predicate rather than adding a lot of xor i1 true
which the DAG will have to fold later.
This produces nicer to read structurizer output.
This produces some random changes in codegen
due to the DAG swapping branch conditions itself,
and then does a poor job of dealing with those
inverts.
llvm-svn: 300732
Avoid looping through program to determine register counts.
This avoids needing to look at regmask operands.
Also fixes some counting errors with flat_scr when there
are no stack objects.
llvm-svn: 300482
If a kernel's pointer argument is known to be readonly
set access qualifier accordingly. This allows RT not to
flush caches before dispatches.
Differential Revision: https://reviews.llvm.org/D32091
llvm-svn: 300362
In many cases ds operations can be combined even if offsets do not
fit into 8 bit encoding. What it takes is to adjust base address.
Differential Revision: https://reviews.llvm.org/D31993
llvm-svn: 300227
If workgroup size is known inform llvm about range returned by local
id and local size queries.
Differential Revision: https://reviews.llvm.org/D31804
llvm-svn: 300102
Increase threshold to unroll a loop which contains an "if" statement
whose condition defined by a PHI belonging to the loop. This may help
to eliminate if region and potentially even PHI itself, saving on
both divergence and registers used for the PHI.
Add a small bonus for each of such "if" statements.
Differential Revision: https://reviews.llvm.org/D31693
llvm-svn: 299779
Summary:
Difference beetween PreRegAlloc() and MachineSSAOptimization() are that the former is run despite of -O0 optimization level. In my undestanding SiShrinkInstructions and SDWAPeephole shouldn't run when optimizations are disabled.
With this change order of passes will not change.
Reviewers: arsenm, vpykhtin, rampitec
Subscribers: qcolombet, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye
Differential Revision: https://reviews.llvm.org/D31705
llvm-svn: 299757
The new codepath has been in the tree for years, and there isn't any
reason to use two codepaths here.
Differential Revision: https://reviews.llvm.org/D30596
llvm-svn: 299723
This is possible in ways that are not compiler bugs,
so stop asserting on them.
This emits an extra error when emitting objects when it
can't encode the new pseudo, but I'm not sure that matters.
llvm-svn: 299712
If a workgroup size is known to be not greater than wavefront size
the s_barrier instruction is not needed since all threads are guarantied
to come to the same point at the same time.
Differential Revision: https://reviews.llvm.org/D31731
llvm-svn: 299659
Add a new node to act as a fancy bitcast from f16 operations to
i32 that implicitly zero the high 16-bits of the result.
Alternatively could try making v2f16 legal and canonicalizing
on build_vectors.
llvm-svn: 299246
Previously compiler often extracted common immediates into specific register, e.g.:
```
%vreg0 = S_MOV_B32 0xff;
%vreg2 = V_AND_B32_e32 %vreg0, %vreg1
%vreg4 = V_AND_B32_e32 %vreg0, %vreg3
```
Because of this SDWA peephole failed to find SDWA convertible pattern. E.g. in previous example this could be converted into 2 SDWA src operands:
```
SDWA src: %vreg2 src_sel:BYTE_0
SDWA src: %vreg4 src_sel:BYTE_0
```
With this change peephole check if operand is either immediate or register that is copy of immediate.
llvm-svn: 299202
If set to false it does not remove global aliases. With this parameter
set to false it should be safe to run the pass before link.
Differential Revision: https://reviews.llvm.org/D31489
llvm-svn: 299108
This is less important than increase threshold for private memory,
but still brings performance improvements in a wide range of tests.
Unrolling more for local memory serves three purposes: it allows
to combine ds operations if offset becomes static, saves registers
used for offsets in case of static offsets, and allows better lds
latency hiding.
Differential Revision: https://reviews.llvm.org/D31412
llvm-svn: 298948
Previously it was covered by the internalization. It turns out we cannot
run internalizer in FE, it break separate compilation tests. Thus early
inliner gets its own option.
Differential Revision: https://reviews.llvm.org/D31429
llvm-svn: 298935
Switch data layout by target triple environment amdgiz and amdgizcl indicating using of an address space mapping in which generic address space is 0.
amdgiz is for non-OpenCL environment where generic address space is 0.
amdgizcl is for OpenCL environment where generic address space is 0.
Differential Revision: https://reviews.llvm.org/D31211
llvm-svn: 298758
If the branch condition for a loop was a phi which itself
was fed from a phi from a loop, it isn't safe to try
to delete the phi until after the loop is handled.
llvm-svn: 298737
StructurizeCFG can't handle cases with multiple
returns creating regions with multiple exits.
Create a copy of UnifyFunctionExitNodes that only
unifies exit nodes that skips exit nodes
with uniform branch sources.
llvm-svn: 298729