Summary:
The new test case led to incorrect code.
Change-Id: Ief48b227e97aa662dd3535c9bafb27d4a184efca
Reviewers: arsenm, david-salinas
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63871
llvm-svn: 364566
Change the interface of CallLowering::lowerFormalArguments to accept
several virtual registers for each formal argument, instead of just one.
This is a follow-up to D46018.
CallLowering::lowerReturn was similarly refactored in D49660. lowerCall
will be refactored in the same way in follow-up patches.
With this change, we forward the virtual registers generated for
aggregates to CallLowering. Therefore, the target can decide itself
whether it wants to handle them as separate pieces or use one big
register. We also copy the pack/unpackRegs helpers to CallLowering to
facilitate this.
ARM and AArch64 have been updated to use the passed in virtual registers
directly, which means we no longer need to generate so many
merge/extract instructions.
AArch64 seems to have had a bug when lowering e.g. [1 x i8*], which was
put into a s64 instead of a p0. Added a test-case which illustrates the
problem more clearly (it crashes without this patch) and fixed the
existing test-case to expect p0.
AMDGPU has been updated to unpack into the virtual registers for
kernels. I think the other code paths fall back for aggregates, so this
should be NFC.
Mips doesn't support aggregates yet, so it's also NFC.
x86 seems to have code for dealing with aggregates, but I couldn't find
the tests for it, so I just added a fallback to DAGISel if we get more
than one virtual register for an argument.
Differential Revision: https://reviews.llvm.org/D63549
llvm-svn: 364510
Summary:
The +DumpCode attribute is a horrible hack in AMDGPU to embed the
disassembly of the generated code into the elf file. It is used by LLPC
to implement an extension that allows the application to read back the
disassembly of the code.
It tries to print an entry label at the start of every function, but
that didn't work for the first function in the module because
DumpCodeInstEmitter wasn't initialised until EmitFunctionBodyStart
which is too late.
Change-Id: I790d73ddf4f51fd02ab32529380c7cb7c607c4ee
Reviewers: arsenm, tpr, kzhuravl
Reviewed By: arsenm
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63712
llvm-svn: 364508
The LivePhysRegs calculated in order to find a scratch register in the
epilogue code wrongly uses 'LiveIns'. Instead, it should use the
'Liveout' sets. For the liveness, also considering the operands of
the terminator (return) instruction which is the insertion point for
the scratch-exec-copy instruction.
Patch by Christudasan Devadasan
llvm-svn: 364470
Summary:
This fixes a hardware bug that makes a branch offset of 0x3f unsafe.
This replaces the 32 bit branch with offset 0x3f to a 64 bit
instruction that includes the same 32 bit branch and the encoding
for a s_nop 0 to follow. The relaxer than modifies the offsets
accordingly.
Change-Id: I10b7aed99d651f8159401b01bb421f105fa6288e
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63494
llvm-svn: 364451
Original patch https://reviews.llvm.org/D63659 from
Steven Perron <stevenperron@google.com>
The pass AMDGPUUnifyDivergentExitNodes does not update the phi nodes in
the successors of blocks that is splits. This is fixed by calling
BasicBlock::splitBasicBlock to split the block instead of doing it
manually. This does extra work because a new conditional branch is
created in BB which is immediately replaced, but I think the simplicity
is worth it. It also helps make the code more future proof in case other
things need to be updated.
llvm-svn: 364342
Summary:
The symbols use the processor-specific SHN_AMDGPU_LDS section index
introduced with a previous change. The linker is then expected to resolve
relocations, which are also emitted.
Initially disabled for HSA and PAL environments until they have caught up
in terms of linker and runtime loader.
Some notes:
- The llvm.amdgcn.groupstaticsize intrinsics can no longer be lowered
to a constant at compile times, which means some tests can no longer
be applied.
The current "solution" is a terrible hack, but the intrinsic isn't
used by Mesa, so we can keep it for now.
- We no longer know the full LDS size per kernel at compile time, which
means that we can no longer generate a relevant error message at
compile time. It would be possible to add a check for the size of
individual variables, but ultimately the linker will have to perform
the final check.
Change-Id: If66dbf33fccfbf3609aefefa2558ac0850d42275
Reviewers: arsenm, rampitec, t-tye, b-sumner, jsjodin
Subscribers: qcolombet, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61494
llvm-svn: 364297
Summary:
The directive defines a symbol as an group/local memory (LDS) symbol.
LDS symbols behave similar to common symbols for the purposes of ELF,
using the processor-specific SHN_AMDGPU_LDS as section index.
It is the linker and/or runtime loader's job to "instantiate" LDS symbols
and resolve relocations that reference them.
It is not possible to initialize LDS memory (not even zero-initialize
as for .bss).
We want to be able to link together objects -- starting with relocatable
objects, but possible expanding to shared objects in the future -- that
access LDS memory in a flexible way.
LDS memory is in an address space that is entirely separate from the
address space that contains the program image (code and normal data),
so having program segments for it doesn't really make sense.
Furthermore, we want to be able to compile multiple kernels in a
compilation unit which have disjoint use of LDS memory. In that case,
we may want to place LDS symbols differently for different kernels
to save memory (LDS memory is very limited and physically private to
each kernel invocation), so we can't simply place LDS symbols in a
.lds section.
Hence this solution where LDS symbols always stay undefined.
Change-Id: I08cbc37a7c0c32f53f7b6123aa0afc91dbc1748f
Reviewers: arsenm, rampitec, t-tye, b-sumner, jsjodin
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, rupprecht, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61493
llvm-svn: 364296
Scalar extends to s64 can use S_BFE_{I64|U64}, but vector extends need
to extend to the 32-bit half, and then to 64.
I'm not sure what the line should be between what RegBankSelect
handles, and what instruction select does, but for now I'm erring on
the side of RegBankSelect for future post-RBS combines.
llvm-svn: 364212
Summary:
The LLVM disassembler assumes that the unused src0 operand of v_nop is
zero. Other tools can put another value in that field, which is still
valid. This commit fixes the LLVM disassembler to recognize such an
encoding as v_nop, in the same way as we already do for s_getpc.
Differential Revision: https://reviews.llvm.org/D63724
Change-Id: Iaf0363eae26ff92fc4ebc716216476adbff37a6f
llvm-svn: 364208
Avoids using a plain unsigned for registers throughoug codegen.
Doesn't attempt to change every register use, just something a little
more than the set needed to build after changing the return type of
MachineOperand::getReg().
llvm-svn: 364191
This needs different handling if the source is known to be a valid
condition or not. Handle turning it into shifts or a select during
regbankselect.
llvm-svn: 364186
This matters for byval uses outside of the entry block, which appear
as copies.
Previously, the only folding done was during selection, which could
not see the underlying frame index. For any uses outside the entry
block, the frame index was materialized in the entry block relative to
the global scratch wave offset.
This may produce worse code in cases where the offset ends up not
fitting in the MUBUF offset field. A better heuristic would be helpfu
for extreme frames.
llvm-svn: 364185
Every called function could possibly need this to calculate the
absolute address of stack objectst, and this avoids inserting a copy
around every call site in the kernel. It's also somewhat cleaner to
keep this in a callee saved SGPR.
llvm-svn: 363990
The attribute can specify elimination for leaf or non-leaf, so it
should always be considered. I copied this bug from AArch64, which
probably should also be fixed.
llvm-svn: 363949
Introducing VCC defs during SIFixSGPRCopies is generally
problematic. Avoid it by starting with the VOP3 form with the general
condition register. This is the easiest to fix instance, but doesn't
solve any specific problems I'm looking at.
llvm-svn: 363904
This is incomplete, and ideally these would all be removed, but it's
better to localize them to the subtarget first with comments about
what they're for.
llvm-svn: 363902
The def instruction for the vreg may not match, because it may be
folding through a reg_sequence. The assert was overly conservative and
not necessary. It's not actually important if DefMI really defined the
register, because the fold that will be done cares about the def of
the value that will be folded.
For some reason copies aren't making it through the reg_sequence,
although they should.
llvm-svn: 363876
This reapplies r363678, using the correct chain for the CopyToReg for
v0. glueCopyToM0 counterintuitively changes the operands of the
original node.
llvm-svn: 363870
This allows targets to make more decisions about reserved registers
after isel. For example, now it should be certain there are calls or
stack objects in the frame or not, which could have been introduced by
legalization.
Patch by Matthias Braun
llvm-svn: 363757
Invert the name and return value to better reflect the imprecise
nature.
Force passing in the DefMI, since it's known in the 2 users and could
possibly fail for an arbitrary vreg.
Allow specifying a specific user instruction. Scan through use
instructions, instead of use operands. Add scan thresholds instead of
searching infinitely.
Stop using a set to track seen uses. I didn't understand this usage,
or why it would not check the last use. I don't think the use list has
any particular order.
llvm-svn: 363675
This is part of the approved D63204 pending parent revision.
This small change is in fact a part of the VOP2b legalization which
does not technically belong to wave32 support, so extracted
separately.
llvm-svn: 363625
AMDGPUPropagateAttributes will not work on function bitcatsts,
so move AMDGPUFixFunctionBitcasts before it.
Differential Revision: https://reviews.llvm.org/D63455
llvm-svn: 363614
Summary:
The purpose of the padding is to guard against stale code being
fetched into the instruction cache by the lowest level prefetching.
We're generating relocatable ELF here, and so the padding should
arguably be added by the linker. This is in fact what Mesa does.
This also fixes multi-part shaders for Mesa.
Change-Id: I6bfede58f20e9f337762ccf39ef9e0e263e69e82
Reviewers: arsenm, rampitec, t-tye
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63427
llvm-svn: 363602
The pass works in two modes:
Mode 1: Just set attributes starting from kernels. This can work at
the very beginning of opt and llc pipeline, but cannot clone functions
because it must be a function pass.
Mode 2: Actually clone functions for new attributes. This can only work
after all function passes in the opt pipeline because it has to be a
module pass.
Differential Revision: https://reviews.llvm.org/D63208
llvm-svn: 363586
I keep using the wrong instruction when manually writing tests. This
really needs to check the number of operands, but I don't see an easy
way to do that right now.
llvm-svn: 363579
This patch changes MIR stack-id from an integer to an enum,
and adds printing/parsing support for this in MIR files. The default
stack-id '0' is now renamed to 'default'.
This should make MIR tests that have stack objects with different stack-ids
more descriptive. It also clarifies code operating on StackID.
Reviewers: arsenm, thegameg, qcolombet
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D60137
llvm-svn: 363533
Summary:
Instead of encoding a high-word of 0 using a fake TargetGlobalAddress,
just use a literal target constant. This simplifies some subsequent changes.
The generated assembly is now more explicit about the kind of relocation
that is to be used.
Change-Id: I066835202d23b5941fa7a358eb4b89e9b71ab6f8
Reviewers: arsenm, rampitec
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61491
llvm-svn: 363516
Currently you get extra waits, because waits are inserted for the
register dependencies of the call, and the function prolog waits on
everything.
Currently waits are still inserted on returns. It may make sense to
not do this, and wait in the caller instead.
llvm-svn: 363465
The way SelectionDAG treats memory operands is very frustrating, and
by default drops them unless a property is set on the pattern. There
is no pattern for manually selected instructions, so this requires
manually setting them.
llvm-svn: 363455
Earlier commit has added AMDGPUOperand::isBoolReg(). Turns out
gcc issues warning about unused function since D63204 is not
yet submitted.
Added NFC part of D63204 to have a use of that function and
mute the warning.
llvm-svn: 363416
Avoid producing illegal register bank copies for reg_sequence and
phi. The default implementation assumes it is possible to pick any
operand's bank and use that for the result, introducing a copy for
operands with a different bank. This does not check for illegal
copies. It is not legal to introduce a VGPR->SGPR copy, so any VGPR
operand requires the result to be a VGPR.
The changes in getInstrMappingImpl aren't strictly necessary, since
AMDGPU now just bypasses this for reg_sequence/phi. This could be
replaced with an assert in case other targets run into this. It is
currently responsible for producing the error for unsatisfiable
copies, but this will be better served with a verifier check.
For phis, for now assume any undetermined operands must be
VGPRs. Eventually, this needs to be able to defer mapping these
operations. This also does not yet have a way to check for whether the
block is in a divergent region.
llvm-svn: 363410
As discussed on D62910, we need to check whether particular types of memory access are allowed, not just their alignment/address-space.
This NFC patch adds a MachineMemOperand::Flags argument to allowsMemoryAccess and allowsMisalignedMemoryAccesses, and wires up calls to pass the relevant flags to them.
If people are happy with this approach I can then update X86TargetLowering::allowsMisalignedMemoryAccesses to handle misaligned NT load/stores.
Differential Revision: https://reviews.llvm.org/D63075
llvm-svn: 363179
As suggested by @arsenm on D63075 - this adds a TargetLowering::allowsMemoryAccess wrapper that takes a Load/Store node's MachineMemOperand to handle the AddressSpace/Alignment arguments and will also implicitly handle the MachineMemOperand::Flags change in D63075.
llvm-svn: 363048
This reverts r362990 (git commit 374571301d)
This was causing linker warnings on Darwin:
ld: warning: direct access in function 'llvm::initializeEvexToVexInstPassPass(llvm::PassRegistry&)'
from file '../../lib/libLLVMX86CodeGen.a(X86EvexToVex.cpp.o)' to global weak symbol
'void std::__1::__call_once_proxy<std::__1::tuple<void* (&)(llvm::PassRegistry&),
std::__1::reference_wrapper<llvm::PassRegistry>&&> >(void*)' from file '../../lib/libLLVMCore.a(Verifier.cpp.o)'
means the weak symbol cannot be overridden at runtime. This was likely caused by different translation
units being compiled with different visibility settings.
llvm-svn: 363028
Summary:
For builds with LLVM_BUILD_LLVM_DYLIB=ON and BUILD_SHARED_LIBS=OFF
this change makes all symbols in the target specific libraries hidden
by default.
A new macro called LLVM_EXTERNAL_VISIBILITY has been added to mark symbols in these
libraries public, which is mainly needed for the definitions of the
LLVMInitialize* functions.
This patch reduces the number of public symbols in libLLVM.so by about
25%. This should improve load times for the dynamic library and also
make abi checker tools, like abidiff require less memory when analyzing
libLLVM.so
One side-effect of this change is that for builds with
LLVM_BUILD_LLVM_DYLIB=ON and LLVM_LINK_LLVM_DYLIB=ON some unittests that
access symbols that are no longer public will need to be statically linked.
Before and after public symbol counts (using gcc 8.2.1, ld.bfd 2.31.1):
nm before/libLLVM-9svn.so | grep ' [A-Zuvw] ' | wc -l
36221
nm after/libLLVM-9svn.so | grep ' [A-Zuvw] ' | wc -l
26278
Reviewers: chandlerc, beanz, mgorny, rnk, hans
Reviewed By: rnk, hans
Subscribers: Jim, hiraditya, michaelplatings, chapuni, jholewinski, arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, javed.absar, sbc100, jgravelle-google, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, zzheng, edward-jones, mgrang, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, kristina, jsji, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D54439
llvm-svn: 362990
SIInsertSkips really doesn't understand the control flow, and makes
very stupid assumptions about the block layout. This was able to get
away with not skipping return blocks, since usually after
structurization there is only one placed at the end of the
function. Tail duplication can break this assumption.
llvm-svn: 362754
"Divergence driven ISel. Assign register class for cross block values
according to the divergence."
that discovered the design flaw leading to several issues that
required to be solved before.
This change reverts AMDGPU specific changes and keeps common part
unaffected.
llvm-svn: 362749
This forced the caller to be aware of this, which is an ugly ABI
feature.
Partially reverts r295877. The original reasons for doing this are
mostly fixed. Alloca is now in a non-0 address space, so it should be
OK to have 0 as a valid pointer. Since we treat the absolute address
as the pointer value, this part only really needed to apply to
kernels.
Since r357093, we avoid the need to increment/decrement the offset
register in more cases, and since r354816 the scavenger can fail
without spilling, so it's less critical that we try to avoid an offset
that fits in the MUBUF offset.
Restrict to callable functions for now to split this into 2 steps to
limit thte number of test updates and in case anything breaks.
llvm-svn: 362665
Since the beginning, the offset of a frame index has been consistently
interpreted backwards. It was treating it as an offset from the
scratch wave offset register as a frame register. The correct
interpretation is the offset from the SP on entry to the function,
before the prolog. Frame index elimination then should select either
SP or another register as an FP.
Treat the scratch wave offset on kernel entry as the pre-incremented
SP. Rely more heavily on the standard hasFP and frame pointer
elimination logic, and clean up the private reservation code. This
saves a copy in most callee functions.
The kernel prolog emission code is still kind of a mess relying on
checking the uses of physical registers, which I would prefer to
eliminate.
Currently selection directly emits MUBUF instructions, which require
using a reference to some register. Use the register chosen for SP,
and then ignore this later. This should probably be cleaned up to use
pseudos that don't refer to any specific base register until frame
index elimination.
Add a workaround for shaders using large numbers of SGPRs. I'm not
sure these cases were ever working correctly, since as far as I can
tell the logic for figuring out which SGPR is the scratch wave offset
doesn't match up with the shader input initialization in the shader
programming guide.
llvm-svn: 362661
The proposal in D62498 showed that x86 would benefit from vector
store splitting, but that may conflict with the generic DAG
combiner's store merging transforms.
Add memory type to the existing TLI hook that enables the merging
transforms, so we can limit those changes to scalars only for x86.
llvm-svn: 362507
This matches APInt's versions of these functions, and there is no need for these to be size_t.
(as well as __builtin_clzll())
Differential Revision: https://reviews.llvm.org/D60823
llvm-svn: 362503
This is something of a workaround, and the state of stack realignment
controls is kind of a mess. Ideally, we would be able to specify the
stack is infinitely aligned on entry to a kernel.
TargetFrameLowering provides multiple controls which apply at
different points. The StackRealignable field is used during
SelectionDAG, and for some reason distinct from this
hook. StackAlignment is a single field not dependent on the
function. It would probably be better to make that dependent on the
calling convention, and the maximum value for kernels.
Currently this doesn't really change anything, since the frame
lowering mostly does its own thing. This helps avoid regressions in a
future change which will rely more heavily on hasFP.
llvm-svn: 362447
For some reason multiple places need to do this, and the variant the
loop unroller and inliner use was not handling it.
Also, introduce a new wrapper to be slightly more precise, since on
AMDGPU some addrspacecasts are free, but not no-ops.
llvm-svn: 362436
AMDGPU uses multiplier 9 for the inline cost. It is taken into account
everywhere except for inline hint threshold. As a result we are penalizing
functions with the inline hint making them less probable to be inlined
than those without the hint. Defaults are 225 for a normal function and
325 for a function with an inline hint. Currently we have effective
threshold 225 * 9 = 2025 for normal functions and just 325 for those with
the hint. That is fixed by this patch.
Differential Revision: https://reviews.llvm.org/D62707
llvm-svn: 362239
With LLPC, previous investigation has suggested that si-scheduler
interacts badly with SiFormMemoryClauses on an XNACK target in some
games.
That needs further investigation in the future. In the meantime, this
commit adds a target-specific attribute to allow us to disable
SIFormMemoryClauses by setting it to 1 on a per-function basis for LLPC
to use.
Differential Revision: https://reviews.llvm.org/D62572
Change-Id: Ia0ca12ce79093cbbe86caded723ffb13384ede92
llvm-svn: 362127
Summary:
- There's a regression due to the cross-block RC assignment. Use the
proper way to derive the output register RC in inline asm.
Reviewers: rampitec, alex-t
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, dstuttard, tpr, t-tye, eraman, hiraditya, llvm-commits, yaxunl
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62537
llvm-svn: 361868
If the only VGPRs used for SGPR spilling were not CSRs, this was
enabling all laness and immediately restoring exec. This is the usual
situation in leaf functions.
llvm-svn: 361848
Summary:
- Don't treat the use of a scalar register as `vreg_1` an VGPR usage.
Otherwise, that promotes that scalar register into vector one, which
breaks the assumption that scalar register holds the lane mask.
- The issue is triggered in a complicated case, where if the uses of
that (lane mask) scalar register is legalized firstly before its
definition, e.g., due to the mismatch block placement and its
topological order or loop. In that cases, the legalization of PHI
introduces the use of that scalar register as `vreg_1`.
Reviewers: rampitec, nhaehnle, arsenm, alex-t
Subscribers: kzhuravl, jvesely, wdng, dstuttard, tpr, t-tye, hiraditya, llvm-commits, yaxunl
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62492
llvm-svn: 361847
1a8b2ea611cf4ca7cb09562e0238cfefa27c05b5 Divergence driven ISel. Assign register class for cross block values according to the divergence.
llvm-svn: 361770
Details: To make instruction selection really divergence driven it is necessary to assign
the correct register classes to the cross block values beforehand. For the divergent targets
same value type requires different register classes dependent on the value divergence.
Reviewers: rampitec, nhaehnle
Differential Revision: https://reviews.llvm.org/D59990
This commit was reverted because of the build failure.
The reason was mlformed patch.
Build failure fixed.
llvm-svn: 361741
This matches countLeadingOnes() and countTrailingOnes(), and
APInt's countLeadingZeros() and countTrailingZeros().
(as well as __builtin_clzll())
llvm-svn: 361724
This was skipping GetUnderlyingObject for nonprivate addresses, but an
alloca could also be found through an addrspacecast if it's flat.
llvm-svn: 361649
Details: To make instruction selection really divergence driven it is necessary to assign
the correct register classes to the cross block values beforehand. For the divergent targets
same value type requires different register classes dependent on the value divergence.
Reviewers: rampitec, nhaehnle
Differential Revision: https://reviews.llvm.org/D59990
llvm-svn: 361644
We were assuming a much larger possible per-wave visible stack
allocation than is possible:
faa3ae5138/src/core/runtime/amd_gpu_agent.cpp (L70)
Based on this, we can assume the high 15 bits of a frame index or sret
are 0. The frame index value is the per-lane offset, so the maximum
frame index value is MAX_WAVE_SCRATCH / wavesize.
Remove the corresponding subtarget feature and option that made
this configurable.
llvm-svn: 361541
Keep it optional in cases this is ever needed in some global
context. Currently it's only used for getting an upper bound inline
asm code size.
For AMDGPU, gfx10 increases the maximum instruction size to
20-bytes. This avoids penalizing older subtargets when estimating code
size, and making some annoying branch relaxation test adjustments.
llvm-svn: 361405
Unfortunately the way SIInsertSkips works is backwards, and is
required for correctness. r338235 added handling of some special cases
where skipping is mandatory to avoid side effects if no lanes are
active. It conservatively handled asm correctly, but the same logic
needs to apply to calls.
Usually the call sequence code is larger than the skip threshold,
although the way the count is computed is really broken, so I'm not
sure if anything was likely to really hit this.
llvm-svn: 361202
A std::array is implemented as a template with an array
inside a struct. Older versions of clang, like 3.6,
require an extra set of curly braces around std::array
initializations to avoid warnings.
The C++ language was changed regarding this by CWG 1270.
So more modern tool chains does not complaing even if
leaving out one level of braces.
llvm-svn: 361171
Summary:
Avoid introducing hazard mitigation when lgkmcnt is reduced to 0.
Clarify code comments to explain assumptions made for this hazard
mitigation. Expand and correct test cases to cover variants of
s_waitcnt.
Reviewers: nhaehnle, rampitec
Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62058
llvm-svn: 361124
This is ported from the custom AMDGPU DAG implementation. I think this
is a better default expansion than what the DAG currently uses, at
least if the target has CTLZ.
This implements the signed version in terms of the unsigned
conversion, which is implemented with bit operations. SelectionDAG has
several other implementations that should eventually be ported
depending on what instructions are legal.
llvm-svn: 361081
Summary:
In order to combine memory operations efficiently, the load/store
optimizer might move some instructions around. It's usually safe
to move instructions down past the merged instruction because the
pass checks if memory operations can be re-ordered.
Though, the current logic doesn't handle Write-after-Write hazards.
This fixes a reflection issue with Monster Hunter World and DXVK.
v2: - rebased on top of master
- clean up the test case
- handle WaW hazards correctly
Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=40130
Original patch by Samuel Pitoiset.
Reviewers: tpr, arsenm, nhaehnle
Reviewed By: nhaehnle
Subscribers: ronlieb, arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye
Differential Revision: https://reviews.llvm.org/D61313
llvm-svn: 361008
This is the conservatively correct default. It is always safe to
assume xnack is enabled, but not the converse.
Introduce a feature to blacklist targets where xnack can never be
meaningfully enabled. I'm not sure the targets this is applied to is
100% correct.
llvm-svn: 360903
Summary:
SGPR in CC can be either hw initialized or set by other chained shaders
and so this increases the SGPR count availalbe to CC to 105.
Change-Id: I3dfadc750fe4a3e2bd07117a2899fd13f3e2fef3
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61261
llvm-svn: 360778
Move the declarations of getThe<Name>Target() functions into a new header in
TargetInfo and make users of these functions include this new header.
This fixes a layering problem.
llvm-svn: 360713
The +DumpCode attribute is a horrible hack in AMDGPU to embed the
disassembly of the generated code into the elf file. It is used by LLPC
to implement an extension that allows the application to read back the
disassembly of the code. Longer term, we should re-implement that by
using the LLVM disassembler from the Vulkan driver.
Recent LLVM changes broke +DumpCode. With -filetype=asm it crashed, and
with -filetype=obj I think it did not include any instructions, only the
labels. Fixed with this commit: now it has no effect with -filetype=asm,
and works as intended with -filetype=obj.
Differential Revision: https://reviews.llvm.org/D60682
Change-Id: I6436d86fe2ea220d74a643a85e64753747c9366b
llvm-svn: 360688
For some targets, there is a circular dependency between InstPrinter and
MCTargetDesc. Merging them together will fix this. For the other targets,
the merging is to maintain consistency so all targets will have the same
structure.
llvm-svn: 360487
This also allows three op patterns to use increased constant bus
limit of GFX10.
Differential Revision: https://reviews.llvm.org/D61763
llvm-svn: 360395
The VOP3 form should always be the preferred selection, to be shrunk
later. This should only be an optimization issue, but this partially
works around a problem from clobbering VCC when SIFixSGPRCopies
rewrites an SCC defining operation directly to VCC.
3 of the testcases are regressions from failing to fold the immediate
in cases it should. These can be avoided by improving the VCC liveness
handling in SIFoldOperands. Simply increasing the threshold to
computeRegisterLiveness works, although this is common enough that VCC
liveness should probably be tracked throughout the pass. The hack of
leaving behind an implicit_def instruction to avoid breaking iterator
wastes instruction count, which inhibits finding the VCC def in long
chains of adds. Doing this however exposes different, worse looking
regressions from poor scheduling behavior. This could probably be
avoided around by forcing the shrink of the addc here, but the
scheduler should probably be fixed.
The r600 add test needs to be split out because it asserts on the
arguments in the new test during the calling convention lowering.
llvm-svn: 360293
This was committed in rL358887 but reverted in rL360066 due to a x86 regression, really it should be have been pre-committed instead of being part of the SimplifyDemandedBits bitcast patch.
llvm-svn: 360263
Summary: GCNHazardRecognizer fails to identify hazards that are in and around bundles. This patch allows the hazard recognizer to consider bundled instructions in both scheduler and hazard recognizer mode. We ignore “bundledness” for the purpose of detecting hazards and examine the instructions individually.
Reviewers: arsenm, msearles, rampitec
Reviewed By: rampitec
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61564
llvm-svn: 360199
Summary:
No test case because I don't know of a way to trigger this, but I
accidentally caused this to fail while working on a different change.
Change-Id: I8015aa447fe27163cc4e4902205a203bd44bf7e3
Reviewers: arsenm, rampitec
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61490
llvm-svn: 360123
Reverts "[X86] Remove (V)MOV64toSDrr/m and (V)MOVDI2SSrr/m. Use 128-bit result MOVD/MOVQ and COPY_TO_REGCLASS instead"
Reverts "[TargetLowering][AMDGPU][X86] Improve SimplifyDemandedBits bitcast handling"
Eric Christopher and Jorge Gorbe Moya reported some issues with these patches to me off list.
Removing the CodeGenOnly instructions has changed how fneg is handled during fast-isel with sse/sse2. We're now emitting fsub -0.0, x instead
moving to the integer domain(in a GPR), xoring the sign bit, and then moving back to xmm. This is because the fast isel table no longer
contains an entry for (f32/f64 bitcast (i32/i64)) so the target independent fneg code fails. The use of fsub changes the behavior of nan with
respect to -O2 codegen which will always use a pxor. NOTE: We still have a difference with double with -m32 since the move to GPR doesn't work
there. I'll file a separate PR for that and add test cases.
Since removing the CodeGenOnly instructions was fixing PR41619, I'm reverting r358887 which exposed that PR. Though I wouldn't be surprised
if that bug can still be hit independent of that.
This should hopefully get Google back to green. I'll work with Simon and other X86 folks to figure out how to move forward again.
llvm-svn: 360066
The VOP3 form should always be the preferred selection form to be
shrunk later.
The r600 sub test needs to be split out because it asserts on the
arguments in the new test during the calling convention lowering.
llvm-svn: 359899
This was broken if the original operand was killed. The kill flag
would appear on both instructions, and fail the verifier. Keep the
kill flag, but remove the operands from the old instruction. This has
an added benefit of really reducing the use count for future folds.
Ideally the pass would be structured more like what PeepholeOptimizer
does to avoid this hack to avoid breaking instruction iterators.
llvm-svn: 359891
When a fold of an immediate into a sub/subrev required shrinking the
instruction, the wrong VOP2 opcode was used. This was using the VOP2
equivalent of the original instruction, not the commuted instruction
with the inverted opcode.
llvm-svn: 359883
We don't have FP exception limits in the IR constant folder for the binops (apart from strict ops),
so it does not make sense to have them here in the DAG either. Nothing else in the backend tries
to preserve exceptions (again outside of strict ops), so I don't see how this could have ever
worked for real code that cares about FP exceptions.
There are still cases (examples: unary opcodes in SDAG, FMA in IR) where we are trying (at least
partially) to preserve exceptions without even asking if the target supports FP exceptions. Those
should be corrected in subsequent patches.
Real support for FP exceptions requires several changes to handle the constrained/strict FP ops.
Differential Revision: https://reviews.llvm.org/D61331
llvm-svn: 359791
The MachineFunction wasn't used in getOptimalMemOpType, but more importantly,
this allows reuse of findOptimalMemOpLowering that is calling getOptimalMemOpType.
This is the groundwork for the changes in D59766 and D59787, that allows
implementation of TTI::getMemcpyCost.
Differential Revision: https://reviews.llvm.org/D59785
llvm-svn: 359537
This reverts commit 7a6ef3004655dd86d722199c471ae78c28e31bb4.
We discovered some internal test failures, so reverting for now.
Differential Revision: https://reviews.llvm.org/D61213
llvm-svn: 359363
Essentially complete a proper rebase of the V3 metadata change over
https://reviews.llvm.org/D49096.
Minimize the diff between the V2 and V3 variants of the relevant lit
tests, and clean up some trailing whitespace.
llvm-svn: 358992
Summary:
When an LCSSA phi survives through instruction selection, the pass
ends up removing that phi entirely because it is dominated by the
logic that does the lanemask merging.
This then used to trigger an assertion when processing a dependent
phi instruction.
Change-Id: Id4949719f8298062fe476a25718acccc109113b6
Reviewers: llvm-commits
Subscribers: kzhuravl, jvesely, wdng, yaxunl, t-tye, tpr, dstuttard, rtaylor, arsenm
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60999
llvm-svn: 358983
Converting InlineCost interface and its internals into CallBase usage.
Inliners themselves are still not converted.
Reviewed By: reames
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60636
llvm-svn: 358982
Summary:
- Only apply packed literal `op_sel_hi` skipping on operands requiring
packed literals. Even an instruction is `packed`, it may have operand
requiring non-packed literal, such as `v_dot2_f32_f16`.
Reviewers: rampitec, arsenm, kzhuravl
Subscribers: jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60978
llvm-svn: 358922
These are inserted after branch relaxation, and for some reason it's
decided to put them in the long branch expansion block. It's probably
not great to rely on the source block address, so this should probably
be switched to being PC relative instead of relying on the block
address
llvm-svn: 358909
This patch adds support for BigBitWidth -> SmallBitWidth bitcasts, splitting the DemandedBits/Elts accordingly.
The AMDGPU backend needed an extra (srl (and x, c1 << c2), c2) -> (and (srl(x, c2), c1) combine to encourage BFE creation, I investigated putting this in DAGCombine but it caused a lot of noise on other targets - some improvements, some regressions.
The X86 changes are all definite wins.
Differential Revision: https://reviews.llvm.org/D60462
llvm-svn: 358887
Summary:
The basic idea here is to make it possible to use
MachineInstr::mayAlias also when the MachineInstr
is const (or the "Other" MachineInstr is const).
The addition of const in MachineInstr::mayAlias
then rippled down to the need for adding const
in several other places, such as
TargetTransformInfo::getMemOperandWithOffset.
Reviewers: hfinkel
Reviewed By: hfinkel
Subscribers: hfinkel, MatzeB, arsenm, jvesely, nhaehnle, hiraditya, javed.absar, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60856
llvm-svn: 358744
Summary:
Ignore edges to non-SUnits (e.g. ExitSU) when checking
for low latency instructions.
When calling the function isLowLatencyInstruction(),
an ExitSU could be on the list of successors, not necessarily
a regular SU. In other places in the code there is a check
"Succ->NodeNum >= DAGSize" to prevent further processing of
ExitSU as "Succ->getInstr()" is NULL in such a case.
Also, 8 out of 9 cases of "SUnit *Succ = SuccDep.getSUnit())"
has the guard, so it is clearly an omission here.
Change-Id: Ica86f0327c7b2e6bcb56958e804ea6c71084663b
Reviewers: nhaehnle
Reviewed By: nhaehnle
Subscribers: MatzeB, arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60864
llvm-svn: 358740
fneg combining attempts to turn it into fadd(fneg(A), fneg(0)), but
creating the new fadd folds to just fneg(A). When A has multiple uses,
this confuses it and you get an assert. Fixed.
Differential Revision: https://reviews.llvm.org/D60633
Change-Id: I0ddc9b7286abe78edc0cd8d734fdeb05ff09821c
llvm-svn: 358640
Summary: This fixes a large Dawn of War 3 performance regression with RADV from Mesa 19.0 to master which was caused by creating less code in some branches.
Reviewers: arsen, nhaehnle
Reviewed By: nhaehnle
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60824
llvm-svn: 358592
This fixes a test I introduced in change D59191 (that added src0 and
src1 modifiers to the v_cndmask instruction for disassembly purposes).
Spotted by David Binderman in bug 41488.
Differential Revision: https://reviews.llvm.org/D60652
Change-Id: I6ac95e66cd84e812ed3359ad57bcd0e13198ba0c
llvm-svn: 358392
Other opcodes shouldn't be CSE'd until we can be sure debug info quality won't
be degraded.
This change also improves the IRTranslator so that in most places, but not all,
it creates constants using the MIRBuilder directly instead of first creating a
new destination vreg and then creating a constant. By doing this, the
buildConstant() method can just return the vreg of an existing G_CONSTANT
instead of having to create a COPY from it.
I measured a 0.2% improvement in compile time and a 0.9% improvement in code
size at -O0 ARM64.
Compile time:
Program base cse diff
test-suite...ark/tramp3d-v4/tramp3d-v4.test 9.04 9.12 0.8%
test-suite...Mark/mafft/pairlocalalign.test 2.68 2.66 -0.7%
test-suite...-typeset/consumer-typeset.test 5.53 5.51 -0.4%
test-suite :: CTMark/lencod/lencod.test 5.30 5.28 -0.3%
test-suite :: CTMark/Bullet/bullet.test 25.82 25.76 -0.2%
test-suite...:: CTMark/ClamAV/clamscan.test 6.92 6.90 -0.2%
test-suite...TMark/7zip/7zip-benchmark.test 34.24 34.17 -0.2%
test-suite :: CTMark/SPASS/SPASS.test 6.25 6.24 -0.1%
test-suite...:: CTMark/sqlite3/sqlite3.test 1.66 1.66 -0.1%
test-suite :: CTMark/kimwitu++/kc.test 13.61 13.60 -0.0%
Geomean difference -0.2%
Code size:
Program base cse diff
test-suite...-typeset/consumer-typeset.test 1315632 1266480 -3.7%
test-suite...:: CTMark/ClamAV/clamscan.test 1313892 1297508 -1.2%
test-suite :: CTMark/lencod/lencod.test 1439504 1423112 -1.1%
test-suite...TMark/7zip/7zip-benchmark.test 2936980 2904172 -1.1%
test-suite :: CTMark/Bullet/bullet.test 3478276 3445460 -0.9%
test-suite...ark/tramp3d-v4/tramp3d-v4.test 8082868 8033492 -0.6%
test-suite :: CTMark/kimwitu++/kc.test 3870380 3853972 -0.4%
test-suite :: CTMark/SPASS/SPASS.test 1434904 1434896 -0.0%
test-suite...Mark/mafft/pairlocalalign.test 764528 764528 0.0%
test-suite...:: CTMark/sqlite3/sqlite3.test 782092 782092 0.0%
Geomean difference -0.9%
Differential Revision: https://reviews.llvm.org/D60580
llvm-svn: 358369
Because CodeGen can't depend on GlobalISel, we need a way to encapsulate the CSE
configs that can be passed between TargetPassConfig and the targets' custom
pass configs. This CSEConfigBase allows targets to create custom CSE configs
which is then used by the GISel passes for the CSEMIRBuilder.
This support will be used in a follow up commit to allow constant-only CSE for
-O0 compiles in D60580.
llvm-svn: 358368
Summary:
The InlineAsm::AsmDialect is only required for X86; no architecture
makes use of it and as such it gets passed around between arch-specific
and general code while being unused for all architectures but X86.
Since the AsmDialect is queried from a MachineInstr, which we also pass
around, remove the additional AsmDialect parameter and query for it deep
in the X86AsmPrinter only when needed/as late as possible.
This refactor should help later planned refactors to AsmPrinter, as this
difference in the X86AsmPrinter makes it harder to make AsmPrinter more
generic.
Reviewers: craig.topper
Subscribers: jholewinski, arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, javed.absar, sbc100, jgravelle-google, eraman, hiraditya, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, llvm-commits, peter.smith, srhines
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60488
llvm-svn: 358101
Add support for min/max flavor selects in computeConstantRange(),
which allows us to fold comparisons of a min/max against a constant
in InstSimplify. This fixes an infinite InstCombine loop, with the
test case taken from D59378.
Relative to the previous iteration, this contains some adjustments for
AMDGPU med3 tests: The AMDGPU target runs InstSimplify prior to codegen,
which ends up constant folding some existing med3 tests after this
change. To preserve these tests a hidden -amdgpu-scalar-ir-passes option
is added, which allows disabling scalar IR passes (that use InstSimplify)
for testing purposes.
Differential Revision: https://reviews.llvm.org/D59506
llvm-svn: 357870
Detect dead lanes can create some dead defs. Then RenameIndependentSubregs
will break a REG_SEQUENCE which may use these dead defs. At this point
a dead instruction can be removed but we do not run a DCE anymore.
MachineDCE was only running before live variable analysis. The patch
adds a mean to preserve LiveIntervals and SlotIndexes in case it works
past this.
Differential Revision: https://reviews.llvm.org/D59626
llvm-svn: 357805
We have done some predicate and feature refactoring lately but
did not upstream it. This is to sync.
Differential revision: https://reviews.llvm.org/D60292
llvm-svn: 357791
Relying on no spill or other code being inserted before this was
precarious. It relied on code diligently checking isBasicBlockPrologue
which is likely to be forgotten.
Ideally this could be done earlier, but this doesn't work because of
phis. Any other instruction can't be placed before them, so we have to
accept the position being incorrect during SSA.
This avoids regressions in the fast register allocator rewrite from
inverting the direction.
llvm-svn: 357634
The test should really be checking for the property directly in the
code object headers, but there are problems with this. I don't see
this directly represented in the text form, and for the binary
emission this is depending on a function level subtarget feature to
emit a global flag.
llvm-svn: 357558
We should overall stop using these, but the uppercase name didn't
work. Any feature string is converted to lowercase, so these
could never be found in the table.
llvm-svn: 357541
There are various places in LLVM where the definition of StackID is not
properly honoured, for example in PEI where objects with a StackID > 0 are
allocated on the default stack (StackID0). This patch enforces that PEI
only considers allocating objects to StackID 0.
Reviewers: arsenm, thegameg, MatzeB
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D60062
llvm-svn: 357460
This change incorporates an effort by Connor Abbot to change how we deal
with WWM operations potentially trashing valid values in inactive lanes.
Previously, the SIFixWWMLiveness pass would work out which registers
were being trashed within WWM regions, and ensure that the register
allocator did not have any values it was depending on resident in those
registers if the WWM section would trash them. This worked perfectly
well, but would cause sometimes severe register pressure when the WWM
section resided before divergent control flow (or at least that is where
I mostly observed it).
This fix instead runs through the WWM sections and pre allocates some
registers for WWM. It then reserves these registers so that the register
allocator cannot use them. This results in a significant register
saving on some WWM shaders I'm working with (130 -> 104 VGPRs, with just
this change!).
Differential Revision: https://reviews.llvm.org/D59295
llvm-svn: 357400
Since this can be set with s_setreg*, it should not be a subtarget
property. Set a default based on the calling convention, and Introduce
a new amdgpu-dx10-clamp attribute to override this if desired.
Also introduce a new amdgpu-ieee attribute to match.
The values need to match to allow inlining. I think it is OK for the
caller's dx10-clamp attribute to override the callee, but there
doesn't appear to be the infrastructure to do this currently without
definining the attribute in the generic Attributes.td.
Eventually the calling convention lowering will need to insert a mode
switch somewhere for these.
llvm-svn: 357302
Some DAG mutations can only be applied to `ScheduleDAGMI`, and have to
internally cast a `ScheduleDAGInstrs` to `ScheduleDAGMI`.
There is nothing actually specific to `ScheduleDAGMI` in `Topo`.
llvm-svn: 357239
The register index can only really be an SGPR. Lie that a VGPR index
is legal, and then rewrite the instruction in a waterfall loop to
handle the index.
llvm-svn: 357235
This avoids allocating a few KB of heap memory on startup, and instead
allocates these maps lazily. I noticed this while profiling LLD.
llvm-svn: 357192
When lowering a load or store for TypeWidenVector, the type legalizer
would use a single load or store if the associated integer type was legal
or promoted. E.g. it loads a v4i8 as an i32 if i32 is legal/promotable.
(See https://reviews.llvm.org/rL236528 for reference.)
This applies that behaviour to vector types. If the vector type is
TypePromoteInteger, the element type is going to be TypePromoteInteger
as well, which will lead to have a single promoting load rather than N
individual promoting loads. For instance, if we have a v3i1, we would
now have a load of v4i1 instead of 3 loads of i1.
Patch by Guillaume Marques. Thanks!
Differential Revision: https://reviews.llvm.org/D56201
llvm-svn: 357120
This reapplies r356149, using the correct overload of findUnusedReg
which passes the current iterator.
This worked most of the time, because the scavenger iterator was moved
at the end of the frame index loop in PEI. This would fail if the
spill was the first instruction. This was further hidden by the fact
that the scavenger wasn't passed in for normal frame index
elimination.
llvm-svn: 357098
Also includes one example of how this transform is unsound. This isn't
verifying the copies are used in the control flow intrinisic patterns.
Also add option to disable exec mask opt pass. Since this pass is
unsound, it may be useful to turn it off until it is fixed.
llvm-svn: 357091
These were defaulting to true, but they are just wrappers around bit
operations. This avoids regressions in the exec mask optimization
passes in a future commit.
llvm-svn: 356952
Summary:
Adding contained caching to AliasAnalysis. BasicAA is currently the only one using it.
AA changes:
- This patch is pulling the caches from BasicAAResults to AAResults, meaning the getModRefInfo call benefits from the IsCapturedCache as well when in "batch mode".
- All AAResultBase implementations add the QueryInfo member to all APIs. AAResults APIs maintain wrapper APIs such that all alias()/getModRefInfo call sites are unchanged.
- AA now provides a BatchAAResults type as a wrapper to AAResults. It keeps the AAResults instance and a QueryInfo instantiated to batch mode. It delegates all work to the AAResults instance with the batched QueryInfo. More API wrappers may be needed in BatchAAResults; only the minimum needed is currently added.
MemorySSA changes:
- All walkers are now templated on the AA used (AliasAnalysis=AAResults or BatchAAResults).
- At build time, we optimize uses; now we create a local walker (lives only as long as OptimizeUses does) using BatchAAResults.
- All Walkers have an internal AA and only use that now, never the AA in MemorySSA. The Walkers receive the AA they will use when built.
- The walker we use for queries after the build is instantiated on AliasAnalysis and is built after building MemorySSA and setting AA.
- All static methods doing walking are now templated on AliasAnalysisType if they are used both during build and after. If used only during build, the method now only takes a BatchAAResults. If used only after build, the method now takes an AliasAnalysis.
Subscribers: sanjoy, arsenm, jvesely, nhaehnle, jlebar, george.burgess.iv, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59315
llvm-svn: 356783
Some image ops return three or five dwords. Previously, we modeled that
with a 4 or 8 dword register class. The register allocator could
cleverly spot that some subregs were dead and allocate something else
there, but that caused the de-optimization that waitcnt insertion would
think that the result was used immediately.
This commit allows such an image op to have a result with a three or
five dword result, avoiding the above de-optimization.
Differential Revision: https://reviews.llvm.org/D58905
Change-Id: I3651211bbd7ed22721ee7b9fefd7bcc60a809d8b
llvm-svn: 356757
Now we have vec3 MVTs, this commit implements dwordx3 variants of the
buffer intrinsics.
On gfx6, a dwordx3 buffer load intrinsic is implemented as a dwordx4
instruction, and a dwordx3 buffer store intrinsic is not supported.
We need to support the dwordx3 load intrinsic because it is generated by
subtarget-unaware code in InstCombine.
Differential Revision: https://reviews.llvm.org/D58904
Change-Id: I016729d8557b98a52f529638ae97c340a5922a4e
llvm-svn: 356755
They are not used by anything yet, but a subsequent commit will start
using them for image ops that return 5 dwords.
Differential Revision: https://reviews.llvm.org/D58903
Change-Id: I63e1904081e39a6d66e4eb96d51df25ad399d271
llvm-svn: 356735
Added support for dwordx3 for most load/store types, but not DS, and not
intrinsics yet.
SI (gfx6) does not have dwordx3 instructions, so they are not enabled
there.
Some of this patch is from Matt Arsenault, also of AMD.
Differential Revision: https://reviews.llvm.org/D58902
Change-Id: I913ef54f1433a7149da8d72f4af54dbb13436bd9
llvm-svn: 356659
My previous fix rL356591 "[AMDGPU] Added MsgPack format PAL metadata"
accidentally caused a spurious PAL metadata .note record to be emitted
for any AMDGPU output. That caused failures in the lld test
amdgpu-relocs.s. Fixed.
Differential Revision: https://reviews.llvm.org/D59613
Change-Id: Ie04a2aaae890dcd490f22c89edf9913a77ce070e
llvm-svn: 356621
Summary: - The linking is broken when this library is built as shared one.
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59610
llvm-svn: 356617
The constantness shouldn't change the register bank choice. We also
don't need to restrict this to only indexing VGPRs, since it's
possible to index SGPRs (but SelectionDAG made using this
difficult). Allow directly indexing SGPRs when appropriate.
llvm-svn: 356611
Summary:
- Should use `targetconstant` instead of `constant` operand for clamp
bit, which is expected as an immediate operand. Under certain
conditions, such as a common `i1 false` constant is used in other
place and selected before the instruction with clamp bit, register
operand may be added instead of immediate one. Use `targetcosntant` to
enforce that.
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59608
llvm-svn: 356608
Summary:
PAL metadata now supports both the old linear reg=val pairs format and
the new MsgPack format.
The MsgPack format uses YAML as its textual representation. On output to
YAML, a mnemonic name is provided for some hardware registers.
Differential Revision: https://reviews.llvm.org/D57028
Change-Id: I2bbaabaaca4b3574f7e03b80fbef7c7a69d06a94
llvm-svn: 356591
Summary:
This commit introduces a new AMDGPUPALMetadata class that:
* is inside the AMDGPU target;
* keeps an in-memory representation of PAL metadata;
* provides a method to read the frontend-supplied metadata from LLVM IR;
* provides methods for the asm printer to set metadata items;
* provides methods to write the metadata as a binary blob to put in a
.note record or as an asm directive;
* provides a method to read the metadata as a binary blob from a .note
record.
Because llvm-readobj cannot call directly into a target, I had to remove
llvm-readobj's ability to dump PAL metadata, pending a resolution to
https://reviews.llvm.org/D52821
Differential Revision: https://reviews.llvm.org/D57027
Change-Id: I756dc830894fcb6850324cdcfa87c0120eb2cf64
llvm-svn: 356582
Summary:
If an MIMG instruction has managed to get through to adjustWritemask in isel but
has no uses (and doesn't enable TFC) then prevent an assertion by not attempting
to adjust the writemask.
The instruction will be removed anyway.
Change-Id: I9a5dba6bafe1f35ac99c1b73df390936e2ac27a7
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58964
llvm-svn: 356540
This will allow targets more flexibility to replace the
register allocator core passes. In a future commit,
AMDGPU will run the core register assignment passes
twice, and will also want to disallow using the
standard -regalloc option.
llvm-svn: 356506
I found this really weird WWM-related case whereby through the WWM
transformations our isel lowering was trying to promote 2 min's into a
min3 for the i8 type, which our hardware doesn't support.
The new min3_i8.ll test case would previously spew the error:
PromoteIntegerResult #0: t69: i8 = SMIN3 t70, Constant:i8<0>, t68
Before the simple fix to our isel lowering to not do it for i8 MVT's.
Differential Revision: https://reviews.llvm.org/D59543
llvm-svn: 356464
Allow the clamp modifier on vop3 int arithmetic instructions in assembly
and disassembly.
This involved adding a clamp operand to the affected instructions in MIR
and MC, and thus having to fix up several places in codegen and MIR
tests.
Differential Revision: https://reviews.llvm.org/D59267
Change-Id: Ic7775105f02a985b668fa658a0cd7837846a534e
llvm-svn: 356399
This commit allows v_cndmask_b32_e64 with abs, neg source
modifiers on src0, src1 to be assembled and disassembled.
This does appear to be allowed, even though they are floating point
modifiers and the operand type is b32.
To do this, I added src0_modifiers and src1_modifiers to the
MachineInstr, which involved fixing up several places in codegen and mir
tests.
Differential Revision: https://reviews.llvm.org/D59191
Change-Id: I69bf4a8c73ebc65744f6110bb8fc4e937d79fbea
llvm-svn: 356398
This fixes a couple of unflushed raw_string_ostream bugs in recent
commits that only show up on a bot building on windows with expensive
checks.
Differential Revision: https://reviews.llvm.org/D59396
Change-Id: I9c6208325503b3ee0786b4b688e13fc24a15babf
llvm-svn: 356394
This allows better code size for aarch64 floating point materialization
in a future patch.
Reviewers: evandro
Differential Revision: https://reviews.llvm.org/D58690
llvm-svn: 356389
Add an experimental buffer fat pointer address space that is currently
unhandled in the backend. This commit reserves address space 7 as a
non-integral pointer repsenting the 160-bit fat pointer (128-bit buffer
descriptor + 32-bit offset) that is heavily used in graphics workloads
using the AMDGPU backend.
Differential Revision: https://reviews.llvm.org/D58957
llvm-svn: 356373
There are a few different issues, mostly stemming from using
generation based checks for anything instead of subtarget
features. Stop adding flat-address-space as a feature for HSA, as it
should only be a device property. This was incorrectly allowing flat
instructions to select for SI.
Increase the default generation for HSA to avoid the encoding error
when emitting objects. This has some other side effects from various
checks which probably should be separate subtarget features (in the
cost model and for dealing with the DS offset folding issue).
Partial fix for bug 41070. It should probably be an error to try using
amdhsa without flat support.
llvm-svn: 356347
AMDGPU would like to have MVTs for v3i32, v3f32, v5i32, v5f32. This
commit does not add them, but makes preparatory changes:
* Fixed assumptions of power-of-2 vector type in kernel arg handling,
and added v5 kernel arg tests and v3/v5 shader arg tests.
* Added v5 tests for cost analysis.
* Added vec3/vec5 arg test cases.
Some of this patch is from Matt Arsenault, also of AMD.
Differential Revision: https://reviews.llvm.org/D58928
Change-Id: I7279d6b4841464d2080eb255ef3c589e268eabcd
llvm-svn: 356342
Summary:
At the exit of the loop, the compiler uses a register to remember and accumulate
the number of threads that have already exited. When all active threads exit the
loop, this register is used to restore the exec mask, and the execution continues
for the post loop code.
When there is a "continue" in the loop, the compiler made a mistake to reset the
register to 0 when the "continue" backedge is taken. This will result in some
threads not executing the post loop code as they are supposed to.
This patch fixed the issue.
Reviewers:
nhaehnle, arsenm
Differential Revision:
https://reviews.llvm.org/D59312
llvm-svn: 356298
Summary:
- During the fixing of SGPR copying from VGPR, ensure users of SCC is
properly propagated, i.e.
* only propagate through live def of SCC,
* skip the SCC-def inst itself, and
* stop the propagation on the other SCC-def inst after checking its
SCC-use first.
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59362
llvm-svn: 356258
This has been a very painful missing feature that has made producing
reduced testcases difficult. In particular the various registers
determined for stack access during function lowering were necessary to
avoid undefined register errors in a large percentage of
cases. Implement a subset of the important fields that need to be
preserved for AMDGPU.
Most of the changes are to support targets parsing register fields and
properly reporting errors. The biggest sort-of bug remaining is for
fields that can be initialized from the IR section will be overwritten
by a default initialized machineFunctionInfo section. Another
remaining bug is the machineFunctionInfo section is still printed even
if empty.
llvm-svn: 356215
Summary:
MsgPackDocument is the lighter-weight replacement for MsgPackTypes. This
commit switches AMDGPU HSA metadata processing to use MsgPackDocument
instead of MsgPackTypes.
Differential Revision: https://reviews.llvm.org/D57024
Change-Id: I0751668013abe8c87db01db1170831a76079b3a6
llvm-svn: 356081
This indicates an intrinsic parameter is required to be a constant,
and should not be replaced with a non-constant value.
Add the attribute to all AMDGPU and generic intrinsics that comments
indicate it should apply to. I scanned other target intrinsics, but I
don't see any obvious comments indicating which arguments are intended
to be only immediates.
This breaks one questionable testcase for the autoupgrade. I'm unclear
on whether the autoupgrade is supposed to really handle declarations
which were never valid. The verifier fails because the attributes now
refer to a parameter past the end of the argument list.
llvm-svn: 355981
AMDGPU target run out of Subtarget feature flags hitting the limit of 64.
AssemblerPredicates uses at most uint64_t for their representation.
At the same time CodeGen has exhausted this a long time ago and switched
to a FeatureBitset with the current limit of 192 bits.
This patch completes transition to the bitset for feature bits extending
it to asm matcher and MC code emitter.
Differential Revision: https://reviews.llvm.org/D59002
llvm-svn: 355839
When matching half of the build_vector to a load, there could still be
a hidden dependency on the other half of the build_vector the pattern
wouldn't detect. If there was an additional chain dependency on the
other value, a cycle could be introduced.
I don't think a tablegen pattern is capable of matching the necessary
conditions, so move this into PreprocessISelDAG. Check isPredecessorOf
for the other value to avoid a cycle. This has a warning that it's
expensive, so this should probably be moved into an MI pass eventually
that will have more freedom to reorder instructions to help match
this. That is currently complicated by the lack of a computeKnownBits
type mechanism for the selected function.
llvm-svn: 355731
This avoids breaking possible value dependencies when sorting loads by
offset.
AMDGPU has some load instructions that write into the high or low bits
of the destination register, and have a tied input for the other input
bits. These can easily have the same base pointer, but be a swizzle so
the high address load needs to come first. This was inserting glue
forcing the opposite ordering, producing a cycle the InstrEmitter
would assert on. It may be potentially expensive to look for the
dependency between the other loads, so just skip any where this could
happen.
Fixes bug 40936 by reverting r351379, which added a hacky attempt to
fix this by adding chains in this case, which I think was just working
around broken glue before the InstrEmitter. The core of the patch is
re-implementing the fix for that problem.
llvm-svn: 355728
This was checking the wrong operands for the base register and the
offsets. The indexes are shifted by the number of output registers
from the machine instruction definition, and the chain is moved to the
end.
llvm-svn: 355722
- Copy kernel symbol attributes into kernel descriptor attributes
- Make sure kernel symbol's visibility is not "higher" than protected
Differential Revision: https://reviews.llvm.org/D59057
llvm-svn: 355630
A previous patch for "uniform-work-group-size" attribute was found to break
some RADV and possibly radeon SI tests and had to be retracted.
This patch fixes that.
Differential Revision: http://reviews.llvm.org/D58993
llvm-svn: 355574
Summary:
This adds support for 64 bit buffer atomic arithmetic instructions but does not include
cmpswap as that depends on a fix to the way the register pairs are handled
Change-Id: Ib207ea65fb69487ccad5066ea647ae8ddfe2ce61
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58918
llvm-svn: 355520
Summary:
In some cases the KILL was causing a hazard to be introduced as these were
scheduled into hazard slots, but don't result in an instruction.
KILL shouldn't be considered for hazard recognition.
Change-Id: Ib6d2a2160f8c94cd0ce611ab198c7e4f46aeffcf
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58898
llvm-svn: 355384
Implement MCInstrAnalysis for AMDGPU, with default implementations save
for `evaluateBranch`.
Differential Revision: https://reviews.llvm.org/D58400
llvm-svn: 355373
These were not recognized as potential atomics by memory legalizer.
The test was working not because legalizer did a right thing, but
because it has skipped all these instructions. When I have fixed
DS desciption test started to fail because region address has
changed from 4 to 2 a while ago.
Differential Revision: https://reviews.llvm.org/D58802
llvm-svn: 355179
SITargetLowering::reassociateScalarOps() does not touch constants
so that DAGCombiner::ReassociateOps() does not revert the combine.
However a global address is not a ConstantSDNode.
Switched to the method used by DAGCombiner::ReassociateOps() itself
to detect constants.
Differential Revision: https://reviews.llvm.org/D58695
llvm-svn: 354926
Try to use concat_vectors. Also remove unnecessary assert on
pointers. Fixes asserting for <4 x s16> operations and 64-bit pointers
for AMDGPU.
llvm-svn: 354828
An internal build is hitting asserts complaining about too many subtarget
features:
llvm/utils/TableGen/Types.cpp:42:
const char* llvm::getMinimalTypeForEnumBitfield(uint64_t):
Assertion `MaxIndex <= 64 && "Too many bits"' failed.
llvm/utils/TableGen/AsmMatcherEmitter.cpp:1476:
void {anonymous}::AsmMatcherInfo::buildInfo():
Assertion `SubtargetFeatures.size() <= 64 && "Too many subtarget features!"'
failed.
The short-term solution is to remove a few unused AssemblerPredicates to get
under the limit.
The long-term solution seems to be to revisit these asserts. E.g., rather than
hardcoded '64', use the standard sized std::bitset like the other places that
track subtarget features.
Differential Revision: https://reviews.llvm.org/D58516
llvm-svn: 354604
Summary:
This is to fix a memory dependence bug in LoadStoreOptimizer.
Reviewers:
arsenm, rampitec
Differential Revision:
https://reviews.llvm.org/D58295
llvm-svn: 354295
This is basically a pointer typed add, so shouldn't be any different.
This was assuming everything was an SGPR, which is not true.
Also cleanup legality for GEP. I don't seem to be seeing the problem
the hack marking s64 as a legal pointer type the comment mentions.
llvm-svn: 354067
Reassociate adds to collect scalar operands in a single
instruction when possible. That will result in a scalar
add followed by vector instead of two vector adds, thus
better utilizing SALU.
Differential Revision: https://reviews.llvm.org/D58220
llvm-svn: 354066
Subtargets are a function level property, so ideally we would
eliminate everywhere that needs to check the global one. Rename the
function to try avoiding confusion.
llvm-svn: 353900
This was inhibiting inlining of library functions when clang was
invoking the inliner directly. This is covering a bit of a mess with
subtarget feature handling, and this shouldn't be a subtarget
feature. The behavior is different depending on whether you are using
a -mattr flag in clang, or llc, opt.
llvm-svn: 353899
This commit fixes the DPP sequence in the atomic optimizer (which was
previously missing the row_shr:3 step), and works around a read_register
exec bug by using a ballot instead.
Differential Revision: https://reviews.llvm.org/D57737
llvm-svn: 353703
This patch accompanies the RFC posted here:
http://lists.llvm.org/pipermail/llvm-dev/2018-October/127239.html
This patch adds a new CallBr IR instruction to support asm-goto
inline assembly like gcc as used by the linux kernel. This
instruction is both a call instruction and a terminator
instruction with multiple successors. Only inline assembly
usage is supported today.
This also adds a new INLINEASM_BR opcode to SelectionDAG and
MachineIR to represent an INLINEASM block that is also
considered a terminator instruction.
There will likely be more bug fixes and optimizations to follow
this, but we felt it had reached a point where we would like to
switch to an incremental development model.
Patch by Craig Topper, Alexander Ivchenko, Mikhail Dvoretckii
Differential Revision: https://reviews.llvm.org/D53765
llvm-svn: 353563
Inline compatability is determined from the individual feature
bits. These are just sets of the separate features, but will always be
treated as incompatible unless they are specifically ignored.
Defining the ISA version number here in tablegen would be nice, but it
turns out this wasn't actually used.
llvm-svn: 353558
clampScalar doesn't do anything for non-power-of-2 in range.
There should probably be a combination rule to reduce the number
of matching rules.
llvm-svn: 353526