Summary:
The printOperand function takes a default parameter, for which there are
zero call sites that explicitly pass such a parameter. As such, there
is no case to support. This means that the method
printVecModifiedImmediate is purly dead code, and can be removed.
The eventual goal for some of these AsmPrinter refactoring is to have
printOperand be a virtual method; making it easier to print operands
from the base class for more generic Asm printing. It will help if all
printOperand methods have the same function signature (ie. no Modifier
argument when not needed).
Reviewers: echristo, tra
Reviewed By: echristo
Subscribers: jholewinski, hiraditya, llvm-commits, craig.topper, srhines
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60727
llvm-svn: 358527
Summary:
The InlineAsm::AsmDialect is only required for X86; no architecture
makes use of it and as such it gets passed around between arch-specific
and general code while being unused for all architectures but X86.
Since the AsmDialect is queried from a MachineInstr, which we also pass
around, remove the additional AsmDialect parameter and query for it deep
in the X86AsmPrinter only when needed/as late as possible.
This refactor should help later planned refactors to AsmPrinter, as this
difference in the X86AsmPrinter makes it harder to make AsmPrinter more
generic.
Reviewers: craig.topper
Subscribers: jholewinski, arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, javed.absar, sbc100, jgravelle-google, eraman, hiraditya, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, llvm-commits, peter.smith, srhines
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60488
llvm-svn: 358101
Summary:
Previously, we translate llvm.round to PTX cvt.rni, which rounds to the
even interger when the source is equidistant between two integers. This
is not correct as llvm.round should round away from zero. This change
replaces llvm.round with a round away from zero implementation through
target specific custom lowering.
Modify a few affected tests to not check for cvt.rni. Instead, we check
for the use of a few constants used in implementing round. We are also
adding CUDA runnable tests to check for the values produced by
llvm.round to test-suites/External/CUDA.
Reviewers: tra
Subscribers: jholewinski, sanjoy, jlebar, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59947
llvm-svn: 357407
This will allow targets more flexibility to replace the
register allocator core passes. In a future commit,
AMDGPU will run the core register assignment passes
twice, and will also want to disallow using the
standard -regalloc option.
llvm-svn: 356506
Summary:
This patch works around the bug in the ptxas tool with the processing of bytes
separated by the comma symbol. The emission of the packed string is
temporarily disabled.
Reviewers: tra
Subscribers: jholewinski, jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59148
llvm-svn: 355740
Summary:
If the LLVM module shows that it has debug info, but the file is
actually empty and the real debug info is not emitted, the ptxas tool
emits error 'Debug information not found in presence of .target debug'.
We need at leas one empty debug section to silence this message. Section
`.debug_loc` is not emitted for PTX and we can emit empty `.debug_loc`
section if `debug` option was emitted.
Reviewers: tra
Subscribers: jholewinski, aprantl, llvm-commits
Differential Revision: https://reviews.llvm.org/D57250
llvm-svn: 355719
This cleans up all LoadInst creation in LLVM to explicitly pass the
value type rather than deriving it from the pointer's element-type.
Differential Revision: https://reviews.llvm.org/D57172
llvm-svn: 352911
Enable full support for the debug info. Recommit to fix the emission of
the not required closing brace.
Differential revision: https://reviews.llvm.org/D46189
llvm-svn: 351972
Summary: Enable full support for the debug info.
Reviewers: echristo
Subscribers: jholewinski, aprantl, JDevlieghere, llvm-commits
Differential Revision: https://reviews.llvm.org/D46189
llvm-svn: 351846
Summary: Initial function labels must follow the debug location for the correct relocation info generation.
Reviewers: tra, jlebar, echristo
Subscribers: jholewinski, llvm-commits
Differential Revision: https://reviews.llvm.org/D45784
llvm-svn: 351843
to reflect the new license. These used slightly different spellings that
defeated my regular expressions.
We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.
llvm-svn: 351648
to reflect the new license.
We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.
llvm-svn: 351636
The patch adds a possibility to make library calls on NVPTX.
An important thing about library functions - they must be defined within
the current module. This basically should guarantee that we produce a
valid PTX assembly (without calls to not defined functions). The one who
wants to use the libcalls is probably will have to link against
compiler-rt or any other implementation.
Currently, it's completely impossible to make library calls because of
error LLVM ERROR: Cannot select: i32 = ExternalSymbol '...'. But we can
lower ExternalSymbol to TargetExternalSymbol and verify if the function
definition is available.
Also, there was an issue with a DAG during legalisation. When we expand
instruction into libcall, the inner call-chain isn't being "integrated"
into outer chain. Since the last "data-flow" (call retval load) node is
located in call-chain earlier than CALLSEQ_END node, the latter becomes
a leaf and therefore a dead node (and is being removed quite fast).
Proposed here solution relies on another data-flow pseudo nodes
(ProxyReg) which purpose is only to keep CALLSEQ_END at legalisation and
instruction selection phases - we remove the pseudo instructions before
register scheduling phase.
Patch by Denys Zariaiev!
Differential Revision: https://reviews.llvm.org/D34708
llvm-svn: 350069
NVPTXAsmPrinter::doInitialization() was creating an NVPTXSubtarget on
the stack. This object is huge, about 80kb. Also it's slow to create.
And it's all redundant; we have one in NVPTXTargetMachine anyway!
llvm-svn: 349982
The change is an effort to split and refactor abandoned
D34708 into smaller parts.
Here the behaviour of unsupported instructions is changed
to match the behaviour of explicit intrinsics calls.
Currently LLVM crashes with:
> Assertion getInstruction() && "Not a call or invoke instruction!" failed.
With this patch LLVM produces a more sensible error message:
> Cannot select: ... i32 = ExternalSymbol'__foobar'
Author: Denys Zariaiev <denys.zariaiev@gmail.com>
Differential Revision: https://reviews.llvm.org/D55145
llvm-svn: 349213
If a module has function references, but no functions
themselves, we may end up never calling runOnMachineFunction
and therefore would never initialize nvptxSubtarget field
which would eventually cause a crash.
Instead of relying on nvptxSubtarget being initialized by
one of the methods, retrieve subtarget info directly.
Differential Revision: https://reviews.llvm.org/D55580
llvm-svn: 348952
Adds fatal errors for any target that does not support the Tiny or Kernel
codemodels by rejigging the getEffectiveCodeModel calls.
Differential Revision: https://reviews.llvm.org/D50141
llvm-svn: 348585
Summary:
If the output of debug directives only is requested, we should drop
emission of ',debug' option from the target directive. Required for
supporting of nvprof profiler.
Reviewers: echristo
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D46061
llvm-svn: 348497
Summary:
We may end up with not emitted debug directives at the end of the module
emission. Patch fixes this problem emitting those last directives the
end of the module emission.
Reviewers: echristo
Subscribers: jholewinski, llvm-commits
Differential Revision: https://reviews.llvm.org/D54320
llvm-svn: 348495
This reverts commit r345972. Need to update the description + possibly
to update the patch itself after discussion with Eric Christofer.
llvm-svn: 346508
MachineFunction can only be used in code using lib/CodeGen, hence we
can keep a more specific reference to LLVMTargetMachine rather than just
TargetMachine around.
Do the same for references in ScheduleDAG and RegUsageInfoCollector.
llvm-svn: 346183
The main caller of this already has an MVT and several targets called getSimpleVT inside without checking isSimple. This makes the simpleness explicit.
llvm-svn: 346180
Summary:
If the output of debug directives only is requested, we should drop
emission of ',debug' option from the target directive. Required for
supporting of nvprof profiler.
Reviewers: probinson, echristo, dblaikie
Subscribers: Hahnfeld, jholewinski, llvm-commits, JDevlieghere, aprantl
Differential Revision: https://reviews.llvm.org/D46061
llvm-svn: 345972
Summary:
If the instruction in the eliminateFrameIndex function is a DBG_VALUE
instruction, it requires special processing. The frame register is set
to VRFrame and the offset is based on the object offset.
The code is similar to the code used in
lib/CodeGen/PrologEpilogInserter.cpp.
Reviewers: tra
Subscribers: jholewinski, llvm-commits
Differential Revision: https://reviews.llvm.org/D53657
llvm-svn: 345269
Summary:
Changes all uses of minnan/maxnan to minimum/maximum
globally. These names emphasize that the semantic difference between
these operations is more than just NaN-propagation.
Reviewers: arsenm, aheejin, dschuff, javed.absar
Subscribers: jholewinski, sdardis, wdng, sbc100, jgravelle-google, jrtc27, atanasyan, llvm-commits
Differential Revision: https://reviews.llvm.org/D53112
llvm-svn: 345218
Summary:
If the target does not support `.asciz` and `.ascii` directives, the
strings are represented as bytes and each byte is placed on the new line
as a separate byte directive `.b8 <data>`. NVPTX target allows to
represent the vector of the data of the same type as a vector, where
values are separated using `,` symbol: `.b8 <data1>,<data2>,...`. This
allows to reduce the size of the final PTX file. Ptxas tool includes ptx
files into the resulting binary object, so reducing the size of the PTX
file is important.
Reviewers: tra, jlebar, echristo
Subscribers: jholewinski, llvm-commits
Differential Revision: https://reviews.llvm.org/D45822
llvm-svn: 345142
by `getTerminator()` calls instead be declared as `Instruction`.
This is the biggest remaining chunk of the usage of `getTerminator()`
that insists on the narrow type and so is an easy batch of updates.
Several files saw more extensive updates where this would cascade to
requiring API updates within the file to use `Instruction` instead of
`TerminatorInst`. All of these were trivial in nature (pervasively using
`Instruction` instead just worked).
llvm-svn: 344502
`MachineMemOperand` pointers attached to `MachineSDNodes` and instead
have the `SelectionDAG` fully manage the memory for this array.
Prior to this change, the memory management was deeply confusing here --
The way the MI was built relied on the `SelectionDAG` allocating memory
for these arrays of pointers using the `MachineFunction`'s allocator so
that the raw pointer to the array could be blindly copied into an
eventual `MachineInstr`. This creates a hard coupling between how
`MachineInstr`s allocate their array of `MachineMemOperand` pointers and
how the `MachineSDNode` does.
This change is motivated in large part by a change I am making to how
`MachineFunction` allocates these pointers, but it seems like a layering
improvement as well.
This would run the risk of increasing allocations overall, but I've
implemented an optimization that should avoid that by storing a single
`MachineMemOperand` pointer directly instead of allocating anything.
This is expected to be a net win because the vast majority of uses of
these only need a single pointer.
As a side-effect, this makes the API for updating a `MachineSDNode` and
a `MachineInstr` reasonably different which seems nice to avoid
unexpected coupling of these two layers. We can map between them, but we
shouldn't be *surprised* at where that occurs. =]
Differential Revision: https://reviews.llvm.org/D50680
llvm-svn: 339740
According to PTX ISA .volatile has the same memory synchronization
semantics as .relaxed.sys, so it can be used to implement monotonic
atomic loads and stores. This is important for OpenMP's atomic
construct where
- 'read's and 'write's are lowered to atomic loads and stores, and
- an update of float or double types are lowered into a cmpxchg loop.
(Note that PTX could do better because it has atom.add.f{32,64} but
LLVM's atomicrmw instruction only allows integer types.)
Higher levels of atomicity (like acquire and release) need additional
synchronization properties which were added with PTX ISA 6.0 / sm_70.
So using these instructions still results in an error.
Differential Revision: https://reviews.llvm.org/D50391
llvm-svn: 339316
Summary:
NVPTX target dos not use register-based frame information. Instead it
relies on the artificial local_depot that is used instead of the frame
and the data for variables must be emitted relatively to this
local_depot.
Reviewers: tra, jlebar, echristo
Subscribers: jholewinski, aprantl, JDevlieghere, llvm-commits
Differential Revision: https://reviews.llvm.org/D45963
llvm-svn: 338039
A TableGen instruction record usually contains a DAG pattern that will
describe the SelectionDAG operation that can be implemented by this
instruction. However, there will be cases where several different DAG
patterns can all be implemented by the same instruction. The way to
represent this today is to write additional patterns in the Pattern
(or usually Pat) class that map those extra DAG patterns to the
instruction. This usually also works fine.
However, I've noticed cases where the current setup seems to require
quite a bit of extra (and duplicated) text in the target .td files.
For example, in the SystemZ back-end, there are quite a number of
instructions that can implement an "add-with-overflow" operation.
The same instructions also need to be used to implement just plain
addition (simply ignoring the extra overflow output). The current
solution requires creating extra Pat pattern for every instruction,
duplicating the information about which particular add operands
map best to which particular instruction.
This patch enhances TableGen to support a new PatFrags class, which
can be used to encapsulate multiple alternative patterns that may
all match to the same instruction. It operates the same way as the
existing PatFrag class, except that it accepts a list of DAG patterns
to match instead of just a single one. As an example, we can now define
a PatFrags to match either an "add-with-overflow" or a regular add
operation:
def z_sadd : PatFrags<(ops node:$src1, node:$src2),
[(z_saddo node:$src1, node:$src2),
(add node:$src1, node:$src2)]>;
and then use this in the add instruction pattern:
defm AR : BinaryRRAndK<"ar", 0x1A, 0xB9F8, z_sadd, GR32, GR32>;
These SystemZ target changes are implemented here as well.
Note that PatFrag is now defined as a subclass of PatFrags, which
means that some users of internals of PatFrag need to be updated.
(E.g. instead of using PatFrag.Fragment you now need to use
!head(PatFrag.Fragments).)
The implementation is based on the following main ideas:
- InlinePatternFragments may now replace each original pattern
with several result patterns, not just one.
- parseInstructionPattern delays calling InlinePatternFragments
and InferAllTypes. Instead, it extracts a single DAG match
pattern from the main instruction pattern.
- Processing of the DAG match pattern part of the main instruction
pattern now shares most code with processing match patterns from
the Pattern class.
- Direct use of main instruction patterns in InferFromPattern and
EmitResultInstructionAsOperand is removed; everything now operates
solely on DAG match patterns.
Reviewed by: hfinkel
Differential Revision: https://reviews.llvm.org/D48545
llvm-svn: 336999
It's a bit neater to write T.isIntOrPtrTy() over `T.isIntegerTy() ||
T.isPointerTy()`.
I used Python's re.sub with this regex to update users:
r'([\w.\->()]+)isIntegerTy\(\)\s*\|\|\s*\1isPointerTy\(\)'
llvm-svn: 336462
We don't want to prevent inlining because of target-cpu and -features
attributes that were added to newer versions of LLVM/Clang: There are
no incompatible functions in PTX, ptxas will throw errors in such cases.
Differential Revision: https://reviews.llvm.org/D47691
llvm-svn: 334904
Summary:
They've been deprecated in favor of UADDO/ADDCARRY or USUBO/SUBCARRY for a while.
Target that uses these opcodes are changed in order to ensure their behavior doesn't change.
Reviewers: efriedma, craig.topper, dblaikie, bkramer
Subscribers: jholewinski, arsenm, jyknight, sdardis, nemanjai, nhaehnle, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, jordy.potman.lists, apazos, sabuasal, niosHD, jrtc27, zzheng, edward-jones, mgrang, atanasyan, llvm-commits
Differential Revision: https://reviews.llvm.org/D47422
llvm-svn: 333748
This reapplies commits: r330271, r330592, r330779.
[DEBUG] Initial adaptation of NVPTX target for debug info emission.
Summary:
Patch adds initial emission of the debug info for NVPTX target.
Currently, only .file and .loc directives are emitted, everything else is
commented out to not break the compilation of Cuda.
llvm-svn: 332689
The DEBUG() macro is very generic so it might clash with other projects.
The renaming was done as follows:
- git grep -l 'DEBUG' | xargs sed -i 's/\bDEBUG\s\?(/LLVM_DEBUG(/g'
- git diff -U0 master | ../clang/tools/clang-format/clang-format-diff.py -i -p1 -style LLVM
- Manual change to APInt
- Manually chage DOCS as regex doesn't match it.
In the transition period the DEBUG() macro is still present and aliased
to the LLVM_DEBUG() one.
Differential Revision: https://reviews.llvm.org/D43624
llvm-svn: 332240
Const/local/shared address spaces are all < 4GB and we can always use
32-bit pointers to access them. This has substantial performance impact
on kernels that uses shared memory for intermediary results.
The feature is disabled by default.
Differential Revision: https://reviews.llvm.org/D46147
llvm-svn: 331941
We've been running doxygen with the autobrief option for a couple of
years now. This makes the \brief markers into our comments
redundant. Since they are a visual distraction and we don't want to
encourage more \brief markers in new code either, this patch removes
them all.
Patch produced by
for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done
Differential Revision: https://reviews.llvm.org/D46290
llvm-svn: 331272
This appears to have some issues associated with the file directive output
causing multiple global symbols with the name "file" to be emitted into a
startup section. I'm investigating more specific causes and working with the
original author.
This reverts commit r330271.
Also Revert "[DEBUGINFO, NVPTX] Add the test for the debug info of the local"
This reverts commit r330592 and the follow up of 330779 as the testcase is dependent upon r330271.
llvm-svn: 331237
Since PTX has grown a <2 x half> datatype vectorization has become more
important. The late LoadStoreVectorizer intentionally only does loads
and stores, but now arithmetic has to be vectorized for optimal
throughput too.
This is still very limited, SLP vectorization happily creates <2 x half>
if it's a legal type but there's still a lot of register moving
happening to get that fed into a vectorized store. Overall it's a small
performance win by reducing the amount of arithmetic instructions.
I haven't really checked what the loop vectorizer does to PTX code, the
cost model there might need some more tweaks. I didn't see it causing
harm though.
Differential Revision: https://reviews.llvm.org/D46130
llvm-svn: 331035
There's no direct instruction for this, but it's trivially implemented
with two movs. Without this the code generator just dies when
encountering a shufflevector.
Differential Revision: https://reviews.llvm.org/D46116
llvm-svn: 330948
Summary:
Patch adds initial emission of the debug info for NVPTX target.
Currently, only .file and .loc directives are emitted, everything else is
commented out to not break the compilation of Cuda.
Reviewers: echristo, jlebar, tra, jholewinski
Subscribers: mgorny, aprantl, JDevlieghere, llvm-commits
Differential Revision: https://reviews.llvm.org/D41827
llvm-svn: 330271
When NVPTX TARGET_BUILTIN specifies sm_XX or ptxYY as required feature,
consider those features available if we're compiling for GPU >= sm_XX or have
enabled PTX version >= ptxYY.
Differential Revision: https://reviews.llvm.org/D45061
llvm-svn: 329829
Previously HalfTy was not handled which would either trigger an assertion,
or result in array initialized with garbage.
Differential Revision: https://reviews.llvm.org/D45391
llvm-svn: 329463
v2f16 is a special case in NVPTX. v4f16 may be loaded as a pair of v2f16
and that was not previously handled correctly by tryLDGLDU()
Differential Revision: https://reviews.llvm.org/D45339
llvm-svn: 329456
Makes it easier to see mistakes such as the one fixed in r329178 and makes
the different target CMakeLists more consistent.
Also remove some stale-looking comments from the Nios2 target cmakefile.
No intended behavior change.
llvm-svn: 329181
Summary:
This change declare that PostRAMachineSinking and ShrinkWrap require NoVRegs
property, so now the MachineFunctionPass can enforce this check.
These passes are disabled in NVPTX & WebAssembly.
Reviewers: dschuff, jlebar, tra, jgravelle-google, MatzeB, sebpop, thegameg, mcrosier
Reviewed By: dschuff, thegameg
Subscribers: jholewinski, jfb, sbc100, aheejin, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D45183
llvm-svn: 329095
Summary:
Make NVPTX require structured CFG. Added a temporary flag to
"roll back" the behavior for easy deployment.
Combined with D45008, this fixes several internal Nvidia GPU test
failures that we suspect to be ptxas miscompiles (PR27738).
Reviewers: jlebar
Subscribers: jholewinski, sanjoy, llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D45070
llvm-svn: 328885
Currently EVT is in the IR layer only because of Function.cpp needing a very small piece of the functionality of EVT::getEVTString(). The rest of EVT is used in codegen making CodeGen a better place for it.
The previous code converted a Type* to EVT and then called getEVTString. This was only expected to handle the primitive types from Type*. Since there only a few primitive types, we can just print them as strings directly.
Differential Revision: https://reviews.llvm.org/D45017
llvm-svn: 328806
This is used by llvm tblgen as well as by LLVM Targets, so the only
common place is Support for now. (maybe we need another target for these
sorts of things - but for now I'm at least making them correct & we can
make them better if/when people have strong feelings)
llvm-svn: 328395
This is needed for the upcoming implementation of the
new 8x32x16 and 32x8x16 variants of WMMA instructions
introduced in CUDA 9.1.
Differential Revision: https://reviews.llvm.org/D44719
llvm-svn: 328158
This way we can support address-space specific variants without explicitly
encoding the space in the name of the intrinsic. Less intrinsics to deal with ->
less boilerplate.
Added a bit of tablegen magic to match/replace an intrinsics with a pointer
argument in particular address space with the space-specific instruction
variant.
Updated tests to use non-default address spaces.
Differential Revision: https://reviews.llvm.org/D43268
llvm-svn: 328006
Now that patterns can handle intrinsics returning multiple results,
use tablegen'ed pattern matching instead of custom lowering.
Differential Revision: https://reviews.llvm.org/D43890
llvm-svn: 326457
Summary:
After D43914, loads from global variables in addrspace(1) happen with
ld.global. But since they're constants, even better would be to use
ld.global.nc, aka ldg.
Reviewers: tra
Subscribers: jholewinski, sanjoy, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D43915
llvm-svn: 326390
Summary:
NVPTXGenericToNVVM was using target-specific intrinsics to do address
space casts. Using the addrspacecast instruction is (a lot) simpler.
But it also has the advantage of being understandable to other passes.
In particular, InferAddrSpaces is able to understand these address space
casts and remove them in most cases.
Reviewers: tra
Subscribers: jholewinski, sanjoy, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D43914
llvm-svn: 326389
NVPTX stopped supporting GPUs older than sm_20 (Fermi) quite a while back.
Removal of support of pre-Fermi GPUs made a lot of predicates in the NVPTX
backend pointless as they can't ever be false any more.
It's time to retire them. NFC intended.
Differential Revision: https://reviews.llvm.org/D43843
llvm-svn: 326349
This avoids playing games with pseudo pass IDs and avoids using an
unreliable MRI::isSSA() check to determine whether register allocation
has happened.
Note that this renames:
- MachineLICMID -> EarlyMachineLICM
- PostRAMachineLICMID -> MachineLICMID
to be consistent with the EarlyTailDuplicate/TailDuplicate naming.
llvm-svn: 322927
Re-land r321234. It had to be reverted because it broke the shared
library build. The shared library build broke because there was a
missing LLVMBuild dependency from lib/Passes (which calls
TargetMachine::getTargetIRAnalysis) to lib/Target. As far as I can
tell, this problem was always there but was somehow masked
before (perhaps because TargetMachine::getTargetIRAnalysis was a
virtual function).
Original commit message:
This makes the TargetMachine interface a bit simpler. We still need
the std::function in TargetIRAnalysis to avoid having to add a
dependency from Analysis to Target.
See discussion:
http://lists.llvm.org/pipermail/llvm-dev/2017-December/119749.html
I avoided adding all of the backend owners to this review since the
change is simple, but let me know if you feel differently about this.
Reviewers: echristo, MatzeB, hfinkel
Reviewed By: hfinkel
Subscribers: jholewinski, jfb, arsenm, dschuff, mcrosier, sdardis, nemanjai, nhaehnle, javed.absar, sbc100, jgravelle-google, aheejin, kbarton, llvm-commits
Differential Revision: https://reviews.llvm.org/D41464
llvm-svn: 321375
Summary:
This makes the TargetMachine interface a bit simpler. We still need
the std::function in TargetIRAnalysis to avoid having to add a
dependency from Analysis to Target.
See discussion:
http://lists.llvm.org/pipermail/llvm-dev/2017-December/119749.html
I avoided adding all of the backend owners to this review since the
change is simple, but let me know if you feel differently about this.
Reviewers: echristo, MatzeB, hfinkel
Reviewed By: hfinkel
Subscribers: jholewinski, jfb, arsenm, dschuff, mcrosier, sdardis, nemanjai, nhaehnle, javed.absar, sbc100, jgravelle-google, aheejin, kbarton, llvm-commits
Differential Revision: https://reviews.llvm.org/D41464
llvm-svn: 321234
Rather than adding more bits to express every
MMO flag you could want, just directly use the
MMO flags. Also fixes using a bunch of bool arguments to
getMemIntrinsicNode.
On AMDGPU, buffer and image intrinsics should always
have MODereferencable set, but currently there is no
way to do that directly during the initial intrinsic
lowering.
llvm-svn: 320746
PTX requires that identifiers consist only of [a-zA-Z0-9_$]. The
existing pass already ensured this for globals and this patch adds
the cleanup for functions with local linkage.
However, there was a different problem in the case of collisions
of the adjusted name: The ValueSymbolTable then automatically
appended ".N" with increasing Ns to get a unique name while helping
the ABI demangling. Special case this behavior to omit the dots and
append N directly. This will always give us legal names according
to the PTX requirements.
Differential Revision: https://reviews.llvm.org/D40573
llvm-svn: 319657
As part of the unification of the debug format and the MIR format, avoid
printing "vreg" for virtual registers (which is one of the current MIR
possibilities).
Basically:
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E "s/%vreg([0-9]+)/%\1/g"
* grep -nr '%vreg' . and fix if needed
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E "s/ vreg([0-9]+)/ %\1/g"
* grep -nr 'vreg[0-9]\+' . and fix if needed
Differential Revision: https://reviews.llvm.org/D40420
llvm-svn: 319427
All these headers already depend on CodeGen headers so moving them into
CodeGen fixes the layering (since CodeGen depends on Target, not the
other way around).
llvm-svn: 318490
Summary:
Make it possible to feed runtime information back to tablegen to enable
profile-guided tablegen-eration, detection of untested tablegen definitions, etc.
Being a cross-compiler by nature, LLVM will potentially collect data for multiple
architectures (e.g. when running 'ninja check'). We therefore need a way for
TableGen to figure out what data applies to the backend it is generating at the
time. This patch achieves that by including the name of the 'def X : Target ...'
for the backend in the TargetRegistry.
Reviewers: qcolombet
Reviewed By: qcolombet
Subscribers: jholewinski, arsenm, jyknight, aditya_nandakumar, sdardis, nemanjai, ab, nhaehnle, t.p.northover, javed.absar, qcolombet, llvm-commits, fedor.sergeev
Differential Revision: https://reviews.llvm.org/D39742
llvm-svn: 318352
This header includes CodeGen headers, and is not, itself, included by
any Target headers, so move it into CodeGen to match the layering of its
implementation.
llvm-svn: 317647
Summary:
This just seems to have been an oversight. We already supported the f64
atomic add with an explicit scope (e.g. "cta"), but not the scopeless
version.
Reviewers: tra
Subscribers: jholewinski, sanjoy, cfe-commits, llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D39638
llvm-svn: 317623
This header already includes a CodeGen header and is implemented in
lib/CodeGen, so move the header there to match.
This fixes a link error with modular codegeneration builds - where a
header and its implementation are circularly dependent and so need to be
in the same library, not split between two like this.
llvm-svn: 317379
If particular target supports volatile memory access operations, we can
avoid AS casting to generic AS. Currently it's only enabled in NVPTX for
loads and stores that access global & shared AS.
Differential Revision: https://reviews.llvm.org/D39026
llvm-svn: 316495
Reverting to investigate layering effects of MCJIT not linking
libCodeGen but using TargetMachine::getNameWithPrefix() breaking the
lldb bots.
This reverts commit r315633.
llvm-svn: 315637
Merge LLVMTargetMachine into TargetMachine.
- There is no in-tree target anymore that just implements TargetMachine
but not LLVMTargetMachine.
- It should still be possible to stub out all the various functions in
case a target does not want to use lib/CodeGen
- This simplifies the code and avoids methods ending up in the wrong
interface.
Differential Revision: https://reviews.llvm.org/D38489
llvm-svn: 315633
WMMA = "Warp Level Matrix Multiply-Accumulate".
These are the new instructions introduced in PTX6.0 and available
on sm_70 GPUs.
Differential Revision: https://reviews.llvm.org/D38645
llvm-svn: 315601
For now CUDA-9 is not included in the list of CUDA versions clang
searches for, so the path to CUDA-9 must be explicitly passed
via --cuda-path=.
On LLVM side NVPTX added sm_70 GPU type which bumps required
PTX version to 6.0, but otherwise is equivalent to sm_62 at the moment.
Differential Revision: https://reviews.llvm.org/D37576
llvm-svn: 312734
IMHO it is an antipattern to have a enum value that is Default.
At any given piece of code it is not clear if we have to handle
Default or if has already been mapped to a concrete value. In this
case in particular, only the target can do the mapping and it is nice
to make sure it is always done.
This deletes the two default enum values of CodeModel and uses an
explicit Optional<CodeModel> when it is possible that it is
unspecified.
llvm-svn: 309911
This patch makes LSR generate better code for SystemZ in the cases of memory
intrinsics, Load->Store pairs or comparison of immediate with memory.
In order to achieve this, the following common code changes were made:
* New TTI hook: LSRWithInstrQueries(), which defaults to false. Controls if
LSR should do instruction-based addressing evaluations by calling
isLegalAddressingMode() with the Instruction pointers.
* In LoopStrengthReduce: handle address operands of memset, memmove and memcpy
as address uses, and call isFoldableMemAccessOffset() for any LSRUse::Address,
not just loads or stores.
SystemZ changes:
* isLSRCostLess() implemented with Insns first, and without ImmCost.
* New function supportedAddressingMode() that is a helper for TTI methods
looking at Instructions passed via pointers.
Review: Ulrich Weigand, Quentin Colombet
https://reviews.llvm.org/D35262https://reviews.llvm.org/D35049
llvm-svn: 308729
The patch adds support of i128 params lowering. The changes are quite trivial to
support i128 as a "special case" of integer type. With this patch, we lower i128
params the same way as aggregates of size 16 bytes: .param .b8 _ [16].
Currently, NVPTX can't deal with the 128 bit integers:
* in some cases because of failed assertions like
ValVTs.size() == OutVals.size() && "Bad return value decomposition"
* in other cases emitting PTX with .i128 or .u128 types (which are not valid [1])
[1] http://docs.nvidia.com/cuda/parallel-thread-execution/index.html#fundamental-types
Differential Revision: https://reviews.llvm.org/D34555
Patch by: Denys Zariaiev (denys.zariaiev@gmail.com)
llvm-svn: 308675
Where is is needed (at the end of headers that define it), be
consistent about its use.
Also fix a few header guards that I found in the process.
Differential Revision: https://reviews.llvm.org/D34916
llvm-svn: 307840
Adds loop expansions for known-size and unknown-sized memcpy calls, allowing the
target to provide the operand types through TTI callbacks. The default values
for the TTI callbacks use int8 operand types and matches the existing behaviour
if they aren't overridden by the target.
Differential revision: https://reviews.llvm.org/D32536
llvm-svn: 307346
The patch adds support of i128 params lowering. The changes are quite trivial to
support i128 as a "special case" of integer type. With this patch, we lower i128
params the same way as aggregates of size 16 bytes: .param .b8 _ [16].
Currently, NVPTX can't deal with the 128 bit integers:
* in some cases because of failed assertions like
ValVTs.size() == OutVals.size() && "Bad return value decomposition"
* in other cases emitting PTX with .i128 or .u128 types (which are not valid [1])
[1] http://docs.nvidia.com/cuda/parallel-thread-execution/index.html#fundamental-types
Differential Revision: https://reviews.llvm.org/D34555
Patch by: Denys Zariaiev (denys.zariaiev@gmail.com)
llvm-svn: 307326
I did this a long time ago with a janky python script, but now
clang-format has built-in support for this. I fed clang-format every
line with a #include and let it re-sort things according to the precise
LLVM rules for include ordering baked into clang-format these days.
I've reverted a number of files where the results of sorting includes
isn't healthy. Either places where we have legacy code relying on
particular include ordering (where possible, I'll fix these separately)
or where we have particular formatting around #include lines that
I didn't want to disturb in this patch.
This patch is *entirely* mechanical. If you get merge conflicts or
anything, just ignore the changes in this patch and run clang-format
over your #include lines in the files.
Sorry for any noise here, but it is important to keep these things
stable. I was seeing an increasing number of patches with irrelevant
re-ordering of #include lines because clang-format was used. This patch
at least isolates that churn, makes it easy to skip when resolving
conflicts, and gets us to a clean baseline (again).
llvm-svn: 304787
This adds a callback to the LLVMTargetMachine that lets target indicate
that they do not pass the machine verifier checks in all cases yet.
This is intended to be a temporary measure while the targets are fixed
allowing us to enable the machine verifier by default with
EXPENSIVE_CHECKS enabled!
Differential Revision: https://reviews.llvm.org/D33696
llvm-svn: 304320
TargetPassConfig is not useful for targets that do not use the CodeGen
library, so we may just as well store a pointer to an
LLVMTargetMachine instead of just to a TargetMachine.
While at it, also change the constructor to take a reference instead of a
pointer as the TM must not be nullptr.
llvm-svn: 304247
Follow up to D33147
NVPTXTargetLowering::LowerCall was trusting the default argument values.
Fixes another 17 of the NVPTX '-verify-machineinstrs with EXPENSIVE_CHECKS' errors in PR32146.
Differential Revision: https://reviews.llvm.org/D33189
llvm-svn: 303082
NFC followup to D33147, this explicitly sets all the arguments (instead of relying on the defaults) to SelectionDAG::getMemIntrinsicNode to help identify -verify-machineinstrs issues.
llvm-svn: 303047
This fixes 47 of the 75 NVPTX '-verify-machineinstrs with EXPENSIVE_CHECKS' errors in PR32146.
Differential Revision: https://reviews.llvm.org/D33147
llvm-svn: 302942
Using arguments with attribute inalloca creates problems for verification
of machine representation. This attribute instructs the backend that the
argument is prepared in stack prior to CALLSEQ_START..CALLSEQ_END
sequence (see http://llvm.org/docs/InAlloca.htm for details). Frame size
stored in CALLSEQ_START in this case does not count the size of this
argument. However CALLSEQ_END still keeps total frame size, as caller can
be responsible for cleanup of entire frame. So CALLSEQ_START and
CALLSEQ_END keep different frame size and the difference is treated by
MachineVerifier as stack error. Currently there is no way to distinguish
this case from actual errors.
This patch adds additional argument to CALLSEQ_START and its
target-specific counterparts to keep size of stack that is set up prior to
the call frame sequence. This argument allows MachineVerifier to calculate
actual frame size associated with frame setup instruction and correctly
process the case of inalloca arguments.
The changes made by the patch are:
- Frame setup instructions get the second mandatory argument. It
affects all targets that use frame pseudo instructions and touched many
files although the changes are uniform.
- Access to frame properties are implemented using special instructions
rather than calls getOperand(N).getImm(). For X86 and ARM such
replacement was made previously.
- Changes that reflect appearance of additional argument of frame setup
instruction. These involve proper instruction initialization and
methods that access instruction arguments.
- MachineVerifier retrieves frame size using method, which reports sum of
frame parts initialized inside frame instruction pair and outside it.
The patch implements approach proposed by Quentin Colombet in
https://bugs.llvm.org/show_bug.cgi?id=27481#c1.
It fixes 9 tests failed with machine verifier enabled and listed
in PR27481.
Differential Revision: https://reviews.llvm.org/D32394
llvm-svn: 302527
The method is called "get *Param* Alignment", and is only used for
return values exactly once, so it should take argument indices, not
attribute indices.
Avoids confusing code like:
IsSwiftError = CS->paramHasAttr(ArgIdx, Attribute::SwiftError);
Alignment = CS->getParamAlignment(ArgIdx + 1);
Add getRetAlignment to handle the one case in Value.cpp that wants the
return value alignment.
This is a potentially breaking change for out-of-tree backends that do
their own call lowering.
llvm-svn: 301682
1. RegisterClass::getSize() is split into two functions:
- TargetRegisterInfo::getRegSizeInBits(const TargetRegisterClass &RC) const;
- TargetRegisterInfo::getSpillSize(const TargetRegisterClass &RC) const;
2. RegisterClass::getAlignment() is replaced by:
- TargetRegisterInfo::getSpillAlignment(const TargetRegisterClass &RC) const;
This will allow making those values depend on subtarget features in the
future.
Differential Revision: https://reviews.llvm.org/D31783
llvm-svn: 301221
This patch uses lshrInPlace to replace code where the object that lshr is called on is being overwritten with the result.
This adds an lshrInPlace(const APInt &) version as well.
Differential Revision: https://reviews.llvm.org/D32155
llvm-svn: 300566
Add hasParamAttribute() and use it instead of hasAttribute(ArgNo+1,
Kind) everywhere.
The fact that the AttributeList index for an argument is ArgNo+1 should
be a hidden implementation detail.
NFC
llvm-svn: 300272
LLVM makes several assumptions about address space 0. However,
alloca is presently constrained to always return this address space.
There's no real way to avoid using alloca, so without this
there is no way to opt out of these assumptions.
The problematic assumptions include:
- That the pointer size used for the stack is the same size as
the code size pointer, which is also the maximum sized pointer.
- That 0 is an invalid, non-dereferencable pointer value.
These are problems for AMDGPU because alloca is used to
implement the private address space, which uses a 32-bit
index as the pointer value. Other pointers are 64-bit
and behave more like LLVM's notion of generic address
space. By changing the address space used for allocas,
we can change our generic pointer type to be LLVM's generic
pointer type which does have similar properties.
llvm-svn: 299888
Summary:
This class is a list of AttributeSetNodes corresponding the function
prototype of a call or function declaration. This class used to be
called ParamAttrListPtr, then AttrListPtr, then AttributeSet. It is
typically accessed by parameter and return value index, so
"AttributeList" seems like a more intuitive name.
Rename AttributeSetImpl to AttributeListImpl to follow suit.
It's useful to rename this class so that we can rename AttributeSetNode
to AttributeSet later. AttributeSet is the set of attributes that apply
to a single function, argument, or return value.
Reviewers: sanjoy, javed.absar, chandlerc, pete
Reviewed By: pete
Subscribers: pete, jholewinski, arsenm, dschuff, mehdi_amini, jfb, nhaehnle, sbc100, void, llvm-commits
Differential Revision: https://reviews.llvm.org/D31102
llvm-svn: 298393
This is repetition of isImage() function in NVPTXUtilities.cpp.
Patch by Briana Grace!
Differential Revision: https://reviews.llvm.org/D30706
llvm-svn: 297252
Make opcode selection code for the load instruction a bit easier
to read and maintain.
This patch also catches number of f16 load/store variants that were
not handled before.
Differential Revision: https://reviews.llvm.org/D30513
llvm-svn: 296785
This patch enables support for .f16x2 operations.
Added new register type Float16x2.
Added support for .f16x2 instructions.
Added handling of vectorized loads/stores of v2f16 values.
Differential Revision: https://reviews.llvm.org/D30057
Differential Revision: https://reviews.llvm.org/D30310
llvm-svn: 296032
Original code only used vector loads/stores for explicit vector arguments.
It could also do more loads/stores than necessary (e.g v5f32 would
touch 8 f32 values). Aggregate types were loaded one element at a time,
even the vectors contained within.
This change attempts to generalize (and simplify) parameter space
loads/stores so that vector loads/stores can be used more broadly.
Functionality of the patch has been verified by compiling thrust
test suite and manually checking the differences between PTX
generated by llvm with and without the patch.
General algorithm:
* ComputePTXValueVTs() flattens input/output argument into a flat list
of scalars to load/store and returns their types and offsets.
* VectorizePTXValueVTs() uses that data to create vectorization plan
which returns an array of flags marking boundaries of vectorized
load/stores. Scalars are represented as 1-element vectors.
* Code that generates loads/stores implements a simple state machine
that constructs a vector according to the plan.
Differential Revision: https://reviews.llvm.org/D30011
llvm-svn: 295784
x*rsqrt(x) returns NaN for x == 0, whereas 1/rsqrt(x) returns 0, as
desired.
Verified that the particular nvptx approximate instructions here do in
fact return 0 for x = 0.
llvm-svn: 293713
Summary:
This lets us lower to sqrt.approx and rsqrt.approx under more
circumstances.
* Now we emit sqrt.approx and rsqrt.approx for calls to @llvm.sqrt.f32,
when fast-math is enabled. Previously, we only would emit it for
calls to @llvm.nvvm.sqrt.f. (With this patch we no longer emit
sqrt.approx for calls to @llvm.nvvm.sqrt.f; we rely on intcombine to
simplify llvm.nvvm.sqrt.f into llvm.sqrt.f32.)
* Now we emit the ftz version of rsqrt.approx when ftz is enabled.
Previously, we only emitted rsqrt.approx when ftz was disabled.
Reviewers: hfinkel
Subscribers: llvm-commits, tra, jholewinski
Differential Revision: https://reviews.llvm.org/D28508
llvm-svn: 293605