Today llc crashes when non-power-of-two integer types are used as function
arguments or return values. This patch enables passing non-standard integer
values to and from functions by promoting them before the store and truncating
them after the load.
The main motivation for this change is that Rust casts small structs (less than
pointer size) into an integer of the same size. For example, a struct that
contains three u8 fields is passed as an i24. This patch is a step towards
enabling Rust compilation to PTX while retaining the target-independent
optimizations.
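The promote/truncate idea can be sketched in plain C++ as below. This is an illustration only, not the code from the patch: the helper names (promotedWidth, promoteI24, truncateToI24) are hypothetical, and the sketch assumes the promoted width is the next power of two of at least 8 bits.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: smallest power-of-two width (>= 8) that can hold `bits`.
unsigned promotedWidth(unsigned bits) {
  assert(bits > 0 && bits <= 64 && "sketch only handles widths up to 64 bits");
  unsigned width = 8;
  while (width < bits)
    width *= 2;
  return width;
}

// Pack three u8 fields into the low 24 bits of a 32-bit value
// (the "promote before store" direction).
uint32_t promoteI24(uint8_t a, uint8_t b, uint8_t c) {
  return uint32_t(a) | (uint32_t(b) << 8) | (uint32_t(c) << 16);
}

// Discard the padding bits again (the "truncate after load" direction).
uint32_t truncateToI24(uint32_t loaded) {
  return loaded & 0x00FFFFFFu;
}
```

Under these assumptions an i24 travels as a 32-bit value, and the upper 8 bits are ignored on the receiving side.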
More context can be found in https://github.com/llvm/llvm-project/issues/55764
Differential Revision: https://reviews.llvm.org/D129291
The original patch read incorrect values on big-endian (BE) hosts.
The code now uses `endian::read32le()` and `endian::read64le()` instead.
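For illustration, a host-independent little-endian read can be sketched as follows. readLE32 and readLE64 are hand-written stand-ins with hypothetical names, not LLVM's helpers; they only show why assembling the value byte by byte gives the same result on little-endian and big-endian hosts.

```cpp
#include <cassert>
#include <cstdint>

// Assemble a 32-bit value from 4 bytes stored least-significant byte first.
// No host byte order is involved because only shifts and ORs are used.
uint32_t readLE32(const unsigned char *p) {
  assert(p && "buffer must not be null");
  return uint32_t(p[0]) | (uint32_t(p[1]) << 8) |
         (uint32_t(p[2]) << 16) | (uint32_t(p[3]) << 24);
}

// Same idea for 8 bytes: the low half comes first in memory.
uint64_t readLE64(const unsigned char *p) {
  return uint64_t(readLE32(p)) | (uint64_t(readLE32(p + 4)) << 32);
}
```

A plain memcpy into an integer would instead produce a host-dependent value, which is the kind of bug described above.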
Original commit message:
The current implementation assumes that all pointers used in the
initialization of an aggregate are aligned according to the pointer size
of the target; that might not be so if the object is packed. In that
case, an array of .u8 should be used and pointers should be decorated
with the mask() operator.
The operator was introduced in PTX ISA 7.1, so an error is issued if the
case is detected for an earlier version.
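A rough sketch of how the per-byte initializer elements for one pointer might be spelled is given below. The function name maskedPointerBytes is hypothetical, and the exact text emitted by the backend is an assumption; the sketch only follows the mask() form described in the PTX ISA, where each byte of a symbol's address is selected with a byte mask such as 0xFF(foo) or 0xFF00(foo).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Build one ".u8" initializer element per byte of a pointer to `symbol`,
// lowest byte first. Each element selects a single byte via a mask.
std::vector<std::string> maskedPointerBytes(const std::string &symbol,
                                            unsigned pointerBytes) {
  assert((pointerBytes == 4 || pointerBytes == 8) &&
         "sketch assumes 32-bit or 64-bit pointers");
  std::vector<std::string> elems;
  std::string mask = "0xFF";
  for (unsigned i = 0; i < pointerBytes; ++i) {
    elems.push_back(mask + "(" + symbol + ")");
    mask += "00"; // shift the mask up by one byte for the next element
  }
  return elems;
}
```

Because every element is a single byte, the enclosing array needs no pointer-sized alignment, which is what a packed aggregate requires.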
Differential Revision: https://reviews.llvm.org/D127504
This simplifies NVPTXAsmPrinter::AggBuffer and its usage.
It is also a preparation for D127504.
Differential Revision: https://reviews.llvm.org/D129773
D25618 added a method to verify the instruction predicates for an
emitted instruction, through verifyInstructionPredicates added into
<Target>MCCodeEmitter::encodeInstruction. This is a very useful idea,
but the implementation inside MCCodeEmitter made it only fire for object
files, not assembly which most of the llvm test suite uses.
This patch moves the code into the <Target>_MC::verifyInstructionPredicates
method, inside the InstrInfo. This allows it to be called from other
places, such as in this patch where it is called from the
<Target>AsmPrinter::emitInstruction methods which should trigger for
both assembly and object files. It can also be called from other places
such as verifyInstruction, but that is not done here (it tends to catch
errors earlier, but in reality just shows all the MIR tests that have
incorrect feature predicates). The interface was also simplified
slightly, moving computeAvailableFeatures into the function so that it
does not need to be called externally.
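The kind of check involved can be sketched with a feature bitset, as below. This is a simplified stand-in rather than the TableGen-generated code: the Feature enumerators and the function names are hypothetical, and the real check derives the available features from the subtarget.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Hypothetical feature list for the sketch.
enum Feature : std::size_t { HasFP16, HasAtomics64, NumFeatures };
using FeatureSet = std::bitset<NumFeatures>;

// True if every feature the instruction requires is available.
bool predicatesSatisfied(const FeatureSet &required,
                         const FeatureSet &available) {
  return (required & ~available).none();
}

// Called at emission time so the check fires for assembly output as well.
void verifyPredicates(const FeatureSet &required,
                      const FeatureSet &available) {
  assert(predicatesSatisfied(required, available) &&
         "instruction emitted without its required features");
  (void)required;
  (void)available;
}
```

Placing the call in the emission path rather than in the object-file encoder is what lets assembly-only tests trigger the check.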
The ARM, AMDGPU (but not R600), AVR, Mips and X86 backends all currently
show errors in the test-suite, so have been disabled with FIXME
comments.
Recommitted with some fixes for the leftover MCII variables in release
builds.
Differential Revision: https://reviews.llvm.org/D129506
This reverts commit e2fb8c0f4b as it does
not build in Release builds, and some buildbots are giving more warnings
than I saw locally. Reverting to fix those issues.
(Reapply after revert in e9ce1a5880 due to
Fuchsia test failures. Removed changes in lib/ExecutionEngine/ other
than error categories, to be checked in more detail and reapplied
separately.)
Bulk remove many of the more trivial uses of ManagedStatic in the llvm
directory, either by defining a new getter function or, in many cases,
moving the static variable directly into the only function that uses it.
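The getter pattern mentioned above can be sketched as follows. The Registry type and the function names are hypothetical; the point is only that a function-local static is constructed on first use, with thread-safe initialization since C++11.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical global state that would previously have been a ManagedStatic.
using Registry = std::map<std::string, int>;

// Getter with a function-local static: constructed on the first call.
Registry &getRegistry() {
  static Registry TheRegistry;
  return TheRegistry;
}

// Example user of the getter.
void registerValue(const std::string &key, int value) {
  bool inserted = getRegistry().emplace(key, value).second;
  assert(inserted && "key registered twice");
  (void)inserted;
}
```

When only one function uses the object, the static can live directly inside that function and the getter is unnecessary.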
Differential Revision: https://reviews.llvm.org/D129120
This removes the insertvalue constant expression, as part of
https://discourse.llvm.org/t/rfc-remove-most-constant-expressions/63179.
This is very similar to the extractvalue removal from D125795.
insertvalue is also not supported in bitcode, so no auto-upgrade
is necessary.
ConstantExpr::getInsertValue() can be replaced with
IRBuilder::CreateInsertValue() or ConstantFoldInsertValueInstruction(),
depending on whether a constant result is required (with the latter
being fallible).
The ConstantExpr::hasIndices() and ConstantExpr::getIndices()
methods also go away here, because there are no longer any constant
expressions with indices.
Differential Revision: https://reviews.llvm.org/D128719
`llvm::max(Align, MaybeAlign)` and `llvm::max(MaybeAlign, Align)` are
not used often enough to be required. They also make the code more opaque.
Differential Revision: https://reviews.llvm.org/D128121
MIR support is totally unusable for AMDGPU without this, since the set
of reserved registers is set from fields here.
Add a clone method to MachineFunctionInfo. This is a subtle variant of
the copy constructor that is required if there are any MIR constructs
that use pointers. Specifically, at minimum fields that reference
MachineBasicBlocks or the MachineFunction need to be adjusted to the
values in the new function.
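Why a plain copy is not enough can be sketched as below. Block, FunctionInfo and the clone signature are hypothetical stand-ins, not the interface added by the patch; the sketch assumes the caller provides a map from blocks of the old function to blocks of the new one.

```cpp
#include <cassert>
#include <unordered_map>

struct Block { int number; };

struct FunctionInfo {
  const Block *entry = nullptr; // points into the owning function
  unsigned reservedRegs = 0;    // plain data, safe to copy as-is

  // Copy all fields, then redirect pointers to the new function's blocks.
  FunctionInfo
  clone(const std::unordered_map<const Block *, const Block *> &oldToNew) const {
    FunctionInfo copy = *this;
    if (entry) {
      auto it = oldToNew.find(entry);
      assert(it != oldToNew.end() && "every referenced block needs a mapping");
      copy.entry = it->second;
    }
    return copy;
  }
};
```

With the copy constructor alone, the cloned info would still point at blocks owned by the original function.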
I can't remove the function just yet as it is used in the generated .inc files.
I would also like to provide a way to compare alignment with TypeSize since it came up a few times.
Differential Revision: https://reviews.llvm.org/D126910
Some cl::ZeroOrMore were added to avoid the `may only occur zero or one times!`
error. More were added due to cargo cult. Since the error has been removed,
cl::ZeroOrMore is unneeded.
Also remove cl::init(false) while touching the lines.
We commonly want to create either an inbounds or non-inbounds GEP
based on a boolean value, e.g. when preserving inbounds from
existing GEPs. Directly accept such a boolean in the API, rather
than requiring a ternary between CreateGEP and CreateInBoundsGEP.
This change is not entirely NFC, because we now preserve an
inbounds flag in a constant expression edge-case in InstCombine.
A global variable may have the same name as a label, and ptxas does not accept it.
Prefix labels with $L__ to fix this.
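A minimal sketch of the renaming is shown below. makeLabelName is a hypothetical helper, not the function changed by the patch; it only shows that prefixed labels can no longer collide with a global of the same spelling.

```cpp
#include <cassert>
#include <string>

// Give labels their own namespace by prefixing them with "$L__".
std::string makeLabelName(const std::string &name) {
  assert(!name.empty() && "labels need a name");
  return "$L__" + name;
}
```

A global named foo and a label derived from foo then print as foo and $L__foo respectively.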
Reviewed By: MaskRay, tra
Differential Revision: https://reviews.llvm.org/D119669
PTX supports those instructions for i64 starting from 4.3.
The patch also marks corresponding DAG nodes legal for both i32 and i64.
Reviewed By: tra
Differential Revision: https://reviews.llvm.org/D124698
Default behavior for .file directory was changed in D105856, but
ptxas (CUDA 11.5 release) refuses to parse it:
    $ llc -march=nvptx64 llvm/test/DebugInfo/NVPTX/debug-file-loc.ll
    $ ptxas debug-file-loc.s
    ptxas debug-file-loc.s, line 42; fatal : Parsing error near '"foo.h"': syntax error
Added a new field to MCAsmInfo to control default value of
UseDwarfDirectory. This value is used if -dwarf-directory command line
option is not specified.
Differential Revision: https://reviews.llvm.org/D121299
Make sure NVPTX backend can handle bitcasting between `float` and `<2 x half>` types.
This was discovered through: https://github.com/intel/llvm/issues/5969
I'm not suggesting that such bitcasts make much sense, but it feels like the compiler should not hard crash on them.
Reviewed By: tra
Differential Revision: https://reviews.llvm.org/D124171
ptxas fails to parse such syntax:
    mov.u64 %rd1, ($str);
    fatal : Parsing error near '$str': syntax error
A new MCAsmInfo option was added because InParens parameter of
MCExpr::print is not sufficient to disable parens
completely. MCExpr::print resets it to false for a recursive call in
case of unary or binary expressions.
Targets that require parens around identifiers that start with '$'
should always pass MCAsmInfo to MCExpr::print.
Therefore 'operator<<(raw_ostream &, MCExpr&)' should be avoided
because it calls MCExpr::print with nullptr MAI.
Differential Revision: https://reviews.llvm.org/D123702
PTX ISA spec, s5.4.8. Variable Attribute Directive: .attribute
PTX ISA Notes
Introduced in PTX ISA version 4.0.
Target ISA Notes
.managed attribute requires sm_30 or higher.
Differential Revision: https://reviews.llvm.org/D123040
PTX ISA spec, s9.7.8.6. Data Movement and Conversion Instructions:
shfl.sync
PTX ISA Notes
Introduced in PTX ISA version 6.0.
Target ISA Notes
Requires sm_30 or higher.
Differential Revision: https://reviews.llvm.org/D123039
PTX ISA spec, s9.7.12.4. Parallel Synchronization and Communication
Instructions: atom
Target ISA Notes
64-bit atom.{and,or,xor,min,max} require sm_32 or higher.
Differential Revision: https://reviews.llvm.org/D123038
By specification, the source and destination of llvm.memcpy.* must be either equal or non-overlapping. These semantics are hard or impossible to recover once the intrinsic is lowered. This patch explicitly marks the loads from the source and the stores to the destination as not aliasing if the source and destination are known not to be equal.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D118441
NVPTXTargetLowering::getFunctionParamOptimizedAlign, which was introduced in
D120129, contained a poorly designed assertion checking that a function with
internal or private linkage is not a kernel. It relied on invariants that
were not actually guaranteed, which resulted in a compiler crash with some
CUDA versions (see the discussion with @jdoerfert in D120129). This patch
changes that assertion to use isKernelFunction, which is designed exactly for
such checks. This patch also includes a test with the IR that previously caused
the compiler crash.
Differential Revision: https://reviews.llvm.org/D122562