`getFunctionParamOptimizedAlign` was being passed a null function
argument when getting the callee of a bitcasted function symbol. This is
because `CallBase::getCalledFunction` does not look through bitcasts.
There is already code to handle this case in
`NVPTXTargetLowering::getArgumentAlignment`, which is now hoisted into
an NVPTX util.
The alignment computation now gracefully handles computing alignment of
virtual functions with a check for null.
Before this patch the code in printScalarConstant was unable to handle
nested constant expressions like (gep (addrspacecast ptr)) and crashed
with:
LLVM ERROR: Unsupported expression in static initializer:
addrspacecast ([4 x i8] addrspace(1)* @ga to [4 x i8]*)
We can use lowerConstantForGV instead which is a customized version of
lowerConstant that supports generic() and nested expressions.
Differential Revision: https://reviews.llvm.org/D127878
All in-tree targets pass pointer-sized ConstantSDNodes to the
method. This overload reduced amount of boilerplate code a bit. This
also makes getCALLSEQ_END consistent with getCALLSEQ_START, which
already takes uint64_ts.
The second argument of `NVPTXFrameLowering::emitPrologue(MachineFunction &MF, MachineBasicBlock &MBB)` is the first MBB of the MF. In that function, it assumes the first MBB always contains instructions, so it gets the first instruction by MachineInstr *MI = &MBB.front();. However, with the reproducer/test case attached, all instructions in the first MBB is cleared in a previous pass for stack coloring. As a consequence, MBB.front() triggers the assertion that the first node is actually a sentinel node. Hence we are using MachineBasicBlock::iterator to iterate over MBB.
Fix#52623.
Differential Revision: https://reviews.llvm.org/D132663
This patch replaces calls to greatestCommonDivisor with std::gcd where
both arguments are known to be of unsigned. This means that
std::common_type_t of the two argument types should just be the wider
one of the two.
This patch adds a Type operand to the TLI isCheapToSpeculateCttz/isCheapToSpeculateCtlz callbacks, allowing targets to decide whether branches should occur on a type-by-type/legality basis.
For X86, this patch proposes to allow CTTZ speculation for i8/i16 types that will lower to promoted i32 BSF instructions by masking the operand above the msb (we already do something similar for i8/i16 TZCNT). This required a minor tweak to CTTZ lowering - if the src operand is known never zero (i.e. due to the promotion masking) we can remove the CMOV zero src handling.
Although BSF isn't very fast, most CPUs from the last 20 years don't do that bad a job with it, although there are some annoying passthrough EFLAGS dependencies. Additionally, now that we emit 'REP BSF' in most cases, we are tending towards assuming this will most likely be executed as a TZCNT instruction on any semi-modern CPU.
Differential Revision: https://reviews.llvm.org/D132520
In order to convert to mulwide.s32, we compute the 2nd operand as MulWide.32 $r, (1 << 31).
(1 << 31) is interpreted as a negative number, and is not equivalent to the original instruction.
The code `int64_t r = (int64_t)a << 31;` incorrectly compiled to `mul.wide.s32 %rd7, %r1, -2147483648;`
Reviewed By: jchlanda
Differential Revision: https://reviews.llvm.org/D132516
This change completes the process of replacing OperandValueKind and OperandValueProperties which were previously passed independently in this API with a single container class which contains both.
This is the change which motivated the whole sequence which preceeded it. In an original spike version of this change, I'd noticed a nasty bug: I'd changed the signature without changing names, and as result, we silently passed additional information through a callsite which previously dropped the power-of-two fact. This might be harmless in most cases, but at least a couple clearly dependend for correctness on not passing that property through.
I did my best to split off prior changes which reduced the scope of this one, and which made it possible to use compiler assistance. For instance, every parameter which changes type in this change also changes name. This was intentional to make sure that every call site possible effected must show up in the diff. This let me audit each one closely.
TragetLowering had two last InstructionCost related `getTypeLegalizationCost()`
and `getScalingFactorCost()` members, but all other costs are processed in TTI.
E.g. it is not comfortable to use other TTI members in these two functions
overrided in a target.
Minor refactoring: `getTypeLegalizationCost()` now doesn't need DataLayout
parameter - it was always passed from TTI.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D117723
Today llc will crash when attempting to use non-power-of-two integer types as
function arguments or returns. This patch enables passing non standard integer
values in functions by promoting them before store and truncating after load.
The main motivation of implementing this change is that rust casts small structs
(less than pointer size) into an integer of the same size. As an example, if a
struct contains three u8 then it will be passed as an i24. This patch is a step
towards enabling rust compilation to ptx while retaining the target independent
optimizations.
More context can be found in https://github.com/llvm/llvm-project/issues/55764
Differential Revision: https://reviews.llvm.org/D129291
The original patch revealed an issue of reading incorrect values on BE hosts.
That is now changed to use `endian::read32le()` and `endian::read64le()`.
Original commit message:
The current implementation assumes that all pointers used in the
initialization of an aggregate are aligned according to the pointer size
of the target; that might not be so if the object is packed. In that
case, an array of .u8 should be used and pointers should be decorated
with the mask() operator.
The operator was introduced in PTX ISA 7.1, so an error is issued if the
case is detected for an earlier version.
Differential Revision: https://reviews.llvm.org/D127504
The current implementation assumes that all pointers used in the
initialization of an aggregate are aligned according to the pointer size
of the target; that might not be so if the object is packed. In that
case, an array of .u8 should be used and pointers should be decorated
with the mask() operator.
The operator was introduced in PTX ISA 7.1, so an error is issued if the
case is detected for an earlier version.
Differential Revision: https://reviews.llvm.org/D127504
This simplifies NVPTXAsmPrinter::AggBuffer and its usage.
It is also a preparation for D127504.
Differential Revision: https://reviews.llvm.org/D129773
D25618 added a method to verify the instruction predicates for an
emitted instruction, through verifyInstructionPredicates added into
<Target>MCCodeEmitter::encodeInstruction. This is a very useful idea,
but the implementation inside MCCodeEmitter made it only fire for object
files, not assembly which most of the llvm test suite uses.
This patch moves the code into the <Target>_MC::verifyInstructionPredicates
method, inside the InstrInfo. The allows it to be called from other
places, such as in this patch where it is called from the
<Target>AsmPrinter::emitInstruction methods which should trigger for
both assembly and object files. It can also be called from other places
such as verifyInstruction, but that is not done here (it tends to catch
errors earlier, but in reality just shows all the mir tests that have
incorrect feature predicates). The interface was also simplified
slightly, moving computeAvailableFeatures into the function so that it
does not need to be called externally.
The ARM, AMDGPU (but not R600), AVR, Mips and X86 backends all currently
show errors in the test-suite, so have been disabled with FIXME
comments.
Recommitted with some fixes for the leftover MCII variables in release
builds.
Differential Revision: https://reviews.llvm.org/D129506
This reverts commit e2fb8c0f4b as it does
not build for Release builds, and some buildbots are giving more warning
than I saw locally. Reverting to fix those issues.
D25618 added a method to verify the instruction predicates for an
emitted instruction, through verifyInstructionPredicates added into
<Target>MCCodeEmitter::encodeInstruction. This is a very useful idea,
but the implementation inside MCCodeEmitter made it only fire for object
files, not assembly which most of the llvm test suite uses.
This patch moves the code into the <Target>_MC::verifyInstructionPredicates
method, inside the InstrInfo. The allows it to be called from other
places, such as in this patch where it is called from the
<Target>AsmPrinter::emitInstruction methods which should trigger for
both assembly and object files. It can also be called from other places
such as verifyInstruction, but that is not done here (it tends to catch
errors earlier, but in reality just shows all the mir tests that have
incorrect feature predicates). The interface was also simplified
slightly, moving computeAvailableFeatures into the function so that it
does not need to be called externally.
The ARM, AMDGPU (but not R600), AVR, Mips and X86 backends all currently
show errors in the test-suite, so have been disabled with FIXME
comments.
Differential Revision: https://reviews.llvm.org/D129506
(Reapply after revert in e9ce1a5880 due to
Fuchsia test failures. Removed changes in lib/ExecutionEngine/ other
than error categories, to be checked in more detail and reapplied
separately.)
Bulk remove many of the more trivial uses of ManagedStatic in the llvm
directory, either by defining a new getter function or, in many cases,
moving the static variable directly into the only function that uses it.
Differential Revision: https://reviews.llvm.org/D129120
Bulk remove many of the more trivial uses of ManagedStatic in the llvm
directory, either by defining a new getter function or, in many cases,
moving the static variable directly into the only function that uses it.
Differential Revision: https://reviews.llvm.org/D129120
This removes the insertvalue constant expression, as part of
https://discourse.llvm.org/t/rfc-remove-most-constant-expressions/63179.
This is very similar to the extractvalue removal from D125795.
insertvalue is also not supported in bitcode, so no auto-ugprade
is necessary.
ConstantExpr::getInsertValue() can be replaced with
IRBuilder::CreateInsertValue() or ConstantFoldInsertValueInstruction(),
depending on whether a constant result is required (with the latter
being fallible).
The ConstantExpr::hasIndices() and ConstantExpr::getIndices()
methods also go away here, because there are no longer any constant
expressions with indices.
Differential Revision: https://reviews.llvm.org/D128719
`llvm::max(Align, MaybeAlign)` and `llvm::max(MaybeAlign, Align)` are
not used often enough to be required. They also make the code more opaque.
Differential Revision: https://reviews.llvm.org/D128121
MIR support is totally unusable for AMDGPU without this, since the set
of reserved registers is set from fields here.
Add a clone method to MachineFunctionInfo. This is a subtle variant of
the copy constructor that is required if there are any MIR constructs
that use pointers. Specifically, at minimum fields that reference
MachineBasicBlocks or the MachineFunction need to be adjusted to the
values in the new function.
I can't remove the function just yet as it is used in the generated .inc files.
I would also like to provide a way to compare alignment with TypeSize since it came up a few times.
Differential Revision: https://reviews.llvm.org/D126910
Some cl::ZeroOrMore were added to avoid the `may only occur zero or one times!`
error. More were added due to cargo cult. Since the error has been removed,
cl::ZeroOrMore is unneeded.
Also remove cl::init(false) while touching the lines.
We commonly want to create either an inbounds or non-inbounds GEP
based on a boolean value, e.g. when preserving inbounds from
existing GEPs. Directly accept such a boolean in the API, rather
than requiring a ternary between CreateGEP and CreateInBoundsGEP.
This change is not entirely NFC, because we now preserve an
inbounds flag in a constant expression edge-case in InstCombine.
A global variable may have the same name as a label, and ptxas does not accept it.
Prefix labels with $L__ to fix this.
Reviewed By: MaskRay, tra
Differential Revision: https://reviews.llvm.org/D119669
PTX supports those instructions for i64 starting from 4.3.
The patch also marks corresponding DAG nodes legal for both i32 and i64.
Reviewed By: tra
Differential Revision: https://reviews.llvm.org/D124698
Default behavior for .file directory was changed in D105856, but
ptxas (CUDA 11.5 release) refuses to parse it:
$ llc -march=nvptx64 llvm/test/DebugInfo/NVPTX/debug-file-loc.ll
$ ptxas debug-file-loc.s
ptxas debug-file-loc.s, line 42; fatal : Parsing error near
'"foo.h"': syntax error
Added a new field to MCAsmInfo to control default value of
UseDwarfDirectory. This value is used if -dwarf-directory command line
option is not specified.
Differential Revision: https://reviews.llvm.org/D121299