Previously we pre-calculated this and cached it for every
instruction in the function. Most of the calculated results will
never be used. So instead calculate it only on the first use, and
then cache it.
The cache was originally added to fix a compile time issue which
caused r216066 to be reverted.
This change exposed that we weren't pre-computing the Value for
Arguments. I've explicitly disabled that for now as it seemed to
regress some tests on AArch64 which has sext built into its compare
instructions.
Spotted while investigating how to improve heuristics to work better
with RISCV preferring sign extend for unsigned compares for i32 on RV64.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D107976
This patch adds vector-predicated ("VP") reduction intrinsics corresponding to
each of the existing unpredicated `llvm.vector.reduce.*` versions. Unlike the
unpredicated reductions, all VP reductions have a start value. This start value
is returned when the no vector element is active.
Support for expansion on targets without native vector-predication support is
included.
This patch is based on the ["reduction
slice"](https://reviews.llvm.org/D57504#1732277) of the LLVM-VP reference patch
(https://reviews.llvm.org/D57504).
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D104308
AttributeList::hasAttribute() is confusing, use clearer methods like
hasParamAttr()/hasRetAttr().
Add hasRetAttr() since it was missing from AttributeList.
We were calling find and then using operator[]. Instead keep the
iterator from find and use it to get the value.
Just happened to notice while investigating how we decide what extends
to use between basic blocks.
This patch refactors / simplifies salvageDebugInfoImpl(). The goal
here is to simplify the implementation of coro::salvageDebugInfo() in
a followup patch.
1. Change the return value to I.getOperand(0). Currently users of
salvageDebugInfoImpl() assume that the first operand is
I.getOperand(0). This patch makes this information explicit. A
nice side-effect of this change is that it allows us to salvage
expressions such as add i8 1, %a in the future.
2. Factor out the creation of a DIExpression and return an array of
DIExpression operations instead. This change allows users that
call salvageDebugInfoImpl() in a loop to avoid the costly
creation of temporary DIExpressions and to defer the creation of
a DIExpression until the end.
This patch does not change any functionality.
rdar://80227769
Differential Revision: https://reviews.llvm.org/D107383
This is recommit of the patch 16ff91ebcc,
reverted in 0c28a7c990 because it had
an error in call of getFastMathFlags (base type should be FPMathOperator
but not Instruction). The original commit message is duplicated below:
Clang has builtin function '__builtin_isnan', which implements C
library function 'isnan'. This function now is implemented entirely in
clang codegen, which expands the function into set of IR operations.
There are three mechanisms by which the expansion can be made.
* The most common mechanism is using an unordered comparison made by
instruction 'fcmp uno'. This simple solution is target-independent
and works well in most cases. It however is not suitable if floating
point exceptions are tracked. Corresponding IEEE 754 operation and C
function must never raise FP exception, even if the argument is a
signaling NaN. Compare instructions usually does not have such
property, they raise 'invalid' exception in such case. So this
mechanism is unsuitable when exception behavior is strict. In
particular it could result in unexpected trapping if argument is SNaN.
* Another solution was implemented in https://reviews.llvm.org/D95948.
It is used in the cases when raising FP exceptions by 'isnan' is not
allowed. This solution implements 'isnan' using integer operations.
It solves the problem of exceptions, but offers one solution for all
targets, however some can do the check in more efficient way.
* Solution implemented by https://reviews.llvm.org/D96568 introduced a
hook 'clang::TargetCodeGenInfo::testFPKind', which injects target
specific code into IR. Now only SystemZ implements this hook and it
generates a call to target specific intrinsic function.
Although these mechanisms allow to implement 'isnan' with enough
efficiency, expanding 'isnan' in clang has drawbacks:
* The operation 'isnan' is hidden behind generic integer operations or
target-specific intrinsics. It complicates analysis and can prevent
some optimizations.
* IR can be created by tools other than clang, in this case treatment
of 'isnan' has to be duplicated in that tool.
Another issue with the current implementation of 'isnan' comes from the
use of options '-ffast-math' or '-fno-honor-nans'. If such option is
specified, 'fcmp uno' may be optimized to 'false'. It is valid
optimization in general, but it results in 'isnan' always returning
'false'. For example, in some libc++ implementations the following code
returns 'false':
std::isnan(std::numeric_limits<float>::quiet_NaN())
The options '-ffast-math' and '-fno-honor-nans' imply that FP operation
operands are never NaNs. This assumption however should not be applied
to the functions that check FP number properties, including 'isnan'. If
such function returns expected result instead of actually making
checks, it becomes useless in many cases. The option '-ffast-math' is
often used for performance critical code, as it can speed up execution
by the expense of manual treatment of corner cases. If 'isnan' returns
assumed result, a user cannot use it in the manual treatment of NaNs
and has to invent replacements, like making the check using integer
operations. There is a discussion in https://reviews.llvm.org/D18513#387418,
which also expresses the opinion, that limitations imposed by
'-ffast-math' should be applied only to 'math' functions but not to
'tests'.
To overcome these drawbacks, this change introduces a new IR intrinsic
function 'llvm.isnan', which realizes the check as specified by IEEE-754
and C standards in target-agnostic way. During IR transformations it
does not undergo undesirable optimizations. It reaches instruction
selection, where is lowered in target-dependent way. The lowering can
vary depending on options like '-ffast-math' or '-ffp-model' so the
resulting code satisfies requested semantics.
Differential Revision: https://reviews.llvm.org/D104854
Clang has builtin function '__builtin_isnan', which implements C
library function 'isnan'. This function now is implemented entirely in
clang codegen, which expands the function into set of IR operations.
There are three mechanisms by which the expansion can be made.
* The most common mechanism is using an unordered comparison made by
instruction 'fcmp uno'. This simple solution is target-independent
and works well in most cases. It however is not suitable if floating
point exceptions are tracked. Corresponding IEEE 754 operation and C
function must never raise FP exception, even if the argument is a
signaling NaN. Compare instructions usually does not have such
property, they raise 'invalid' exception in such case. So this
mechanism is unsuitable when exception behavior is strict. In
particular it could result in unexpected trapping if argument is SNaN.
* Another solution was implemented in https://reviews.llvm.org/D95948.
It is used in the cases when raising FP exceptions by 'isnan' is not
allowed. This solution implements 'isnan' using integer operations.
It solves the problem of exceptions, but offers one solution for all
targets, however some can do the check in more efficient way.
* Solution implemented by https://reviews.llvm.org/D96568 introduced a
hook 'clang::TargetCodeGenInfo::testFPKind', which injects target
specific code into IR. Now only SystemZ implements this hook and it
generates a call to target specific intrinsic function.
Although these mechanisms allow to implement 'isnan' with enough
efficiency, expanding 'isnan' in clang has drawbacks:
* The operation 'isnan' is hidden behind generic integer operations or
target-specific intrinsics. It complicates analysis and can prevent
some optimizations.
* IR can be created by tools other than clang, in this case treatment
of 'isnan' has to be duplicated in that tool.
Another issue with the current implementation of 'isnan' comes from the
use of options '-ffast-math' or '-fno-honor-nans'. If such option is
specified, 'fcmp uno' may be optimized to 'false'. It is valid
optimization in general, but it results in 'isnan' always returning
'false'. For example, in some libc++ implementations the following code
returns 'false':
std::isnan(std::numeric_limits<float>::quiet_NaN())
The options '-ffast-math' and '-fno-honor-nans' imply that FP operation
operands are never NaNs. This assumption however should not be applied
to the functions that check FP number properties, including 'isnan'. If
such function returns expected result instead of actually making
checks, it becomes useless in many cases. The option '-ffast-math' is
often used for performance critical code, as it can speed up execution
by the expense of manual treatment of corner cases. If 'isnan' returns
assumed result, a user cannot use it in the manual treatment of NaNs
and has to invent replacements, like making the check using integer
operations. There is a discussion in https://reviews.llvm.org/D18513#387418,
which also expresses the opinion, that limitations imposed by
'-ffast-math' should be applied only to 'math' functions but not to
'tests'.
To overcome these drawbacks, this change introduces a new IR intrinsic
function 'llvm.isnan', which realizes the check as specified by IEEE-754
and C standards in target-agnostic way. During IR transformations it
does not undergo undesirable optimizations. It reaches instruction
selection, where is lowered in target-dependent way. The lowering can
vary depending on options like '-ffast-math' or '-ffp-model' so the
resulting code satisfies requested semantics.
Differential Revision: https://reviews.llvm.org/D104854
This adds handling for two cases:
1. A scalable vector where the element type is promoted.
2. A scalable vector where the element count is odd (or more generally,
not divisble by the element count of the part type).
(Some element types still don't work; for example, <vscale x 2 x i128>,
or <vscale x 2 x fp128>.)
Differential Revision: https://reviews.llvm.org/D105591
This patch legalizes the Machine Value Type introduced in D94096 for loads
and stores. A new target hook named getAsmOperandValueType() is added which
maps i512 to MVT::i64x8. GlobalISel falls back to DAG for legalization.
Differential Revision: https://reviews.llvm.org/D94097
This is mostly a minor convenience, but the pattern seems frequent
enough to be worthwhile (and we'll probably add more uses in the
future).
Differential Revision: https://reviews.llvm.org/D105850
This is a cleanup patch -- we're now able to support all flavours of
variable location in instruction referencing mode. This patch updates
various tests for debug instructions to be broader: numerous code paths
try to ignore debug isntructions, and they now have to ignore the
additional DBG_PHI and DBG_INSTR_REFs that we can generate.
A small amount of rework happens for LiveDebugVariables: as we don't need
to track live intervals through regalloc any more, we can get away with
unlinking debug instructions before regalloc, then re-inserting them after.
Note that this isn't (yet) true of DBG_VALUE_LISTs, they still have to go
through live interval tracking.
In SelectionDAG, add a helper lambda that emits half-formed DBG_INSTR_REFs
for arguments in instr-ref mode, DBG_VALUE otherwise. This is one of the
final locations where DBG_VALUEs are emitted for vreg arguments.
X86InstrInfo now un-sets the debug instr number on SUB instructions that
get mutated into CMP instructions. As the instruction no longer computes a
subtraction, we can't use it for variable locations.
Differential Revision: https://reviews.llvm.org/D88898
Previously we used the vector type, but we're loading/storing
invididual elements so I think only element alignment should matter.
Noticed while looking at the code for something else so I don't
have a test case.
Differential Revision: https://reviews.llvm.org/D105220
This intrinsic blocks floating point transformations by the optimizer.
Author: Pengfei
Reviewed By: LuoYuanke, Andy Kaylor, Craig Topper, kpn
Differential Revision: https://reviews.llvm.org/D99675
This is a partial reapply of the original commit and the followup commit
that were previously reverted; this reapply also includes a small fix
for a potential source of non-determinism, but also has a small change
to turn off variadic debug value salvaging, to ensure that any future
revert/reapply steps to disable and renable this feature do not risk
causing conflicts.
Differential Revision: https://reviews.llvm.org/D91722
This reverts commit 386b66b2fc.
This only applies to FastIsel. GlobalIsel seems to sidestep
the issue.
This fixes https://bugs.llvm.org/show_bug.cgi?id=46996
One of the things we do in llvm is decide if a type needs
consecutive registers. Previously, we just checked if it
was an array or not.
(plus an SVE specific check that is not changing here)
This causes some confusion when you arbitrary IR like:
```
%T1 = type { double, i1 };
define [ 1 x %T1 ] @foo() {
entry:
ret [ 1 x %T1 ] zeroinitializer
}
```
We see it is an array so we call CC_AArch64_Custom_Block
which bails out when it sees the i1, a type we don't want
to put into a block.
This leaves the location of the double in some kind of
intermediate state and leads to odd codegen. Which then crashes
the backend because it doesn't know how to implement
what it's been asked for.
You get this:
```
renamable $d0 = FMOVD0
$w0 = COPY killed renamable $d0
```
Rather than this:
```
$d0 = FMOVD0
$w0 = COPY $wzr
```
The backend knows how to copy 64 bit to 64 bit registers,
but not 64 to 32. It can certainly be taught how but the real
issue seems to be us even trying to assign a register block
in the first place.
This change makes the logic of
AArch64TargetLowering::functionArgumentNeedsConsecutiveRegisters
a bit more in depth. If we find an array, also check that all the
nested aggregates in that array have a single member type.
Then CC_AArch64_Custom_Block's assumption of a type that looks
like [ N x type ] will be valid and we get the expected codegen.
New tests have been added to exercise these situations. Note that
some of the output is not ABI compliant. The aim of this change is
to simply handle these situations and not to make our processing
of arbitrary IR ABI compliant.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D104123
> This reapplies c0f3dfb9, which was reverted following the discovery of
> crashes on linux kernel and chromium builds - these issues have since
> been fixed, allowing this patch to re-land.
This reverts commit 36ec97f76a.
The change caused non-determinism in the compiler, see comments on the code
review at https://reviews.llvm.org/D91722.
Reverting to unbreak people's builds until that can be addressed.
This also reverts the follow-up "[DebugInfo] Limit the number of values
that may be referenced by a dbg.value" in
a0bd6105d8.
Parameter positions seem like they should be unsigned.
While there, make function names lowercase per coding standards.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D103224
SwiftTailCC has a different set of requirements than the C calling convention
for a tail call. The exact argument sequence doesn't have to match, but fewer
ABI-affecting attributes are allowed.
Also make sure the musttail diagnostic triggers if a musttail call isn't
actually a tail call.
This patch adds a way for the target to configure the type it uses for
the explicit vector length operands of VP SDNodes. The type must be a
legal integer type (there is still no target-independent legalization of
this operand) and must currently be at least as big as i32, the type
used by the IR intrinsics. An implicit zero-extension takes place on
targets which choose a larger type. All VP nodes should be created with
this type used for the EVL operand.
This allows 64-bit RISC-V to avoid custom legalization of all VP nodes,
keeping them in their target-independent form for that bit longer.
Reviewed By: simoll
Differential Revision: https://reviews.llvm.org/D103027
Support virtual, physical and tied i128 register operands in inline assembly.
i128 is on SystemZ not really supported and is not a legal type and generally
such a value will be split into two i64 parts. There are however some
instructions that require a pair of two GPR64 registers contained in the GR128
bit reg class, which is untyped.
For inline assmebly operands, it proved to be very cumbersome to first follow
the general behavior of splitting an i128 operand into two parts and then
later rebuild the INLINEASM MI to have one GR128 register. Instead, some
minor common code changes were made to SelectionDAGBUilder to only create one
GR128 register part to begin with. In particular:
- getNumRegisters() now has an optional parameter "RegisterVT" which is
passed by AddInlineAsmOperands() and GetRegistersForValue().
- The bitcasting in GetRegistersForValue is not performed if RegVT is
Untyped.
- The RC for a tied use in AddInlineAsmOperands() is now computed either from
the tied def (virtual register), or by getMinimalPhysRegClass() (physical
register).
- InstrEmitter.cpp:EmitCopyFromReg() has been fixed so that the register
class (DstRC) can also be computed for an illegal type.
In the SystemZ backend getNumRegisters(), splitValueIntoRegisterParts() and
joinRegisterPartsIntoValue() have been implemented to handle i128 operands.
Differential Revision: https://reviews.llvm.org/D100788
Review: Ulrich Weigand
- When memory intrinsics, such as memcpy, the attached scoped AA
metadata is not passed down to the backend. As a result, the backend
cannot schedule relevant memory operations around them following that
hint. In this patch, SelectionDAG is enhanced to propagate that
metadata (scoped AA only) when they are lowered into loads and stores.
Differential Revision: https://reviews.llvm.org/D102215
D29668 enabled to avoid a useless copy of the argument value into an alloca if the caller places it in memory (as it often happens on x86) by directly forwarding the pointer to it. This optimization is illegal if the type contains padding bytes: if a truncating store into the alloca is replaced the upper bits are filled with garbage and produce code misbehaving at runtime.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D102153
This reapplies c0f3dfb9, which was reverted following the discovery of
crashes on linux kernel and chromium builds - these issues have since
been fixed, allowing this patch to re-land.
This reverts commit 4397b7095d.
This is a step towards relying more on node-level FMF rather than function-wide
or target settings.
I think it was just an oversight that we didn't get this path in D87361
or follow-on patches.
The lack of FMF propagation is blocking D90901 from converting tests to IR-level FMF.
We can't do much more than this currently because we also fail to propagate flags
from x86-specific node to generic FMA node. That would be another patch, so the
test just verifies that we can transfer from IR to initial SDAG node.
Differential Revision: https://reviews.llvm.org/D102725
For opaque pointers, we're trying to avoid uses of
PointerType::getElementType().
A couple of ISel places use PointerType::getElementType(). Some of these
are easy to fix by using ArgListEntry's indirect types.
The inalloca type wasn't stored there, as opposed to preallocated and
byval which have their indirect types available, so add it and use it.
This is a reland after an MSan fix in D102667.
Differential Revision: https://reviews.llvm.org/D101713
This patch is the Part-1 (FE Clang) implementation of HW Exception handling.
This new feature adds the support of Hardware Exception for Microsoft Windows
SEH (Structured Exception Handling).
This is the first step of this project; only X86_64 target is enabled in this patch.
Compiler options:
For clang-cl.exe, the option is -EHa, the same as MSVC.
For clang.exe, the extra option is -fasync-exceptions,
plus -triple x86_64-windows -fexceptions and -fcxx-exceptions as usual.
NOTE:: Without the -EHa or -fasync-exceptions, this patch is a NO-DIFF change.
The rules for C code:
For C-code, one way (MSVC approach) to achieve SEH -EHa semantic is to follow
three rules:
* First, no exception can move in or out of _try region., i.e., no "potential
faulty instruction can be moved across _try boundary.
* Second, the order of exceptions for instructions 'directly' under a _try
must be preserved (not applied to those in callees).
* Finally, global states (local/global/heap variables) that can be read
outside of _try region must be updated in memory (not just in register)
before the subsequent exception occurs.
The impact to C++ code:
Although SEH is a feature for C code, -EHa does have a profound effect on C++
side. When a C++ function (in the same compilation unit with option -EHa ) is
called by a SEH C function, a hardware exception occurs in C++ code can also
be handled properly by an upstream SEH _try-handler or a C++ catch(...).
As such, when that happens in the middle of an object's life scope, the dtor
must be invoked the same way as C++ Synchronous Exception during unwinding
process.
Design:
A natural way to achieve the rules above in LLVM today is to allow an EH edge
added on memory/computation instruction (previous iload/istore idea) so that
exception path is modeled in Flow graph preciously. However, tracking every
single memory instruction and potential faulty instruction can create many
Invokes, complicate flow graph and possibly result in negative performance
impact for downstream optimization and code generation. Making all
optimizations be aware of the new semantic is also substantial.
This design does not intend to model exception path at instruction level.
Instead, the proposed design tracks and reports EH state at BLOCK-level to
reduce the complexity of flow graph and minimize the performance-impact on CPP
code under -EHa option.
One key element of this design is the ability to compute State number at
block-level. Our algorithm is based on the following rationales:
A _try scope is always a SEME (Single Entry Multiple Exits) region as jumping
into a _try is not allowed. The single entry must start with a seh_try_begin()
invoke with a correct State number that is the initial state of the SEME.
Through control-flow, state number is propagated into all blocks. Side exits
marked by seh_try_end() will unwind to parent state based on existing
SEHUnwindMap[].
Note side exits can ONLY jump into parent scopes (lower state number).
Thus, when a block succeeds various states from its predecessors, the lowest
State triumphs others. If some exits flow to unreachable, propagation on those
paths terminate, not affecting remaining blocks.
For CPP code, object lifetime region is usually a SEME as SEH _try.
However there is one rare exception: jumping into a lifetime that has Dtor but
has no Ctor is warned, but allowed:
Warning: jump bypasses variable with a non-trivial destructor
In that case, the region is actually a MEME (multiple entry multiple exits).
Our solution is to inject a eha_scope_begin() invoke in the side entry block to
ensure a correct State.
Implementation:
Part-1: Clang implementation described below.
Two intrinsic are created to track CPP object scopes; eha_scope_begin() and eha_scope_end().
_scope_begin() is immediately added after ctor() is called and EHStack is pushed.
So it must be an invoke, not a call. With that it's also guaranteed an
EH-cleanup-pad is created regardless whether there exists a call in this scope.
_scope_end is added before dtor(). These two intrinsics make the computation of
Block-State possible in downstream code gen pass, even in the presence of
ctor/dtor inlining.
Two intrinsic, seh_try_begin() and seh_try_end(), are added for C-code to mark
_try boundary and to prevent from exceptions being moved across _try boundary.
All memory instructions inside a _try are considered as 'volatile' to assure
2nd and 3rd rules for C-code above. This is a little sub-optimized. But it's
acceptable as the amount of code directly under _try is very small.
Part-2 (will be in Part-2 patch): LLVM implementation described below.
For both C++ & C-code, the state of each block is computed at the same place in
BE (WinEHPreparing pass) where all other EH tables/maps are calculated.
In addition to _scope_begin & _scope_end, the computation of block state also
rely on the existing State tracking code (UnwindMap and InvokeStateMap).
For both C++ & C-code, the state of each block with potential trap instruction
is marked and reported in DAG Instruction Selection pass, the same place where
the state for -EHsc (synchronous exceptions) is done.
If the first instruction in a reported block scope can trap, a Nop is injected
before this instruction. This nop is needed to accommodate LLVM Windows EH
implementation, in which the address in IPToState table is offset by +1.
(note the purpose of that is to ensure the return address of a call is in the
same scope as the call address.
The handler for catch(...) for -EHa must handle HW exception. So it is
'adjective' flag is reset (it cannot be IsStdDotDot (0x40) that only catches
C++ exceptions).
Suppress push/popTerminate() scope (from noexcept/noTHrow) so that HW
exceptions can be passed through.
Original llvm-dev [RFC] discussions can be found in these two threads below:
https://lists.llvm.org/pipermail/llvm-dev/2020-March/140541.htmlhttps://lists.llvm.org/pipermail/llvm-dev/2020-April/141338.html
Differential Revision: https://reviews.llvm.org/D80344/new/
This extends any frame record created in the function to include that
parameter, passed in X22.
The new record looks like [X22, FP, LR] in memory, and FP is stored with 0b0001
in bits 63:60 (CodeGen assumes they are 0b0000 in normal operation). The effect
of this is that tools walking the stack should expect to see one of three
values there:
* 0b0000 => a normal, non-extended record with just [FP, LR]
* 0b0001 => the extended record [X22, FP, LR]
* 0b1111 => kernel space, and a non-extended record.
All other values are currently reserved.
If compiling for arm64e this context pointer is address-discriminated with the
discriminator 0xc31a and the DB (process-specific) key.
There is also an "i8** @llvm.swift.async.context.addr()" intrinsic providing
front-ends access to this slot (and forcing its creation initialized to nullptr
if necessary).
I've taken the following steps to add unwinding support from inline assembly:
1) Add a new `unwind` "attribute" (like `sideeffect`) to the asm syntax:
```
invoke void asm sideeffect unwind "call thrower", "~{dirflag},~{fpsr},~{flags}"()
to label %exit unwind label %uexit
```
2.) Add Bitcode writing/reading support + LLVM-IR parsing.
3.) Emit EHLabels around inline assembly lowering (SelectionDAGBuilder + GlobalISel) when `InlineAsm::canThrow` is enabled.
4.) Tweak InstCombineCalls/InlineFunction pass to not mark inline assembly "calls" as nounwind.
5.) Add clang support by introducing a new clobber: "unwind", which lower to the `canThrow` being enabled.
6.) Don't allow unwinding callbr.
Reviewed By: Amanieu
Differential Revision: https://reviews.llvm.org/D95745
This patch extends the vector type-conversion and legalization capabilities of
scalable vector types.
Firstly, `vscale x 1` types now behave more like the corresponding `vscale x
2+` types. This enables the integer promotion legalization of extended scalable
types, such as the promotion of `<vscale x 1 x i5>` to `<vscale x 1 x i8>`.
These `vscale x 1` types are also now better handled by
`getVectorTypeBreakdown`, where what looks like older handling for 1-element
fixed-length vector types was spuriously updated to include scalable types.
Widening of scalable types is now better supported, by using `INSERT_SUBVECTOR`
to insert the smaller scalable vector "value" type into the wider scalable
vector "part" type. This allows AArch64 to pass and return `vscale x 1` types
by value by widening.
There are still cases where we are unable to legalize `vscale x 1` types, such
as where expansion would require splitting the vector in two.
Reviewed By: sdesmalen
Differential Revision: https://reviews.llvm.org/D102073
For opaque pointers, we're trying to avoid uses of
PointerType::getElementType().
A couple of ISel places use PointerType::getElementType(). Some of these
are easy to fix by using ArgListEntry's indirect types.
The inalloca type wasn't stored there, as opposed to preallocated and
byval which have their indirect types available, so add it and use it.
Differential Revision: https://reviews.llvm.org/D101713
Removing an assertion introduced with D68945. The
patch was later reverted with 6531a78ac4, but failed
to remove this assertion. It causes a problem while
trying to split a 64-bit argument into sub registers.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D101594
Previously we used an i32 constant to store the saturation width, but i32 isn't
legal on RISCV64. This wasn't a big deal to fix, but it is extra work for the
type legalizer.
This patch uses a VTSDNode to store the type similar to SEXT_INREG. This makes
it opaque to the type legalizer.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D101262
Previous build failures were caused by an error in bitcode reading and
writing for DIArgList metadata, which has been fixed in e5d844b587.
There were also some unnecessary asserts that were being triggered on
certain builds, which have been removed.
This reverts commit dad5caa59e.
When we pass a AArch64 Homogeneous Floating-Point
Aggregate (HFA) argument with increased alignment
requirements, for example
struct S {
__attribute__ ((__aligned__(16))) double v[4];
};
Clang uses `[4 x double]` for the parameter, which is passed
on the stack at alignment 8, whereas it should be at
alignment 16, following Rule C.4 in
AAPCS (https://github.com/ARM-software/abi-aa/blob/master/aapcs64/aapcs64.rst#642parameter-passing-rules)
Currently we don't have a way to express in LLVM IR the
alignment requirements of the function arguments. The align
attribute is applicable to pointers only, and only for some
special ways of passing arguments (e..g byval). When
implementing AAPCS32/AAPCS64, clang resorts to dubious hacks
of coercing to types, which naturally have the needed
alignment. We don't have enough types to cover all the
cases, though.
This patch introduces a new use of the stackalign attribute
to control stack slot alignment, when and if an argument is
passed in memory.
The attribute align is left as an optimizer hint - it still
applies to pointer types only and pertains to the content of
the pointer, whereas the alignment of the pointer itself is
determined by the stackalign attribute.
For byval arguments, the stackalign attribute assumes the
role, previously perfomed by align, falling back to align if
stackalign` is absent.
On the clang side, when passing arguments using the "direct"
style (cf. `ABIArgInfo::Kind`), now we can optionally
specify an alignment, which is emitted as the new
`stackalign` attribute.
Patch by Momchil Velikov and Lucas Prates.
Differential Revision: https://reviews.llvm.org/D98794