https://reviews.llvm.org/D95745 introduced a new `unwind` keyword for inline assembler expressions. Inline asms marked with the `unwind` keyword allows stack unwinding from inline assembly because the compiler emits unwinding information ("around" the inline asm) as it would for calls/invokes. Unwinding the stack from within non-unwind inline asm may cause UB.
Reviewed By: Amanieu
Differential Revision: https://reviews.llvm.org/D102642
SwiftTailCC has a different set of requirements than the C calling convention
for a tail call. The exact argument sequence doesn't have to match, but fewer
ABI-affecting attributes are allowed.
Also make sure the musttail diagnostic triggers if a musttail call isn't
actually a tail call.
There can be a need for some optimizations to get (base, offset)
for any GC pointer. The base can be calculated by generating
needed instructions as it is done by the
RewriteStatepointsForGC::findBasePointer() function. The offset
can be calculated in the same way. Though to not expose the base
calculation and to make the offset calculation as simple as
ptrtoint(derived_ptr) - ptrtoint(base_ptr), which is illegal
outside RS4GC, this patch introduces 2 intrinsics:
@llvm.experimental.gc.get.pointer.base(%derived_ptr)
@llvm.experimental.gc.get.pointer.offset(%derived_ptr)
These intrinsics are inlined by RS4GC along with generation of
statepoint sequences.
With these new intrinsics the GC parseable lowering for atomic
memcpy intrinsics (6ec2c5e402)
could be implemented as a separate pass.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D100445
We really ought to support no_sanitize("coverage") in line with other
sanitizers. This came up again in discussions on the Linux-kernel
mailing lists, because we currently do workarounds using objtool to
remove coverage instrumentation. Since that support is only on x86, to
continue support coverage instrumentation on other architectures, we
must support selectively disabling coverage instrumentation via function
attributes.
Unfortunately, for SanitizeCoverage, it has not been implemented as a
sanitizer via fsanitize= and associated options in Sanitizers.def, but
rolls its own option fsanitize-coverage. This meant that we never got
"automatic" no_sanitize attribute support.
Implement no_sanitize attribute support by special-casing the string
"coverage" in the NoSanitizeAttr implementation. To keep the feature as
unintrusive to existing IR generation as possible, define a new negative
function attribute NoSanitizeCoverage to propagate the information
through to the instrumentation pass.
Fixes: https://bugs.llvm.org/show_bug.cgi?id=49035
Reviewed By: vitalybuka, morehouse
Differential Revision: https://reviews.llvm.org/D102772
In the WebAssembly target, we would like to allow alloca in two address
spaces. The alloca instruction already has an address space argument,
but the verifier asserts that the address space of an alloca is the
default alloca address space from the datalayout. This patch removes
this restriction. Targets that would like to impose additional
restrictions should do so via target-specific verification passes.
Differential Revision: https://reviews.llvm.org/D101045
This patch is the Part-1 (FE Clang) implementation of HW Exception handling.
This new feature adds the support of Hardware Exception for Microsoft Windows
SEH (Structured Exception Handling).
This is the first step of this project; only X86_64 target is enabled in this patch.
Compiler options:
For clang-cl.exe, the option is -EHa, the same as MSVC.
For clang.exe, the extra option is -fasync-exceptions,
plus -triple x86_64-windows -fexceptions and -fcxx-exceptions as usual.
NOTE:: Without the -EHa or -fasync-exceptions, this patch is a NO-DIFF change.
The rules for C code:
For C-code, one way (MSVC approach) to achieve SEH -EHa semantic is to follow
three rules:
* First, no exception can move in or out of _try region., i.e., no "potential
faulty instruction can be moved across _try boundary.
* Second, the order of exceptions for instructions 'directly' under a _try
must be preserved (not applied to those in callees).
* Finally, global states (local/global/heap variables) that can be read
outside of _try region must be updated in memory (not just in register)
before the subsequent exception occurs.
The impact to C++ code:
Although SEH is a feature for C code, -EHa does have a profound effect on C++
side. When a C++ function (in the same compilation unit with option -EHa ) is
called by a SEH C function, a hardware exception occurs in C++ code can also
be handled properly by an upstream SEH _try-handler or a C++ catch(...).
As such, when that happens in the middle of an object's life scope, the dtor
must be invoked the same way as C++ Synchronous Exception during unwinding
process.
Design:
A natural way to achieve the rules above in LLVM today is to allow an EH edge
added on memory/computation instruction (previous iload/istore idea) so that
exception path is modeled in Flow graph preciously. However, tracking every
single memory instruction and potential faulty instruction can create many
Invokes, complicate flow graph and possibly result in negative performance
impact for downstream optimization and code generation. Making all
optimizations be aware of the new semantic is also substantial.
This design does not intend to model exception path at instruction level.
Instead, the proposed design tracks and reports EH state at BLOCK-level to
reduce the complexity of flow graph and minimize the performance-impact on CPP
code under -EHa option.
One key element of this design is the ability to compute State number at
block-level. Our algorithm is based on the following rationales:
A _try scope is always a SEME (Single Entry Multiple Exits) region as jumping
into a _try is not allowed. The single entry must start with a seh_try_begin()
invoke with a correct State number that is the initial state of the SEME.
Through control-flow, state number is propagated into all blocks. Side exits
marked by seh_try_end() will unwind to parent state based on existing
SEHUnwindMap[].
Note side exits can ONLY jump into parent scopes (lower state number).
Thus, when a block succeeds various states from its predecessors, the lowest
State triumphs others. If some exits flow to unreachable, propagation on those
paths terminate, not affecting remaining blocks.
For CPP code, object lifetime region is usually a SEME as SEH _try.
However there is one rare exception: jumping into a lifetime that has Dtor but
has no Ctor is warned, but allowed:
Warning: jump bypasses variable with a non-trivial destructor
In that case, the region is actually a MEME (multiple entry multiple exits).
Our solution is to inject a eha_scope_begin() invoke in the side entry block to
ensure a correct State.
Implementation:
Part-1: Clang implementation described below.
Two intrinsic are created to track CPP object scopes; eha_scope_begin() and eha_scope_end().
_scope_begin() is immediately added after ctor() is called and EHStack is pushed.
So it must be an invoke, not a call. With that it's also guaranteed an
EH-cleanup-pad is created regardless whether there exists a call in this scope.
_scope_end is added before dtor(). These two intrinsics make the computation of
Block-State possible in downstream code gen pass, even in the presence of
ctor/dtor inlining.
Two intrinsic, seh_try_begin() and seh_try_end(), are added for C-code to mark
_try boundary and to prevent from exceptions being moved across _try boundary.
All memory instructions inside a _try are considered as 'volatile' to assure
2nd and 3rd rules for C-code above. This is a little sub-optimized. But it's
acceptable as the amount of code directly under _try is very small.
Part-2 (will be in Part-2 patch): LLVM implementation described below.
For both C++ & C-code, the state of each block is computed at the same place in
BE (WinEHPreparing pass) where all other EH tables/maps are calculated.
In addition to _scope_begin & _scope_end, the computation of block state also
rely on the existing State tracking code (UnwindMap and InvokeStateMap).
For both C++ & C-code, the state of each block with potential trap instruction
is marked and reported in DAG Instruction Selection pass, the same place where
the state for -EHsc (synchronous exceptions) is done.
If the first instruction in a reported block scope can trap, a Nop is injected
before this instruction. This nop is needed to accommodate LLVM Windows EH
implementation, in which the address in IPToState table is offset by +1.
(note the purpose of that is to ensure the return address of a call is in the
same scope as the call address.
The handler for catch(...) for -EHa must handle HW exception. So it is
'adjective' flag is reset (it cannot be IsStdDotDot (0x40) that only catches
C++ exceptions).
Suppress push/popTerminate() scope (from noexcept/noTHrow) so that HW
exceptions can be passed through.
Original llvm-dev [RFC] discussions can be found in these two threads below:
https://lists.llvm.org/pipermail/llvm-dev/2020-March/140541.htmlhttps://lists.llvm.org/pipermail/llvm-dev/2020-April/141338.html
Differential Revision: https://reviews.llvm.org/D80344/new/
Swift's new concurrency features are going to require guaranteed tail calls so
that they don't consume excessive amounts of stack space. This would normally
mean "tailcc", but there are also Swift-specific ABI desires that don't
naturally go along with "tailcc" so this adds another calling convention that's
the combination of "swiftcc" and "tailcc".
Support is added for AArch64 and X86 for now.
This extends any frame record created in the function to include that
parameter, passed in X22.
The new record looks like [X22, FP, LR] in memory, and FP is stored with 0b0001
in bits 63:60 (CodeGen assumes they are 0b0000 in normal operation). The effect
of this is that tools walking the stack should expect to see one of three
values there:
* 0b0000 => a normal, non-extended record with just [FP, LR]
* 0b0001 => the extended record [X22, FP, LR]
* 0b1111 => kernel space, and a non-extended record.
All other values are currently reserved.
If compiling for arm64e this context pointer is address-discriminated with the
discriminator 0xc31a and the DB (process-specific) key.
There is also an "i8** @llvm.swift.async.context.addr()" intrinsic providing
front-ends access to this slot (and forcing its creation initialized to nullptr
if necessary).
The opaque pointer type is essentially just a normal pointer type with a
null pointee type.
This also adds support for the opaque pointer type to the bitcode
reader/writer, as well as to textual IR.
To avoid confusion with existing pointer types, we disallow creating a
pointer to an opaque pointer.
Opaque pointer types should not be widely used at this point since many
parts of LLVM still do not support them. The next steps are to add some
very simple use cases of opaque pointers to make sure they work, then
start pretending that all pointers are opaque pointers and see what
breaks.
https://lists.llvm.org/pipermail/llvm-dev/2021-May/150359.html
Reviewed By: dblaikie, dexonsmith, pcc
Differential Revision: https://reviews.llvm.org/D101704
The Linux kernel objtool diagnostic `call without frame pointer save/setup`
arise in multiple instrumentation passes (asan/tsan/gcov). With the mechanism
introduced in D100251, it's trivial to respect the command line
-m[no-]omit-leaf-frame-pointer/-f[no-]omit-frame-pointer, so let's do it.
Fix: https://github.com/ClangBuiltLinux/linux/issues/1236 (tsan)
Fix: https://github.com/ClangBuiltLinux/linux/issues/1238 (asan)
Also document the function attribute "frame-pointer" which is long overdue.
Differential Revision: https://reviews.llvm.org/D101016
Don't phrase the semantics in terms of the optimizer. Instead have a
more straightforward execution based semantic.
Reviewed By: ebrevnov
Differential Revision: https://reviews.llvm.org/D63439
This patch clarifies the semantics of the nofree function attribute to make clear that it provides an "as if" semantic. That is, a nofree function is guaranteed not to free memory which existed before the call, but might allocate and then deallocate that same memory within the lifetime of the callee.
This is the result of the discussion on llvm-dev under the thread "Ambiguity in the nofree function attribute".
The most important part of this change is the LangRef wording. The rest is minor comment changes to emphasize the new semantics where code was accidentally consistent, and fix one place which wasn't consistent. That one place is currently narrowly used as it is primarily part of the ongoing (and not yet enabled) deref-at-point semantics work.
Differential Revision: https://reviews.llvm.org/D100141
This patch clarifies the semantics of nocapture attribute.
A 'Pointer Capture' subsection is added to describe the semantics of pointer capture first.
For the nocapture example with two same pointer arguments, it is consistent with the semantics that Alive2 used to run lit tests.
Reviewed By: nlopes
Differential Revision: https://reviews.llvm.org/D97924
When we pass a AArch64 Homogeneous Floating-Point
Aggregate (HFA) argument with increased alignment
requirements, for example
struct S {
__attribute__ ((__aligned__(16))) double v[4];
};
Clang uses `[4 x double]` for the parameter, which is passed
on the stack at alignment 8, whereas it should be at
alignment 16, following Rule C.4 in
AAPCS (https://github.com/ARM-software/abi-aa/blob/master/aapcs64/aapcs64.rst#642parameter-passing-rules)
Currently we don't have a way to express in LLVM IR the
alignment requirements of the function arguments. The align
attribute is applicable to pointers only, and only for some
special ways of passing arguments (e..g byval). When
implementing AAPCS32/AAPCS64, clang resorts to dubious hacks
of coercing to types, which naturally have the needed
alignment. We don't have enough types to cover all the
cases, though.
This patch introduces a new use of the stackalign attribute
to control stack slot alignment, when and if an argument is
passed in memory.
The attribute align is left as an optimizer hint - it still
applies to pointer types only and pertains to the content of
the pointer, whereas the alignment of the pointer itself is
determined by the stackalign attribute.
For byval arguments, the stackalign attribute assumes the
role, previously perfomed by align, falling back to align if
stackalign` is absent.
On the clang side, when passing arguments using the "direct"
style (cf. `ABIArgInfo::Kind`), now we can optionally
specify an alignment, which is emitted as the new
`stackalign` attribute.
Patch by Momchil Velikov and Lucas Prates.
Differential Revision: https://reviews.llvm.org/D98794
I think byval/sret and the others are close to being able to rip out
the code to support the missing type case. A lot of this code is
shared with inalloca, so catch this up to the others so that can
happen.
This patch adds a new llvm.experimental.stepvector intrinsic,
which takes no arguments and returns a linear integer sequence of
values of the form <0, 1, ...>. It is primarily intended for
scalable vectors, although it will work for fixed width vectors
too. It is intended that later patches will make use of this
new intrinsic when vectorising induction variables, currently only
supported for fixed width. I've added a new CreateStepVector
method to the IRBuilder, which will generate a call to this
intrinsic for scalable vectors and fall back on creating a
ConstantVector for fixed width.
For scalable vectors this intrinsic is lowered to a new ISD node
called STEP_VECTOR, which takes a single constant integer argument
as the step. During lowering this argument is set to a value of 1.
The reason for this additional argument at the codegen level is
because in future patches we will introduce various generic DAG
combines such as
mul step_vector(1), 2 -> step_vector(2)
add step_vector(1), step_vector(1) -> step_vector(2)
shl step_vector(1), 1 -> step_vector(2)
etc.
that encourage a canonical format for all targets. This hopefully
means all other targets supporting scalable vectors can benefit
from this too.
I've added cost model tests for both fixed width and scalable
vectors:
llvm/test/Analysis/CostModel/AArch64/neon-stepvector.ll
llvm/test/Analysis/CostModel/AArch64/sve-stepvector.ll
as well as codegen lowering tests for fixed width and scalable
vectors:
llvm/test/CodeGen/AArch64/neon-stepvector.ll
llvm/test/CodeGen/AArch64/sve-stepvector.ll
See this thread for discussion of the intrinsic:
https://lists.llvm.org/pipermail/llvm-dev/2021-January/147943.html
This attribute represents the minimum and maximum values vscale can
take. For now this attribute is not hooked up to anything during
codegen, this will be added in the future when such codegen is
considered stable.
Additionally hook up the -msve-vector-bits=<x> clang option to emit this
attribute.
Differential Revision: https://reviews.llvm.org/D98030
There are a couple of caveats when it comes to how vectors are
stored to memory, and thereby also how bitcast between vector
and integer types work, in LLVM IR. Specially in relation to
endianess. This patch is an attempt to document such things.
Reviewed By: nlopes
Differential Revision: https://reviews.llvm.org/D94964
This patch adds support for intrinsic overloading on unnamed types.
This fixes PR38117 and PR48340 and will also be needed for the Full Restrict Patches (D68484).
The main problem is that the intrinsic overloading name mangling is using 's_s' for unnamed types.
This can result in identical intrinsic mangled names for different function prototypes.
This patch changes this by adding a '.XXXXX' to the intrinsic mangled name when at least one of the types is based on an unnamed type, ensuring that we get a unique name.
Implementation details:
- The mapping is created on demand and kept in Module.
- It also checks for existing clashes and recycles potentially existing prototypes and declarations.
- Because of extra data in Module, Intrinsic::getName needs an extra Module* argument and, for speed, an optional FunctionType* argument.
- I still kept the original two-argument 'Intrinsic::getName' around which keeps the original behavior (providing the base name).
-- Main reason is that I did not want to change the LLVMIntrinsicGetName version, as I don't know how acceptable such a change is
-- The current situation already has a limitation. So that should not get worse with this patch.
- Intrinsic::getDeclaration and the verifier are now using the new version.
Other notes:
- As far as I see, this should not suffer from stability issues. The count is only added for prototypes depending on at least one anonymous struct
- The initial count starts from 0 for each intrinsic mangled name.
- In case of name clashes, existing prototypes are remembered and reused when that makes sense.
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D91250
Recently we improved the lowering of low overhead loops and tail
predicated loops, but concentrated first on the DLS do style loops. This
extends those improvements over to the WLS while loops, improving the
chance of lowering them successfully. To do this the lowering has to
change a little as the instructions are terminators that produce a value
- something that needs to be treated carefully.
Lowering starts at the Hardware Loop pass, inserting a new
llvm.test.start.loop.iterations that produces both an i1 to control the
loop entry and an i32 similar to the llvm.start.loop.iterations
intrinsic added for do loops. This feeds into the loop phi, properly
gluing the values together:
%wls = call { i32, i1 } @llvm.test.start.loop.iterations.i32(i32 %div)
%wls0 = extractvalue { i32, i1 } %wls, 0
%wls1 = extractvalue { i32, i1 } %wls, 1
br i1 %wls1, label %loop.ph, label %loop.exit
...
loop:
%lsr.iv = phi i32 [ %wls0, %loop.ph ], [ %iv.next, %loop ]
..
%iv.next = call i32 @llvm.loop.decrement.reg.i32(i32 %lsr.iv, i32 1)
%cmp = icmp ne i32 %iv.next, 0
br i1 %cmp, label %loop, label %loop.exit
The llvm.test.start.loop.iterations need to be lowered through ISel
lowering as a pair of WLS and WLSSETUP nodes, which each get converted
to t2WhileLoopSetup and t2WhileLoopStart Pseudos. This helps prevent
t2WhileLoopStart from being a terminator that produces a value,
something difficult to control at that stage in the pipeline. Instead
the t2WhileLoopSetup produces the value of LR (essentially acting as a
lr = subs rn, 0), t2WhileLoopStart consumes that lr value (the Bcc).
These are then converted into a single t2WhileLoopStartLR at the same
point as t2DoLoopStartTP and t2LoopEndDec. Otherwise we revert the loop
to prevent them from progressing further in the pipeline. The
t2WhileLoopStartLR is a single instruction that takes a GPR and produces
LR, similar to the WLS instruction.
%1:gprlr = t2WhileLoopStartLR %0:rgpr, %bb.3
t2B %bb.1
...
bb.2.loop:
%2:gprlr = PHI %1:gprlr, %bb.1, %3:gprlr, %bb.2
...
%3:gprlr = t2LoopEndDec %2:gprlr, %bb.2
t2B %bb.3
The t2WhileLoopStartLR can then be treated similar to the other low
overhead loop pseudos, eventually being lowered to a WLS providing the
branches are within range.
Differential Revision: https://reviews.llvm.org/D97729
This patch introduces a new intrinsic @llvm.experimental.vector.splice
that constructs a vector of the same type as the two input vectors,
based on a immediate where the sign of the immediate distinguishes two
variants. A positive immediate specifies an index into the first vector
and a negative immediate specifies the number of trailing elements to
extract from the first vector.
For example:
@llvm.experimental.vector.splice(<A,B,C,D>, <E,F,G,H>, 1) ==> <B, C, D, E> ; index
@llvm.experimental.vector.splice(<A,B,C,D>, <E,F,G,H>, -3) ==> <B, C, D, E> ; trailing element count
These intrinsics support both fixed and scalable vectors, where the
former is lowered to a shufflevector to maintain existing behaviour,
although while marked as experimental the recommended way to express
this operation for fixed-width vectors is to use shufflevector. For
scalable vectors where it is not possible to express a shufflevector
mask for this operation, a new ISD node has been implemented.
This is one of the named shufflevector intrinsics proposed on the
mailing-list in the RFC at [1].
Patch by Paul Walker and Cullen Rhodes.
[1] https://lists.llvm.org/pipermail/llvm-dev/2020-November/146864.html
Reviewed By: sdesmalen
Differential Revision: https://reviews.llvm.org/D94708
This is a minor patch that addresses concerns about lifetime in D94002.
We need to mention that what's written in LangRef isn't everything about lifetime.start/end
and its semantics depends on the stack coloring algorithm's pattern matching of a stack pointer.
If the stack coloring algorithm cannot conclude that a pointer is a stack-allocated object, the pointer is conservatively
considered as a non-stack one because stack coloring won't take this lifetime into account while assigning addresses.
A reference from alloca to lifetime.start/end is added as well.
Differential Revision: https://reviews.llvm.org/D98112
This patch adds a new metadata node, DIArgList, which contains a list of SSA
values. This node is in many ways similar in function to the existing
ValueAsMetadata node, with the difference being that it tracks a list instead of
a single value. Internally, it uses ValueAsMetadata to track the individual
values, but there is also a reasonable amount of DIArgList-specific
value-tracking logic on top of that. Similar to ValueAsMetadata, it is a special
case in parsing and printing due to the fact that it requires a function state
(as it may reference function-local values).
This patch should not result in any immediate functional change; it allows for
DIArgLists to be parsed and printed, but debug variable intrinsics do not yet
recognize them as a valid argument (outside of parsing).
Differential Revision: https://reviews.llvm.org/D88175
Rewrites test to use correct architecture triple; fixes incorrect
reference in SourceLevelDebugging doc; simplifies `spillReg` behaviour
so as to not be dependent on changes elsewhere in the patch stack.
This reverts commit d2000b45d0.
explicitly emitting retainRV or claimRV calls in the IR
This reapplies ed4718eccb, which was reverted
because it was causing a miscompile. The bug that was causing the miscompile
has been fixed in 75805dce5f.
Original commit message:
Background:
This fixes a longstanding problem where llvm breaks ARC's autorelease
optimization (see the link below) by separating calls from the marker
instructions or retainRV/claimRV calls. The backend changes are in
https://reviews.llvm.org/D92569.
https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue
What this patch does to fix the problem:
- The front-end adds operand bundle "clang.arc.attachedcall" to calls,
which indicates the call is implicitly followed by a marker
instruction and an implicit retainRV/claimRV call that consumes the
call result. In addition, it emits a call to
@llvm.objc.clang.arc.noop.use, which consumes the call result, to
prevent the middle-end passes from changing the return type of the
called function. This is currently done only when the target is arm64
and the optimization level is higher than -O0.
- ARC optimizer temporarily emits retainRV/claimRV calls after the calls
with the operand bundle in the IR and removes the inserted calls after
processing the function.
- ARC contract pass emits retainRV/claimRV calls after the call with the
operand bundle. It doesn't remove the operand bundle on the call since
the backend needs it to emit the marker instruction. The retainRV and
claimRV calls are emitted late in the pipeline to prevent optimization
passes from transforming the IR in a way that makes it harder for the
ARC middle-end passes to figure out the def-use relationship between
the call and the retainRV/claimRV calls (which is the cause of
PR31925).
- The function inliner removes an autoreleaseRV call in the callee if
nothing in the callee prevents it from being paired up with the
retainRV/claimRV call in the caller. It then inserts a release call if
claimRV is attached to the call since autoreleaseRV+claimRV is
equivalent to a release. If it cannot find an autoreleaseRV call, it
tries to transfer the operand bundle to a function call in the callee.
This is important since the ARC optimizer can remove the autoreleaseRV
returning the callee result, which makes it impossible to pair it up
with the retainRV/claimRV call in the caller. If that fails, it simply
emits a retain call in the IR if retainRV is attached to the call and
does nothing if claimRV is attached to it.
- SCCP refrains from replacing the return value of a call with a
constant value if the call has the operand bundle. This ensures the
call always has at least one user (the call to
@llvm.objc.clang.arc.noop.use).
- This patch also fixes a bug in replaceUsesOfNonProtoConstant where
multiple operand bundles of the same kind were being added to a call.
Future work:
- Use the operand bundle on x86-64.
- Fix the auto upgrader to convert call+retainRV/claimRV pairs into
calls with the operand bundles.
rdar://71443534
Differential Revision: https://reviews.llvm.org/D92808
This patch adds a new instruction that can represent variadic debug values,
DBG_VALUE_VAR. This patch alone covers the addition of the instruction and a set
of basic code changes in MachineInstr and a few adjacent areas, but does not
correctly handle variadic debug values outside of these areas, nor does it
generate them at any point.
The new instruction is similar to the existing DBG_VALUE instruction, with the
following differences: the operands are in a different order, any number of
values may be used in the instruction following the Variable and Expression
operands (these are referred to in code as “debug operands”) and are indexed
from 0 so that getDebugOperand(X) == getOperand(X+2), and the Expression in a
DBG_VALUE_VAR must use the DW_OP_LLVM_arg operator to pass arguments into the
expression.
The new DW_OP_LLVM_arg operator is only valid in expressions appearing in a
DBG_VALUE_VAR; it takes a single argument and pushes the debug operand at the
index given by the argument onto the Expression stack. For example the
sub-expression `DW_OP_LLVM_arg, 0` has the meaning “Push the debug operand at
index 0 onto the expression stack.”
Differential Revision: https://reviews.llvm.org/D82363
This patch is an update to LangRef by describing lifetime intrinsics' behavior
by following the description of MIR's LIFETIME_START/LIFETIME_END markers
at StackColoring.cpp (eb44682d67/llvm/lib/CodeGen/StackColoring.cpp (L163)) and the discussion in llvm-dev.
In order to explicitly define the meaning of an object lifetime, I added 'Object Lifetime' subsection.
Reviewed By: nlopes
Differential Revision: https://reviews.llvm.org/D94002
This caused miscompiles of Chromium tests for iOS due clobbering of live
registers. See discussion on the code review for details.
> Background:
>
> This fixes a longstanding problem where llvm breaks ARC's autorelease
> optimization (see the link below) by separating calls from the marker
> instructions or retainRV/claimRV calls. The backend changes are in
> https://reviews.llvm.org/D92569.
>
> https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue
>
> What this patch does to fix the problem:
>
> - The front-end adds operand bundle "clang.arc.attachedcall" to calls,
> which indicates the call is implicitly followed by a marker
> instruction and an implicit retainRV/claimRV call that consumes the
> call result. In addition, it emits a call to
> @llvm.objc.clang.arc.noop.use, which consumes the call result, to
> prevent the middle-end passes from changing the return type of the
> called function. This is currently done only when the target is arm64
> and the optimization level is higher than -O0.
>
> - ARC optimizer temporarily emits retainRV/claimRV calls after the calls
> with the operand bundle in the IR and removes the inserted calls after
> processing the function.
>
> - ARC contract pass emits retainRV/claimRV calls after the call with the
> operand bundle. It doesn't remove the operand bundle on the call since
> the backend needs it to emit the marker instruction. The retainRV and
> claimRV calls are emitted late in the pipeline to prevent optimization
> passes from transforming the IR in a way that makes it harder for the
> ARC middle-end passes to figure out the def-use relationship between
> the call and the retainRV/claimRV calls (which is the cause of
> PR31925).
>
> - The function inliner removes an autoreleaseRV call in the callee if
> nothing in the callee prevents it from being paired up with the
> retainRV/claimRV call in the caller. It then inserts a release call if
> claimRV is attached to the call since autoreleaseRV+claimRV is
> equivalent to a release. If it cannot find an autoreleaseRV call, it
> tries to transfer the operand bundle to a function call in the callee.
> This is important since the ARC optimizer can remove the autoreleaseRV
> returning the callee result, which makes it impossible to pair it up
> with the retainRV/claimRV call in the caller. If that fails, it simply
> emits a retain call in the IR if retainRV is attached to the call and
> does nothing if claimRV is attached to it.
>
> - SCCP refrains from replacing the return value of a call with a
> constant value if the call has the operand bundle. This ensures the
> call always has at least one user (the call to
> @llvm.objc.clang.arc.noop.use).
>
> - This patch also fixes a bug in replaceUsesOfNonProtoConstant where
> multiple operand bundles of the same kind were being added to a call.
>
> Future work:
>
> - Use the operand bundle on x86-64.
>
> - Fix the auto upgrader to convert call+retainRV/claimRV pairs into
> calls with the operand bundles.
>
> rdar://71443534
>
> Differential Revision: https://reviews.llvm.org/D92808
This reverts commit ed4718eccb.
This patch adds a new intrinsic experimental.vector.reduce that takes a single
vector and returns a vector of matching type but with the original lane order
reversed. For example:
```
vector.reverse(<A,B,C,D>) ==> <D,C,B,A>
```
The new intrinsic supports fixed and scalable vectors types.
The fixed-width vector relies on shufflevector to maintain existing behaviour.
Scalable vector uses the new ISD node - VECTOR_REVERSE.
This new intrinsic is one of the named shufflevector intrinsics proposed on the
mailing-list in the RFC at [1].
Patch by Paul Walker (@paulwalker-arm).
[1] https://lists.llvm.org/pipermail/llvm-dev/2020-November/146864.html
Differential Revision: https://reviews.llvm.org/D94883
In the past, it was stated in D87994 that it is allowed to dereference a pointer that is partially undefined
if all of its possible representations fit into a dereferenceable range.
The motivation of the direction was to make a range analysis helpful for assuring dereferenceability.
Even if a range analysis concludes that its offset is within bounds, the offset could still be partially undefined; to utilize the range analysis, this relaxation was necessary.
https://groups.google.com/g/llvm-dev/c/2Qk4fOHUoAE/m/KcvYMEgOAgAJ has more context about this.
However, this is currently blocking another optimization, which is annotating the noundef attribute for library functions' arguments. D95122 is the patch.
Currently, there are quite a few library functions which cannot have noundef attached to its pointer argument because it can be transformed from load/store.
For example, MemCpyOpt can convert stores into memset:
```
store p, i32 0
store (p+1), i32 0 // Since currently it is allowed for store to have partially undefined pointer..
->
memset(p, 0, 8) // memset cannot guarantee that its ptr argument is noundef.
```
A bigger problem is that this makes unclear which library functions are allowed to have 'noundef' and which functions aren't (e.g., strlen).
This makes annotating noundef almost impossible for this kind of functions.
This patch proposes that all memory operations should have well-defined pointers.
For memset/memcpy, it is semantically equivalent to running a loop until the size is met (and branching on undef is UB), so the size is also updated to be well-defined.
Strictly speaking, this again violates the implication of dereferenceability from range analysis result.
However, I think this is okay for the following reasons:
1. It seems the existing analyses in the LLVM main repo does not have conflicting implementation with the new proposal.
`isDereferenceableAndAlignedPointer` works only when the GEP offset is constant, and `isDereferenceableAndAlignedInLoop` is also fine.
2. A possible miscompilation happens only when the source has a pointer with a *partially* undefined offset (it's okay with poison because there is no 'partially poison' value).
But, at least I'm not aware of a language using LLVM as backend that has a well-defined program while allowing partially undefined pointers.
There might be such a language that I'm not aware of, but improving the performance of the mainstream languages like C and Rust is more important IMHO.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D95238
explicitly emitting retainRV or claimRV calls in the IR
Background:
This fixes a longstanding problem where llvm breaks ARC's autorelease
optimization (see the link below) by separating calls from the marker
instructions or retainRV/claimRV calls. The backend changes are in
https://reviews.llvm.org/D92569.
https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue
What this patch does to fix the problem:
- The front-end adds operand bundle "clang.arc.attachedcall" to calls,
which indicates the call is implicitly followed by a marker
instruction and an implicit retainRV/claimRV call that consumes the
call result. In addition, it emits a call to
@llvm.objc.clang.arc.noop.use, which consumes the call result, to
prevent the middle-end passes from changing the return type of the
called function. This is currently done only when the target is arm64
and the optimization level is higher than -O0.
- ARC optimizer temporarily emits retainRV/claimRV calls after the calls
with the operand bundle in the IR and removes the inserted calls after
processing the function.
- ARC contract pass emits retainRV/claimRV calls after the call with the
operand bundle. It doesn't remove the operand bundle on the call since
the backend needs it to emit the marker instruction. The retainRV and
claimRV calls are emitted late in the pipeline to prevent optimization
passes from transforming the IR in a way that makes it harder for the
ARC middle-end passes to figure out the def-use relationship between
the call and the retainRV/claimRV calls (which is the cause of
PR31925).
- The function inliner removes an autoreleaseRV call in the callee if
nothing in the callee prevents it from being paired up with the
retainRV/claimRV call in the caller. It then inserts a release call if
claimRV is attached to the call since autoreleaseRV+claimRV is
equivalent to a release. If it cannot find an autoreleaseRV call, it
tries to transfer the operand bundle to a function call in the callee.
This is important since the ARC optimizer can remove the autoreleaseRV
returning the callee result, which makes it impossible to pair it up
with the retainRV/claimRV call in the caller. If that fails, it simply
emits a retain call in the IR if retainRV is attached to the call and
does nothing if claimRV is attached to it.
- SCCP refrains from replacing the return value of a call with a
constant value if the call has the operand bundle. This ensures the
call always has at least one user (the call to
@llvm.objc.clang.arc.noop.use).
- This patch also fixes a bug in replaceUsesOfNonProtoConstant where
multiple operand bundles of the same kind were being added to a call.
Future work:
- Use the operand bundle on x86-64.
- Fix the auto upgrader to convert call+retainRV/claimRV pairs into
calls with the operand bundles.
rdar://71443534
Differential Revision: https://reviews.llvm.org/D92808
This is a follow up patch to D83136 adding the align attribute to `cmpxchg`.
See also D83465 for `atomicrmw`.
Differential Revision: https://reviews.llvm.org/D87443
emitting retainRV or claimRV calls in the IR
This reapplies 3fe3946d9a without the
changes made to lib/IR/AutoUpgrade.cpp, which was violating layering.
Original commit message:
Background:
This patch makes changes to the front-end and middle-end that are
needed to fix a longstanding problem where llvm breaks ARC's autorelease
optimization (see the link below) by separating calls from the marker
instructions or retainRV/claimRV calls. The backend changes are in
https://reviews.llvm.org/D92569.
https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue
What this patch does to fix the problem:
- The front-end adds operand bundle "clang.arc.rv" to calls, which
indicates the call is implicitly followed by a marker instruction and
an implicit retainRV/claimRV call that consumes the call result. In
addition, it emits a call to @llvm.objc.clang.arc.noop.use, which
consumes the call result, to prevent the middle-end passes from changing
the return type of the called function. This is currently done only when
the target is arm64 and the optimization level is higher than -O0.
- ARC optimizer temporarily emits retainRV/claimRV calls after the calls
with the operand bundle in the IR and removes the inserted calls after
processing the function.
- ARC contract pass emits retainRV/claimRV calls after the call with the
operand bundle. It doesn't remove the operand bundle on the call since
the backend needs it to emit the marker instruction. The retainRV and
claimRV calls are emitted late in the pipeline to prevent optimization
passes from transforming the IR in a way that makes it harder for the
ARC middle-end passes to figure out the def-use relationship between
the call and the retainRV/claimRV calls (which is the cause of
PR31925).
- The function inliner removes an autoreleaseRV call in the callee if
nothing in the callee prevents it from being paired up with the
retainRV/claimRV call in the caller. It then inserts a release call if
the call is annotated with claimRV since autoreleaseRV+claimRV is
equivalent to a release. If it cannot find an autoreleaseRV call, it
tries to transfer the operand bundle to a function call in the callee.
This is important since ARC optimizer can remove the autoreleaseRV
returning the callee result, which makes it impossible to pair it up
with the retainRV/claimRV call in the caller. If that fails, it simply
emits a retain call in the IR if the implicit call is a call to
retainRV and does nothing if it's a call to claimRV.
Future work:
- Use the operand bundle on x86-64.
- Fix the auto upgrader to convert call+retainRV/claimRV pairs into
calls annotated with the operand bundles.
rdar://71443534
Differential Revision: https://reviews.llvm.org/D92808
emitting retainRV or claimRV calls in the IR
Background:
This patch makes changes to the front-end and middle-end that are
needed to fix a longstanding problem where llvm breaks ARC's autorelease
optimization (see the link below) by separating calls from the marker
instructions or retainRV/claimRV calls. The backend changes are in
https://reviews.llvm.org/D92569.
https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue
What this patch does to fix the problem:
- The front-end adds operand bundle "clang.arc.rv" to calls, which
indicates the call is implicitly followed by a marker instruction and
an implicit retainRV/claimRV call that consumes the call result. In
addition, it emits a call to @llvm.objc.clang.arc.noop.use, which
consumes the call result, to prevent the middle-end passes from changing
the return type of the called function. This is currently done only when
the target is arm64 and the optimization level is higher than -O0.
- ARC optimizer temporarily emits retainRV/claimRV calls after the calls
with the operand bundle in the IR and removes the inserted calls after
processing the function.
- ARC contract pass emits retainRV/claimRV calls after the call with the
operand bundle. It doesn't remove the operand bundle on the call since
the backend needs it to emit the marker instruction. The retainRV and
claimRV calls are emitted late in the pipeline to prevent optimization
passes from transforming the IR in a way that makes it harder for the
ARC middle-end passes to figure out the def-use relationship between
the call and the retainRV/claimRV calls (which is the cause of
PR31925).
- The function inliner removes an autoreleaseRV call in the callee if
nothing in the callee prevents it from being paired up with the
retainRV/claimRV call in the caller. It then inserts a release call if
the call is annotated with claimRV since autoreleaseRV+claimRV is
equivalent to a release. If it cannot find an autoreleaseRV call, it
tries to transfer the operand bundle to a function call in the callee.
This is important since ARC optimizer can remove the autoreleaseRV
returning the callee result, which makes it impossible to pair it up
with the retainRV/claimRV call in the caller. If that fails, it simply
emits a retain call in the IR if the implicit call is a call to
retainRV and does nothing if it's a call to claimRV.
Future work:
- Use the operand bundle on x86-64.
- Fix the auto upgrader to convert call+retainRV/claimRV pairs into
calls annotated with the operand bundles.
rdar://71443534
Differential Revision: https://reviews.llvm.org/D92808
Based on the comments in the code, the idea is that AsmPrinter is
unable to produce entry value blocks of arbitrary length, such as
DW_OP_entry_value [DW_OP_reg5 DW_OP_lit1 DW_OP_plus]. But the way the
Verifier check is written it also disallows DW_OP_entry_value
[DW_OP_reg5] DW_OP_lit1 DW_OP_plus which seems to overshoot the
target.
Note that this patch does not change any of the safety guards in
LiveDebugValues — there is zero behavior change for clang. It just
allows us to legalize more complex expressions in future patches.
rdar://73907559
Differential Revision: https://reviews.llvm.org/D95990
To set non-default rounding mode user usually calls function 'fesetround'
from standard C library. This way has some disadvantages.
* It creates unnecessary dependency on libc. On the other hand, setting
rounding mode requires few instructions and could be made by compiler.
Sometimes standard C library even is not available, like in the case of
GPU or AI cores that execute small kernels.
* Compiler could generate more effective code if it knows that a particular
call just sets rounding mode.
This change introduces new IR intrinsic, namely 'llvm.set.rounding', which
sets current rounding mode, similar to 'fesetround'. It however differs
from the latter, because it is a lower level facility:
* 'llvm.set.rounding' does not return any value, whereas 'fesetround'
returns non-zero value in the case of failure. In glibc 'fesetround'
reports failure if its argument is invalid or unsupported or if floating
point operations are unavailable on the hardware. Compiler usually knows
what core it generates code for and it can validate arguments in many
cases.
* Rounding mode is specified in 'fesetround' using constants like
'FE_TONEAREST', which are target dependent. It is inconvenient to work
with such constants at IR level.
C standard provides a target-independent way to specify rounding mode, it
is used in FLT_ROUNDS, however it does not define standard way to set
rounding mode using this encoding.
This change implements only IR intrinsic. Lowering it to machine code is
target-specific and will be implemented latter. Mapping of 'fesetround'
to 'llvm.set.rounding' is also not implemented here.
Differential Revision: https://reviews.llvm.org/D74729
Currently LLVM is relying on ValueTracking's `isKnownNonZero` to attach `nonnull`, which can return true when the value is poison.
To make the semantics of `nonnull` consistent with the behavior of `isKnownNonZero`, this makes the semantics of `nonnull` to accept poison, and return poison if the input pointer isn't null.
This makes many transformations like below legal:
```
%p = gep inbounds %x, 1 ; % p is non-null pointer or poison
call void @f(%p) ; instcombine converts this to call void @f(nonnull %p)
```
Instead, this semantics makes propagation of `nonnull` to caller illegal.
The reason is that, passing poison to `nonnull` does not immediately raise UB anymore, so such program is still well defined, if the callee does not use the argument.
Having `noundef` attribute there re-allows this.
```
define void @f(i8* %p) { ; functionattr cannot mark %p nonnull here anymore
call void @g(i8* nonnull %p) ; .. because @g never raises UB if it never uses %p.
ret void
}
```
Another attribute that needs to be updated is `align`. This patch updates the semantics of align to accept poison as well.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D90529
RISC-V would like to use a struct of scalable vectors to return multiple
values from intrinsics. This woud also be needed for target independent
intrinsics like llvm.sadd.overflow.
This patch removes the existing restriction for this. I've modified
StructType::isSized to consider a struct containing scalable vectors
as unsized so the verifier won't allow loads/stores/allocas of these
structs.
Reviewed By: sdesmalen
Differential Revision: https://reviews.llvm.org/D94142
The ``llvm.experimental.noalias.scope.decl`` intrinsic identifies where a noalias
scope is declared. When the intrinsic is duplicated, a decision must
also be made about the scope: depending on the reason of the duplication,
the scope might need to be duplicated as well.
Reviewed By: nikic, jdoerfert
Differential Revision: https://reviews.llvm.org/D93039
New dwarf operator DW_OP_LLVM_implicit_pointer is introduced (present only in LLVM IR)
This operator is required as it is different than DWARF operator
DW_OP_implicit_pointer in representation and specification (number
and types of operands) and later can not be used as multiple level.
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D84113
This is a small patch stating that a nocapture pointer cannot be returned.
Discussed in D93189.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D94386
This patch adds support for the fptoui.sat and fptosi.sat intrinsics,
which provide basically the same functionality as the existing fptoui
and fptosi instructions, but will saturate (or return 0 for NaN) on
values unrepresentable in the target type, instead of returning
poison. Related mailing list discussion can be found at:
https://groups.google.com/d/msg/llvm-dev/cgDFaBmCnDQ/CZAIMj4IBAAJ
The intrinsics have overloaded source and result type and support
vector operands:
i32 @llvm.fptoui.sat.i32.f32(float %f)
i100 @llvm.fptoui.sat.i100.f64(double %f)
<4 x i32> @llvm.fptoui.sat.v4i32.v4f16(half %f)
// etc
On the SelectionDAG layer two new ISD opcodes are added,
FP_TO_UINT_SAT and FP_TO_SINT_SAT. These opcodes have two operands
and one result. The second operand is an integer constant specifying
the scalar saturation width. The idea here is that initially the
second operand and the scalar width of the result type are the same,
but they may change during type legalization. For example:
i19 @llvm.fptsi.sat.i19.f32(float %f)
// builds
i19 fp_to_sint_sat f, 19
// type legalizes (through integer result promotion)
i32 fp_to_sint_sat f, 19
I went for this approach, because saturated conversion does not
compose well. There is no good way of "adjusting" a saturating
conversion to i32 into one to i19 short of saturating twice.
Specifying the saturation width separately allows directly saturating
to the correct width.
There are two baseline expansions for the fp_to_xint_sat opcodes. If
the integer bounds can be exactly represented in the float type and
fminnum/fmaxnum are legal, we can expand to something like:
f = fmaxnum f, FP(MIN)
f = fminnum f, FP(MAX)
i = fptoxi f
i = select f uo f, 0, i # unnecessary if unsigned as 0 = MIN
If the bounds cannot be exactly represented, we expand to something
like this instead:
i = fptoxi f
i = select f ult FP(MIN), MIN, i
i = select f ogt FP(MAX), MAX, i
i = select f uo f, 0, i # unnecessary if unsigned as 0 = MIN
It should be noted that this expansion assumes a non-trapping fptoxi.
Initial tests are for AArch64, x86_64 and ARM. This exercises all of
the scalar and vector legalization. ARM is included to test float
softening.
Original patch by @nikic and @ebevhan (based on D54696).
Differential Revision: https://reviews.llvm.org/D54749
Clang FE currently has hot/cold function attribute. But we only have
cold function attribute in LLVM IR.
This patch adds support of hot function attribute to LLVM IR. This
attribute will be used in setting function section prefix/suffix.
Currently .hot and .unlikely suffix only are added in PGO (Sample PGO)
compilation (through isFunctionHotInCallGraph and
isFunctionColdInCallGraph).
This patch changes the behavior. The new behavior is:
(1) If the user annotates a function as hot or isFunctionHotInCallGraph
is true, this function will be marked as hot. Otherwise,
(2) If the user annotates a function as cold or
isFunctionColdInCallGraph is true, this function will be marked as
cold.
The changes are:
(1) user annotated function attribute will used in setting function
section prefix/suffix.
(2) hot attribute overwrites profile count based hotness.
(3) profile count based hotness overwrite user annotated cold attribute.
The intention for these changes is to provide the user a way to mark
certain function as hot in cases where training input is hard to cover
all the hot functions.
Differential Revision: https://reviews.llvm.org/D92493
This commit adds two new intrinsics.
- llvm.experimental.vector.insert: used to insert a vector into another
vector starting at a given index.
- llvm.experimental.vector.extract: used to extract a subvector from a
larger vector starting from a given index.
The codegen work for these intrinsics has already been completed; this
commit is simply exposing the existing ISD nodes to LLVM IR.
Reviewed By: cameron.mcinally
Differential Revision: https://reviews.llvm.org/D91362
Sometimes people get minimal crash reports after a UBSAN incident. This change
tags each trap with an integer representing the kind of failure encountered,
which can aid in tracking down the root cause of the problem.
In this patch I have added support for a new loop hint called
vectorize.scalable.enable that says whether we should enable scalable
vectorization or not. If a user wants to instruct the compiler to
vectorize a loop with scalable vectors they can now do this as
follows:
br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !2
...
!2 = !{!2, !3, !4}
!3 = !{!"llvm.loop.vectorize.width", i32 8}
!4 = !{!"llvm.loop.vectorize.scalable.enable", i1 true}
Setting the hint to false simply reverts the behaviour back to the
default, using fixed width vectors.
Differential Revision: https://reviews.llvm.org/D88962
This is similar to the existing alloca and program address spaces (D37052)
and should be used when creating/accessing global variables.
We need this in our CHERI fork of LLVM to place all globals in address space 200.
This ensures that values are accessed using CHERI load/store instructions
instead of the normal MIPS/RISC-V ones.
The problem this is trying to fix is that most of the time the type of
globals is created using a simple PointerType::getUnqual() (or ::get() with
the default address-space value of 0). This does not work for us and we get
assertion/compilation/instruction selection failures whenever a new call
is added that uses the default value of zero.
In our fork we have removed the default parameter value of zero for most
address space arguments and use DL.getProgramAddressSpace() or
DL.getGlobalsAddressSpace() whenever possible. If this change is accepted,
I will upstream follow-up patches to use DL.getGlobalsAddressSpace() instead
of relying on the default value of 0 for PointerType::get(), etc.
This patch and the follow-up changes will not have any functional changes
for existing backends with the default globals address space of zero.
A follow-up commit will change the default globals address space for
AMDGPU to 1.
Reviewed By: dylanmckay
Differential Revision: https://reviews.llvm.org/D70947
The `dso_local_equivalent` constant is a wrapper for functions that represents a
value which is functionally equivalent to the global passed to this. That is, if
this accepts a function, calling this constant should have the same effects as
calling the function directly. This could be a direct reference to the function,
the `@plt` modifier on X86/AArch64, a thunk, or anything that's equivalent to the
resolved function as a call target.
When lowered, the returned address must have a constant offset at link time from
some other symbol defined within the same binary. The address of this value is
also insignificant. The name is leveraged from `dso_local` where use of a function
or variable is resolved to a symbol in the same linkage unit.
In this patch:
- Addition of `dso_local_equivalent` and handling it
- Update Constant::needsRelocation() to strip constant inbound GEPs and take
advantage of `dso_local_equivalent` for relative references
This is useful for the [Relative VTables C++ ABI](https://reviews.llvm.org/D72959)
which makes vtables readonly. This works by replacing the dynamic relocations for
function pointers in them with static relocations that represent the offset between
the vtable and virtual functions. If a function is externally defined,
`dso_local_equivalent` can be used as a generic wrapper for the function to still
allow for this static offset calculation to be done.
See [RFC](http://lists.llvm.org/pipermail/llvm-dev/2020-August/144469.html) for more details.
Differential Revision: https://reviews.llvm.org/D77248
Clarify the semantics of GEP inbounds, in particular with regard
to what it means for wrapping. This cleans up some confusion on
when it is legal to apply nuw/nsw flags to various parts of the
GEP calculation.
Differential Revision: https://reviews.llvm.org/D90708
This patch adds a new !annotation metadata kind which can be used to
attach annotation strings to instructions.
It also adds a new pass that emits summary remarks per function with the
counts for each annotation kind.
The intended uses cases for this new metadata is annotating
'interesting' instructions and the remarks should provide additional
insight into transformations applied to a program.
To motivate this, consider these specific questions we would like to get answered:
* How many stores added for automatic variable initialization remain after optimizations? Where are they?
* How many runtime checks inserted by a frontend could be eliminated? Where are the ones that did not get eliminated?
Discussed on llvm-dev as part of 'RFC: Combining Annotation Metadata and Remarks'
(http://lists.llvm.org/pipermail/llvm-dev/2020-November/146393.html)
Reviewed By: thegameg, jdoerfert
Differential Revision: https://reviews.llvm.org/D91188
This changes the definition of t2DoLoopStart from
t2DoLoopStart rGPR
to
GPRlr = t2DoLoopStart rGPR
This will hopefully mean that low overhead loops are more tied together,
and we can more reliably generate loops without reverting or being at
the whims of the register allocator.
This is a fairly simple change in itself, but leads to a number of other
required alterations.
- The hardware loop pass, if UsePhi is set, now generates loops of the
form:
%start = llvm.start.loop.iterations(%N)
loop:
%p = phi [%start], [%dec]
%dec = llvm.loop.decrement.reg(%p, 1)
%c = icmp ne %dec, 0
br %c, loop, exit
- For this a new llvm.start.loop.iterations intrinsic was added, identical
to llvm.set.loop.iterations but produces a value as seen above, gluing
the loop together more through def-use chains.
- This new instrinsic conceptually produces the same output as input,
which is taught to SCEV so that the checks in MVETailPredication are not
affected.
- Some minor changes are needed to the ARMLowOverheadLoop pass, but it has
been left mostly as before. We should now more reliably be able to tell
that the t2DoLoopStart is correct without having to prove it, but
t2WhileLoopStart and tail-predicated loops will remain the same.
- And all the tests have been updated. There are a lot of them!
This patch on it's own might cause more trouble that it helps, with more
tail-predicated loops being reverted, but some additional patches can
hopefully improve upon that to get to something that is better overall.
Differential Revision: https://reviews.llvm.org/D89881
This patch adds the llvm.loop.mustprogress loop metadata. This is to be
added to loops where the frontend language requires that the loop makes
observable interactions with the environment. This is the loop-level
equivalent to the function attribute `mustprogress` defined in D86233.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D88464
The neutral value is -0.0, not 0.0. This doesn't matter for "fast"
reductions due to nsz, but does matter for reassoc-only and seq
reductions.
Change tests to mostly use -0.0 where the neutral value was intended,
and add some additional test coverage in some places. Also update
LangRef to use the right value.
If `null_pointer_is_valid` is present, `dereferenceable` does not imply
`nonnull`, make it clear.
Came up in D17993.
Reviewed By: aqjune
Differential Revision: https://reviews.llvm.org/D89417
This change introduces a GC parseable lowering for element atomic
memcpy/memmove intrinsics. This way runtime can provide an
implementation which can take a safepoint during copy operation.
See "GC-parseable element atomic memcpy/memmove" thread on llvm-dev
for the background and details:
https://groups.google.com/g/llvm-dev/c/NnENHzmX-b8/m/3PyN8Y2pCAAJ
Differential Revision: https://reviews.llvm.org/D88861
It's currently ambiguous in IR whether the source language explicitly
did not want a stack a stack protector (in C, via function attribute
no_stack_protector) or doesn't care for any given function.
It's common for code that manipulates the stack via inline assembly or
that has to set up its own stack canary (such as the Linux kernel) would
like to avoid stack protectors in certain functions. In this case, we've
been bitten by numerous bugs where a callee with a stack protector is
inlined into an __attribute__((__no_stack_protector__)) caller, which
generally breaks the caller's assumptions about not having a stack
protector. LTO exacerbates the issue.
While developers can avoid this by putting all no_stack_protector
functions in one translation unit together and compiling those with
-fno-stack-protector, it's generally not very ergonomic or as
ergonomic as a function attribute, and still doesn't work for LTO. See also:
https://lore.kernel.org/linux-pm/20200915172658.1432732-1-rkir@google.com/https://lore.kernel.org/lkml/20200918201436.2932360-30-samitolvanen@google.com/T/#u
Typically, when inlining a callee into a caller, the caller will be
upgraded in its level of stack protection (see adjustCallerSSPLevel()).
By adding an explicit attribute in the IR when the function attribute is
used in the source language, we can now identify such cases and prevent
inlining. Block inlining when the callee and caller differ in the case that one
contains `nossp` when the other has `ssp`, `sspstrong`, or `sspreq`.
Fixes pr/47479.
Reviewed By: void
Differential Revision: https://reviews.llvm.org/D87956
LLVM IR currently assumes some form of forward progress. This form is
not explicitly defined anywhere, and is the cause of miscompilations
in most languages that are not C++11 or later. This implicit forward progress
guarantee can not be opted out of on a function level nor on a loop
level. Languages such as C (C11 and later), C++ (pre-C++11), and Rust
have different forward progress requirements and this needs to be
evident in the IR.
Specifically, C11 and onwards (6.8.5, Paragraph 6) states that "An
iteration statement whose controlling expression is not a constant
expression, that performs no input/output operations, does not access
volatile objects, and performs no synchronization or atomic operations
in its body, controlling expression, or (in the case of for statement)
its expression-3, may be assumed by the implementation to terminate."
C++11 and onwards does not have this assumption, and instead assumes
that every thread must make progress as defined in [intro.progress] when
it comes to scheduling.
This was initially brought up in [0] as a bug, a solution was presented
in [1] which is the current workaround, and the predecessor to this
change was [2].
After defining a notion of forward progress for IR, there are two
options to address this:
1) Set the default to assuming Forward Progress and provide an opt-out for functions and an opt-in for loops.
2) Set the default to not assuming Forward Progress and provide an opt-in for functions, and an opt-in for loops.
Option 2) has been selected because only C++11 and onwards have a
forward progress requirement and it makes sense for them to opt-into it
via the defined `mustprogress` function attribute. The `mustprogress`
function attribute indicates that the function is required to make
forward progress as defined. This is sharply in contrast to the status
quo where this is implicitly assumed. In addition, `willreturn` implies `mustprogress`.
The background for why this definition was chosen is in [3] and for why
the option was chosen is in [4] and the corresponding thread(s). The implementation is in D85393, the
clang patch is in D86841, the LoopDeletion patch is in D86844, the
Inliner patches are in D87180 and D87262, and there will be more
incoming.
[0] https://bugs.llvm.org/show_bug.cgi?id=965#c25
[1] https://lists.llvm.org/pipermail/llvm-dev/2017-October/118558.html
[2] https://reviews.llvm.org/D65718
[3] https://lists.llvm.org/pipermail/llvm-dev/2020-September/144919.html
[4] https://lists.llvm.org/pipermail/llvm-dev/2020-September/145023.html
Reviewed By: jdoerfert, efriedma, nikic
Differential Revision: https://reviews.llvm.org/D86233
The langref description for llvm.test.set.loop.iterations.* were
missing the i1 return type.
Differential Revision: https://reviews.llvm.org/D89564
Patch by: Janek van Oirschot
This patch adds metadata !noundef and makes load instructions can optionally have it.
A load with !noundef always return a well-defined value (has no undef bit or isn't poison).
If the loaded value isn't well defined, the behavior is undefined.
This metadata can be used to encode the assumption from C/C++ that certain reads of variables should have well-defined values.
It is helpful for optimizing freeze instructions away, because freeze can be removed when its operand has well-defined value, and showing that a load from arbitrary location is well-defined is usually hard otherwise.
The same information can be encoded with llvm.assume with operand bundle; using metadata is chosen because I wasn't sure whether code motion can be freely done when llvm.assume is inserted from clang instead.
The existing codebase already is stripping unknown metadata when doing code motion, so using metadata is UB-safe as well.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D89050
LLVM rejects DWARF operator DW_OP_over. This DWARF operator is needed
for Flang to support assumed rank array.
Summary:
Currently LLVM rejects DWARF operator DW_OP_over. Below error is
produced when llvm finds this operator.
[..]
invalid expression
!DIExpression(151, 20, 16, 48, 30, 35, 80, 34, 6)
warning: ignoring invalid debug info in over.ll
[..]
There were some parts missing in support of this operator, which are
now completed.
Testing
-added a unit testcase
-check-debuginfo
-check-llvm
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D89208
Add some minimal documentation for DILabel, originally introduced in
D45024. Update the name and semantics of the `variables:` field in the
documentation for `DISubprogram`; the field is now called
`retainedNodes:` and is a heterogeneous list of `DILocalVariable` and
`DILabel`.
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D89082
This patch adds support for DWARF attribute DW_AT_rank.
Summary:
Fortran assumed rank arrays have dynamic rank. DWARF attribute
DW_AT_rank is needed to support that.
Testing:
unit test cases added (hand-written)
check llvm
check debug-info
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D89141
Motivated by D88183, this seeks to clarify the current loop nomenclature with added illustrations, examples for possibly unexpected situations (infinite loops not part of the "parent" loop, logical loops sharing the same header, ...), and clarification on what other sources may consider a loop. The current document also has multiple errors that are fixed here.
Some selected errors:
* Loops a defined as strongly-connected components. A component a partition of all nodes, i.e. a subloop can never be a component. That is, the document as it currently is only covers top-level loops, even it also uses the term SCC for subloops.
* "a block can be the header of two separate loops at the same time" (it is considered a single loop by LoopInfo)
* "execute before some interesting event happens" (some interesting event is not well-defined)
Reviewed By: baziotis, Whitney
Differential Revision: https://reviews.llvm.org/D88408
This reverts commit 55c4ff91bd.
Issues were introduced as discussed in https://reviews.llvm.org/D88241
where this change made previous bugs in the linker and BitCodeWriter
visible.
This is a patch to LangRef that clarifies the behavior of load/store/memset/memcpy/memmove when the pointers or sizes are not well-defined
as well.
MSan detects a case when e.g., only lower bits of address are garbage when `-msan-check-access-address` is enabled, and it does not directly conflict with this patch because a C program should not use a pointer with undef bits and reasonable optimizations do not convert a well-defined pointer into a pointer with undef bits.
This patch contains a definition of a well-defined value as well.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D87994
Make the corresponding change that was made for byval in
b7141207a4. Like byval, this requires a
bulk update of the test IR tests to include the type before this can
be mandatory.
As discussed on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2020-April/140729.html
This is hopefully the final remaining showstopper before we can remove
the 'experimental' from the reduction intrinsics.
No behavior was specified for the FP min/max reductions, so we have a
mess of different interpretations.
There are a few potential options for the semantics of these max/min ops.
I think this is the simplest based on current behavior/implementation:
make the reductions inherit from the existing llvm.maxnum/minnum intrinsics.
These correspond to libm fmax/fmin, and those are similar to the (now
deprecated?) IEEE-754 maxNum/minNum functions (NaNs are treated as missing
data). So the default expansion creates calls to libm functions.
Another option would be to inherit from llvm.maximum/minimum (NaNs propagate),
but most targets just crash in codegen when given those nodes because no
default expansion was ever implemented AFAICT.
We could also just assume 'nnan' semantics by default (we are already
assuming 'nsz' semantics in the maxnum/minnum intrinsics), but some targets
(AArch64, PowerPC) support the more defined behavior, so it doesn't make much
sense to not allow a tighter spec. Fast-math-flags (nnan) can be used to
loosen the semantics.
(Note that D67507 was proposed to update the LangRef to acknowledge the more
recent IEEE-754 2019 standard, but that patch seems to have stalled. If we do
update based on the new standard, the reduction instructions can seamlessly
inherit from whatever updates are made to the max/min intrinsics.)
x86 sees a regression here on 'nnan' tests because we have underlying,
longstanding bugs in FMF creation/propagation. Those need to be fixed apart
from this change (for example: https://llvm.org/PR35538). The expansion
sequence before this patch may not have been correct.
Differential Revision: https://reviews.llvm.org/D87391
This adjusts the description of `llvm.memcpy` to also allow operands
to be equal. This is in line with what Clang currently expects.
This change is intended to be temporary and followed by re-introduce
a variant with the non-overlapping guarantee for cases where we can
actually ensure that property in the front-end.
See the links below for more details:
http://lists.llvm.org/pipermail/cfe-dev/2020-August/066614.html
and PR11763.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D86815
The wording before this patch applies to llvm.mem.parallel_loop_access, not access groups.
Reviewed By: mppf, hfinkel
Differential Revision: https://reviews.llvm.org/D83781
This patch makes LangRef be explicit about the value of padding when storing an aggregate.
It states that when an aggregate is stored into memory, padding is filled with undef.
Here is a clue that supports this change (edited to reflect the discussion from llvm-dev):
- IPSCCP ignores padding and directly stores a constant aggregate if possible. It loses the data stored in the padding. https://godbolt.org/z/xzenYs Memcpyopt ignores (the preexisting value of) padding when copying an aggregate or storing a constant: https://godbolt.org/z/hY6ndd / https://godbolt.org/z/3WMP5a
The two items below are not relevant with this patch because Clang lowers load/store of individual field of struct into load/stores of the corresponding pointer with a primitive type. Also, when copy is needed, it uses memcpy instead of load/store of an aggregate, as discussed in the llvm-dev. However, this patch is still valid (as discussed) because it is needed to explain the two optimizations above.
- According to C17, the value of padding bytes when storing values in structures or unions is unspecified.
- I updated Alive2 and it did not find any problematic transformation from LLVM unit tests and while running translation validation of a few C programs.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D86189
We had already specified that second argument `n` of this intrinsic is `n > 0`,
but now add to this that the result is a poison value if this is not the case.
Differential Revision: https://reviews.llvm.org/D86637
According to the current LangRef, Memset/memcpy/memmove can take a
null/dangling pointer if the size is zero.
(Relevant thread: http://lists.llvm.org/pipermail/llvm-dev/2017-July/115665.html )
This patch expands it and allows the functions to take undef/poison pointers
too.
This required the updates in the align attribute since it isn't specified
what is the alignment of undef/poison pointers.
This patch states that their alignment is 1.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D86643
A first version of get.active.lane.mask was committed in rG7fb8a40e5220. One of
the main purposes and uses of this intrinsic is to communicate information from
the middle-end to the back-end, but its current definition and semantics make
this actually very difficult. The intrinsic was defined as:
@llvm.get.active.lane.mask(%IV, %BTC)
where %BTC is the Backedge-Taken Count (variable names are different in the
LangRef spec). This allows to implicitly communicate the loop tripcount, which
can be reconstructed by calculating BTC + 1. But it has been very difficult to
prove that calculating BTC + 1 is safe and doesn't overflow. We need
complicated range and SCEV analysis, and thus the problem is that this
intrinsic isn't really doing what it was supposed to solve. Examples of the
overflow checks that are required in the (ARM) back-end are D79175 and D86074,
which aren't even complete/correct yet.
To solve this problem, we are revising the definitions/semantics for
get.active.lane.mask to avoid all the complicated overflow analysis. This means
that instead of communicating the BTC, we are now using the loop tripcount. Now
using LangRef's variable names, its semantics is changed from:
icmp ule (%base + i), %n
to:
icmp ult (%base + i), %n
with %n > 0 and corresponding to the loop tripcount. The intrinsic signature
remains the same.
Differential Revision: https://reviews.llvm.org/D86147
(Forgot to land this a couple of weeks back.)
In a recent series of changes, I've introduced support for using the respective operand bundle kinds on the statepoint. At the moment, code supports either/or, but there's no need to keep the old support around. For the moment, I am simply changing the specification and verifier to require zero length argument sets in the intrinsic.
The intrinsic itself is experimental. Given that, there's no forward serialization needed. The in tree uses and generation have already been updated to use the new operand bundle based forms, the only folks broken by the change will be those with frontends generating statepoints directly and the updates should be easy.
Why not go ahead and just remove the arguments entirely? Well, I plan to. But while working on this I've found that almost all of the arguments to the statepoint can be expressed via operand bundles or attributes. Given that, I'm planning a radical simplification of the arguments and figured I'd do one update not several small ones.
Differential Revision: https://reviews.llvm.org/D80892
Summary:
This support is needed for the Fortran array variables with pointer/allocatable
attribute. This support enables debugger to identify the status of variable
whether that is currently allocated/associated.
for pointer array (before allocation/association)
without DW_AT_associated
(gdb) pt ptr
type = integer (140737345375288:140737354129776)
(gdb) p ptr
value requires 35017956 bytes, which is more than max-value-size
with DW_AT_associated
(gdb) pt ptr
type = integer (:)
(gdb) p ptr
$1 = <not associated>
for allocatable array (before allocation)
without DW_AT_allocated
(gdb) pt arr
type = integer (140737345375288:140737354129776)
(gdb) p arr
value requires 35017956 bytes, which is more than max-value-size
with DW_AT_allocated
(gdb) pt arr
type = integer, allocatable (:)
(gdb) p arr
$1 = <not allocated>
Testing
- unit test cases added
- check-llvm
- check-debuginfo
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D83544
This allows tracking the in-memory type of a pointer argument to a
function for ABI purposes. This is essentially a stripped down version
of byval to remove some of the stack-copy implications in its
definition.
This includes the base IR changes, and some tests for places where it
should be treated similarly to byval. Codegen support will be in a
future patch.
My original attempt at solving some of these problems was to repurpose
byval with a different address space from the stack. However, it is
technically permitted for the callee to introduce a write to the
argument, although nothing does this in reality. There is also talk of
removing and replacing the byval attribute, so a new attribute would
need to take its place anyway.
This is intended avoid some optimization issues with the current
handling of aggregate arguments, as well as fixes inflexibilty in how
frontends can specify the kernel ABI. The most honest representation
of the amdgpu_kernel convention is to expose all kernel arguments as
loads from constant memory. Today, these are raw, SSA Argument values
and codegen is responsible for turning these into loads.
Background:
There currently isn't a satisfactory way to represent how arguments
for the amdgpu_kernel calling convention are passed. In reality,
arguments are passed in a single, flat, constant memory buffer
implicitly passed to the function. It is also illegal to call this
function in the IR, and this is only ever invoked by a driver of some
kind.
It does not make sense to have a stack passed parameter in this
context as is implied by byval. It is never valid to write to the
kernel arguments, as this would corrupt the inputs seen by other
dispatches of the kernel. These argumets are also not in the same
address space as the stack, so a copy is needed to an alloca. From a
source C-like language, the kernel parameters are invisible.
Semantically, a copy is always required from the constant argument
memory to a mutable variable.
The current clang calling convention lowering emits raw values,
including aggregates into the function argument list, since using
byval would not make sense. This has some unfortunate consequences for
the optimizer. In the aggregate case, we end up with an aggregate
store to alloca, which both SROA and instcombine turn into a store of
each aggregate field. The optimizer never pieces this back together to
see that this is really just a copy from constant memory, so we end up
stuck with expensive stack usage.
This also means the backend dictates the alignment of arguments, and
arbitrarily picks the LLVM IR ABI type alignment. By allowing an
explicit alignment, frontends can make better decisions. For example,
there's real no advantage to an aligment higher than 4, so a frontend
could choose to compact the argument layout. Similarly, there is a
high penalty to using an alignment lower than 4, so a frontend could
opt into more padding for small arguments.
Another design consideration is when it is appropriate to expose the
fact that these arguments are all really passed in adjacent
memory. Currently we have a late IR optimization pass in codegen to
rewrite the kernel argument values into explicit loads to enable
vectorization. In most programs, unrelated argument loads can be
merged together. However, exposing this property directly from the
frontend has some disadvantages. We still need a way to track the
original argument sizes and alignments to report to the driver. I find
using some side-channel, metadata mechanism to track this
unappealing. If the kernel arguments were exposed as a single buffer
to begin with, alias analysis would be unaware that the padding bits
betewen arguments are meaningless. Another family of problems is there
are still some gaps in replacing all of the available parameter
attributes with metadata equivalents once lowered to loads.
The immediate plan is to start using this new attribute to handle all
aggregate argumets for kernels. Long term, it makes sense to migrate
all kernel arguments, including scalars, to be passed indirectly in
the same manner.
Additional context is in D79744.
Make explicit that freeze does not touch paddings of an aggregate.
(Relevant comment: https://reviews.llvm.org/D83752#2152550)
This implies that `v = freeze(load p); store v, q` may still leave undef bits
or poison in memory if `v` is an aggregate, but it still happens for
non-byte integers such as i1.
Differential Revision: https://reviews.llvm.org/D83927
Some of the system registers readable on AArch64 and ARM platforms
return different values with each read (for example a timer counter),
these shouldn't be hoisted outside loops or otherwise interfered with,
but the normal @llvm.read_register intrinsic is only considered to read
memory.
This introduces a separate @llvm.read_volatile_register intrinsic and
maps all system-registers on ARM platforms to use it for the
__builtin_arm_rsr calls. Registers declared with asm("r9") or similar
are unaffected.
This changes the matrix load/store intrinsic definitions to load/store from/to
a pointer, and not from/to a pointer to a vector, as discussed in D83477.
This also includes the recommit of "[Matrix] Tighten LangRef definitions and
Verifier checks" which adds improved language reference descriptions of the
matrix intrinsics and verifier checks.
Differential Revision: https://reviews.llvm.org/D83785
Loop metadata nodes do not adhere to the documented property:
(a) LoopIDs are not unique: Any pass that duplicates IR will do it
including its metadata (e.g. LoopVersioning) such that multiple
loops are linked with the same LoopID. There is even a test case
(Transforms/LoopUnroll/unroll-pragmas-disabled.ll) for multiple
loops with the same LoopID.
(b) LoopIDs are not persistent: Adding or removing an item from a LoopID
can only be done by creating a new MDNode and assigning it to the
loop's branch(es). Passes such as LoopUnroll (llvm.loop.unroll.disable)
and LoopVectorize (llvm.loop.isvectorized) use this to mark loops to
not be transformed multiple times or to avoid that a LoopVersioned
original loop is transformed.
Update the documentation according to how llvm.loop is used in practice.
Differential Revision: https://reviews.llvm.org/D55290
This tightens the matrix intrinsic definitions in LLVM LangRef and adds
correspondings checks to the IR Verifier.
Differential Revision: https://reviews.llvm.org/D83477
fadd (fma A, B, (fmul C, D)), E --> fma A, B, (fma C, D, E)
This is only allowed when "reassoc" is present on the fadd.
As discussed in D80801, this transform goes beyond
what is allowed by "contract" FMF (-ffp-contract=fast).
That is because we are fusing the trailing add of 'E' with a
multiply, but without "reassoc", the code mandates that the
products A*B and C*D are added together before adding in 'E'.
I've added this example to the LangRef to try to clarify the
meaning of "contract". If that seems reasonable, we should
probably do something similar for the clang docs because
there does not appear to be any formal spec for the behavior
of -ffp-contract=fast.
Differential Revision: https://reviews.llvm.org/D82499
LLVM currently does not require function parameters or return values
to be fully initialized, and does not care if they are poison. This can
be useful if the frontend ABI makes no such demands, but may prevent
helpful backend transformations in case they do. Specifically, the C
and C++ languages require all scalar function operands to be fully
determined.
Introducing this attribute is of particular use to MemorySanitizer
today, although other transformations may benefit from it as well.
We can modify MemorySanitizer instrumentation to provide modest (17%)
space savings where `frozen` is present.
This commit only adds the attribute to the Language Reference, and
the actual implementation of the attribute will follow in a separate
commit.
Differential Revision: https://reviews.llvm.org/D82316
This cleans up the stack allocated by a @llvm.call.preallocated.setup.
Should either call the teardown or the preallocated call to clean up the
stack. Calling both is UB.
Add LangRef.
Add verifier check that the token argument is a @llvm.call.preallocated.setup.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D83354
Summary:
D80831 changed part of the prefix usage for AIX.
But there are other places getting prefix from DataLayout.
This patch intends to make prefix usage consistent on AIX.
Reviewed by: hubert.reinterpretcast, daltenty
Differential Revision: https://reviews.llvm.org/D81270
Every other value parameter attribute uses parentheses, so accept this
as the preferred modern syntax. Updating everything to use the new
syntax is left for a future change.
Add a new builtin-function __builtin_expect_with_probability and
intrinsic llvm.expect.with.probability.
The interface is __builtin_expect_with_probability(long expr, long
expected, double probability).
It is mainly the same as __builtin_expect besides one more argument
indicating the probability of expression equal to expected value. The
probability should be a constant floating-point expression and be in
range [0.0, 1.0] inclusive.
It is similar to builtin-expect-with-probability function in GCC
built-in functions.
Differential Revision: https://reviews.llvm.org/D79830
This patch adjust the load/store matrix intrinsics, formerly known as
llvm.matrix.columnwise.load/store, to improve the naming and allow
passing of extra information (volatile).
The patch performs the following changes:
* Rename columnwise.load/store to column.major.load/store. This is more
expressive and also more in line with the naming in Clang.
* Changes the stride arguments from i32 to i64. The stride can be
larger than i32 and this makes things more uniform with the way
things are handled in Clang.
* A new boolean argument is added to indicate whether the load/store
is volatile. The lowering respects that when emitting vector
load/store instructions
* MatrixBuilder is updated to require both Alignment and IsVolatile
arguments, which are passed through to the generated intrinsic. The
alignment is set using the `align` attribute.
The changes are grouped together in a single patch, to have a single
commit that breaks the compatibility. We probably should be fine with
updating the intrinsics, as we did not yet officially support them in
the last stable release. If there are any concerns, we can add
auto-upgrade rules for the columnwise intrinsics though.
Reviewers: anemet, Gerolf, hfinkel, andrew.w.kaylor, LuoYuanke, nicolasvasilache, rjmccall, ftynse
Reviewed By: anemet, nicolasvasilache
Differential Revision: https://reviews.llvm.org/D81472
Summary:
This patch adds optional field into function summary,
implements asm and bitcode serialization. YAML
serialization is omitted and can be added later if
needed.
This patch includes this information into summary only
if module contains at least one sanitize_memtag function.
In a near future MTE is the user of the analysis.
Later if needed we can provede more direct control
on when information is included into summary.
Reviewers: eugenis
Subscribers: hiraditya, steven_wu, dexonsmith, arphaman, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80908
Currently, gc.relocates are defined in terms of indices into the statepoint's operand list. Given the gc args are at the end of a variable length list of operands, this makes interpreting their indices by hand a tad challenging. We can simplify the statepoint sequence and improve readability quite a bit by pulling these new operands into their own named operand bundle.
This patch defines a new operand bundle tag "gc-live". The semantics of the bundle are the same as the existing gc arguments of a statepoint. This patch simply introduces the definition and codegen for the bundle, future patches will migrate RS4GC to emitting the new form.
Interestingly, with this done and the recent migration to using deopt and gc-transition bundles, we really don't have much left in the statepoint itself. It really looks like the existing ID and flags fields are redundant; we have (existing!) attributes for all of them. I think we'll be able to reduce the gc.statepoint signature to simply a wrapped call (e.g. actual target and actual arguments).
Differential Revision: https://reviews.llvm.org/D80937
Currently all code instances within the matrix lowering pass consider
matrix A to be MxN and B to be NxK, producing C which is MxK. Anyone
interacting with this API after reading the docs but without reading the pass
would expect A: MxK, B: KxN, and C: MxN. These changes bring the documentation
in line with the implementation.
One point of concern with this, the original signature as described in the docs
may be better or at least more expected. The interface as it was written
reflected other common matrix multiplication interfaces such as BLAS'[1], where
the matrices are MxK, KxN, MxN respectively. Choosing to honor this requires
changing code and tests instead, but should be mostly just renaming of variables.
Patch by Braedy Kuzma <braedy@ualberta.ca>
[1] http://www.netlib.org/lapack/explore-html/db/dc9/group__single__blas__level3_gafe51bacb54592ff5de056acabd83c260.html#gafe51bacb54592ff5de056acabd83c260
Reviewers: anemet, LuoYuanke, nicolasvasilache, fhahn
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D80663
This is split off from D79100 and:
- adds a intrinsic description/definition for @llvm.get.active.lane.mask(), and
- describe its semantics in LangRef.
As described (in more detail) in its LangRef section, it is semantically
equivalent to an icmp with the vector induction variable and the back-edge
taken count, and generates a mask of active/inactive vector lanes.
It will have several use cases. First, it will be used by the
ExpandVectorPredication pass for the VP intrinsics, to expand VP intrinsics for
scalable vectors on targets that do not support the `%evl` parameter, see
D78203.
Also, this is part of, and essential for our ARM MVE tail-predication story:
- this intrinsic will be emitted by the LoopVectorizer in D79100, when
the scalar epilogue is tail-folded into the vector body. This new intrinsic
will generate the predicate for the masked loads/stores, and it takes the
back-edge taken count as an argument. The back-edge taken count represents the
number of elements processed by the loop, which we need to setup MVE
tail-predication.
- Emitting the intrinsic is controlled by a new TTI hook, see D80597.
- We pick up this new intrinsic in an ARM MVETailPredication backend pass, see
D79175, and convert it to a MVE target specific intrinsic/instruction to
create a tail-predicated loop.
Differential Revision: https://reviews.llvm.org/D80596
The HardwareLoop intrinsics were missing and not described in LangRef. This
adds these descriptions/definitions.
Differential Revision: https://reviews.llvm.org/D80316
Summary:
preallocated and musttail can work together, but we don't want to call
@llvm.call.preallocated.setup() to modify the stack in musttail calls.
So we shouldn't have the "preallocated" operand bundle when a
preallocated call is musttail.
Also disallow use of preallocated on calls without preallocated.
Codegen not yet implemented.
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80581
Summary:
A struct argument can be passed-by-value to a callee via a pointer to a
temporary stack copy. Add support for emitting an entry value DBG_VALUE
when an indirect parameter DBG_VALUE becomes unavailable. This is done
by omitting DW_OP_stack_value from the entry value expression, to make
the expression describe the location of an object.
rdar://63373691
Reviewers: djtodoro, aprantl, dstenb
Subscribers: hiraditya, lldb-commits, llvm-commits
Tags: #lldb, #llvm
Differential Revision: https://reviews.llvm.org/D80345
This intrinsic implements IEEE-754 operation roundToIntegralTiesToEven,
and performs rounding to the nearest integer value, rounding halfway
cases to even. The intrinsic represents the missed case of IEEE-754
rounding operations and now llvm provides full support of the rounding
operations defined by the standard.
Differential Revision: https://reviews.llvm.org/D75670
This change makes minor correction to the implementation of intrinsic
`llvm.flt.rounds`:
- Added documentation entry in LangRef,
- Attributes of the intrinsic changed to be in line with other functions
dependent of floating-point environment.
Differential Revision: https://reviews.llvm.org/D79322
Summary: 'A' constraint requires an immediate int or fp constant that can be inlined in an instruction encoding.
Reviewers: arsenm, rampitec
Differential Revision: https://reviews.llvm.org/D78494
The "null-pointer-is-valid" attribute needs to be checked by many
pointer-related combines. To make the check more efficient, convert
it from a string into an enum attribute.
In the future, this attribute may be replaced with data layout
properties.
Differential Revision: https://reviews.llvm.org/D78862
Summary:
The BFloat IR type is introduced to provide support for, initially, the BFloat16
datatype introduced with the Armv8.6 architecture (optional from Armv8.2
onwards). It has an 8-bit exponent and a 7-bit mantissa and behaves like an IEEE
754 floating point IR type.
This is part of a patch series upstreaming Armv8.6 features. Subsequent patches
will upstream intrinsics support and C-lang support for BFloat.
Reviewers: SjoerdMeijer, rjmccall, rsmith, liutianle, RKSimon, craig.topper, jfb, LukeGeeson, sdesmalen, deadalnix, ctetreau
Subscribers: hiraditya, llvm-commits, danielkiss, arphaman, kristof.beyls, dexonsmith
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78190
This patch adds support for DWARF attribute DW_AT_data_location.
Summary:
Dynamic arrays in fortran are described by array descriptor and
data allocation address. Former is mapped to DW_AT_location and
later is mapped to DW_AT_data_location.
Testing:
unit test cases added (hand-written)
check llvm
check debug-info
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D79592
llvm rejects DWARF operator DW_OP_push_object_address.This DWARF
operator is needed for Flang to support allocatable array.
Summary:
Currently llvm rejects DWARF operator DW_OP_push_object_address.
below error is produced when llvm finds this operator.
[..]
invalid expression
!DIExpression(151)
warning: ignoring invalid debug info in pushobj.ll
[..]
There are some parts missing in support of this operator, need to
be completed.
Testing
-added a unit testcase
-check-debuginfo
-check-llvm
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D79306
We want to add a way to avoid merging identical calls so as to keep the
separate debug-information for those calls. There is also an asan
usecase where having this attribute would be beneficial to avoid
alternative work-arounds.
Here is the link to the feature request:
https://bugs.llvm.org/show_bug.cgi?id=42783.
`nomerge` is different from `noline`. `noinline` prevents function from
inlining at callsites, but `nomerge` prevents multiple identical calls
from being merged into one.
This patch adds `nomerge` to disable the optimization in IR level. A
followup patch will be needed to let backend understands `nomerge` and
avoid tail merge at backend.
Reviewed By: asbirlea, rnk
Differential Revision: https://reviews.llvm.org/D78659
Linkage type was only referenced for functions, not for global
variables.
Clarify that LLVM doesn't make assumption about the allocation size when
no definitive initializer for a global variable is known.
Differential Revision: https://reviews.llvm.org/D78952
The 'nsz' flag is different than 'nnan' or 'ninf' in that it does not create poison.
Make that explicit in the LangRef and fix ValueTracking analysis that misinterpreted
the definition.
This manifests as bugs in InstSimplify shown in the test diffs and as discussed in
PR45778:
https://bugs.llvm.org/show_bug.cgi?id=45778
Differential Revision: https://reviews.llvm.org/D79422
Add llvm.call.preallocated.{setup,arg} instrinsics.
Add "preallocated" operand bundle which takes a token produced by llvm.call.preallocated.setup.
Add "preallocated" parameter attribute, which is like byval but without the copy.
Verifier changes for these IR constructs.
See https://github.com/rnk/llvm-project/blob/call-setup-docs/llvm/docs/CallSetup.md
Subscribers: hiraditya, jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74651
Now compiler defines 5 sets of constants to represent rounding mode.
These are:
1. `llvm::APFloatBase::roundingMode`. It specifies all 5 rounding modes
defined by IEEE-754 and is used in `APFloat` implementation.
2. `clang::LangOptions::FPRoundingModeKind`. It specifies 4 of 5 IEEE-754
rounding modes and a special value for dynamic rounding mode. It is used
in clang frontend.
3. `llvm::fp::RoundingMode`. Defines the same values as
`clang::LangOptions::FPRoundingModeKind` but in different order. It is
used to specify rounding mode in in IR and functions that operate IR.
4. Rounding mode representation used by `FLT_ROUNDS` (C11, 5.2.4.2.2p7).
Besides constants for rounding mode it also uses a special value to
indicate error. It is convenient to use in intrinsic functions, as it
represents platform-independent representation for rounding mode. In this
role it is used in some pending patches.
5. Values like `FE_DOWNWARD` and other, which specify rounding mode in
library calls `fesetround` and `fegetround`. Often they represent bits
of some control register, so they are target-dependent. The same names
(not values) and a special name `FE_DYNAMIC` are used in
`#pragma STDC FENV_ROUND`.
The first 4 sets of constants are target independent and could have the
same numerical representation. It would simplify conversion between the
representations. Also now `clang::LangOptions::FPRoundingModeKind` and
`llvm::fp::RoundingMode` do not contain the value for IEEE-754 rounding
direction `roundTiesToAway`, although it is supported natively on
some targets.
This change defines all the rounding mode type via one `llvm::RoundingMode`,
which also contains rounding mode for IEEE rounding direction `roundTiesToAway`.
Differential Revision: https://reviews.llvm.org/D77379
D72467 updated the shufflevector instruction to include a constant mask
rather than a mask operand. The LangRef text was vague enough to still
make sense, but it is better to update here too, so there's no confusion
about valid mask values. The text here is adapted from the documentation
code comments for "class ShuffleVectorInst".
Differential Revision: https://reviews.llvm.org/D77396
We already mention that `noalias` is modeled after the C99 `restrict`
qualifier but we did omit one important requirement in the description.
For the restrict guarantees the object affected has to be modified
during the execution of the function, in any way (see 6.7.3.1.4 in [0]).
There are two reasons we want this restriction as well:
1) To match the `restrict` semantics when we lower it to `noalias`.
2) To allow the reasoning that the object pointed to by a `noalias`
pointer is not modified through means not derived from this
pointer. Hence, following the uses of that pointer is sufficient
to determine potential modifications.
The discussion on this came up as part of D73428. In that patch the
Attributor is taught to derive `noalias` for call site arguments based
on alias queries against objects that are accessed in the callee. This
is possible even if the pointer passed at the call site was "not-`noalias`".
To simplify the logic there *and* to allow the use of `noalias` as
described in 2) above, it is beneficial to follow the C `restrict`
semantics in cases where there might be "read-read-aliases". Note that
AliasAnalysis* queries for read only objects already result in
`NoAlias` even if the pointers might "alias".
* From this point of view our Alias Analysis is basically a Dependence
Analysis.
[0] http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1124.pdf
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D74935
Summary:
This patch clarifies the semantics of branching on undef value.
Defining `br undef` as undefined behavior explains optimizations that use branch conditions, such as CVP (D76931) and GVN (propagateEquality).
For `switch cond`, it is defined to raise UB if cond is an expression containing undef && cond is not frozen &&
it may yield different values.
This allows that at the destination block the branch condition can be assumed to be frozen already (otherwise UB was already triggered).
This condition is slightly stricter than MemorySanitizer, which allows undef-y condition if it always leads to the same destination,
but it does not break MemorySanitizer because we are giving stricter constraint.
Reviewers: efriedma, fhahn, nikic, spatel, jdoerfert, nlopes
Reviewed By: nlopes
Subscribers: regehr, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76973
This is copied from the suggested text by @regehr in:
https://bugs.llvm.org/show_bug.cgi?id=20895
The way forward was not clear for several years, but now that we
have 'freeze' and Alive2, the behavior should be documented.
Also see comments in D76332.
LLVM currently supports CSK_MD5 and CSK_SHA1 source file checksums in
debug info. This change adds support for CSK_SHA256 checksums.
The SHA256 checksums are supported by the CodeView debug format.
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D75785
Summary: This patch adds the basic utilities to deal with dropable uses. dropable uses are uses that we rather drop than prevent transformations, for now they are limited to uses in llvm.assume.
Reviewers: jdoerfert, sstefan1
Reviewed By: jdoerfert
Subscribers: uenoku, lebedev.ri, mgorny, hiraditya, dexonsmith, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73404
Summary: As discussed in D75505, it's not particularly useful to change the type of a load to/from floating-point/integer because it's followed by a bitcast, and it might lead to surprising code generation. Check that this doesn't generally happen.
Reviewers: lebedev.ri
Subscribers: jkorous, dexonsmith, ributzka, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75644
Summary:
Terminators in LLVM aren't prohibited from returning values. This means that
the "callbr" instruction, which is used for "asm goto", can support "asm goto
with outputs."
This patch removes all restrictions against "callbr" returning values. The
heavy lifting is done by the code generator. The "INLINEASM_BR" instruction's
a terminator, and the code generator doesn't allow non-terminator instructions
after a terminator. In order to correctly model the feature, we need to copy
outputs from "INLINEASM_BR" into virtual registers. Of course, those copies
aren't terminators.
To get around this issue, we split the block containing the "INLINEASM_BR"
right before the "COPY" instructions. This results in two cheats:
- Any physical registers defined by "INLINEASM_BR" need to be marked as
live-in into the block with the "COPY" instructions. This violates an
assumption that physical registers aren't marked as "live-in" until after
register allocation. But it seems as if the live-in information only
needs to be correct after register allocation. So we're able to get away
with this.
- The indirect branches from the "INLINEASM_BR" are moved to the "COPY"
block. This is to satisfy PHI nodes.
I've been told that MLIR can support this handily, but until we're able to
use it, we'll have to stick with the above.
Reviewers: jyknight, nickdesaulniers, hfinkel, MaskRay, lattner
Reviewed By: nickdesaulniers, MaskRay, lattner
Subscribers: rriddle, qcolombet, jdoerfert, MatzeB, echristo, MaskRay, xbolva00, aaron.ballman, cfe-commits, JonChesterfield, hiraditya, llvm-commits, rnk, craig.topper
Tags: #llvm, #clang
Differential Revision: https://reviews.llvm.org/D69868
Summary:
This patch adds intrinsics and ISelDAG nodes for signed
and unsigned fixed-point division:
```
llvm.sdiv.fix.sat.*
llvm.udiv.fix.sat.*
```
These intrinsics perform scaled, saturating division
on two integers or vectors of integers. They are
required for the implementation of the Embedded-C
fixed-point arithmetic in Clang.
Reviewers: bjope, leonardchan, craig.topper
Subscribers: hiraditya, jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71550
Summary:
Operand bundles on an llvm.assume allows representing
assumptions that an attribute holds for a certain value at a certain position.
Operand bundles enable assumptions that are either hard or impossible to
represent as a boolean argument of an llvm.assume.
Reviewers: jdoerfert, fhahn, nlopes, reames, regehr, efriedma
Reviewed By: jdoerfert
Subscribers: lebedev.ri, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74209
Summary:
Debug Info Version was changed to use "Max" instead of "Warning" per the
original design intent - but this maxes old/new IR unlinkable, since
mismatched merge styles are a linking failure.
It seems possible/maybe reasonable to actually support the combination
of these two flags: Warn, but then use the maximum value rather than the
first value/earlier module's value.
Reviewers: tejohnson
Differential Revision: https://reviews.llvm.org/D74257
AMDGPU and x86 at least both have separate controls for whether
denormal results are flushed on output, and for whether denormals are
implicitly treated as 0 as an input. The current DAGCombiner use only
really cares about the input treatment of denormals.
Summary:
This is a follow up on D61634. It adds an LLVM IR intrinsic to allow better implementation of memcpy from C++.
A follow up CL will add the intrinsics in Clang.
Reviewers: courbet, theraven, t.p.northover, jdoerfert, tejohnson
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71710
Summary: masked_load and masked_store instructions require the alignment to be specified and a power of two. It seems to me that this requirement applies to masked_gather and masked_scatter as well.
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73179
In LLVM IR, vscale can be represented with an intrinsic. For some targets,
this is equivalent to the constexpr:
getelementptr <vscale x 1 x i8>, <vscale x 1 x i8>* null, i32 1
This can be used to propagate the value in CodeGenPrepare.
In ISel we add a node that can be legalized to one or more
instructions to materialize the runtime vector length.
This patch also adds SVE CodeGen support for VSCALE, which maps this
node to RDVL instructions (for scaled multiples of 16bytes) or CNT[HSD]
instructions (scaled multiples of 2, 4, or 8 bytes, respectively).
Reviewers: rengolin, cameron.mcinally, hfinkel, sebpop, SjoerdMeijer, efriedma, lattner
Reviewed by: efriedma
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68203
Currently there are 4 different mechanisms for controlling denormal
flushing behavior, and about as many equivalent frontend controls.
- AMDGPU uses the fp32-denormals and fp64-f16-denormals subtarget features
- NVPTX uses the nvptx-f32ftz attribute
- ARM directly uses the denormal-fp-math attribute
- Other targets indirectly use denormal-fp-math in one DAGCombine
- cl-denorms-are-zero has a corresponding denorms-are-zero attribute
AMDGPU wants a distinct control for f32 flushing from f16/f64, and as
far as I can tell the same is true for NVPTX (based on the attribute
name).
Work on consolidating these into the denormal-fp-math attribute, and a
new type specific denormal-fp-math-f32 variant. Only ARM seems to
support the two different flush modes, so this is overkill for the
other use cases. Ideally we would error on the unsupported
positive-zero mode on other targets from somewhere.
Move the logic for selecting the flush mode into the compiler driver,
instead of handling it in cc1. denormal-fp-math/denormal-fp-math-f32
are now both cc1 flags, but denormal-fp-math-f32 is not yet exposed as
a user flag.
-cl-denorms-are-zero, -fcuda-flush-denormals-to-zero and
-fno-cuda-flush-denormals-to-zero will be mapped to
-fp-denormal-math-f32=ieee or preserve-sign rather than the old
attributes.
Stop emitting the denorms-are-zero attribute for the OpenCL flag. It
has no in-tree users. The meaning would also be target dependent, such
as the AMDGPU choice to treat this as only meaning allow flushing of
f32 and not f16 or f64. The naming is also potentially confusing,
since DAZ in other contexts refers to instructions implicitly treating
input denormals as zero, not necessarily flushing output denormals to
zero.
This also does not attempt to change the behavior for the current
attribute. The LangRef now states that the default is ieee behavior,
but this is inaccurate for the current implementation. The clang
handling is slightly hacky to avoid touching the existing
denormal-fp-math uses. Fixing this will be left for a future patch.
AMDGPU is still using the subtarget feature to control the denormal
mode, but the new attribute are now emitted. A future change will
switch this and remove the subtarget features.
Summary:
This patch adds intrinsics and ISelDAG nodes for
signed and unsigned fixed-point division:
llvm.sdiv.fix.*
llvm.udiv.fix.*
These intrinsics perform scaled division on two
integers or vectors of integers. They are required
for the implementation of the Embedded-C fixed-point
arithmetic in Clang.
Patch by: ebevhan
Reviewers: bjope, leonardchan, efriedma, craig.topper
Reviewed By: craig.topper
Subscribers: Ka-Ka, ilya, hiraditya, jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70007
Summary:
Remove the restrictions that preventing "asm goto" from returning non-void
values. The values returned by "asm goto" are only valid on the "fallthrough"
path.
Reviewers: jyknight, nickdesaulniers, hfinkel
Reviewed By: jyknight, nickdesaulniers
Subscribers: rsmith, hiraditya, llvm-commits, cfe-commits, craig.topper, rnk
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D69876
Adds the RISC-V asm template argument modifiers currently supported by LLVM.
Additional ones supported by GCC will be added to the documentation when we
start supporting them.
Add new intrinsics
llvm.experimental.constrained.minimum
llvm.experimental.constrained.maximum
as strict versions of llvm.minimum and llvm.maximum.
Includes SystemZ back-end support.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D71624
The following intrinsics currently carry a rounding mode metadata argument:
llvm.experimental.constrained.minnum
llvm.experimental.constrained.maxnum
llvm.experimental.constrained.ceil
llvm.experimental.constrained.floor
llvm.experimental.constrained.round
llvm.experimental.constrained.trunc
This is not useful since the semantics of those intrinsics do not in any way
depend on the rounding mode. In similar cases, other constrained intrinsics
do not have the rounding mode argument. Remove it here as well.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D71218
of integers to floating point.
This includes some of Craig Topper's changes for promotion support from
D71130.
Differential Revision: https://reviews.llvm.org/D69275
This is the first patch adding an initial set of matrix intrinsics and a
corresponding lowering pass. This has been discussed on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2019-October/136240.html
The first patch introduces four new intrinsics (transpose, multiply,
columnwise load and store) and a LowerMatrixIntrinsics pass, that
lowers those intrinsics to vector operations.
Matrixes are embedded in a 'flat' vector (e.g. a 4 x 4 float matrix
embedded in a <16 x float> vector) and the intrinsics take the dimension
information as parameters. Those parameters need to be ConstantInt.
For the memory layout, we initially assume column-major, but in the RFC
we also described how to extend the intrinsics to support row-major as
well.
For the initial lowering, we split the input of the intrinsics into a
set of column vectors, transform those column vectors and concatenate
the result columns to a flat result vector.
This allows us to lower the intrinsics without any shape propagation, as
mentioned in the RFC. In follow-up patches, we plan to submit the
following improvements:
* Shape propagation to eliminate the embedding/splitting for each
intrinsic.
* Fused & tiled lowering of multiply and other operations.
* Optimization remarks highlighting matrix expressions and costs.
* Generate loops for operations on large matrixes.
* More general block processing for operation on large vectors,
exploiting shape information.
We would like to add dedicated transpose, columnwise load and store
intrinsics, even though they are not strictly necessary. For example, we
could instead emit a large shufflevector instruction instead of the
transpose. But we expect that to
(1) become unwieldy for larger matrixes (even for 16x16 matrixes,
the resulting shufflevector masks would be huge),
(2) risk instcombine making small changes, causing us to fail to
detect the transpose, preventing better lowerings
For the load/store, we are additionally planning on exploiting the
intrinsics for better alias analysis.
Reviewers: anemet, Gerolf, reames, hfinkel, andrew.w.kaylor, efriedma, rengolin
Reviewed By: anemet
Differential Revision: https://reviews.llvm.org/D70456
This adds support for constrained floating-point comparison intrinsics.
Specifically, we add:
declare <ty2>
@llvm.experimental.constrained.fcmp(<type> <op1>, <type> <op2>,
metadata <condition code>,
metadata <exception behavior>)
declare <ty2>
@llvm.experimental.constrained.fcmps(<type> <op1>, <type> <op2>,
metadata <condition code>,
metadata <exception behavior>)
The first variant implements an IEEE "quiet" comparison (i.e. we only
get an invalid FP exception if either argument is a SNaN), while the
second variant implements an IEEE "signaling" comparison (i.e. we get
an invalid FP exception if either argument is any NaN).
The condition code is implemented as a metadata string. The same set
of predicates as for the fcmp instruction is supported (except for the
"true" and "false" predicates).
These new intrinsics are mapped by SelectionDAG codegen onto two new
ISD opcodes, ISD::STRICT_FSETCC and ISD::STRICT_FSETCCS, again
representing quiet vs. signaling comparison operations. Otherwise
those nodes look like SETCC nodes, with an additional chain argument
and result as usual for strict FP nodes. The patch includes support
for the common legalization operations for those nodes.
The patch also includes full SystemZ back-end support for the new
ISD nodes, mapping them to all available SystemZ instruction to
fully implement strict semantics (scalar and vector).
Differential Revision: https://reviews.llvm.org/D69281
Summary:
Rework the GMIR documentation to focus more on the end user than the
implementation and tie it in to the MIR document. There was also some
out-of-date information which has been removed.
The quality of the GenericOpcode reference is highly variable and drops
sharply as I worked through them all but we've got to start somewhere :-).
It would be great if others could expand on this too as there is an awful
lot to get through.
Also fix a typo in the definition of G_FLOG. Previously, the comments said
we had two base-2's (G_FLOG and G_FLOG2).
Reviewers: aemerson, volkan, rovka, arsenm
Reviewed By: rovka
Subscribers: wdng, arphaman, jfb, Petar.Avramovic, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69545
This reverts commit 004ed2b0d1.
Original commit hash 6d03890384
Summary:
This adds a clang option to disable inline line tables. When it is used,
the inliner uses the call site as the location of the inlined function instead of
marking it as an inline location with the function location.
https://reviews.llvm.org/D67723
Summary:
This adds a clang option to disable inline line tables. When it is used,
the inliner uses the call site as the location of the inlined function instead of
marking it as an inline location with the function location.
See https://bugs.llvm.org/show_bug.cgi?id=42344
Reviewers: rnk
Subscribers: hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D67723
Summary:
This extends the rules for when a call instruction is deemed to be an
FPMathOperator, which is based on the type of the call (i.e. the return
type of the function being called). Previously we only allowed
floating-point and vector-of-floating-point types. Now we also allow
arrays (nested to any depth) of floating-point and
vector-of-floating-point types.
This was motivated by llpc, the pipeline compiler for AMD GPUs
(https://github.com/GPUOpen-Drivers/llpc). llpc has many math library
functions that operate on vectors, typically represented as <4 x float>,
and some that operate on matrices, typically represented as
[4 x <4 x float>], and it's useful to be able to decorate calls to all
of them with fast math flags.
Reviewers: spatel, wristow, arsenm, hfinkel, aemerson, efriedma, cameron.mcinally, mcberg2017, jmolloy
Subscribers: wdng, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69161
Summary:
A new function pass (Transforms/CFGuard/CFGuard.cpp) inserts CFGuard checks on
indirect function calls, using either the check mechanism (X86, ARM, AArch64) or
or the dispatch mechanism (X86-64). The check mechanism requires a new calling
convention for the supported targets. The dispatch mechanism adds the target as
an operand bundle, which is processed by SelectionDAG. Another pass
(CodeGen/CFGuardLongjmp.cpp) identifies and emits valid longjmp targets, as
required by /guard:cf. This feature is enabled using the `cfguard` CC1 option.
Reviewers: thakis, rnk, theraven, pcc
Subscribers: ychen, hans, metalcanine, dmajor, tomrittervg, alex, mehdi_amini, mgorny, javed.absar, kristof.beyls, hiraditya, steven_wu, dexonsmith, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D65761
Remove dead virtual functions from vtables with
replaceNonMetadataUsesWith, so that CGProfile metadata gets cleaned up
correctly.
Original commit message:
Currently, it is hard for the compiler to remove unused C++ virtual
functions, because they are all referenced from vtables, which are referenced
by constructors. This means that if the constructor is called from any live
code, then we keep every virtual function in the final link, even if there
are no call sites which can use it.
This patch allows unused virtual functions to be removed during LTO (and
regular compilation in limited circumstances) by using type metadata to match
virtual function call sites to the vtable slots they might load from. This
information can then be used in the global dead code elimination pass instead
of the references from vtables to virtual functions, to more accurately
determine which functions are reachable.
To make this transformation safe, I have changed clang's code-generation to
always load virtual function pointers using the llvm.type.checked.load
intrinsic, instead of regular load instructions. I originally tried writing
this using clang's existing code-generation, which uses the llvm.type.test
and llvm.assume intrinsics after doing a normal load. However, it is possible
for optimisations to obscure the relationship between the GEP, load and
llvm.type.test, causing GlobalDCE to fail to find virtual function call
sites.
The existing linkage and visibility types don't accurately describe the scope
in which a virtual call could be made which uses a given vtable. This is
wider than the visibility of the type itself, because a virtual function call
could be made using a more-visible base class. I've added a new
!vcall_visibility metadata type to represent this, described in
TypeMetadata.rst. The internalization pass and libLTO have been updated to
change this metadata when linking is performed.
This doesn't currently work with ThinLTO, because it needs to see every call
to llvm.type.checked.load in the linkage unit. It might be possible to
extend this optimisation to be able to use the ThinLTO summary, as was done
for devirtualization, but until then that combination is rejected in the
clang driver.
To test this, I've written a fuzzer which generates random C++ programs with
complex class inheritance graphs, and virtual functions called through object
and function pointers of different types. The programs are spread across
multiple translation units and DSOs to test the different visibility
restrictions.
I've also tried doing bootstrap builds of LLVM to test this. This isn't
ideal, because only classes in anonymous namespaces can be optimised with
-fvisibility=default, and some parts of LLVM (plugins and bugpoint) do not
work correctly with -fvisibility=hidden. However, there are only 12 test
failures when building with -fvisibility=hidden (and an unmodified compiler),
and this change does not cause any new failures for either value of
-fvisibility.
On the 7 C++ sub-benchmarks of SPEC2006, this gives a geomean code-size
reduction of ~6%, over a baseline compiled with "-O2 -flto
-fvisibility=hidden -fwhole-program-vtables". The best cases are reductions
of ~14% in 450.soplex and 483.xalancbmk, and there are no code size
increases.
I've also run this on a set of 8 mbed-os examples compiled for Armv7M, which
show a geomean size reduction of ~3%, again with no size increases.
I had hoped that this would have no effect on performance, which would allow
it to awlays be enabled (when using -fwhole-program-vtables). However, the
changes in clang to use the llvm.type.checked.load intrinsic are causing ~1%
performance regression in the C++ parts of SPEC2006. It should be possible to
recover some of this perf loss by teaching optimisations about the
llvm.type.checked.load intrinsic, which would make it worth turning this on
by default (though it's still dependent on -fwhole-program-vtables).
Differential revision: https://reviews.llvm.org/D63932
llvm-svn: 375094
Summary:
Internally in LLVM's metadata we use DW_OP_entry_value operations with
the same semantics as DWARF; that is, its operand specifies the number
of bytes that the entry value covers.
At the time of emitting entry values we don't know the emitted size of
the DWARF expression that the entry value will cover. Currently the size
is hardcoded to 1 in DIExpression, and other values causes the verifier
to fail. As the size is 1, that effectively means that we can only have
valid entry values for registers that can be encoded in one byte, which
are the registers with DWARF numbers 0 to 31 (as they can be encoded as
single-byte DW_OP_reg0..DW_OP_reg31 rather than a multi-byte
DW_OP_regx). It is a bit confusing, but it seems like llvm-dwarfdump
will print an operation "correctly", even if the byte size is less than
that, which may make it seem that we emit correct DWARF for registers
with DWARF numbers > 31. If you instead use readelf for such cases, it
will interpret the number of specified bytes as a DWARF expression. This
seems like a limitation in llvm-dwarfdump.
As suggested in D66746, a way forward would be to add an internal
variant of DW_OP_entry_value, DW_OP_LLVM_entry_value, whose operand
instead specifies the number of operations that the entry value covers,
and we then translate that into the byte size at the time of emission.
In this patch that internal operation is added. This patch keeps the
limitation that a entry value can only be applied to simple register
locations, but it will fix the issue with the size operand being
incorrect for DWARF numbers > 31.
Reviewers: aprantl, vsk, djtodoro, NikolaPrica
Reviewed By: aprantl
Subscribers: jyknight, fedor.sergeev, hiraditya, llvm-commits
Tags: #debug-info, #llvm
Differential Revision: https://reviews.llvm.org/D67492
llvm-svn: 374881
Currently, it is hard for the compiler to remove unused C++ virtual
functions, because they are all referenced from vtables, which are referenced
by constructors. This means that if the constructor is called from any live
code, then we keep every virtual function in the final link, even if there
are no call sites which can use it.
This patch allows unused virtual functions to be removed during LTO (and
regular compilation in limited circumstances) by using type metadata to match
virtual function call sites to the vtable slots they might load from. This
information can then be used in the global dead code elimination pass instead
of the references from vtables to virtual functions, to more accurately
determine which functions are reachable.
To make this transformation safe, I have changed clang's code-generation to
always load virtual function pointers using the llvm.type.checked.load
intrinsic, instead of regular load instructions. I originally tried writing
this using clang's existing code-generation, which uses the llvm.type.test
and llvm.assume intrinsics after doing a normal load. However, it is possible
for optimisations to obscure the relationship between the GEP, load and
llvm.type.test, causing GlobalDCE to fail to find virtual function call
sites.
The existing linkage and visibility types don't accurately describe the scope
in which a virtual call could be made which uses a given vtable. This is
wider than the visibility of the type itself, because a virtual function call
could be made using a more-visible base class. I've added a new
!vcall_visibility metadata type to represent this, described in
TypeMetadata.rst. The internalization pass and libLTO have been updated to
change this metadata when linking is performed.
This doesn't currently work with ThinLTO, because it needs to see every call
to llvm.type.checked.load in the linkage unit. It might be possible to
extend this optimisation to be able to use the ThinLTO summary, as was done
for devirtualization, but until then that combination is rejected in the
clang driver.
To test this, I've written a fuzzer which generates random C++ programs with
complex class inheritance graphs, and virtual functions called through object
and function pointers of different types. The programs are spread across
multiple translation units and DSOs to test the different visibility
restrictions.
I've also tried doing bootstrap builds of LLVM to test this. This isn't
ideal, because only classes in anonymous namespaces can be optimised with
-fvisibility=default, and some parts of LLVM (plugins and bugpoint) do not
work correctly with -fvisibility=hidden. However, there are only 12 test
failures when building with -fvisibility=hidden (and an unmodified compiler),
and this change does not cause any new failures for either value of
-fvisibility.
On the 7 C++ sub-benchmarks of SPEC2006, this gives a geomean code-size
reduction of ~6%, over a baseline compiled with "-O2 -flto
-fvisibility=hidden -fwhole-program-vtables". The best cases are reductions
of ~14% in 450.soplex and 483.xalancbmk, and there are no code size
increases.
I've also run this on a set of 8 mbed-os examples compiled for Armv7M, which
show a geomean size reduction of ~3%, again with no size increases.
I had hoped that this would have no effect on performance, which would allow
it to awlays be enabled (when using -fwhole-program-vtables). However, the
changes in clang to use the llvm.type.checked.load intrinsic are causing ~1%
performance regression in the C++ parts of SPEC2006. It should be possible to
recover some of this perf loss by teaching optimisations about the
llvm.type.checked.load intrinsic, which would make it worth turning this on
by default (though it's still dependent on -fwhole-program-vtables).
Differential revision: https://reviews.llvm.org/D63932
llvm-svn: 374539
When the target option GuaranteedTailCallOpt is specified, calls with
the fastcc calling convention will be transformed into tail calls if
they are in tail position. This diff adds a new calling convention,
tailcc, currently supported only on X86, which behaves the same way as
fastcc, except that the GuaranteedTailCallOpt flag does not need to
enabled in order to enable tail call optimization.
Patch by Dwight Guth <dwight.guth@runtimeverification.com>!
Reviewed By: lebedev.ri, paquette, rnk
Differential Revision: https://reviews.llvm.org/D67855
llvm-svn: 373976
Earlier in the year intrinsics for lrint, llrint, lround and llround were
added to llvm. The constrained versions are now implemented here.
Reviewed by: andrew.w.kaylor, craig.topper, cameron.mcinally
Approved by: craig.topper
Differential Revision: https://reviews.llvm.org/D64746
llvm-svn: 373900
Summary: The constraint goes up to regs d15 and q7, not d16 and q8.
Subscribers: kristof.beyls, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68090
llvm-svn: 373228
Summary:
The list of indirect labels should ALWAYS have their blockaddresses as
argument operands to the callbr (but not necessarily the other way
around). Add an invariant that checks this.
The verifier catches a bad test case that was added recently in r368478.
I think that was a simple mistake, and the test was made less strict in
regards to the precise addresses (as those weren't specifically the
point of the test).
This invariant will be used to find a reported bug.
Link: https://www.spinics.net/lists/arm-kernel/msg753473.html
Link: https://github.com/ClangBuiltLinux/linux/issues/649
Reviewers: craig.topper, void, chandlerc
Reviewed By: void
Subscribers: ychen, lebedev.ri, javed.absar, kristof.beyls, hiraditya, llvm-commits, srhines
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67196
llvm-svn: 372923
During the review of D67434, it was recommended to make fmuladd's
behavior more explicit. D67434 depends on this interpretation.
Reviewers: efriedma, jfb, reames, scanon, lebedev.ri, spatel
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D67552
llvm-svn: 372892
The changes here are based on the corresponding diffs for allowing FMF on 'select':
D61917 <https://reviews.llvm.org/D61917>
As discussed there, we want to have fast-math-flags be a property of an FP value
because the alternative (having them on things like fcmp) leads to logical
inconsistency such as:
https://bugs.llvm.org/show_bug.cgi?id=38086
The earlier patch for select made almost no practical difference because most
unoptimized conditional code begins life as a phi (based on what I see in clang).
Similarly, I don't expect this patch to do much on its own either because
SimplifyCFG promptly drops the flags when converting to select on a minimal
example like:
https://bugs.llvm.org/show_bug.cgi?id=39535
But once we have this plumbing in place, we should be able to wire up the FMF
propagation and start solving cases like that.
The change to RecurrenceDescriptor::AddReductionVar() is required to prevent a
regression in a LoopVectorize test. We are intersecting the FMF of any
FPMathOperator there, so if a phi is not properly annotated, new math
instructions may not be either. Once we fix the propagation in SimplifyCFG, it
may be safe to remove that hack.
Differential Revision: https://reviews.llvm.org/D67564
llvm-svn: 372878
The changes here are based on the corresponding diffs for allowing FMF on 'select':
D61917
As discussed there, we want to have fast-math-flags be a property of an FP value
because the alternative (having them on things like fcmp) leads to logical
inconsistency such as:
https://bugs.llvm.org/show_bug.cgi?id=38086
The earlier patch for select made almost no practical difference because most
unoptimized conditional code begins life as a phi (based on what I see in clang).
Similarly, I don't expect this patch to do much on its own either because
SimplifyCFG promptly drops the flags when converting to select on a minimal
example like:
https://bugs.llvm.org/show_bug.cgi?id=39535
But once we have this plumbing in place, we should be able to wire up the FMF
propagation and start solving cases like that.
The change to RecurrenceDescriptor::AddReductionVar() is required to prevent a
regression in a LoopVectorize test. We are intersecting the FMF of any
FPMathOperator there, so if a phi is not properly annotated, new math
instructions may not be either. Once we fix the propagation in SimplifyCFG, it
may be safe to remove that hack.
Differential Revision: https://reviews.llvm.org/D67564
llvm-svn: 372866
Summary:
Adds the following inline asm constraints for SVE:
- Upl: One of the low eight SVE predicate registers, P0 to P7 inclusive
- Upa: SVE predicate register with full range, P0 to P15
Reviewers: t.p.northover, sdesmalen, rovka, momchil.velikov, cameron.mcinally, greened, rengolin
Reviewed By: rovka
Subscribers: javed.absar, tschuett, rkruppe, psnobl, cfe-commits, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66524
llvm-svn: 371967