1. Emit warnings for files without symbols.
2. Add -no_warning_for_no_symbols.
Test plan: make check-all
Differential revision: https://reviews.llvm.org/D95843
The few options are niche. They solved a problem which was traditionally solved
with more shell commands (`llvm-readelf -n` fetches the Build ID. Then
`ln` is used to hard link the file to a directory derived from the Build ID.)
Due to limitation, they are no longer used by Fuchsia and they don't appear to
be used elsewhere (checked with Google Search and Debian Code Search). So delete
them without a transition period.
Announcement: https://lists.llvm.org/pipermail/llvm-dev/2021-February/148446.html
Differential Revision: https://reviews.llvm.org/D96310
This patch adds a new intrinsic experimental.vector.reduce that takes a single
vector and returns a vector of matching type but with the original lane order
reversed. For example:
```
vector.reverse(<A,B,C,D>) ==> <D,C,B,A>
```
The new intrinsic supports fixed and scalable vectors types.
The fixed-width vector relies on shufflevector to maintain existing behaviour.
Scalable vector uses the new ISD node - VECTOR_REVERSE.
This new intrinsic is one of the named shufflevector intrinsics proposed on the
mailing-list in the RFC at [1].
Patch by Paul Walker (@paulwalker-arm).
[1] https://lists.llvm.org/pipermail/llvm-dev/2020-November/146864.html
Differential Revision: https://reviews.llvm.org/D94883
In the past, it was stated in D87994 that it is allowed to dereference a pointer that is partially undefined
if all of its possible representations fit into a dereferenceable range.
The motivation of the direction was to make a range analysis helpful for assuring dereferenceability.
Even if a range analysis concludes that its offset is within bounds, the offset could still be partially undefined; to utilize the range analysis, this relaxation was necessary.
https://groups.google.com/g/llvm-dev/c/2Qk4fOHUoAE/m/KcvYMEgOAgAJ has more context about this.
However, this is currently blocking another optimization, which is annotating the noundef attribute for library functions' arguments. D95122 is the patch.
Currently, there are quite a few library functions which cannot have noundef attached to its pointer argument because it can be transformed from load/store.
For example, MemCpyOpt can convert stores into memset:
```
store p, i32 0
store (p+1), i32 0 // Since currently it is allowed for store to have partially undefined pointer..
->
memset(p, 0, 8) // memset cannot guarantee that its ptr argument is noundef.
```
A bigger problem is that this makes unclear which library functions are allowed to have 'noundef' and which functions aren't (e.g., strlen).
This makes annotating noundef almost impossible for this kind of functions.
This patch proposes that all memory operations should have well-defined pointers.
For memset/memcpy, it is semantically equivalent to running a loop until the size is met (and branching on undef is UB), so the size is also updated to be well-defined.
Strictly speaking, this again violates the implication of dereferenceability from range analysis result.
However, I think this is okay for the following reasons:
1. It seems the existing analyses in the LLVM main repo does not have conflicting implementation with the new proposal.
`isDereferenceableAndAlignedPointer` works only when the GEP offset is constant, and `isDereferenceableAndAlignedInLoop` is also fine.
2. A possible miscompilation happens only when the source has a pointer with a *partially* undefined offset (it's okay with poison because there is no 'partially poison' value).
But, at least I'm not aware of a language using LLVM as backend that has a well-defined program while allowing partially undefined pointers.
There might be such a language that I'm not aware of, but improving the performance of the mainstream languages like C and Rust is more important IMHO.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D95238
explicitly emitting retainRV or claimRV calls in the IR
Background:
This fixes a longstanding problem where llvm breaks ARC's autorelease
optimization (see the link below) by separating calls from the marker
instructions or retainRV/claimRV calls. The backend changes are in
https://reviews.llvm.org/D92569.
https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue
What this patch does to fix the problem:
- The front-end adds operand bundle "clang.arc.attachedcall" to calls,
which indicates the call is implicitly followed by a marker
instruction and an implicit retainRV/claimRV call that consumes the
call result. In addition, it emits a call to
@llvm.objc.clang.arc.noop.use, which consumes the call result, to
prevent the middle-end passes from changing the return type of the
called function. This is currently done only when the target is arm64
and the optimization level is higher than -O0.
- ARC optimizer temporarily emits retainRV/claimRV calls after the calls
with the operand bundle in the IR and removes the inserted calls after
processing the function.
- ARC contract pass emits retainRV/claimRV calls after the call with the
operand bundle. It doesn't remove the operand bundle on the call since
the backend needs it to emit the marker instruction. The retainRV and
claimRV calls are emitted late in the pipeline to prevent optimization
passes from transforming the IR in a way that makes it harder for the
ARC middle-end passes to figure out the def-use relationship between
the call and the retainRV/claimRV calls (which is the cause of
PR31925).
- The function inliner removes an autoreleaseRV call in the callee if
nothing in the callee prevents it from being paired up with the
retainRV/claimRV call in the caller. It then inserts a release call if
claimRV is attached to the call since autoreleaseRV+claimRV is
equivalent to a release. If it cannot find an autoreleaseRV call, it
tries to transfer the operand bundle to a function call in the callee.
This is important since the ARC optimizer can remove the autoreleaseRV
returning the callee result, which makes it impossible to pair it up
with the retainRV/claimRV call in the caller. If that fails, it simply
emits a retain call in the IR if retainRV is attached to the call and
does nothing if claimRV is attached to it.
- SCCP refrains from replacing the return value of a call with a
constant value if the call has the operand bundle. This ensures the
call always has at least one user (the call to
@llvm.objc.clang.arc.noop.use).
- This patch also fixes a bug in replaceUsesOfNonProtoConstant where
multiple operand bundles of the same kind were being added to a call.
Future work:
- Use the operand bundle on x86-64.
- Fix the auto upgrader to convert call+retainRV/claimRV pairs into
calls with the operand bundles.
rdar://71443534
Differential Revision: https://reviews.llvm.org/D92808
Before, the first mention of LLVM's license on the developer policy page stated
that LLVM's license is Apache 2. This patch makes that more accurate by
mentioning the LLVM exception this first time the LLVM license is discussed on
that page, i.e. Apache-2.0 with LLVM-exception.
Technically, the correct SPDX identifier for LLVM's license is 'Apache-2.0 WITH
LLVM-exception', but I thought that writing the 'WITH' in lower case made the
paragraph easier to read without reducing clarity.
Differential Revision: https://reviews.llvm.org/D96482
This is a follow up patch to D83136 adding the align attribute to `cmpxchg`.
See also D83465 for `atomicrmw`.
Differential Revision: https://reviews.llvm.org/D87443
emitting retainRV or claimRV calls in the IR
This reapplies 3fe3946d9a without the
changes made to lib/IR/AutoUpgrade.cpp, which was violating layering.
Original commit message:
Background:
This patch makes changes to the front-end and middle-end that are
needed to fix a longstanding problem where llvm breaks ARC's autorelease
optimization (see the link below) by separating calls from the marker
instructions or retainRV/claimRV calls. The backend changes are in
https://reviews.llvm.org/D92569.
https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue
What this patch does to fix the problem:
- The front-end adds operand bundle "clang.arc.rv" to calls, which
indicates the call is implicitly followed by a marker instruction and
an implicit retainRV/claimRV call that consumes the call result. In
addition, it emits a call to @llvm.objc.clang.arc.noop.use, which
consumes the call result, to prevent the middle-end passes from changing
the return type of the called function. This is currently done only when
the target is arm64 and the optimization level is higher than -O0.
- ARC optimizer temporarily emits retainRV/claimRV calls after the calls
with the operand bundle in the IR and removes the inserted calls after
processing the function.
- ARC contract pass emits retainRV/claimRV calls after the call with the
operand bundle. It doesn't remove the operand bundle on the call since
the backend needs it to emit the marker instruction. The retainRV and
claimRV calls are emitted late in the pipeline to prevent optimization
passes from transforming the IR in a way that makes it harder for the
ARC middle-end passes to figure out the def-use relationship between
the call and the retainRV/claimRV calls (which is the cause of
PR31925).
- The function inliner removes an autoreleaseRV call in the callee if
nothing in the callee prevents it from being paired up with the
retainRV/claimRV call in the caller. It then inserts a release call if
the call is annotated with claimRV since autoreleaseRV+claimRV is
equivalent to a release. If it cannot find an autoreleaseRV call, it
tries to transfer the operand bundle to a function call in the callee.
This is important since ARC optimizer can remove the autoreleaseRV
returning the callee result, which makes it impossible to pair it up
with the retainRV/claimRV call in the caller. If that fails, it simply
emits a retain call in the IR if the implicit call is a call to
retainRV and does nothing if it's a call to claimRV.
Future work:
- Use the operand bundle on x86-64.
- Fix the auto upgrader to convert call+retainRV/claimRV pairs into
calls annotated with the operand bundles.
rdar://71443534
Differential Revision: https://reviews.llvm.org/D92808
emitting retainRV or claimRV calls in the IR
Background:
This patch makes changes to the front-end and middle-end that are
needed to fix a longstanding problem where llvm breaks ARC's autorelease
optimization (see the link below) by separating calls from the marker
instructions or retainRV/claimRV calls. The backend changes are in
https://reviews.llvm.org/D92569.
https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue
What this patch does to fix the problem:
- The front-end adds operand bundle "clang.arc.rv" to calls, which
indicates the call is implicitly followed by a marker instruction and
an implicit retainRV/claimRV call that consumes the call result. In
addition, it emits a call to @llvm.objc.clang.arc.noop.use, which
consumes the call result, to prevent the middle-end passes from changing
the return type of the called function. This is currently done only when
the target is arm64 and the optimization level is higher than -O0.
- ARC optimizer temporarily emits retainRV/claimRV calls after the calls
with the operand bundle in the IR and removes the inserted calls after
processing the function.
- ARC contract pass emits retainRV/claimRV calls after the call with the
operand bundle. It doesn't remove the operand bundle on the call since
the backend needs it to emit the marker instruction. The retainRV and
claimRV calls are emitted late in the pipeline to prevent optimization
passes from transforming the IR in a way that makes it harder for the
ARC middle-end passes to figure out the def-use relationship between
the call and the retainRV/claimRV calls (which is the cause of
PR31925).
- The function inliner removes an autoreleaseRV call in the callee if
nothing in the callee prevents it from being paired up with the
retainRV/claimRV call in the caller. It then inserts a release call if
the call is annotated with claimRV since autoreleaseRV+claimRV is
equivalent to a release. If it cannot find an autoreleaseRV call, it
tries to transfer the operand bundle to a function call in the callee.
This is important since ARC optimizer can remove the autoreleaseRV
returning the callee result, which makes it impossible to pair it up
with the retainRV/claimRV call in the caller. If that fails, it simply
emits a retain call in the IR if the implicit call is a call to
retainRV and does nothing if it's a call to claimRV.
Future work:
- Use the operand bundle on x86-64.
- Fix the auto upgrader to convert call+retainRV/claimRV pairs into
calls annotated with the operand bundles.
rdar://71443534
Differential Revision: https://reviews.llvm.org/D92808
Based on the comments in the code, the idea is that AsmPrinter is
unable to produce entry value blocks of arbitrary length, such as
DW_OP_entry_value [DW_OP_reg5 DW_OP_lit1 DW_OP_plus]. But the way the
Verifier check is written it also disallows DW_OP_entry_value
[DW_OP_reg5] DW_OP_lit1 DW_OP_plus which seems to overshoot the
target.
Note that this patch does not change any of the safety guards in
LiveDebugValues — there is zero behavior change for clang. It just
allows us to legalize more complex expressions in future patches.
rdar://73907559
Differential Revision: https://reviews.llvm.org/D95990
On z/OS, other error messages are not matched correctly in lit tests.
```
EDC5121I Invalid argument.
EDC5111I Permission denied.
```
This patch adds a lit substitution to fix it.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D95808
With the new PM imminent, bugpoint will diverge from opt, meaning it may
not reproduce a crash with the same arguments passed to opt. We need to
specify alternatives to bugpoint for reducing crashes.
I looked at the rest of the document to see if anything could be
improved. Major highlights:
* Run -Xclang -disable-llvm-passes instead of -O0 for skipping IR passes
* Mention the files that clang dumps on a crash
* Remove outdated reference to `delta` and plug `creduce` instead
* Mention llvm-reduce on top of bugpoint
* Mention --print-before-all --print-module-scope
* Mention sanitizers in addition to valgrind
* Mention opt-bisect for miscompiles
Reviewed By: fhahn, MaskRay
Differential Revision: https://reviews.llvm.org/D95578
So far, it was not specified what happens with the VGPRs of inactive
lanes when functions are called. This patch explicitely mentions that
the VGPR values of inactive lanes need to be preserved for all
registers.
This describes the current behavior, as only active lanes of registers
are saved to scratch. Also, as the multi-lane nature of VGPRs is not
properly modeled, we cannot determine the live VGPRs from inactive lanes
at calls. So we cannot save them, even if we intended to do so.
Differential Revision: https://reviews.llvm.org/D95610
To set non-default rounding mode user usually calls function 'fesetround'
from standard C library. This way has some disadvantages.
* It creates unnecessary dependency on libc. On the other hand, setting
rounding mode requires few instructions and could be made by compiler.
Sometimes standard C library even is not available, like in the case of
GPU or AI cores that execute small kernels.
* Compiler could generate more effective code if it knows that a particular
call just sets rounding mode.
This change introduces new IR intrinsic, namely 'llvm.set.rounding', which
sets current rounding mode, similar to 'fesetround'. It however differs
from the latter, because it is a lower level facility:
* 'llvm.set.rounding' does not return any value, whereas 'fesetround'
returns non-zero value in the case of failure. In glibc 'fesetround'
reports failure if its argument is invalid or unsupported or if floating
point operations are unavailable on the hardware. Compiler usually knows
what core it generates code for and it can validate arguments in many
cases.
* Rounding mode is specified in 'fesetround' using constants like
'FE_TONEAREST', which are target dependent. It is inconvenient to work
with such constants at IR level.
C standard provides a target-independent way to specify rounding mode, it
is used in FLT_ROUNDS, however it does not define standard way to set
rounding mode using this encoding.
This change implements only IR intrinsic. Lowering it to machine code is
target-specific and will be implemented latter. Mapping of 'fesetround'
to 'llvm.set.rounding' is also not implemented here.
Differential Revision: https://reviews.llvm.org/D74729
On z/OS, the following error message is not matched correctly in lit tests.
```
EDC5129I No such file or directory.
```
This patch uses a lit config substitution to check for platform specific error messages.
Reviewed By: muiez, jhenderson
Differential Revision: https://reviews.llvm.org/D95246
Add LLVM to the DW_CFA_LLVM_def_aspace_cfa and
DW_CFA_LLVM_def_aspace_cfa_sf DWARF extensions.
Reviewed By: scott.linder
Differential Revision: https://reviews.llvm.org/D95640
This adds a generic opcode which communicates that a type has already been
zero-extended from a narrower type.
This is intended to be similar to AssertZext in SelectionDAG.
For example,
```
%x_was_extended:_(s64) = G_ASSERT_ZEXT %x, 16
```
Signifies that the top 48 bits of %x are known to be 0.
This is useful in cases like this:
```
define i1 @zeroext_param(i8 zeroext %x) {
%cmp = icmp ult i8 %x, -20
ret i1 %cmp
}
```
In AArch64, `%x` must use a 32-bit register, which is then truncated to a 8-bit
value.
If we know that `%x` is already zero-ed out in the relevant high bits, we can
avoid the truncate.
Currently, in GISel, this looks like this:
```
_zeroext_param:
and w8, w0, #0xff ; We don't actually need this!
cmp w8, #236
cset w0, lo
ret
```
While SDAG does not produce the truncation, since it knows that it's
unnecessary:
```
_zeroext_param:
cmp w0, #236
cset w0, lo
ret
```
This patch
- Adds G_ASSERT_ZEXT
- Adds MIRBuilder support for it
- Adds MachineVerifier support for it
- Documents it
It also puts G_ASSERT_ZEXT into its own class of "hint instruction." (There
should be a G_ASSERT_SEXT in the future, maybe a G_ASSERT_ALIGN as well.)
This allows us to skip over hints in the legalizer etc. These can then later
be selected like COPY instructions or removed.
Differential Revision: https://reviews.llvm.org/D95564
... and similarly for some other cases. This is for consistency and to
make it easier to search for mentions of a particular architecture.
Differential Revision: https://reviews.llvm.org/D95453
Commit be9f322e8dc530a56f03356aad31fa9031b27e26 moved the list of workers from
slaves.py to workers.py, but the documentation in "How To Add A Builder" was
never updated and now references a non-existing file. This fixes that.
Reviewed By: gkistanova
Differential Revision: https://reviews.llvm.org/D94886
The documentation for contributing to LLVM currently links to the section
explaining how to submit a Phabricator review using the web interface.
I believe it would be better to link to the general page for using
Phabricator instead, which explains how to sign up with Phabricator,
and also how to submit patches using either the web interface or the
command-line.
I think this is worth changing because what currently *appears* to be our
preferred way of submitting a patch (through the web interface) isn't
actually what we prefer. Indeed, patches submitted from the command-line
have more meta-data available (such as which repository the patch targets),
and also can't suffer from missing context.
Differential Revision: https://reviews.llvm.org/D94929
This puts it in alphabetical order, matching the rest of the list.
Reviewed by: MaskRay, saugustine
Differential Revision: https://reviews.llvm.org/D94481
Currently LLVM is relying on ValueTracking's `isKnownNonZero` to attach `nonnull`, which can return true when the value is poison.
To make the semantics of `nonnull` consistent with the behavior of `isKnownNonZero`, this makes the semantics of `nonnull` to accept poison, and return poison if the input pointer isn't null.
This makes many transformations like below legal:
```
%p = gep inbounds %x, 1 ; % p is non-null pointer or poison
call void @f(%p) ; instcombine converts this to call void @f(nonnull %p)
```
Instead, this semantics makes propagation of `nonnull` to caller illegal.
The reason is that, passing poison to `nonnull` does not immediately raise UB anymore, so such program is still well defined, if the callee does not use the argument.
Having `noundef` attribute there re-allows this.
```
define void @f(i8* %p) { ; functionattr cannot mark %p nonnull here anymore
call void @g(i8* nonnull %p) ; .. because @g never raises UB if it never uses %p.
ret void
}
```
Another attribute that needs to be updated is `align`. This patch updates the semantics of align to accept poison as well.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D90529
The attributes in the example are placed wrong:
They belong after the type, not after the parameter name.
Reviewed by: abhina.sreeskantharajan
Differential Revision: https://reviews.llvm.org/D94683
RISC-V would like to use a struct of scalable vectors to return multiple
values from intrinsics. This woud also be needed for target independent
intrinsics like llvm.sadd.overflow.
This patch removes the existing restriction for this. I've modified
StructType::isSized to consider a struct containing scalable vectors
as unsized so the verifier won't allow loads/stores/allocas of these
structs.
Reviewed By: sdesmalen
Differential Revision: https://reviews.llvm.org/D94142
The ``llvm.experimental.noalias.scope.decl`` intrinsic identifies where a noalias
scope is declared. When the intrinsic is duplicated, a decision must
also be made about the scope: depending on the reason of the duplication,
the scope might need to be duplicated as well.
Reviewed By: nikic, jdoerfert
Differential Revision: https://reviews.llvm.org/D93039
New dwarf operator DW_OP_LLVM_implicit_pointer is introduced (present only in LLVM IR)
This operator is required as it is different than DWARF operator
DW_OP_implicit_pointer in representation and specification (number
and types of operands) and later can not be used as multiple level.
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D84113
Generalize the documentation to include both, GDB and LLDB. Add a link to the interface
definition. Make a note on MCJIT's restriction to ELF. Mention the regression and bugfix
in LLDB as well as the jit-loader setting for macOS. Update the command line session to
use LLDB instead of GDB.
Reviewed By: lhames
Differential Revision: https://reviews.llvm.org/D90789
This is a small patch stating that a nocapture pointer cannot be returned.
Discussed in D93189.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D94386
Reorder the AMDGPUUage description of the memory model code sequences
for volatile so clear that it applies independent of the nontemporal
setting.
Differential Revision: https://reviews.llvm.org/D94358
Currently SimplifyCFG drops the debug locations of 'bonus' instructions.
Such instructions are moved before the first branch. The reason for the
current behavior is that this could lead to surprising debug stepping,
if the block that's folded is dead.
In case the first branch and the instructions to be folded have the same
debug location, this shouldn't be an issue and we can keep the debug
location.
Reviewed By: vsk
Differential Revision: https://reviews.llvm.org/D93662
Treat a non-atomic volatile load and store as a relaxed atomic at
system scope for the address spaces accessed. This will ensure all
relevant caches will be bypassed.
A volatile atomic is not changed and still only bypasses caches upto
the level specified by the SyncScope operand.
Differential Revision: https://reviews.llvm.org/D94214
Differential Revision: https://reviews.llvm.org/D93911
This first step adds the assert statement and supports it at top level
and in record definitions. Later steps will support it in class
definitions and multiclasses.
Several `#if SANITIZER_LINUX && !SANITIZER_ANDROID` guards are replaced
with the more appropriate `#if SANITIZER_GLIBC` (the headers are glibc
extensions, not specific to Linux (i.e. if we ever support GNU/kFreeBSD
or Hurd, the guards may automatically work)).
Several `#if SANITIZER_LINUX && !SANITIZER_ANDROID` guards are refined
with `#if SANITIZER_GLIBC` (the definitions are available on Linux glibc,
but may not be available on other libc (e.g. musl) implementations).
This patch makes `ninja asan cfi lsan msan stats tsan ubsan xray` build on a musl based Linux distribution (apk install musl-libintl)
Notes about disabled interceptors for musl:
* `SANITIZER_INTERCEPT_GLOB`: musl does not implement `GLOB_ALTDIRFUNC` (GNU extension)
* Some ioctl structs and functions operating on them.
* `SANITIZER_INTERCEPT___PRINTF_CHK`: `_FORTIFY_SOURCE` functions are GNU extension
* `SANITIZER_INTERCEPT___STRNDUP`: `dlsym(RTLD_NEXT, "__strndup")` errors so a diagnostic is formed. The diagnostic uses `write` which hasn't been intercepted => SIGSEGV
* `SANITIZER_INTERCEPT_*64`: the `_LARGEFILE64_SOURCE` functions are glibc specific. musl does something like `#define pread64 pread`
* Disabled `msg_iovlen msg_controllen cmsg_len` checks: musl is conforming while many implementations (Linux/FreeBSD/NetBSD/Solaris) are non-conforming. Since we pick the glibc definition, exclude the checks for musl (incompatible sizes but compatible offsets)
Pass through LIBCXX_HAS_MUSL_LIBC to make check-msan/check-tsan able to build libc++ (https://bugs.llvm.org/show_bug.cgi?id=48618).
Many sanitizer features are available now.
```
% ninja check-asan
(known issues:
* ASAN_OPTIONS=fast_unwind_on_malloc=0 odr-violations hangs
)
...
Testing Time: 53.69s
Unsupported : 185
Passed : 512
Expectedly Failed: 1
Failed : 12
% ninja check-ubsan check-ubsan-minimal check-memprof # all passed
% ninja check-cfi
( all cross-dso/)
...
Testing Time: 8.68s
Unsupported : 264
Passed : 80
Expectedly Failed: 8
Failed : 32
% ninja check-lsan
(With GetTls (D93972), 10 failures)
Testing Time: 4.09s
Unsupported: 7
Passed : 65
Failed : 22
% ninja check-msan
(Many are due to functions not marked unsupported.)
Testing Time: 23.09s
Unsupported : 6
Passed : 764
Expectedly Failed: 2
Failed : 58
% ninja check-tsan
Testing Time: 23.21s
Unsupported : 86
Passed : 295
Expectedly Failed: 1
Failed : 25
```
Used `ASAN_OPTIONS=verbosity=2` to verify there is no unneeded interceptor.
Partly based on Jari Ronkainen's https://reviews.llvm.org/D63785#1921014
Note: we need to place `_FILE_OFFSET_BITS` above `#include "sanitizer_platform.h"` to avoid `#define __USE_FILE_OFFSET64 1` in 32-bit ARM `features.h`
Reviewed By: vitalybuka
Differential Revision: https://reviews.llvm.org/D93848
This is an enhancement to LLVM Source-Based Code Coverage in clang to track how
many times individual branch-generating conditions are taken (evaluate to TRUE)
and not taken (evaluate to FALSE). Individual conditions may comprise larger
boolean expressions using boolean logical operators. This functionality is
very similar to what is supported by GCOV except that it is very closely
anchored to the ASTs.
Differential Revision: https://reviews.llvm.org/D84467
Currently, the compiler crashes in instruction selection of global
load/stores in gfx600 due to the lack of FLAT instructions. This patch
fix the crash by selecting MUBUF instructions for global load/stores
in gfx600.
Authored-by: Praveen Velliengiri <Praveen.Velliengiri@amd.com>
Reviewed by: t-tye
Differential revision: https://reviews.llvm.org/D92483
Update the documentation and add a test.
Build failed: Change SIZE_MAX to std::numeric_limits<int64_t>::max().
Differential Revision: https://reviews.llvm.org/D93419
The llvm.coro.end.async intrinsic allows to specify a function that is
to be called as the last action before returning. This function will be
inlined after coroutine splitting.
This function can contain a 'musttail' call to allow for guaranteed tail
calling as the last action.
Differential Revision: https://reviews.llvm.org/D93568
Change Summary:
* Clarify that release manager can commit without code owner approval
(but are still highly encouraged to get approval).
* Clarify that there is no official release criteria.
* Document what types of changes are allowed in each release phase.
This is update is based on the RFC submitted here:
http://lists.llvm.org/pipermail/llvm-dev/2020-May/141730.html
Reviewed By: hans
Differential Revision: https://reviews.llvm.org/D93493
Introduce CHECK modifiers that change the behavior of the CHECK
directive. Also add a LITERAL modifier for cases where matching could
end requiring escaping strings interpreted as regex where only
literal/fixed string matching is desired (making the CHECK's more
difficult to write/fragile and difficult to interpret).
This patch adds support for the fptoui.sat and fptosi.sat intrinsics,
which provide basically the same functionality as the existing fptoui
and fptosi instructions, but will saturate (or return 0 for NaN) on
values unrepresentable in the target type, instead of returning
poison. Related mailing list discussion can be found at:
https://groups.google.com/d/msg/llvm-dev/cgDFaBmCnDQ/CZAIMj4IBAAJ
The intrinsics have overloaded source and result type and support
vector operands:
i32 @llvm.fptoui.sat.i32.f32(float %f)
i100 @llvm.fptoui.sat.i100.f64(double %f)
<4 x i32> @llvm.fptoui.sat.v4i32.v4f16(half %f)
// etc
On the SelectionDAG layer two new ISD opcodes are added,
FP_TO_UINT_SAT and FP_TO_SINT_SAT. These opcodes have two operands
and one result. The second operand is an integer constant specifying
the scalar saturation width. The idea here is that initially the
second operand and the scalar width of the result type are the same,
but they may change during type legalization. For example:
i19 @llvm.fptsi.sat.i19.f32(float %f)
// builds
i19 fp_to_sint_sat f, 19
// type legalizes (through integer result promotion)
i32 fp_to_sint_sat f, 19
I went for this approach, because saturated conversion does not
compose well. There is no good way of "adjusting" a saturating
conversion to i32 into one to i19 short of saturating twice.
Specifying the saturation width separately allows directly saturating
to the correct width.
There are two baseline expansions for the fp_to_xint_sat opcodes. If
the integer bounds can be exactly represented in the float type and
fminnum/fmaxnum are legal, we can expand to something like:
f = fmaxnum f, FP(MIN)
f = fminnum f, FP(MAX)
i = fptoxi f
i = select f uo f, 0, i # unnecessary if unsigned as 0 = MIN
If the bounds cannot be exactly represented, we expand to something
like this instead:
i = fptoxi f
i = select f ult FP(MIN), MIN, i
i = select f ogt FP(MAX), MAX, i
i = select f uo f, 0, i # unnecessary if unsigned as 0 = MIN
It should be noted that this expansion assumes a non-trapping fptoxi.
Initial tests are for AArch64, x86_64 and ARM. This exercises all of
the scalar and vector legalization. ARM is included to test float
softening.
Original patch by @nikic and @ebevhan (based on D54696).
Differential Revision: https://reviews.llvm.org/D54749
Clang FE currently has hot/cold function attribute. But we only have
cold function attribute in LLVM IR.
This patch adds support of hot function attribute to LLVM IR. This
attribute will be used in setting function section prefix/suffix.
Currently .hot and .unlikely suffix only are added in PGO (Sample PGO)
compilation (through isFunctionHotInCallGraph and
isFunctionColdInCallGraph).
This patch changes the behavior. The new behavior is:
(1) If the user annotates a function as hot or isFunctionHotInCallGraph
is true, this function will be marked as hot. Otherwise,
(2) If the user annotates a function as cold or
isFunctionColdInCallGraph is true, this function will be marked as
cold.
The changes are:
(1) user annotated function attribute will used in setting function
section prefix/suffix.
(2) hot attribute overwrites profile count based hotness.
(3) profile count based hotness overwrite user annotated cold attribute.
The intention for these changes is to provide the user a way to mark
certain function as hot in cases where training input is hard to cover
all the hot functions.
Differential Revision: https://reviews.llvm.org/D92493
[amdgpu] Default to code object v3
v4 is not yet readily available, and doesn't appear
to be implemented in the back end
Reviewed By: t-tye, yaxunl
Differential Revision: https://reviews.llvm.org/D93258
Add mir-check-debug pass to check MIR-level debug info.
For IR-level, currently, LLVM have debugify + check-debugify to generate
and check debug IR. Much like the IR-level pass debugify, mir-debugify
inserts sequentially increasing line locations to each MachineInstr in a
Module, But there is no equivalent MIR-level check-debugify pass, So now
we support it at "mir-check-debug".
Reviewed By: djtodoro
Differential Revision: https://reviews.llvm.org/D91595
Add mir-check-debug pass to check MIR-level debug info.
For IR-level, currently, LLVM have debugify + check-debugify to generate
and check debug IR. Much like the IR-level pass debugify, mir-debugify
inserts sequentially increasing line locations to each MachineInstr in a
Module, But there is no equivalent MIR-level check-debugify pass, So now
we support it at "mir-check-debug".
Reviewed By: djtodoro
Differential Revision: https://reviews.llvm.org/D91595
The Linux/SystemZ platform is missing in the Getting Started guide
as platform on which LLVM is known to work.
Reviewed by: uweigand
Differential Revision: https://reviews.llvm.org/D93388
- Clarify documentation on initializing scratch.
- Rename compute_pgm_rsrc2 field for enabling scratch from
ENABLE_SGPR_PRIVATE_SEGMENT_WAVEFRONT_OFFSET to
ENABLE_PRIVATE_SEGMENT to match hardware definition.
Differential Revision: https://reviews.llvm.org/D93271
Add mir-check-debug pass to check MIR-level debug info.
For IR-level, currently, LLVM have debugify + check-debugify to generate
and check debug IR. Much like the IR-level pass debugify, mir-debugify
inserts sequentially increasing line locations to each MachineInstr in a
Module, But there is no equivalent MIR-level check-debugify pass, So now
we support it at "mir-check-debug".
Reviewed By: djtodoro
Differential Revision: https://reviews.llvm.org/D91595
Add mir-check-debug pass to check MIR-level debug info.
For IR-level, currently, LLVM have debugify + check-debugify to generate
and check debug IR. Much like the IR-level pass debugify, mir-debugify
inserts sequentially increasing line locations to each MachineInstr in a
Module, But there is no equivalent MIR-level check-debugify pass, So now
we support it at "mir-check-debug".
Reviewed By: djtodoro
Differential Revision: https://reviews.llvm.org/D95195
[amdgpu] Default to code object v3
v4 is not yet readily available, and doesn't appear
to be implemented in the back end
Reviewed By: t-tye
Differential Revision: https://reviews.llvm.org/D93258
- Document which processors are supported by which runtimes.
- Add missing mappings for code object V2 note records
Differential Revision: https://reviews.llvm.org/D93016
This commit adds two new intrinsics.
- llvm.experimental.vector.insert: used to insert a vector into another
vector starting at a given index.
- llvm.experimental.vector.extract: used to extract a subvector from a
larger vector starting from a given index.
The codegen work for these intrinsics has already been completed; this
commit is simply exposing the existing ISD nodes to LLVM IR.
Reviewed By: cameron.mcinally
Differential Revision: https://reviews.llvm.org/D91362
This patch documents the MIR syntax for a number of things relevant to
debugging information:
* Trailing 'debug-location' metadata that becomes a DebugLoc,
* Variable location metadata for stack slots,
* Syntax for DBG_VALUE metainstructions,
* Syntax for DBG_INSTR_REF, including trailing instruction numbers
attached to MIR instructions.
Differential Revision: https://reviews.llvm.org/D89337
Sometimes people get minimal crash reports after a UBSAN incident. This change
tags each trap with an integer representing the kind of failure encountered,
which can aid in tracking down the root cause of the problem.
This stack of changes introduces `llvm-profgen` utility which generates a profile data file from given perf script data files for sample-based PGO. It’s part of(not only) the CSSPGO work. Specifically to support context-sensitive with/without pseudo probe profile, it implements a series of functionalities including perf trace parsing, instruction symbolization, LBR stack/call frame stack unwinding, pseudo probe decoding, etc. Also high throughput is achieved by multiple levels of sample aggregation and compatible format with one stop is generated at the end. Please refer to: https://groups.google.com/g/llvm-dev/c/1p1rdYbL93s for the CSSPGO RFC.
This change supports context-sensitive profile data generation into llvm-profgen. With simultaneous sampling for LBR and call stack, we can identify leaf of LBR sample with calling context from stack sample . During the process of deriving fall through path from LBR entries, we unwind LBR by replaying all the calls and returns (including implicit calls/returns due to inlining) backwards on top of the sampled call stack. Then the state of call stack as we unwind through LBR always represents the calling context of current fall through path.
we have two types of virtual unwinding 1) LBR unwinding and 2) linear range unwinding.
Specifically, for each LBR entry which can be classified into call, return, regular branch, LBR unwinding will replay the operation by pushing, popping or switching leaf frame towards the call stack and since the initial call stack is most recently sampled, the replay should be in anti-execution order, i.e. for the regular case, pop the call stack when LBR is call, push frame on call stack when LBR is return. After each LBR processed, it also needs to align with the next LBR by going through instructions from previous LBR's target to current LBR's source, which we named linear unwinding. As instruction from linear range can come from different function by inlining, linear unwinding will do the range splitting and record counters through the range with same inline context.
With each fall through path from LBR unwinding, we aggregate each sample into counters by the calling context and eventually generate full context sensitive profile (without relying on inlining) to driver compiler's PGO/FDO.
A breakdown of noteworthy changes:
- Added `HybridSample` class as the abstraction perf sample including LBR stack and call stack
* Extended `PerfReader` to implement auto-detect whether input perf script output contains CS profile, then do the parsing. Multiple `HybridSample` are extracted
* Speed up by aggregating `HybridSample` into `AggregatedSamples`
* Added VirtualUnwinder that consumes aggregated `HybridSample` and implements unwinding of calls, returns, and linear path that contains implicit call/return from inlining. Ranges and branches counters are aggregated by the calling context. Here calling context is string type, each context is a pair of function name and callsite location info, the whole context is like `main:1 @ foo:2 @ bar`.
* Added PorfileGenerater that accumulates counters by ranges unfolding or branch target mapping, then generates context-sensitive function profile including function body, inferring callee's head sample, callsite target samples, eventually records into ProfileMap.
* Leveraged LLVM build-in(`SampleProfWriter`) writer to support different serialization format with no stop
- `getCanonicalFnName` for callee name and name from ELF section
- Added regression test for both unwinding and profile generation
Test Plan:
ninja & ninja check-llvm
Reviewed By: hoy, wenlei, wmi
Differential Revision: https://reviews.llvm.org/D89723
This patch adds a capability to SmallVector to decide a number of
inlined elements automatically. The policy is:
- A minimum of 1 inlined elements, with more as long as
sizeof(SmallVector<T>) <= 64.
- If sizeof(T) is "too big", then trigger a static_assert: this dodges
the more pathological cases
This is expected to systematically improve SmallVector use in the
LLVM codebase, which has historically been plagued by semi-arbitrary /
cargo culted N parameters, often leading to bad outcomes due to
excessive sizeof(SmallVector<T, N>). This default also makes
programming more convenient by avoiding edit/rebuild cycles due to
forgetting to type the N parameter.
Differential Revision: https://reviews.llvm.org/D92522
Revert "Delete llvm::is_trivially_copyable and CMake variable HAVE_STD_IS_TRIVIALLY_COPYABLE"
This reverts commit 4d4bd40b57.
This reverts commit 557b00e0af.
This is yet another attempt at providing support for epilogue
vectorization following discussions raised in RFC http://llvm.1065342.n5.nabble.com/llvm-dev-Proposal-RFC-Epilog-loop-vectorization-tt106322.html#none
and reviews D30247 and D88819.
Similar to D88819, this patch achieve epilogue vectorization by
executing a single vplan twice: once on the main loop and a second
time on the epilogue loop (using a different VF). However it's able
to handle more loops, and generates more optimal control flow for
cases where the trip count is too small to execute any code in vector
form.
Reviewed By: SjoerdMeijer
Differential Revision: https://reviews.llvm.org/D89566
In this patch I have added support for a new loop hint called
vectorize.scalable.enable that says whether we should enable scalable
vectorization or not. If a user wants to instruct the compiler to
vectorize a loop with scalable vectors they can now do this as
follows:
br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !2
...
!2 = !{!2, !3, !4}
!3 = !{!"llvm.loop.vectorize.width", i32 8}
!4 = !{!"llvm.loop.vectorize.scalable.enable", i1 true}
Setting the hint to false simply reverts the behaviour back to the
default, using fixed width vectors.
Differential Revision: https://reviews.llvm.org/D88962
This is yet another attempt at providing support for epilogue
vectorization following discussions raised in RFC http://llvm.1065342.n5.nabble.com/llvm-dev-Proposal-RFC-Epilog-loop-vectorization-tt106322.html#none
and reviews D30247 and D88819.
Similar to D88819, this patch achieve epilogue vectorization by
executing a single vplan twice: once on the main loop and a second
time on the epilogue loop (using a different VF). However it's able
to handle more loops, and generates more optimal control flow for
cases where the trip count is too small to execute any code in vector
form.
Reviewed By: SjoerdMeijer
Differential Revision: https://reviews.llvm.org/D89566
This does the same as `--mcpu=help` but was only
documented in the user guide.
* Added a test for both options.
* Corrected the single dash in `-mcpu=help` text.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D92305
llvm-symbolizer used to use the DIA SDK for symbolization on
Windows; this patch switches to using native symbolization, which was
implemented recently.
Users can still make the symbolizer use DIA by adding the `-dia` flag
in the LLVM_SYMBOLIZER_OPTS environment variable.
Differential Revision: https://reviews.llvm.org/D91814
- Document that the kernel descriptor defined is for code object V3.
Document that it also applies to earlier code object formats for CP.
- Document the deprecated bits in kernel descriptor.
Differential Revision: https://reviews.llvm.org/D91458
This stack of changes introduces `llvm-profgen` utility which generates a profile data file from given perf script data files for sample-based PGO. It’s part of(not only) the CSSPGO work. Specifically to support context-sensitive with/without pseudo probe profile, it implements a series of functionalities including perf trace parsing, instruction symbolization, LBR stack/call frame stack unwinding, pseudo probe decoding, etc. Also high throughput is achieved by multiple levels of sample aggregation and compatible format with one stop is generated at the end. Please refer to: https://groups.google.com/g/llvm-dev/c/1p1rdYbL93s for the CSSPGO RFC.
This change enables disassembling the text sections to build various address maps that are potentially used by the virtual unwinder. A switch `--show-disassembly` is being added to print the disassembly code.
Like the llvm-objdump tool, this change leverages existing LLVM components to parse and disassemble ELF binary files. So far X86 is supported.
Test Plan:
ninja check-llvm
Reviewed By: wmi, wenlei
Differential Revision: https://reviews.llvm.org/D89712
This stack of changes introduces `llvm-profgen` utility which generates a profile data file from given perf script data files for sample-based PGO. It’s part of(not only) the CSSPGO work. Specifically to support context-sensitive with/without pseudo probe profile, it implements a series of functionalities including perf trace parsing, instruction symbolization, LBR stack/call frame stack unwinding, pseudo probe decoding, etc. Also high throughput is achieved by multiple levels of sample aggregation and compatible format with one stop is generated at the end. Please refer to: https://groups.google.com/g/llvm-dev/c/1p1rdYbL93s for the CSSPGO RFC.
As a starter, this change sets up an entry point by introducing PerfReader to load profiled binaries and perf traces(including perf events and perf samples). For the event, here it parses the mmap2 events from perf script to build the loader snaps, which is used to retrieve the image load address in the subsequent perf tracing parsing.
As described in llvm-profgen.rst, the tool being built aims to support multiple input perf data (preprocessed by perf script) as well as multiple input binary images. It should also support dynamic reload/unload shared objects by leveraging the loader snaps being built by this change
Reviewed By: wenlei, wmi
Differential Revision: https://reviews.llvm.org/D89707
This is similar to the existing alloca and program address spaces (D37052)
and should be used when creating/accessing global variables.
We need this in our CHERI fork of LLVM to place all globals in address space 200.
This ensures that values are accessed using CHERI load/store instructions
instead of the normal MIPS/RISC-V ones.
The problem this is trying to fix is that most of the time the type of
globals is created using a simple PointerType::getUnqual() (or ::get() with
the default address-space value of 0). This does not work for us and we get
assertion/compilation/instruction selection failures whenever a new call
is added that uses the default value of zero.
In our fork we have removed the default parameter value of zero for most
address space arguments and use DL.getProgramAddressSpace() or
DL.getGlobalsAddressSpace() whenever possible. If this change is accepted,
I will upstream follow-up patches to use DL.getGlobalsAddressSpace() instead
of relying on the default value of 0 for PointerType::get(), etc.
This patch and the follow-up changes will not have any functional changes
for existing backends with the default globals address space of zero.
A follow-up commit will change the default globals address space for
AMDGPU to 1.
Reviewed By: dylanmckay
Differential Revision: https://reviews.llvm.org/D70947
This patch implements out of line atomics for LSE deployment
mechanism. Details how it works can be found in llvm/docs/Atomics.rst
Options -moutline-atomics and -mno-outline-atomics to enable and disable it
were added to clang driver. This is clang and llvm part of out-of-line atomics
interface, library part is already supported by libgcc. Compiler-rt
support is provided in separate patch.
Differential Revision: https://reviews.llvm.org/D91157
The `dso_local_equivalent` constant is a wrapper for functions that represents a
value which is functionally equivalent to the global passed to this. That is, if
this accepts a function, calling this constant should have the same effects as
calling the function directly. This could be a direct reference to the function,
the `@plt` modifier on X86/AArch64, a thunk, or anything that's equivalent to the
resolved function as a call target.
When lowered, the returned address must have a constant offset at link time from
some other symbol defined within the same binary. The address of this value is
also insignificant. The name is leveraged from `dso_local` where use of a function
or variable is resolved to a symbol in the same linkage unit.
In this patch:
- Addition of `dso_local_equivalent` and handling it
- Update Constant::needsRelocation() to strip constant inbound GEPs and take
advantage of `dso_local_equivalent` for relative references
This is useful for the [Relative VTables C++ ABI](https://reviews.llvm.org/D72959)
which makes vtables readonly. This works by replacing the dynamic relocations for
function pointers in them with static relocations that represent the offset between
the vtable and virtual functions. If a function is externally defined,
`dso_local_equivalent` can be used as a generic wrapper for the function to still
allow for this static offset calculation to be done.
See [RFC](http://lists.llvm.org/pipermail/llvm-dev/2020-August/144469.html) for more details.
Differential Revision: https://reviews.llvm.org/D77248
This patch introduces a new VPDef class, which can be used to
manage VPValues defined by recipes/VPInstructions.
The idea here is to mirror VPUser for values defined by a recipe. A
VPDef can produce either zero (e.g. a store recipe), one (most recipes)
or multiple (VPInterleaveRecipe) result VPValues.
To traverse the def-use chain from a VPDef to its users, one has to
traverse the users of all values defined by a VPDef.
VPValues now contain a pointer to their corresponding VPDef, if one
exists. To traverse the def-use chain upwards from a VPValue, we first
need to check if the VPValue is defined by a VPDef. If it does not have
a VPDef, this means we have a VPValue that is not directly defined
iniside the plan and we are done.
If we have a VPDef, it is defined inside the region by a recipe, which
is a VPUser, and the upwards def-use chain traversal continues by
traversing all its operands.
Note that we need to add an additional field to to VPVAlue to link them
to their defs. The space increase is going to be offset by being able to
remove the SubclassID field in future patches.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D90558
- In certain cases, a generic pointer could be assumed as a pointer to
the global memory space or other spaces. With a dedicated target hook
to query that address space from a given value, infer-address-space
pass could infer and propagate that to all its users.
Differential Revision: https://reviews.llvm.org/D91121
Describe in the BackEnd Developer's Guide. Instrument a few backends.
Remove an old unused timing facility. Add a null backend for timing
the parser.
Differential Revision: https://reviews.llvm.org/D91388
Clarify the semantics of GEP inbounds, in particular with regard
to what it means for wrapping. This cleans up some confusion on
when it is legal to apply nuw/nsw flags to various parts of the
GEP calculation.
Differential Revision: https://reviews.llvm.org/D90708
Use exact component name in add_ocaml_library.
Make expand_topologically compatible with new architecture.
Fix quoting in is_llvm_target_library.
Fix LLVMipo component name.
Write release note.
This patch adds a new !annotation metadata kind which can be used to
attach annotation strings to instructions.
It also adds a new pass that emits summary remarks per function with the
counts for each annotation kind.
The intended uses cases for this new metadata is annotating
'interesting' instructions and the remarks should provide additional
insight into transformations applied to a program.
To motivate this, consider these specific questions we would like to get answered:
* How many stores added for automatic variable initialization remain after optimizations? Where are they?
* How many runtime checks inserted by a frontend could be eliminated? Where are the ones that did not get eliminated?
Discussed on llvm-dev as part of 'RFC: Combining Annotation Metadata and Remarks'
(http://lists.llvm.org/pipermail/llvm-dev/2020-November/146393.html)
Reviewed By: thegameg, jdoerfert
Differential Revision: https://reviews.llvm.org/D91188
No longer rely on an external tool to build the llvm component layout.
Instead, leverage the existing `add_llvm_componentlibrary` cmake function and
introduce `add_llvm_component_group` to accurately describe component behavior.
These function store extra properties in the created targets. These properties
are processed once all components are defined to resolve library dependencies
and produce the header expected by llvm-config.
Differential Revision: https://reviews.llvm.org/D90848
Adding new paragraphs under "Introducing New Components" section to
check the different levels of support we have, to help introduction of
smaller set of changes without overwhelming new collaborators and
potentially losing the contribution.
Differential Revision: D91013
This changes the definition of t2DoLoopStart from
t2DoLoopStart rGPR
to
GPRlr = t2DoLoopStart rGPR
This will hopefully mean that low overhead loops are more tied together,
and we can more reliably generate loops without reverting or being at
the whims of the register allocator.
This is a fairly simple change in itself, but leads to a number of other
required alterations.
- The hardware loop pass, if UsePhi is set, now generates loops of the
form:
%start = llvm.start.loop.iterations(%N)
loop:
%p = phi [%start], [%dec]
%dec = llvm.loop.decrement.reg(%p, 1)
%c = icmp ne %dec, 0
br %c, loop, exit
- For this a new llvm.start.loop.iterations intrinsic was added, identical
to llvm.set.loop.iterations but produces a value as seen above, gluing
the loop together more through def-use chains.
- This new instrinsic conceptually produces the same output as input,
which is taught to SCEV so that the checks in MVETailPredication are not
affected.
- Some minor changes are needed to the ARMLowOverheadLoop pass, but it has
been left mostly as before. We should now more reliably be able to tell
that the t2DoLoopStart is correct without having to prove it, but
t2WhileLoopStart and tail-predicated loops will remain the same.
- And all the tests have been updated. There are a lot of them!
This patch on it's own might cause more trouble that it helps, with more
tail-predicated loops being reverted, but some additional patches can
hopefully improve upon that to get to something that is better overall.
Differential Revision: https://reviews.llvm.org/D89881
Add a calling convention called amdgpu_gfx for real function calls
within graphics shaders. For the moment, this uses the same calling
convention as other calls in amdgpu, with registers excluded for return
address, stack pointer and stack buffer descriptor.
Differential Revision: https://reviews.llvm.org/D88540
As discussed in the mailing list [1-4], we need a separation of support
tiers when requiring support from the whole community versus a
sub-community. Essentially, if a sub-community is active enough and
takes maintenance into their own internal costs without affecting other
parts of the community's maintenance costs, then code that is not
immediately relevant to all parts (ie. not released, actively tested,
etc) can still find its way into the LLVM main repository without major
pain points.
The main benefit is to reduce the maintenance cost that those
sub-communities have outside of LLVM (for example, in duplicating common
code, applying the same patches on top of multiple user repositories or
downstream projects).
This document outlines the components and responsibilities of the
sub-communities with regards to maintenance costs and how they affect
the rest of the community.
It also adds an addendum on removal policies, which expand the existing
"new target removal" policy into something more generic, to encompass
any piece of code, scripts or documents in the repository.
[1] http://lists.llvm.org/pipermail/llvm-dev/2020-October/146249.html
[2] http://lists.llvm.org/pipermail/llvm-dev/2020-November/146335.html
[3] http://lists.llvm.org/pipermail/llvm-dev/2020-October/146138.html
[4] http://lists.llvm.org/pipermail/llvm-dev/2020-November/146298.html
The `llvm.coro.suspend.async` intrinsic takes a function pointer as its
argument that describes how-to restore the current continuation's
context from the context argument of the continuation function. Before
we assumed that the current context can be restored by loading from the
context arguments first pointer field (`first_arg->caller_context`).
This allows for defining suspension points that reuse the current
context for example.
Also:
llvm.coro.id.async lowering: Add llvm.coro.preprare.async intrinsic
Blocks inlining until after the async coroutine was split.
Also, change the async function pointer's context size position
struct async_function_pointer {
uint32_t relative_function_pointer_to_async_impl;
uint32_t context_size;
}
And make the position of the `async context` argument configurable. The
position is specified by the `llvm.coro.id.async` intrinsic.
rdar://70097093
Differential Revision: https://reviews.llvm.org/D90783
Update the Programmer's Reference document.
Add a test. Update a couple of tests with an improved error message.
Differential Revision: https://reviews.llvm.org/D90635
This patch adds the llvm.loop.mustprogress loop metadata. This is to be
added to loops where the frontend language requires that the loop makes
observable interactions with the environment. This is the loop-level
equivalent to the function attribute `mustprogress` defined in D86233.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D88464
This patch adds the `async` lowering of coroutines.
This will be used by the Swift frontend to lower async functions. In
contrast to the `retcon` lowering the frontend needs to be in control
over control-flow at suspend points as execution might be suspended at
these points.
This is very much work in progress and the implementation will change as
it evolves with the frontend. As such the documentation is lacking
detail as some of it might change.
rdar://70097093
Reapply with fix for memory sanitizer failure and sphinx failure.
Differential Revision: https://reviews.llvm.org/D90612
This patch adds the `async` lowering of coroutines.
This will be used by the Swift frontend to lower async functions. In
contrast to the `retcon` lowering the frontend needs to be in control
over control-flow at suspend points as execution might be suspended at
these points.
This is very much work in progress and the implementation will change as
it evolves with the frontend. As such the documentation is lacking
detail as some of it might change.
rdar://70097093
Differential Revision: https://reviews.llvm.org/D90612
This differentiates the Ryzen 4000/4300/4500/4700 series APUs that were
previously included in gfx909.
Differential Revision: https://reviews.llvm.org/D90419
Change-Id: Ia901a7157eb2f73ccd9f25dbacec38427312377d
Only the aliases 'xzr' and 'sp' exist for the physical register x31.
The reason for wanting to remove the alias 'x31' is because it allows users
to write invalid asm that is not accepted by the GNU assembler.
Is there any objection to removing this alias? Or do we want to keep
this for compatibility with existing code that uses w31/x31?
Differential Revision: https://reviews.llvm.org/D90153
This patch mainly made the following changes:
1. Support AVX-VNNI instructions;
2. Introduce ExplicitVEXPrefix flag so that vpdpbusd/vpdpbusds/vpdpbusds/vpdpbusds instructions only use vex-encoding when user explicity add {vex} prefix.
Differential Revision: https://reviews.llvm.org/D89105
Make all of the "AMDGPU Machine Code GFX*" columns in the Memory Model
table a consistent width of 32-characters.
Best viewed with something like --word-diff
Differential Revision: https://reviews.llvm.org/D89977
Mostly NFC, but some changes are "bug fixes" rather than just e.g.
formatting changes or typo corrections.
- Fix typo "competing" -> "completing".
- Document why waintcnt is added to stores and not loads for
sequentially consistent ordering.
- Lowercase some mentions of `buffer_gl{0,1}_inv`.
- Make mentions of `*cnt(0)` consistently include the `(0)` count.
- Remove some mentions of instructions for incorrect address spaces. For
example, remove mention of `flat_load` from
`load atomic acquire workgroup global`.
- Re-flow some text to get all the target columns to fit in a
32-character wide column. Makes a future NFC patch to make these columns
both 32-character wide more straightforward.
Modified cherry-pick of patch by Tony Tye
Reviewed By: t-tye
Differential Revision: https://reviews.llvm.org/D89596
The neutral value is -0.0, not 0.0. This doesn't matter for "fast"
reductions due to nsz, but does matter for reassoc-only and seq
reductions.
Change tests to mostly use -0.0 where the neutral value was intended,
and add some additional test coverage in some places. Also update
LangRef to use the right value.
- AMDGPUUsage.rst: Correct AMD GPU DWARF address space table address
sizes which are in bits and not bytes.
- clang/.../Options.td: Improve description of AMD GPU options.
- Re-generate ClangComamndLineReference.rst from clang/.../Options.td .
Differential Revision: https://reviews.llvm.org/D90364
Add a few cross-references among TableGen documents.
Differential Revision: https://reviews.llvm.org/D90186
Add cross-references between TableGen documents.
If `null_pointer_is_valid` is present, `dereferenceable` does not imply
`nonnull`, make it clear.
Came up in D17993.
Reviewed By: aqjune
Differential Revision: https://reviews.llvm.org/D89417
--section-details/-t is a GNU readelf option that produce
an output that is an alternative to --sections.
Differential revision: https://reviews.llvm.org/D89304
Allow overriding the default set of flags used to enable UBSan when
building llvm.
This can be used to test new checks or opt out of certain checks.
Differential Revision: https://reviews.llvm.org/D89439
This change introduces a GC parseable lowering for element atomic
memcpy/memmove intrinsics. This way runtime can provide an
implementation which can take a safepoint during copy operation.
See "GC-parseable element atomic memcpy/memmove" thread on llvm-dev
for the background and details:
https://groups.google.com/g/llvm-dev/c/NnENHzmX-b8/m/3PyN8Y2pCAAJ
Differential Revision: https://reviews.llvm.org/D88861
It's currently ambiguous in IR whether the source language explicitly
did not want a stack a stack protector (in C, via function attribute
no_stack_protector) or doesn't care for any given function.
It's common for code that manipulates the stack via inline assembly or
that has to set up its own stack canary (such as the Linux kernel) would
like to avoid stack protectors in certain functions. In this case, we've
been bitten by numerous bugs where a callee with a stack protector is
inlined into an __attribute__((__no_stack_protector__)) caller, which
generally breaks the caller's assumptions about not having a stack
protector. LTO exacerbates the issue.
While developers can avoid this by putting all no_stack_protector
functions in one translation unit together and compiling those with
-fno-stack-protector, it's generally not very ergonomic or as
ergonomic as a function attribute, and still doesn't work for LTO. See also:
https://lore.kernel.org/linux-pm/20200915172658.1432732-1-rkir@google.com/https://lore.kernel.org/lkml/20200918201436.2932360-30-samitolvanen@google.com/T/#u
Typically, when inlining a callee into a caller, the caller will be
upgraded in its level of stack protection (see adjustCallerSSPLevel()).
By adding an explicit attribute in the IR when the function attribute is
used in the source language, we can now identify such cases and prevent
inlining. Block inlining when the callee and caller differ in the case that one
contains `nossp` when the other has `ssp`, `sspstrong`, or `sspreq`.
Fixes pr/47479.
Reviewed By: void
Differential Revision: https://reviews.llvm.org/D87956
In preparation for potential future concurrency, a FunctionPass
shouldn't modify anything at the module level that other FunctionPasses
can also modify.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D89890
Recently [1], there was an upgrade to the version of buildbot being
deployed. The new setup will still work with old buildslaves but I
thought it might be a good idea to update the documentation to reflect,
that you now can use a newer buildbot version to when setting up your
worker (formely known as slave).
The upgrade from buildbot 0.8.5 to 2.8.5 went a long with a transition
to a new "worker" terminology [2] which is also reflected by this
change.
[1]: http://lists.llvm.org/pipermail/llvm-dev/2020-October/145629.html
[2]: http://docs.buildbot.net/0.9.12/manual/worker-transition.html
Reviewed By: gkistanova
Differential Revision: https://reviews.llvm.org/D89230
LLVM IR currently assumes some form of forward progress. This form is
not explicitly defined anywhere, and is the cause of miscompilations
in most languages that are not C++11 or later. This implicit forward progress
guarantee can not be opted out of on a function level nor on a loop
level. Languages such as C (C11 and later), C++ (pre-C++11), and Rust
have different forward progress requirements and this needs to be
evident in the IR.
Specifically, C11 and onwards (6.8.5, Paragraph 6) states that "An
iteration statement whose controlling expression is not a constant
expression, that performs no input/output operations, does not access
volatile objects, and performs no synchronization or atomic operations
in its body, controlling expression, or (in the case of for statement)
its expression-3, may be assumed by the implementation to terminate."
C++11 and onwards does not have this assumption, and instead assumes
that every thread must make progress as defined in [intro.progress] when
it comes to scheduling.
This was initially brought up in [0] as a bug, a solution was presented
in [1] which is the current workaround, and the predecessor to this
change was [2].
After defining a notion of forward progress for IR, there are two
options to address this:
1) Set the default to assuming Forward Progress and provide an opt-out for functions and an opt-in for loops.
2) Set the default to not assuming Forward Progress and provide an opt-in for functions, and an opt-in for loops.
Option 2) has been selected because only C++11 and onwards have a
forward progress requirement and it makes sense for them to opt-into it
via the defined `mustprogress` function attribute. The `mustprogress`
function attribute indicates that the function is required to make
forward progress as defined. This is sharply in contrast to the status
quo where this is implicitly assumed. In addition, `willreturn` implies `mustprogress`.
The background for why this definition was chosen is in [3] and for why
the option was chosen is in [4] and the corresponding thread(s). The implementation is in D85393, the
clang patch is in D86841, the LoopDeletion patch is in D86844, the
Inliner patches are in D87180 and D87262, and there will be more
incoming.
[0] https://bugs.llvm.org/show_bug.cgi?id=965#c25
[1] https://lists.llvm.org/pipermail/llvm-dev/2017-October/118558.html
[2] https://reviews.llvm.org/D65718
[3] https://lists.llvm.org/pipermail/llvm-dev/2020-September/144919.html
[4] https://lists.llvm.org/pipermail/llvm-dev/2020-September/145023.html
Reviewed By: jdoerfert, efriedma, nikic
Differential Revision: https://reviews.llvm.org/D86233
The langref description for llvm.test.set.loop.iterations.* were
missing the i1 return type.
Differential Revision: https://reviews.llvm.org/D89564
Patch by: Janek van Oirschot
This patch updates the Kaleidoscope and BuildingAJIT tutorial series (chapter
1-4) to OrcV2. Chapter 5 of the BuildingAJIT series is removed -- it will be
re-instated once we have in-tree support for out-of-process JITing.
This patch only updates the tutorial code, not the text. Patches welcome for
that, otherwise I will try to update it in a few weeks.
This patch adds metadata !noundef and makes load instructions can optionally have it.
A load with !noundef always return a well-defined value (has no undef bit or isn't poison).
If the loaded value isn't well defined, the behavior is undefined.
This metadata can be used to encode the assumption from C/C++ that certain reads of variables should have well-defined values.
It is helpful for optimizing freeze instructions away, because freeze can be removed when its operand has well-defined value, and showing that a load from arbitrary location is well-defined is usually hard otherwise.
The same information can be encoded with llvm.assume with operand bundle; using metadata is chosen because I wasn't sure whether code motion can be freely done when llvm.assume is inserted from clang instead.
The existing codebase already is stripping unknown metadata when doing code motion, so using metadata is UB-safe as well.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D89050
LLVM rejects DWARF operator DW_OP_over. This DWARF operator is needed
for Flang to support assumed rank array.
Summary:
Currently LLVM rejects DWARF operator DW_OP_over. Below error is
produced when llvm finds this operator.
[..]
invalid expression
!DIExpression(151, 20, 16, 48, 30, 35, 80, 34, 6)
warning: ignoring invalid debug info in over.ll
[..]
There were some parts missing in support of this operator, which are
now completed.
Testing
-added a unit testcase
-check-debuginfo
-check-llvm
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D89208
The prefix given to --prefix will be added to GNU absolute paths when
used with --source option (source interleaved with the disassembly).
This matches GNU's objdump behavior.
GNU and C++17 rules for absolute paths are different.
Differential Revision: https://reviews.llvm.org/D85024
Fixes PR46368.
Differential Revision: https://reviews.llvm.org/D85024
Add some minimal documentation for DILabel, originally introduced in
D45024. Update the name and semantics of the `variables:` field in the
documentation for `DISubprogram`; the field is now called
`retainedNodes:` and is a heterogeneous list of `DILocalVariable` and
`DILabel`.
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D89082
Following up on the discussion within the group during the roundtable at
the 2020 LLVM Developers Meeting, this commit adds to the security docs:
* How long we expect acknowledging security reports will take
* The escalation path the reporter can follow if they get no response
A temporary line inviting reporters to directly follow the escalation
path while the mailing list is being setup is also added.
Differential Revision: https://reviews.llvm.org/D89068
Resigning from security group as Azul representative as I have left Azul. Previously communicated via email with security group.
Differential Revision: https://reviews.llvm.org/D88933
At AMD, in an internal audit of our code, we found some corner cases
where we were not quite differentiating targets enough for some old
hardware. This commit is part of fixing that by adding three new
targets:
* The "Oland" and "Hainan" variants of gfx601 are now split out into
gfx602. LLPC (in the GPUOpen driver) and other front-ends could use
that to avoid using the shaderZExport workaround on gfx602.
* One variant of gfx703 is now split out into gfx705. LLPC and other
front-ends could use that to avoid using the
shaderSpiCsRegAllocFragmentation workaround on gfx705.
* The "TongaPro" variant of gfx802 is now split out into gfx805.
TongaPro has a faster 64-bit shift than its former friends in gfx802,
and a subtarget feature could be set up for that to take advantage of
it. This commit does not make that change; it just adds the target.
V2: Add clang changes. Put TargetParser list in order.
V3: AMDGCNGPUs table in TargetParser.cpp needs to be in GPUKind order,
so fix the GPUKind order.
Differential Revision: https://reviews.llvm.org/D88916
Change-Id: Ia901a7157eb2f73ccd9f25dbacec38427312377d
This patch adds support for DWARF attribute DW_AT_rank.
Summary:
Fortran assumed rank arrays have dynamic rank. DWARF attribute
DW_AT_rank is needed to support that.
Testing:
unit test cases added (hand-written)
check llvm
check debug-info
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D89141
This patch introduce files that just enough for lib/Target/CSKY to compile.
Notably a basic CSKYTargetMachine and CSKYTargetInfo.
Differential Revision: https://reviews.llvm.org/D88466
This patch lets the bb_addr_map (renamed to __llvm_bb_addr_map) section use a special section type (SHT_LLVM_BB_ADDR_MAP) instead of SHT_PROGBITS. This would help parsers, dumpers and other tools to use the sh_type ELF field to identify this section rather than relying on string comparison on the section name.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D88199
Have the build work out of the box by forcing an LLD build.
That way, we don't require an external LTO-aware linker,
as we build one.
Also remove reference to the seemingly dead builder.
Differential Revision: https://reviews.llvm.org/D88990
The section on SmallVector has a note about preferring SmallVectorImpl
for APIs but doesn't mention ArrayRef. Although ArrayRef is discussed
elsewhere, let's re-emphasize here.
Differential Revision: https://reviews.llvm.org/D49881
Motivated by D88183, this seeks to clarify the current loop nomenclature with added illustrations, examples for possibly unexpected situations (infinite loops not part of the "parent" loop, logical loops sharing the same header, ...), and clarification on what other sources may consider a loop. The current document also has multiple errors that are fixed here.
Some selected errors:
* Loops a defined as strongly-connected components. A component a partition of all nodes, i.e. a subloop can never be a component. That is, the document as it currently is only covers top-level loops, even it also uses the term SCC for subloops.
* "a block can be the header of two separate loops at the same time" (it is considered a single loop by LoopInfo)
* "execute before some interesting event happens" (some interesting event is not well-defined)
Reviewed By: baziotis, Whitney
Differential Revision: https://reviews.llvm.org/D88408
This reverts commit 55c4ff91bd.
Issues were introduced as discussed in https://reviews.llvm.org/D88241
where this change made previous bugs in the linker and BitCodeWriter
visible.
This is a patch to LangRef that clarifies the behavior of load/store/memset/memcpy/memmove when the pointers or sizes are not well-defined
as well.
MSan detects a case when e.g., only lower bits of address are garbage when `-msan-check-access-address` is enabled, and it does not directly conflict with this patch because a C program should not use a pointer with undef bits and reasonable optimizations do not convert a well-defined pointer into a pointer with undef bits.
This patch contains a definition of a well-defined value as well.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D87994
Make the corresponding change that was made for byval in
b7141207a4. Like byval, this requires a
bulk update of the test IR tests to include the type before this can
be mandatory.
Add the ability to selectively instrument a subset of functions by dividing the functions into N logical groups and then selecting a group to cover. By selecting different groups over time you could cover the entire application incrementally with lower overhead than instrumenting the entire application at once.
Differential Revision: https://reviews.llvm.org/D87953
I don't have commit access.
Please help me commit it.
Thanks : )
Reviewed By: Paul-C-Anagnostopoulos
Differential Revision: https://reviews.llvm.org/D88139
This refactors VPuser to not inherit from VPValue to facilitate
introducing operations that introduce multiple VPValues (e.g.
VPInterleaveRecipe).
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D84679
The rendered html was (no hyperlink was generated):
(see Getting Started <GettingStarted.html#git-pre-push-hook>)
Now, it is (with proper hyperlink):
(see Git pre-push hook)
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D88116
Updated file paths and function signatures in section
"Adding a new type".
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D88049
As to not conflict with the legacy PM example passes under
llvm/lib/Transforms/Hello, this is under HelloNew. This makes the
CMakeLists.txt and general directory structure less confusing for people
following the example.
Much of the doc structure was taken from WritinAnLLVMPass.rst.
This adds a HelloWorld pass which simply prints out each function name.
More will follow after this, e.g. passes over different units of IR, analyses.
https://llvm.org/docs/WritingAnLLVMPass.html contains a lot more.
Relanded with missing "Support" dependency in LLVMBuild.txt.
Reviewed By: ychen, asbirlea
Differential Revision: https://reviews.llvm.org/D86979
As to not conflict with the legacy PM example passes under
llvm/lib/Transforms/Hello, this is under HelloNew. This makes the
CMakeLists.txt and general directory structure less confusing for people
following the example.
Much of the doc structure was taken from WritinAnLLVMPass.rst.
This adds a HelloWorld pass which simply prints out each function name.
More will follow after this, e.g. passes over different units of IR, analyses.
https://llvm.org/docs/WritingAnLLVMPass.html contains a lot more.
Reviewed By: ychen, asbirlea
Differential Revision: https://reviews.llvm.org/D86979
Document the `ento` namespace in the Lexicon according to @nicolas17 on the
mailing list (http://lists.llvm.org/pipermail/cfe-dev/2020-August/066577.html).
The analyzer lived at different namespaces at different times.
Originally lived at the `GR` aka. (Graph Reachability) namespace [7], later it
moved under the `ento` namespace [9].
The Static Analyzer's code lived at many other places as well:
`Analysis` -[2]-> `Checker` -[5]-> `GR` -[10]> `entoSA` -[11]-> `StaticAnalyzer`
The relevant code motion, refactor commits, cfe-dev mailing in chronological
order:
1) 2008-03-15 Make a major restructuring of the clang tree: introduce a ...
7a51313d8a
2) 2010-01-25 Split libAnalysis into two libraries: libAnalysis and libChecker
d6b8708643
3) 2010-12-21 Reorganization of Checker files
http://lists.llvm.org/pipermail/cfe-dev/2010-December/012694.html
4) 2010-12-22 Refactoring: include/clang/Checker -> include/clang/GR
8d602a8aa8
5) 2010-12-22 Refactoring: lib/Checker -> lib/GR
2ff5ab1516
6) 2010-12-22 Refactoring: Move checkers into lib/GR/Checkers and their own
a700e976b6
7) 2010-12-22 Refactoring: Move stuff into namespace 'GR'
ca08fba414
8) 2010-12-22 Refactoring: Drop the 'GR' prefix.
1696f508e2
9) 2010-12-23 Rename static analyzer namespace 'GR' to 'ento'
98857c9860
10) 2010-12-23 Rename headers: 'clang/GR' 'clang/EntoSA' and update Makefile
ef33f0996c
11) 2010-12-23 Chris Lattner has strong opinions about directory
d99bd55a5e
12) 2010-12-24 Remove the EntoSA directories.
9d6af5328e
Reviewed By: Szelethus,martong,ASDenysPetrov,xazax.hun
Differential Revision: https://reviews.llvm.org/D86446
Add `LLVM_EXTERNALIZE_DEBUGINFO` to CMake.rst. This should help make dSYM
generation more discoverable.
Differential Revision: https://reviews.llvm.org/D87591
As discussed on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2020-April/140729.html
This is hopefully the final remaining showstopper before we can remove
the 'experimental' from the reduction intrinsics.
No behavior was specified for the FP min/max reductions, so we have a
mess of different interpretations.
There are a few potential options for the semantics of these max/min ops.
I think this is the simplest based on current behavior/implementation:
make the reductions inherit from the existing llvm.maxnum/minnum intrinsics.
These correspond to libm fmax/fmin, and those are similar to the (now
deprecated?) IEEE-754 maxNum/minNum functions (NaNs are treated as missing
data). So the default expansion creates calls to libm functions.
Another option would be to inherit from llvm.maximum/minimum (NaNs propagate),
but most targets just crash in codegen when given those nodes because no
default expansion was ever implemented AFAICT.
We could also just assume 'nnan' semantics by default (we are already
assuming 'nsz' semantics in the maxnum/minnum intrinsics), but some targets
(AArch64, PowerPC) support the more defined behavior, so it doesn't make much
sense to not allow a tighter spec. Fast-math-flags (nnan) can be used to
loosen the semantics.
(Note that D67507 was proposed to update the LangRef to acknowledge the more
recent IEEE-754 2019 standard, but that patch seems to have stalled. If we do
update based on the new standard, the reduction instructions can seamlessly
inherit from whatever updates are made to the max/min intrinsics.)
x86 sees a regression here on 'nnan' tests because we have underlying,
longstanding bugs in FMF creation/propagation. Those need to be fixed apart
from this change (for example: https://llvm.org/PR35538). The expansion
sequence before this patch may not have been correct.
Differential Revision: https://reviews.llvm.org/D87391
This adjusts the description of `llvm.memcpy` to also allow operands
to be equal. This is in line with what Clang currently expects.
This change is intended to be temporary and followed by re-introduce
a variant with the non-overlapping guarantee for cases where we can
actually ensure that property in the front-end.
See the links below for more details:
http://lists.llvm.org/pipermail/cfe-dev/2020-August/066614.html
and PR11763.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D86815
Propose Ahmed as a replacement. He's fixed many security issues in LLVM for Apple in the last few years, as such he'll fit the "Individual contributors" description.
Differential Revision: https://reviews.llvm.org/D86742
The wording before this patch applies to llvm.mem.parallel_loop_access, not access groups.
Reviewed By: mppf, hfinkel
Differential Revision: https://reviews.llvm.org/D83781
Add printf-style precision specifier to pad numbers to a given number of
digits when matching them if the value is smaller than the given
precision. This works on both empty numeric expression (e.g. variable
definition from input) and when matching a numeric expression. The
syntax is as follows:
[[#%.<precision><format specifier>, ...]
where <format specifier> is optional and ... can be a variable
definition or not with an empty expression or not. In the absence of a
precision specifier, a variable definition will accept leading zeros.
Reviewed By: jhenderson, grimar
Differential Revision: https://reviews.llvm.org/D81667
This patch makes LangRef be explicit about the value of padding when storing an aggregate.
It states that when an aggregate is stored into memory, padding is filled with undef.
Here is a clue that supports this change (edited to reflect the discussion from llvm-dev):
- IPSCCP ignores padding and directly stores a constant aggregate if possible. It loses the data stored in the padding. https://godbolt.org/z/xzenYs Memcpyopt ignores (the preexisting value of) padding when copying an aggregate or storing a constant: https://godbolt.org/z/hY6ndd / https://godbolt.org/z/3WMP5a
The two items below are not relevant with this patch because Clang lowers load/store of individual field of struct into load/stores of the corresponding pointer with a primitive type. Also, when copy is needed, it uses memcpy instead of load/store of an aggregate, as discussed in the llvm-dev. However, this patch is still valid (as discussed) because it is needed to explain the two optimizations above.
- According to C17, the value of padding bytes when storing values in structures or unions is unspecified.
- I updated Alive2 and it did not find any problematic transformation from LLVM unit tests and while running translation validation of a few C programs.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D86189
It's not undefined behavior for an unsigned left shift to overflow (i.e. to
shift bits out), but it has been the source of bugs and exploits in certain
codebases in the past. As we do in other parts of UBSan, this patch adds a
dynamic checker which acts beyond UBSan and checks other sources of errors. The
option is enabled as part of -fsanitize=integer.
The flag is named: -fsanitize=unsigned-shift-base
This matches shift-base and shift-exponent flags.
<rdar://problem/46129047>
Differential Revision: https://reviews.llvm.org/D86000
This patch optionally replaces the CRT allocator (i.e., malloc and free) with rpmalloc (mixed public domain licence/MIT licence) or snmalloc (MIT licence) or mimalloc (MIT licence). Please note that the source code for these allocators must be available outside of LLVM's tree.
To enable, use `cmake ... -DLLVM_INTEGRATED_CRT_ALLOC=D:/git/rpmalloc -DLLVM_USE_CRT_RELEASE=MT` where `D:/git/rpmalloc` has already been git clone'd from `https://github.com/mjansson/rpmalloc`. The same applies to snmalloc and mimalloc.
When enabled, the allocator will be embeded (statically linked) into the LLVM tools & libraries. This currently only works with the static CRT (/MT), although using the dynamic CRT (/MD) could potentially work as well in the future.
When enabled, this changes the memory stack from:
new/delete -> MS VC++ CRT malloc/free -> HeapAlloc -> VirtualAlloc
to:
new/delete -> {rpmalloc|snmalloc|mimalloc} -> VirtualAlloc
The goal of this patch is to bypass the application's global heap - which is thread-safe thus inducing locking - and instead take advantage of a modern lock-free, thread cache, allocator. On a 6-core Xeon Skylake we observe a 2.5x decrease in execution time when linking a large scale application with LLD and ThinLTO (12 min 20 sec -> 5 min 34 sec), when all hardware threads are being used (using LLD's flag /opt:lldltojobs=all). On a dual 36-core Xeon Skylake with all hardware threads used, we observe a 24x decrease in execution time (1 h 2 min -> 2 min 38 sec) when linking a large application with LLD and ThinLTO. Clang build times also see a decrease in the range 5-10% depending on the configuration.
Differential Revision: https://reviews.llvm.org/D71786
We had already specified that second argument `n` of this intrinsic is `n > 0`,
but now add to this that the result is a poison value if this is not the case.
Differential Revision: https://reviews.llvm.org/D86637
As discussed in
http://lists.llvm.org/pipermail/llvm-dev/2020-July/143801.html.
Currently no users outside of unit tests.
Replace all instances in tests of -constprop with -instsimplify.
Notable changes in tests:
* vscale.ll - @llvm.sadd.sat.nxv16i8 is evaluated by instsimplify, use a fake intrinsic instead
* InsertElement.ll - insertelement undef is removed by instsimplify in @insertelement_undef
llvm/test/Transforms/ConstProp moved to llvm/test/Transforms/InstSimplify/ConstProp
Reviewed By: lattner, nikic
Differential Revision: https://reviews.llvm.org/D85159
According to the current LangRef, Memset/memcpy/memmove can take a
null/dangling pointer if the size is zero.
(Relevant thread: http://lists.llvm.org/pipermail/llvm-dev/2017-July/115665.html )
This patch expands it and allows the functions to take undef/poison pointers
too.
This required the updates in the align attribute since it isn't specified
what is the alignment of undef/poison pointers.
This patch states that their alignment is 1.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D86643
This is an older syntax than the {disp32} and {disp8} pseudo
prefixes that were added a few weeks ago. We can reuse most of
the support for that to support .d32 and .d8 as well.
A first version of get.active.lane.mask was committed in rG7fb8a40e5220. One of
the main purposes and uses of this intrinsic is to communicate information from
the middle-end to the back-end, but its current definition and semantics make
this actually very difficult. The intrinsic was defined as:
@llvm.get.active.lane.mask(%IV, %BTC)
where %BTC is the Backedge-Taken Count (variable names are different in the
LangRef spec). This allows to implicitly communicate the loop tripcount, which
can be reconstructed by calculating BTC + 1. But it has been very difficult to
prove that calculating BTC + 1 is safe and doesn't overflow. We need
complicated range and SCEV analysis, and thus the problem is that this
intrinsic isn't really doing what it was supposed to solve. Examples of the
overflow checks that are required in the (ARM) back-end are D79175 and D86074,
which aren't even complete/correct yet.
To solve this problem, we are revising the definitions/semantics for
get.active.lane.mask to avoid all the complicated overflow analysis. This means
that instead of communicating the BTC, we are now using the loop tripcount. Now
using LangRef's variable names, its semantics is changed from:
icmp ule (%base + i), %n
to:
icmp ult (%base + i), %n
with %n > 0 and corresponding to the loop tripcount. The intrinsic signature
remains the same.
Differential Revision: https://reviews.llvm.org/D86147
This patch adds support for representing Fortran `character(n)`.
Primarily patch is based out of D54114 with appropriate modifications.
Test case IR is generated using our downstream classic-flang. We're in process
of upstreaming flang PR's but classic-flang has dependencies on llvm, so
this has to get in first.
Patch includes functional test case for both IR and corresponding
dwarf, furthermore it has been manually tested as well using GDB.
Source snippet:
```
program assumedLength
call sub('Hello')
call sub('Goodbye')
contains
subroutine sub(string)
implicit none
character(len=*), intent(in) :: string
print *, string
end subroutine sub
end program assumedLength
```
GDB:
```
(gdb) ptype string
type = character (5)
(gdb) p string
$1 = 'Hello'
```
Reviewed By: aprantl, schweitz
Differential Revision: https://reviews.llvm.org/D86305
The TableGen range piece punctuator is currently '-' (e.g., {0-9}),
which interacts oddly with the fact that an integer literal's sign
is part of the literal. This patch replaces the '-' with the new
punctuator '...'. The '-' punctuator is deprecated.
Differential Revision: https://reviews.llvm.org/D85585
Change-Id: I3d53d14e23f878b142d8f84590dd465a0fb6c09c
This new TableGen Programmer's Reference document replaces the current Language Introduction and Language Reference documents. It brings all the TableGen reference information into one document.
As an experiment, I numbered the sections in the document. See what you think about that.
Reviewed By: lattner
Differential Revision: https://reviews.llvm.org/D85838
(changes by Nicolai Hähnle <nicolai.haehnle@amd.com>:
- fixed build error due to toctree in docs/LangRef/index.rst
- fixed reference to ProgRef)
Change-Id: Ifbdfa39768b8a460aae2873103d31c7b347aff00
Summary of changes:
- added description of MTBUF instructions and format modifier;
- described limitations of f16 inline constants when used with integer operands;
- updated description of gfx9+ flat global addressing modes;
- v_accvgpr_write_b32 src0 corrections (gfx908);
- minor bugfixing and improvements.
- Rename AMDGPU SCC DWARF register to STATUS since the scalar
condition code is a bit within the STATUS register.
- Correct bit size of the VCC_64 register to 64 which is the size in
wave64 mode.
Differential Revision: https://reviews.llvm.org/D86259
When diffing disassembly dump of two binaries, I see lots of noises from mismatched jump target addresses and global data references, which unnecessarily causes diffs on every function, making it impractical. I'm trying to symbolize the raw binary addresses to minimize the diff noise.
In this change, a local branch target is modeled as a label and the branch target operand will simply be printed as a label. Local labels are collected by a separate pre-decoding pass beforehand. A global data memory operand will be printed as a global symbol instead of the raw data address. Unfortunately, due to the way the disassembler is set up and to be less intrusive, a global symbol is always printed as the last operand of a memory access instruction. This is less than ideal but is probably acceptable from checking code quality point of view since on most targets an instruction can have at most one memory operand.
So far only the X86 disassemblers are supported.
Test Plan:
llvm-objdump -d --x86-asm-syntax=intel --no-show-raw-insn --no-leading-addr :
```
Disassembly of section .text:
<_start>:
push rax
mov dword ptr [rsp + 4], 0
mov dword ptr [rsp], 0
mov eax, dword ptr [rsp]
cmp eax, dword ptr [rip + 4112] # 202182 <g>
jge 0x20117e <_start+0x25>
call 0x201158 <foo>
inc dword ptr [rsp]
jmp 0x201169 <_start+0x10>
xor eax, eax
pop rcx
ret
```
llvm-objdump -d **--symbolize-operands** --x86-asm-syntax=intel --no-show-raw-insn --no-leading-addr :
```
Disassembly of section .text:
<_start>:
push rax
mov dword ptr [rsp + 4], 0
mov dword ptr [rsp], 0
<L1>:
mov eax, dword ptr [rsp]
cmp eax, dword ptr <g>
jge <L0>
call <foo>
inc dword ptr [rsp]
jmp <L1>
<L0>:
xor eax, eax
pop rcx
ret
```
Note that the jump instructions like `jge 0x20117e <_start+0x25>` without this work is printed as a real target address and an offset from the leading symbol. With a change in the optimizer that adds/deletes an instruction, the address and offset may shift for targets placed after the instruction. This will be a problem when diffing the disassembly from two optimizers where there are unnecessary false positives due to such branch target address changes. With `--symbolize-operand`, a label is printed for a branch target instead to reduce the false positives. Similarly, the disassemble of PC-relative global variable references is also prone to instruction insertion/deletion.
Reviewed By: jhenderson, MaskRay
Differential Revision: https://reviews.llvm.org/D84191
Some of the lower implementations were relying on this, however the
type was not set depending on which form .lower* helper form you were
using. For instance, if you used an unconditonal lower(), the type was
never set. Most of the lower actions do not benefit from a type
parameter, and just expand in terms of the original operation's types.
However, some lowerings could benefit from an additional type hint to
combine a promotion and an expansion. An example of this is for
add/sub sat. The DAG integer legalization tries to use smarter
expansions directly when promoting the integer type, and doesn't
always produce the same instruction with a wider type.
Treat this as an optional hint argument, that only means something for
specific lower actions. It may be useful to generalize this mechanism
to pass a full list of type indexes and desired types, but I haven't
run into a case like that yet.
The "gc-live" operand bundles were recently added, and all tests have been updated to use that format. A migration period was provided, though it's worth noting these intrinsics are experimental, so formally there is no compatibile requirement.
This is an extension to a96fc46. "gc-live" hadn't been implemented at the point that patch was initially posted.
(Forgot to land this a couple of weeks back.)
In a recent series of changes, I've introduced support for using the respective operand bundle kinds on the statepoint. At the moment, code supports either/or, but there's no need to keep the old support around. For the moment, I am simply changing the specification and verifier to require zero length argument sets in the intrinsic.
The intrinsic itself is experimental. Given that, there's no forward serialization needed. The in tree uses and generation have already been updated to use the new operand bundle based forms, the only folks broken by the change will be those with frontends generating statepoints directly and the updates should be easy.
Why not go ahead and just remove the arguments entirely? Well, I plan to. But while working on this I've found that almost all of the arguments to the statepoint can be expressed via operand bundles or attributes. Given that, I'm planning a radical simplification of the arguments and figured I'd do one update not several small ones.
Differential Revision: https://reviews.llvm.org/D80892
Add support for passing in libraries via `-l` and `-L` options to
`llvm-libtool-darwin`.
Reviewed by jhenderson, smeenai
Differential Revision: https://reviews.llvm.org/D85540
Add support for -arch_only option for llvm-libtool-darwin. This diff
also adds support for accepting universal files as input and flattening
them to create the required static library. Supports input universal
files contaning both Mach-O object files or archives.
Differences from cctools' libtool:
- `-arch_only` can be specified multiple times
- archives containing universal files are considered invalid (libtool
allows such archives)
Reviewed by jhenderson, smeenai
Differential Revision: https://reviews.llvm.org/D84770
Add documentation for the remaining options of
`llvm-install-name-tool`.
Reviewed by jhenderson, smeenai
Differential Revision: https://reviews.llvm.org/D85655
The switch from llvm::cl to OptTable (D83530) dropped --version, which
is needed by some users.
This patch also adds a -v alias, which is available in GNU addr2line.
The version dumping is similar to llvm-objcopy --version (exotic):
```
llvm-symbolizer
LLVM (http://llvm.org/):
LLVM version 12.0.0git
Optimized build with assertions.
Default target: x86_64-unknown-linux-gnu
Host CPU: skylake-avx512
```
Reviewed By: dyung, jhenderson
Differential Revision: https://reviews.llvm.org/D85624
Add support for `-D` and `-U` options for llvm-libtool-darwin. `-D`
allows for using zero for timestamps and UIDs/GIDs. `-U` allows for
using actual timestamps and UIDs/GIDs.
Reviewed by jhenderson, smeenai
Differential Revision: https://reviews.llvm.org/D84209
Add support for `-filelist` option for llvm-libtool-darwin. `-filelist`
option allows for passing in a file containing a list of filenames.
Reviewed by jhenderson, smeenai
Differential Revision: https://reviews.llvm.org/D84206
This diff adds documentation for `allow-empty` flag under FileCheck
docs.
Reviewed by jhenderson, smeenai, thopre
Differential Revision: https://reviews.llvm.org/D83682
This change adds a CMake rule to produce shared object versions of
libFuzzer (no-main). Like the static library versions, these shared
libraries have a copy of libc++ statically linked in. For i386 we don't
link with libc++ since i386 does not support mixing position-
independent and non-position-independent code in the same library.
Patch By: IanPudney
Reviewed By: morehouse
Differential Revision: https://reviews.llvm.org/D84947
This came up during the review for D67656. It's nice but also subtle, so documenting it as an idiom will make tests easier to understand.
Reviewed By: probinson
Differential Revision: https://reviews.llvm.org/D68061
for the advantage outlined by D83639 ([OptTable] Support grouped short options)
Some behavior changes:
* -i={0,false} is removed. Use --no-inlines instead.
* --demangle={0,false} is removed. Use --no-demangle instead
* -untag-addresses={0,false} is removed. Use --no-untag-addresses instead
Added a higher level API OptTable::parseArgs which handles optional
initial options populated from an environment variable, expands response
files recursively, and parses options.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D83530
See https://lists.llvm.org/pipermail/llvm-dev/2020-July/143373.html
"[llvm-dev] Multiple documents in one test file" for some discussions.
This patch has explored several alternatives. The current semantics are similar to
what @dblaikie proposed.
`split-file filename output` splits the input file into multiple parts separated by
regex `^(.|//)--- filename` and write each part to the file `output/filename`
(`filename` can include path separators).
Use case A (organizing input of different formats (e.g. linker
script+assembly) in one file).
```
# RUN: split-file %s %t
# RUN: llvm-mc %t/asm -o %t.o
# RUN: ld.lld -T %t/lds %t.o -o %t
This is sometimes better than the %S/Inputs/ approach because the user
can see the auxiliary files immediately and don't have to open another file.
# asm
...
# lds
...
```
Use case B (for utilities which don't have built-in input splitting
feature):
```
// RUN: split-file %s %t
// RUN: llc < %t/1.ll | FileCheck %s --check-prefix=CASE1
// RUN: llc < %t/2.ll | FileCheck %s --check-prefix=CASE2
Combing tests prudently can improve readability.
For example, when testing parsing errors if the recovery mechanism isn't possible,
grouping the tests in one file can more readily see test coverage/strategy.
//--- 1.ll
...
//--- 2.ll
...
```
Since this is a new utility, there is no git history concerns for
UpperCase variable names. I use lowerCase variable names like mlir/lld.
Reviewed By: jhenderson, lattner
Differential Revision: https://reviews.llvm.org/D83834
As far as I know, ipconstprop has not been used in years and ipsccp has
been used instead. This has the potential for confusion and sometimes
leads people to spend time finding & reporting bugs as well as
updating it to work with the latest API changes.
This patch moves the tests over to SCCP. There's one functional difference
I am aware of: ipconstprop propagates for each call-site individually, so
for functions that are called with different constant arguments it can sometimes
produce better results than ipsccp (at much higher compile-time cost).But
IPSCCP can be thought to do so as well for internal functions and as mentioned
earlier, the pass seems unused in practice (and there are no plans on working
towards enabling it anytime).
Also discussed on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2020-July/143773.html
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D84447
- Clarify that these are extensions to DWARF 5 and not as yet a
proposal.
Reviewed By: scott.linder
Differential Revision: https://reviews.llvm.org/D70523
- Clarify what context is used in DWARF expression evaluation.
- Define location descriptions to fully resolve the context and so
include the context in their result.
- As a consequence of location descriptions being fully resoved,
change address spaces so only a swizzled and unswizzled private
address space is defined. The lane is now part of the location
description context.
- Clarify how call frame information is used to fully resolve
expressions that specify registers.
Reviewed By: scott.linder
Differential Revision: https://reviews.llvm.org/D70523
To match NewPM pass name, and also for readability.
Also rename rpo-functionattrs -> rpo-function-attrs while we're here.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D84694
PGO profile is usually more precise than sample profile. However, PGO profile
needs to be collected from loadtest and loadtest may not be representative
enough to the production workload. Sample profile collected from production
can be used as a supplement -- for functions cold in loadtest but warm/hot
in production, we can scale up the related function in PGO profile if the
function is warm or hot in sample profile.
The implementation contains changes in compiler side and llvm-profdata side.
Given an instr profile and a sample profile, for a function cold in PGO
profile but warm/hot in sample profile, llvm-profdata will either mark
all the counters in the profile to be -1 or scale up the max count in the
function to be above hot threshold, depending on the zero counter ratio in
the profile. The assumption is if there are too many counters being zero
in the function profile, the profile is more likely to cause harm than good,
then llvm-profdata will mark all the counters to be -1 indicating the
function is hot but the profile is unaccountable. In compiler side, if a
function profile with all -1 counters is seen, the function entry count will
be set to be above hot threshold but its internal profile will be dropped.
In the long run, it may be useful to let compiler support using PGO profile
and sample profile at the same time, but that requires more careful design
and more substantial changes to make two profiles work seamlessly. The patch
here serves as a simple intermediate solution.
Differential Revision: https://reviews.llvm.org/D81981
This adds a new extern "C" function that serves the same purpose. This removes the need for external users to depend on internal headers in order to use this feature. It also standardizes the interface in a way that other fuzzing engines will be able to match.
Patch By: IanPudney
Reviewed By: kcc
Differential Revision: https://reviews.llvm.org/D84561
Starting with Skylake, the LBR contains the precise number of cycles between the two
consecutive branches.
Making use of this will hopefully make the measurements more precise than the
existing methods of using RDTSC.
Differential Revision: https://reviews.llvm.org/D77422
New change: check for existence of field `cycles` in perf_branch_entry before enabling this mode.
This should prevent compilation errors when building for older kernel whose headers don't support it.
See https://lists.llvm.org/pipermail/llvm-dev/2020-July/143373.html
"[llvm-dev] Multiple documents in one test file" for some discussions.
`extract part filename` splits the input file into multiple parts separated by
regex `^(.|//)--- ` and extract the specified part to stdout or the
output file (if specified).
Use case A (organizing input of different formats (e.g. linker
script+assembly) in one file).
```
// RUN: extract lds %s -o %t.lds
// RUN: extract asm %s -o %t.s
// RUN: llvm-mc %t.s -o %t.o
// RUN: ld.lld -T %t.lds %t.o -o %t
This is sometimes better than the %S/Inputs/ approach because the user
can see the auxiliary files immediately and don't have to open another file.
```
Use case B (for utilities which don't have built-in input splitting
feature):
```
// RUN: extract case1 %s | llc | FileCheck %s --check-prefix=CASE1
// RUN: extract case2 %s | llc | FileCheck %s --check-prefix=CASE2
Combing tests prudently can improve readability.
This is sometimes better than having multiple test files.
```
Since this is a new utility, there is no git history concerns for
UpperCase variable names. I use lowerCase variable names like mlir/lld.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D83834
Add support for creating static libraries when the input includes only
Mach-O binaries (and not libraries/archives themselves).
Reviewed by alexshap, Ktwu, smeenai, jhenderson, MaskRay, mtrent
Differential Revision: https://reviews.llvm.org/D83002
Its effect could be achieved by
`-stop-after`,`-print-after`,`-print-after-all`. But a few tests need to
print MIR after ISel which could not be done with
`-print-after`/`-stop-after` since isel pass does not have commandline name.
That's the reason `--print-machineinstrs` is downgraded to
`--print-after-isel` in this patch. `--print-after-isel` could be
removed after we switch to new pass manager since isel pass would have a
commandline text name to use `print-after` or equivalent switches.
The motivation of this patch is to reduce tests dependency on
would-be-deprecated feature.
Reviewed By: arsenm, dsanders
Differential Revision: https://reviews.llvm.org/D83275
Summary:
This support is needed for the Fortran array variables with pointer/allocatable
attribute. This support enables debugger to identify the status of variable
whether that is currently allocated/associated.
for pointer array (before allocation/association)
without DW_AT_associated
(gdb) pt ptr
type = integer (140737345375288:140737354129776)
(gdb) p ptr
value requires 35017956 bytes, which is more than max-value-size
with DW_AT_associated
(gdb) pt ptr
type = integer (:)
(gdb) p ptr
$1 = <not associated>
for allocatable array (before allocation)
without DW_AT_allocated
(gdb) pt arr
type = integer (140737345375288:140737354129776)
(gdb) p arr
value requires 35017956 bytes, which is more than max-value-size
with DW_AT_allocated
(gdb) pt arr
type = integer, allocatable (:)
(gdb) p arr
$1 = <not allocated>
Testing
- unit test cases added
- check-llvm
- check-debuginfo
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D83544
This allows tracking the in-memory type of a pointer argument to a
function for ABI purposes. This is essentially a stripped down version
of byval to remove some of the stack-copy implications in its
definition.
This includes the base IR changes, and some tests for places where it
should be treated similarly to byval. Codegen support will be in a
future patch.
My original attempt at solving some of these problems was to repurpose
byval with a different address space from the stack. However, it is
technically permitted for the callee to introduce a write to the
argument, although nothing does this in reality. There is also talk of
removing and replacing the byval attribute, so a new attribute would
need to take its place anyway.
This is intended avoid some optimization issues with the current
handling of aggregate arguments, as well as fixes inflexibilty in how
frontends can specify the kernel ABI. The most honest representation
of the amdgpu_kernel convention is to expose all kernel arguments as
loads from constant memory. Today, these are raw, SSA Argument values
and codegen is responsible for turning these into loads.
Background:
There currently isn't a satisfactory way to represent how arguments
for the amdgpu_kernel calling convention are passed. In reality,
arguments are passed in a single, flat, constant memory buffer
implicitly passed to the function. It is also illegal to call this
function in the IR, and this is only ever invoked by a driver of some
kind.
It does not make sense to have a stack passed parameter in this
context as is implied by byval. It is never valid to write to the
kernel arguments, as this would corrupt the inputs seen by other
dispatches of the kernel. These argumets are also not in the same
address space as the stack, so a copy is needed to an alloca. From a
source C-like language, the kernel parameters are invisible.
Semantically, a copy is always required from the constant argument
memory to a mutable variable.
The current clang calling convention lowering emits raw values,
including aggregates into the function argument list, since using
byval would not make sense. This has some unfortunate consequences for
the optimizer. In the aggregate case, we end up with an aggregate
store to alloca, which both SROA and instcombine turn into a store of
each aggregate field. The optimizer never pieces this back together to
see that this is really just a copy from constant memory, so we end up
stuck with expensive stack usage.
This also means the backend dictates the alignment of arguments, and
arbitrarily picks the LLVM IR ABI type alignment. By allowing an
explicit alignment, frontends can make better decisions. For example,
there's real no advantage to an aligment higher than 4, so a frontend
could choose to compact the argument layout. Similarly, there is a
high penalty to using an alignment lower than 4, so a frontend could
opt into more padding for small arguments.
Another design consideration is when it is appropriate to expose the
fact that these arguments are all really passed in adjacent
memory. Currently we have a late IR optimization pass in codegen to
rewrite the kernel argument values into explicit loads to enable
vectorization. In most programs, unrelated argument loads can be
merged together. However, exposing this property directly from the
frontend has some disadvantages. We still need a way to track the
original argument sizes and alignments to report to the driver. I find
using some side-channel, metadata mechanism to track this
unappealing. If the kernel arguments were exposed as a single buffer
to begin with, alias analysis would be unaware that the padding bits
betewen arguments are meaningless. Another family of problems is there
are still some gaps in replacing all of the available parameter
attributes with metadata equivalents once lowered to loads.
The immediate plan is to start using this new attribute to handle all
aggregate argumets for kernels. Long term, it makes sense to migrate
all kernel arguments, including scalars, to be passed indirectly in
the same manner.
Additional context is in D79744.
This patch changes llvm-readelf (and llvm-readobj for consistency)
behavior to print an error when executed with no input files.
Reading from stdin can be achieved via a '-' for the input
object.
Fixes https://bugs.llvm.org/show_bug.cgi?id=46400
Differential Revision: https://reviews.llvm.org/D83704
Reviewed by: jhenderson, MaskRay, sbc, jyknight
This diff starts the implementation of llvm-libtool-darwin
(an llvm based replacement of cctool's libtool).
Libtool is used for creating static and dynamic libraries
from a bunch of object files given as input.
Reviewed by alexshap, smeenai, jhenderson, MaskRay
Differential Revision: https://reviews.llvm.org/D82923
From @erichkeane:
```
This patch doesn't seem to build for me:
/iusers/ekeane1/workspaces/llvm-project/llvm/tools/llvm-exegesis/lib/X86/X86Counter.cpp: In function ‘llvm::Error llvm::exegesis::parseDataBuffer(const char*, size_t, const void*, const void*, llvm::SmallVector<long int, 4>*)’:
/iusers/ekeane1/workspaces/llvm-project/llvm/tools/llvm-exegesis/lib/X86/X86Counter.cpp:99:37: error: ‘struct perf_branch_entry’ has no member named ‘cycles’
CycleArray->push_back(Entry.cycles);
I'm on RHEL7, so I have kernel 3.10, so it doesn't have 'cycles'.
According ot this: https://elixir.bootlin.com/linux/v4.3/source/include/uapi/linux/perf_event.h#L963 kernel 4.3 is the first time that 'cycles' appeared in this structure.
```
Make explicit that freeze does not touch paddings of an aggregate.
(Relevant comment: https://reviews.llvm.org/D83752#2152550)
This implies that `v = freeze(load p); store v, q` may still leave undef bits
or poison in memory if `v` is an aggregate, but it still happens for
non-byte integers such as i1.
Differential Revision: https://reviews.llvm.org/D83927
Starting with Skylake, the LBR contains the precise number of cycles between the two
consecutive branches.
Making use of this will hopefully make the measurements more precise than the
existing methods of using RDTSC.
Differential Revision: https://reviews.llvm.org/D77422
This came up in a recent review, someone was wondering were was
this all documented and I couldn't find a reference to provide.
Differential Revision: https://reviews.llvm.org/D83816
Like most readability rules, it isn't absolute and there is a matter of taste
to it. I think more recent part of the project may be more consistent in the
current application of the guideline. I suspect sources like
mlir/lib/Dialect/StandardOps/IR/Ops.cpp may be examples of this at the moment.
Differential Revision: https://reviews.llvm.org/D82594
Some of the system registers readable on AArch64 and ARM platforms
return different values with each read (for example a timer counter),
these shouldn't be hoisted outside loops or otherwise interfered with,
but the normal @llvm.read_register intrinsic is only considered to read
memory.
This introduces a separate @llvm.read_volatile_register intrinsic and
maps all system-registers on ARM platforms to use it for the
__builtin_arm_rsr calls. Registers declared with asm("r9") or similar
are unaffected.
This changes the matrix load/store intrinsic definitions to load/store from/to
a pointer, and not from/to a pointer to a vector, as discussed in D83477.
This also includes the recommit of "[Matrix] Tighten LangRef definitions and
Verifier checks" which adds improved language reference descriptions of the
matrix intrinsics and verifier checks.
Differential Revision: https://reviews.llvm.org/D83785
Loop metadata nodes do not adhere to the documented property:
(a) LoopIDs are not unique: Any pass that duplicates IR will do it
including its metadata (e.g. LoopVersioning) such that multiple
loops are linked with the same LoopID. There is even a test case
(Transforms/LoopUnroll/unroll-pragmas-disabled.ll) for multiple
loops with the same LoopID.
(b) LoopIDs are not persistent: Adding or removing an item from a LoopID
can only be done by creating a new MDNode and assigning it to the
loop's branch(es). Passes such as LoopUnroll (llvm.loop.unroll.disable)
and LoopVectorize (llvm.loop.isvectorized) use this to mark loops to
not be transformed multiple times or to avoid that a LoopVersioned
original loop is transformed.
Update the documentation according to how llvm.loop is used in practice.
Differential Revision: https://reviews.llvm.org/D55290
This tightens the matrix intrinsic definitions in LLVM LangRef and adds
correspondings checks to the IR Verifier.
Differential Revision: https://reviews.llvm.org/D83477
fadd (fma A, B, (fmul C, D)), E --> fma A, B, (fma C, D, E)
This is only allowed when "reassoc" is present on the fadd.
As discussed in D80801, this transform goes beyond
what is allowed by "contract" FMF (-ffp-contract=fast).
That is because we are fusing the trailing add of 'E' with a
multiply, but without "reassoc", the code mandates that the
products A*B and C*D are added together before adding in 'E'.
I've added this example to the LangRef to try to clarify the
meaning of "contract". If that seems reasonable, we should
probably do something similar for the clang docs because
there does not appear to be any formal spec for the behavior
of -ffp-contract=fast.
Differential Revision: https://reviews.llvm.org/D82499
This doesn't appear used for anything, and is emitted incorrectly
based on the description. This also depends on the IR type, and
pointee element type.
In FileCheck.rst, add `-dump-input-context` and `-dump-input-filter`,
and fix some `-dump-input` documentation.
In `FileCheck -help`, `cl::value_desc("kind")` is being ignored for
`-dump-input-filter`, so just drop it.
Extend `-dump-input=help` to mention FILECHECK_OPTS.
Summary:
As per disscussion in D83351, using `for_each` is potentially confusing,
at least in regards to inconsistent style (there's less than 100 `for_each`
usages in LLVM, but ~100.000 `for` range-based loops
Therefore, it should be avoided.
Reviewers: dblaikie, nickdesaulniers
Reviewed By: dblaikie, nickdesaulniers
Subscribers: hubert.reinterpretcast, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D83431
This adds the --debug-vars option to llvm-objdump, which prints
locations (registers/memory) of source-level variables alongside the
disassembly based on DWARF info. A vertical line is printed for each
live-range, with a label at the top giving the variable name and
location, and the position and length of the line indicating the program
counter range in which it is valid.
Differential revision: https://reviews.llvm.org/D70720
Summary:
Fixes two minor issues in the docs present under `ninja docs-llvm-html`:
1 - A header is too small:
```
Warning, treated as error:
llvm/llvm/docs/Passes.rst:70:Title underline too short.
``-basic-aa``: Basic Alias Analysis (stateless AA impl)
------------------------------------------------------
```
2 - Multiple definitions on a non-anonymous target (llvm-dev mailing list):
```
Warning, treated as error:
llvm/llvm/docs/DeveloperPolicy.rst:3:Duplicate explicit target name: "llvm-dev mailing list".
```
Reviewers: lattner
Reviewed By: lattner
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D83416
LLVM currently does not require function parameters or return values
to be fully initialized, and does not care if they are poison. This can
be useful if the frontend ABI makes no such demands, but may prevent
helpful backend transformations in case they do. Specifically, the C
and C++ languages require all scalar function operands to be fully
determined.
Introducing this attribute is of particular use to MemorySanitizer
today, although other transformations may benefit from it as well.
We can modify MemorySanitizer instrumentation to provide modest (17%)
space savings where `frozen` is present.
This commit only adds the attribute to the Language Reference, and
the actual implementation of the attribute will follow in a separate
commit.
Differential Revision: https://reviews.llvm.org/D82316
This cleans up the stack allocated by a @llvm.call.preallocated.setup.
Should either call the teardown or the preallocated call to clean up the
stack. Calling both is UB.
Add LangRef.
Add verifier check that the token argument is a @llvm.call.preallocated.setup.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D83354
Do not enforce recommonmark dependency if sphinx is called to build
manpages. In order to do this, try to import recommonmark first
and do not configure it if it's not available. Additionally, declare
a custom tags for the selected builder via CMake, and ignore
recommonmark import failure when 'man' target is used.
This will permit us to avoid the problematic recommonmark dependency
for the majority of Gentoo users that do not need to locally build
the complete documentation but want to have tool manpages.
Differential Revision: https://reviews.llvm.org/D83161
Summary:
D80831 changed part of the prefix usage for AIX.
But there are other places getting prefix from DataLayout.
This patch intends to make prefix usage consistent on AIX.
Reviewed by: hubert.reinterpretcast, daltenty
Differential Revision: https://reviews.llvm.org/D81270
The "new way" of enabling recommonmark is only supported in recommonmark
0.5 and later. Use the deprecated approach with versions of Sphinx that
still support it.
If I understand correctly there's no way to use older versions of
recommonmark (<0.5) with newer versions of Sphinx (>3.0) because the old
approach got removed.
Differential revision: https://reviews.llvm.org/D75284
Every other value parameter attribute uses parentheses, so accept this
as the preferred modern syntax. Updating everything to use the new
syntax is left for a future change.
This is cleaning up comments (mostly in the bitcode handling) about
removing some backward compatibility aspect in the 4.0 release.
Historically, "4.0" was used during the development of the 3.x
versions as "this future major breaking change version". At the time
the major number was used to indicate the compatibility. When we
reached 3.9 we decided to change the numbering, instead of going to
3.10 we went to 4.0 but after changing the meaning of the major
number to not mean anything anymore with respect to bitcode backward
compatibility.
The current policy
(https://llvm.org/docs/DeveloperPolicy.html#ir-backwards-compatibility)
indicates only now:
The current LLVM version supports loading any bitcode since version 3.0.
Differential Revision: https://reviews.llvm.org/D82514
Before the fix the build of docs-llvm-html would fail.
The D80959 introduced options that are not recognized, so we have
warning as:
llvm-project/llvm/docs/CommandGuide/llvm-dwarfdump.rst:40\
:unknown option: --debug-info
Differential Revision: https://reviews.llvm.org/D82460
Before the fix the build of docs-llvm-html would fail.
The rG8bc03d216824 introduced a reference to an undefined label,
so we have warning as:
llvm-project/llvm/docs/GlobalISel/GenericOpcode.rst:295:\
undefined label: i_intr_llvm_ptrmask (if the link has no\
caption the label must precede a section header)