Change --max-timeline-cycles=0 to mean no limit on the number of cycles.
Use this in AMDGPU tests to show all instructions in the timeline view
instead of having it arbitrarily truncated.
Differential Revision: https://reviews.llvm.org/D104846
This can be seen as a follow up to commit 0ee439b705,
that changed the second argument of __powidf2, __powisf2 and
__powitf2 in compiler-rt from si_int to int. That was to align with
how those runtimes are defined in libgcc.
One thing that seem to have been missing in that patch was to make
sure that the rest of LLVM also handle that the argument now depends
on the size of int (not using the si_int machine mode for 32-bit).
When using __builtin_powi for a target with 16-bit int clang crashed.
And when emitting libcalls to those rtlib functions, typically when
lowering @llvm.powi), the backend would always prepare the exponent
argument as an i32 which caused miscompiles when the rtlib was
compiled with 16-bit int.
The solution used here is to use an overloaded type for the second
argument in @llvm.powi. This way clang can use the "correct" type
when lowering __builtin_powi, and then later when emitting the libcall
it is assumed that the type used in @llvm.powi matches the rtlib
function.
One thing that needed some extra attention was that when vectorizing
calls several passes did not support that several arguments could
be overloaded in the intrinsics. This patch allows overload of a
scalar operand by adding hasVectorInstrinsicOverloadedScalarOpd, with
an entry for powi.
Differential Revision: https://reviews.llvm.org/D99439
Currently the value is only used when calling `F->viewCFG()` which is missing out on its potential and usefulness.
So I added the check to the printer passes as well.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D102011
The original change was pushed in main as commit f7a23ecece.
It was then reverted by commit a04f01bab2 because it caused linker failures
on buildbots that don't build the AMDGPU target.
--
Some instructions are not defined well enough within the target’s scheduling
model for llvm-mca to be able to properly simulate its behaviour. The ideal
solution to this situation is to modify the scheduling model, but that’s not
always a viable strategy. Maybe other parts of the backend depend on that
instruction being modelled the way that it is. Or maybe the instruction is quite
complex and it’s difficult to fully capture its behaviour with tablegen. The
CustomBehaviour class (which I will refer to as CB frequently) is designed to
provide intuitive scaffolding for developers to implement the correct modelling
for these instructions.
More details are available in the original commit log message (f7a23ecece).
Differential Revision: https://reviews.llvm.org/D104149
I believe that after https://reviews.llvm.org/D102355 the behaviour of --print-source-context-lines has changed.
Before: --print-source-context-lines=3 prints 4 lines.
After: --print-source-context-lines=3 prints 3 lines.
Adjust the example in the docs for this change and make the testing a little more robust.
Differential Revision: https://reviews.llvm.org/D104114
Some instructions are not defined well enough within the target’s scheduling
model for llvm-mca to be able to properly simulate its behaviour. The ideal
solution to this situation is to modify the scheduling model, but that’s not
always a viable strategy. Maybe other parts of the backend depend on that
instruction being modelled the way that it is. Or maybe the instruction is quite
complex and it’s difficult to fully capture its behaviour with tablegen. The
CustomBehaviour class (which I will refer to as CB frequently) is designed to
provide intuitive scaffolding for developers to implement the correct modelling
for these instructions.
Implementation details:
llvm-mca does its best to extract relevant register, resource, and memory
information from every MCInst when lowering them to an mca::Instruction. It then
uses this information to detect dependencies and simulate stalls within the
pipeline. For some instructions, the information that gets captured within the
mca::Instruction is not enough for mca to simulate them properly. In these
cases, there are two main possibilities:
1. The instruction has a dependency that isn’t detected by mca.
2. mca is incorrectly enforcing a dependency that shouldn’t exist.
For the rest of this discussion, I will be focusing on (1), but I have put some
thought into (2) and I may revisit it in the future.
So we have an instruction that has dependencies that aren’t picked up by mca.
The basic idea for both pipelines in mca is that when an instruction wants to be
dispatched, we first check for register hazards and then we check for resource
hazards. This is where CB is injected. If no register or resource hazards have
been detected, we make a call to CustomBehaviour::checkCustomHazard() to give
the target specific CB the chance to detect and enforce any custom dependencies.
The return value for checkCustomHazaard() is an unsigned int representing the
(minimum) number of cycles that the instruction needs to stall for. It’s fine to
underestimate this value because when StallCycles gets down to 0, we’ll end up
checking for all the hazards again before the instruction is actually
dispatched. However, it’s important not to overestimate the value and the more
accurate your estimate is, the more efficient mca’s execution can be.
In general, for checkCustomHazard() to be able to detect these custom
dependencies, it needs information about the current instruction and also all of
the instructions that are still executing within the pipeline. The mca pipeline
uses mca::Instruction rather than MCInst and the current information encoded
within each mca::Instruction isn’t sufficient for my use cases. I had to add a
few extra attributes to the mca::Instruction class and have them get set by the
MCInst during instruction building. For example, the current mca::Instruction
doesn’t know its opcode, and it also doesn’t know anything about its immediate
operands (both of which I had to add to the class).
With information about the current instruction, a list of all currently
executing instructions, and some target specific objects (MCSubtargetInfo and
MCInstrInfo which the base CB class has references to), developers should be
able to detect and enforce most custom dependencies within checkCustomHazard. If
you need more information than is present in the mca::Instruction, feel free to
add attributes to that class and have them set during the lowering sequence from
MCInst.
Fortunately, in the in-order pipeline, it’s very convenient for us to pass these
arguments to checkCustomHazard. The hazard checking is taken care of within
InOrderIssueStage::canExecute(). This function takes a const InstRef as a
parameter (representing the instruction that currently wants to be dispatched)
and the InOrderIssueStage class maintains a SmallVector<InstRef, 4> which holds
all of the currently executing instructions. For the out-of-order pipeline, it’s
a bit trickier to get the list of executing instructions and this is why I have
held off on implementing it myself. This is the main topic I will bring up when
I eventually make a post to discuss and ask for feedback.
CB is a base class where targets implement their own derived classes. If a
target specific CB does not exist (or we pass in the -disable-cb flag), the base
class is used. This base class trivially returns 0 from its checkCustomHazard()
implementation (meaning that the current instruction needs to stall for 0 cycles
aka no hazard is detected). For this reason, targets or users who choose not to
use CB shouldn’t see any negative impacts to accuracy or performance (in
comparison to pre-patch llvm-mca).
Differential Revision: https://reviews.llvm.org/D104149
Emphasize that this is basically an attempt to remove
``PointerType::getElementType`` and ``Type::getPointerElementType()``.
Add a couple more subtasks.
Differential Revision: https://reviews.llvm.org/D104151
Ensure that we provide a `Module` when checking if a rename of an intrinsic is necessary.
This fixes the issue that was detected by https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=32288
(as mentioned by @fhahn), after committing D91250.
Note that the `LLVMIntrinsicCopyOverloadedName` is being deprecated in favor of `LLVMIntrinsicCopyOverloadedName2`.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D99173
This patch implements vector-predicated intrinsics on IR level for fadd,
fsub, fmul, fdiv and frem. There operate in the default floating-point
environment. We will use constrained fp operand bundles for constrained
vector-predicated fp math (D93455).
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D93470
I don't like landing this change, but it's an acknowledgement of a practical reality. Despite not having well specified semantics for inttoptr and ptrtoint involving non-integral pointer types, they are used in practice. Here's a quick summary of the current pragmatic reality:
* I happen to know that the main external user of non-integral pointers has effectively disabled the verifier rules.
* RS4GC (the lowering pass for abstract GC machine model which is the key motivation for non-integral pointers), even supports them. We just have all the tests using an integral pointer space to let the verifier run.
* Certain idioms (such as alignment checks for alignment N, where any relocation is guaranteed to be N byte aligned) are fine in practice.
* As implemented, inttoptr/ptrtoint are CSEd and are not control dependent. This means that any code which is intending to check a particular bit pattern at site of use must be wrapped in an intrinsic or external function call.
This change allows them in the Verifier, and updates the LangRef to specific them as implementation dependent. This allows us to acknowledge current reality while still leaving ourselves room to punt on figuring out "good" semantics until the future.
I noticed that I did not update the command guide when introducing the
--rsp-quoting option. This change fixes this.
Differential Revision: https://reviews.llvm.org/D103915
I found the documentation of the various CMake variables difficult to
navigate, because they are unsorted. I can see they've grown
organically with new clusters of somewhat-related options, but the
result is hard to use. This collates them (treating '_' as space).
Differential Revision: https://reviews.llvm.org/D102481
Loads in the first half of the chapter are missing the type argument.
Patched By: klao (Mihaly Barasz)
Reviewed By: Jim
Differential Revision: https://reviews.llvm.org/D90326
Needs to be discussed more.
This reverts commit 255a5c1baa6020c009934b4fa342f9f6dbbcc46
This reverts commit df2056ff3730316f376f29d9986c9913b95ceb1
This reverts commit faff79b7ca144e505da6bc74aa2b2f7cffbbf23
This reverts commit d2a9020785c6e02afebc876aa2778fa64c5cafd
https://reviews.llvm.org/D95745 introduced a new `unwind` keyword for inline assembler expressions. Inline asms marked with the `unwind` keyword allows stack unwinding from inline assembly because the compiler emits unwinding information ("around" the inline asm) as it would for calls/invokes. Unwinding the stack from within non-unwind inline asm may cause UB.
Reviewed By: Amanieu
Differential Revision: https://reviews.llvm.org/D102642
SwiftTailCC has a different set of requirements than the C calling convention
for a tail call. The exact argument sequence doesn't have to match, but fewer
ABI-affecting attributes are allowed.
Also make sure the musttail diagnostic triggers if a musttail call isn't
actually a tail call.
There can be a need for some optimizations to get (base, offset)
for any GC pointer. The base can be calculated by generating
needed instructions as it is done by the
RewriteStatepointsForGC::findBasePointer() function. The offset
can be calculated in the same way. Though to not expose the base
calculation and to make the offset calculation as simple as
ptrtoint(derived_ptr) - ptrtoint(base_ptr), which is illegal
outside RS4GC, this patch introduces 2 intrinsics:
@llvm.experimental.gc.get.pointer.base(%derived_ptr)
@llvm.experimental.gc.get.pointer.offset(%derived_ptr)
These intrinsics are inlined by RS4GC along with generation of
statepoint sequences.
With these new intrinsics the GC parseable lowering for atomic
memcpy intrinsics (6ec2c5e402)
could be implemented as a separate pass.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D100445
Describes how to run the Fortran LLVM Test Suite, specifically the external SPEC CPU 2017 Fortran tests.
Reviewed By: rovka
Differential Revision: https://reviews.llvm.org/D102877
LLVM_DEFINITIONS is a string variable containing a list of arguments
to pass to the compiler. When CMake's add_definitions is passed a
string variable, this is interpreted as one argument. To make it
behave properly, the string variable needs to be split into a list.
Despite the fact that add_definitions isn't supposed to be used like
the LLVM docs recommended, it worked fine in practice in many cases.
If the first argument in LLVM_DEFINITIONS is of the form -DFOO=42
instead of plain -DFOO, the rest of the string is treated as value
to this define. I.e. if LLVM_DEFINITIONS consists of `-DFOO=42 -DBAR`,
CMake ended up passing `-DFOO="42 -DBAR"` to the compiler.
See https://gitlab.kitware.com/cmake/cmakissues/22162
for discussion on the matter.
Changing LLVM_DEFINITIONS to be a list variable would possibly be
more disruptive; instead keep the variable defined as before but
change the recommendation for how to use it. Then projects using it
can gradually be updated to follow the new recommendation.
Differential Revision: https://reviews.llvm.org/D103044
We really ought to support no_sanitize("coverage") in line with other
sanitizers. This came up again in discussions on the Linux-kernel
mailing lists, because we currently do workarounds using objtool to
remove coverage instrumentation. Since that support is only on x86, to
continue support coverage instrumentation on other architectures, we
must support selectively disabling coverage instrumentation via function
attributes.
Unfortunately, for SanitizeCoverage, it has not been implemented as a
sanitizer via fsanitize= and associated options in Sanitizers.def, but
rolls its own option fsanitize-coverage. This meant that we never got
"automatic" no_sanitize attribute support.
Implement no_sanitize attribute support by special-casing the string
"coverage" in the NoSanitizeAttr implementation. To keep the feature as
unintrusive to existing IR generation as possible, define a new negative
function attribute NoSanitizeCoverage to propagate the information
through to the instrumentation pass.
Fixes: https://bugs.llvm.org/show_bug.cgi?id=49035
Reviewed By: vitalybuka, morehouse
Differential Revision: https://reviews.llvm.org/D102772
I really needed this, like, factually, yesterday,
when verifying dependency breaking idioms for AMD Zen 3 scheduler model.
Consider the following example:
```
$ ./bin/llvm-exegesis --mode=inverse_throughput --snippets-file=/tmp/snippet.s --num-repetitions=1000000 --repetition-mode=duplicate
Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-4a7e50.o
---
mode: inverse_throughput
key:
instructions:
- 'VPXORYrr YMM0 YMM0 YMM0'
config: ''
register_initial_values: []
cpu_name: znver3
llvm_triple: x86_64-unknown-linux-gnu
num_repetitions: 1000000
measurements:
- { key: inverse_throughput, value: 0.31025, per_snippet_value: 0.31025 }
error: ''
info: ''
assembled_snippet: C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C3
...
```
What does it tell us?
So wait, it can only execute ~3 x86 AVX YMM PXOR zero-idioms per cycle?
That doesn't seem right. That's even less than there are pipes supporting this type of op.
Now, second example:
```
$ ./bin/llvm-exegesis --mode=inverse_throughput --snippets-file=/tmp/snippet.s --num-repetitions=1000000 --repetition-mode=loop
Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-2418b5.o
---
mode: inverse_throughput
key:
instructions:
- 'VPXORYrr YMM0 YMM0 YMM0'
config: ''
register_initial_values: []
cpu_name: znver3
llvm_triple: x86_64-unknown-linux-gnu
num_repetitions: 1000000
measurements:
- { key: inverse_throughput, value: 1.00011, per_snippet_value: 1.00011 }
error: ''
info: ''
assembled_snippet: 49B80800000000000000C5FDEFC0C5FDEFC04983C0FF75F2C3
...
```
Now that's just worse. Due to the looping, the throughput completely plummeted,
and now we can only do a single instruction/cycle!?
That's not great.
And final example:
```
$ ./bin/llvm-exegesis --mode=inverse_throughput --snippets-file=/tmp/snippet.s --num-repetitions=1000000 --repetition-mode=loop --loop-body-size=1000
Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-c402e2.o
---
mode: inverse_throughput
key:
instructions:
- 'VPXORYrr YMM0 YMM0 YMM0'
config: ''
register_initial_values: []
cpu_name: znver3
llvm_triple: x86_64-unknown-linux-gnu
num_repetitions: 1000000
measurements:
- { key: inverse_throughput, value: 0.167087, per_snippet_value: 0.167087 }
error: ''
info: ''
assembled_snippet: 49B80800000000000000C5FDEFC0C5FDEFC04983C0FF75F2C3
...
```
So if we merge the previous two approaches, do duplicate this single-instruction snippet 1000x
(loop-body-size/instruction count in snippet), and run a loop with 1000 iterations
over that duplicated/unrolled snippet, the measured throughput goes through the roof,
up to 5.9 instructions/cycle, which finally tells us that this idiom is zero-cycle!
Reviewed By: courbet
Differential Revision: https://reviews.llvm.org/D102522
Add link to documentation for "AMD Instinct MI100 Instruction Set
Architecture" to AMDGPUUsage.rst.
Reviewed By: kzhuravl, rampitec, dp
Differential Revision: https://reviews.llvm.org/D102859
In the WebAssembly target, we would like to allow alloca in two address
spaces. The alloca instruction already has an address space argument,
but the verifier asserts that the address space of an alloca is the
default alloca address space from the datalayout. This patch removes
this restriction. Targets that would like to impose additional
restrictions should do so via target-specific verification passes.
Differential Revision: https://reviews.llvm.org/D101045
[Debugify][Original DI] Test dbg var loc preservation
This is an improvement of [0]. This adds checking of
original llvm.dbg.values()/declares() instructions in
optimizations.
We have picked a real issue that has been found with
this (actually, picked one variable location missing
from [1] and resolved the issue), and the result is
the fix for that -- D100844.
Before applying the D100844, using the options from [0]
(but with this patch applied) on the compilation of GDB 7.11,
the final HTML report for the debug-info issues can be found
at [1] (please scroll down, and look for
"Summary of Variable Location Bugs"). After applying
the D100844, the numbers has improved a bit -- please take
a look into [2].
[0] https://llvm.org/docs/HowToUpdateDebugInfo.html#\
test-original-debug-info-preservation-in-optimizations
[1] https://djolertrk.github.io/di-check-before-adce-fix/
[2] https://djolertrk.github.io/di-check-after-adce-fix/
Differential Revision: https://reviews.llvm.org/D100845
The Unit test was failing because the pass from the test that
modifies the IR, in its runOnFunction() didn't return 'true',
so the expensive-check configuration triggered an assertion.
This is an improvement of [0]. This adds checking of
original llvm.dbg.values()/declares() instructions in
optimizations.
We have picked a real issue that has been found with
this (actually, picked one variable location missing
from [1] and resolved the issue), and the result is
the fix for that -- D100844.
Before applying the D100844, using the options from [0]
(but with this patch applied) on the compilation of GDB 7.11,
the final HTML report for the debug-info issues can be found
at [1] (please scroll down, and look for
"Summary of Variable Location Bugs"). After applying
the D100844, the numbers has improved a bit -- please take
a look into [2].
[0] https://llvm.org/docs/HowToUpdateDebugInfo.html\
[1] https://djolertrk.github.io/di-check-before-adce-fix/
[2] https://djolertrk.github.io/di-check-after-adce-fix/
Differential Revision: https://reviews.llvm.org/D100845
To track security issues, we're starting with the chromium bug tracker
(using the llvm project there).
We considered using Github Security Advisories. However, they are
currently intended as a way for project owners to publicize their
security advisories, and aren't well-suited to reporting issues.
This also moves the issue-reporting paragraph to the beginning of the
document, in part to make it more discoverable, in part to allow the
anchor-linking to actually display the paragraph at the top of the page.
Note that this doesn't update the concrete list of security-sensitive
areas, which is still an open item. When we do, we may want to move the
list of security-sensitive areas next to the issue-reporting paragraph
as well, as it seems like relevant information needed in the reporting
process.
Finally, when describing the discission medium, this splits the topics
discussed into two: the concrete security issues, discussed in the
issue tracker, and the logistics of the group, in our mailing list,
as patches on public lists, and in the monthly sync-up call.
While there, add a SECURITY.md page linking to the relevant paragraph.
Differential Revision: https://reviews.llvm.org/D100873
In many cases it is helpful to know at what address the resolved function starts.
This patch adds a new StartAddress member to the DILineInfo structure.
Reviewed By: jhenderson, dblaikie
Differential Revision: https://reviews.llvm.org/D102316
This patch is the Part-1 (FE Clang) implementation of HW Exception handling.
This new feature adds the support of Hardware Exception for Microsoft Windows
SEH (Structured Exception Handling).
This is the first step of this project; only X86_64 target is enabled in this patch.
Compiler options:
For clang-cl.exe, the option is -EHa, the same as MSVC.
For clang.exe, the extra option is -fasync-exceptions,
plus -triple x86_64-windows -fexceptions and -fcxx-exceptions as usual.
NOTE:: Without the -EHa or -fasync-exceptions, this patch is a NO-DIFF change.
The rules for C code:
For C-code, one way (MSVC approach) to achieve SEH -EHa semantic is to follow
three rules:
* First, no exception can move in or out of _try region., i.e., no "potential
faulty instruction can be moved across _try boundary.
* Second, the order of exceptions for instructions 'directly' under a _try
must be preserved (not applied to those in callees).
* Finally, global states (local/global/heap variables) that can be read
outside of _try region must be updated in memory (not just in register)
before the subsequent exception occurs.
The impact to C++ code:
Although SEH is a feature for C code, -EHa does have a profound effect on C++
side. When a C++ function (in the same compilation unit with option -EHa ) is
called by a SEH C function, a hardware exception occurs in C++ code can also
be handled properly by an upstream SEH _try-handler or a C++ catch(...).
As such, when that happens in the middle of an object's life scope, the dtor
must be invoked the same way as C++ Synchronous Exception during unwinding
process.
Design:
A natural way to achieve the rules above in LLVM today is to allow an EH edge
added on memory/computation instruction (previous iload/istore idea) so that
exception path is modeled in Flow graph preciously. However, tracking every
single memory instruction and potential faulty instruction can create many
Invokes, complicate flow graph and possibly result in negative performance
impact for downstream optimization and code generation. Making all
optimizations be aware of the new semantic is also substantial.
This design does not intend to model exception path at instruction level.
Instead, the proposed design tracks and reports EH state at BLOCK-level to
reduce the complexity of flow graph and minimize the performance-impact on CPP
code under -EHa option.
One key element of this design is the ability to compute State number at
block-level. Our algorithm is based on the following rationales:
A _try scope is always a SEME (Single Entry Multiple Exits) region as jumping
into a _try is not allowed. The single entry must start with a seh_try_begin()
invoke with a correct State number that is the initial state of the SEME.
Through control-flow, state number is propagated into all blocks. Side exits
marked by seh_try_end() will unwind to parent state based on existing
SEHUnwindMap[].
Note side exits can ONLY jump into parent scopes (lower state number).
Thus, when a block succeeds various states from its predecessors, the lowest
State triumphs others. If some exits flow to unreachable, propagation on those
paths terminate, not affecting remaining blocks.
For CPP code, object lifetime region is usually a SEME as SEH _try.
However there is one rare exception: jumping into a lifetime that has Dtor but
has no Ctor is warned, but allowed:
Warning: jump bypasses variable with a non-trivial destructor
In that case, the region is actually a MEME (multiple entry multiple exits).
Our solution is to inject a eha_scope_begin() invoke in the side entry block to
ensure a correct State.
Implementation:
Part-1: Clang implementation described below.
Two intrinsic are created to track CPP object scopes; eha_scope_begin() and eha_scope_end().
_scope_begin() is immediately added after ctor() is called and EHStack is pushed.
So it must be an invoke, not a call. With that it's also guaranteed an
EH-cleanup-pad is created regardless whether there exists a call in this scope.
_scope_end is added before dtor(). These two intrinsics make the computation of
Block-State possible in downstream code gen pass, even in the presence of
ctor/dtor inlining.
Two intrinsic, seh_try_begin() and seh_try_end(), are added for C-code to mark
_try boundary and to prevent from exceptions being moved across _try boundary.
All memory instructions inside a _try are considered as 'volatile' to assure
2nd and 3rd rules for C-code above. This is a little sub-optimized. But it's
acceptable as the amount of code directly under _try is very small.
Part-2 (will be in Part-2 patch): LLVM implementation described below.
For both C++ & C-code, the state of each block is computed at the same place in
BE (WinEHPreparing pass) where all other EH tables/maps are calculated.
In addition to _scope_begin & _scope_end, the computation of block state also
rely on the existing State tracking code (UnwindMap and InvokeStateMap).
For both C++ & C-code, the state of each block with potential trap instruction
is marked and reported in DAG Instruction Selection pass, the same place where
the state for -EHsc (synchronous exceptions) is done.
If the first instruction in a reported block scope can trap, a Nop is injected
before this instruction. This nop is needed to accommodate LLVM Windows EH
implementation, in which the address in IPToState table is offset by +1.
(note the purpose of that is to ensure the return address of a call is in the
same scope as the call address.
The handler for catch(...) for -EHa must handle HW exception. So it is
'adjective' flag is reset (it cannot be IsStdDotDot (0x40) that only catches
C++ exceptions).
Suppress push/popTerminate() scope (from noexcept/noTHrow) so that HW
exceptions can be passed through.
Original llvm-dev [RFC] discussions can be found in these two threads below:
https://lists.llvm.org/pipermail/llvm-dev/2020-March/140541.htmlhttps://lists.llvm.org/pipermail/llvm-dev/2020-April/141338.html
Differential Revision: https://reviews.llvm.org/D80344/new/
Swift's new concurrency features are going to require guaranteed tail calls so
that they don't consume excessive amounts of stack space. This would normally
mean "tailcc", but there are also Swift-specific ABI desires that don't
naturally go along with "tailcc" so this adds another calling convention that's
the combination of "swiftcc" and "tailcc".
Support is added for AArch64 and X86 for now.
This extends any frame record created in the function to include that
parameter, passed in X22.
The new record looks like [X22, FP, LR] in memory, and FP is stored with 0b0001
in bits 63:60 (CodeGen assumes they are 0b0000 in normal operation). The effect
of this is that tools walking the stack should expect to see one of three
values there:
* 0b0000 => a normal, non-extended record with just [FP, LR]
* 0b0001 => the extended record [X22, FP, LR]
* 0b1111 => kernel space, and a non-extended record.
All other values are currently reserved.
If compiling for arm64e this context pointer is address-discriminated with the
discriminator 0xc31a and the DB (process-specific) key.
There is also an "i8** @llvm.swift.async.context.addr()" intrinsic providing
front-ends access to this slot (and forcing its creation initialized to nullptr
if necessary).
The opaque pointer type is essentially just a normal pointer type with a
null pointee type.
This also adds support for the opaque pointer type to the bitcode
reader/writer, as well as to textual IR.
To avoid confusion with existing pointer types, we disallow creating a
pointer to an opaque pointer.
Opaque pointer types should not be widely used at this point since many
parts of LLVM still do not support them. The next steps are to add some
very simple use cases of opaque pointers to make sure they work, then
start pretending that all pointers are opaque pointers and see what
breaks.
https://lists.llvm.org/pipermail/llvm-dev/2021-May/150359.html
Reviewed By: dblaikie, dexonsmith, pcc
Differential Revision: https://reviews.llvm.org/D101704
LLVM's build system contains support for configuring a distribution, but
it can often be useful to be able to configure multiple distributions
(e.g. if you want separate distributions for the tools and the
libraries). Add this support to the build system, along with
documentation and usage examples.
Reviewed By: phosek
Differential Revision: https://reviews.llvm.org/D89177
This patch adds JSON output style to llvm-symbolizer to better support CLI automation by providing a machine readable output.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D96883
The option --use-symbol-table is now a noop and does not appear in the
help text, however it still appears in the command guide. This change
removes it from the command guide and updates the description of
--output-style .
Differential Revision: https://reviews.llvm.org/D102078
The internal `cl::opt` option --x86-asm-syntax sets the AsmParser and AsmWriter
dialect. The option is used by llc and llvm-mc tests to set the AsmWriter dialect.
This patch adds -M {att,intel} as GNU objdump compatible aliases (PR43413).
Note: the dialect is initialized when the MCAsmInfo is constructed.
`MCInstPrinter::applyTargetSpecificCLOption` is called too late and its MCAsmInfo
reference is const, so changing the `cl::opt` in
`MCInstPrinter::applyTargetSpecificCLOption` is not an option, at least without
large amount of refactoring.
Reviewed By: hoy, jhenderson, thakis
Differential Revision: https://reviews.llvm.org/D101695
The llvm-objdump command guide has the option --cfg which was removed
from the tool by 888320e9fa in 2014. This
change updates the command guide to reflect this.
Differential Revision: https://reviews.llvm.org/D101648
Add a flag to change dsymutil's behavior and force a static variable to
keep its enclosing function. The test shows a situation where that could
be useful. I'm not convinced this behavior makes sense as a default,
which is why it's behind a flag.
rdar://74918374
Differential revision: https://reviews.llvm.org/D101337
The Linux kernel objtool diagnostic `call without frame pointer save/setup`
arise in multiple instrumentation passes (asan/tsan/gcov). With the mechanism
introduced in D100251, it's trivial to respect the command line
-m[no-]omit-leaf-frame-pointer/-f[no-]omit-frame-pointer, so let's do it.
Fix: https://github.com/ClangBuiltLinux/linux/issues/1236 (tsan)
Fix: https://github.com/ClangBuiltLinux/linux/issues/1238 (asan)
Also document the function attribute "frame-pointer" which is long overdue.
Differential Revision: https://reviews.llvm.org/D101016
Don't phrase the semantics in terms of the optimizer. Instead have a
more straightforward execution based semantic.
Reviewed By: ebrevnov
Differential Revision: https://reviews.llvm.org/D63439
This implements an LLVM tool that's flag- and output-compatible
with macOS's `otool` -- except for bugs, but from testing with both
`otool` and `xcrun otool-classic`, llvm-otool matches vanilla
otool's behavior very well already. It's not 100% perfect, but
it's a very solid start.
This uses the same approach as llvm-objcopy: llvm-objdump uses
a different OptTable when it's invoked as llvm-otool. This
is possible thanks to D100433.
Differential Revision: https://reviews.llvm.org/D100583
Beginners might not aware of this variable and wanted to try a new experimental target.
Although this variable mention in Writing a Backend Documentation. But it becomes easy to search when listed in cmake.rst doc where most variables are listed.
Reviewed By: myhsu
Differential Revision: https://reviews.llvm.org/D100729
This patch clarifies the semantics of the nofree function attribute to make clear that it provides an "as if" semantic. That is, a nofree function is guaranteed not to free memory which existed before the call, but might allocate and then deallocate that same memory within the lifetime of the callee.
This is the result of the discussion on llvm-dev under the thread "Ambiguity in the nofree function attribute".
The most important part of this change is the LangRef wording. The rest is minor comment changes to emphasize the new semantics where code was accidentally consistent, and fix one place which wasn't consistent. That one place is currently narrowly used as it is primarily part of the ongoing (and not yet enabled) deref-at-point semantics work.
Differential Revision: https://reviews.llvm.org/D100141
This patch clarifies the semantics of nocapture attribute.
A 'Pointer Capture' subsection is added to describe the semantics of pointer capture first.
For the nocapture example with two same pointer arguments, it is consistent with the semantics that Alive2 used to run lit tests.
Reviewed By: nlopes
Differential Revision: https://reviews.llvm.org/D97924
When we pass a AArch64 Homogeneous Floating-Point
Aggregate (HFA) argument with increased alignment
requirements, for example
struct S {
__attribute__ ((__aligned__(16))) double v[4];
};
Clang uses `[4 x double]` for the parameter, which is passed
on the stack at alignment 8, whereas it should be at
alignment 16, following Rule C.4 in
AAPCS (https://github.com/ARM-software/abi-aa/blob/master/aapcs64/aapcs64.rst#642parameter-passing-rules)
Currently we don't have a way to express in LLVM IR the
alignment requirements of the function arguments. The align
attribute is applicable to pointers only, and only for some
special ways of passing arguments (e..g byval). When
implementing AAPCS32/AAPCS64, clang resorts to dubious hacks
of coercing to types, which naturally have the needed
alignment. We don't have enough types to cover all the
cases, though.
This patch introduces a new use of the stackalign attribute
to control stack slot alignment, when and if an argument is
passed in memory.
The attribute align is left as an optimizer hint - it still
applies to pointer types only and pertains to the content of
the pointer, whereas the alignment of the pointer itself is
determined by the stackalign attribute.
For byval arguments, the stackalign attribute assumes the
role, previously perfomed by align, falling back to align if
stackalign` is absent.
On the clang side, when passing arguments using the "direct"
style (cf. `ABIArgInfo::Kind`), now we can optionally
specify an alignment, which is emitted as the new
`stackalign` attribute.
Patch by Momchil Velikov and Lucas Prates.
Differential Revision: https://reviews.llvm.org/D98794
Update the Scudo document to align with the standalone version.
Add some more verbiage about the various component of the
allocator, rework a bit everything.
The build instructions have been updated.
The options and their default values have been updated, and
the `mallopt` ones have been added.
Differential Revision: https://reviews.llvm.org/D100230
Lookup tables generate non PIC-friendly code, which requires dynamic relocation as described in:
https://bugs.llvm.org/show_bug.cgi?id=45244
This patch adds a new pass that converts lookup tables to relative lookup tables to make them PIC-friendly.
Differential Revision: https://reviews.llvm.org/D94355
This has come up a few times recently, and I was surprised to notice that we don't have anything in the docs.
This patch deliberately sticks to stuff that is uncontroversial in the community. Everything herein is thought to be widely agreed to by a large majority of the community. A few things were noted and removed in review which failed this standard, if you spot anything else, please point it out.
Differential Revision: https://reviews.llvm.org/D99305
This is a followup to D98145: As far as I know, tracking of kill
flags in FastISel is just a compile-time optimization. However,
I'm not actually seeing any compile-time regression when removing
the tracking. This probably used to be more important in the past,
before FastRA was switched to allocate instructions in reverse
order, which means that it discovers kills as a matter of course.
As such, the kill tracking doesn't really seem to serve a purpose
anymore, and just adds additional complexity and potential for
errors. This patch removes it entirely. The primary changes are
dropping the hasTrivialKill() method and removing the kill
arguments from the emitFast methods. The rest is mechanical fixup.
Differential Revision: https://reviews.llvm.org/D98294
The documentation link of Google Test on GitHub have been moved to the
top-level docs directory.
Thus, the original link is invalid now.
Reviewed By: Pavel Labath
Differential Revision: https://reviews.llvm.org/D99559
Lookup tables generate non PIC-friendly code, which requires dynamic relocation as described in:
https://bugs.llvm.org/show_bug.cgi?id=45244
This patch adds a new pass that converts lookup tables to relative lookup tables to make them PIC-friendly.
Differential Revision: https://reviews.llvm.org/D94355
RVV intrinsics has new overloading rule, please see
82aac7dad4
Changed:
1. Rename `generic` to `overloaded` because the new rule is not using C11 generic.
2. Change HasGeneric to HasNoMaskedOverloaded because all masked operations
support overloading api.
3. Add more overloaded tests due to overloading rule changed.
Differential Revision: https://reviews.llvm.org/D99189
I think byval/sret and the others are close to being able to rip out
the code to support the missing type case. A lot of this code is
shared with inalloca, so catch this up to the others so that can
happen.
This adds me as a Google representative for the LLVM security group.
This was proposed, discussed, and voted on in the differential revision
linked below; please see it for more information.
Differential Revision: https://reviews.llvm.org/D99232
In order to test the preservation of the original Debug Info metadata
in your projects, a front end option could be very useful, since users
usually report that a concrete entity (e.g. variable x, or function fn2())
is missing debug info. The [0] is an example of running the utility
on GDB Project.
This depends on: D82546 and D82545.
Differential Revision: https://reviews.llvm.org/D82547