This patch adds support for DWARF attribute DW_AT_data_location.
Summary:
Dynamic arrays in fortran are described by array descriptor and
data allocation address. Former is mapped to DW_AT_location and
later is mapped to DW_AT_data_location.
Testing:
unit test cases added (hand-written)
check llvm
check debug-info
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D79592
llvm rejects DWARF operator DW_OP_push_object_address.This DWARF
operator is needed for Flang to support allocatable array.
Summary:
Currently llvm rejects DWARF operator DW_OP_push_object_address.
below error is produced when llvm finds this operator.
[..]
invalid expression
!DIExpression(151)
warning: ignoring invalid debug info in pushobj.ll
[..]
There are some parts missing in support of this operator, need to
be completed.
Testing
-added a unit testcase
-check-debuginfo
-check-llvm
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D79306
We want to add a way to avoid merging identical calls so as to keep the
separate debug-information for those calls. There is also an asan
usecase where having this attribute would be beneficial to avoid
alternative work-arounds.
Here is the link to the feature request:
https://bugs.llvm.org/show_bug.cgi?id=42783.
`nomerge` is different from `noline`. `noinline` prevents function from
inlining at callsites, but `nomerge` prevents multiple identical calls
from being merged into one.
This patch adds `nomerge` to disable the optimization in IR level. A
followup patch will be needed to let backend understands `nomerge` and
avoid tail merge at backend.
Reviewed By: asbirlea, rnk
Differential Revision: https://reviews.llvm.org/D78659
Linkage type was only referenced for functions, not for global
variables.
Clarify that LLVM doesn't make assumption about the allocation size when
no definitive initializer for a global variable is known.
Differential Revision: https://reviews.llvm.org/D78952
The 'nsz' flag is different than 'nnan' or 'ninf' in that it does not create poison.
Make that explicit in the LangRef and fix ValueTracking analysis that misinterpreted
the definition.
This manifests as bugs in InstSimplify shown in the test diffs and as discussed in
PR45778:
https://bugs.llvm.org/show_bug.cgi?id=45778
Differential Revision: https://reviews.llvm.org/D79422
Add llvm.call.preallocated.{setup,arg} instrinsics.
Add "preallocated" operand bundle which takes a token produced by llvm.call.preallocated.setup.
Add "preallocated" parameter attribute, which is like byval but without the copy.
Verifier changes for these IR constructs.
See https://github.com/rnk/llvm-project/blob/call-setup-docs/llvm/docs/CallSetup.md
Subscribers: hiraditya, jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74651
Now compiler defines 5 sets of constants to represent rounding mode.
These are:
1. `llvm::APFloatBase::roundingMode`. It specifies all 5 rounding modes
defined by IEEE-754 and is used in `APFloat` implementation.
2. `clang::LangOptions::FPRoundingModeKind`. It specifies 4 of 5 IEEE-754
rounding modes and a special value for dynamic rounding mode. It is used
in clang frontend.
3. `llvm::fp::RoundingMode`. Defines the same values as
`clang::LangOptions::FPRoundingModeKind` but in different order. It is
used to specify rounding mode in in IR and functions that operate IR.
4. Rounding mode representation used by `FLT_ROUNDS` (C11, 5.2.4.2.2p7).
Besides constants for rounding mode it also uses a special value to
indicate error. It is convenient to use in intrinsic functions, as it
represents platform-independent representation for rounding mode. In this
role it is used in some pending patches.
5. Values like `FE_DOWNWARD` and other, which specify rounding mode in
library calls `fesetround` and `fegetround`. Often they represent bits
of some control register, so they are target-dependent. The same names
(not values) and a special name `FE_DYNAMIC` are used in
`#pragma STDC FENV_ROUND`.
The first 4 sets of constants are target independent and could have the
same numerical representation. It would simplify conversion between the
representations. Also now `clang::LangOptions::FPRoundingModeKind` and
`llvm::fp::RoundingMode` do not contain the value for IEEE-754 rounding
direction `roundTiesToAway`, although it is supported natively on
some targets.
This change defines all the rounding mode type via one `llvm::RoundingMode`,
which also contains rounding mode for IEEE rounding direction `roundTiesToAway`.
Differential Revision: https://reviews.llvm.org/D77379
D72467 updated the shufflevector instruction to include a constant mask
rather than a mask operand. The LangRef text was vague enough to still
make sense, but it is better to update here too, so there's no confusion
about valid mask values. The text here is adapted from the documentation
code comments for "class ShuffleVectorInst".
Differential Revision: https://reviews.llvm.org/D77396
We already mention that `noalias` is modeled after the C99 `restrict`
qualifier but we did omit one important requirement in the description.
For the restrict guarantees the object affected has to be modified
during the execution of the function, in any way (see 6.7.3.1.4 in [0]).
There are two reasons we want this restriction as well:
1) To match the `restrict` semantics when we lower it to `noalias`.
2) To allow the reasoning that the object pointed to by a `noalias`
pointer is not modified through means not derived from this
pointer. Hence, following the uses of that pointer is sufficient
to determine potential modifications.
The discussion on this came up as part of D73428. In that patch the
Attributor is taught to derive `noalias` for call site arguments based
on alias queries against objects that are accessed in the callee. This
is possible even if the pointer passed at the call site was "not-`noalias`".
To simplify the logic there *and* to allow the use of `noalias` as
described in 2) above, it is beneficial to follow the C `restrict`
semantics in cases where there might be "read-read-aliases". Note that
AliasAnalysis* queries for read only objects already result in
`NoAlias` even if the pointers might "alias".
* From this point of view our Alias Analysis is basically a Dependence
Analysis.
[0] http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1124.pdf
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D74935
Summary:
This patch clarifies the semantics of branching on undef value.
Defining `br undef` as undefined behavior explains optimizations that use branch conditions, such as CVP (D76931) and GVN (propagateEquality).
For `switch cond`, it is defined to raise UB if cond is an expression containing undef && cond is not frozen &&
it may yield different values.
This allows that at the destination block the branch condition can be assumed to be frozen already (otherwise UB was already triggered).
This condition is slightly stricter than MemorySanitizer, which allows undef-y condition if it always leads to the same destination,
but it does not break MemorySanitizer because we are giving stricter constraint.
Reviewers: efriedma, fhahn, nikic, spatel, jdoerfert, nlopes
Reviewed By: nlopes
Subscribers: regehr, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76973
This is copied from the suggested text by @regehr in:
https://bugs.llvm.org/show_bug.cgi?id=20895
The way forward was not clear for several years, but now that we
have 'freeze' and Alive2, the behavior should be documented.
Also see comments in D76332.
LLVM currently supports CSK_MD5 and CSK_SHA1 source file checksums in
debug info. This change adds support for CSK_SHA256 checksums.
The SHA256 checksums are supported by the CodeView debug format.
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D75785
Summary: This patch adds the basic utilities to deal with dropable uses. dropable uses are uses that we rather drop than prevent transformations, for now they are limited to uses in llvm.assume.
Reviewers: jdoerfert, sstefan1
Reviewed By: jdoerfert
Subscribers: uenoku, lebedev.ri, mgorny, hiraditya, dexonsmith, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73404
Summary: As discussed in D75505, it's not particularly useful to change the type of a load to/from floating-point/integer because it's followed by a bitcast, and it might lead to surprising code generation. Check that this doesn't generally happen.
Reviewers: lebedev.ri
Subscribers: jkorous, dexonsmith, ributzka, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75644
Summary:
Terminators in LLVM aren't prohibited from returning values. This means that
the "callbr" instruction, which is used for "asm goto", can support "asm goto
with outputs."
This patch removes all restrictions against "callbr" returning values. The
heavy lifting is done by the code generator. The "INLINEASM_BR" instruction's
a terminator, and the code generator doesn't allow non-terminator instructions
after a terminator. In order to correctly model the feature, we need to copy
outputs from "INLINEASM_BR" into virtual registers. Of course, those copies
aren't terminators.
To get around this issue, we split the block containing the "INLINEASM_BR"
right before the "COPY" instructions. This results in two cheats:
- Any physical registers defined by "INLINEASM_BR" need to be marked as
live-in into the block with the "COPY" instructions. This violates an
assumption that physical registers aren't marked as "live-in" until after
register allocation. But it seems as if the live-in information only
needs to be correct after register allocation. So we're able to get away
with this.
- The indirect branches from the "INLINEASM_BR" are moved to the "COPY"
block. This is to satisfy PHI nodes.
I've been told that MLIR can support this handily, but until we're able to
use it, we'll have to stick with the above.
Reviewers: jyknight, nickdesaulniers, hfinkel, MaskRay, lattner
Reviewed By: nickdesaulniers, MaskRay, lattner
Subscribers: rriddle, qcolombet, jdoerfert, MatzeB, echristo, MaskRay, xbolva00, aaron.ballman, cfe-commits, JonChesterfield, hiraditya, llvm-commits, rnk, craig.topper
Tags: #llvm, #clang
Differential Revision: https://reviews.llvm.org/D69868
Summary:
This patch adds intrinsics and ISelDAG nodes for signed
and unsigned fixed-point division:
```
llvm.sdiv.fix.sat.*
llvm.udiv.fix.sat.*
```
These intrinsics perform scaled, saturating division
on two integers or vectors of integers. They are
required for the implementation of the Embedded-C
fixed-point arithmetic in Clang.
Reviewers: bjope, leonardchan, craig.topper
Subscribers: hiraditya, jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71550
Summary:
Operand bundles on an llvm.assume allows representing
assumptions that an attribute holds for a certain value at a certain position.
Operand bundles enable assumptions that are either hard or impossible to
represent as a boolean argument of an llvm.assume.
Reviewers: jdoerfert, fhahn, nlopes, reames, regehr, efriedma
Reviewed By: jdoerfert
Subscribers: lebedev.ri, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74209
Summary:
Debug Info Version was changed to use "Max" instead of "Warning" per the
original design intent - but this maxes old/new IR unlinkable, since
mismatched merge styles are a linking failure.
It seems possible/maybe reasonable to actually support the combination
of these two flags: Warn, but then use the maximum value rather than the
first value/earlier module's value.
Reviewers: tejohnson
Differential Revision: https://reviews.llvm.org/D74257
AMDGPU and x86 at least both have separate controls for whether
denormal results are flushed on output, and for whether denormals are
implicitly treated as 0 as an input. The current DAGCombiner use only
really cares about the input treatment of denormals.
Summary:
This is a follow up on D61634. It adds an LLVM IR intrinsic to allow better implementation of memcpy from C++.
A follow up CL will add the intrinsics in Clang.
Reviewers: courbet, theraven, t.p.northover, jdoerfert, tejohnson
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71710
Summary: masked_load and masked_store instructions require the alignment to be specified and a power of two. It seems to me that this requirement applies to masked_gather and masked_scatter as well.
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73179
In LLVM IR, vscale can be represented with an intrinsic. For some targets,
this is equivalent to the constexpr:
getelementptr <vscale x 1 x i8>, <vscale x 1 x i8>* null, i32 1
This can be used to propagate the value in CodeGenPrepare.
In ISel we add a node that can be legalized to one or more
instructions to materialize the runtime vector length.
This patch also adds SVE CodeGen support for VSCALE, which maps this
node to RDVL instructions (for scaled multiples of 16bytes) or CNT[HSD]
instructions (scaled multiples of 2, 4, or 8 bytes, respectively).
Reviewers: rengolin, cameron.mcinally, hfinkel, sebpop, SjoerdMeijer, efriedma, lattner
Reviewed by: efriedma
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68203
Currently there are 4 different mechanisms for controlling denormal
flushing behavior, and about as many equivalent frontend controls.
- AMDGPU uses the fp32-denormals and fp64-f16-denormals subtarget features
- NVPTX uses the nvptx-f32ftz attribute
- ARM directly uses the denormal-fp-math attribute
- Other targets indirectly use denormal-fp-math in one DAGCombine
- cl-denorms-are-zero has a corresponding denorms-are-zero attribute
AMDGPU wants a distinct control for f32 flushing from f16/f64, and as
far as I can tell the same is true for NVPTX (based on the attribute
name).
Work on consolidating these into the denormal-fp-math attribute, and a
new type specific denormal-fp-math-f32 variant. Only ARM seems to
support the two different flush modes, so this is overkill for the
other use cases. Ideally we would error on the unsupported
positive-zero mode on other targets from somewhere.
Move the logic for selecting the flush mode into the compiler driver,
instead of handling it in cc1. denormal-fp-math/denormal-fp-math-f32
are now both cc1 flags, but denormal-fp-math-f32 is not yet exposed as
a user flag.
-cl-denorms-are-zero, -fcuda-flush-denormals-to-zero and
-fno-cuda-flush-denormals-to-zero will be mapped to
-fp-denormal-math-f32=ieee or preserve-sign rather than the old
attributes.
Stop emitting the denorms-are-zero attribute for the OpenCL flag. It
has no in-tree users. The meaning would also be target dependent, such
as the AMDGPU choice to treat this as only meaning allow flushing of
f32 and not f16 or f64. The naming is also potentially confusing,
since DAZ in other contexts refers to instructions implicitly treating
input denormals as zero, not necessarily flushing output denormals to
zero.
This also does not attempt to change the behavior for the current
attribute. The LangRef now states that the default is ieee behavior,
but this is inaccurate for the current implementation. The clang
handling is slightly hacky to avoid touching the existing
denormal-fp-math uses. Fixing this will be left for a future patch.
AMDGPU is still using the subtarget feature to control the denormal
mode, but the new attribute are now emitted. A future change will
switch this and remove the subtarget features.
Summary:
This patch adds intrinsics and ISelDAG nodes for
signed and unsigned fixed-point division:
llvm.sdiv.fix.*
llvm.udiv.fix.*
These intrinsics perform scaled division on two
integers or vectors of integers. They are required
for the implementation of the Embedded-C fixed-point
arithmetic in Clang.
Patch by: ebevhan
Reviewers: bjope, leonardchan, efriedma, craig.topper
Reviewed By: craig.topper
Subscribers: Ka-Ka, ilya, hiraditya, jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70007
Summary:
Remove the restrictions that preventing "asm goto" from returning non-void
values. The values returned by "asm goto" are only valid on the "fallthrough"
path.
Reviewers: jyknight, nickdesaulniers, hfinkel
Reviewed By: jyknight, nickdesaulniers
Subscribers: rsmith, hiraditya, llvm-commits, cfe-commits, craig.topper, rnk
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D69876
Adds the RISC-V asm template argument modifiers currently supported by LLVM.
Additional ones supported by GCC will be added to the documentation when we
start supporting them.
Add new intrinsics
llvm.experimental.constrained.minimum
llvm.experimental.constrained.maximum
as strict versions of llvm.minimum and llvm.maximum.
Includes SystemZ back-end support.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D71624
The following intrinsics currently carry a rounding mode metadata argument:
llvm.experimental.constrained.minnum
llvm.experimental.constrained.maxnum
llvm.experimental.constrained.ceil
llvm.experimental.constrained.floor
llvm.experimental.constrained.round
llvm.experimental.constrained.trunc
This is not useful since the semantics of those intrinsics do not in any way
depend on the rounding mode. In similar cases, other constrained intrinsics
do not have the rounding mode argument. Remove it here as well.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D71218
of integers to floating point.
This includes some of Craig Topper's changes for promotion support from
D71130.
Differential Revision: https://reviews.llvm.org/D69275
This is the first patch adding an initial set of matrix intrinsics and a
corresponding lowering pass. This has been discussed on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2019-October/136240.html
The first patch introduces four new intrinsics (transpose, multiply,
columnwise load and store) and a LowerMatrixIntrinsics pass, that
lowers those intrinsics to vector operations.
Matrixes are embedded in a 'flat' vector (e.g. a 4 x 4 float matrix
embedded in a <16 x float> vector) and the intrinsics take the dimension
information as parameters. Those parameters need to be ConstantInt.
For the memory layout, we initially assume column-major, but in the RFC
we also described how to extend the intrinsics to support row-major as
well.
For the initial lowering, we split the input of the intrinsics into a
set of column vectors, transform those column vectors and concatenate
the result columns to a flat result vector.
This allows us to lower the intrinsics without any shape propagation, as
mentioned in the RFC. In follow-up patches, we plan to submit the
following improvements:
* Shape propagation to eliminate the embedding/splitting for each
intrinsic.
* Fused & tiled lowering of multiply and other operations.
* Optimization remarks highlighting matrix expressions and costs.
* Generate loops for operations on large matrixes.
* More general block processing for operation on large vectors,
exploiting shape information.
We would like to add dedicated transpose, columnwise load and store
intrinsics, even though they are not strictly necessary. For example, we
could instead emit a large shufflevector instruction instead of the
transpose. But we expect that to
(1) become unwieldy for larger matrixes (even for 16x16 matrixes,
the resulting shufflevector masks would be huge),
(2) risk instcombine making small changes, causing us to fail to
detect the transpose, preventing better lowerings
For the load/store, we are additionally planning on exploiting the
intrinsics for better alias analysis.
Reviewers: anemet, Gerolf, reames, hfinkel, andrew.w.kaylor, efriedma, rengolin
Reviewed By: anemet
Differential Revision: https://reviews.llvm.org/D70456
This adds support for constrained floating-point comparison intrinsics.
Specifically, we add:
declare <ty2>
@llvm.experimental.constrained.fcmp(<type> <op1>, <type> <op2>,
metadata <condition code>,
metadata <exception behavior>)
declare <ty2>
@llvm.experimental.constrained.fcmps(<type> <op1>, <type> <op2>,
metadata <condition code>,
metadata <exception behavior>)
The first variant implements an IEEE "quiet" comparison (i.e. we only
get an invalid FP exception if either argument is a SNaN), while the
second variant implements an IEEE "signaling" comparison (i.e. we get
an invalid FP exception if either argument is any NaN).
The condition code is implemented as a metadata string. The same set
of predicates as for the fcmp instruction is supported (except for the
"true" and "false" predicates).
These new intrinsics are mapped by SelectionDAG codegen onto two new
ISD opcodes, ISD::STRICT_FSETCC and ISD::STRICT_FSETCCS, again
representing quiet vs. signaling comparison operations. Otherwise
those nodes look like SETCC nodes, with an additional chain argument
and result as usual for strict FP nodes. The patch includes support
for the common legalization operations for those nodes.
The patch also includes full SystemZ back-end support for the new
ISD nodes, mapping them to all available SystemZ instruction to
fully implement strict semantics (scalar and vector).
Differential Revision: https://reviews.llvm.org/D69281
Summary:
Rework the GMIR documentation to focus more on the end user than the
implementation and tie it in to the MIR document. There was also some
out-of-date information which has been removed.
The quality of the GenericOpcode reference is highly variable and drops
sharply as I worked through them all but we've got to start somewhere :-).
It would be great if others could expand on this too as there is an awful
lot to get through.
Also fix a typo in the definition of G_FLOG. Previously, the comments said
we had two base-2's (G_FLOG and G_FLOG2).
Reviewers: aemerson, volkan, rovka, arsenm
Reviewed By: rovka
Subscribers: wdng, arphaman, jfb, Petar.Avramovic, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69545
This reverts commit 004ed2b0d1.
Original commit hash 6d03890384
Summary:
This adds a clang option to disable inline line tables. When it is used,
the inliner uses the call site as the location of the inlined function instead of
marking it as an inline location with the function location.
https://reviews.llvm.org/D67723
Summary:
This adds a clang option to disable inline line tables. When it is used,
the inliner uses the call site as the location of the inlined function instead of
marking it as an inline location with the function location.
See https://bugs.llvm.org/show_bug.cgi?id=42344
Reviewers: rnk
Subscribers: hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D67723
Summary:
This extends the rules for when a call instruction is deemed to be an
FPMathOperator, which is based on the type of the call (i.e. the return
type of the function being called). Previously we only allowed
floating-point and vector-of-floating-point types. Now we also allow
arrays (nested to any depth) of floating-point and
vector-of-floating-point types.
This was motivated by llpc, the pipeline compiler for AMD GPUs
(https://github.com/GPUOpen-Drivers/llpc). llpc has many math library
functions that operate on vectors, typically represented as <4 x float>,
and some that operate on matrices, typically represented as
[4 x <4 x float>], and it's useful to be able to decorate calls to all
of them with fast math flags.
Reviewers: spatel, wristow, arsenm, hfinkel, aemerson, efriedma, cameron.mcinally, mcberg2017, jmolloy
Subscribers: wdng, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69161
Summary:
A new function pass (Transforms/CFGuard/CFGuard.cpp) inserts CFGuard checks on
indirect function calls, using either the check mechanism (X86, ARM, AArch64) or
or the dispatch mechanism (X86-64). The check mechanism requires a new calling
convention for the supported targets. The dispatch mechanism adds the target as
an operand bundle, which is processed by SelectionDAG. Another pass
(CodeGen/CFGuardLongjmp.cpp) identifies and emits valid longjmp targets, as
required by /guard:cf. This feature is enabled using the `cfguard` CC1 option.
Reviewers: thakis, rnk, theraven, pcc
Subscribers: ychen, hans, metalcanine, dmajor, tomrittervg, alex, mehdi_amini, mgorny, javed.absar, kristof.beyls, hiraditya, steven_wu, dexonsmith, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D65761
Remove dead virtual functions from vtables with
replaceNonMetadataUsesWith, so that CGProfile metadata gets cleaned up
correctly.
Original commit message:
Currently, it is hard for the compiler to remove unused C++ virtual
functions, because they are all referenced from vtables, which are referenced
by constructors. This means that if the constructor is called from any live
code, then we keep every virtual function in the final link, even if there
are no call sites which can use it.
This patch allows unused virtual functions to be removed during LTO (and
regular compilation in limited circumstances) by using type metadata to match
virtual function call sites to the vtable slots they might load from. This
information can then be used in the global dead code elimination pass instead
of the references from vtables to virtual functions, to more accurately
determine which functions are reachable.
To make this transformation safe, I have changed clang's code-generation to
always load virtual function pointers using the llvm.type.checked.load
intrinsic, instead of regular load instructions. I originally tried writing
this using clang's existing code-generation, which uses the llvm.type.test
and llvm.assume intrinsics after doing a normal load. However, it is possible
for optimisations to obscure the relationship between the GEP, load and
llvm.type.test, causing GlobalDCE to fail to find virtual function call
sites.
The existing linkage and visibility types don't accurately describe the scope
in which a virtual call could be made which uses a given vtable. This is
wider than the visibility of the type itself, because a virtual function call
could be made using a more-visible base class. I've added a new
!vcall_visibility metadata type to represent this, described in
TypeMetadata.rst. The internalization pass and libLTO have been updated to
change this metadata when linking is performed.
This doesn't currently work with ThinLTO, because it needs to see every call
to llvm.type.checked.load in the linkage unit. It might be possible to
extend this optimisation to be able to use the ThinLTO summary, as was done
for devirtualization, but until then that combination is rejected in the
clang driver.
To test this, I've written a fuzzer which generates random C++ programs with
complex class inheritance graphs, and virtual functions called through object
and function pointers of different types. The programs are spread across
multiple translation units and DSOs to test the different visibility
restrictions.
I've also tried doing bootstrap builds of LLVM to test this. This isn't
ideal, because only classes in anonymous namespaces can be optimised with
-fvisibility=default, and some parts of LLVM (plugins and bugpoint) do not
work correctly with -fvisibility=hidden. However, there are only 12 test
failures when building with -fvisibility=hidden (and an unmodified compiler),
and this change does not cause any new failures for either value of
-fvisibility.
On the 7 C++ sub-benchmarks of SPEC2006, this gives a geomean code-size
reduction of ~6%, over a baseline compiled with "-O2 -flto
-fvisibility=hidden -fwhole-program-vtables". The best cases are reductions
of ~14% in 450.soplex and 483.xalancbmk, and there are no code size
increases.
I've also run this on a set of 8 mbed-os examples compiled for Armv7M, which
show a geomean size reduction of ~3%, again with no size increases.
I had hoped that this would have no effect on performance, which would allow
it to awlays be enabled (when using -fwhole-program-vtables). However, the
changes in clang to use the llvm.type.checked.load intrinsic are causing ~1%
performance regression in the C++ parts of SPEC2006. It should be possible to
recover some of this perf loss by teaching optimisations about the
llvm.type.checked.load intrinsic, which would make it worth turning this on
by default (though it's still dependent on -fwhole-program-vtables).
Differential revision: https://reviews.llvm.org/D63932
llvm-svn: 375094
Summary:
Internally in LLVM's metadata we use DW_OP_entry_value operations with
the same semantics as DWARF; that is, its operand specifies the number
of bytes that the entry value covers.
At the time of emitting entry values we don't know the emitted size of
the DWARF expression that the entry value will cover. Currently the size
is hardcoded to 1 in DIExpression, and other values causes the verifier
to fail. As the size is 1, that effectively means that we can only have
valid entry values for registers that can be encoded in one byte, which
are the registers with DWARF numbers 0 to 31 (as they can be encoded as
single-byte DW_OP_reg0..DW_OP_reg31 rather than a multi-byte
DW_OP_regx). It is a bit confusing, but it seems like llvm-dwarfdump
will print an operation "correctly", even if the byte size is less than
that, which may make it seem that we emit correct DWARF for registers
with DWARF numbers > 31. If you instead use readelf for such cases, it
will interpret the number of specified bytes as a DWARF expression. This
seems like a limitation in llvm-dwarfdump.
As suggested in D66746, a way forward would be to add an internal
variant of DW_OP_entry_value, DW_OP_LLVM_entry_value, whose operand
instead specifies the number of operations that the entry value covers,
and we then translate that into the byte size at the time of emission.
In this patch that internal operation is added. This patch keeps the
limitation that a entry value can only be applied to simple register
locations, but it will fix the issue with the size operand being
incorrect for DWARF numbers > 31.
Reviewers: aprantl, vsk, djtodoro, NikolaPrica
Reviewed By: aprantl
Subscribers: jyknight, fedor.sergeev, hiraditya, llvm-commits
Tags: #debug-info, #llvm
Differential Revision: https://reviews.llvm.org/D67492
llvm-svn: 374881
Currently, it is hard for the compiler to remove unused C++ virtual
functions, because they are all referenced from vtables, which are referenced
by constructors. This means that if the constructor is called from any live
code, then we keep every virtual function in the final link, even if there
are no call sites which can use it.
This patch allows unused virtual functions to be removed during LTO (and
regular compilation in limited circumstances) by using type metadata to match
virtual function call sites to the vtable slots they might load from. This
information can then be used in the global dead code elimination pass instead
of the references from vtables to virtual functions, to more accurately
determine which functions are reachable.
To make this transformation safe, I have changed clang's code-generation to
always load virtual function pointers using the llvm.type.checked.load
intrinsic, instead of regular load instructions. I originally tried writing
this using clang's existing code-generation, which uses the llvm.type.test
and llvm.assume intrinsics after doing a normal load. However, it is possible
for optimisations to obscure the relationship between the GEP, load and
llvm.type.test, causing GlobalDCE to fail to find virtual function call
sites.
The existing linkage and visibility types don't accurately describe the scope
in which a virtual call could be made which uses a given vtable. This is
wider than the visibility of the type itself, because a virtual function call
could be made using a more-visible base class. I've added a new
!vcall_visibility metadata type to represent this, described in
TypeMetadata.rst. The internalization pass and libLTO have been updated to
change this metadata when linking is performed.
This doesn't currently work with ThinLTO, because it needs to see every call
to llvm.type.checked.load in the linkage unit. It might be possible to
extend this optimisation to be able to use the ThinLTO summary, as was done
for devirtualization, but until then that combination is rejected in the
clang driver.
To test this, I've written a fuzzer which generates random C++ programs with
complex class inheritance graphs, and virtual functions called through object
and function pointers of different types. The programs are spread across
multiple translation units and DSOs to test the different visibility
restrictions.
I've also tried doing bootstrap builds of LLVM to test this. This isn't
ideal, because only classes in anonymous namespaces can be optimised with
-fvisibility=default, and some parts of LLVM (plugins and bugpoint) do not
work correctly with -fvisibility=hidden. However, there are only 12 test
failures when building with -fvisibility=hidden (and an unmodified compiler),
and this change does not cause any new failures for either value of
-fvisibility.
On the 7 C++ sub-benchmarks of SPEC2006, this gives a geomean code-size
reduction of ~6%, over a baseline compiled with "-O2 -flto
-fvisibility=hidden -fwhole-program-vtables". The best cases are reductions
of ~14% in 450.soplex and 483.xalancbmk, and there are no code size
increases.
I've also run this on a set of 8 mbed-os examples compiled for Armv7M, which
show a geomean size reduction of ~3%, again with no size increases.
I had hoped that this would have no effect on performance, which would allow
it to awlays be enabled (when using -fwhole-program-vtables). However, the
changes in clang to use the llvm.type.checked.load intrinsic are causing ~1%
performance regression in the C++ parts of SPEC2006. It should be possible to
recover some of this perf loss by teaching optimisations about the
llvm.type.checked.load intrinsic, which would make it worth turning this on
by default (though it's still dependent on -fwhole-program-vtables).
Differential revision: https://reviews.llvm.org/D63932
llvm-svn: 374539
When the target option GuaranteedTailCallOpt is specified, calls with
the fastcc calling convention will be transformed into tail calls if
they are in tail position. This diff adds a new calling convention,
tailcc, currently supported only on X86, which behaves the same way as
fastcc, except that the GuaranteedTailCallOpt flag does not need to
enabled in order to enable tail call optimization.
Patch by Dwight Guth <dwight.guth@runtimeverification.com>!
Reviewed By: lebedev.ri, paquette, rnk
Differential Revision: https://reviews.llvm.org/D67855
llvm-svn: 373976
Earlier in the year intrinsics for lrint, llrint, lround and llround were
added to llvm. The constrained versions are now implemented here.
Reviewed by: andrew.w.kaylor, craig.topper, cameron.mcinally
Approved by: craig.topper
Differential Revision: https://reviews.llvm.org/D64746
llvm-svn: 373900
Summary: The constraint goes up to regs d15 and q7, not d16 and q8.
Subscribers: kristof.beyls, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68090
llvm-svn: 373228
Summary:
The list of indirect labels should ALWAYS have their blockaddresses as
argument operands to the callbr (but not necessarily the other way
around). Add an invariant that checks this.
The verifier catches a bad test case that was added recently in r368478.
I think that was a simple mistake, and the test was made less strict in
regards to the precise addresses (as those weren't specifically the
point of the test).
This invariant will be used to find a reported bug.
Link: https://www.spinics.net/lists/arm-kernel/msg753473.html
Link: https://github.com/ClangBuiltLinux/linux/issues/649
Reviewers: craig.topper, void, chandlerc
Reviewed By: void
Subscribers: ychen, lebedev.ri, javed.absar, kristof.beyls, hiraditya, llvm-commits, srhines
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67196
llvm-svn: 372923
During the review of D67434, it was recommended to make fmuladd's
behavior more explicit. D67434 depends on this interpretation.
Reviewers: efriedma, jfb, reames, scanon, lebedev.ri, spatel
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D67552
llvm-svn: 372892
The changes here are based on the corresponding diffs for allowing FMF on 'select':
D61917 <https://reviews.llvm.org/D61917>
As discussed there, we want to have fast-math-flags be a property of an FP value
because the alternative (having them on things like fcmp) leads to logical
inconsistency such as:
https://bugs.llvm.org/show_bug.cgi?id=38086
The earlier patch for select made almost no practical difference because most
unoptimized conditional code begins life as a phi (based on what I see in clang).
Similarly, I don't expect this patch to do much on its own either because
SimplifyCFG promptly drops the flags when converting to select on a minimal
example like:
https://bugs.llvm.org/show_bug.cgi?id=39535
But once we have this plumbing in place, we should be able to wire up the FMF
propagation and start solving cases like that.
The change to RecurrenceDescriptor::AddReductionVar() is required to prevent a
regression in a LoopVectorize test. We are intersecting the FMF of any
FPMathOperator there, so if a phi is not properly annotated, new math
instructions may not be either. Once we fix the propagation in SimplifyCFG, it
may be safe to remove that hack.
Differential Revision: https://reviews.llvm.org/D67564
llvm-svn: 372878
The changes here are based on the corresponding diffs for allowing FMF on 'select':
D61917
As discussed there, we want to have fast-math-flags be a property of an FP value
because the alternative (having them on things like fcmp) leads to logical
inconsistency such as:
https://bugs.llvm.org/show_bug.cgi?id=38086
The earlier patch for select made almost no practical difference because most
unoptimized conditional code begins life as a phi (based on what I see in clang).
Similarly, I don't expect this patch to do much on its own either because
SimplifyCFG promptly drops the flags when converting to select on a minimal
example like:
https://bugs.llvm.org/show_bug.cgi?id=39535
But once we have this plumbing in place, we should be able to wire up the FMF
propagation and start solving cases like that.
The change to RecurrenceDescriptor::AddReductionVar() is required to prevent a
regression in a LoopVectorize test. We are intersecting the FMF of any
FPMathOperator there, so if a phi is not properly annotated, new math
instructions may not be either. Once we fix the propagation in SimplifyCFG, it
may be safe to remove that hack.
Differential Revision: https://reviews.llvm.org/D67564
llvm-svn: 372866
Summary:
Adds the following inline asm constraints for SVE:
- Upl: One of the low eight SVE predicate registers, P0 to P7 inclusive
- Upa: SVE predicate register with full range, P0 to P15
Reviewers: t.p.northover, sdesmalen, rovka, momchil.velikov, cameron.mcinally, greened, rengolin
Reviewed By: rovka
Subscribers: javed.absar, tschuett, rkruppe, psnobl, cfe-commits, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66524
llvm-svn: 371967
Summary:
Add an intrinsic that takes 2 unsigned integers with
the scale of them provided as the third argument and
performs fixed point multiplication on them. The
result is saturated and clamped between the largest and
smallest representable values of the first 2 operands.
This is a part of implementing fixed point arithmetic
in clang where some of the more complex operations
will be implemented as intrinsics.
Patch by: leonardchan, bjope
Reviewers: RKSimon, craig.topper, bevinh, leonardchan, lebedev.ri, spatel
Reviewed By: leonardchan
Subscribers: ychen, wuzish, nemanjai, MaskRay, jsji, jdoerfert, Ka-Ka, hiraditya, rjmccall, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D57836
llvm-svn: 371308
Summary:
Adds the following inline asm constraints for SVE:
- w: SVE vector register with full range, Z0 to Z31
- x: Restricted to registers Z0 to Z15 inclusive.
- y: Restricted to registers Z0 to Z7 inclusive.
This change also adds the "z" modifier to interpret a register as an SVE register.
Not all of the bitconvert patterns added by this patch are used, but they have been included here for completeness.
Reviewers: t.p.northover, sdesmalen, rovka, momchil.velikov, rengolin, cameron.mcinally, greened
Reviewed By: sdesmalen
Subscribers: javed.absar, tschuett, rkruppe, psnobl, cfe-commits, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66302
llvm-svn: 370673
Some saturation examples for llvm.smul.fix.sat were not showing
the correct result. I've adjusted the operands to make sure that
we actually trigger overflow in those examples.
llvm-svn: 370566
This implements constrained floating point intrinsics for FP to signed and
unsigned integers.
Quoting from D32319:
The purpose of the constrained intrinsics is to force the optimizer to
respect the restrictions that will be necessary to support things like the
STDC FENV_ACCESS ON pragma without interfering with optimizations when
these restrictions are not needed.
Reviewed by: Andrew Kaylor, Craig Topper, Hal Finkel, Cameron McInally, Roman Lebedev, Kit Barton
Approved by: Craig Topper
Differential Revision: http://reviews.llvm.org/D63782
llvm-svn: 370228
This implements the DWARF 5 feature described in:
http://dwarfstd.org/ShowIssue.php?issue=141212.1
To support recognizing anonymous structs:
struct A {
struct { // Anonymous struct
int y;
};
} a;
This patch adds a new (DI)flag to LLVM metadata:
ExportSymbols
Differential Revision: https://reviews.llvm.org/D66352
llvm-svn: 369781
This patch adds a ptrmask intrinsic which allows masking out bits of a
pointer that must be zero when accessing it, because of ABI alignment
requirements or a restriction of the meaningful bits of a pointer
through the data layout.
This avoids doing a ptrtoint/inttoptr round trip in some cases (e.g. tagged
pointers) and allows us to not lose information about the underlying
object.
Reviewers: nlopes, efriedma, hfinkel, sanjoy, jdoerfert, aqjune
Reviewed by: sanjoy, jdoerfert
Differential Revision: https://reviews.llvm.org/D59065
llvm-svn: 368986
For some targets the LICM pass can result in sub-optimal code in some
cases where it would be better not to run the pass, but it isn't
always possible to suppress the transformations heuristically.
Where the front-end has insight into such cases it is beneficial
to attach loop metadata to disable the pass - this change adds the
llvm.licm.disable metadata to enable that.
Differential Revision: https://reviews.llvm.org/D64557
llvm-svn: 368296