As suggested in D92247 (and independent of whatever we decide to do there),
this code is confusing as-is. Hopefully, this is at least mildly better.
We might be able to do better still, but we have a function called
"removePredecessor" with this behavior:
"Note that this function does not actually remove the predecessor." (!)
Define ConstantData::PoisonValue.
Add support for poison value to LLLexer/LLParser/BitcodeReader/BitcodeWriter.
Add support for poison value to llvm-c interface.
Add support for poison value to OCaml binding.
Add m_Poison in PatternMatch.
Differential Revision: https://reviews.llvm.org/D71126
Typically branch_weights are i32, not i64.
This fixes entry_counts_cold.ll under NPM.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D90539
This change introduces a new IR intrinsic named `llvm.pseudoprobe` for pseudo-probe block instrumentation. Please refer to https://reviews.llvm.org/D86193 for the whole story.
A pseudo probe is used to collect the execution count of the block where the probe is instrumented. This requires a pseudo probe to be persisting. The LLVM PGO instrumentation also instruments in similar places by placing a counter in the form of atomic read/write operations or runtime helper calls. While these operations are very persisting or optimization-resilient, in theory we can borrow the atomic read/write implementation from PGO counters and cut it off at the end of compilation with all the atomics converted into binary data. This was our initial design and we’ve seen promising sample correlation quality with it. However, the atomics approach has a couple issues:
1. IR Optimizations are blocked unexpectedly. Those atomic instructions are not going to be physically present in the binary code, but since they are on the IR till very end of compilation, they can still prevent certain IR optimizations and result in lower code quality.
2. The counter atomics may not be fully cleaned up from the code stream eventually.
3. Extra work is needed for re-targeting.
We choose to implement pseudo probes based on a special LLVM intrinsic, which is expected to have most of the semantics that comes with an atomic operation but does not block desired optimizations as much as possible. More specifically the semantics associated with the new intrinsic enforces a pseudo probe to be virtually executed exactly the same number of times before and after an IR optimization. The intrinsic also comes with certain flags that are carefully chosen so that the places they are probing are not going to be messed up by the optimizer while most of the IR optimizations still work. The core flags given to the special intrinsic is `IntrInaccessibleMemOnly`, which means the intrinsic accesses memory and does have a side effect so that it is not removable, but is does not access memory locations that are accessible by any original instructions. This way the intrinsic does not alias with any original instruction and thus it does not block optimizations as much as an atomic operation does. We also assign a function GUID and a block index to an intrinsic so that they are uniquely identified and not merged in order to achieve good correlation quality.
Let's now look at an example. Given the following LLVM IR:
```
define internal void @foo2(i32 %x, void (i32)* %f) !dbg !4 {
bb0:
%cmp = icmp eq i32 %x, 0
br i1 %cmp, label %bb1, label %bb2
bb1:
br label %bb3
bb2:
br label %bb3
bb3:
ret void
}
```
The instrumented IR will look like below. Note that each `llvm.pseudoprobe` intrinsic call represents a pseudo probe at a block, of which the first parameter is the GUID of the probe’s owner function and the second parameter is the probe’s ID.
```
define internal void @foo2(i32 %x, void (i32)* %f) !dbg !4 {
bb0:
%cmp = icmp eq i32 %x, 0
call void @llvm.pseudoprobe(i64 837061429793323041, i64 1)
br i1 %cmp, label %bb1, label %bb2
bb1:
call void @llvm.pseudoprobe(i64 837061429793323041, i64 2)
br label %bb3
bb2:
call void @llvm.pseudoprobe(i64 837061429793323041, i64 3)
br label %bb3
bb3:
call void @llvm.pseudoprobe(i64 837061429793323041, i64 4)
ret void
}
```
Reviewed By: wmi
Differential Revision: https://reviews.llvm.org/D86490
This will ensure that passes that add new global variables will create them
in address space 1 once the passes have been updated to no longer default
to the implicit address space zero.
This also changes AutoUpgrade.cpp to add -G1 to the DataLayout if it wasn't
already to present to ensure bitcode backwards compatibility.
Reviewed by: arsenm
Differential Revision: https://reviews.llvm.org/D84345
This is similar to the existing alloca and program address spaces (D37052)
and should be used when creating/accessing global variables.
We need this in our CHERI fork of LLVM to place all globals in address space 200.
This ensures that values are accessed using CHERI load/store instructions
instead of the normal MIPS/RISC-V ones.
The problem this is trying to fix is that most of the time the type of
globals is created using a simple PointerType::getUnqual() (or ::get() with
the default address-space value of 0). This does not work for us and we get
assertion/compilation/instruction selection failures whenever a new call
is added that uses the default value of zero.
In our fork we have removed the default parameter value of zero for most
address space arguments and use DL.getProgramAddressSpace() or
DL.getGlobalsAddressSpace() whenever possible. If this change is accepted,
I will upstream follow-up patches to use DL.getGlobalsAddressSpace() instead
of relying on the default value of 0 for PointerType::get(), etc.
This patch and the follow-up changes will not have any functional changes
for existing backends with the default globals address space of zero.
A follow-up commit will change the default globals address space for
AMDGPU to 1.
Reviewed By: dylanmckay
Differential Revision: https://reviews.llvm.org/D70947
The `dso_local_equivalent` constant is a wrapper for functions that represents a
value which is functionally equivalent to the global passed to this. That is, if
this accepts a function, calling this constant should have the same effects as
calling the function directly. This could be a direct reference to the function,
the `@plt` modifier on X86/AArch64, a thunk, or anything that's equivalent to the
resolved function as a call target.
When lowered, the returned address must have a constant offset at link time from
some other symbol defined within the same binary. The address of this value is
also insignificant. The name is leveraged from `dso_local` where use of a function
or variable is resolved to a symbol in the same linkage unit.
In this patch:
- Addition of `dso_local_equivalent` and handling it
- Update Constant::needsRelocation() to strip constant inbound GEPs and take
advantage of `dso_local_equivalent` for relative references
This is useful for the [Relative VTables C++ ABI](https://reviews.llvm.org/D72959)
which makes vtables readonly. This works by replacing the dynamic relocations for
function pointers in them with static relocations that represent the offset between
the vtable and virtual functions. If a function is externally defined,
`dso_local_equivalent` can be used as a generic wrapper for the function to still
allow for this static offset calculation to be done.
See [RFC](http://lists.llvm.org/pipermail/llvm-dev/2020-August/144469.html) for more details.
Differential Revision: https://reviews.llvm.org/D77248
This should be a perfectly reasonable operation for scalable vectors.
Currently, it only works for zeroinitializer values of
ScalableVectorType, but the fundamental operation is sound and it should
be possible to make it work for other splats
Reviewed By: david-arm
Differential Revision: https://reviews.llvm.org/D77442
With a function pass manager, it would insert debuginfo metadata before
getting to function passes while processing the pass manager, causing
debugify to skip while running the function passes.
Skip special passes + verifier + printing passes. Compared to the legacy
implementation of -debugify-each, this additionally skips verifier
passes. Probably no need to update the legacy version since it will be
obsolete soon.
This fixes 2 instcombine tests using -debugify-each under NPM.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D91558
This patch updates Clang's IRGen to add !annotation nodes with an
"auto-init" annotation to all stores for auto-initialization.
As discussed in 'RFC: Combining Annotation Metadata and Remarks'
(http://lists.llvm.org/pipermail/llvm-dev/2020-November/146393.html)
this allows using optimization remarks to track down where auto-init
code was inserted (and not removed by optimizations).
There are a few cases in the tests where !annotation gets dropped by
optimizations. Those optimizations will be updated in subsequent
patches.
This patch is based on a patch by Francis Visoiu Mistrih.
Reviewed By: thegameg, paquette
Differential Revision: https://reviews.llvm.org/D91417
See discussion in https://bugs.llvm.org/show_bug.cgi?id=45073 / https://reviews.llvm.org/D66324#2334485
the implementation is known-broken for certain inputs,
the bugreport was up for a significant amount of timer,
and there has been no activity to address it.
Therefore, just completely rip out all of misexpect handling.
I suspect, fixing it requires redesigning the internals of MD_misexpect.
Should anyone commit to fixing the implementation problem,
starting from clean slate may be better anyways.
This reverts commit 7bdad08429,
and some of it's follow-ups, that don't stand on their own.
For example, during RAUW in IRMover, the `Function` ValueAsMetadata in "CG Profile" could become bitcast.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D88433
This patch adds a new !annotation metadata kind which can be used to
attach annotation strings to instructions.
It also adds a new pass that emits summary remarks per function with the
counts for each annotation kind.
The intended uses cases for this new metadata is annotating
'interesting' instructions and the remarks should provide additional
insight into transformations applied to a program.
To motivate this, consider these specific questions we would like to get answered:
* How many stores added for automatic variable initialization remain after optimizations? Where are they?
* How many runtime checks inserted by a frontend could be eliminated? Where are the ones that did not get eliminated?
Discussed on llvm-dev as part of 'RFC: Combining Annotation Metadata and Remarks'
(http://lists.llvm.org/pipermail/llvm-dev/2020-November/146393.html)
Reviewed By: thegameg, jdoerfert
Differential Revision: https://reviews.llvm.org/D91188
No longer rely on an external tool to build the llvm component layout.
Instead, leverage the existing `add_llvm_componentlibrary` cmake function and
introduce `add_llvm_component_group` to accurately describe component behavior.
These function store extra properties in the created targets. These properties
are processed once all components are defined to resolve library dependencies
and produce the header expected by llvm-config.
Differential Revision: https://reviews.llvm.org/D90848
Add a calling convention called amdgpu_gfx for real function calls
within graphics shaders. For the moment, this uses the same calling
convention as other calls in amdgpu, with registers excluded for return
address, stack pointer and stack buffer descriptor.
Differential Revision: https://reviews.llvm.org/D88540
Currently there is only a member version of isEquality(),
which requires an actual [IF]CmpInst to be avaliable,
which isn't always possible, and is inconsistent with
the general pattern here.
I wanted to use it in a new patch, but it wasn't there..
This patch changes the intrinsics cost model to assume that by default
target intrinsics are cheap. This didn't seem to be the case for all
intrinsics, and is potentially an MVE problem due to our scalarization
overheads. Cheap seems to be a good default in general though.
Differential Revision: https://reviews.llvm.org/D90597
CallInst::updateProfWeight() creates branch_weights with i64 instead of i32.
To be more consistent everywhere and remove lots of casts from uint64_t
to uint32_t, use i64 for branch_weights.
Reviewed By: davidxl
Differential Revision: https://reviews.llvm.org/D88609
This adds support for scalable vector types in the C API and in
llvm-c-test, and also adds a test to ensure that llvm-c-test can properly
roundtrip operations involving scalable vectors.
While creating this diff, I discovered that the C API cannot properly roundtrip
_constant expressions_ involving shufflevector / scalable vectors, but that
seems to be a separate enough issue that I plan to address it in a future diff
(unless reviewers feel it should be addressed here).
Differential Revision: https://reviews.llvm.org/D89816
The support of a few debug info attributes specifically for Fortran
arrays have been added to LLVM recently, but there's no way to take
advantage of them through DIBuilder. This patch extends
DIBuilder::createArrayType to enable the settings of those attributes.
Patch by Chih-Ping Chen!
Differential Review: https://reviews.llvm.org/D90323
This is needed to support fortran assumed rank arrays which
have runtime rank.
Summary:
Fortran assumed rank arrays have dynamic rank. DWARF TAG
DW_TAG_generic_subrange is needed to support that.
Testing:
unit test cases added (hand-written)
check llvm
check debug-info
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D89218
These logically belong together since it's a base commit plus
followup fixes to less common build configurations.
The patches are:
Revert "CfgInterface: rename interface() to getInterface()"
This reverts commit a74fc48158.
Revert "Wrap CfgTraitsFor in namespace llvm to please GCC 5"
This reverts commit f2a06875b6.
Revert "Try to make GCC5 happy about the CfgTraits thing"
This reverts commit 03a5f7ce12.
Revert "Introduce CfgTraits abstraction"
This reverts commit c0cdd22c72.
CallInst::updateProfWeight() creates branch_weights with i64 instead of i32.
To be more consistent everywhere and remove lots of casts from uint64_t
to uint32_t, use i64 for branch_weights.
Reviewed By: davidxl
Differential Revision: https://reviews.llvm.org/D88609
The support of a few debug info attributes specifically for Fortran
arrays have been added to LLVM recently, but there's no way to take
advantage of them through DIBuilder. This patch extends
DIBuilder::createArrayType to enable the settings of those attributes.
Patch by Chih-Ping Chen!
Differential Revision: https://reviews.llvm.org/D89817
Change `ConstantDataSequential::Next` to a
`unique_ptr<ConstantDataSequential>` and update `CDSConstants` to a
`StringMap<unique_ptr<ConstantDataSequential>>`, making the ownership
more obvious.
Differential Revision: https://reviews.llvm.org/D90083
This is a long-delayed follow-up to
5e5b85098d.
`TempMDNode` includes a bunch of machinery for RAUW, and should only be
used when necessary. RAUW wasn't being used in any of these cases... it
was just a placeholder for a self-reference.
Where the real node was using `MDNode::getDistinct`, just replace the
temporary argument with `nullptr`.
Where the real node was using `MDNode::get`, the `replaceOperandWith`
call was "promoting" the node to a distinct one implicitly due to
self-reference detection in `MDNode::handleChangedOperand`. The
`TempMDNode` was serving a purpose by delaying uniquing, but it's way
simpler to just call `MDNode::getDistinct` in the first place.
Note that using a self-reference at all in these places is a hold-over
from before `distinct` metadata existed. It was an old trick to create
distinct nodes. It would be intrusive to change, including bitcode
upgrades, etc., and it's harmless so I'm not sure there's much value in
removing it from existing schemas. After this commit it still has a tiny
memory cost (in the extra metadata operand) but no more overhead in
construction.
Differential Revision: https://reviews.llvm.org/D90079
It's currently ambiguous in IR whether the source language explicitly
did not want a stack a stack protector (in C, via function attribute
no_stack_protector) or doesn't care for any given function.
It's common for code that manipulates the stack via inline assembly or
that has to set up its own stack canary (such as the Linux kernel) would
like to avoid stack protectors in certain functions. In this case, we've
been bitten by numerous bugs where a callee with a stack protector is
inlined into an __attribute__((__no_stack_protector__)) caller, which
generally breaks the caller's assumptions about not having a stack
protector. LTO exacerbates the issue.
While developers can avoid this by putting all no_stack_protector
functions in one translation unit together and compiling those with
-fno-stack-protector, it's generally not very ergonomic or as
ergonomic as a function attribute, and still doesn't work for LTO. See also:
https://lore.kernel.org/linux-pm/20200915172658.1432732-1-rkir@google.com/https://lore.kernel.org/lkml/20200918201436.2932360-30-samitolvanen@google.com/T/#u
Typically, when inlining a callee into a caller, the caller will be
upgraded in its level of stack protection (see adjustCallerSSPLevel()).
By adding an explicit attribute in the IR when the function attribute is
used in the source language, we can now identify such cases and prevent
inlining. Block inlining when the callee and caller differ in the case that one
contains `nossp` when the other has `ssp`, `sspstrong`, or `sspreq`.
Fixes pr/47479.
Reviewed By: void
Differential Revision: https://reviews.llvm.org/D87956
Now there are two main classes in Value hierarchy, which support metadata,
these are Instruction and GlobalObject. They implement different APIs for
metadata manipulation, which however overlap. This change moves metadata
manipulation code into Value, so descendant classes can use this code for
their operations on metadata.
No functional changes intended.
Differential Revision: https://reviews.llvm.org/D67626
Per asbirlea's comment, assert that only instructions, constants
and arguments are passed to this API. Simplify returning true
would not be correct for special Value subclasses like MemoryAccess.
Non-instruction defs like arguments, constants or global values
always dominate all instructions/uses inside the function. This
case currently needs to be treated separately by the caller, see
https://reviews.llvm.org/D89623#inline-832818 for an example.
This patch makes the dominator tree APIs accept a Value instead of
an Instruction and always returns true for the non-Instruction case.
A complication here is that BasicBlocks are also Values. For that
reason we can't support the dominates(Value *, BasicBlock *)
variant, as it would conflict with dominates(BasicBlock *, BasicBlock *),
which has different semantics. For the other two APIs we assert
that the passed value is not a BasicBlock.
Differential Revision: https://reviews.llvm.org/D89632
For GC parseable element atomic memcpy/memmove we'll need to
shuffle statepoint arguments. Make it possible by storing the
arguments as Value *, not Use *.
This patch teaches BasicBlock::print to construct an instance of
SlotTracker with the containing function.
Without this patch, we dump:
*** IR Dump After LoopInstSimplifyPass ***
; Preheader:
br label %1
; Loop:
<badref>: ; preds = %1, %0
br label %1
Note "<badref>" above. This happens because BasicBlock::print calls:
SlotTracker SlotTable(this->getModule());
Note that this constructor does not add the contents of functions to
the slot table. That is, basic blocks are left unnumbered.
This patch fixes the problem by switching to:
SlotTracker SlotTable(this->getParent());
which does add the contents of the Module and the function,
this->getParent(), to the slot table.
Differential Revision: https://reviews.llvm.org/D89567
This is to simplify icmp instructions in the form like:
%cmp = icmp eq i32 (i8*, i8*)* bitcast (i32 (i32**, i32**)* @f32 to i32
%(i8*, i8*)), bitcast (i32 (i64**, i64**) @f64 to i32 (i8*, i8*)*)
Here @f32 and @f64 are two functions.
Differential Revision: https://reviews.llvm.org/D87850
When generating the use-list order, also consider value uses that are
operands which are wrapped in metadata; e.g. llvm.dbg.value operands.
This fixes PR36778. The test case is based on the reproducer from that
report.
Reviewed By: dexonsmith
Differential Revision: https://reviews.llvm.org/D53758
The CfgTraits abstraction simplfies writing algorithms that are
generic over the type of CFG, and enables writing such algorithms
as regular non-template code that operates on opaque references
to CFG blocks and values.
Implementations of CfgTraits provide operations on the concrete
CFG types, e.g. `IrCfgTraits::BlockRef` is `BasicBlock *`.
CfgInterface is an abstract base class which provides operations
on opaque types CfgBlockRef and CfgValueRef. Those opaque types
encapsulate a `void *`, but the meaning depends on the concrete
CFG type. For example, MachineCfgTraits -- for use with MachineIR
in SSA form -- encodes a Register inside CfgValueRef. Converting
between concrete references and opaque/generic ones is done by
CfgTraits::{fromGeneric,toGeneric}. Convenience methods
CfgTraits::{un}wrap{Iterator,Range} are available as well.
Writing algorithms in terms of CfgInterface adds some overhead
(virtual method calls, plus in same cases it removes the
opportunity to inline iterators), but can be much more convenient
since generic algorithms can be written as non-templates.
This patch adds implementations of CfgTraits for all CFGs on
which dominator trees are calculated, so that the dominator
tree can be ported to this machinery. Only IrCfgTraits (LLVM IR)
and MachineCfgTraits (Machine IR in SSA form) are complete, the
other implementations are limited to the absolute minimum
required to make the upcoming dominator tree changes work.
v5:
- fix MachineCfgTraits::blockdef_iterator and allow it to iterate over
the instructions in a bundle
- use MachineBasicBlock::printName
v6:
- implement predecessors/successors for all CfgTraits implementations
- fix error in unwrapRange
- rename toGeneric/fromGeneric into wrapRef/unwrapRef to have naming
that is consistent with {wrap,unwrap}{Iterator,Range}
- use getVRegDef instead of getUniqueVRegDef
v7:
- std::forward fix in wrapping_iterator
- fix typos
v8:
- cleanup operators on CfgOpaqueType
- address other review comments
Change-Id: Ia75f4f268fded33fca11218a7d578c9aec1f3f4d
Differential Revision: https://reviews.llvm.org/D83088
This adds the LLVM IR attribute `mustprogress` as defined in LangRef through D86233. This attribute will be applied to functions with in languages like C++ where forward progress is guaranteed. Functions without this attribute are not required to make progress.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D85393
LLVM rejects DWARF operator DW_OP_over. This DWARF operator is needed
for Flang to support assumed rank array.
Summary:
Currently LLVM rejects DWARF operator DW_OP_over. Below error is
produced when llvm finds this operator.
[..]
invalid expression
!DIExpression(151, 20, 16, 48, 30, 35, 80, 34, 6)
warning: ignoring invalid debug info in over.ll
[..]
There were some parts missing in support of this operator, which are
now completed.
Testing
-added a unit testcase
-check-debuginfo
-check-llvm
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D89208
Similar to MCSymbol::print in 3d6c8ebb58
(llvm-svn: 81682, PR4966), these symbols may need to be quoted to be handled by
the linker correctly.
Reviewed By: compnerd
Differential Revision: https://reviews.llvm.org/D87099
This was broken by 16295d521e, when
instructions started being handled and not just constant
expressions. This was re-inserting an equivalent bitcast to the
original memcpy operand, which made a non-functional IR change on
every iteration.
This also fixes a secondary problem where it was inserting
addrspacecasts which may not have been legal (i.e. it changed the
source address space). Start visiting all pointer users and fail out
if we can't process them. Also start handling the relevant memory
intrinsic users. These cases can be dealt with by running
InferAddressSpaces separately.
This patch adds support for assemble disassemble intrinsics
for MMA.
Reviewed By: bsaleil, #powerpc
Differential Revision: https://reviews.llvm.org/D88739
Based on the recent patches D88475 and D88429 where we are losing undef values due to extension/comparisons.
I've added a Constant::mergeUndefsWith method that merges the undef scalar/elements from another Constant into a specific Constant.
Differential Revision: https://reviews.llvm.org/D88687
This exposes the helper for other power-of-2 instcombine folds that I'm intending to add vector support to.
The helper only operated on power-of-2 constants so getExactLogBase2 is a more accurate name.
This patch adds support for DWARF attribute DW_AT_rank.
Summary:
Fortran assumed rank arrays have dynamic rank. DWARF attribute
DW_AT_rank is needed to support that.
Testing:
unit test cases added (hand-written)
check llvm
check debug-info
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D89141
It is possible to get a fltSemantics of a particular Type,
but there is no way to produce a Type based on a
fltSemantics.
This adds the function Type::getFloatingPointTy, which
will return the appropriate floating point Type for a given
fltSemantics.
ConstantFP is modified to use this function instead of
implementing it itself. Also some minor refactors to use
Type::getFltSemantics instead of a hand-rolled version.
Differential Revision: https://reviews.llvm.org/D87512
Drop `noundef` for return values that are replaced by void and make it
illegal to put `noundef` on a void value.
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D87306
Alignment attributes need to be dropped for non-pointer values.
This also introduces a check into the verifier to ensure you don't use
`align` on anything but a pointer. Test needed to be adjusted
accordingly.
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D87304
This is an alternate fix (see D87835) for a bug where a NaN constant
gets wrongly transformed into Infinity via truncation.
In this patch, we uniformly convert any SNaN to QNaN while raising
'invalid op'.
But we don't have a way to directly specify a 32-bit SNaN value in LLVM IR,
so those are always encoded/decoded by calling convert from/to 64-bit hex.
See D88664 for a clang fix needed to allow this change.
Differential Revision: https://reviews.llvm.org/D88238
A new hidden option -print-changed is added along with code to support
printing the IR as it passes through the opt pipeline in the new pass
manager. Only those passes that change the IR are reported, with others
only having the banner reported, indicating that they did not change the
IR, were filtered out or ignored. Filtering of output via the
-filter-print-funcs is supported and a new supporting hidden option
-filter-passes is added. The latter takes a comma separated list of pass
names and filters the output to only show those passes in the list that
change the IR. The output can also be modified via the -print-module-scope
function.
The code introduces an abstract template base class that generalizes the
comparison of IRs that takes an IR representation as template parameter.
Derived classes provide overrides that provide an event based API
for generalized reporting of IRs as they are changed in the opt pipeline
through the new pass manager.
The first of several instantiations is provided that prints the IR
in a form similar to that produced by -print-after-all with the above
mentioned filtering capabilities. This version, and the others to
follow will be introduced at the upcoming developer's conference.
Reviewed By: aeubanks (Arthur Eubanks), yrouban (Yevgeny Rouban), ychen (Yuanfang Chen), MaskRay (Fangrui Song)
Differential Revision: https://reviews.llvm.org/D86360
Key Locker provides a mechanism to encrypt and decrypt data with an AES key without having access
to the raw key value by converting AES keys into “handles”. These handles can be used to perform the
same encryption and decryption operations as the original AES keys, but they only work on the current
system and only until they are revoked. If software revokes Key Locker handles (e.g., on a reboot),
then any previous handles can no longer be used.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D88398
This came from @lebedev.ri's suggestion to use m_SpecificInt_ICMP for D88429 - since I was going to change the m_APInt to m_Constant for that patch I thought I would do it for the only other user of the APInt first.
I've added a ConstantExpr::getUMin helper - its trivial to add UMAX/SMIN/SMAX but thought I'd wait until we have use cases.
Differential Revision: https://reviews.llvm.org/D88475
This reverts commit 55c4ff91bd.
Issues were introduced as discussed in https://reviews.llvm.org/D88241
where this change made previous bugs in the linker and BitCodeWriter
visible.
It is not a good idea to expose raw constants in the LLVM C API. Replace this with an explicit getter.
Differential Revision: https://reviews.llvm.org/D88367
This commit fixes a regression (from LLVM 10 to LLVM 11 RC3) in the LLVM
C API.
Previously, commit 1ee6ec2bf removed the mask operand from the
ShuffleVector instruction, storing the mask data separately in the
instruction instead; this reduced the number of operands of
ShuffleVector from 3 to 2. AFAICT, this change unintentionally caused
a regression in the LLVM C API. Specifically, it is no longer possible
to get the mask of a ShuffleVector instruction through the C API. This
patch introduces new functions which together allow a C API user to get
the mask of a ShuffleVector instruction, restoring the functionality
which was previously available through LLVMGetOperand().
This patch also adds tests for this change to the llvm-c-test
executable, which involved adding support for InsertElement,
ExtractElement, and ShuffleVector itself (as well as constant vectors)
to echo.cpp. Previously, vector operations weren't tested at all in
echo.ll.
I also fixed some typos in comments and help-text nearby these changes,
which I happened to spot while developing this patch. Since the typo
fixes are technically unrelated other than being in the same files, I'm
happy to take them out if you'd rather they not be included in the patch.
Differential Revision: https://reviews.llvm.org/D88190
attachments. They would crash the backend, which expects all
DISubprograms that are not part of the type system to have a unit field.
Clang right before https://reviews.llvm.org/D79967 would generate this
kind of broken IR.
rdar://problem/69534688
Thanks to Fangrui for fixing an assembler test I had missed!
https://reviews.llvm.org/D88270
Make the corresponding change that was made for byval in
b7141207a4. Like byval, this requires a
bulk update of the test IR tests to include the type before this can
be mandatory.
attachments. They would crash the backend, which expects all
DISubprograms that are not part of the type system to have a unit field.
Clang right before https://reviews.llvm.org/D79967 would generate this
kind of broken IR.
rdar://problem/69534688
Introduce a helper which can be used to update the debug location of an
Instruction after the instruction is hoisted. This can be used to safely
drop a source location as recommended by the docs.
For more context, see the discussion in https://reviews.llvm.org/D60913.
Differential Revision: https://reviews.llvm.org/D85670
The langref already states it does, but this wasn't implemented. Also
covers inalloca and preallocated. Also helps fix a dependence on
pointer element types.
Similar to the ConstantRange::getActiveBits(), and to similarly-named
methods in APInt, returns the bitwidth needed to represent
the given signed constant range
Much like APInt::getActiveBits(), computes how many bits are needed
to be able to represent every value in this constant range,
treating the values as unsigned.
Use the fact that `~X` is equivalent to `-1 - X`, which gives us
fully-precise answer, and we only need to special-handle the wrapped case.
This fires ~16k times for vanilla llvm test-suite + RawSpeed.
A new hidden option -print-changed is added along with code to support
printing the IR as it passes through the opt pipeline in the new pass
manager. Only those passes that change the IR are reported, with others
only having the banner reported, indicating that they did not change the
IR, were filtered out or ignored. Filtering of output via the
-filter-print-funcs is supported and a new supporting hidden option
-filter-passes is added. The latter takes a comma separated list of pass
names and filters the output to only show those passes in the list that
change the IR. The output can also be modified via the -print-module-scope
function.
The code introduces a template base class that generalizes the comparison
of IRs that takes an IR representation as template parameter. The
constructor takes a series of lambdas that provide an event based API
for generalized reporting of IRs as they are changed in the opt pipeline
through the new pass manager.
The first of several instantiations is provided that prints the IR
in a form similar to that produced by -print-after-all with the above
mentioned filtering capabilities. This version, and the others to
follow will be introduced at the upcoming developer's conference.
Reviewed By: aeubanks (Arthur Eubanks), yrouban (Yevgeny Rouban), ychen (Yuanfang Chen)
Differential Revision: https://reviews.llvm.org/D86360
This patch prevents the `llvm.masked.gather` and `llvm.masked.scatter` intrinsics to be scalarized when invoked on scalable vectors.
The change in `Function.cpp` is needed to prevent the warning that is raised when `getNumElements` is used in place of `getElementCount` on `VectorType` instances. The tests guards for regressions on this change.
The tests makes sure that calls to `llvm.masked.[gather|scatter]` are still scalarized when:
# the intrinsics are operating on fixed size vectors, and
# the compiler is not targeting fixed length SVE code generation.
Reviewed By: efriedma, sdesmalen
Differential Revision: https://reviews.llvm.org/D86249
This is needed to support assumed size array of fortran which can have missing upperBound/count
, contrary to current DISubrange support.
Example:
subroutine sub (array1, array2)
integer :: array1 (*)
integer :: array2 (4:9, 10:*)
array1(7:8) = 9
array2(5, 10) = 10
end subroutine
Now the validation check is relaxed for fortran.
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D87500
We're now getting close to having the necessary analysis/combines etc. for the new generic llvm smax/smin/umax/umin intrinsics.
This patch updates the SSE/AVX integer MINMAX intrinsics to emit the generic equivalents instead of the icmp+select code pattern.
Differential Revision: https://reviews.llvm.org/D87603
I've amended the isLoadInvariantInLoop function to bail out for
scalable vectors for now since the invariant.start intrinsic is only
ever generated by the clang frontend for thread locals or struct
and class constructors, neither of which support sizeless types.
In addition, the intrinsic itself does not currently support the
concept of a scaled size, which makes it impossible to compare
the sizes of different scalable objects, e.g. <vscale x 32 x i8>
and <vscale x 16 x i8>.
Added new tests here:
Transforms/LICM/AArch64/sve-load-hoist.ll
Transforms/LICM/hoisting.ll
Differential Revision: https://reviews.llvm.org/D87227
NOTE: There is a mailing list discussion on this: http://lists.llvm.org/pipermail/llvm-dev/2019-December/137632.html
Complemantary to the assumption outliner prototype in D71692, this patch
shows how we could simplify the code emitted for an alignemnt
assumption. The generated code is smaller, less fragile, and it makes it
easier to recognize the additional use as a "assumption use".
As mentioned in D71692 and on the mailing list, we could adopt this
scheme, and similar schemes for other patterns, without adopting the
assumption outlining.
In particular, we shouldn't make assumptions about globals which are
unnamed_addr: we can fold them together with other globals.
Also while I'm here, use isInterposable() instead of trying to
explicitly name all the different kinds of weak linkage.
Fixes https://bugs.llvm.org/show_bug.cgi?id=47090
Differential Revision: https://reviews.llvm.org/D87123
A new hidden option -print-changed is added along with code to support
printing the IR as it passes through the opt pipeline in the new pass
manager. Only those passes that change the IR are reported, with others
only having the banner reported, indicating that they did not change the
IR, were filtered out or ignored. Filtering of output via the
-filter-print-funcs is supported and a new supporting hidden option
-filter-passes is added. The latter takes a comma separated list of pass
names and filters the output to only show those passes in the list that
change the IR. The output can also be modified via the -print-module-scope
function.
The code introduces a template base class that generalizes the comparison
of IRs that takes an IR representation as template parameter. The
constructor takes a series of lambdas that provide an event based API
for generalized reporting of IRs as they are changed in the opt pipeline
through the new pass manager.
The first of several instantiations is provided that prints the IR
in a form similar to that produced by -print-after-all with the above
mentioned filtering capabilities. This version, and the others to
follow will be introduced at the upcoming developer's conference.
See https://hotcrp.llvm.org/usllvm2020/paper/29 for more information.
Reviewed By: yrouban (Yevgeny Rouban)
Differential Revision: https://reviews.llvm.org/D86360
This also changes -lint from an analysis to a pass. It's similar to
-verify, and that is a normal pass, and lives in llvm/IR.
Reviewed By: ychen
Differential Revision: https://reviews.llvm.org/D87057
The 1st try was reverted because I missed an assert that
needed softening.
As discussed in D86798 / rG09652721 , we were potentially
returning a different result for whether an Instruction
is commutable depending on if we call the base class or
derived class method.
This requires relaxing asserts in GVN, but that pass
seems to be working otherwise.
NewGVN requires more work because it uses different
code paths for numbering binops and calls.
As discussed in D86798 / rG09652721 , we were potentially
returning a different result for whether an Instruction
is commutable depending on if we call the base class or
derived class method.
This requires relaxing an assert in GVN, but that pass
seems to be working otherwise.
NewGVN requires more work because it uses different
code paths for numbering binops and calls.
The original take was 6102310d81,
which taught InstSimplify to do that, which seemed better at time,
since we got EarlyCSE support for free.
However, it was proven that we can not do that there,
the simplified-to PHI would not be reachable from the original PHI,
and that is not something InstSimplify is allowed to do,
as noted in the commit ed90f15efb
that reverted it :
> It appears to cause compilation non-determinism and caused stage3 mismatches.
However InstCombine already does many different optimizations,
so it should be a safe place to do it here.
Note that we still can't just compare incoming values ranges,
because there is no guarantee that these PHI's we'd simplify to
were already re-visited and sorted.
However coming up with a test is problematic.
Effects on vanilla llvm test-suite + RawSpeed:
```
| statistic name | baseline | proposed | Δ | % | |%| |
|----------------------------------------------------|-----------|-----------|-------:|---------:|---------:|
| instcombine.NumPHICSEs | 0 | 22228 | 22228 | 0.00% | 0.00% |
| asm-printer.EmittedInsts | 7942329 | 7942456 | 127 | 0.00% | 0.00% |
| assembler.ObjectBytes | 254295632 | 254313792 | 18160 | 0.01% | 0.01% |
| early-cse.NumCSE | 2183283 | 2183272 | -11 | 0.00% | 0.00% |
| early-cse.NumSimplify | 550105 | 541842 | -8263 | -1.50% | 1.50% |
| instcombine.NumAggregateReconstructionsSimplified | 73 | 4506 | 4433 | 6072.60% | 6072.60% |
| instcombine.NumCombined | 3640311 | 3666911 | 26600 | 0.73% | 0.73% |
| instcombine.NumDeadInst | 1778204 | 1783318 | 5114 | 0.29% | 0.29% |
| instcount.NumCallInst | 1758395 | 1758804 | 409 | 0.02% | 0.02% |
| instcount.NumInvokeInst | 59478 | 59502 | 24 | 0.04% | 0.04% |
| instcount.NumPHIInst | 330557 | 330549 | -8 | 0.00% | 0.00% |
| instcount.TotalBlocks | 1077138 | 1077221 | 83 | 0.01% | 0.01% |
| instcount.TotalFuncs | 101442 | 101441 | -1 | 0.00% | 0.00% |
| instcount.TotalInsts | 8831946 | 8832611 | 665 | 0.01% | 0.01% |
| simplifycfg.NumInvokes | 4300 | 4410 | 110 | 2.56% | 2.56% |
| simplifycfg.NumSimpl | 1019813 | 999740 | -20073 | -1.97% | 1.97% |
```
So it fires ~22k times, which is less than ~24k the take 1 did.
It allows foldAggregateConstructionIntoAggregateReuse() to actually work
after PHI-of-extractvalue folds did their thing. Previously SimplifyCFG
would have done this PHI CSE, of all places. Additionally, allows some
more `invoke`->`call` folds to happen (+110, +2.56%).
All in all, expectedly, this catches less things overall,
but all the motivational cases are still caught, so all good.
Rather than calling hasFnAttribute and then calling getFnAttribute
if the attribute exists, its better to just call getFnAttribute and
then check if we got a valid attribute back.
There's a special case in hasAttribute for None when pImpl is null. If pImpl is not null we dispatch to pImpl->hasAttribute which will always return false for Attribute::None.
So if we just want to check for None its sufficient to just check that pImpl is null. Which can even be done inline.
This patch adds a helper for that case which I hope will speed up our getSubtargetImpl implementations.
Differential Revision: https://reviews.llvm.org/D86744
This patch fixes this crash https://gcc.godbolt.org/z/Ps8d1e
And gives SROA the ability to remove assumes if it allows promoting an alloca to register
Without removing assumes when it can't promote to register.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D86570
This patch changes ElementCount so that the Min and Scalable
members are now private and can only be accessed via the get
functions getKnownMinValue() and isScalable(). In addition I've
added some other member functions for more commonly used operations.
Hopefully this makes the class more useful and will reduce the
need for calling getKnownMinValue().
Differential Revision: https://reviews.llvm.org/D86065
This patch adjusts the following ARM/AArch64 LLVM IR intrinsics:
- neon_bfmmla
- neon_bfmlalb
- neon_bfmlalt
so that they take and return bf16 and float types. Previously these
intrinsics used <8 x i8> and <4 x i8> vectors (a rudiment from
implementation lacking bf16 IR type).
The neon_vbfdot[q] intrinsics are adjusted similarly. This change
required some additional selection patterns for vbfdot itself and
also for vector shuffles (in a previous patch) because of SelectionDAG
transformations kicking in and mangling the original code.
This patch makes the generated IR cleaner (less useless bitcasts are
produced), but it does not affect the final assembly.
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D86146
Apparently, we don't do this, neither in EarlyCSE, nor in InstSimplify,
nor in (old) GVN, but do in NewGVN and SimplifyCFG of all places..
While i could teach EarlyCSE how to hash PHI nodes,
we can't really do much (anything?) even if we find two identical
PHI nodes in different basic blocks, same-BB case is the interesting one,
and if we teach InstSimplify about it (which is what i wanted originally,
https://reviews.llvm.org/D86530), we get EarlyCSE support for free.
So i would think this is pretty uncontroversial.
On vanilla llvm test-suite + RawSpeed, this has the following effects:
```
| statistic name | baseline | proposed | Δ | % | \|%\| |
|----------------------------------------------------|-----------|-----------|-------:|---------:|---------:|
| instsimplify.NumPHICSE | 0 | 23779 | 23779 | 0.00% | 0.00% |
| asm-printer.EmittedInsts | 7942328 | 7942392 | 64 | 0.00% | 0.00% |
| assembler.ObjectBytes | 273069192 | 273084704 | 15512 | 0.01% | 0.01% |
| correlated-value-propagation.NumPhis | 18412 | 18539 | 127 | 0.69% | 0.69% |
| early-cse.NumCSE | 2183283 | 2183227 | -56 | 0.00% | 0.00% |
| early-cse.NumSimplify | 550105 | 542090 | -8015 | -1.46% | 1.46% |
| instcombine.NumAggregateReconstructionsSimplified | 73 | 4506 | 4433 | 6072.60% | 6072.60% |
| instcombine.NumCombined | 3640264 | 3664769 | 24505 | 0.67% | 0.67% |
| instcombine.NumDeadInst | 1778193 | 1783183 | 4990 | 0.28% | 0.28% |
| instcount.NumCallInst | 1758401 | 1758799 | 398 | 0.02% | 0.02% |
| instcount.NumInvokeInst | 59478 | 59502 | 24 | 0.04% | 0.04% |
| instcount.NumPHIInst | 330557 | 330533 | -24 | -0.01% | 0.01% |
| instcount.TotalInsts | 8831952 | 8832286 | 334 | 0.00% | 0.00% |
| simplifycfg.NumInvokes | 4300 | 4410 | 110 | 2.56% | 2.56% |
| simplifycfg.NumSimpl | 1019808 | 999607 | -20201 | -1.98% | 1.98% |
```
I.e. it fires ~24k times, causes +110 (+2.56%) more `invoke` -> `call`
transforms, and counter-intuitively results in *more* instructions total.
That being said, the PHI count doesn't decrease that much,
and looking at some examples, it seems at least some of them
were previously getting PHI CSE'd in SimplifyCFG of all places..
I'm adjusting `Instruction::isIdenticalToWhenDefined()` at the same time.
As a comment in `InstCombinerImpl::visitPHINode()` already stated,
there are no guarantees on the ordering of the operands of a PHI node,
so if we just naively compare them, we may false-negatively say that
the nodes are not equal when the only difference is operand order,
which is especially important since the fold is in InstSimplify,
so we can't rely on InstCombine sorting them beforehand.
Fixing this for the general case is costly (geomean +0.02%),
and does not appear to catch anything in test-suite, but for
the same-BB case, it's trivial, so let's fix at least that.
As per http://llvm-compile-time-tracker.com/compare.php?from=04879086b44348cad600a0a1ccbe1f7776cc3cf9&to=82bdedb888b945df1e9f130dd3ac4dd3c96e2925&stat=instructions
this appears to cause geomean +0.03% compile time increase (regression),
but geomean -0.01%..-0.04% code size decrease (improvement).
This reverts commit 8d5f64c4ed.
Thanks to Eli Friedma for pointing out that this check is not appropiate here,
this check will be moved to the Lint pass.
As FIXME said, they really should be checking for a single user,
not use, so let's do that. It is not *that* unusual to have
the same value as incoming value in a PHI node, not unlike
how a PHI may have the same incoming basic block more than once.
There isn't a nice way to do that, Value::users() isn't uniqified,
and Value only tracks it's uses, not Users, so the check is
potentially costly since it does indeed potentially involes
traversing the entire use list of a value.
This adapts the verifier checks for intrinsic get.active.lane.mask to the new
semantics of it as described in D86147. I.e., the second argument %n, which
corresponds to the loop tripcount, must be greater than 0 if it is a constant,
so check that.
Differential Revision: https://reviews.llvm.org/D86301
Changes:
* Change `ToVectorTy` to deal directly with `ElementCount` instances.
* `VF == 1` replaced with `VF.isScalar()`.
* `VF > 1` and `VF >=2` replaced with `VF.isVector()`.
* `VF <=1` is replaced with `VF.isZero() || VF.isScalar()`.
* Replaced the uses of `llvm::SmallSet<ElementCount, ...>` with
`llvm::SmallSetVector<ElementCount, ...>`. This avoids the need of an
ordering function for the `ElementCount` class.
* Bits and pieces around printing the `ElementCount` to string streams.
To guarantee that this change is a NFC, `VF.Min` and asserts are used
in the following places:
1. When it doesn't make sense to deal with the scalable property, for
example:
a. When computing unrolling factors.
b. When shuffle masks are built for fixed width vector types
In this cases, an
assert(!VF.Scalable && "<mgs>") has been added to make sure we don't
enter coepaths that don't make sense for scalable vectors.
2. When there is a conscious decision to use `FixedVectorType`. These
uses of `FixedVectorType` will likely be removed in favour of
`VectorType` once the vectorizer is generic enough to deal with both
fixed vector types and scalable vector types.
3. When dealing with building constants out of the value of VF, for
example when computing the vectorization `step`, or building vectors
of indices. These operation _make sense_ for scalable vectors too,
but changing the code in these places to be generic and make it work
for scalable vectors is to be submitted in a separate patch, as it is
a functional change.
4. When building the potential VFs in VPlan. Making the VPlan generic
enough to handle scalable vectorization factors is a functional change
that needs a separate patch. See for example `void
LoopVectorizationPlanner::buildVPlans(unsigned MinVF, unsigned
MaxVF)`.
5. The class `IntrinsicCostAttribute`: this class still uses `unsigned
VF` as updating the field to use `ElementCount` woudl require changes
that could result in changing the behavior of the compiler. Will be done
in a separate patch.
7. When dealing with user input for forcing the vectorization
factor. In this case, adding support for scalable vectorization is a
functional change that migh require changes at command line.
Note that in some places the idiom
```
unsigned VF = ...
auto VTy = FixedVectorType::get(ScalarTy, VF)
```
has been replaced with
```
ElementCount VF = ...
assert(!VF.Scalable && ...);
auto VTy = VectorType::get(ScalarTy, VF)
```
The assertion guarantees that the new code is (at least in debug mode)
functionally equivalent to the old version. Notice that this change had been
possible because none of the methods that are specific to `FixedVectorType`
were used after the instantiation of `VTy`.
Reviewed By: rengolin, ctetreau
Differential Revision: https://reviews.llvm.org/D85794
Changes:
* Change `ToVectorTy` to deal directly with `ElementCount` instances.
* `VF == 1` replaced with `VF.isScalar()`.
* `VF > 1` and `VF >=2` replaced with `VF.isVector()`.
* `VF <=1` is replaced with `VF.isZero() || VF.isScalar()`.
* Add `<` operator to `ElementCount` to be able to use
`llvm::SmallSetVector<ElementCount, ...>`.
* Bits and pieces around printing the ElementCount to string streams.
* Added a static method to `ElementCount` to represent a scalar.
To guarantee that this change is a NFC, `VF.Min` and asserts are used
in the following places:
1. When it doesn't make sense to deal with the scalable property, for
example:
a. When computing unrolling factors.
b. When shuffle masks are built for fixed width vector types
In this cases, an
assert(!VF.Scalable && "<mgs>") has been added to make sure we don't
enter coepaths that don't make sense for scalable vectors.
2. When there is a conscious decision to use `FixedVectorType`. These
uses of `FixedVectorType` will likely be removed in favour of
`VectorType` once the vectorizer is generic enough to deal with both
fixed vector types and scalable vector types.
3. When dealing with building constants out of the value of VF, for
example when computing the vectorization `step`, or building vectors
of indices. These operation _make sense_ for scalable vectors too,
but changing the code in these places to be generic and make it work
for scalable vectors is to be submitted in a separate patch, as it is
a functional change.
4. When building the potential VFs in VPlan. Making the VPlan generic
enough to handle scalable vectorization factors is a functional change
that needs a separate patch. See for example `void
LoopVectorizationPlanner::buildVPlans(unsigned MinVF, unsigned
MaxVF)`.
5. The class `IntrinsicCostAttribute`: this class still uses `unsigned
VF` as updating the field to use `ElementCount` woudl require changes
that could result in changing the behavior of the compiler. Will be done
in a separate patch.
7. When dealing with user input for forcing the vectorization
factor. In this case, adding support for scalable vectorization is a
functional change that migh require changes at command line.
Differential Revision: https://reviews.llvm.org/D85794
This patch adds support for representing Fortran `character(n)`.
Primarily patch is based out of D54114 with appropriate modifications.
Test case IR is generated using our downstream classic-flang. We're in process
of upstreaming flang PR's but classic-flang has dependencies on llvm, so
this has to get in first.
Patch includes functional test case for both IR and corresponding
dwarf, furthermore it has been manually tested as well using GDB.
Source snippet:
```
program assumedLength
call sub('Hello')
call sub('Goodbye')
contains
subroutine sub(string)
implicit none
character(len=*), intent(in) :: string
print *, string
end subroutine sub
end program assumedLength
```
GDB:
```
(gdb) ptype string
type = character (5)
(gdb) p string
$1 = 'Hello'
```
Reviewed By: aprantl, schweitz
Differential Revision: https://reviews.llvm.org/D86305
Extend the `applyUpdates` in DominatorTree to allow a post CFG view,
different from the current CFG.
This patch implements the functionality of updating an already up to
date DT, to the desired PostCFGView.
Combining a set of updates towards an up to date DT and a PostCFGView is
not yet supported.
Differential Revision: https://reviews.llvm.org/D85472
If some of gc live value are not used in gc.relocate we can remove them
from gc-live bundle of statepoint instruction.
Also the CL removes duplicated Values in gc-live bundle.
Reviewers: reames, dantrushin
Reviewed By: dantrushin
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D85959
Currently ConstantExpr::getWithOperands does not handle FNeg and
subsequently treats FNeg as binary operator, leading to an assertion
failure or segmentation fault if built without assertions.
Originally I reproduced this with llvm-dis on a bitcode file, which I
unfortunately cannot share and also cannot really reduce.
But PR45426 describes the same issue and has a reproducer with Clang, so
I'll go with that.
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D86274
Both AfterPass and AfterPassInvalidated pass instrumentation
callbacks get additional parameter of type PreservedAnalyses.
This patch was created by @fedor.sergeev. I have just slightly
changed it.
Reviewers: fedor.sergeev
Differential Revision: https://reviews.llvm.org/D81555
We don't need a std::string for a literal string, we can use a
StringRef.
The addition of StringRefs produces a Twine that we can just call
str() without converting to a SmallString ourselves. Twine will
do that internally.
Existing implementation always aborts on syntax errors in a DataLayout
description. While this is meaningful for consuming textual IR modules, it is
inconvenient for users that may need fine-grained control over the layout from,
e.g., command-line options. Propagate errors through the parsing functions and
only abort in the top-level parsing function instead.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D85650
This code becomes dead for valid IR after 48f4312 and a96fc46. The reason for the test change is that the verifier reports the first verification error encountered, in some non-specified visit order. By removing the verification code in gc.relocates for a statepoint with inline gc operands, I change the error the verifier reports. And in one case, the checked for error is no longer possible with the bundle representation, so I simply delete the file.
The "gc-live" operand bundles were recently added, and all tests have been updated to use that format. A migration period was provided, though it's worth noting these intrinsics are experimental, so formally there is no compatibile requirement.
This is an extension to a96fc46. "gc-live" hadn't been implemented at the point that patch was initially posted.
(Forgot to land this a couple of weeks back.)
In a recent series of changes, I've introduced support for using the respective operand bundle kinds on the statepoint. At the moment, code supports either/or, but there's no need to keep the old support around. For the moment, I am simply changing the specification and verifier to require zero length argument sets in the intrinsic.
The intrinsic itself is experimental. Given that, there's no forward serialization needed. The in tree uses and generation have already been updated to use the new operand bundle based forms, the only folks broken by the change will be those with frontends generating statepoints directly and the updates should be easy.
Why not go ahead and just remove the arguments entirely? Well, I plan to. But while working on this I've found that almost all of the arguments to the statepoint can be expressed via operand bundles or attributes. Given that, I'm planning a radical simplification of the arguments and figured I'd do one update not several small ones.
Differential Revision: https://reviews.llvm.org/D80892
This avoid GUID lookup in Index.findSummaryInModule.
Follow up for D81242.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D85269
This patch restricts the behaviour of referencing via .Lfoo$local
local aliases, introduced in https://reviews.llvm.org/D73230, to
STV_DEFAULT globals only.
Hidden symbols via --fvisiblity=hidden (https://gcc.gnu.org/wiki/Visibility)
is an important scenario.
Benefits:
- Improves the size of object files by using fewer STT_SECTION symbols.
- The code reads a bit better (it was not obvious to me without going
back to the code reviews why the canBenefitFromLocalAlias function
currently doesn't consider visibility).
- There is also a side benefit in restoring the effectiveness of the
--wrap linker option and making the behavior of --wrap consistent
between LTO and normal builds for references within a translation-unit.
Note: this --wrap behavior (which is specific to LLD) should not be
considered reliable. See comments on https://reviews.llvm.org/D73230
for more.
Differential Revision: https://reviews.llvm.org/D85782
Previously ConstantFoldExtractElementInstruction() would only work with
insertelement instructions, not contants. This properly handles
insertelement constants as well.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D85865
This recommits the following patches now that D85684 has landed
1cf6f210a2 [IR] Disable select ? C : undef -> C fold in ConstantFoldSelectInstruction unless we know C isn't poison.
469da663f2 [InstSimplify] Re-enable select ?, undef, X -> X transform when X is provably not poison
122b0640fc [InstSimplify] Don't fold vectors of partial undef in SimplifySelectInst if the non-undef element value might produce poison
ac0af12ed2 [InstSimplify] Add test cases for opportunities to fold select ?, X, undef -> X when we can prove X isn't poison
9b1e95329a [InstSimplify] Remove select ?, undef, X -> X and select ?, X, undef -> X transforms
Introduce a helper on Instruction which can be used to update the debug
location after hoisting.
Use this in GVN and LICM, where we were mistakenly introducing new line
0 locations after hoisting (the docs recommend dropping the location in
this case).
For more context, see the discussion in https://reviews.llvm.org/D60913.
Differential Revision: https://reviews.llvm.org/D85670
When we use mask compare intrinsics under strict FP option, the masked
elements shouldn't raise any exception. So, we cann't replace the
intrinsic with a full compare + "and" operation.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D85385
Replace the `ident_t` handling in Clang with the methods offered by the
OMPIRBuilder. This cuts down on the clang code as well as the
differences between the two, making further transitions easier. Tests
have changed but there should not be a real functional change. The most
interesting difference is probably that we stop generating local ident_t
allocations for now and just use globals. Given that this happens only
with debug info, the location part of the `ident_t` is probably bigger
than the test anyway. As the location part is already a global, we can
avoid the allocation, memcpy, and store in favor of a constant global
that is slightly bigger. This can be revisited if there are
complications.
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D80735
The history of dropTriviallyDeadConstantArrays is like this. Because the appending linkage uses too much memory (http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20150105/251381.html), dropTriviallyDeadConstantArrays was introduced (https://reviews.llvm.org/rG81f385b0c6ea37dd7195a65be162c75bbdef29d2) to release unused constant arrays. Recently, dropTriviallyDeadConstantArrays was improved (https://reviews.llvm.org/rG81f385b0c6ea37dd7195a65be162c75bbdef29d2) to reduce its quadratic cost.
Our recent LTO profiling shows that when a target is large, 15-20% of time cost is from the SetVector::insert called by dropTriviallyDeadConstantArrays.
A large application has hundreds or thousands of modules; each module calls dropTriviallyDeadConstantArrays once for cleaning up tens of thousands of ConstantArrays a module has. In those ConstantArrays, usually around 5 can be deleted; a very very few deleted ConstantArrays reference other ConstantArrays: less than 10 out of millions.
Given this, the cost of SetVector::insert is mainly from the construction of WorkList from ArrayConstants. This motivated the fix that iterates ArrayConstants directly, and uses WorkList only when necessary.
Our evaluation shows that
1) The cumulative time percentage of dropTriviallyDeadConstantArrays is reduced from 15-17% to 4-6%.
2) For targets with LTO time > 20min, the time reduction is about 20%.
3) No observable performance impact for build without using LTO.
{F12506218}
{F12506221}
Reviewed By: mehdi_amini, tejohnson, jdoerfert
Differential Revision: https://reviews.llvm.org/D85379
As noted on PR46885, the number of mask elements should always be a power of 2, so to fix the static analyzer warning we are better off replacing the condition to <= 4, and I've added a pow2 assertion as well.
`DenseMapAPIntKeyInfo` is now located in `lib/IR/LLVMContextImpl.h`.
Moved it into `include/ADT/DenseMapInfo.h` to use it.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D85131
As discussed on D81500, this adds a more general ElementCount variant of the build helper and converts the (non-scalable) unsigned NumElts variant to use it internally.
This is a split patch of D80991.
This patch introduces AAPotentialValues and its interface only.
For more detail of AAPotentialValues abstract attribute, see the original patch.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D83283
I found that propagateAttributes was ~23% of a thin link's run time
(almost 4x higher than the second hottest function). The main reason is
that it re-examines a global var each time it is referenced. This
becomes unnecessary once it is marked both non read only and non write
only. I added a set to avoid doing redundant work, which dropped the
runtime of that thin link by almost 15%.
I made a smaller efficiency improvement (no measurable impact) to skip
all summaries for a VI if the first copy is dead. I added an assert to
ensure that all copies are dead if any is. The code in
computeDeadSymbols marks all summaries for a VI as live. There is one
corner case where it was skipping marking an alias as live, that I
fixed. However, since the code earlier marked all copies of a preserved
GUID's VI as live, and each 'visit' marks all copies live, the only case
where this could make a difference is summaries that were marked live
when they were built initially, and that is only a few special compiler
generated symbols and inline assembly symbols, so it likely is never
provoked in practice.
Differential Revision: https://reviews.llvm.org/D84985
Pass the abs poison flag to the underlying ConstantRange
implementation, allowing CVP to simplify based on it.
Importantly, this recognizes that abs with poison flag is actually
non-negative...
Problem:
Right now, our "Running pass" is not accurate when passes are wrapped in adaptor because adaptor is never skipped and a pass could be skipped. The other problem is that "Running pass" for a adaptor is before any "Running pass" of passes/analyses it depends on. (for example, FunctionToLoopPassAdaptor). So the order of printing is not the actual order.
Solution:
Doing things like PassManager::Debuglogging is very intrusive because we need to specify Debuglogging whenever adaptor is created. (Actually, right now we're not specifying Debuglogging for some sub-PassManagers. Check PassBuilder)
This patch move debug logging for pass as a PassInstrument callback. We could be sure that all running passes are logged and in the correct order.
This could also be used to implement hierarchy pass logging in legacy PM. We could also move logging of pass manager to this if we want.
The test fixes looks messy. It includes changes:
- Remove PassInstrumentationAnalysis
- Remove PassAdaptor
- If a PassAdaptor is for a real pass, the pass is added
- Pass reorder (to the correct order), related to PassAdaptor
- Add missing passes (due to Debuglogging not passed down)
Reviewed By: asbirlea, aeubanks
Differential Revision: https://reviews.llvm.org/D84774
This adds a common API for compute constant ranges of intrinsics.
The intention here is that
a) we can reuse the same code across different passes that handle
constant ranges, i.e. this can be reused in SCCP
b) we only have to add knowledge about supported intrinsics to
ConstantRange, not any consumers.
Differential Revision: https://reviews.llvm.org/D84587
All analysis are preserved if there's no local change, and thanks to
3667d87a33 this property is enforced for all
passes.
Skipping the dependency computation improves the performance when there's a lot
of small functions, where only a few change happen.
Thanks to Nikita Popov who provided this numbers (extract below)
https://llvm-compile-time-tracker.com/compare.php?from=183342c0a9850e60dd7a004b651c83dfb3a7d25e&to=f2f91e6a2743070471cc9471e4e8c646e50c653c&stat=instructions
O3: (number of instructions)
Benchmark Old New
kimwitu++ 60783M 59968M (-1.34%)
sqlite3 73200M 73083M (-0.16%)
consumer-typeset 52776M 52712M (-0.12%)
Bullet 133709M 132940M (-0.58%)
tramp3d-v4 123864M 123186M (-0.55%)
mafft 55534M 55477M (-0.10%)
ClamAV 76292M 76164M (-0.17%)
lencod 103190M 103061M (-0.13%)
SPASS 64068M 63713M (-0.55%)
7zip 197332M 196308M (-0.52%)
geomean 85750M 85389M (-0.42%)
Differential Revision: https://reviews.llvm.org/D80707
This is the part of the patch that's moving the Updates to a CFGDiff
object. Splitting off from the clean-up work merging the two branches when BUI is null.
Differential Revision: https://reviews.llvm.org/D77341
This is the first of two patches to address PR46753. We basically allow
mem2reg to promote allocas that are used in doppable instructions, for
now that means `llvm.assume`. The uses of the alloca (or a bitcast or
zero offset GEP from there) are replaced by `undef` in the droppable
instructions.
Reviewed By: Tyker
Differential Revision: https://reviews.llvm.org/D83976
PassManager.h is one of the top headers in the ClangBuildAnalyzer frontend worst offenders list.
This exposes a large number of implicit dependencies on various forward declarations/includes in other headers that need addressing.
For a long time, the InstCombine pass handled target specific
intrinsics. Having target specific code in general passes was noted as
an area for improvement for a long time.
D81728 moves most target specific code out of the InstCombine pass.
Applying the target specific combinations in an extra pass would
probably result in inferior optimizations compared to the current
fixed-point iteration, therefore the InstCombine pass resorts to newly
introduced functions in the TargetTransformInfo when it encounters
unknown intrinsics.
The patch should not have any effect on generated code (under the
assumption that code never uses intrinsics from a foreign target).
This introduces three new functions:
TargetTransformInfo::instCombineIntrinsic
TargetTransformInfo::simplifyDemandedUseBitsIntrinsic
TargetTransformInfo::simplifyDemandedVectorEltsIntrinsic
A few target specific parts are left in the InstCombine folder, where
it makes sense to share code. The largest left-over part in
InstCombineCalls.cpp is the code shared between arm and aarch64.
This allows to move about 3000 lines out from InstCombine to the targets.
Differential Revision: https://reviews.llvm.org/D81728
The new implementation makes it clear that there are exactly two
conditional stores (after the initial no-op optimization). By contrast
the old implementation had seven conditionals, some hidden inside other
functions.
This commit can change the order of operands in operand lists, hence the
tweak to one test case.
Differential Revision: https://reviews.llvm.org/D80116
Summary:
This support is needed for the Fortran array variables with pointer/allocatable
attribute. This support enables debugger to identify the status of variable
whether that is currently allocated/associated.
for pointer array (before allocation/association)
without DW_AT_associated
(gdb) pt ptr
type = integer (140737345375288:140737354129776)
(gdb) p ptr
value requires 35017956 bytes, which is more than max-value-size
with DW_AT_associated
(gdb) pt ptr
type = integer (:)
(gdb) p ptr
$1 = <not associated>
for allocatable array (before allocation)
without DW_AT_allocated
(gdb) pt arr
type = integer (140737345375288:140737354129776)
(gdb) p arr
value requires 35017956 bytes, which is more than max-value-size
with DW_AT_allocated
(gdb) pt arr
type = integer, allocatable (:)
(gdb) p arr
$1 = <not allocated>
Testing
- unit test cases added
- check-llvm
- check-debuginfo
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D83544
This allows tracking the in-memory type of a pointer argument to a
function for ABI purposes. This is essentially a stripped down version
of byval to remove some of the stack-copy implications in its
definition.
This includes the base IR changes, and some tests for places where it
should be treated similarly to byval. Codegen support will be in a
future patch.
My original attempt at solving some of these problems was to repurpose
byval with a different address space from the stack. However, it is
technically permitted for the callee to introduce a write to the
argument, although nothing does this in reality. There is also talk of
removing and replacing the byval attribute, so a new attribute would
need to take its place anyway.
This is intended avoid some optimization issues with the current
handling of aggregate arguments, as well as fixes inflexibilty in how
frontends can specify the kernel ABI. The most honest representation
of the amdgpu_kernel convention is to expose all kernel arguments as
loads from constant memory. Today, these are raw, SSA Argument values
and codegen is responsible for turning these into loads.
Background:
There currently isn't a satisfactory way to represent how arguments
for the amdgpu_kernel calling convention are passed. In reality,
arguments are passed in a single, flat, constant memory buffer
implicitly passed to the function. It is also illegal to call this
function in the IR, and this is only ever invoked by a driver of some
kind.
It does not make sense to have a stack passed parameter in this
context as is implied by byval. It is never valid to write to the
kernel arguments, as this would corrupt the inputs seen by other
dispatches of the kernel. These argumets are also not in the same
address space as the stack, so a copy is needed to an alloca. From a
source C-like language, the kernel parameters are invisible.
Semantically, a copy is always required from the constant argument
memory to a mutable variable.
The current clang calling convention lowering emits raw values,
including aggregates into the function argument list, since using
byval would not make sense. This has some unfortunate consequences for
the optimizer. In the aggregate case, we end up with an aggregate
store to alloca, which both SROA and instcombine turn into a store of
each aggregate field. The optimizer never pieces this back together to
see that this is really just a copy from constant memory, so we end up
stuck with expensive stack usage.
This also means the backend dictates the alignment of arguments, and
arbitrarily picks the LLVM IR ABI type alignment. By allowing an
explicit alignment, frontends can make better decisions. For example,
there's real no advantage to an aligment higher than 4, so a frontend
could choose to compact the argument layout. Similarly, there is a
high penalty to using an alignment lower than 4, so a frontend could
opt into more padding for small arguments.
Another design consideration is when it is appropriate to expose the
fact that these arguments are all really passed in adjacent
memory. Currently we have a late IR optimization pass in codegen to
rewrite the kernel argument values into explicit loads to enable
vectorization. In most programs, unrelated argument loads can be
merged together. However, exposing this property directly from the
frontend has some disadvantages. We still need a way to track the
original argument sizes and alignments to report to the driver. I find
using some side-channel, metadata mechanism to track this
unappealing. If the kernel arguments were exposed as a single buffer
to begin with, alias analysis would be unaware that the padding bits
betewen arguments are meaningless. Another family of problems is there
are still some gaps in replacing all of the available parameter
attributes with metadata equivalents once lowered to loads.
The immediate plan is to start using this new attribute to handle all
aggregate argumets for kernels. Long term, it makes sense to migrate
all kernel arguments, including scalars, to be passed indirectly in
the same manner.
Additional context is in D79744.
As requested by Andrew Kaylor, rewrite this code in a way that does
not warn on old MSVC versions.
Avoid the buggy constexpr warning by just not using constexpr and
removing the static_assert that depends on it.
When the byref attribute is added, there will need to be two similar
functions for the existing cases which have an associate value copy,
and byref which does not. Most, but not all of the existing uses will
use the existing version.
The associated size function added by D82679 also needs to
contextually differ, and will help eliminate a few places still
relying on pointee element types.
This reverts most of the following patches due to reports of miscompiles.
I've left the added test cases with comments updated to be FIXMEs.
1cf6f210a2 [IR] Disable select ? C : undef -> C fold in ConstantFoldSelectInstruction unless we know C isn't poison.
469da663f2 [InstSimplify] Re-enable select ?, undef, X -> X transform when X is provably not poison
122b0640fc [InstSimplify] Don't fold vectors of partial undef in SimplifySelectInst if the non-undef element value might produce poison
ac0af12ed2 [InstSimplify] Add test cases for opportunities to fold select ?, X, undef -> X when we can prove X isn't poison
9b1e95329a [InstSimplify] Remove select ?, undef, X -> X and select ?, X, undef -> X transforms
If no alignment is specified we try to find the datalayout by using the insert position to get the module so we can get the datalayout. But if those are null, then we deference a null pointer.
This patch adds asserts to make the failure a little more obvious than just seg faulting.
Differential Revision: https://reviews.llvm.org/D83829
Summary:
Ignore callback uses when adding a callback function
in the CallGraph. Callback functions are typically
created when outlining, e.g. for OpenMP, so they have
internal scope and linkage. They should not be added
to the ExternalCallingNode since they are only callable
by the specified caller function at creation time.
A CGSCC pass, such as OpenMPOpt, may need to update
the CallGraph by adding a new outlined callback function.
Without ignoring callback uses, adding breaks CGSCC
pass restrictions and results to a broken CallGraph.
Reviewers: jdoerfert
Subscribers: hiraditya, sstefan1, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D83370
This restores commit 80d0a137a5, and the
follow on fix in 873c0d0786, with a new
fix for test failures after a 2-stage clang bootstrap, and a more robust
fix for the Chromium build failure that an earlier version partially
fixed. See also discussion on D75201.
Reviewers: evgeny777
Subscribers: mehdi_amini, Prazek, hiraditya, steven_wu, dexonsmith, arphaman, davidxl, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D73242
This changes the matrix load/store intrinsic definitions to load/store from/to
a pointer, and not from/to a pointer to a vector, as discussed in D83477.
This also includes the recommit of "[Matrix] Tighten LangRef definitions and
Verifier checks" which adds improved language reference descriptions of the
matrix intrinsics and verifier checks.
Differential Revision: https://reviews.llvm.org/D83785
The approach is simple: if a pass reports that it's not modifying a
Function/Module, compute a loose hash of that Function/Module and compare it
with the original one. If we report no change but there's a hash change, then we
have an error.
This approach misses a lot of change but it's not super intrusive and can
detect most of the simple mistakes.
Differential Revision: https://reviews.llvm.org/D80916
Summary:
NOTE: There is a mailing list discussion on this: http://lists.llvm.org/pipermail/llvm-dev/2019-December/137632.html
Complemantary to the assumption outliner prototype in D71692, this patch
shows how we could simplify the code emitted for an alignemnt
assumption. The generated code is smaller, less fragile, and it makes it
easier to recognize the additional use as a "assumption use".
As mentioned in D71692 and on the mailing list, we could adopt this
scheme, and similar schemes for other patterns, without adopting the
assumption outlining.
Reviewers: hfinkel, xbolva00, lebedev.ri, nikic, rjmccall, spatel, jdoerfert, sstefan1
Reviewed By: jdoerfert
Subscribers: thopre, yamauchi, kuter, fhahn, merge_guards_bot, hiraditya, bollu, rkruppe, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D71739
This tightens the matrix intrinsic definitions in LLVM LangRef and adds
correspondings checks to the IR Verifier.
Differential Revision: https://reviews.llvm.org/D83477
PassInfoMixin should be used for all NPM passes, rater than a custom
`name()`.
This caused ambiguous references in LegacyPassManager.cpp, so had to
remove "using namespace llvm::legacy" and move some things around.
Reviewed By: ychen, asbirlea
Differential Revision: https://reviews.llvm.org/D83498
PassInfoMixin should be used for all NPM passes, rater than a custom
`name()`.
This caused ambiguous references in LegacyPassManager.cpp, so had to
remove "using namespace llvm::legacy" and move some things around.
The passes had to be moved to the llvm namespace, or else they would get
printed as "(anonymous namespace)::FooPass".
Reviewed By: ychen, asbirlea
Differential Revision: https://reviews.llvm.org/D83498
Summary:
Ignore callback uses when adding a callback function
in the CallGraph. Callback functions are typically
created when outlining, e.g. for OpenMP, so they have
internal scope and linkage. They should not be added
to the ExternalCallingNode since they are only callable
by the specified caller function at creation time.
A CGSCC pass, such as OpenMPOpt, may need to update
the CallGraph by adding a new outlined callback function.
Without ignoring callback uses, adding breaks CGSCC
pass restrictions and results to a broken CallGraph.
Reviewers: jdoerfert
Subscribers: hiraditya, sstefan1, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D83370
Since none of these users really care about the actual type, hide the
type under a new size-getting attribute to go along with
hasPassPointeeByValueAttr. This will work better for the future byref
attribute, which may end up only tracking the byte size and not the IR
type.
We currently have 3 parameter attributes that should carry the type
(technically inalloca does not yet). The APIs are somewhat awkward
since preallocated/inalloca piggyback on byval in some places, but in
others are treated as distinct attributes. Since these are all
mutually exclusive, we should probably just merge all the attribute
infrastructure treating these as totally distinct attributes.
The `noundef` attribute indicates an argument or return value which
may never have an undef value representation.
This patch allows LLVM to parse the attribute.
Differential Revision: https://reviews.llvm.org/D83412
This cleans up the stack allocated by a @llvm.call.preallocated.setup.
Should either call the teardown or the preallocated call to clean up the
stack. Calling both is UB.
Add LangRef.
Add verifier check that the token argument is a @llvm.call.preallocated.setup.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D83354
The approach is simple: if a pass reports that it's not modifying a
Function/Module, compute a loose hash of that Function/Module and compare it
with the original one. If we report no change but there's a hash change, then we
have an error.
This approach misses a lot of change but it's not super intrusive and can
detect most of the simple mistakes.
Differential Revision: https://reviews.llvm.org/D80916
Summary:
Make Constant::getSplatValue recognize scalable vector splats of the
form created by ConstantVector::getSplat. Add unit test to verify that
C == ConstantVector::getSplat(C)->getSplatValue() for fixed width and
scalable vector splats
Reviewers: efriedma, spatel, fpetrogalli, c-rhodes
Reviewed By: efriedma
Subscribers: sdesmalen, tschuett, hiraditya, rkruppe, psnobl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D82416
Summary:
It is reasonably common to want to clone some call with different bundles.
Let's actually provide an interface to do that.
Reviewers: chandlerc, jdoerfert, dblaikie, nickdesaulniers
Reviewed By: nickdesaulniers
Subscribers: llvm-commits, hiraditya
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D83248
Assume bundle can have more than one entry with the same name,
but at least AlignmentFromAssumptionsPass::extractAlignmentInfo() uses
getOperandBundle("align"), which internally assumes that it isn't the
case, and happily crashes otherwise.
Minimal reduced reproducer: run `opt -alignment-from-assumptions` on
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"
%0 = type { i64, %1*, i8*, i64, %2, i32, %3*, i8* }
%1 = type opaque
%2 = type { i8, i8, i16 }
%3 = type { i32, i32, i32, i32 }
; Function Attrs: nounwind
define i32 @f(%0* noalias nocapture readonly %arg, %0* noalias %arg1) local_unnamed_addr #0 {
bb:
call void @llvm.assume(i1 true) [ "align"(%0* %arg, i64 8), "align"(%0* %arg1, i64 8) ]
ret i32 0
}
; Function Attrs: nounwind willreturn
declare void @llvm.assume(i1) #1
attributes #0 = { nounwind "reciprocal-estimates"="none" }
attributes #1 = { nounwind willreturn }
This is what we'd have with -mllvm -enable-knowledge-retention
This reverts commit c95ffadb24.
Summary:
D80831 changed part of the prefix usage for AIX.
But there are other places getting prefix from DataLayout.
This patch intends to make prefix usage consistent on AIX.
Reviewed by: hubert.reinterpretcast, daltenty
Differential Revision: https://reviews.llvm.org/D81270
As per [MaxAlignmentExponent]{b7338fb1a6/llvm/include/llvm/IR/Value.h (L688)} alignment is not allowed to be more than 2^29.
Encoded as Log2, this means that storing alignment uses 5 bits.
This patch makes sure all instructions store their alignment in a consistent way, encoded as Log2 and using 5 bits.
Differential Revision: https://reviews.llvm.org/D83119
It's messy to pattern-match, and completely unnecessary: scalar indexes
work equally well.
See also discussion on D81620 and D82061.
Differential Revision: https://reviews.llvm.org/D82430
In most cases, this doesn't have much impact: the destructors just call
the base class destructor anyway. A few subclasses of ConstantExpr
actually store non-trivial data, though. Make sure we clean up
appropriately.
This is sort of ugly, but I don't see a good alternative given the
constraints.
Issue found by asan buildbots running the testcase for D80330.
Differential Revision: https://reviews.llvm.org/D82509
This is a followup on D78403.
I'm unsure about `getAtomicOpAlign` overloads that take `AtomicRMWInst` and `AtomicCmpXchgInst`, shouldn't `getAlign` provide the correct answer already?
Differential Revision: https://reviews.llvm.org/D81369
I noticed that for some benchmarks we spend quite a bit of time
inside AttributeList::hasAttrSomewhere(), mainly when checking
for the "returned" attribute. Most of the time the attribute will
not be present, in which case this function has to walk through
the whole attribute list and check for the attribute at each index.
This patch adds a cache of all "available somewhere" attributes
inside AttributeListImpl. This makes the structure 12 bytes larger,
but I don't think that's problematic, as attribute lists are uniqued.
Compile-time in terms of instructions retired improves by 0.4% on
average, but >1% for sqlite.
Differential Revision: https://reviews.llvm.org/D81867
When calling on-the-fly passes from the legacy pass manager, the modification
status is not reported, which is a problem in case we depend on an acutal
transformation pass, and not only analyse.
Update the Legacy PM API to optionally report the changed status, assert if a
change is detected but this change is lost.
Related to https://reviews.llvm.org/D80916
Differential Revision: https://reviews.llvm.org/D81236
Fixed an issue in DataLayout::getIntPtrType where we were assuming
the input type was always a fixed vector type, which isn't true.
Added a test that exposed the problem to:
Transforms/InstCombine/vector_gep1.ll
Differential Revision: https://reviews.llvm.org/D82294
This is cleaning up comments (mostly in the bitcode handling) about
removing some backward compatibility aspect in the 4.0 release.
Historically, "4.0" was used during the development of the 3.x
versions as "this future major breaking change version". At the time
the major number was used to indicate the compatibility. When we
reached 3.9 we decided to change the numbering, instead of going to
3.10 we went to 4.0 but after changing the meaning of the major
number to not mean anything anymore with respect to bitcode backward
compatibility.
The current policy
(https://llvm.org/docs/DeveloperPolicy.html#ir-backwards-compatibility)
indicates only now:
The current LLVM version supports loading any bitcode since version 3.0.
Differential Revision: https://reviews.llvm.org/D82514
Summary:
NOTE: There is a mailing list discussion on this: http://lists.llvm.org/pipermail/llvm-dev/2019-December/137632.html
Complemantary to the assumption outliner prototype in D71692, this patch
shows how we could simplify the code emitted for an alignemnt
assumption. The generated code is smaller, less fragile, and it makes it
easier to recognize the additional use as a "assumption use".
As mentioned in D71692 and on the mailing list, we could adopt this
scheme, and similar schemes for other patterns, without adopting the
assumption outlining.
Reviewers: hfinkel, xbolva00, lebedev.ri, nikic, rjmccall, spatel, jdoerfert, sstefan1
Reviewed By: jdoerfert
Subscribers: yamauchi, kuter, fhahn, merge_guards_bot, hiraditya, bollu, rkruppe, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D71739
This patch fixes a compiler crash that was hit when trying to simplify
the following code:
getelementptr [2 x i64], [2 x i64]* null, i64 0, <vscale x 2 x i64> zeroinitializer
For the case where we have a null pointer value like above, we just
need to ensure we don't assume the indices are always fixed width.
Differential Revision: https://reviews.llvm.org/D82183
This function is deceptive at best: it doesn't return what you'd expect.
If you have an arbitrary GlobalValue and you want to determine the
alignment of that pointer, Value::getPointerAlignment() returns the
correct value. If you want the actual declared alignment of a function
or variable, GlobalObject::getAlignment() returns that.
This patch switches all the users of GlobalValue::getAlignment to an
appropriate alternative.
Differential Revision: https://reviews.llvm.org/D80368
This has two advantages: one, it's simpler, and two, it doesn't require
heroic pattern matching with scalable vectors.
Also includes a small fix to DataLayout to allow the scalable vector
testcase to work correctly.
Differential Revision: https://reviews.llvm.org/D82061