This is a follow-up to https://reviews.llvm.org/D100387.
std::vector is not the best storage container here. My local benchmark (counting
the number of instruction when compiling the sqlite3 amalgamation) yields the
following:
- std::vector<BitVector> -> 5,860,885,896
- SmallVector<BitWord, 0> -> 5,858,991,997
- SmallVector<BitWord> -> 5,817,679,224
Differential Revision: https://reviews.llvm.org/D100744
It used to be that all of our intrinsics were call instructions, but over time, we've added more and more invokable intrinsics. According to the verifier, we're up to 8 right now. As IntrinsicInst is a sub-class of CallInst, this puts us in an awkward spot where the idiomatic means to check for intrinsic has a false negative if the intrinsic is invoked.
This change switches IntrinsicInst from being a sub-class of CallInst to being a subclass of CallBase. This allows invoked intrinsics to be instances of IntrinsicInst, at the cost of requiring a few more casts to CallInst in places where the intrinsic really is known to be a call, not an invoke.
After this lands and has baked for a couple days, planned cleanups:
Make GCStatepointInst a IntrinsicInst subclass.
Merge intrinsic handling in InstCombine and use idiomatic visitIntrinsicInst entry point for InstVisitor.
Do the same in SelectionDAG.
Do the same in FastISEL.
Differential Revision: https://reviews.llvm.org/D99976
Summary:
This patch registers OpenMPOpt as a Module pass in addition to a CGSCC
pass. This is so certain optimzations that are sensitive to intact
call-sites can happen before inlining. The old `openmpopt` pass name is
changed to `openmp-opt-cgscc` and `openmp-opt` calls the Module pass.
The current module pass only runs a single check but will be expanded in
the future.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D99202
apple-m1 has the same level of ISA support as apple-a14,
so this is a straightforward mechanical change. However, that
also means this inherits apple-a14's v8.5a+nobti quirkiness.
rdar://68287159
Make sure that the `CriticalMemoryInstruction` of a memory group is invalidated
if it references an already executed instruction. This avoids a potential
use-after-free if the critical memory info becomes stale, and the value is
read after the instruction has executed.
Previously we would use the type of the pointee to determine what to
cast the result of constant folding a load. To aid with opaque pointer
types, we should explicitly pass the type of the load rather than
looking at pointee types.
ConstantFoldLoadThroughBitcast() converts the const prop'd value to the
proper load type (e.g. [1 x i32] -> i32). Instead of calling this in
every intermediate step like bitcasts, we only call this when we
actually see the global initializer value.
In some existing uses of this API, we don't know the exact type we're
loading from immediately (e.g. first we visit a bitcast, then we visit
the load using the bitcast). In those cases we have to manually call
ConstantFoldLoadThroughBitcast() when simplifying the load to make sure
that we cast to the proper type.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D100718
This patch relaxes the requirement that the STEP_VECTOR step constant
must be of a type at least as large as the vector element type. This
does not permit its use on targets which have legal vector element types
larger than the largest legal scalar type, such as i64 vectors on RV32.
As such, the requirement has been loosened so that the step operand must
be any scalar type so long as the constant immediate is non-negative and
the value fits inside the vector element type.
This limits combining optimizations in certain circumstances but in
practice it's unlikely to be a hindrance.
Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D100660
Flipping the default value of SkipPseudoOp to true for those MIR APIs to favor maximum performance. Note that certain spots like branch folding and MIR if-conversion is are disabled for better counts quality. For these two optimizations, this is a no-diff change.
The counts quality with SPEC2017 before/after this change is unchanged.
Reviewed By: wmi
Differential Revision: https://reviews.llvm.org/D100332
When the ProcResGroup has BufferSize=0,
1. if there is a subunit in the list of write resources for the
scheduling class, do not attempt to schedule the ProcResGroup.
2. if there is not a subunit in the list of write resources for the
scheduling class, choose a subunit to use instead of the ProcResGroup.
3. having both the ProcResGroup and any of its subunits in the resources
implied by a InstRW is not supported.
Used to model parallel uses from a pool of resources.
Differential Revision: https://reviews.llvm.org/D98976
These constraints are machine agnostic; there's no reason to handle
these per-arch. If arches don't support these constraints, then they
will fail elsewhere during instruction selection. We don't need virtual
calls to look these up; TargetLowering::getInlineAsmMemConstraint should
only be overridden by architectures with additional unique memory
constraints.
Reviewed By: echristo, MaskRay
Differential Revision: https://reviews.llvm.org/D100416
It turns out we actually import a bunch of selection code for intrinsics. The
imported code checks that the register banks on the G_INTRINSIC instruction
are correct. If so, it goes ahead and selects it.
This adds code to AArch64RegisterBankInfo to allow us to correctly determine
register banks on intrinsics which have known register bank constraints.
For now, this only handles @llvm.aarch64.neon.uaddlv. This is necessary for
porting AArch64TargetLowering::LowerCTPOP.
Also add a utility for getting the intrinsic ID from a G_INTRINSIC instruction.
This seems a little nicer than having to know about how intrinsic instructions
are structured.
Differential Revision: https://reviews.llvm.org/D100398
Move <string> include to ImportedFunctionsInliningStatistics.cpp and add missing <memory> include as we have explicit uses of std::unique_ptr in the header.
protectMappedMemory no longer returns an error message, so we don't need std::string - I've fixed an unnecessary doxygen entry as well (oddly I wasn't seeing a Wdocumentation warning)
This patch corrects more instances of text files being opened as text.
Reviewed By: Jonathan.Crowther
Differential Revision: https://reviews.llvm.org/D100654
Move the findDbg* functions into lib/IR/DebugInfo.cpp from
lib/Transforms/Utils/Local.cpp.
D99169 adds a call to a function (findDbgUsers) that lives in
lib/Transforms/Utils/Local.cpp (LLVMTransformUtils) from lib/IR/Value.cpp
(LLVMCore). The Core lib doesn't include TransformUtils. The builtbots caught
this here: https://lab.llvm.org/buildbot/#/builders/109/builds/12664. This patch
moves the function, and the 3 similar ones for consistency, into DebugInfo.cpp
which is part of LLVMCore.
Reviewed By: dblaikie, rnk
Differential Revision: https://reviews.llvm.org/D100632
As being discussed in https://reviews.llvm.org/D100721,
this modelling is lossy, we can't reconstruct `ash`/`ashr exact`
from it, which means that whenever we actually expand the IR,
we've just pessimized the code..
It would be good to model this pattern, after all it comes up every time
you want to compute a distance between two pointers, but not at this cost.
This reverts commit ec54867df5.
This fixes https://reviews.llvm.org/D93990#2666922
by teaching `m_Undef` to match vectors/aggrs with poison elements.
As suggested, fixes in InstCombine files to use the `m_Undef` matcher instead
of `isa<UndefValue>` will be followed.
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D100122
At the moment, ReversePostOrderTraversal performs a post-order walk on
the entry node of the passed in graph, rather than the graph type
itself.
If GT::NodeRef is the same as GraphT, everything works as expected and
this is the case for the current uses in-tree. But it does not work as
expected if GraphT != GT::NodeRef. In that case, we either fail to build
(if there is no GraphTrait specialization for GT:NodeRef) or we pick the
GraphTrait specialization for GT::NodeRef, instead of the specialization
of GraphT.
Both the depth-first and post-order iterators pick the expected
specalization and this patch updates ReversePostOrderTraversal to
delegate to po_begin & po_end to pick the right specialization, rather
than forcing using GraphTraits<GT::NodeRef>, by first getting the entry
node.
This makes `ReversePostOrderTraversal<Graph<6>> RPOT(G);` build and
work as expected in the test.
Reviewed By: dexonsmith
Differential Revision: https://reviews.llvm.org/D100169
This patch updates a couple of functions that unnecessarily took the
input graph by value, when it was not needed. They can take the graph by
const-reference instead, which does not require GraphT to provide a copy
constructor.
Split off from D100169.
Such attributes can either be unset, or set to "true" or "false" (as string).
throughout the codebase, this led to inelegant checks ranging from
if (Fn->getFnAttribute("no-jump-tables").getValueAsString() == "true")
to
if (Fn->hasAttribute("no-jump-tables") && Fn->getFnAttribute("no-jump-tables").getValueAsString() == "true")
Introduce a getValueAsBool that normalize the check, with the following
behavior:
no attributes or attribute set to "false" => return false
attribute set to "true" => return true
Differential Revision: https://reviews.llvm.org/D99299
Instead of managing memory by hand, delegate it to std::vector. This makes the
code much simpler, and also avoids repeatedly computing the storage size.
According to valgrind --tool=callgrind, this also slightly decreases the
instruction count, but by a small margin.
This is a recommit of 82f0e3d3ea with one usage
fixed in llvm/lib/CodeGen/RegisterScavenging.cpp.
Not the slight API change: BitVector::clear() now has the same behavior as any
other container: it does not free memory, but indeed sets the size of the
BitVector to 0. It is thus incorrect to access its content right afterwards, a
scenario which wasn't enforced in previous implementation.
Differential Revision: https://reviews.llvm.org/D100387
Use the target-independent @llvm.fptosi and @llvm.fptoui intrinsics instead.
This includes removing the instrinsics for i32x4.trunc_sat_zero_f64x2_{s,u},
which are now represented in IR as a saturating truncation to a v2i32 followed by
a concatenation with a zero vector.
Differential Revision: https://reviews.llvm.org/D100596
Attributes don't know their parent Context, adding this would make Attribute larger. Instead, we add hasParentContext that answers whether this Attribute belongs to a particular LLVMContext by checking for itself inside the context's FoldingSet. Same with AttributeSet and AttributeList. The Verifier checks them with the Module context.
Differential Revision: https://reviews.llvm.org/D99362
The include is for std::swap(), but that's in <utility> in C++11,
and Hashing.h already includes that.
Differential Revision: https://reviews.llvm.org/D100657
MathExtras.h is indirectly included in over 98% of LLVM's
translation units. It currently expands to over 1MB of stuff,
over which far more than half is due to <algorithm>. Since not
using <algorithm> is slightly less code, do that.
No behavior change.
Differential Revision: https://reviews.llvm.org/D100656
The implementation supports static schedule for Fortran do loops. This
implements the dynamic variant of the same concept.
Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D97393
On Windows, we want to open a file in Binary mode if OF_CRLF bit is not set. On z/OS, we want to open a file in Binary mode if the OF_Text bit is not set.
This patch creates two new functions called ChangeStdinMode and ChangeStdoutMode which will take OpenFlags as an arg to determine which mode to set stdin and stdout to. This will enable patches like https://reviews.llvm.org/D100056 to not affect Windows when setting the OF_Text flag for raw_fd_streams.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D100130
These were misleading, they're more of a "clear" than an "invalidate".
We shouldn't be individually clearing analysis results. Either we clear
all analyses when some IR becomes invalid, or we properly go through
invalidation.
There was only one use of this, which can be simulated with
AM.invalidate(F, PA).
Reviewed By: mtrofin
Differential Revision: https://reviews.llvm.org/D100519