There's hardly any functionality change here. Instead of calling
materializeMetadata on the first call to materialize(GlobalValue*), wait
until the first one that's actually going to do something. Noticed by
inspection; I don't have a concrete case where this makes a difference.
Added an assertion in materializeMetadata to be sure this (or a future
change) doesn't delay materializeMetadata after function-level metadata.
llvm-svn: 267345
Summary:
Remove the GlobalValueInfo and change the ModuleSummaryIndex to directly
reference summary objects. The info structure was there to support lazy
parsing of the combined index summary objects, which is no longer
needed and not supported.
Reviewers: joker.eph
Subscribers: joker.eph, llvm-commits
Differential Revision: http://reviews.llvm.org/D19462
llvm-svn: 267344
Reused the ability to split constants of a type wider than the shuffle mask to work with masks generated from scalar constants transfered to xmm.
This fixes an issue preventing PSHUFB target shuffle masks decoding rematerialized scalar constants and also exposes the XOP VPPERM bug described in PR27472.
llvm-svn: 267343
This fixes PR22248 on s390x. The previous attempt at this was D19101,
which was before LOAD_STACK_GUARD existed. Compared to the previous
version, this always emits a rather ugly block of 4 instructions, involving
a thread pointer load that can't be shared with other potential users.
However, this is necessary for SSP - spilling the guard value (or thread
pointer used to load it) is counter to the goal, since it could be
overwritten along with the frame it protects.
Differential Revision: http://reviews.llvm.org/D19363
llvm-svn: 267340
Add tests for some missing cases to bitcode upgrade in r267296.
- DICompositeType with an 'elements:' field, which will cause it to be
involved in a cycle after the upgrade.
- A DIDerivedType that references a class in 'extraData:'.
I updated test/Bitcode/dityperefs-3.8.ll with the missing cases and
regenerated test/Bitcode/dityperefs-3.8.ll.bc.
llvm-svn: 267332
The original patch caused crashes because it could derefence a null pointer
for SelectionDAGTargetInfo for targets that do not define it.
Evaluates fmul+fadd -> fmadd combines and similar code sequences in the
machine combiner. It adds support for float and double similar to the existing
integer implementation. The key features are:
- DAGCombiner checks whether it should combine greedily or let the machine
combiner do the evaluation. This is only supported on ARM64.
- It gives preference to throughput over latency: the heuristic used is
to combine always in loops. The targets decides whether the machine
combiner should optimize for throughput or latency.
- Supports for fmadd, f(n)msub, fmla, fmls patterns
- On by default at O3 ffast-math
llvm-svn: 267328
in a debug-info-bearing function has a debug location attached to it. Failure to
do so causes an "!dbg attachment points at wrong subprogram for function"
assertion failure when the inliner sets up inline scope info.
rdar://problem/25878916
llvm-svn: 267320
Right now it only contains the LinkageType, but will be extended
with "hasSection", "isOptSize", "hasInlineAssembly", etc.
Differential Revision: http://reviews.llvm.org/D19404
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 267319
Keeping as much as possible internal/private is
known to help the optimizer. Let's try to benefit from
this in ThinLTO.
Note: this is early work, but is enough to build clang (and
all the LLVM tools). I still need to write some lit-tests...
Differential Revision: http://reviews.llvm.org/D19103
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 267317
The option to control the emission of the new relocations
is -relax-relocations (blatantly copied from GNU as).
It can't be enabled by default because it breaks relatively
recent versions of ld.bfd/ld.gold (late 2015).
llvm-svn: 267307
Summary:
As discussed in D18298, some local globals can't
be renamed/promoted (because they have a section, or because
they are referenced from inline assembly).
To be able to detect naming collision, we need to keep around
the "GUID" using their original name without taking the linkage
into account.
Reviewers: tejohnson
Subscribers: joker.eph, llvm-commits
Differential Revision: http://reviews.llvm.org/D19454
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 267304
Summary:
We are always importing the initializer for a GlobalVariable.
So if a GlobalVariable is in the export-list, we pull in any
refs as well.
Reviewers: tejohnson
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D19102
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 267303
A long overdue change to make DIGlobalVariable distinct. Much like
DISubprogram definitions (changed in r246098), it isn't logical to
unique DIGlobalVariable definitions from two different compile units.
(Longer-term, we should also find a way to reverse the link between
GlobalVariable and DIGlobalVariable, and between DIGlobalVariable and
DICompileUnit, so that debug info to do with optimized-out globals
disappears. Admittedly it's harder than with Function/DISubprogram,
since global variables may be constant-folded and the debug info should
still describe that somehow.)
llvm-svn: 267301
Eliminate DITypeIdentifierMap and make DITypeRef a thin wrapper around
DIType*. It is no longer legal to refer to a DICompositeType by its
'identifier:', and DIBuilder no longer retains all types with an
'identifier:' automatically.
Aside from the bitcode upgrade, this is mainly removing logic to resolve
an MDString-based reference to an actualy DIType. The commits leading
up to this have made the implicit type map in DICompileUnit's
'retainedTypes:' field superfluous.
This does not remove DITypeRef, DIScopeRef, DINodeRef, and
DITypeRefArray, or stop using them in DI-related metadata. Although as
of this commit they aren't serving a useful purpose, there are patchces
under review to reuse them for CodeView support.
The tests in LLVM were updated with deref-typerefs.sh, which is attached
to the thread "[RFC] Lazy-loading of debug info metadata":
http://lists.llvm.org/pipermail/llvm-dev/2016-April/098318.html
llvm-svn: 267296
Since forward references for uniqued node operands are expensive (and
those for distinct node operands are cheap due to
DistinctMDOperandPlaceholder), minimize forward references in uniqued
node operands.
Moreover, guarantee that when a cycle is broken by a distinct node, none
of the uniqued nodes have any forward references. In
ValueEnumerator::EnumerateMetadata, enumerate uniqued node subgraphs
first, delaying distinct nodes until all uniqued nodes have been
handled. This guarantees that uniqued nodes only have forward
references when there is a uniquing cycle (since r267276 changed
ValueEnumerator::organizeMetadata to partition distinct nodes in front
of uniqued nodes as a post-pass).
Note that a single uniqued subgraph can hit multiple distinct nodes at
its leaves. Ideally these would themselves be emitted in post-order,
but this commit doesn't attempt that; I think it requires an extra pass
through the edges, which I'm not convinced is worth it (since
DistinctMDOperandPlaceholder makes forward references quite cheap
between distinct nodes).
I've added two testcases:
- test/Bitcode/mdnodes-distinct-in-post-order.ll is just like
test/Bitcode/mdnodes-in-post-order.ll, except with distinct nodes
instead of uniqued ones. This confirms that, in the absence of
uniqued nodes, distinct nodes are still emitted in post-order.
- test/Bitcode/mdnodes-distinct-nodes-break-cycles.ll is the minimal
example where a naive post-order traversal would cause one uniqued
node to forward-reference another. IOW, it's the motivating test.
llvm-svn: 267278
When an operand of a distinct node hasn't been read yet, the reader can
use a DistinctMDOperandPlaceholder. This is much cheaper than forward
referencing from a uniqued node. Change
ValueEnumerator::organizeMetadata to partition distinct nodes and
uniqued nodes to reduce the overhead of cycles broken by distinct nodes.
Mehdi measured this for me; this removes most of the RAUW from the
importing step of -flto=thin, even after a WIP patch that removes
string-based DITypeRefs (introducing many more cycles to the metadata
graph).
llvm-svn: 267276
Summary:
As discussed in on the mailing list yesterday, I have refactored
BitcodeWriter.cpp to use classes to manage the bitcode writing process,
instead of passing around long lists of parameters between static
functions. See:
http://lists.llvm.org/pipermail/llvm-dev/2016-April/098610.html
I created a parent BitcodeWriter class to own the BitstreamWriter,
write the header, and contain the main entry point into the writing
process. There are two derived classes, one for writing a module and one
for writing a combined index file (for ThinLTO), which manage the
writing process specific to those bitcode file types.
I also changed the functions to conform to LLVM coding standards
(lowercase function name first letter). The only two routines that still
start with an uppercase letter are the two external interfaces, which
can be fixed as a follow-on (I wanted to keep this round just within
BitcodeWriter.cpp).
Reviewers: dexonsmith, joker.eph
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D19447
llvm-svn: 267273
Mehdi's pattern recognition pulled this one out. This is cleaner with
std::find_if than with the strange helper function that took an iterator
by reference and updated it.
llvm-svn: 267271
Each reference to an unresolved MDNode is expensive, since the RAUW
support in MDNode uses a separate allocation and side map. Since
a distinct MDNode doesn't require its operands on creation (unlike
uniuqed nodes, there's no need to check for structural equivalence),
use nullptr for any of its unresolved operands. Besides reducing the
burden on MDNode maps, this can avoid allocating temporary MDNodes in
the first place.
We need some way to track operands. Invent DistinctMDOperandPlaceholder
for this purpose, which is a Metadata subclass that holds an ID and
points at its single user. DistinctMDOperandPlaceholder::replaceUseWith
is just like RAUW, but its name highlights that there is only ever
exactly one use.
There is no support for moving (or, obviously, copying) these. Move
support would be possible but expensive; leaving it unimplemented
prevents user error. In the BitcodeReader I originally considered
allocating on a BumpPtrAllocator and keeping a vector of pointers to
them, and then I realized that std::deque implements exactly this.
A couple of obvious follow-ups:
- Change ValueEnumerator to emit distinct nodes first to take more
advantage of this optimization. (How convenient... I think I might
have a couple of patches for this.)
- Change DIBuilder and its consumers (like CGDebugInfo in clang) to
use something like this when constructing debug info in the first
place.
llvm-svn: 267270
Consistently use the IsDistinct variable and start relying on it in
GET_OR_DISTINCT. This change has NFC, but prepares for using IsDistinct
to optimize the behaviour of the getMD() and getMDOrNull() helpers.
llvm-svn: 267268
The only functionality change was removing an error check from the
BitcodeReader (and an assertion from DILocation::getImpl) that is
already caught by Verifier::visitDILocation. The Verifier is a better
place for this anyway, and being inconsistent with other subclasses of
MDNode isn't serving anyone.
llvm-svn: 267267
Only one consumer (llvm-objdump) actually cared about the fact that there were
two triples. Others were actively working around the fact that the Triple
returned by getArch might have been invalid. As for llvm-objdump, it needs to
be acutely aware of both Triples anyway, so being generic in the exposed API is
no benefit.
Also rename the version of getArch returning a Triple. Users were having to
pass an unwanted nullptr to disambiguate the two, which was nasty.
The only functional change here is that armv7m and armv7em object files no
longer crash llvm-objdump.
llvm-svn: 267249
The dwo_name was added to dwo files to improve diagnostics in dwp, but
it confuses tools that attempt to load any dwo named by a dwo_name, even
ones inside dwos. Avoid this by keeping track of whether a unit is
already a dwo unit, and if so, not loading further dwos.
llvm-svn: 267241
The original commit was reverted because of a buildbot problem with LazyCallGraph::SCC handling (not related to the OptBisect handling).
Differential Revision: http://reviews.llvm.org/D19172
llvm-svn: 267231
Rather than relying on the gmlt-like data emitted into the .o/executable
which only contains the simple name of any inlined functions, use the
.dwo file if present.
Test symbolication with/without a .dwo, and the old test that was
testing behavior when no gmlt-like data was present. (I haven't included
a test of non-gmlt-like data + no .dwo (that would be akin to
symbolication with no debug info) but we could add one for completeness)
The test was simplified a bit to be a little clearer (unoptimized, force
inline, using a function call as the inlined entity) and regenerated
with ToT clang. For the no-gmlt-like-data case, I modified Clang back to
its old behavior temporarily & the .dwo file is identical so it is
shared between the two executables.
llvm-svn: 267227
Summary: The clang assembler assumes that the discriminator remains the same when there is source line change. The correct behavior is that when there is line change, discriminator will automatically reset to 0.
Reviewers: dnovillo, davidxl, echristo
Subscribers: echristo, llvm-commits
Differential Revision: http://reviews.llvm.org/D19436
llvm-svn: 267226
This intrinsic takes two arguments, ``%ptr`` and ``%offset``. It loads
a 32-bit value from the address ``%ptr + %offset``, adds ``%ptr`` to that
value and returns it. The constant folder specifically recognizes the form of
this intrinsic and the constant initializers it may load from; if a loaded
constant initializer is known to have the form ``i32 trunc(x - %ptr)``,
the intrinsic call is folded to ``x``.
LLVM provides that the calculation of such a constant initializer will
not overflow at link time under the medium code model if ``x`` is an
``unnamed_addr`` function. However, it does not provide this guarantee for
a constant initializer folded into a function body. This intrinsic can be
used to avoid the possibility of overflows when loading from such a constant.
Differential Revision: http://reviews.llvm.org/D18367
llvm-svn: 267223
This patch changes the interface for createPGOFuncNameMetadata() where we add
another PGOFuncName argument.
Differential Revision: http://reviews.llvm.org/D19433
llvm-svn: 267216
Summary:
We can fold compares to false when two distinct allocations within a
function are compared for equality.
Patch by Anna Thomas!
Reviewers: majnemer, reames, sanjoy
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D19390
llvm-svn: 267214
The relative vtable ABI (PR26723) needs PLT relocations to refer to virtual
functions defined in other DSOs. The unnamed_addr attribute means that the
function's address is not significant, so we're allowed to substitute it
with the address of a PLT entry.
Also includes a bonus feature: addends for COFF image-relative references.
Differential Revision: http://reviews.llvm.org/D17938
llvm-svn: 267211
Extend the type canonicalization logic to work for unordered atomic loads and stores. Note that while this change itself is fairly simple and low risk, there's a reasonable chance this will expose problems in the backends by suddenly generating IR they wouldn't have seen before. Anything of this nature will be an existing bug in the backend (you could write an atomic float load), but this will definitely change the frequency with which such cases are encountered. If you see problems, feel free to revert this change, but please make sure you collect a test case.
llvm-svn: 267210
The opcode for the optimized branch does not depend on the size
of the activate bits in the AND masks, but the AND opcode itself.
Indeed, we need to use a X or W variant based on the AND variant
not based on whether the mask fits into the related variant.
Otherwise, we may end up using the W variant of the optimized branch
for 64-bit register inputs!
This fixes the last make check verifier issues for AArch64: PR27479.
llvm-svn: 267206
Summary: This change will shorten memset if the beginning of memset is overwritten by later stores.
Reviewers: hfinkel, eeckstein, dberlin, mcrosier
Subscribers: mgrang, mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D18906
llvm-svn: 267197
E.g. for:
!1 = {"llvm.distribute", i32 1}
it now returns the MDOperand for 1.
I will use this in LoopDistribution to check the value of the metadata.
Note that the change is backward-compatible with its current use in
LoopVersioningLICM. An Optional implicitly converts to a bool depending
whether it contains a value or not.
llvm-svn: 267190
Avoid quadratic complexity in unusually large basic blocks by limiting
the size of the ready lists.
Differential Revision: http://reviews.llvm.org/D19349
llvm-svn: 267189
We used to simply set the kill flags to true when transforming a scalar
instruction to a vector one.
SrcScalar1 = copy SrcVector1
... = opScalar SrcScalar1
=>
SrcScalar1 = copy SrcVector1
... = opVector SrcVector1<kill>
This is obviously wrong. The proper update consists in:
1. Propagate the kill status from the copy to the new opVector
2. Reset the kill status on the copy, since the live-range of
SrcVector1 got extended.
This fixes some of the machine verifier errors for AArch64 with make check.
llvm-svn: 267180
Summary: eq imply [u|s]ge and [u|s]le are true.
Remove redundant logic by implementing isImpliedFalseByMatchingCmp(Pred1, Pred2)
as isImpliedTrueByMatchingCmp(Pred1, getInversePredicate(Pred2)).
llvm-svn: 267177
Summary:
(... while still not using a PostDomTree)
The way we use isKnownNotFullPoison from SCEV today, the new CFG walking
logic will not trigger for any realistic cases -- it will kick in only
for situations where we could have merged the contiguous basic blocks
anyway[0], since the poison generating instruction dominates all of its
non-PHI uses (which are the only uses we consider right now).
However, having this change in place will allow a later bugfix to break
fewer llvm-lit tests.
[0]: i.e. cases where block A branches to block B and B is A's only
successor and A is B's only predecessor.
Reviewers: broune, bjarke.roune
Subscribers: mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D19212
llvm-svn: 267175
Summary: [u|s]gt and [u|s]lt imply [u|s]ge and [u|s]le are true, respectively.
I've simplified the existing tests and added additional tests to cover the new
cases mentioned above. I've also added tests for all the cases where the
first compare doesn't imply anything about the second compare.
llvm-svn: 267171
- Switch few loops to range-based for loops
- Fix nop insertion at the end of BB
- Fix formatting
- Check for endpgm
Differential Revision: http://reviews.llvm.org/D19380
llvm-svn: 267167
Summary:
CachingMemorySSAWalker::invalidateInfo was using IsCall to determine
which cache map needed to be cleared of entries referring to the invalidated
MemoryAccess, but there could also be entries referring to it in the
other cache map (value entries, not key entries). This change just
clears both tables to be conservatively correct.
Also add a verifyRemoved() function, called when expensive
checks (i.e. XDEBUG) are enabled to verify that the invalidated
MemoryAccess object is not referenced in any of the caches.
Reviewers: dberlin, george.burgess.iv
Subscribers: mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D19388
llvm-svn: 267157
Summary:
This new pass allows targets to use the hazard recognizer without having
to also run one of the schedulers. This is useful when compiling with
optimizations disabled for targets that still need noop hazards
to be handled correctly.
Reviewers: hfinkel, atrick
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D18594
llvm-svn: 267156
We take the intersection of overflow flags while CSE'ing.
This permits us to consider two instructions with different overflow
behavior to be replaceable.
llvm-svn: 267153
Summary:
When generating assembly using -m16 we must explicitly mark it as
16-bit. Emit .code16 at beginning of file. Fixes wrong results when
using -fno-integrated-as.
Reviewers: dwmw2
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D19392
llvm-svn: 267152
When targetting MIPS64R6 some of the patterns for select were guarded by a
broken predicate. The predicate was supposed to test if a constant value
could fit in a 16 bit zero-extended field. Instead the value was tested to
fit in a 16 bit sign-extended field. For negative constants of native word
width this resulted in wrong code generation.
Reviewers: vkalintiris, dsanders
Differential Review: http://reviews.llvm.org/D19378
llvm-svn: 267151
r267049 broke multiple buildbots (e.g. clang-cmake-mips, and clang-x86_64-linux-selfhost-modules) which the follow-ups have not yet resolved and this is preventing subsequent committers from being notified about additional failures on the affected buildbots.
llvm-svn: 267148
Summary:
When optimizing PHIs which have inputs floating point binary
operators, we preserve all IR flags except the fast math
flags.
This change removes the logic which tracked some of the IR flags
(no wrap, exact) and replaces it by doing an and on the IR flags of
all inputs to the PHI - which will also handle the fast math
flags.
Reviewers: majnemer
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D19370
llvm-svn: 267139
Summary:
rL256194 transforms truncations between vectors of integers into PACKUS/PACKSS
operations during DAG combine. This generates better code for truncate, so cost
of truncate needs to be changed but looks like it got changed only in SSE2 table
Whereas this change is also applicable for SSE4.1, so the cost of truncate needs
to be changed for that as well. Cost of “TRUNCATE v16i32 to v16i8” & “TRUNCATE
v16i16 to v16i8” should be same in SSE4.1 & SSE2 table. Removing their cost from
SSE4.1, so it will fall back to SSE2.
Reviewers: Simon Pilgrim
llvm-svn: 267123
Specifically, itineraries for LEON processors has been added, along with several LEON processor Subtargets. Although currently all these targets are pretty much identical, support for features that will differ among these processors will be added in the very near future.
The different Instruction Itinerary Classes (IICs) added are sufficient to differentiate between the instruction timings used by LEON and, quite probably, by generic Sparc processors too, but the focus of the exercise has been for LEON processors, as the requirement of my project. If the IICs are not sufficient for other Sparc processor types and you want to add a new itinerary for one of those, it should be relatively trivial to adapt this.
As none of the LEON processors has Quad Floats, or is a Version 9 processor, none of those instructions have itinerary classes defined and revert to the default "NoItinerary" instruction itinerary.
Phabricator Review: http://reviews.llvm.org/D19359
llvm-svn: 267121
EarlyCSE had inconsistent behavior with regards to flag'd instructions:
- In some cases, it would pessimize if the available instruction had
different flags by not performing CSE.
- In other cases, it would miscompile if it replaced an instruction
which had no flags with an instruction which has flags.
Fix this by being more consistent with our flag handling by utilizing
andIRFlags.
llvm-svn: 267111
Summary:
Also adds a small comment blurb on control flow + no-wrap flags, since
that question came up a few days back on llvm-dev.
Reviewers: bjarke.roune, broune
Subscribers: sanjoy, mcrosier, llvm-commits, mzolotukhin
Differential Revision: http://reviews.llvm.org/D19209
llvm-svn: 267110
Summary:
This intrinsic returns true if the current thread belongs to a live pixel
and false if it belongs to a pixel that we are executing only for derivative
computation. It will be used by Mesa to implement gl_HelperInvocation.
Note that for pixels that are killed during the shader, this implementation
also returns true, but it doesn't matter because those pixels are always
disabled in the EXEC mask.
This unearthed a corner case in the instruction verifier, which complained
about a v_cndmask 0, 1, exec, exec<imp-use> instruction. That's stupid but
correct code, so make the verifier accept it as such.
Reviewers: arsenm, tstellarAMD
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D19191
llvm-svn: 267102
Re-layer the functions in the new (i.e., newly correct) post-order
traversals in ValueEnumerator (r266947) and ValueMapper (r266949).
Instead of adding a node to the worklist in a helper function and
returning a flag to say what happened, return the node itself. This
makes the code way cleaner: the worklist is local to the main function,
there is no flag for an early loop exit (since we can cleanly bury the
loop), and it's perfectly clear when pointers into the worklist might be
invalidated.
I'm fixing both algorithms in the same commit to avoid repeating the
commit message; if you take the time to understand one the other should
be easy. The diff itself isn't entirely obvious since the traversals
have some noise (i.e., things to do), but here's the high-level change:
auto helper = [&WL](T *Op) { auto helper = [](T **&I, T **E) {
=> while (I != E) {
if (shouldVisit(Op)) { T *Op = *I++;
WL.push(Op, Op->begin()); if (shouldVisit(Op)) {
return true; return Op;
} }
return false; return nullptr;
}; };
=>
WL.push(S, S->begin()); WL.push(S, S->begin());
while (!empty()) { while (!empty()) {
auto *N = WL.top().N; auto *N = WL.top().N;
auto *&I = WL.top().I; auto *&I = WL.top().I;
bool DidChange = false;
while (I != N->end())
if (helper(*I++)) { => if (T *Op = helper(I, N->end()) {
DidChange = true; WL.push(Op, Op->begin());
break; continue;
} }
if (DidChange)
continue;
POT.push(WL.pop()); => POT.push(WL.pop());
} }
Thanks to Mehdi for helping me find a better way to layer this.
llvm-svn: 267099
Evaluates fmul+fadd -> fmadd combines and similar code sequences in the
machine combiner. It adds support for float and double similar to the existing
integer implementation. The key features are:
- DAGCombiner checks whether it should combine greedily or let the machine
combiner do the evaluation. This is only supported on ARM64.
- It gives preference to throughput over latency: the heuristic used is
to combine always in loops. The targets decides whether the machine
combiner should optimize for throughput or latency.
- Supports for fmadd, f(n)msub, fmla, fmls patterns
- On by default at O3 ffast-math
llvm-svn: 267098
This removes the interfaces added (and not yet complete) to support
lazy reading of summaries. This support is not expected to be needed
since we are moving to a model where the full index is only being
traversed in the thin link step, instead of the back ends.
(The second part of this that I plan to do next is remove the
GlobalValueInfo from the ModuleSummaryIndex - it was mostly needed to
support lazy parsing of summaries. The index can instead reference the
summary structures directly.)
llvm-svn: 267097
WIN__DBZCHK will insert a CBZ instruction into the stream. This instruction
reserves 3 bits for the condition register (rn). As such, we must ensure that
we restrict the register to a low register. Use the tGPR class instead of GPR
to ensure that this is properly constrained. In debug builds, we would attempt
to use lr as a condition register which would silently get truncated with no
hint that the register selection was incorrect.
llvm-svn: 267080
We'd disabled them on x86 because back in the early days some host tools
couldn't handle the new load commands. This no longer holds: anyone capable of
deploying Clang should be able to deploy its copies of ar/ranlib/etc.
rdar://25254790
llvm-svn: 267075
When printing the properties required by a pass, only print the
properties that are set, and not those that are clear (only properties
that are set are verified, clear properties are "don't-care").
llvm-svn: 267070
Summary:
Adds an instrumentation pass for the new EfficiencySanitizer ("esan")
performance tuning family of tools. Multiple tools will be supported
within the same framework. Preliminary support for a cache fragmentation
tool is included here.
The shared instrumentation includes:
+ Turn mem{set,cpy,move} instrinsics into library calls.
+ Slowpath instrumentation of loads and stores via callouts to
the runtime library.
+ Fastpath instrumentation will be per-tool.
+ Which memory accesses to ignore will be per-tool.
Reviewers: eugenis, vitalybuka, aizatsky, filcab
Subscribers: filcab, vkalintiris, pcc, silvas, llvm-commits, zhaoqin, kcc
Differential Revision: http://reviews.llvm.org/D19167
llvm-svn: 267058
Summary: As per title. This will help work on the C API.
Reviewers: Wallbraker, whitequark, joker.eph, echristo, rafael
Subscribers: joker.eph, llvm-commits
Differential Revision: http://reviews.llvm.org/D19173
llvm-svn: 267057
InstrProfSymtab::create can fail with instrprof_error::malformed, but
this error is silently dropped. Propagate the error up to the caller so
we fail early.
Eventually, I'd like to transition ProfileData over to the new Error
class so we can't ignore hard failures like this.
llvm-svn: 267055
splitting edges.
MachineBasicBlock::SplitCriticalEdges will crash if a nullptr would have
been passed for the Pass argument. Do not allow that by turning this
argument into a reference.
The alternative would have been to make the Pass a truly optional
argument, but although this is easy to do, I was afraid users using it
like this would not be aware the livness information, dominator tree and
such would silently be broken.
llvm-svn: 267052
PDB parsing code was hand-rolled into llvm-pdbdump. This patch moves the
parsing of this code into DebugInfoPDB and makes the dumper use this.
This is achieved by implementing the skeleton of RawPdbSession, the
non-DIA counterpart to the existing PDB read interface. None of the type /
source file / etc information is accessible yet, so this implementation is
not yet close to achieving parity with the DIA counterpart, but the
RawSession class simply holds a reference to a PDBFile class which handles
parsing the file format. Additionally a PDBStream class is introduced
which allows accessing the bytes of a particular stream in a PDB file.
Differential Revision: http://reviews.llvm.org/D19343
Reviewed By: majnemer
llvm-svn: 267049
Introduce canSplitCriticalEdge, so that clients can now query whether or
not a critical edge can be split without actually needing to split it.
This may be useful when gathering information for cost models for
instance.
llvm-svn: 267046
The previous allocation code was over-estimating the amount of memory required.
No test case: we don't currently have a good way to detect conervative
over-allocation.
llvm-svn: 267041
Summary:
If we know that the pointer allocated within a function does not escape,
we can fold away comparisons that are done with global pointers
Patch by Anna Thomas!
Reviewers: reames, majnemer, sanjoy
Subscribers: mgrang, mcrosier, majnemer, llvm-commits
Differential Revision: http://reviews.llvm.org/D19276
llvm-svn: 267035
When custom lowered, this is not called if the store is custom
lowered. Move it to be a utility function so targets can
easily expand unaligned accesses when custom lowering.
llvm-svn: 267029
Instead of holding a mask, hold two value: the start index and the
length of the mapping. This is a more compact representation, although
less powerful. That being said, arbitrary masks would not have worked
for the generic so do not allow them in the first place.
llvm-svn: 267025
This patch implements a optimization bisect feature, which will allow optimizations to be selectively disabled at compile time in order to track down test failures that are caused by incorrect optimizations.
The bisection is enabled using a new command line option (-opt-bisect-limit). Individual passes that may be skipped call the OptBisect object (via an LLVMContext) to see if they should be skipped based on the bisect limit. A finer level of control (disabling individual transformations) can be managed through an addition OptBisect method, but this is not yet used.
The skip checking in this implementation is based on (and replaces) the skipOptnoneFunction check. Where that check was being called, a new call has been inserted in its place which checks the bisect limit and the optnone attribute. A new function call has been added for module and SCC passes that behaves in a similar way.
Differential Revision: http://reviews.llvm.org/D19172
llvm-svn: 267022
Summary:
IntrReadWriteArgMem simply becomes IntrArgMemOnly.
So there are fewer intrinsic properties that express their orthogonality
better, and correspond more closely to the corresponding IR attributes.
Suggested by: Philip Reames
Reviewers: joker.eph, reames, tstellarAMD
Subscribers: jholewinski, arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D19291
llvm-svn: 267021
"Into" was misleading. I am also planning to use this helper to look
for loop metadata and return the argument, so find seems like a better
name.
llvm-svn: 267013
This builds on 266999 which made FindAvailableValue do the right thing. Tests included show the newly enabled transforms and those which disabled either due to conservatism or correctness requirements.
llvm-svn: 267006
Before this fix, DILexicalBlockFile entries were skipped only in some cases and were not in other cases.
Differential Revision: http://reviews.llvm.org/D18724
llvm-svn: 267004
This change adds a couple of test cases to make sure FindAvailableLoadedValue does the right thing. At the moment, the code added is dead, but separating it makes follow on changes far more obvious.
llvm-svn: 266999
AArch64InstrInfo::optimizeCompareInstr has bug PR27158 which causes generation of incorrect code.
A compare instruction is substituted with another instruction which does not
produce the same flags as the original compare instruction.
This patch contains:
1. Fix of the bug.
2. A regression test in MIR.
3. A new test to check that SUBS is replaced by SUB.
Differential Revision: http://reviews.llvm.org/D18838
llvm-svn: 266969
This help to streamline the process of handling importing since
we don't need to special case alias everywhere: just like
linkonce_odr function, make sure at least one alias is emitted
by turning it weak.
Differential Revision: http://reviews.llvm.org/D19308
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 266958
Summary:
`llvm.guard(false)` always bails out of the current compilation unit, so
we can prune any control flow following it.
Reviewers: hfinkel, pcc, reames
Subscribers: majnemer, reames, mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D19245
llvm-svn: 266955
CTTZ_ZERO_UNDEF can be custom lowered specially if CTLZ is supported. Otherwise CTTZ and CTTZ_ZERO_UNDEF are handled the same way by using CTPOP and bitmath.
llvm-svn: 266952
The iteratitive algorithm from r265456 claimed but failed to create a
post-order traversal. It had the same error that was fixed in the
ValueEnumerator in r266947: now, instead of pushing all operands on the
worklist at once, we pause whenever an operand gets pushed in order to
go depth-first (I know, it sounds obvious).
Sadly, I have no idea how to observe this from outside the algorithm and
so I haven't written a test. The output should be the same; it should
just use fewer temporary nodes now. I've added some comments that I
hope make the current logic clear enough it's unlikely to regress.
llvm-svn: 266949
Summary:
The function importer already decided what symbols need to be pulled
in. Also these magically added ones will not be in the export list
for the source module, which can confuse the internalizer for
instance.
Reviewers: tejohnson, rafael
Subscribers: joker.eph, llvm-commits
Differential Revision: http://reviews.llvm.org/D19096
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 266948
Emit metadata nodes in post-order. The iterative algorithm from r266709
failed to maintain this property. After understanding my mistake, it
wasn't too hard to write a test with llvm-bcanalyzer (and I've actually
made this change once before: see r220340).
This also reverts the "noisy" testcase change from r266709. That should
have been more of a red flag :/.
Note: The same bug crept into the ValueMapper in r265456. I'm still
working on the fix.
llvm-svn: 266947
No matter what value you OR in to A, the result of (or A, B) is going to be UGE A. When A and B are positive, it's SGE too. If A is negative, OR'ing a value into it can't make it positive, but can increase its value closer to -1, therefore (or A, B) is SGE A. Working through all possible combinations produces this truth table:
```
A is
+, -, +/-
F F F + B is
T F ? -
? F ? +/-
```
The related optimizations are flipping the 'slt' for 'sge' which always NOTs the result (if the result is known), and swapping the LHS and RHS while swapping the comparison predicate.
There are more idioms left to implement (aren't there always!) but I've stopped here because any more would risk becoming unreasonable for reviewers.
llvm-svn: 266939
Summary:
This patch refined the instruction weight anootation algorithm:
1. Do not use dbg_value intrinsics for annotation.
2. Annotate cold calls if the call is inlined in profile, but not inlined before preparation. This indicates that the annotation preparation step found no sample for the inlined callsite, thus the call should be very cold.
Reviewers: dnovillo, davidxl
Subscribers: mgrang, llvm-commits
Differential Revision: http://reviews.llvm.org/D19286
llvm-svn: 266936
Produce another specific error message for a malformed Mach-O file when a symbol’s
string index is past the end of the string table. The existing test case in test/Object/macho-invalid.test
for macho-invalid-symbol-name-past-eof now reports the error with the message indicating
that a symbol at a specific index has a bad sting index and that bad string index value.
Again converting interfaces to Expected<> from ErrorOr<> does involve
touching a number of places. Where the existing code reported the error with a
string message or an error code it was converted to do the same. There is some
code for this that could be factored into a routine but I would like to leave that for
the code owners post-commit to do as they want for handling an llvm::Error. An
example of how this could be done is shown in the diff in
lib/ExecutionEngine/RuntimeDyld/RuntimeDyldImpl.h which had a Check() routine
already for std::error_code so I added one like it for llvm::Error .
Also there some were bugs in the existing code that did not deal with the
old ErrorOr<> return values. So now with Expected<> since they must be
checked and the error handled, I added a TODO and a comment:
“// TODO: Actually report errors helpfully” and a call something like
consumeError(NameOrErr.takeError()) so the buggy code will not crash
since needed to deal with the Error.
Note there fixes needed to lld that goes along with this that I will commit right after this.
So expect lld not to built after this commit and before the next one.
llvm-svn: 266919
Don't use std::vector<TrackingMDRef>, since (at least in some versions
of libc++) std::vector apparently copies values on grow operations
instead of moving them. Found this when I was temporarily deleting the
copy constructor for TrackingMDRef to investigate a performance
bottleneck.
llvm-svn: 266909
No real functionality change here, just avoiding an unnecessary copy of
std::vector<TrackingMDRef> for every subprogram with variables.
llvm-svn: 266907
Summary:
This is done for consistency with asan-use-after-return.
I see no other users than tests.
Reviewers: aizatsky, kcc
Differential Revision: http://reviews.llvm.org/D19306
llvm-svn: 266906
A ModuleSlotTracker can be created without actually being used (e.g.,
r266889 added one to the Verifier). Create the SlotTracker within it
lazily on the first call to ModuleSlotTracker::getMachine.
llvm-svn: 266902
Differentiate between word and subword memory operations as they take different
amount of cycles to complete. This just adds a basic model of the subword
latency to the scheduler.
llvm-svn: 266898
Clients may call writeMergedModules before calling optimize, or call
compileOptimized without calling optimize. Make sure they don't sneak
past the verifier. This adds LTOCodeGenerator::verifyMergedModuleOnce,
and calls it from writeMergedModule, optimize, and codegenOptimized.
I couldn't find a good way to test this. I tried writing broken IR to
send into llvm-lto, but LTOCodeGenerator doesn't understand textual IR,
and assembler runs the verifier itself anyway. Checking in
valid-but-doesn't-verify bitcode here doesn't seem valuable.
llvm-svn: 266894
The alias handling was specific to the old iterative inlining
mechanism, so that is dead now. The variable handling could make a
difference, since we were previously falling through to the normal
selection logic, but we don't observe changes in the validation
because no client seems to rely on it.
Differential Revision: http://reviews.llvm.org/D19307
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 266893
Speed up Verifier output by sharing a single ModuleSlotTracker for the
duration. There should be no functionality change here except for much
faster output when there's more than one statement.
Now the Verifier won't be traversing the full Metadata graph every time
it prints an error. The TypePrinter is still not shared, but that would
take some extra plumbing.
llvm-svn: 266889
While using a raw_null_ostream meant that the Verifier didn't have to
think about whether to print, it's actually quite expensive to print out
IR. Only print if the output is going somewhere.
llvm-svn: 266884
Summary:
This patch prevents importing from (and therefore exporting from) any
module with a "llvm.used" local value. Local values need to be promoted
and renamed when importing, and their presense on the llvm.used variable
indicates that there are opaque uses that won't see the rename. One such
example is a use in inline assembly.
See also the discussion at:
http://lists.llvm.org/pipermail/llvm-dev/2016-April/098047.html
As part of this, move collectUsedGlobalVariables out of Transforms/Utils
and into IR/Module so that it can be used more widely. There are several
other places in LLVM that used copies of this code that can be cleaned
up as a follow on NFC patch.
Reviewers: joker.eph
Subscribers: pcc, llvm-commits, joker.eph
Differential Revision: http://reviews.llvm.org/D18986
llvm-svn: 266877
Add ParseAMDGPURegister which can be invoked recursively for parsing lists.
Rename getRegForName to getSpecialRegForName.
Support legacy SP3 register list syntax: [s2,s3,s4,s5] or [flat_scratch_lo,flat_scratch_hi].
Add 64-bit registers TBA, TMA where missing.
Add some tests.
Differential Revision: http://reviews.llvm.org/D19163
llvm-svn: 266865
This linkage is *not* intended to express that a declaration refers
to a weak symbol, but that the symbol might not be present at link
time. I don't believe it was the intent.
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 266856
Summary:
LLVMAttribute has outlived its utility and is becoming a problem for C API users that what to use all the LLVM attributes. In order to help moving away from LLVMAttribute in a smooth manner, this diff introduce LLVMGetAttrKindIDInContext, which can be used instead of the enum values.
See D18749 for reference.
Reviewers: Wallbraker, whitequark, joker.eph, echristo, rafael
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D19081
llvm-svn: 266842
We never use the set-ness of SmallPtrSet for distinct nodes. Eventually
we may start garbage-collecting or reference-counting nodes (in which
cases we'd want to remove things from this collection, and a fast erase
would be valuable), but in the meantime a vector is sufficient.
llvm-svn: 266835
Because lowering of CMP_SWAP_64 occurs during type legalization, there can be
i64 types produced by more than just a BUILD_PAIR or similar. My initial tests
used just incoming function args.
llvm-svn: 266828
Summary:
This property is used to mark an intrinsic that only writes to memory, but
neither reads from memory nor has other side effects.
An example where this is useful is the llvm.amdgcn.buffer.store.format.*
intrinsic, which corresponds to a store instruction that goes through a special
buffer descriptor rather than through a plain pointer.
With this property, the intrinsic should still be handled as having side
effects at the LLVM IR level, but machine scheduling can make smarter
decisions.
Reviewers: tstellarAMD, arsenm, joker.eph, reames
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D18291
llvm-svn: 266826
Summary:
The added testcase, which triggered this, was derived from a shader-db case
via bugpoint. A separate question is why scalar branching wasn't used.
Reviewers: arsenm, tstellarAMD
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D19208
llvm-svn: 266825
Summary:
A shader stored the live mask (initial exec mask) in an SGPR which was then
spilled during register allocation. The allocator quite reasonably
optimized turned the spill into
v_writelane_b32 %vgpr, exec_lo, N
v_writelane_b32 %vgpr, exec_hi, N+1
at the beginning of the shader, confusing the SGPR accounting.
No test case, because si-sgpr-spill.ll together with an upcoming patch for
WQM handling exhibits the problem.
Reviewers: arsenm, tstellarAMD
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D19199
llvm-svn: 266824
Both AArch64 and ARM support llvm.<arch>.thread.pointer intrinsics that
just return the thread pointer. I have a pending patch that does the same
for SystemZ (D19054), and there are many more targets that could benefit
from one.
This patch merges the ARM and AArch64 intrinsics into a single target
independent one that will also be used by subsequent targets.
Differential Revision: http://reviews.llvm.org/D19098
llvm-svn: 266818
With this change, ideally IR pass can always generate llvm.stackguard
call to get the stack guard; but for now there are still IR form stack
guard customizations around (see getIRStackGuard()). Future SSP
customization should go through LOAD_STACK_GUARD.
There is a behavior change: stack guard values are not CSEed anymore,
since we should never reuse the value in case that it has been spilled (and
corrupted). See ssp-guard-spill.ll. This also cause the change of stack
size and codegen in X86 and AArch64 test cases.
Ideally we'd like to know if the guard created in llvm.stackprotector() gets
spilled or not. If the value is spilled, discard the value and reload
stack guard; otherwise reuse the value. This can be done by teaching
register allocator to know how to rematerialize LOAD_STACK_GUARD and
force a rematerialization (which seems hard), or check for spilling in
expandPostRAPseudo. It only makes sense when the stack guard is a global
variable, which requires more instructions to load. Anyway, this seems to go out
of the scope of the current patch.
llvm-svn: 266806
* Add lowering for SETCCE i32.
* Add test to check lowering of i64 compares uses SETCCE expansion (outside of EQ and NE).
* Fix select.ll test and immediate form selection for RI operations.
llvm-svn: 266802
The functionality contained within getIntrinsicIDForCall is two-fold: it
checks if a CallInst's callee is a vectorizable intrinsic. If it isn't
an intrinsic, it attempts to map the call's target to a suitable
intrinsic.
Move the mapping functionality into getIntrinsicForCallSite and rename
getIntrinsicIDForCall to getVectorIntrinsicIDForCall while
reimplementing it in terms of getIntrinsicForCallSite.
llvm-svn: 266801
Add a new method, DICompositeType::buildODRType, that will create or
mutate the DICompositeType for a given ODR identifier, and use it in
LLParser and BitcodeReader instead of DICompositeType::getODRType.
The logic is as follows:
- If there's no node, create one with the given arguments.
- Else, if the current node is a forward declaration and the new
arguments would create a definition, mutate the node to match the
new arguments.
- Else, return the old node.
This adds a missing feature supported by the current DITypeIdentifierMap
(which I'm slowly making redudant). The only remaining difference is
that the DITypeIdentifierMap has a "the-last-one-wins" rule, whereas
DICompositeType::buildODRType has a "the-first-one-wins" rule.
For now I'm leaving behind DICompositeType::getODRType since it has
obvious, low-level semantics that are convenient for unit testing.
llvm-svn: 266786
in preparation for enabling the outgoing parameter store-to-push optimization
for 64-bit targets.
Differential Revision: http://reviews.llvm.org/D19222
llvm-svn: 266774
This patch improves SimplifyCFG to catch cases like:
if (a < b) {
if (a > b) <- known to be false
unreachable;
}
Phabricator Revision: http://reviews.llvm.org/D18905
llvm-svn: 266767
Calling ValueMap::MD lazily constructs a ValueMap, which mallocs the
buckets. Instead of swapping constructed maps, move around the
underlying Optional<MDMapT>. This gets rid of some unnecessary malloc
traffic from r266579 (not that it showed up on a profile).
llvm-svn: 266761
Rather than checking for the SCEV type prior to calling
getContantPart, perform the checks in the function. This reduces
the number of places where the checks are needed.
Differential Revision: http://reviews.llvm.org/D19241
llvm-svn: 266759
Summary:
There is no reason to have a weak reference because the external
definition will be weak.
Reviewers: rafael
Subscribers: llvm-commits, tejohnson
Differential Revision: http://reviews.llvm.org/D19267
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 266750
Summary:
This is a follow-on to apply Duncan's new DIType ODR uniquing from
r266549 and r266713 in more places.
Enable enableDebugTypeODRUniquing() for ThinLTO backends invoked via
libLTO, similar to the way r266549 enabled this for ThinLTO backend
threads launched from gold-plugin.
Also enable enableDebugTypeODRUniquing in opt, similar to the way
r266549 enabled this for llvm-link (on by default, can be disabled with
new -disable-debug-info-type-map option), since we may perform ThinLTO
importing from opt.
Reviewers: dexonsmith, joker.eph
Subscribers: joker.eph, llvm-commits
Differential Revision: http://reviews.llvm.org/D19263
llvm-svn: 266746
Lift the API for debug info ODR type uniquing up a layer. Instead of
clients managing the map directly on the LLVMContext, add a static
method to DICompositeType called getODRType and handle the map in the
background. Also adds DICompositeType::getODRTypeIfExists, so far just
for convenience in the unit tests.
This simplifies the logic in LLParser and BitcodeReader. Because of
argument spam there are actually a few more lines of code now; I'll see
if I come up with a reasonable way to clean that up.
llvm-svn: 266742
Tighten up the API for debug info ODR type uniquing in LLVMContext. The
only reason to allow other DIType subclasses is to make the unit tests
prettier :/.
llvm-svn: 266737
Using VPERMQ/VPERMPD allows memory folding of the (repeated) input where VINSERTI128/VINSERTF128 can not.
Differential Revision: http://reviews.llvm.org/D19228
llvm-svn: 266728
Summary:
Need to use predecessors for reverse graph, successors for forward graph.
succ_iterator/pred_iterator are not compatible, this patch is all the work necessary to work around that (which is what everywhere else does). Not sure if there is a better way, so cc'ing some random folks to take a gander :)
Reviewers: dblaikie, qcolombet, echristo
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D18796
llvm-svn: 266718
Summary:
The `"patchable-function"` attribute can be used by an LLVM client to
influence LLVM's code generation in ways that makes the generated code
easily patchable at runtime (for instance, to redirect control).
Right now only one patchability scheme is supported,
`"prologue-short-redirect"`, but this can be expanded in the future.
Reviewers: joker.eph, rnk, echristo, dberris
Subscribers: joker.eph, echristo, mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D19046
llvm-svn: 266715
As per David's review, rename everything in the new API for ODR type
uniquing of debug info.
ensureDITypeMap => enableDebugTypeODRUniquing
destroyDITypeMap => disableDebugTypeODRUniquing
hasDITypeMap => isODRUniquingDebugTypes
llvm-svn: 266713
Use a worklist instead of recursing through MDNode operands in
ValueEnumerator. The actual record output order has changed slightly,
but otherwise there's no functionality change.
I had to update test/Bitcode/metadata-function-blocks.ll. I renumbered
nodes so they continue to match the implicit record ids.
llvm-svn: 266709
When we suppress linkage names, for a non-inlined subprogram the name
can still be found in the object-file symbol table, because we have
the code address of the subprogram. This is not necessarily the case
for an inlined subprogram, so we still want to emit the linkage name
in the DWARF. Put this on the abstract-origin DIE because it's common
to all inlined instances.
Differential Revision: http://reviews.llvm.org/D18706
llvm-svn: 266692
The fast register-allocator cannot cope with inter-block dependencies without
spilling. This is fine for ldrex/strex loops coming from atomicrmw instructions
where any value produced within a block is dead by the end, but not for
cmpxchg. So we lower a cmpxchg at -O0 via a pseudo-inst that gets expanded
after regalloc.
Fortunately this is at -O0 so we don't have to care about performance. This
simplifies the various axes of expansion considerably: we assume a strong
seq_cst operation and ensure ordering via the always-present DMB instructions
rather than v8 acquire/release instructions.
Should fix the 32-bit part of PR25526.
llvm-svn: 266679
It's missing a dependency on Instrumentation (needed for
llvm::InstrProfiling::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&))
llvm-svn: 266656
Summary:
Calls to @llvm.experimental.deoptimize are expected to "never execute",
so optimize them as such.
Reviewers: chandlerc
Subscribers: junbuml, mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D19095
llvm-svn: 266654
There's currently no raw_ostream &operator<<(SimpleValueType); provided by LLVM. It could be added by refactoring utils/TableGen/CodeGenTarget.cpp:getEnumName, but that's much more work than fixing the build.
llvm-svn: 266627
Also,
- Skip pass if machine module does not have debug info
- Minor comment changes
- Added test
Differential Revision: http://reviews.llvm.org/D19079
llvm-svn: 266626
The root of the problem was that findMainViewFileID(File, Function)
could return some ID for any given file, even though that file
was not the main file for that function.
This patch ensures that the result of this function is conformed
with the result of findMainViewFileID(Function).
This commit reapplies r266436, which was reverted by r266458,
with the .covmapping file serialized in v1 format.
Differential Revision: http://reviews.llvm.org/D18787
llvm-svn: 266620
This reverts commit r266477.
This commit introduces cyclic dependency. This commit has "Analysis" depend on "ProfileData",
while "ProfileData" depends on "Object", which depends on "BitCode", which
depends on "Analysis".
llvm-svn: 266619
Order should match the sp3 syntax, where destination (simm16 denoting the hwreg) is coming first.
Differential Revision: http://reviews.llvm.org/D19161
llvm-svn: 266617
Three problems:
1. <future> can't be easily used. If you must use it, see
include/Support/ThreadPool.h for how.
2. constexpr problems, even after 266588.
3. Move assignment operators can't be defaulted in MSVC2013.
llvm-svn: 266615
Summary:
When clang is given -save-temps or -via-file-asm, any inline assembly in
the source is parsed twice. Once by the compiler, and again by the
assembler. We must take care to ensure that this doesn't lead to
double-filling delay slots.
Reviewers: sdardis, vkalintiris
Subscribers: dsanders, sdardis, llvm-commits
Differential Revision: http://reviews.llvm.org/D19166
llvm-svn: 266608
Summary:
This will allows us to eliminate some magic numbers from the offset operand of
branch instructions in favour of symbols and makes it possible to avoid
double-filling delay slots when clang is given -save-temps.
parseDirectiveCpRestore() is calling isIntegratedAssemblerRequired() for the
moment since correctly pushing the generation of these instructions into the
ELF target streamer is tricky enough to warrant a separate patch.
Reviewers: sdardis, vkalintiris
Subscribers: dsanders, llvm-commits, sdardis
Differential Revision: http://reviews.llvm.org/D19164
llvm-svn: 266602
Removed some unused headers, replaced some headers with forward class declarations.
Found using simple scripts like this one:
clear && ack --cpp -l '#include "llvm/ADT/IndexedMap.h"' | xargs grep -L 'IndexedMap[<]' | xargs grep -n --color=auto 'IndexedMap'
Patch by Eugene Kosov <claprix@yandex.ru>
Differential Revision: http://reviews.llvm.org/D19219
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 266595
I have no idea how I chose two different spellings in the space of a
couple of weeks, but now I can't remember what to use where. Choose
"Worklist".
llvm-svn: 266582
asynchronous call/handle. Also updates the ORC remote JIT API to use the new
scheme.
The previous version of the RPC tools only supported void functions, and
required the user to manually call a paired function to return results. This
patch replaces the Procedure typedef (which only supported void functions) with
the Function typedef which supports return values, e.g.:
Function<FooId, int32_t(std::string)> Foo;
The RPC primitives and channel operations are also expanded. RPC channels must
support four new operations: startSendMessage, endSendMessage,
startRecieveMessage and endRecieveMessage, to handle channel locking. In
addition, serialization support for tuples to RPCChannels is added to enable
multiple return values.
The RPC primitives are expanded from callAppend, call, expect and handle, to:
appendCallAsync - Make an asynchronous call to the given function.
callAsync - The same as appendCallAsync, but calls send on the channel when
done.
callSTHandling - Blocking call for single-threaded code. Wraps a call to
callAsync then waits on the result, using a user-supplied
handler to handle any callbacks from the remote.
callST - The same as callSTHandling, except that it doesn't handle
callbacks - it expects the result to be the first return.
expect and handle - as before.
handleResponse - Handle a response from the remote.
waitForResult - Wait for the response with the given sequence number to arrive.
llvm-svn: 266581
Cache the result of mapping metadata nodes between instances of IRLinker
(i.e., for the lifetime of IRMover). There shouldn't be any real
functional change here, but this should give a major speedup. I had
loaned this to Mehdi when he tested performance of r266446, and the two
patches together gave a 10x speedup in metadata mapping.
llvm-svn: 266579
This catches two nullptr insertions into the ValueMap I missed in
r266567. I missed CloneFunction becuase it never calls RemapInstruction
directly. Here's one of the still-failing bots:
http://lab.llvm.org:8011/builders/clang-x64-ninja-win7/builds/11496
llvm-svn: 266570
Add an assertion to ValueMapper that prevents double-scheduling of
GlobalValues to remap, and fix the one place it happened. There are
tons of tests that fail with this assertion in place and without the
code change, so I'm not adding another.
Although it looks related, r266563 was, indeed, removing dead code.
AFAICT, this cross-file double-scheduling started in r266510 when the
cross-file recursion was removed.
llvm-svn: 266569
Apparently there isn't test coverage for all of these. I'd appreciate
if someone with could reproduce and send me something to reduce, but for
now I've just looked for users of RemapInstruction and MapValue and
ensured they don't accidentally insert nullptr. Here is one of the
bootstraps that caught:
http://lab.llvm.org:8011/builders/clang-x64-ninja-win7/builds/11494
llvm-svn: 266567
As a follow-up to r123058, assert that there are no null mappings in the
ValueMap instead of just ignoring them when they are there. There were
a couple of accidental insertions in CloneFunction so I cleaned those up
(caught by testcases).
llvm-svn: 266565
This required changing several places to print VT enums as strings instead of raw ints since the proper method to use to print became ambiguous. This is probably an improvement anyway.
This also appears to save ~8K from an x86 self host build of llc.
llvm-svn: 266562
no functional change.
ExtraLoad and WrapperKind are been used only if (OpFlags == X86II::MO_GOTPCREL).
Differential Revision: http://reviews.llvm.org/D18942
llvm-svn: 266557
Fix a couple of places in the Verifier that call `getScope()` instead of
`getRawScope()`. Both DIDerivedType::getScope and
DICompositeType::getScope return a DITypeRef right now (which wraps a
Metadata*) so I don't think there's currently an observable bug. I
found this because a future commit that will change them to cast to
DIScope*.
llvm-svn: 266552
I accidentally replaced `mayBeOverridden` with `!isInterposable`.
Remove the negation and add a test case that would've caught this.
Many thanks to Håkan Hjort for spotting this!
llvm-svn: 266551
Rather than relying on the structural equivalence of DICompositeType to
merge type definitions, use an explicit map on the LLVMContext that
LLParser and BitcodeReader consult when constructing new nodes.
Each non-forward-declaration DICompositeType with a non-empty
'identifier:' field is stored/loaded from the type map, and the first
definiton will "win".
This map is opt-in: clients that expect ODR types from different modules
to be merged must call LLVMContext::ensureDITypeMap.
- Clients that just happen to load more than one Module in the same
LLVMContext won't magically merge types.
- Clients (like LTO) that want to continue to merge types based on ODR
identifiers should opt-in immediately.
I have updated LTOCodeGenerator.cpp, the two "linking" spots in
gold-plugin.cpp, and llvm-link (unless -disable-debug-info-type-map) to
set this.
With this in place, it will be straightforward to remove the DITypeRef
concept (i.e., referencing types by their 'identifier:' string rather
than pointing at them directly).
llvm-svn: 266549
Merge members that are describing the same member of the same ODR type,
even if other bits differ. If the file or line differ, we don't care;
if anything else differs, it's an ODR violation (and we still don't
really care).
For DISubprogram declarations, this looks at the LinkageName and Scope.
For DW_TAG_member instances of DIDerivedType, this looks at the Name and
Scope. In both cases, we know that the Scope follows ODR rules if it
has a non-empty identifier.
llvm-svn: 266548
This commit has no functionality change, but it adds a configuration
point for MDNodeInfo::isEqual to allow custom uniquing of subclasses of
MDNode, minimizing the diff of a follow-up.
llvm-svn: 266542
Since the result of a mapped distinct node is known up front, it's more
efficient to map them separately from uniqued nodes. This commit pulls
them out of the post-order traversal and stores them in a worklist to be
remapped at the top-level.
This is essentially reapplying r244181 ("ValueMapper: Rotate distinct
node remapping algorithm") to the new iterative algorithm from r265456
("ValueMapper: Rewrite Mapper::mapMetadata without recursion").
Now that the traversal logic only handles uniqued MDNodes, it's much
simpler to inline it all into MDNodeMapper::createPOT (I've killed the
MDNodeMapper::push and MDNodeMapper::tryToPop helpers and localized the
traversal worklist).
The resulting high-level algorithm for MDNodeMapper::map now looks like
this:
- Distinct nodes are immediately mapped and added to
MDNodeMapper::DistinctWorklist using MDNodeMapper::mapDistinctNode.
- Uniqued nodes are mapped via MDNodeMapper::mapTopLevelUniquedNode,
which traverses the transitive uniqued subgraph of a node to
calculate uniqued node mappings in bulk.
- This is a simplified version of MDNodeMapper::map from before
this commit (originally r265456) that doesn't traverse through
any distinct nodes.
- Distinct nodes are added to MDNodeMapper::DistinctWorklist via
MDNodeMapper::mapDistinctNode.
- This uses MDNodeMapper::createPOT to fill a
MDNodeMapper::UniquedGraph (a post-order traversal and side
table), UniquedGraph::propagateChanges to track which uniqued
nodes need to change, and MDNodeMapper::mapNodesInPOT to create
the uniqued nodes.
- Placeholders for forward references are now only needed when
there's a uniquing cycle (a cycle of uniqued nodes unbroken by
distinct nodes). This is the key functionality change that
we're reintroducing (from r244181). As of r265456, a temporary
forward reference might be needed for any cycle that involved
uniqued nodes.
- After mapping the first node appropriately, MDNodeMapper::map works
through MDNodeMapper::DistinctWorklist. For each distinct node, its
operands are remapped with MDNodeMapper::mapDistinctNode and
MDNodeMapper::mapTopLevelUniquedNode until all nodes have been
mapped.
Sadly there's nothing observable I can test here; no real functionality
change, just a compile-time speedup from reduced malloc traffic.
llvm-svn: 266537
As a minor fixup to r266258, only track nodes that needed a placeholder
in CyclicNodes in MDNodeMapper::mapUniquedNodes. There should be no
observable functionality change, just some local memory savings because
CyclicNodes only needs to grow to accommodate nodes that are actually
involved in cycles. (This was the original intent of r266258, or else
the vector would have been called "ChangedNodes".)
llvm-svn: 266536
Summary: For Incremental LTO, we need to make sure that an old
cache entry is not used when incrementally re-linking with a new
libLTO.
Adding a global LLVM_REVISION in llvm-config.h would for to
rebuild/relink the world for every "git pull"/"svn update".
So instead only libLTO is made dependent on the VCS and will
be rebuilt (and the dependent binaries relinked, i.e. as of
today: libLTO.dylib and llvm-lto).
Reviewers: beanz
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D18987
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 266523
This is a requirement for the cache handling in D18494
Differential Revision: http://reviews.llvm.org/D18908
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 266519
To be able to work accurately on the reference graph when taking
decision about internalizing, promoting, renaming, etc. We need
to have the alias information explicit.
Differential Revision: http://reviews.llvm.org/D18836
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 266517
Stop memoizing ConstantAsMetadata in ValueMapper::mapMetadata. Now we
have to recompute it, but these metadata aren't particularly common, and
it restricts the lifetime of the Metadata map unnecessarily.
(The motivation is that I have a patch which uses a single Metadata map
for the lifetime of IRMover. Mehdi profiled r266446 with the patch
applied and we saw a pretty big speedup in lib/Linker.)
llvm-svn: 266513
This reverts commit r266507, reapplying r266503 (and r266505
"ValueMapper: Use API from r266503 in unit tests, NFC") completely
unchanged.
I reverted because of a bot failure here:
http://lab.llvm.org:8011/builders/lld-x86_64-freebsd/builds/16810/
However, looking more closely, the failure was from a host-compiler
crash (clang 3.7.1) when building:
lib/CodeGen/AsmPrinter/CMakeFiles/LLVMAsmPrinter.dir/DwarfAccelTable.cpp.o
I didn't modify that file, or anything it includes, with that commit.
The next build (which hadn't picked up my revert) got past it:
http://lab.llvm.org:8011/builders/lld-x86_64-freebsd/builds/16811/
I think this was just unfortunate timing. I suppose the bot must be
flakey.
llvm-svn: 266510
This resolves more frame indexes early and folds
the immediate offsets into the scratch mubuf instructions.
This cleans up a lot of the mess that's currently emitted,
such as emitting add 0s and repeatedly initializing the same
register to 0 when spilling.
llvm-svn: 266508
This reverts commit r266503, in case it's the root cause of this bot
failure:
http://lab.llvm.org:8011/builders/lld-x86_64-freebsd/builds/16810
I'm also reverting r266505 -- "ValueMapper: Use API from r266503 in unit
tests, NFC" -- since it's in the way.
llvm-svn: 266507
Eliminate co-recursion of Mapper::mapValue through
ValueMaterializer::materializeInitFor, through a major redesign of the
ValueMapper.cpp interface.
- Expose a ValueMapper class that controls the entry points to the
mapping algorithms.
- Change IRLinker to use ValueMapper directly, rather than
llvm::RemapInstruction, llvm::MapValue, etc.
- Use (e.g.) ValueMapper::scheduleMapGlobalInit to add mapping work to
a worklist in ValueMapper instead of recursing.
There were two fairly major complications.
Firstly, IRLinker::linkAppendingVarProto incorporates an on-the-fly IR
ugprade that I had to split apart. Long-term, this upgrade should be
done in the bitcode reader (and we should only accept the "new" form),
but for now I've just made it work and added a FIXME. The hold-op is
that we need to deprecate C API that relies on this.
Secondly, IRLinker has special logic to correctly implement aliases with
comdats, and uses two ValueToValueMapTy instances and two
ValueMaterializers. I supported this by allowing clients to register an
alternate mapping context, whose MCID can be passed in when scheduling
new work.
While out of scope for this commit, it should now be straightforward to
remove recursion from Mapper::mapValue.
llvm-svn: 266503
1) We need to add this flag prior to adding any other, in case the user has
specified a -fmodule-cache-path= flag in their custom CXXFLAGS. Such a flag
causes -Werror builds to fail, and thus all config checks fail, until we add
the corresponding -fmodules flag. The modules selfhost bot does this, for
instance.
2) Delete module maps that were putting .cpp files into modules.
3) Enable -fmodules-local-submodule-visibility, to get proper module
visibility rules applied across submodules of the same module. Disable
-fmodules for C builds, since that flag is not available there.
llvm-svn: 266502
Change Mapper::VM to a pointer and add a `getVM()` accessor for it.
While this has no functionality change, it minimizes the diff on an
upcoming patch that allows switching between instances of
ValueToValueMapTy on a single Mapper instance.
llvm-svn: 266490
Because HoistSpillHelper::hoistAllSpills is called in postOptimization, before the
patch we didn't want LiveRangeEdit::eliminateDeadDefs to call splitSeparateComponents
and generate unassigned new vregs. However, skipping splitSeparateComponents will make
verify-machineinstrs unhappy, so I remove the early return, and use
HoistSpillHelper::LRE_DidCloneVirtReg to assign physreg/stackslot for those new vregs.
In addition, some code reorganization to make class HoistSpillHelper privately inheriting
from LiveRangeEdit::Delegate possible. This is to be consistent with class RAGreedy and
class RegisterCoalescer.
Differential Revision: http://reviews.llvm.org/D19142
llvm-svn: 266489
Allow explicit section for indirectly called functions in cfi-icall.
Jumptables for functions in the same type class must be contiguous, so they
always go to the default text section.
Fixes PR25079.
llvm-svn: 266486
After r245976, LLVM will skip the last bit test case if knows it will always be
true. However, we would still erroneously update PHI nodes with incoming values
from the MBB that would perform the final bit test, causing -verify-machineinstrs
to fail.
llvm-svn: 266479
Adds an interface to get ProfileSummary for a module and makes InlineCost use ProfileSummary to get max function count.
Differential Revision: http://reviews.llvm.org/D18622
llvm-svn: 266477
Use the tryAddingSymbolicOperand callback to attempt to present immediate
values in symbolic form when disassembling. This is currently only used
for PC-relative immediates (which are most likely to be symbolic in the
SystemZ ISA). Add new DecodeMethod types to allow distinguishing between
branch and non-branch instructions.
llvm-svn: 266469
Divisions by a constant can be converted into multiplies which are usually
cheaper, but this isn't possible if the constant gets separated (particularly
in loops). Fix this by telling ConstantHoisting that the immediate in a DIV is
cheap.
I considered making the check generic, but neither AArch64 (strangely) nor x86
showed any benefit on the tests I had.
llvm-svn: 266464
This improves AA in the MI schduler when reason about paired instructions.
Phabricator Revision: http://reviews.llvm.org/D17098
PR26358
llvm-svn: 266462
This is a recommit of r266390 with a fix that will allow tests to pass
(hopefully). Before we got a StringRef to M->getTargetTriple() and right
after we moved the Module so we were referencing a dangling object.
llvm-svn: 266456
InstCombine wants to optimize compares of calls to fabs with zero.
However, we didn't have the necessary legality checking to verify that
the function call had the same behavior as fabs.
llvm-svn: 266452
Currently each Function points to a DISubprogram and DISubprogram has a
scope field. For member functions the scope is a DICompositeType. DIScopes
point to the DICompileUnit to facilitate type uniquing.
Distinct DISubprograms (with isDefinition: true) are not part of the type
hierarchy and cannot be uniqued. This change removes the subprograms
list from DICompileUnit and instead adds a pointer to the owning compile
unit to distinct DISubprograms. This would make it easy for ThinLTO to
strip unneeded DISubprograms and their transitively referenced debug info.
Motivation
----------
Materializing DISubprograms is currently the most expensive operation when
doing a ThinLTO build of clang.
We want the DISubprogram to be stored in a separate Bitcode block (or the
same block as the function body) so we can avoid having to expensively
deserialize all DISubprograms together with the global metadata. If a
function has been inlined into another subprogram we need to store a
reference the block containing the inlined subprogram.
Attached to https://llvm.org/bugs/show_bug.cgi?id=27284 is a python script
that updates LLVM IR testcases to the new format.
http://reviews.llvm.org/D19034
<rdar://problem/25256815>
llvm-svn: 266446
This is almost identical to:
http://reviews.llvm.org/rL264527
This doesn't solve PR27344; it just allows the profile weights to survive.
To solve the bug, we need to use the profile weights in the backend.
llvm-svn: 266442
Summary:
Without MMOs, the callee-save load/store instructions were treated as
volatile by the MI post-RA scheduler and AArch64LoadStoreOptimizer.
Reviewers: t.p.northover, mcrosier
Subscribers: aemerson, rengolin, mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D17661
llvm-svn: 266439
[PPC] Previously when casting generic loads to LXV2DX/ST instructions we
would leave the original load return type in place allowing for an
assertion failure when we merge two equivalent LXV2DX nodes with
different types.
This fixes PR27350.
Reviewers: nemanjai
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D19133
llvm-svn: 266438
Perform store clustering just like load clustering. This change add
StoreClusterMutation in machine-scheduler. To control StoreClusterMutation,
added enableClusterStores() in TargetInstrInfo.h. This is enabled only on
AArch64 for now.
This change also add support for unscaled stores which were not handled in
getMemOpBaseRegImmOfs().
llvm-svn: 266437
The root of the problem was that findMainViewFileID(File, Function)
could return some ID for any given file, even though that file
was not the main file for that function.
This patch ensures that the result of this function is conformed
with the result of findMainViewFileID(Function).
Differential Revision: http://reviews.llvm.org/D18787
llvm-svn: 266436
Summary:
In the added test-case, the atomic instruction feeds into a non-machine
CopyToReg node which hasn't been selected yet, so guard against
non-machine opcodes here.
Reviewers: arsenm, tstellarAMD
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D19043
llvm-svn: 266433
Summary:
Calls on NVPTX are unusually expensive (for one thing, lots of state
needs to be saved to memory, which is slow), so make the inlininer much
more aggressive.
Reviewers: chandlerc
Subscribers: jholewinski, llvm-commits, tra
Differential Revision: http://reviews.llvm.org/D18561
llvm-svn: 266406
Summary:
InlineCost's threshold is multiplied by this value. This lets us adjust
the inlining threshold up or down on a per-target basis. For example,
we might want to increase the threshold on targets where calls are
unusually expensive.
Reviewers: chandlerc
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D18560
llvm-svn: 266405
It's unsafe to duplicate blocks that contain convergent instructions
during ifcnv. See the patch for details.
Reviewers: hfinkel
Differential Revision: http://reviews.llvm.org/D17518
llvm-svn: 266404
Summary:
This IR pass is helpful for GPUs, and other targets with divergent
branches. It's a nop on targets without divergent branches.
Reviewers: chandlerc
Subscribers: llvm-commits, jingyue, rnk, joker.eph, tra
Differential Revision: http://reviews.llvm.org/D18626
llvm-svn: 266399
Summary:
This lets us add this pass to the IR pass manager unconditionally; it
will simply not do anything on targets without branch divergence.
Reviewers: tra
Subscribers: llvm-commits, jingyue, rnk, chandlerc
Differential Revision: http://reviews.llvm.org/D18625
llvm-svn: 266398
This will be used in lld to avoid creating TargetMachine in two
different places. See D18999 for a more detailed discussion.
Differential Revision: http://reviews.llvm.org/D19139
llvm-svn: 266390
If the size of an AST entry changes, we also need to make sure we perform
necessary alias set merges, as the new size may overlap pointers in other sets.
We happen to run into this with memset, because memset allows an entry for a
i8* pointer to have a decidedly non-i8 size.
This fixes PR27262.
Differential Revision: http://reviews.llvm.org/D18939
llvm-svn: 266381
The only use for getGlobalContext() is in the C API.
Let's just move the static global here and nuke the C++ API.
Differential Revision: http://reviews.llvm.org/D19094
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 266380
At the same time, fixes InstructionsTest::CastInst unittest: yes
you can leave the IR in an invalid state and exit when you don't
destroy the context (like the global one), no longer now.
This is the first part of http://reviews.llvm.org/D19094
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 266379
Currently what comes out of instruction selection is a
register initialized to -1, and then copied to m0.
MachineCSE doesn't consider copies, but we want these
to be CSEed. This isn't much of a problem currently,
because SIFoldOperands is run immediately after.
This avoids regressions when SIFoldOperands is run later
from leaving all copies to m0.
llvm-svn: 266377
Summary:
Re-factor some code to improve clarity and style based on review
comments from http://reviews.llvm.org/D18093.
Reviewers: MatzeB, mcrosier
Subscribers: MatzeB, mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D19128
llvm-svn: 266372
Some SIMD implementations are not IEEE-754 compliant, for example ARM's NEON.
This patch teaches the loop vectorizer to only allow transformations of loops
that either contain no floating-point operations or have enough allowance
flags supporting lack of precision (ex. -ffast-math, Darwin).
For that, the target description now has a method which tells us if the
vectorizer is allowed to handle FP math without falling into unsafe
representations, plus a check on every FP instruction in the candidate loop
to check for the safety flags.
This commit makes LLVM behave like GCC with respect to ARM NEON support, but
it stops short of fixing the underlying problem: sub-normals. Neither GCC
nor LLVM have a flag for allowing sub-normal operations. Before this patch,
GCC only allows it using unsafe-math flags and LLVM allows it by default with
no way to turn it off (short of not using NEON at all).
As a first step, we push this change to make it safe and in sync with GCC.
The second step is to discuss a new sub-normal's flag on both communitues
and come up with a common solution. The third step is to improve the FastMath
flags in LLVM to encode sub-normals and use those flags to restrict NEON FP.
Fixes PR16275.
llvm-svn: 266363
https://llvm.org/bugs/show_bug.cgi?id=27105
We can check if all bits outside of a constant mask are set with a
single constant.
As noted in the bug report, although this form should be considered the
canonical IR, backends may want to transform this into an 'andn' / 'andc'
comparison against zero because that could be a single machine instruction.
Differential Revision: http://reviews.llvm.org/D18842
llvm-svn: 266362
Summary:
This adds the necessary target code to be able to run the ir translator.
Lowering function arguments and returns is a nop and there is no support
for RegBankSelect.
Reviewers: arsenm, qcolombet
Subscribers: arsenm, joker.eph, vkalintiris, llvm-commits
Differential Revision: http://reviews.llvm.org/D19077
llvm-svn: 266356
MachineInstr.h and MachineInstrBuilder.h are very popular headers,
widely included across all LLVM backends. It turns out that there only a
handful of TUs that actually care about DI operands on MachineInstrs.
After this change, touching DebugInfoMetadata.h and rebuilding llc only
needs 112 actions instead of 542.
llvm-svn: 266351
Summary:
If a PHI has an incoming undef, we can pretend that it is equal to one
non-undef, non-self incoming value.
This is particularly relevant in combination with the StructurizeCFG
pass, which introduces PHI nodes with undefs. Previously, this lead to
branch conditions that were uniform before StructurizeCFG to become
non-uniform afterwards, which confused the SIAnnotateControlFlow
pass.
This fixes a crash when Mesa radeonsi compiles a shader from
dEQP-GLES3.functional.shaders.switch.switch_in_for_loop_dynamic_vertex
Reviewers: arsenm, tstellarAMD, jingyue
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D19013
llvm-svn: 266347
Summary:
This fully solves the problem where the StructurizeCFG pass does not
consider the same branches as uniform as the SIAnnotateControlFlow pass.
The patch in D19013 helps with this problem, but is not sufficient
(and, interestingly, causes a "regression" with one of the existing
test cases).
No tests included here, because tests in D19013 already cover this.
Reviewers: arsenm, tstellarAMD
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D19018
llvm-svn: 266346
Summary:
This pass is unnecessary and overly conservative. It was motivated by
situations like
def %vreg0:SGPR_32
...
if-block:
..
def %vreg1:SGPR_32
...
else-block:
...
use %vreg0:SGPR_32
...
and similar situations with uses after the non-uniform control flow, where
we are not allowed to assign %vreg0 and %vreg1 to the same physical register,
even though in the original, thread/workitem-based CFG, it looks like the
live ranges of these registers do not overlap.
However, by the time register allocation runs, we have moved to a wave-based
CFG that accurately represents the fact that the wave may run through both
the if- and the else-block. So the live ranges of %vreg0 and %vreg1 already
overlap even without the SIFixSGPRLiveRanges pass.
In addition to proving this change correct, I have tested it with Piglit
and a small number of other tests.
Reviewers: arsenm, tstellarAMD
Subscribers: MatzeB, arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D19041
llvm-svn: 266345
Summary:
I've been carrying this change around with me for a while, because the if ()
managed to confuse me while following the code. All callers ensure that the
assertion holds.
Reviewers: arsenm, tstellarAMD
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D19042
llvm-svn: 266344
FastRegAlloc works only at the basic-block level and spills all live-out
registers. Unfortunately for a stack-based cmpxchg near the spill slots, this
can perpetually clear the exclusive monitor, which means the cmpxchg will never
succeed.
I believe the only way to handle this within LLVM is by expanding the loop
post-regalloc. We don't want this in general because it severely limits the
optimisations that can be done, so we limit this to -O0 compilations.
It's an ugly hack, and about the one good point in the whole mess is that we
can treat all cmpxchg operations in the most naive way possible (seq_cst, no
clrex faff) without affecting correctness.
Should fix PR25526.
llvm-svn: 266339
Summary:
For GL_ARB_compute_shader we need to support workgroup sizes of at least 1024. However, if we want to allow large workgroup sizes, we may need to use less registers, as we have to run more waves per SIMD.
This patch adds an attribute to specify the maximum work group size the compiled program needs to support. It defaults, to 256, as that has no wave restrictions.
Reducing the number of registers available is done similarly to how the registers were reserved for chips with the sgpr init bug.
Reviewers: mareko, arsenm, tstellarAMD, nhaehnle
Subscribers: FireBurn, kerberizer, llvm-commits, arsenm
Differential Revision: http://reviews.llvm.org/D18340
Patch By: Bas Nieuwenhuizen
llvm-svn: 266337
Summary:
The code previously always used s1 as it was using the user + system SGPR
information for compute kernels. This is incorrect for Mesa shaders though,
The register should be the next SGPR after all user and system SGPR's.
We use that Mesa adds arguments for all input and system SGPR's and
take the next available SGPR for the scratch wave offset register.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewers: mareko, arsenm, nhaehnle, tstellarAMD
Subscribers: qcolombet, arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D18941
Patch By: Bas Nieuwenhuizen
llvm-svn: 266336
Summary:
Add a print method to Predicated Scalar Evolution which prints all interesting
transformations done by PSE.
Loop Access Analysis will now print this as part of the analysis output.
We now use this to check the exact expression transformations that were done
by PSE in LAA.
The additional checking also acts as white-box testing for the getAsAddRec method.
Reviewers: anemet, sanjoy
Subscribers: sanjoy, mzolotukhin, llvm-commits
Differential Revision: http://reviews.llvm.org/D18792
llvm-svn: 266334
Alias 'jic $reg, 0' to 'jrc $reg' and 'jialc $reg, 0' to 'jalrc $reg' like
binutils.
This patch was previous committed as r266055 as seemed to have caused some spurious
test failures. They did not reappear after further local testing.
llvm-svn: 266301
The behavior of {MIN,MAX}NAN differs from that of {MIN,MAX}NUM when only
one of the inputs is NaN: -NUM will return the non-NaN argument while
-NAN would return NaN.
It is desirable to lower to @llvm.{min,max}num to -NAN if they don't
have a native instruction for -NUM. Notably, ARMv7 NEON's vmin has the
-NAN semantics.
N.B. Of course, it is only safe to do this if the intrinsic call is
marked nnan.
llvm-svn: 266279
This code was creating a new type in the global context, regardless
of which context the user is sitting in, what can possibly go wrong?
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 266275
At some point, ARM stopped getting any benefit from ConstantHoisting because
the pass called a different variant of getIntImmCost. Reimplementing the
correct variant revealed some problems, however:
+ ConstantHoisting was modifying switch statements. This is simply invalid,
the cases must remain integer constants no matter the notional cost.
+ ConstantHoisting was mangling alloca instructions in the entry block. These
should be handled by FrameLowering, so constants actually have a cost of 0.
Worse, the resulting bitcasts meant they became dynamic allocas.
rdar://25707382
llvm-svn: 266260
Fix a major bug from r265456. Although it's now much rarer, ValueMapper
sometimes has to duplicate cycles. The
might-transitively-reference-a-temporary counts don't decrement on their
own when there are cycles, and you need to call MDNode::resolveCycles to
fix it.
r265456 was checking the input nodes to see if they were unresolved.
This is useless; they should never be unresolved. Instead we should
check the output nodes and resolve cycles on them.
llvm-svn: 266258
Summary: LLVMAttribute has outlived its utility and is becoming a problem for C API users that what to use all the LLVM attributes. In order to help moving away from LLVMAttribute in a smooth manner, this diff introduce LLVMGetAttrKindIDInContext, which can be used instead of the enum values.
Reviewers: Wallbraker, whitequark, joker.eph, echristo
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D18749
llvm-svn: 266257
It is very likely that the swiftself parameter is alive throughout most
functions function so putting it into a callee save register should
avoid spills for the callers with only a minimum amount of extra spills
in the callees.
Currently the generated code is correct but unnecessarily spills and
reloads arguments passed in callee save registers, I will address this
in upcoming patches.
This also adds a missing check that for tail calls the preserved value
of the caller must be the same as the callees parameter.
Differential Revision: http://reviews.llvm.org/D18901
llvm-svn: 266253
It is very likely that the swiftself parameter is alive throughout most
functions function so putting it into a callee save register should
avoid spills for the callers with only a minimum amount of extra spills
in the callees.
Currently the generated code is correct but unnecessarily spills and
reloads arguments passed in callee save registers, I will address this
in upcoming patches.
This also adds a missing check that for tail calls the preserved value
of the caller must be the same as the callees parameter.
Differential Revision: http://reviews.llvm.org/D18902
llvm-svn: 266252
It is very likely that the swiftself parameter is alive throughout most
functions function so putting it into a callee save register should
avoid spills for the callers with only a minimum amount of extra spills
in the callees.
Currently the generated code is correct but unnecessarily spills and
reloads arguments passed in callee save registers, I will address this
in upcoming patches.
This also adds a missing check that for tail calls the preserved value
of the caller must be the same as the callees parameter.
Differential Revision: http://reviews.llvm.org/D19007
llvm-svn: 266251
And update the existing test cases in test/Object/macho-invalid.test
to use llvm-objdump with the -macho option to produce these
error messages and stop producing the generic "Invalid data
was encountered while parsing the file" message.
Working from the beginning of the file, if the mach header is too large for
the size of the file and then if the load commands that follow extend past
the end of the file these two errors now generate correct error messages.
Both of these have existing test cases in test/Object/macho-invalid.test .
But the first with macho-invalid-header it will never trigger the error message
"mach header extends past the end of the file" using any of the llvm tools as
they all use identify_magic() which rejects files with the correct magic number
that are too small in size. So I tested this by hacking that code and seeing the
error message down in parseHeader() really does happen. So in case there
is ever code in llvm that directly calls createMachOObjectFile() this error
message will be correctly produced.
The second error message of "load commands extends past the end of the file"
is triggered by a number of existing tests cases in test/Object/macho-invalid.test .
Also other tests trigger different error messages now like "ilocalsym plus
nlocalsym in LC_DYSYMTAB load command extends past the end of the
symbol table".
There are two existing test cases that still get the "Invalid data was encountered ..."
error messages that I will tackle next. But they will involve a bit of pluming an
Expect<...> up through the call stack and I want to do those as separate changes.
FYI, for those test cases that were trying to test specific errors that now get
different errors I’ll fix those in follow on changes and create new test cases
for those so they test the error they were meant to test.
llvm-svn: 266248
Summary:
When we are spilling SGPRs to scratch memory, we usually don't have
free SGPRs to do the address calculation, so we need to re-use the
ScratchOffset register for the calculation.
Reviewers: arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D18917
llvm-svn: 266244
Since we can't emit diagnostics for missing "jmp 1f" labels until the end of
the file, we need to be able to restore the context used to calculate
file/line. This is basically the "# line file" directive that's being used at
the time the expression is seen.
rdar://25706972
llvm-svn: 266238
LLVM optimization passes may reduce a profiled target expression
to a constant. Removing runtime calls at such instrumentation points
would help speedup the runtime of the instrumented program.
llvm-svn: 266229
This patch corresponds to review:
http://reviews.llvm.org/D17850
This patch implements the following instructions:
cmprb, cmpeqb, cnttzw, cnttzw., cnttzd, cnttzd.
llvm-svn: 266228
Disable LDP/STP for quads on Exynos M1 as they are not as efficient as pairs
of regular LDR/STR.
Patch by Abderrazek Zaafrani <a.zaafrani@samsung.com>.
llvm-svn: 266223
This patch fixes a bug (PR26827) when using anti-aliasing in store
merging. This sets the chain users of the component stores to point to
the new store instead of the component stores chain parent.
Reviewers: jyknight
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D18909
llvm-svn: 266217
Summary:
To be able to work accurately on the reference graph when taking decision
about internalizing, promoting, renaming, etc. We need to have the alias
information explicit.
Reviewers: tejohnson
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D18836
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 266214
Tests added along with implemented feature.
Note that there is a small leftover of unecessary MI sheduling issue
(more info in the review). CodeGen/AMDGPU/salu-to-valu.ll updated to fix
the false regression.
TODO: Support for TTMP quads, comma-separated syntax in "[]" and more.
Differential Revision: http://reviews.llvm.org/D17825
llvm-svn: 266205
Summary:
This is a special case for MIPS64 because the architecture requires
properly 32-bit sign-extended values in the register containers.
Additionaly, we merge consecutive trunc + AssertZExt nodes in order
to avoid unnecessary sign-extensions when the extension comes from a
type smaller than i32.
Reviewers: dsanders
Subscribers: dsanders, sdardis, llvm-commits
Differential Revision: http://reviews.llvm.org/D18893
llvm-svn: 266203
This patch fixes calculating of builtin_object_size if it depends on a
condition. Before this patch compiler did not know how to calculate the
object size when it finds a condition that cannot be eliminated.
This patch enables calculating of builtin_object_size even in case when
condition cannot be eliminated by choosing minimum or maximum value as a
result from condition. Choosing minimum or maximum value from condition
is based on the second argument of __builtin_object_size function.
Patch by Strahinja Petrovic.
Differential Revision: http://reviews.llvm.org/D18438
llvm-svn: 266193
Differential Revision: http://reviews.llvm.org/D17137
This patch was reverted after the revertion of dependant patch http://reviews.llvm.org/D17068.
There was the problem with test-suite failure.
The problem is hopefully solved with dependant patch so this patch is commited again.
llvm-svn: 266179
Remove an ad-hoc transform in InstCombine and replace it with more
general machinery (ValueTracking, InstructionSimplify and VectorUtils).
This fixes PR27332.
llvm-svn: 266175
It is now only doing the update to the llvm.compiler_used global.
The client has to call separately the internalization stage.
Hopefully the code is simpler to understand this way.
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 266174
Differential Revision: http://reviews.llvm.org/D17068
This changes contains fix for failing test-suite. So, this patch should hopefully work now.
llvm-svn: 266171