Share code for the (mostly problematic) embedded sentinel traits.
- Move the LLVM_NO_SANITIZE("object-size") attribute to
ilist_half_embedded_sentinel_traits and ilist_embedded_sentinel_traits
(previously it spread throughout the code duplication).
- Add an ilist_full_embedded_sentinel_traits which has no UB (but has
the downside of storing the complete node).
- Replace all the custom sentinel traits in LLVM with a declaration of
ilist_sentinel_traits that inherits from one of the embedded sentinel
traits classes.
There are still custom sentinel traits in other LLVM subprojects. I'll
remove those in a follow-up.
Nothing at all should be changing here, this is just rearranging code.
Note that the final goal here is to remove the sentinel traits
altogether, settling on the memory layout of
ilist_half_embedded_sentinel_traits without the UB. This intermediate
step moves the logic into ilist.h.
llvm-svn: 278513
lto::InputFile::Symbol::getCommonSize should return uint64_t instead of
size_t since it is returning the result of DataLayout::getTypeAllocSize
which returns uint64_t, and the result of getCommonSize is assigned to a
uint64_t variable. On 32-bit builds size_t is unsigned int and there are
type errors. This was introduced in r278338.
llvm-svn: 278512
Summary:
Port the NameAnonFunction pass and add a test.
Depends on D23439.
Reviewers: mehdi_amini
Subscribers: llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D23440
llvm-svn: 278509
Summary:
Port the ModuleSummaryAnalysisWrapperPass to the new pass manager.
Use it in the ported BitcodeWriterPass (similar to how we use the
legacy ModuleSummaryAnalysisWrapperPass in the legacy WriteBitcodePass).
Also, pass the -module-summary opt flag through to the new pass
manager pipeline and through to the bitcode writer pass, and add
a test that uses it.
Reviewers: mehdi_amini
Subscribers: llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D23439
llvm-svn: 278508
Summary:
1. Make coroutine representation more robust against optimization that may duplicate instruction by introducing coro.id intrinsics that returns a token that will get fed into coro.alloc and coro.begin. Due to coro.id returning a token, it won't get duplicated and can be used as reliable indicator of coroutine identify when a particular coroutine call gets inlined.
2. Move last three arguments of coro.begin into coro.id as they will be shared if coro.begin will get duplicated.
3. doc + test + code updated to support the new intrinsic.
Reviewers: mehdi_amini, majnemer
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D23412
llvm-svn: 278481
Remove all ilist_iterator to pointer casts. There were two reasons for
casts:
- Checking for an uninitialized (i.e., null) iterator. I added
MachineInstrBundleIterator::isValid() to check for that case.
- Comparing an iterator against the underlying pointer value while
avoiding converting the pointer value to an iterator. This is
occasionally necessary in MachineInstrBundleIterator, since there is
an assertion in the constructors that the underlying MachineInstr is
not bundled (but we don't care about that if we're just checking for
pointer equality).
To support the latter case, I rewrote the == and != operators for
ilist_iterator and MachineInstrBundleIterator.
- The implicit constructors now use enable_if to exclude
const-iterator => non-const-iterator conversions from overload
resolution (previously it was a compiler error on instantiation, now
it's SFINAE).
- The == and != operators are now global (friends), and are not
templated.
- MachineInstrBundleIterator has overloads to compare against both
const_pointer and const_reference. This avoids the implicit
conversions to MachineInstrBundleIterator that assert, instead just
checking the address (and I added unit tests to confirm this).
Notably, the only remaining uses of ilist_iterator::getNodePtrUnchecked
are in ilist.h, and no code outside of ilist*.h directly relies on this
UB end-iterator-to-pointer conversion anymore. It's still needed for
ilist_*sentinel_traits, but I'll clean that up soon.
llvm-svn: 278478
Allow an ilist_iterator to be constructed from an ilist_node, and give
access to the underlying ilist_node as well.
This will be used immediately in lld to support a type-erasure use case.
Longer term, they'll stick around once the iterator is using
ilist_node<NodeTy>* instead of NodeTy*.
llvm-svn: 278467
"insert_subreg, subreg_to_reg, and reg_sequence" instructions' after
adjusting some unittest checks.
This is to solve PR28852. The restriction was added at 2010 to make better register
coalescing. We assumed that it was not necessary any more. Testing results on x86
supported the assumption.
We will look closely to any performance impact it will bring and will be prepared
to help analyzing performance problem found on other architectures.
Differential Revision: https://reviews.llvm.org/D23210
llvm-svn: 278466
Summary: This is a follow up to r278389, where I have introduced the bug
Reviewers: mehdi_amini
Differential Revision: https://reviews.llvm.org/D23436
llvm-svn: 278442
Summary:
Notice that the data layout is changed: instead of using
std::pair<PointerIntPair<NodeType*, 1>, ChildItTy>, now use
std::pair<NodeRef, Optional<ChildItTy>>.
A NFC but worth noticing change is operator==(), since we only compare
an iterator against end(), it's better to put an assert there and make
people noticed when it fails.
Reviewers: dblaikie, chandlerc
Subscribers: mzolotukhin, llvm-commits
Differential Revision: https://reviews.llvm.org/D23146
llvm-svn: 278437
Summary:
This patch adds IsVariadicFunction bit to summary in order
to not import variadic functions. Inliner doesn't inline
variadic functions because it is hard to reason about it.
This one small fix improves Importer by about 16%
(going from 86% to 100% of imported functions that are
inlined anywhere)
on some spec benchmarks like 'int' and others.
Reviewers: eraman, mehdi_amini, tejohnson
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D23339
llvm-svn: 278432
Summary:
This is an extension of the fix in r271424. That fix dealt with builder
insert points being moved by SCEV expansion, but only for the lifetime
of the expand call. This change modifies the interface so that LSR can
safely call expand multiple times at the same insert point and do the
right thing if one of the expansions decides to move the original insert
point.
This is a fix for PR28719.
Reviewers: sanjoy
Subscribers: llvm-commits, mcrosier, mzolotukhin
Differential Revision: https://reviews.llvm.org/D23342
llvm-svn: 278413
Summary: Make Optional's behavior the same as the coming std::optional.
Reviewers: dblaikie
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D23178
llvm-svn: 278397
Summary:
Keep track of all methods for which we have devirtualized at least
one call and then print them sorted alphabetically. That allows to
avoid duplicates and also makes the order deterministic.
Add optimization names into the remarks, so that it's easier to
understand how has each method been devirtualized.
Fix a bug when wrong methods could have been reported for
tryVirtualConstProp.
Reviewers: kcc, mehdi_amini
Differential Revision: https://reviews.llvm.org/D23297
llvm-svn: 278389
subreg_to_reg, and reg_sequence" instructions.
This is to solve PR28852. The restriction was added at 2010 to make better register
coalescing. We assumed that it was not necessary any more. Testing results on x86
supported the assumption.
We will look closely to any performance impact it will bring and will be prepared
to help analyzing performance problem found on other architectures.
Differential Revision: https://reviews.llvm.org/D23210
llvm-svn: 278384
This adds a createFunctionInliningPass pass that takes an InlineParams object and use this to create the pre-inliner pass. This prevents the regular inliner's threshold flag from influencing the preinliner.
Differential revision: https://reviews.llvm.org/D23377
llvm-svn: 278377
ExecutionEngine::runFunction is supposed to allow execution of arbitrary
function types, but MCJIT can only reasonably support a limited subset of
main-linke function types. This patch documents this limitation, and fixes
MCJIT::runFunction to abort with a meaningful error at runtime if called with
an unsupported function type.
llvm-svn: 278348
This restores commit r278330, with fixes for a few bot failures:
- Fix a late change I had made to the save temps output file that I
missed due to existing files sitting on my disk
- Fix a bunch of Windows bot failures with "ambiguous call to overloaded
function" due to confusion between llvm::make_unique vs
std::make_unique (preface the new make_unique calls with "llvm::")
- Attempt to fix a modules bot failure by adding a missing include
to LTO/Config.h.
Original change:
Resolution-based LTO API.
Summary:
This introduces a resolution-based LTO API. The main advantage of this API over
existing APIs is that it allows the linker to supply a resolution for each
symbol in each object, rather than the combined object as a whole. This will
become increasingly important for use cases such as ThinLTO which require us
to process symbol resolutions in a more complicated way than just adjusting
linkage.
Patch by Peter Collingbourne.
Reviewers: rafael, tejohnson, mehdi_amini
Subscribers: lhames, tejohnson, mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D20268
llvm-svn: 278338
This reverts commit r278330.
I made a change to the save temps output that is causing issues with the
bots. Didn't realize this because I had older output files sitting on
disk in my test output directory.
llvm-svn: 278331
Summary:
This introduces a resolution-based LTO API. The main advantage of this API over
existing APIs is that it allows the linker to supply a resolution for each
symbol in each object, rather than the combined object as a whole. This will
become increasingly important for use cases such as ThinLTO which require us
to process symbol resolutions in a more complicated way than just adjusting
linkage.
Patch by Peter Collingbourne.
Reviewers: rafael, tejohnson, mehdi_amini
Subscribers: lhames, tejohnson, mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D20268
Address review comments
llvm-svn: 278330
It's more than just inttoptr, but the others can't be tested until we have
support for non-trivial constants (they currently get unavoidably folded to a
ConstantInt).
llvm-svn: 278303
Summary:
This patch define and implement amdgcn image intrinsics with sampler.
1. define vdata type to be llvm_anyfloat_ty, address type to be llvm_anyfloat_ty,
and rsrc type to be llvm_anyint_ty. As a result, we expect the intrinsics name
to have three suffixes to overload each of these three types;
2. D128 as well as two other flags are implied in the three types, for example,
if you use v8i32 as resource type, then r128 is 0!
3. don't expose TFE flag, and other flags are exposed in the instruction order:
unrm, glc, slc, lwe and da.
Differential Revision: http://reviews.llvm.org/D22838
Reviewed by:
arsenm and tstellarAMD
llvm-svn: 278291
Summary:
I think it is much better this way.
When I firstly saw line:
Cost += InlineConstants::LastCallToStaticBonus;
I though that this is a bug, because everywhere where the cost is being reduced
it is usuing -=.
Reviewers: eraman, tejohnson, mehdi_amini
Subscribers: llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D23222
llvm-svn: 278290
Summary: make_scope_exit() is described in C++ proposal p0052r2, which uses RAII to do cleanup works at scope exit.
Reviewers: chandlerc
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D22796
llvm-svn: 278251
Summary:
A particular coroutine usage pattern, where a coroutine is created, manipulated and
destroyed by the same calling function, is common for coroutines implementing
RAII idiom and is suitable for allocation elision optimization which avoid
dynamic allocation by storing the coroutine frame as a static `alloca` in its
caller.
coro.free and coro.alloc intrinsics are used to indicate which code needs to be suppressed
when dynamic allocation elision happens:
```
entry:
%elide = call i8* @llvm.coro.alloc()
%need.dyn.alloc = icmp ne i8* %elide, null
br i1 %need.dyn.alloc, label %coro.begin, label %dyn.alloc
dyn.alloc:
%alloc = call i8* @CustomAlloc(i32 4)
br label %coro.begin
coro.begin:
%phi = phi i8* [ %elide, %entry ], [ %alloc, %dyn.alloc ]
%hdl = call i8* @llvm.coro.begin(i8* %phi, i32 0, i8* null,
i8* bitcast ([2 x void (%f.frame*)*]* @f.resumers to i8*))
```
and
```
%mem = call i8* @llvm.coro.free(i8* %hdl)
%need.dyn.free = icmp ne i8* %mem, null
br i1 %need.dyn.free, label %dyn.free, label %if.end
dyn.free:
call void @CustomFree(i8* %mem)
br label %if.end
if.end:
...
```
If heap allocation elision is performed, we replace coro.alloc with a static alloca on the caller frame and coro.free with null constant.
Also, we need to make sure that if there are any tail calls referencing the coroutine frame, we need to remote tail call attribute, since now coroutine frame lives on the stack.
Documentation and overview is here: http://llvm.org/docs/Coroutines.html.
Upstreaming sequence (rough plan)
1.Add documentation. (https://reviews.llvm.org/D22603)
2.Add coroutine intrinsics. (https://reviews.llvm.org/D22659)
3.Add empty coroutine passes. (https://reviews.llvm.org/D22847)
4.Add coroutine devirtualization + tests.
ab) Lower coro.resume and coro.destroy (https://reviews.llvm.org/D22998)
c) Do devirtualization (https://reviews.llvm.org/D23229)
5.Add CGSCC restart trigger + tests. (https://reviews.llvm.org/D23234)
6.Add coroutine heap elision + tests. <= we are here
7.Add the rest of the logic (split into more patches)
Reviewers: mehdi_amini, majnemer
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D23245
llvm-svn: 278242
This adds an InlineParams struct which is populated from the command line options by getInlineParams and passed to getInlineCost for the call analyzer to use.
Differential revision: https://reviews.llvm.org/D22120
llvm-svn: 278189
Summary:
The inliner not being a function pass requires the work-around of
generating the OptimizationRemarkEmitter and in turn BFI on demand.
This will go away after the new PM is ready.
BFI is only computed inside ORE if the user has requested hotness
information for optimization diagnostitics (-pass-remark-with-hotness at
the 'opt' level). Thus there is no additional overhead without the
flag.
Reviewers: hfinkel, davidxl, eraman
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D22694
llvm-svn: 278185
In case there are compilers that support neither __FUNCSIG__ or
__PRETTY_FUNCTION__, we fall back to __func__ as a last resort,
which should be guaranteed by C++11 and C99.
llvm-svn: 278176
MSVC doesn't have this, it only has __FUNCSIG__. So this adds
a new macro called LLVM_PRETTY_FUNCTION which evaluates to the
right thing on any platform.
llvm-svn: 278170
For now put them all in the entry block. This should be correct but may give
poor runtime performance. Hopefully MachineSinking combined with
isReMaterializable can solve those issues, but if not the interface is sound
enough to support alternatives.
llvm-svn: 278168
The patch is to fix the bug in PR28705. It was caused by setting wrong return
value for SCEVExpander::findExistingExpansion. The return values of findExistingExpansion
have different meanings when the function is used in different ways so it is easy to make
mistake. The fix creates two new interfaces to replace SCEVExpander::findExistingExpansion,
and specifies where each interface is expected to be used.
Differential Revision: https://reviews.llvm.org/D22942
llvm-svn: 278161
The fix for PR28705 will be committed consecutively.
In D12090, the ExprValueMap was added to reuse existing value during SCEV expansion.
However, const folding and sext/zext distribution can make the reuse still difficult.
A simplified case is: suppose we know S1 expands to V1 in ExprValueMap, and
S1 = S2 + C_a
S3 = S2 + C_b
where C_a and C_b are different SCEVConstants. Then we'd like to expand S3 as
V1 - C_a + C_b instead of expanding S2 literally. It is helpful when S2 is a
complex SCEV expr and S2 has no entry in ExprValueMap, which is usually caused
by the fact that S3 is generated from S1 after const folding.
In order to do that, we represent ExprValueMap as a mapping from SCEV to
ValueOffsetPair. We will save both S1->{V1, 0} and S2->{V1, C_a} into the
ExprValueMap when we create SCEV for V1. When S3 is expanded, it will first
expand S2 to V1 - C_a because of S2->{V1, C_a} in the map, then expand S3 to
V1 - C_a + C_b.
Differential Revision: https://reviews.llvm.org/D21313
llvm-svn: 278160
One exception here is LoopInfo which must forward-declare it (because
the typedef is in LoopPassManager.h which depends on LoopInfo).
Also, some includes for LoopPassManager.h were needed since that file
provides the typedef.
Besides a general consistently benefit, the extra layer of indirection
allows the mechanical part of https://reviews.llvm.org/D23256 that
requires touching every transformation and analysis to be factored out
cleanly.
Thanks to David for the suggestion.
llvm-svn: 278079
Besides a general consistently benefit, the extra layer of indirection
allows the mechanical part of https://reviews.llvm.org/D23256 that
requires touching every transformation and analysis to be factored out
cleanly.
Thanks to David for the suggestion.
llvm-svn: 278078
Besides a general consistently benefit, the extra layer of indirection
allows the mechanical part of https://reviews.llvm.org/D23256 that
requires touching every transformation and analysis to be factored out
cleanly.
Thanks to David for the suggestion.
llvm-svn: 278077
The DebugDirectory contains a pointer to the CodeView info structure which is a
derivative of the OMF debug directory. The structure has evolved a bit over
time, and PDB 2.0 used a slightly different definition from PDB 7.0. Both of
these are specific to CodeView and not COFF. Reflect this by moving the
structure definitions into the DebugInfo/CodeView headers. Define a generic
DebugInfo union type that can be used to pass around a reference to the
DebugInfo irrespective of the versioning. NFC.
llvm-svn: 278075
This reverts commit r278048. Something changed between the last time I
built this--it takes awhile on my ridiculously slow and ancient
computer--and now that broke this.
llvm-svn: 278053
Summary:
Based on two patches by Michael Mueller.
This is a target attribute that causes a function marked with it to be
emitted as "hotpatchable". This particular mechanism was originally
devised by Microsoft for patching their binaries (which they are
constantly updating to stay ahead of crackers, script kiddies, and other
ne'er-do-wells on the Internet), but is now commonly abused by Windows
programs to hook API functions.
This mechanism is target-specific. For x86, a two-byte no-op instruction
is emitted at the function's entry point; the entry point must be
immediately preceded by 64 (32-bit) or 128 (64-bit) bytes of padding.
This padding is where the patch code is written. The two byte no-op is
then overwritten with a short jump into this code. The no-op is usually
a `movl %edi, %edi` instruction; this is used as a magic value
indicating that this is a hotpatchable function.
Reviewers: majnemer, sanjoy, rnk
Subscribers: dberris, llvm-commits
Differential Revision: https://reviews.llvm.org/D19908
llvm-svn: 278048
Summary:
Ensure that the MemorySSA object never changes address when using the
new pass manager since the walkers contained by MemorySSA cache pointers
to it at construction time. This is achieved by wrapping the
MemorySSAAnalysis result in a unique_ptr. Also add some asserts that
check for this bug.
Reviewers: george.burgess.iv, dberlin
Subscribers: mcrosier, hfinkel, chandlerc, silvas, llvm-commits
Differential Revision: https://reviews.llvm.org/D23171
llvm-svn: 278028
This patch adds support for some new relocation models to the ARM
backend:
* Read-only position independence (ROPI): Code and read-only data is accessed
PC-relative. The offsets between all code and RO data sections are known at
static link time. This does not affect read-write data.
* Read-write position independence (RWPI): Read-write data is accessed relative
to the static base register (r9). The offsets between all writeable data
sections are known at static link time. This does not affect read-only data.
These two modes are independent (they specify how different objects
should be addressed), so they can be used individually or together. They
are otherwise the same as the "static" relocation model, and are not
compatible with SysV-style PIC using a global offset table.
These modes are normally used by bare-metal systems or systems with
small real-time operating systems. They are designed to avoid the need
for a dynamic linker, the only initialisation required is setting r9 to
an appropriate value for RWPI code.
I have only added support to SelectionDAG, not FastISel, because
FastISel is currently disabled for bare-metal targets where these modes
would be used.
Differential Revision: https://reviews.llvm.org/D23195
llvm-svn: 278015
For some reason, MSVC2013's cl.exe crashes with
fatal error C1001: An internal error has occurred in the compiler
with this when compiling e.g. LoopDistribute.cpp.
llvm-svn: 278011
Summary:
They are now lexed as a single token on targets where
MCAsmInfo::HasMipsExpressions is true and then parsed in a similar way to
the '~' operator as part of MCExpr::parseExpression.
As a result:
* expressions and immediates no longer have different parsing rules. The
difference is now solely down to whether evaluateAsAbsolute() succeeds.
* %hi(%neg(%gp_rel(x))) are no longer parsed as a single operator and
decomposed into the three MipsMCExpr nodes. They are parsed directly as
three MipsMCExpr nodes.
* parseMemOperand no longer needs to eat all the surrounding parenthesis
to get at the outermost operator to make this work
* %hi(%neg(%gp_rel(x))) and %lo(%neg(%gp_rel(x))) are no longer the only
3-in-1 relocs that parse for N64. They're still the only combinations that
are permitted in relocatable expressions though. Fixing that should be a
later patch.
* We no longer need to list all the tokens that can occur as the first token of
an expression or immediate.
test/MC/Mips/expr1.s:
This change also prevents the incorrect lowering of %lo(2*4)+foo to
%lo(8+foo) which is not an equivalent expression (the difference is
whether foo is truncated to 16-bit or not) and the test has been
updated to account for the macro expansion the correct expression requires.
Reviewers: sdardis
Subscribers: dsanders, sdardis, llvm-commits
Differential Revision: https://reviews.llvm.org/D23110
llvm-svn: 277988
Summary:
The correctness fix here is that when we CSE a load with another load,
we need to combine the metadata on the two loads. This matches the
behavior of other passes, like instcombine and GVN.
There's also a minor optimization improvement here: for load PRE, the
aliasing metadata on the inserted load should be the same as the
metadata on the original load. Not sure why the old code was throwing
it away.
Issue found by inspection.
Differential Revision: http://reviews.llvm.org/D21460
llvm-svn: 277977
Summary:
CoroSplit pass processes the coroutine twice. First, it lets it go through
complete IPO optimization pipeline as a single function. It forces restart
of the pipeline by inserting an indirect call to an empty function "coro.devirt.trigger"
which is devirtualized by CoroElide pass that triggers a restart of the pipeline by CGPassManager.
(In later patches, when CoroSplit pass sees the same coroutine the second time, it splits it up,
adds coroutine subfunctions to the SCC to be processed by IPO pipeline.)
Documentation and overview is here: http://llvm.org/docs/Coroutines.html.
Upstreaming sequence (rough plan)
1.Add documentation. (https://reviews.llvm.org/D22603)
2.Add coroutine intrinsics. (https://reviews.llvm.org/D22659)
3.Add empty coroutine passes. (https://reviews.llvm.org/D22847)
4.Add coroutine devirtualization + tests.
ab) Lower coro.resume and coro.destroy (https://reviews.llvm.org/D22998)
c) Do devirtualization (https://reviews.llvm.org/D23229)
5.Add CGSCC restart trigger + tests. <= we are here
6.Add coroutine heap elision + tests.
7.Add the rest of the logic (split into more patches)
Reviewers: mehdi_amini, majnemer
Subscribers: llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D23234
llvm-svn: 277936
Summary:
This is the 4c patch of the coroutine series. CoroElide pass now checks if PostSplit coro.begin
is referenced by coro.subfn.addr intrinsics. If so replace coro.subfn.addrs with an appropriate coroutine
subfunction associated with that coro.begin.
Documentation and overview is here: http://llvm.org/docs/Coroutines.html.
Upstreaming sequence (rough plan)
1.Add documentation. (https://reviews.llvm.org/D22603)
2.Add coroutine intrinsics. (https://reviews.llvm.org/D22659)
3.Add empty coroutine passes. (https://reviews.llvm.org/D22847)
4.Add coroutine devirtualization + tests.
ab) Lower coro.resume and coro.destroy (https://reviews.llvm.org/D22998)
c) Do devirtualization <= we are here
5.Add CGSCC restart trigger + tests.
6.Add coroutine heap elision + tests.
7.Add the rest of the logic (split into more patches)
Reviewers: majnemer
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D23229
llvm-svn: 277908
It breaks ExecutionEngine/OrcLazy/weak-function.ll on most bots.
Script:
--
...
--
Exit Code: 1
Command Output (stderr):
--
Could not find main function.
llvm-svn: 277907
This adds partial support for weak functions to the CompileOnDemandLayer by
modifying the addLogicalModule method to check for existing stub definitions
before building a new stub for a weak function. This scheme is sufficient to
support ODR definitions, but fails for general weak definitions if strong
definition is encountered after the first weak definition. (A more extensive
refactor will be required to fully support weak symbols).
This patch does *not* add weak symbol support to RuntimeDyld: I hope to add
that in the near future.
llvm-svn: 277896
This resubmits a3770391c5fb64108d565e12f61dd77ce71b5b4f,
which was reverted due to breakages on non-Windows machines.
Due to differences in template instantiation rules on Microsoft
and non-Microsoft platforms, a member access restriction was
triggering on non-Microsoft compilers. Previously, a friend
declaration for std::vector<> had been introduced into the
DebugMap class to make the member access restriction pass,
but the introduction of support for SmallVector<> meant that
an additional friend declaration would need to be added.
This didn't really make a lot of sense since the user of the
macro is probably only using one type (SmallVector<>, vector<>,
etc) and we could in theory add support for even more types
to this macro in the future (e.g. std::deque), so rather than
add another friend declaration, I just made the type being
referenced a public nested typedef instead of a private nested
typedef.
llvm-svn: 277888
Until now, our use case for the visitor has been to take a stream of bytes
representing a type stream, deserialize the records in sequence, and do
something with them, where "something" is determined by how the user
implements a particular set of callbacks on an abstract class.
For actually writing PDBs, however, we want to do the reverse. We have
some kind of description of the list of records in their in-memory format,
and we want to process each one. Perhaps by serializing them to a byte
stream, or perhaps by converting them from one description format (Yaml)
to another (in-memory representation).
This was difficult in the current model because deserialization and
invoking the callbacks were tightly coupled.
With this patch we change this so that TypeDeserializer is itself an
implementation of the particular set of callbacks. This decouples
deserialization from the iteration over a list of records and invocation
of the callbacks. TypeDeserializer is initialized with another
implementation of the callback interface, so that upon deserialization it
can pass the deserialized record through to the next set of callbacks. In
a sense this is like an implementation of the Decorator design pattern,
where the Deserializer is a decorator.
This will be useful for writing Pdbs from yaml, where we have a
description of the type records in Yaml format. In this case, the visitor
implementation would have each visitation callback method implemented in
such a way as to extract the proper set of fields from the Yaml, and it
could maintain state that builds up a list of these records. Finally at
the end we can pass this information through to another set of callbacks
which serializes them into a byte stream.
Reviewed By: majnemer, ruiu, rnk
Differential Revision: https://reviews.llvm.org/D23177
llvm-svn: 277871
Currently YAML sequences require std::vectors. All of the methods that the
YAML parser accesses though are present in SmallVector, so there's no
reason we can't support SmallVector inherently. This patch does that.
Reviewed By: majnemer
Differential Revision: https://reviews.llvm.org/D23213
llvm-svn: 277870
The analysis manager was made not optional and turned into a
reference instead of a pointer in r272978. Some comments were
still refering to the previous behavior.
llvm-svn: 277857
This prevents handles from being invalidated (through iterator invalidation)
when new modules are added.
No test-case yet: This bug was uncovered during work on an upcoming patch for
weak symbol support and the testcase for that feature will implicitly test for
correct behavior here.
llvm-svn: 277847
This differs from the previous version by being more careful about template
instantiation/specialization in order to prevent errors when building with
clang -Werror. Specifically:
* begin is not defined in the template and is instead instantiated when Head
is. I think the warning when we don't do that is wrong (PR28815) but for now
at least do it this way to avoid the warning.
* Instead of performing template specializations in LLVM_INSTANTIATE_REGISTRY
instead provide a template definition then do explicit instantiation. No
compiler I've tried has problems with doing it the other way, but strictly
speaking it's not permitted by the C++ standard so better safe than sorry.
Original commit message:
Currently the Registry class contains the vestiges of a previous attempt to
allow plugins to be used on Windows without using BUILD_SHARED_LIBS, where a
plugin would have its own copy of a registry and export it to be imported by
the tool that's loading the plugin. This only works if the plugin is entirely
self-contained with the only interface between the plugin and tool being the
registry, and in particular this conflicts with how IR pass plugins work.
This patch changes things so that instead the add_node function of the registry
is exported by the tool and then imported by the plugin, which solves this
problem and also means that instead of every plugin having to export every
registry they use instead LLVM only has to export the add_node functions. This
allows plugins that use a registry to work on Windows if
LLVM_EXPORT_SYMBOLS_FOR_PLUGINS is used.
llvm-svn: 277806
Add a generalized IRBuilderCallbackInserter, which is just given a
callback to execute after insertion. This can be used to get rid of
the custom inserter in InstCombine, which will in turn allow me to add
target specific InstCombineCalls API for intrinsics without horrible
layering violations.
llvm-svn: 277784
This is the forth patch in the coroutine series. CoroEaly pass now lowers coro.resume
and coro.destroy intrinsics by replacing them with an indirect call to an address
returned by coro.subfn.addr intrinsic. This is done so that CGPassManager recognizes
devirtualization when CoroElide replaces a call to coro.subfn.addr with an appropriate
function address.
Patch by Gor Nishanov!
Differential Revision: https://reviews.llvm.org/D22998
llvm-svn: 277765
Summary:
TargetBaseAlign is no longer required since LSV checks if target allows misaligned accesses.
A constant defining a base alignment is still needed for stack accesses where alignment can be adjusted.
Previous patch (D22936) was reverted because tests were failing. This patch also fixes the cause of those failures:
- x86 failing tests either did not have the right target, or the right alignment.
- NVPTX failing tests did not have the right alignment.
- AMDGPU failing test (merge-stores) should allow vectorization with the given alignment but the target info
considers <3xi32> a non-standard type and gives up early. This patch removes the condition and only checks
for a maximum size allowed and relies on the next condition checking for %4 for correctness.
This should be revisited to include 3xi32 as a MVT type (on arsenm's non-immediate todo list).
Note that checking the sizeInBits for a MVT is undefined (leads to an assertion failure),
so we need to create an EVT, hence the interface change in allowsMisaligned to include the Context.
Reviewers: arsenm, jlebar, tstellarAMD
Subscribers: jholewinski, arsenm, mzolotukhin, llvm-commits
Differential Revision: https://reviews.llvm.org/D23068
llvm-svn: 277735
On modern Intel processors hardware SQRT in many cases is faster than RSQRT
followed by Newton-Raphson refinement. The patch introduces a simple heuristic
to choose between hardware SQRT instruction and Newton-Raphson software
estimation.
The patch treats scalars and vectors differently. The heuristic is that for
scalars the compiler should optimize for latency while for vectors it should
optimize for throughput. It is based on the assumption that throughput bound
code is likely to be vectorized.
Basically, the patch disables scalar NR for big cores and disables NR completely
for Skylake. Firstly, scalar SQRT has shorter latency than NR code in big cores.
Secondly, vector SQRT has been greatly improved in Skylake and has better
throughput compared to NR.
Differential Revision: https://reviews.llvm.org/D21379
llvm-svn: 277725
overloaded (and simpler).
Sean rightly pointed out in code review that we've started using
"wrapper pass" as a specific part of the old pass manager, and in fact
it is more applicable there. Here, we really have a pass *template* to
build a repeated pass, so call it that.
llvm-svn: 277689
pdbdump calls DbiStreamBuilder::commit through PDBFileBuilder::commit
without calling DbiStreamBuilder::finalize. Because `finalize` initializes
`Header` member, `Header` remained nullptr which caused a crash bug.
Differential Revision: https://reviews.llvm.org/D23143
llvm-svn: 277681
changing them to Expected<> to allow them to pass through llvm Errors.
No functional change.
This commit by itself will break the next lld builds. I’ll be committing the
matching change for lld immediately next.
llvm-svn: 277656
This reverts commit the revert commit r277627. The build errors
mentioned in r277627 were likely caused by an unclean build directory.
Sorry for the noise.
llvm-svn: 277630
This reverts commit r277540. It breaks the build with:
../lib/Object/Archive.cpp:264:41: error: return type of out-of-line definition of 'llvm::object::ArchiveMemberHeader::getUID' differs from that in the declaration
Expected<unsigned> ArchiveMemberHeader::getUID() const {
~~~~~~~~~~~~~~~~~~ ^
include/llvm/Object/Archive.h:53:12: note: previous declaration is here
unsigned getUID() const;
~~~~~~~~ ^
llvm-svn: 277627
MappedBlockSTream can work with any sequence of block data where
the ordering is specified by a list of block numbers. So rather
than manually stitch them together in the case of the FPM, reuse
this functionality so that we can treat the FPM as if it were
contiguous.
Reviewed By: ruiu
Differential Revision: https://reviews.llvm.org/D23066
llvm-svn: 277609
required by MSVC 2013.
This also makes the repeating pass wrapper assignable. Mildly
unfortunate as it means we can't use a const member for the int, but
that is a really minor invariant to try to preserve at the cost of loss
of regularity of the type. Yet another annoyance of the particular C++
object / move semantic model.
llvm-svn: 277582
manager.
While this has some utility for debugging and testing on its own, it is
primarily intended to demonstrate the technique for adding custom
wrappers that can provide more interesting interation behavior in
a nice, orthogonal, and composable layer.
Being able to write these kinds of very dynamic and customized controls
for running passes was one of the motivating use cases of the new pass
manager design, and this gives a hint at how they might look. The actual
logic is tiny here, and most of this is just wiring in the pipeline
parsing so that this can be widely used.
I'm adding this now to show the wiring without a lot of business logic.
This is a precursor patch for showing how a "iterate up to N times as
long as we devirtualize a call" utility can be added as a separable and
composable component along side the CGSCC pass management.
Differential Revision: https://reviews.llvm.org/D22405
llvm-svn: 277581
reason about and less error prone.
The core idea is to fully parse the text without trying to identify
passes or structure. This is done with a single state machine. There
were various bugs in the logic around this previously that were repeated
and scattered across the code. Having a single routine makes it much
easier to fix and get correct. For example, this routine doesn't suffer
from PR28577.
Then the actual pass construction is handled using *much* easier to read
code and simple loops, with particular pass manager construction sunk to
live with other pass construction. This is especially nice as the pass
managers *are* in fact passes.
Finally, the "implicit" pass manager synthesis is done much more simply
by forming "pre-parsed" structures rather than having to duplicate tons
of logic.
One of the bugs fixed by this was evident in the tests where we accepted
a pipeline that wasn't really well formed. Another bug is PR28577 for
which I have added a test case.
The code is less efficient than the previous code but I'm really hoping
that's not a priority. ;]
Thanks to Sean for the review!
Differential Revision: https://reviews.llvm.org/D22724
llvm-svn: 277561
Move those two options to llc:
The options in CommandFlags.h are shared by dsymutil, gold, llc,
llvm-dwp, llvm-lto, llvm-mc, lto, opt.
-stop-after/-start-after only affect codegen passes however only gold and llc
actually create codegen passes and I believe these flags to be only
useful for users of llc. For the other tools they are just highly
confusing: -stop-after claims to "Stop compilation after a specific
pass" which is not true in the context of the "opt" tool.
Differential Revision: https://reviews.llvm.org/D23050
llvm-svn: 277551
None of GlobalISel requires the property, but this lets us use the
verifier instead of rolling our own "all instructions selected" check.
llvm-svn: 277484
Selected: the InstructionSelect pass ran and all pre-isel generic
instructions have been eliminated; i.e., all instructions are now
target-specific or non-pre-isel generic instructions (e.g., COPY).
Since only pre-isel generic instructions can have generic virtual register
operands, this also means that all generic virtual registers have been
constrained to virtual registers (assigned to register classes) and that
all sizes attached to them have been eliminated.
This lets us enforce certain invariants across passes.
This property is GlobalISel-specific, but is always available.
llvm-svn: 277482
Fixes PR28670
Summary:
Rewrite the use optimizer to be less memory intensive and 50% faster.
Fixes PR28670
The new use optimizer works like a standard SSA renaming pass, storing
all possible versions a MemorySSA use could get in a stack, and just
tracking indexes into the stack.
This uses much less memory than caching N^2 alias query results.
It's also a lot faster.
The current version defers phi node walking to the normal walker.
Reviewers: george.burgess.iv
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D23032
llvm-svn: 277480
The InstructionSelect pass assumes that RegBankSelect ran; set the
property on all tests (thereby verifying the test inputs) and require
it in the pass.
llvm-svn: 277477
RegBankSelected: the RegBankSelect pass ran and all generic virtual
registers have been assigned to a register bank.
This lets us enforce certain invariants across passes.
This property is GlobalISel-specific, but is always available.
llvm-svn: 277475
RegBankSelect and InstructionSelect run after the legalizer and
require a Legalized function: check that all instructions are legal.
Note that this should be in the MachineVerifier, but it can't use the
MachineLegalizer as it's currently in the separate GlobalISel library.
Note that the RegBankSelect verifier checks have the same layering
problem, but we only use inline methods so end up not needing to link
against the GlobalISel library.
llvm-svn: 277472
Legalized: The MachineLegalizer ran; all pre-isel generic instructions
have been legalized, i.e., all instructions are now one of:
- generic and always legal (e.g., COPY)
- target-specific
- legal pre-isel generic instructions.
This lets us enforce certain invariants across passes.
This property is GlobalISel-specific, but is always available.
llvm-svn: 277470
Added ability to estimate the entry count of the extracted function and
the branch probabilities of the exit branches.
Patch by River Riddle!
Differential Revision: https://reviews.llvm.org/D22744
llvm-svn: 277411
Summary: By generalize the interface, users are able to inject more flexible Node token into the algorithm, for example, a pair of vector<Node>* and index integer. Currently I only migrated SCCIterator to use NodeRef, but more is coming. It's a NFC.
Reviewers: dblaikie, chandlerc
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D22937
llvm-svn: 277399
Common symbol support in ORC was broken in r270716 when the symbol resolution
rules in RuntimeDyld were changed. With the switch to lazily materialized
symbols in r277386, common symbols can be supported by having
RuntimeDyld::emitCommonSymbols search for (but not materialize!) definitions
elsewhere in the logical dylib.
This patch adds the 'Common' flag to JITSymbolFlags, and the necessary check
to RuntimeDyld::emitCommonSymbols.
llvm-svn: 277397
The FPM is split at regular intervals across the MSF file, as the MS code
suggests. It turns out that the value of the interval is precisely the
block size. If the block size is 4096, then there are two Fpm pages every
4096 blocks.
So here we teach the PDBFile class to parse a split FPM, and also add more
options when dumping the FPM to display some additional information such
as orphaned pages (pages which the FPM says are allocated, but which
nothing appears to use), use after free pages (pages which the FPM says
are not allocated, but which are referenced by a stream), and multiple use
pages (pages which the FPM says are allocated but are used more than
once).
Reviewed By: ruiu
Differential Revision: https://reviews.llvm.org/D23022
llvm-svn: 277388
This patch replaces RuntimeDyld::SymbolInfo with JITSymbol: A symbol class
that is capable of lazy materialization (i.e. the symbol definition needn't be
emitted until the address is requested). This can be used to support common
and weak symbols in the JIT (though this is not implemented in this patch).
For consistency, RuntimeDyld::SymbolResolver is renamed to JITSymbolResolver.
For space efficiency a new class, JITEvaluatedSymbol, is introduced that
behaves like the old RuntimeDyld::SymbolInfo - i.e. it is just a pair of an
address and symbol flags. Instances of JITEvaluatedSymbol can be used in
symbol-tables to avoid paying the space cost of the materializer.
llvm-svn: 277386
We used to combine "sext(setcc x, y, cc) -> (select (setcc x, y, cc), -1, 0)"
Instead, we should combine to (select (setcc x, y, cc), T, 0) where the value
of T is 1 or -1, depending on the type of the setcc, and getBooleanContents()
for the type if it is not i1.
This fixes PR28504.
llvm-svn: 277371
As it turns out, modref queries are broken with CFLAA. Specifically,
the data source we were using for determining modref behaviors
explicitly ignores operations on non-pointer values. So, it wouldn't
note e.g. storing an i32 to an i32* (or loading an i64 from an i64*).
It also ignores external function calls, rather than acting
conservatively for them.
(N.B. These operations, where necessary, *are* tracked by CFLAA; we just
use a different mechanism to do so. Said mechanism is relatively
imprecise, so it's unlikely that we can provide reasonably good modref
answers with it as implemented.)
Patch by Jia Chen.
Differential Revision: https://reviews.llvm.org/D22978
llvm-svn: 277366
Added ability to estimate the entry count of the extracted function and
the branch probabilities of the exit branches.
Patch by River Riddle!
Differential Revision: https://reviews.llvm.org/D22744
llvm-svn: 277313
We had import_directory_table_entry and
coff_import_directory_table_entry, remove one. Also, factor out the
logic which determins if a descriptor is a terminator.
llvm-svn: 277296
are very handy when parsing text.
They are essentially a combination of startswith and a self-modifying
drop_front, or endswith and drop_back respectively.
Differential Revision: https://reviews.llvm.org/D22723
llvm-svn: 277288
Summary:
This change fixes issues with `LLVM_CONSTEXPR` functions and
`TrailingObjects::FixedSizeStorage`. In particular, some of the
functions marked `LLVM_CONSTEXPR` used by `FixedSizeStorage` were not
implemented such that they evaluate successfully as part of a constant
expression despite constant arguments.
This change also implements a more traditional template-meta path to
accommodate MSVC, and adds unit tests for `FixedSizeStorage`.
Drive-by fix: the access control for members of `TrailingObjectsImpl` is
tightened.
Reviewers: faisalv, rsmith, aaron.ballman
Subscribers: cfe-commits
Differential Revision: https://reviews.llvm.org/D22668
llvm-svn: 277270
Summary:
This change adds `LLVM_CONSTEXPR` to functions selected as follows:
- the body is already valid under C++11 for a `constexpr` function,
- the evaluation of the function, given constant arguments, will not
fail during the evaluation of a constant expression, and
- the above properties are easily verifiable at a glance.
Note: the evaluation of the function cannot fail if the instantiation
triggers a static assertion failure.
Reviewers: faisalv, rsmith, aaron.ballman
Subscribers: cfe-commits
Differential Revision: https://reviews.llvm.org/D22824
llvm-svn: 277269
This makes it possible to implement re-optimization on top of the
CompileOnDemandLayer.
Test case to come in a future patch: This will need an execution test, and
execution tests require a full working stack. The best option is to plumb this
API up to the C Bindings stack and add a C bindings test for this.
Patch by Sean Ogden. Thanks Sean!
llvm-svn: 277257
This should fix UB warnings from the sanitizer bots: LLD performs bit
manipulations on enums of this type, and these are UB if the underlying
storage type isn't specified.
llvm-svn: 277251
These come in two variants for now: G_INTRINSIC and G_INTRINSIC_W_SIDE_EFFECTS.
We may decide to split the latter up with finer-grained restrictions later, if
necessary.
llvm-svn: 277224
Previously this change was submitted from a Windows machine, so
changes made to the case of filenames and directory names did
not survive the commit, and as a result the CMake source file
names and the on-disk file names did not match on case-sensitive
file systems.
I'm resubmitting this patch from a Linux system, which hopefully
allows the case changes to make it through unfettered.
llvm-svn: 277213
LoopUnroll is a loop pass, so the analysis of OptimizationRemarkEmitter
is added to the common function analysis passes that loop passes
depend on.
The BFI and indirectly BPI used in this pass is computed lazily so no
overhead should be observed unless -pass-remarks-with-hotness is used.
This is how the patch affects the O3 pipeline:
Dominator Tree Construction
Natural Loop Information
Canonicalize natural loops
Loop-Closed SSA Form Pass
Basic Alias Analysis (stateless AA impl)
Function Alias Analysis Results
Scalar Evolution Analysis
+ Lazy Branch Probability Analysis
+ Lazy Block Frequency Analysis
+ Optimization Remark Emitter
Loop Pass Manager
Rotate Loops
Loop Invariant Code Motion
Unswitch loops
Simplify the CFG
Dominator Tree Construction
Basic Alias Analysis (stateless AA impl)
Function Alias Analysis Results
Combine redundant instructions
Natural Loop Information
Canonicalize natural loops
Loop-Closed SSA Form Pass
Scalar Evolution Analysis
+ Lazy Branch Probability Analysis
+ Lazy Block Frequency Analysis
+ Optimization Remark Emitter
Loop Pass Manager
Induction Variable Simplification
Recognize loop idioms
Delete dead loops
Unroll loops
...
llvm-svn: 277203
In a previous patch, it was suggested to use all caps instead of
rolling caps for initialisms, so this patch changes everything
to do this.
llvm-svn: 277190
Patch by Sunita Marathe
Third try, now following fixes to MSan to handle mempcy in such a way that this commit won't break the MSan buildbots. (Thanks, Evegenii!)
llvm-svn: 277189
As mentioned in commit log for r276686 this next step is adding a new
method in the ArchiveMemberHeader class to get the full name that
does proper error checking, and can be use for error messages.
To do this the name of ArchiveMemberHeader::getName() is changed to
ArchiveMemberHeader::getRawName() to be consistent with
Archive::Child::getRawName(). Then the “new” method is the addition
of a new implementation of ArchiveMemberHeader::getName() which gets
the full name and provides proper error checking. Which is mostly a rewrite
of what was Archive::Child::getName() and cleaning up incorrect uses of
llvm_unreachable() in the code which were actually just cases of errors
in the input Archives.
Then Archive::Child::getName() is changed to return Expected<> and use
the new implementation of ArchiveMemberHeader::getName() .
Also needed to change Archive::getMemoryBufferRef() with these
changes to return Expected<> as well to propagate Errors up.
As well as changing Archive::isThinMember() to return Expected<> .
llvm-svn: 277177
For MachineInstrBuilder, having to manually use RegState::Define is ugly and
makes register definitions clunkier than they need to be, so this adds two
convenience functions: addDef and addUse.
For MachineIRBuilder, we want to avoid BuildMI's first-reg-is-def rule because
it's hidden away and causes bugs. So this patch switches buildInstr to
returning a MachineInstrBuilder and adding *all* operands via addDef/addUse.
NFC.
llvm-svn: 277176
Software pipelining is an optimization for improving ILP by
overlapping loop iterations. Swing Modulo Scheduling (SMS) is
an implementation of software pipelining that attempts to
reduce register pressure and generate efficient pipelines with
a low compile-time cost.
This implementaion of SMS is a target-independent back-end pass.
When enabled, the pass should run just prior to the register
allocation pass, while the machine IR is in SSA form. If the pass
is successful, then the original loop is replaced by the optimized
loop. The optimized loop contains one or more prolog blocks, the
pipelined kernel, and one or more epilog blocks.
This pass is enabled for Hexagon only. To enable for other targets,
a couple of target specific hooks must be implemented, and the
pass needs to be called from the target's TargetMachine
implementation.
Differential Review: http://reviews.llvm.org/D16829
llvm-svn: 277169
When coming from an IR label type, we set a 0 NumElements, but not
when constructing an LLT using unsized(), causing comparisons to fail.
Pick one variant and fix the other.
llvm-svn: 277161
This adds a target hook getInstSizeInBytes to TargetInstrInfo that a lot of
subclasses already implement.
Differential Revision: https://reviews.llvm.org/D22885
llvm-svn: 277126
A ConstantVector can have ConstantExpr operands and vice versa.
However, the folder had no ability to fold ConstantVectors which, in
some cases, was an optimization barrier.
Instead, rephrase the folder in terms of Constants instead of
ConstantExprs and teach callers how to deal with failure.
llvm-svn: 277099
Summary:
copypasta doc of ImportedFunctionsInliningStatistics class
\brief Calculate and dump ThinLTO specific inliner stats.
The main statistics are:
(1) Number of inlined imported functions,
(2) Number of imported functions inlined into importing module (indirect),
(3) Number of non imported functions inlined into importing module
(indirect).
The difference between first and the second is that first stat counts
all performed inlines on imported functions, but the second one only the
functions that have been eventually inlined to a function in the importing
module (by a chain of inlines). Because llvm uses bottom-up inliner, it is
possible to e.g. import function `A`, `B` and then inline `B` to `A`,
and after this `A` might be too big to be inlined into some other function
that calls it. It calculates this statistic by building graph, where
the nodes are functions, and edges are performed inlines and then by marking
the edges starting from not imported function.
If `Verbose` is set to true, then it also dumps statistics
per each inlined function, sorted by the greatest inlines count like
- number of performed inlines
- number of performed inlines to importing module
Reviewers: eraman, tejohnson, mehdi_amini
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D22491
llvm-svn: 277089
This broke some out-of-tree AMDGPU tests that relied on the old behavior
wherein isIntrinsic() would return true for any function that starts
with "llvm.". And in general that change will not play nicely with
out-of-tree backends.
llvm-svn: 277087
Summary:
This change adds a `ni` specifier in the `datalayout` string to denote
pointers in some given address spaces as "non-integral", and adds some
typing rules around these special pointers.
Reviewers: majnemer, chandlerc, atrick, dberlin, eli.friedman, tstellarAMD, arsenm
Subscribers: arsenm, mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D22488
llvm-svn: 277085
Summary:
The motivation is the same as in D22141: In order to add the hotness
attribute to optimization remarks we need BFI to be available in all
passes that emit optimization remarks. BFI depends on BPI so unless we
make this lazy as well we would still compute BPI unconditionally.
The solution is to use the new LazyBPI pass in LazyBFI and only compute
BPI when computation of BFI is requested by the client.
I extended the laziness test using a LoopDistribute test to also cover
BPI.
Reviewers: hfinkel, davidxl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D22835
llvm-svn: 277083
This adds boilerplate code for all coroutine passes,
the passes are no-ops for now.
Also, a small test has been added to verify that passes execute in
the expected order or not at all if coroutine support is disabled.
Patch by Gor Nishanov!
Differential Revision: https://reviews.llvm.org/D22847
llvm-svn: 277033
This was a pure virtual base class whose purpose was to abstract
away the notion of how you retrieve the layout of a discontiguous
stream of blocks in an Msf file. This led to too many layers of
abstraction making it difficult to figure out what was going on
and extend things. Ultimately, a stream's layout is decided by
its length and the array of block numbers that it lives on. So
rather than have an abstract base class which can return this in
any number of ways, it's more straightforward to simply store them
as fields of a trivial struct, and also to give a more appropriate
name.
This patch does that. It renames IMsfStreamData to MsfStreamLayout,
and deletes the 2 concrete implementations, DirectoryStreamData
and IndexedStreamData. MsfStreamLayout is a trivial struct
with the necessary data.
llvm-svn: 277018
LLT() has a particular meaning: it's one invalid type. But we really
want selected instructions to have no type whatsoever.
Also verify that types don't linger after ISel, and enable the verifier
on the AArch64 select test.
llvm-svn: 277001
This version has two fixes compared to the original:
* In Registry.h the template static members are instantiated before they are
used, as clang gives an error if you do it the other way around.
* The use of the Registry template in clang-tidy is updated in the same way as
has been done everywhere else.
Original commit message:
Currently the Registry class contains the vestiges of a previous attempt to
allow plugins to be used on Windows without using BUILD_SHARED_LIBS, where a
plugin would have its own copy of a registry and export it to be imported by
the tool that's loading the plugin. This only works if the plugin is entirely
self-contained with the only interface between the plugin and tool being the
registry, and in particular this conflicts with how IR pass plugins work.
This patch changes things so that instead the add_node function of the registry
is exported by the tool and then imported by the plugin, which solves this
problem and also means that instead of every plugin having to export every
registry they use instead LLVM only has to export the add_node functions. This
allows plugins that use a registry to work on Windows if
LLVM_EXPORT_SYMBOLS_FOR_PLUGINS is used.
llvm-svn: 276973
Add unittest to {ARM | AArch64}TargetParser,and by the way correct problems as below:
1.Correct a incorrect indexing problem in AArch64TargetParser. The architecture enumeration
is shared across ARM and AArch64 in original implementation.But In the code,I just used the
index which was offset by the ARM, and this would index into the array incorrectly. To make
AArch64 has its own arch enum,or we will do a lot of slowly iterating.
2.Correct a spelling error. The parameter of llvm::AArch64::getArchExtName.
3.Correct a writing mistake, in llvm::ARM::parseArchISA.
Differential Revision: https://reviews.llvm.org/D21785
llvm-svn: 276957
The EP_CGSCCOptimizerLate extension point allows adding CallGraphSCC
passes at the end of the main CallGraphSCC passes and before any
function simplification passes run by CGPassManager.
Patch by Gor Nishanov!
Differential Revision: https://reviews.llvm.org/D22897
llvm-svn: 276953
Summary:
getName() involves a hashtable lookup, so is expensive given how
frequently isIntrinsic() is called. (In particular, many users cast to
IntrinsicInstr or one of its subclasses before calling
getIntrinsicID().)
This has an incidental functional change: Before, isIntrinsic() would
return true for any function whose name started with "llvm.", even if it
wasn't properly an intrinsic. The new behavior seems more correct to
me, because it's strange to say that isIntrinsic() is true, but
getIntrinsicId() returns "not an intrinsic".
Some callers want the old behavior -- they want to know whether the
caller is a recognized intrinsic, or might be one in some other version
of LLVM. For them, we added Function::hasLLVMReservedName(), which
checks whether the name starts with "llvm.".
This change is good for a 1.5% e2e speedup compiling a large Eigen
benchmark.
Reviewers: bogner
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D22065
llvm-svn: 276942
This patch lets CFLAnders respond to mod-ref queries. It also includes
a small bugfix to CFLSteens.
Patch by Jia Chen.
Differential Revision: https://reviews.llvm.org/D22823
llvm-svn: 276939
Remove the implicit conversion from MachineInstrBundleIterator to
MachineInstr*, leaving behind an explicit conversion.
I *think* this is the last ilist_iterator-related implicit conversion to
ilist_node subclass. If I'm right, I can finally dig in and fix the UB
in ilist that these conversions were relying on.
Note that the implicit users of this conversion have already been
removed. If you have out-of-tree code that doesn't update, you might be
able to buy some time by temporarily reverting this commit.
llvm-svn: 276902
TargetOptions wants the ExceptionHandling enum. Move that to
MCTargetOptions.h to avoid transitively including Dwarf.h everywhere in
clang. Now you can add a DWARF tag without a full rebuild of clang
semantic analysis.
llvm-svn: 276883
The instance in the input operand list allows both inputs and outputs,
but the one in (outs) is not treated specially which leads to the
MachineVerifier invoking UB (looking at an invalid MCInstrDesc field).
No functional change except in UBSan builds (maybe, who knows!), where
it fixes the legalize-add.mir test.
llvm-svn: 276872
Summary:
This is one possible solution to the problem of ignoring constraints that Simon
raised in D21473 but it's a bit of a hack.
The integrated assembler currently ignores violations of the tied register
constraints when the operands involved in a tie are both present in the AsmText.
For example, 'dati $rs, $rt, $imm' with the '$rs = $rt' will silently replace
$rt with $rs. So 'dati $2, $3, 1' is processed as if the user provided
'dati $2, $2, 1' without any diagnostic being emitted.
This is difficult to solve properly because there are multiple parts of the
matcher that are silently forcing these constraints to be met. Tied operands are
rendered to instructions by cloning previously rendered operands but this is
unnecessary because the matcher was already instructed to render the operand it
would have cloned. This is also unnecessary because earlier code has already
replaced the MCParsedOperand with the one it was tied to (so the parsed input
is matched as if it were 'dati <RegIdx 2>, <RegIdx 2>, <Imm 1>'). As a result,
it looks like fixing this properly amounts to a rewrite of the tied operand
handling which affects all targets.
This patch however, merely inserts a checking hook just before the
substitution of MCParsedOperands and the Mips target overrides it. It's not
possible to accurately check the registers are the same this early (because
numeric registers haven't been bound to a register class yet) so it cheats a
bit and checks that the tokens that produced the operand are lexically
identical. This works because tied registers need to have the same register
class but it does have a flaw. It will reject 'dati $4, $a0, 1' for violating
the constraint even though $a0 ends up as the same register as $4.
Reviewers: sdardis
Subscribers: dsanders, llvm-commits, sdardis
Differential Revision: https://reviews.llvm.org/D21994
llvm-svn: 276867
Currently the Registry class contains the vestiges of a previous attempt to
allow plugins to be used on Windows without using BUILD_SHARED_LIBS, where a
plugin would have its own copy of a registry and export it to be imported by
the tool that's loading the plugin. This only works if the plugin is entirely
self-contained with the only interface between the plugin and tool being the
registry, and in particular this conflicts with how IR pass plugins work.
This patch changes things so that instead the add_node function of the registry
is exported by the tool and then imported by the plugin, which solves this
problem and also means that instead of every plugin having to export every
registry they use instead LLVM only has to export the add_node functions. This
allows plugins that use a registry to work on Windows if
LLVM_EXPORT_SYMBOLS_FOR_PLUGINS is used.
Differential Revision: http://reviews.llvm.org/D21385
llvm-svn: 276856
This lets you actually check to see if a block is valid before trying to
extract.
Patch by River Riddle!
Differential Revision: https://reviews.llvm.org/D22699
llvm-svn: 276846
This is the second patch in the coroutine series. It adds coroutine
intrinsics and updates intrinsic cost in TargetTransformInfoImpl.h.
Patch by Gor Nishanov!
Differential Revision: https://reviews.llvm.org/D22659
llvm-svn: 276839
Instead of an ad-hoc collection of "buildInstr" functions with varying numbers
of registers, this uses variadic templates to provide for as many regs as
needed!
Also make IRtranslator use new "buildBr" function instead of some weird generic
one that no-one else would really use.
llvm-svn: 276762
This option, compatible with gas's -mimplicit-it, controls the
generation/checking of implicit IT blocks in ARM/Thumb assembly.
This option allows two behaviours that were not possible before:
- When in ARM mode, emit a warning when assembling a conditional
instruction that is not in an IT block. This is enabled with
-mimplicit-it=never and -mimplicit-it=thumb.
- When in Thumb mode, automatically generate IT instructions when an
instruction with a condition code appears outside of an IT block. This
is enabled with -mimplicit-it=thumb and -mimplicit-it=always.
The default option is -mimplicit-it=arm, which matches the existing
behaviour (allow conditional ARM instructions outside IT blocks without
warning, and error if a conditional Thumb instruction is outside an IT
block).
The general strategy for generating IT blocks in Thumb mode is to keep a
small list of instructions which should be in the IT block, and only
emit them when we encounter something in the input which means we cannot
continue the block. This could be caused by:
- A non-predicable instruction
- An instruction with a condition not compatible with the IT block
- The IT block already contains 4 instructions
- A branch-like instruction (including ALU instructions with the PC as
the destination), which cannot appear in the middle of an IT block
- A label (branching into an IT block is not legal)
- A change of section, architecture, ISA, etc
- The end of the assembly file.
Some of these, such as change of section and end of file, are parsed
outside of the ARM asm parser, so I've added a new virtual function to
AsmParser to ensure any previously-parsed instructions have been
emitted. The ARM implementation of this flushes the currently pending IT
block.
We now have to try instruction matching up to 3 times, because we cannot
know if the current IT block is valid before matching, and instruction
matching changes depending on the IT block state (due to the 16-bit ALU
instructions, which set the flags iff not in an IT block). In the common
case of not having an open implicit IT block and the instruction being
matched not needing one, we still only have to run the matcher once.
I've removed the ITState.FirstCond variable, because it does not store
any information that isn't already represented by CurPosition. I've also
updated the comment on CurPosition to accurately describe it's meaning
(which this patch doesn't change).
Differential Revision: https://reviews.llvm.org/D22760
llvm-svn: 276747
This adds LLVM's 3 main cast instructions (inttoptr, ptrtoint, bitcast) to the
IRTranslator. The first two are direct translations (with 2 MachineInstr types
each). Since LLT discards information, a bitcast might become trivial and we
emit a COPY in those cases instead.
llvm-svn: 276690
I consulted with Lang Hames on this work, and the goal was to add a bit
of "where" in the archive the error occurred along with what the error was.
So this step changes ArchiveMemberHeader into a class with a pointer
to the archive header and the parent archive. Which allows the methods
in the ArchiveMemberHeader to determine which member the header is
for to include that information in the error message.
For this first step the "where" is just the offset to the member in the
archive. The next step will be a new method on ArchiveMemberHeader
to get the full name, if possible, to be use in the error message. Which
will now be possible as ArchiveMemberHeader contains a pointer to
the Archive with its string table and its size, etc. so the full name can
be determined from the header if it is valid.
Also this change adds the missing checks the archive header is actually
contained in the buffer and is not truncated, as well as if the terminating
characters are correct in the header.
And changes one error message in Archive::Child::getNext() where the
name or offset to member is now added.
llvm-svn: 276686
MSVC won't provide the body of this move constructor and assignment
operator, possibly because the copy constructor is banned. Just write
it manually.
llvm-svn: 276685
This prevents StringSwitch from being used with 'auto', which is
important because the inferred type is StringSwitch rather than the
result type. This is a problem because StringSwitch stores addresses
of temporary values rather than copying or moving the value into its
own storage.
This is a compromise that still allows wrapping StringSwitch in other
temporary structures, which (unlike StringSwitch) may be non-trivial
to set up and therefore want to at least be movable. (For an example,
see QueryParser.cpp in clang-tools-extra.)
Changing this uncovered the bug in PassBuilder, also in this patch.
Clang doesn't seem to have any occurrences of the issue.
Re-commit of r276652.
llvm-svn: 276671
There didn't appear to be a good reason to use iplist in this case, a regular
list of unique_ptr works just as well.
Change made in preparation to a new PM port (since iplist is not moveable).
llvm-svn: 276668
Some targets, notably AArch64 for ILP32, have different relocation encodings
based upon the ABI. This is an enabling change, so a future patch can use the
ABIName from MCTargetOptions to chose which relocations to use. Tested using
check-llvm.
The corresponding change to clang is in: http://reviews.llvm.org/D16538
Patch by: Joel Jones
Differential Revision: https://reviews.llvm.org/D16213
llvm-svn: 276654
...but most importantly, it cannot be used well with 'auto', because
the inferred type is StringSwitch rather than the result type. This
is a problem because StringSwitch stores addresses of temporary
values rather than copying or moving the value into its own storage.
Changing this uncovered the bug in PassBuilder, also in this patch.
Clang doesn't seem to have any occurrences of the issue.
llvm-svn: 276652
The public InlineFunction utility assumes that the passed in
InlineFunctionInfo has a valid AssumptionCacheTracker.
Patch by River Riddle!
Differential Revision: https://reviews.llvm.org/D22706
llvm-svn: 276609
Allowed loop vectorization with secondary FP IVs. Like this:
float *A;
float x = init;
for (int i=0; i < N; ++i) {
A[i] = x;
x -= fp_inc;
}
The auto-vectorization is possible when the induction binary operator is "fast" or the function has "unsafe" attribute.
Differential Revision: https://reviews.llvm.org/D21330
llvm-svn: 276554
This unblocks the new PM part of River's patch in
https://reviews.llvm.org/D22706
Conveniently, this same change was needed for D21921 and so these
changes are just spun out from there.
llvm-svn: 276515
This change lets us prove things like
"{X,+,10} s< 5000" implies "{X+7,+,10} does not sign overflow"
It does this by replacing replacing getConstantDifference by
computeConstantDifference (which is smarter) in
isImpliedCondOperandsViaRanges.
llvm-svn: 276505
This adds versions of operator + and - which are optimized for the LHS/RHS of the
operator being RValue's. When an RValue is available, we can use its storage space
instead of allocating new space.
On code such as ConstantRange which makes heavy use of APInt's over 64-bits in size,
this results in significant numbers of saved allocations.
Thanks to David Blaikie for all the review and most of the code here.
llvm-svn: 276470
This adds the actual MachineLegalizeHelper to do the work and a trivial pass
wrapper that legalizes all instructions in a MachineFunction. Currently the
only transformation supported is splitting up a vector G_ADD into one acting on
smaller vectors.
llvm-svn: 276461
Previously it was storing all the fields of an msf::Layout as
separate members. This is a trivial cleanup to make it store
an msf::Layout directly. This makes the code more readable
since it becomes clear which fields of PDBFile are actually the
msf specific layout information in a sea of other bookkeeping
fields.
llvm-svn: 276460
This makes it easier to have the writable and readable PDB
interfaces share code since the read/write and write-only
interfaces now share a single allocator, you don't have to worry
about a builder building a read only interface and then having
the read-only interface's data become corrupt when the builder
goes out of scope. Now the allocator is specified explicitly
to all constructors, so all interfaces can share a single allocator
that is scoped appropriately.
llvm-svn: 276459
This provides a better layering of responsibilities among different
aspects of PDB writing code. Some of the MSF related code was
contained in CodeView, and some was in PDB prior to this. Further,
we were often saying PDB when we meant MSF, and the two are
actually independent of each other since in theory you can have
other types of data besides PDB data in an MSF. So, this patch
separates the MSF specific code into its own library, with no
dependencies on anything else, and DebugInfoCodeView and
DebugInfoPDB take dependencies on DebugInfoMsf.
llvm-svn: 276458
Summary:
The llvm.invariant.start and llvm.invariant.end intrinsics currently
support specifying invariant memory objects only in the default address
space.
With this change, these intrinsics are overloaded for any adddress space
for memory objects
and we can use these llvm invariant intrinsics in non-default address
spaces.
Example: llvm.invariant.start.p1i8(i64 4, i8 addrspace(1)* %ptr)
This overloaded intrinsic is needed for representing final or invariant
memory in managed languages.
Reviewers: apilipenko, reames
Subscribers: llvm-commits
llvm-svn: 276447
This allows ErrorAsOutParameter to work better with "optional" errors. For
example, consider a function where for certain input values it is known that
the function can't fail. This can now be written as:
Result foo(Arg X, Error *Err) {
ErrorAsOutParameter EAO(Err);
if (<Error Condition>) {
if (Err)
*Err = <report error>;
else
llvm_unreachable("Unexpected failure!");
}
}
Rather than having to construct an ErrorAsOutParameter under every conditional
where Err is known to be non-null.
llvm-svn: 276430
This facilitates code reuse between the builder classes and the
"frozen" read only versions of the classes used for parsing
existing PDB files.
llvm-svn: 276427
This implements support for writing compiland and compiland source
file info to a binary PDB. This is tested by adding support for
dumping these fields from an existing PDB to yaml, reading them
back in, and dumping them again and verifying the values are as
expected.
llvm-svn: 276426
As reported on PR26235, we don't currently make use of the VBROADCASTF128/VBROADCASTI128 instructions (or the AVX512 equivalents) to load+splat a 128-bit vector to both lanes of a 256-bit vector.
This patch enables lowering from subvector insertion/concatenation patterns and auto-upgrades the llvm.x86.avx.vbroadcastf128.pd.256 / llvm.x86.avx.vbroadcastf128.ps.256 intrinsics to match.
We could possibly investigate using VBROADCASTF128/VBROADCASTI128 to load repeated constants as well (similar to how we already do for scalar broadcasts).
Reapplied with fix for PR28657 - removed intrinsic definitions (clang companion patch to be be submitted shortly).
Differential Revision: https://reviews.llvm.org/D22460
llvm-svn: 276416
This change adds a hasFileAtIndex method. getChildDeclContext can first call this method, and if it returns true it knows it can then lookup the resolved path cache for the given file index. If we hit that cache then we don't even have to call getFileNameByIndex.
Running dsymutil against the swift executable built from github gives a 20% performance improvement without any change in the binary.
Differential Revision: https://reviews.llvm.org/D22655
Reviewed by friss.
llvm-svn: 276380
Move needsComdatForCounter() to lib/ProfileData/InstrProf.cpp from
lib/Transforms/Instrumentation/InstrProfiling.cpp to make is available for
other files.
Differential Revision: https://reviews.llvm.org/D22643
llvm-svn: 276330
Summary:
The llvm.invariant.start and llvm.invariant.end intrinsics currently
support specifying invariant memory objects only in the default address space.
With this change, these intrinsics are overloaded for any adddress space for memory objects
and we can use these llvm invariant intrinsics in non-default address spaces.
Example: llvm.invariant.start.p1i8(i64 4, i8 addrspace(1)* %ptr)
This overloaded intrinsic is needed for representing final or invariant memory in managed languages.
Reviewers: tstellarAMD, reames, apilipenko
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D22519
llvm-svn: 276316
This provides an elegant pattern to solve the "construct if not in map
already" problem we have many times in LLVM. Without try_emplace we
either have to rely on a sentinel value (nullptr) or do two lookups.
llvm-svn: 276277
They were all auto-incremented from 0 anyway, and I'm getting really annoying
conflicts and runtime failures when different people add more for GlobalISel
(and even when I'm refactoring my own patches).
NFC.
llvm-svn: 276204
The earlier change added hotness attribute to missed-optimization
remarks. This follows up with the analysis remarks (the ones explaining
the reason for the missed optimization).
llvm-svn: 276192
This helps because LoopAccessReport is passed around as a const
reference and we derive the basic block passed as the Value parameter
from the instruction in LoopAccessReport.
llvm-svn: 276191
A seemingly common use for the walker's getClobberingMemoryAccess
function is:
```
MemoryAccess *getClobber(MemorySSAWalker *W, MemoryUseOrDef *MUD) {
const Instruction *I = MUD->getMemoryInst();
return W->getClobberingMemoryAccess(I);
}
```
Which is kind of redundant, since walkers will ultimately query MSSA to
find out which MemoryAccess `I` maps to (...which is always `MUD`).
So, this patch adds an overload of getClobberingMemoryAccess that
accepts MemoryAccesses directly. As a result, the Instruction overload
of getClobberingMemoryAccess becomes a lightweight wrapper around our
new overload.
Additionally, this patch un`virtual`izes the Instruction overload of
getClobberingMemoryAccess, since there doesn't seem to be a walker that
benefits from that being virtual, and I can't think of how else one
would implement it. Happy to make it virtual again if we would benefit
from doing so.
llvm-svn: 276169
This should be all the low-level instruction selection needs to determine how
to implement an operation, with the remaining context taken from the opcode
(e.g. G_ADD vs G_FADD) or other flags not based on type (e.g. fast-math).
llvm-svn: 276158
As noted in https://reviews.llvm.org/D22537 , we can use this functionality in
visitSelectInstWithICmp() and InstSimplify, but currently we have duplicated
code.
llvm-svn: 276140
In D12090, the ExprValueMap was added to reuse existing value during SCEV expansion.
However, const folding and sext/zext distribution can make the reuse still difficult.
A simplified case is: suppose we know S1 expands to V1 in ExprValueMap, and
S1 = S2 + C_a
S3 = S2 + C_b
where C_a and C_b are different SCEVConstants. Then we'd like to expand S3 as
V1 - C_a + C_b instead of expanding S2 literally. It is helpful when S2 is a
complex SCEV expr and S2 has no entry in ExprValueMap, which is usually caused
by the fact that S3 is generated from S1 after const folding.
In order to do that, we represent ExprValueMap as a mapping from SCEV to
ValueOffsetPair. We will save both S1->{V1, 0} and S2->{V1, C_a} into the
ExprValueMap when we create SCEV for V1. When S3 is expanded, it will first
expand S2 to V1 - C_a because of S2->{V1, C_a} in the map, then expand S3 to
V1 - C_a + C_b.
Differential Revision: https://reviews.llvm.org/D21313
llvm-svn: 276136
Reverting this commit for now as it seems to be causing failures on
test-suite tests on the clang-ppc64le-linux-lnt bot.
This reverts commit r276044.
llvm-svn: 276068
We just set PreserveLCSSA to always true since we don't have an
analogous method `mustPreserveAnalysisID(LCSSA)`.
Also port LoopInfo verifier pass to test LoopUnrollPass.
llvm-svn: 276063
canTailDuplicate accepts two blocks and returns true if the first can be
duplicated into the second successfully. Use this function to
encapsulate the heuristic.
llvm-svn: 276062
Summary:
Functions like "slice" and "drop_front" sound like they might mutate the
underlying object, but they don't. Warning on unused results would have
saved me an hour yesterday, and I'm sure I'm not the only one.
LLVM and Clang are clean wrt this warning after D22540.
Reviewers: majnemer
Subscribers: sanjoy, chandlerc, llvm-commits
Differential Revision: https://reviews.llvm.org/D22541
llvm-svn: 276058
This is a variant of scavengeRegister() that works for
enterBasicBlockEnd()/backward(). The benefit of the backward mode is
that it is not affected by incomplete kill flags.
This patch also changes
PrologEpilogInserter::doScavengeFrameVirtualRegs() to use the register
scavenger in backwards mode.
Differential Revision: http://reviews.llvm.org/D21885
llvm-svn: 276044
This adds two pieces:
- RegisterScavenger:::enterBasicBlockEnd() which behaves similar to
enterBasicBlock() but starts tracking at the end of the basic block.
- A RegisterScavenger::backward() method. It is subtly different
from the existing unprocess() method which only considers uses with
the kill flag set: If a value is dead at the end of a basic block with
a last use inside the basic block, unprocess() will fail to mark it as
live. However we cannot change/fix this behaviour because unprocess()
needs to perform the exact reverse operation of forward().
Differential Revision: http://reviews.llvm.org/D21873
llvm-svn: 276043
This step builds on Lang Hames work to change Archive::child_iterator
for better interoperation with Error/Expected. Building on that it is now
possible to return an error message when the size field of an archive
contains non-decimal characters.
llvm-svn: 276025
This makes sure that space is actually available. With this change
running lld on a full file system causes it to exit with
failed to open foo: No space left on device
instead of crashing with a sigbus.
llvm-svn: 276017
D20859 and D20860 attempted to replace the SSE (V)CVTTPS2DQ and VCVTTPD2DQ truncating conversions with generic IR instead.
It turns out that the behaviour of these intrinsics is different enough from generic IR that this will cause problems, INF/NAN/out of range values are guaranteed to result in a 0x80000000 value - which plays havoc with constant folding which converts them to either zero or UNDEF. This is also an issue with the scalar implementations (which were already generic IR and what I was trying to match).
This patch changes both scalar and packed versions back to using x86-specific builtins.
It also deals with the other scalar conversion cases that are runtime rounding mode dependent and can have similar issues with constant folding.
A companion clang patch is at D22105
Differential Revision: https://reviews.llvm.org/D22106
llvm-svn: 275981
Summary:
The triple used for this distribution is mips64el-linux-gnuabi64.
Reviewers: sdardis
Subscribers: sdardis, llvm-commits
Differential Revision: https://reviews.llvm.org/D22406
llvm-svn: 275966
This patch updates MemorySSA's use-optimizing walker to be more
accurate and, in some cases, faster.
Essentially, this changed our core walking algorithm from a
cache-as-you-go DFS to an iteratively expanded DFS, with all of the
caching happening at the end. Said expansion happens when we hit a Phi,
P; we'll try to do the smallest amount of work possible to see if
optimizing above that Phi is legal in the first place. If so, we'll
expand the search to see if we can optimize to the next phi, etc.
An iteratively expanded DFS lets us potentially quit earlier (because we
don't assume that we can optimize above all phis) than our old walker.
Additionally, because we don't cache as we go, we can now optimize above
loops.
As an added bonus, this patch adds a ton of verification (if
EXPENSIVE_CHECKS are enabled), so finding bugs is easier.
Differential Revision: https://reviews.llvm.org/D21777
llvm-svn: 275940
Add a "-j" option to llvm-profdata to control the number of threads used.
Auto-detect NumThreads when it isn't specified, and avoid spawning threads when
they wouldn't be beneficial.
I tested this patch using a raw profile produced by clang (147MB). Here is the
time taken to merge 4 copies together on my laptop:
No thread pool: 112.87s user 5.92s system 97% cpu 2:01.08 total
With 2 threads: 134.99s user 26.54s system 164% cpu 1:33.31 total
Changes since the initial commit:
- When handling odd-length inputs, call ThreadPool::wait() before merging the
last profile. Should fix a race/off-by-one (see r275937).
Differential Revision: https://reviews.llvm.org/D22438
llvm-svn: 275938
This is for a situation where the encoding for a register may be
different depending on the specific operand. For some instructions,
we want to apply additional restrictions beyond the encoding's
constraints.
In AMDGPU some operands are VSrc_32, using the VS_32 pseudo register
class which accept VGPRs, SGPRs, or immediates in the encoding.
Some specific instructions with the same encoding operand do not want
to allow immediates or SGPRs, but the encoding format is different
in this case than a regular VGPR_32 operand.
This allows specifying the encoding should be treated the same
without introducing yet another dummy register class.
llvm-svn: 275929
Add a "-j" option to llvm-profdata to control the number of threads
used. Auto-detect NumThreads when it isn't specified, and avoid spawning
threads when they wouldn't be beneficial.
I tested this patch using a raw profile produced by clang (147MB). Here is the
time taken to merge 4 copies together on my laptop:
No thread pool: 112.87s user 5.92s system 97% cpu 2:01.08 total
With 2 threads: 134.99s user 26.54s system 164% cpu 1:33.31 total
Differential Revision: https://reviews.llvm.org/D22438
llvm-svn: 275921
Summary:
Per D22441, MSVC warns on our old implementation of isUInt<64>. It sees
uint64_t(1) << 64 and doesn't realize that it's not going to be
executed. Writing as a template specialization is ugly, but prevents
the warning.
Reviewers: RKSimon
Subscribers: majnemer, llvm-commits
Differential Revision: https://reviews.llvm.org/D22472
llvm-svn: 275909
We negated a value with a signed type which invited problems when that
value was the most negative signed number. Use an unsigned type
for the value instead. It will compute the same twos complement
result without the UB.
llvm-svn: 275815
Summary:
The direct motivation for the port is to ensure that the OptRemarkEmitter
tests work with the new PM.
This remains a function pass because we not only create multiple loops
but could also version the original loop.
In the test I need to invoke opt
with -passes='require<aa>,loop-distribute'. LoopDistribute does not
directly depend on AA however LAA does. LAA uses getCachedResult so
I *think* we need manually pull in 'aa'.
Reviewers: davidxl, silvas
Subscribers: sanjoy, llvm-commits, mzolotukhin
Differential Revision: https://reviews.llvm.org/D22437
llvm-svn: 275811
Summary:
The main goal is to able to start using the new OptRemarkEmitter
analysis from the LoopVectorizer. Since the vectorizer was recently
converted to the new PM, it makes sense to convert this analysis as
well.
This pass is currently tested through the LoopDistribution pass, so I am
also porting LoopDistribution to get coverage for this analysis with the
new PM.
Reviewers: davidxl, silvas
Subscribers: llvm-commits, mzolotukhin
Differential Revision: https://reviews.llvm.org/D22436
llvm-svn: 275810
When SelectionDAGISel transforms a node representing an inline asm
block, memory constraint information is not preserved. This can cause
constraints to be broken when a memory offset is of the form:
offset + frame index
when the frame is resolved.
By propagating the constraints all the way to the backend, targets can
enforce memory operands of inline assembly to conform to their constraints.
For MIPSR6, some instructions had their offsets reduced to 9 bits from
16 bits such as ll/sc. This becomes problematic when using inline assembly
to perform atomic operations, as an offset can generated that is too big to
encode in the instruction.
Reviewers: dsanders, vkalintris
Differential Review: https://reviews.llvm.org/D21615
llvm-svn: 275786
At higher optimization levels, we generate the libcall for DIVREM_Ix, which is
fine: aeabi_{u|i}divmod. At -O0 we generate the one for REM_Ix, which is the
default {u}mod{q|h|s|d}i3.
This commit makes sure that we don't generate REM_Ix calls for ABIs that
don't support them (i.e. where we need to use DIVREM_Ix instead). This is
achieved by bailing out of FastISel, which can't handle non-double multi-reg
returns, and letting the legalization infrastructure expand the REM_Ix calls.
It also updates the divmod-eabi.ll test to run under -O0 as well, and adds some
Windows checks to it to make sure we don't break things for it.
Fixes PR27068
Differential Revision: https://reviews.llvm.org/D21926
llvm-svn: 275773
Summary:
Previously we were relying on 2's complement underflow in an int64_t.
Now we cast to a uint64_t so we explicitly get the behavior we want.
Reviewers: rnk
Subscribers: dylanmckay, llvm-commits
Differential Revision: https://reviews.llvm.org/D22445
llvm-svn: 275722
Summary:
The bit width must be greater than zero, otherwise we shift by the
integer's width, which is UB. Also (more obviously) the width must be
less than or equal to the integer's width, otherwise we shift by a
negative number, which is also UB.
Reviewers: rnk
Subscribers: llvm-commits, dylanmckay
Differential Revision: https://reviews.llvm.org/D22442
llvm-svn: 275720
Summary:
Previously we were doing 1 << S. "1" is an int, so this doesn't work
when S >= 32.
This patch also adds some static_asserts to these functions to ensure
that we don't hit UB by shifting left too much.
Reviewers: rnk
Subscribers: llvm-commits, dylanmckay
Differential Revision: https://reviews.llvm.org/D22441
llvm-svn: 275719
Summary:
To enable profile-guided indirect call promotion in ThinLTO mode, we
simply add call graph edges for each profitable target from the profile
to the summaries, then the summary-guided importing will consider the
callee for importing as usual.
Also we need to enable the indirect call promotion pass creation in the
PassManagerBuilder when PerformThinLTO=true (we are in the ThinLTO
backend), so that the newly imported functions are considered for
promotion in the backends.
The IC promotion profiles refer to callees by GUID, which required
adding GUIDs to the per-module VST in bitcode (and assigning them
valueIds similar to how they are assigned valueIds in the combined
index).
Reviewers: mehdi_amini, xur
Subscribers: mehdi_amini, davidxl, llvm-commits
Differential Revision: http://reviews.llvm.org/D21932
llvm-svn: 275707
Summary:
This shift is undefined behavior (and, as compiled by clang, gives the
wrong answer for maxUIntN(64)).
Reviewers: mkuper
Subscribers: llvm-commits, jroelofs, rsmith
Differential Revision: https://reviews.llvm.org/D22430
llvm-svn: 275656
The same value for EM_BPF is being propagated to glibc,
elfutils, and binutils.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
llvm-svn: 275633
Block 1 and 2 of an MSF file are bit vectors that represent the
list of blocks allocated and free in the file. We had been using
these blocks to write stream data and other data, so we mark them
as the free page map now. We don't yet serialize these pages to
the disk, but at least we make a note of what it is, and avoid
writing random data to them.
Doing this also necessitated cleaning up some of the tests to be
more general and hardcode fewer values, which is nice.
llvm-svn: 275629
Previously we would read a PDB, then write some of it back out,
but write the directory, super block, and other pertinent metadata
back out unchanged. This generates incorrect PDBs since the amount
of data written was not always the same as the amount of data read.
This patch changes things to use the newly introduced `MsfBuilder`
class to write out a correct and accurate set of Msf metadata for
the data *actually* written, which opens up the door for adding and
removing type records, symbol records, and other types of data to
an existing PDB.
llvm-svn: 275627
Summary:
When a pass tries to keep LCSSA form it's often convenient to be able to update
LCSSA for a set of instructions rather than for the entire loop. This patch makes the
processInstruction from LCSSA externally available under a name
formLCSSAForInstruction.
Reviewers: chandlerc, sanjoy, hfinkel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D22378
llvm-svn: 275613
This adds an incomplete anders-style implementation for CFLAA. It's
incomplete in that it's missing interprocedural analysis, attrs
handling, etc. and that it needs more tests. More tests and features
will be added in future commits.
Patch by Jia Chen.
Differential Revision: https://reviews.llvm.org/D22291
llvm-svn: 275602
Summary:
Instead, we take a single flags arg (a bitset).
Also add a default 0 alignment, and change the order of arguments so the
alignment comes before the flags.
This greatly simplifies many callsites, and fixes a bug in
AMDGPUISelLowering, wherein the order of the args to getLoad was
inverted. It also greatly simplifies the process of adding another flag
to getLoad.
Reviewers: chandlerc, tstellarAMD
Subscribers: jholewinski, arsenm, jyknight, dsanders, nemanjai, llvm-commits
Differential Revision: http://reviews.llvm.org/D22249
llvm-svn: 275592
Summary:
Previously we took an unsigned.
Hooray for type-safety.
Reviewers: chandlerc
Subscribers: dsanders, llvm-commits
Differential Revision: http://reviews.llvm.org/D22282
llvm-svn: 275591
Summary:
This is the first set of changes implementing the RFC from
http://thread.gmane.org/gmane.comp.compilers.llvm.devel/98334
This is a cross-sectional patch; rather than implementing the hotness
attribute for all optimization remarks and all passes in a patch set, it
implements it for the 'missed-optimization' remark for Loop
Distribution. My goal is to shake out the design issues before scaling
it up to other types and passes.
Hotness is computed as an integer as the multiplication of the block
frequency with the function entry count. It's only printed in opt
currently since clang prints the diagnostic fields directly. E.g.:
remark: /tmp/t.c:3:3: loop not distributed: use -Rpass-analysis=loop-distribute for more info (hotness: 300)
A new API added is similar to emitOptimizationRemarkMissed. The
difference is that it additionally takes a code region that the
diagnostic corresponds to. From this, hotness is computed using BFI.
The new API is exposed via an analysis pass so that it can be made
dependent on LazyBFI. (Thanks to Hal for the analysis pass idea.)
This feature can all be enabled by setDiagnosticHotnessRequested in the
LLVM context. If this is off, LazyBFI is not calculated (D22141) so
there should be no overhead.
A new command-line option is added to turn this on in opt.
My plan is to switch all user of emitOptimizationRemark* to use this
module instead.
Reviewers: hfinkel
Subscribers: rcox2, mzolotukhin, llvm-commits
Differential Revision: http://reviews.llvm.org/D21771
llvm-svn: 275583
Calling getModRefInfo with a fence resulted in crashes because fences
don't have a memory location. Add a new predicate to Instruction
called isFenceLike which indicates that the instruction mutates memory
but not any single memory location in particular. In practice, it is a
proxy for the set of instructions which "mayWriteToMemory" but cannot be
used with MemoryLocation::get.
This fixes PR28570.
llvm-svn: 275581
Summary: Convert LoopInstSimplify to new PM. Unfortunately there is no exisiting unittest for this pass.
Reviewers: davidxl, silvas
Subscribers: silvas, llvm-commits, mzolotukhin
Differential Revision: https://reviews.llvm.org/D22280
llvm-svn: 275576
Most possibly problem was caused by the same reason as PR28400. This change
bypasses it by using CallbackVH instead of AssertingVH.
Differential Revision: https://reviews.llvm.org/D20957
llvm-svn: 275563
This pass hoists duplicated computations in the program. The primary goal of
gvn-hoist is to reduce the size of functions before inline heuristics to reduce
the total cost of function inlining.
Pass written by Sebastian Pop, Aditya Kumar, Xiaoyu Hu, and Brian Rzycki.
Important algorithmic contributions by Daniel Berlin under the form of reviews.
Differential Revision: http://reviews.llvm.org/D19338
llvm-svn: 275561
Summary: Port Dead Loop Deletion Pass to the new pass manager.
Reviewers: silvas, davide
Subscribers: llvm-commits, sanjoy, mcrosier
Differential Revision: https://reviews.llvm.org/D21483
llvm-svn: 275453
Summary:
Make the target-specific flags in MachineMemOperand::Flags real, bona
fide enum values. This simplifies users, prevents various constants
from going out of sync, and avoids the false sense of security provided
by declaring static members in classes and then forgetting to define
them inside of cpp files.
Reviewers: MatzeB
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D22372
llvm-svn: 275451
Summary:
- Give it a shorter name (because we're going to refer to it often from
SelectionDAG and friends).
- Split the flags and alignment into separate variables.
- Specialize FlagsEnumTraits for it, so we can do bitwise ops on it
without losing type information.
- Make some enum values constants in MachineMemOperand instead.
MOMaxBits should not be a valid Flag.
- Simplify some of the bitwise ops for dealing with Flags.
Reviewers: chandlerc
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D22281
llvm-svn: 275438
This pass hoists duplicated computations in the program. The primary goal of
gvn-hoist is to reduce the size of functions before inline heuristics to reduce
the total cost of function inlining.
Pass written by Sebastian Pop, Aditya Kumar, Xiaoyu Hu, and Brian Rzycki.
Important algorithmic contributions by Daniel Berlin under the form of reviews.
Differential Revision: http://reviews.llvm.org/D19338
llvm-svn: 275401
constant hoisting. It not only takes into account the number of uses and the
cost of expressions in which constants appear, but now also the resulting
integer range of the offsets. Thus, the algorithm maximizes the number of uses
within an integer range that will enable more efficient code generation. On
ARM, for example, this will enable code size optimisations because less
negative offsets will be created. Negative offsets/immediates are not supported
by Thumb1 thus preventing more compact instruction encoding.
Differential Revision: http://reviews.llvm.org/D21183
llvm-svn: 275382
Summary:
In this patch we implement the following parts of XRay:
- Supporting a function attribute named 'function-instrument' which currently only supports 'xray-always'. We should be able to use this attribute for other instrumentation approaches.
- Supporting a function attribute named 'xray-instruction-threshold' used to determine whether a function is instrumented with a minimum number of instructions (IR instruction counts).
- X86-specific nop sleds as described in the white paper.
- A machine function pass that adds the different instrumentation marker instructions at a very late stage.
- A way of identifying which return opcode is considered "normal" for each architecture.
There are some caveats here:
1) We don't handle PATCHABLE_RET in platforms other than x86_64 yet -- this means if IR used PATCHABLE_RET directly instead of a normal ret, instruction lowering for that platform might do the wrong thing. We think this should be handled at instruction selection time to by default be unpacked for platforms where XRay is not availble yet.
2) The generated section for X86 is different from what is described from the white paper for the sole reason that LLVM allows us to do this neatly. We're taking the opportunity to deviate from the white paper from this perspective to allow us to get richer information from the runtime library.
Reviewers: sanjoy, eugenis, kcc, pcc, echristo, rnk
Subscribers: niravd, majnemer, atrick, rnk, emaste, bmakam, mcrosier, mehdi_amini, llvm-commits
Differential Revision: http://reviews.llvm.org/D19904
llvm-svn: 275367
This adds Clang-specific DWARF constants for nullability and ObjC
class properties that are already generated by clang. This patch adds
dwarfdump support and a more comprehensive testcase.
<rdar://problem/27335745>
llvm-svn: 275354
Avoid exposing a cl::opt in a public header and instead promote this
option in the API.
Alternatively, we could land the cl::opt in CommandFlags.h so that
it is available to every tool, but we would still have to find an
option for clang.
llvm-svn: 275348
IPRA try to optimize caller saved register by propagating register
usage information from callee to caller so it is beneficial to have
caller saved registers compare to callee saved registers when IPRA
is enabled. Please find more detailed explanation here
https://groups.google.com/d/msg/llvm-dev/XRzGhJ9wtZg/tjAJqb0eEgAJ.
This change makes local function do not have any callee preserved
register when IPRA is enabled. A simple test case is also added to
verify this change.
Patch by Vivek Pandya <vivekvpandya@gmail.com>
Differential Revision: http://reviews.llvm.org/D21561
llvm-svn: 275347
See http://reviews.llvm.org/D22079
Changes the Archive::child_begin and Archive::children to require a reference
to an Error. If iterator increment fails (because the archive header is
damaged) the iterator will be set to 'end()', and the error stored in the
given Error&. The Error value should be checked by the user immediately after
the loop. E.g.:
Error Err;
for (auto &C : A->children(Err)) {
// Do something with archive child C.
}
// Check the error immediately after the loop.
if (Err)
return Err;
Failure to check the Error will result in an abort() when the Error goes out of
scope (as guaranteed by the Error class).
llvm-svn: 275316
Summary: Normally when you do a bitwise operation on an enum value, you
get back an instance of the underlying type (e.g. int). But using this
macro, bitwise ops on your enum will return you back instances of the
enum. This is particularly useful for enums which represent a
combination of flags.
Suppose you have a function which takes an int and a set of flags. One
way to do this would be to take two numeric params:
enum SomeFlags { F1 = 1, F2 = 2, F3 = 4, ... };
void Fn(int Num, int Flags);
void foo() {
Fn(42, F2 | F3);
}
But now if you get the order of arguments wrong, you won't get an error.
You might try to fix this by changing the signature of Fn so it accepts
a SomeFlags arg:
enum SomeFlags { F1 = 1, F2 = 2, F3 = 4, ... };
void Fn(int Num, SomeFlags Flags);
void foo() {
Fn(42, static_cast<SomeFlags>(F2 | F3));
}
But now we need a static cast after doing "F2 | F3" because the result
of that computation is the enum's underlying type.
This patch adds a mechanism which gives us the safety of the second
approach with the brevity of the first.
enum SomeFlags {
F1 = 1, F2 = 2, F3 = 4, ..., F_MAX = 128,
LLVM_MARK_AS_BITMASK_ENUM(F_MAX)
};
void Fn(int Num, SomeFlags Flags);
void foo() {
Fn(42, F2 | F3); // No static_cast.
}
The LLVM_MARK_AS_BITMASK_ENUM macro enables overloads for bitwise
operators on SomeFlags. Critically, these operators return the enum
type, not its underlying type, so you don't need any static_casts.
An advantage of this solution over the previously-proposed BitMask class
[0, 1] is that we don't need any wrapper classes -- we can operate
directly on the enum itself.
The approach here is somewhat similar to OpenOffice's typed_flags_set
[2]. But we skirt the need for a wrapper class (and a good deal of
complexity) by judicious use of enable_if. We SFINAE on the presence of
a particular enumerator (added by the LLVM_MARK_AS_BITMASK_ENUM macro)
instead of using a traits class so that it's impossible to use the enum
before the overloads are present. The solution here also seamlessly
works across multiple namespaces.
[0] http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20150622/283369.html
[1] http://lists.llvm.org/pipermail/llvm-commits/attachments/20150623/073434b6/attachment.obj
[2] https://cgit.freedesktop.org/libreoffice/core/tree/include/o3tl/typed_flags_set.hxx
Reviewers: chandlerc, rsmith
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D22279
llvm-svn: 275292
Summary:
This is necessary for D21771. In order to add the hotness attribute to
optimization remarks we need BFI to be available in all passes that emit
optimization remarks.
However we don't want to pay for computing BFI unless the hotness
attribute is requested.
This is achieved by making BFI lazy at the very high-level through a new
analysis pass -- BFI is not calculated unless requested.
I am adding a test to check the laziness under D21771 where the first
user of the analysis is added.
Reviewers: hfinkel, dexonsmith, davidxl
Subscribers: davidxl, dexonsmith, llvm-commits
Differential Revision: http://reviews.llvm.org/D22141
llvm-svn: 275250
Summary:
Refactored the profitability analysis out of the IC promotion pass and
into lib/Analysis so that it can be accessed by the summary index
builder in a follow-on patch to enable IC promotion in ThinLTO (D21932).
Reviewers: davidxl, xur
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D22182
llvm-svn: 275216