Reverse iterators to doubly-linked lists can be simpler (and cheaper)
than std::reverse_iterator. Make it so.
In particular, change ilist<T>::reverse_iterator so that it is *never*
invalidated unless the node it references is deleted. This matches the
guarantees of ilist<T>::iterator.
(Note: MachineBasicBlock::iterator is *not* an ilist iterator, but a
MachineInstrBundleIterator<MachineInstr>. This commit does not change
MachineBasicBlock::reverse_iterator, but it does update
MachineBasicBlock::reverse_instr_iterator. See note at end of commit
message for details on bundle iterators.)
Given the list (with the Sentinel showing twice for simplicity):
[Sentinel] <-> A <-> B <-> [Sentinel]
the following is now true:
1. begin() represents A.
2. begin() holds the pointer for A.
3. end() represents [Sentinel].
4. end() holds the pointer for [Sentinel].
5. rbegin() represents B.
6. rbegin() holds the pointer for B.
7. rend() represents [Sentinel].
8. rend() holds the pointer for [Sentinel].
The changes are #6 and #8. Here are some properties from the old
scheme (which used std::reverse_iterator):
- rbegin() held the pointer for [Sentinel] and rend() held the pointer
for A;
- operator*() cost two dereferences instead of one;
- converting from a valid iterator to its valid reverse_iterator
involved a confusing increment; and
- "RI++->erase()" left RI invalid. The unintuitive replacement was
"RI->erase(), RE = end()".
With vector-like data structures these properties are hard to avoid
(since past-the-beginning is not a valid pointer), and don't impose a
real cost (since there's still only one dereference, and all iterators
are invalidated on erase). But with lists, this was a poor design.
Specifically, the following code (which obviously works with normal
iterators) now works with ilist::reverse_iterator as well:
for (auto RI = L.rbegin(), RE = L.rend(); RI != RE;)
fooThatMightRemoveArgFromList(*RI++);
Converting between iterator and reverse_iterator for the same node uses
the getReverse() function.
reverse_iterator iterator::getReverse();
iterator reverse_iterator::getReverse();
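For illustration, a minimal sketch of the round trip, assuming a list L of
some node type Foo that inherits from ilist_node<Foo>:
  ilist<Foo>::iterator I = L.begin();
  ilist<Foo>::reverse_iterator RI = I.getReverse();
  assert(&*RI == &*I);          // both reference the same node, A
  assert(RI.getReverse() == I); // the conversion round-trips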
Why doesn't iterator <=> reverse_iterator conversion use constructors?
In order to catch and update old code, reverse_iterator does not even
have an explicit conversion from iterator. It wouldn't be safe because
there would be no reasonable way to catch all the bugs from the changed
semantics (see the changes at call sites that are part of this patch).
Old code used this API:
std::reverse_iterator::reverse_iterator(iterator);
iterator std::reverse_iterator::base();
Here's how to update from old code to new (that incorporates the
semantic change), assuming I is an ilist<>::iterator and RI is an
ilist<>::reverse_iterator:
[Old] ==> [New]
reverse_iterator(I) (--I).getReverse()
reverse_iterator(I) ++I.getReverse()
--reverse_iterator(I) I.getReverse()
reverse_iterator(++I) I.getReverse()
RI.base() (--RI).getReverse()
RI.base() ++RI.getReverse()
--RI.base() RI.getReverse()
(++RI).base() RI.getReverse()
delete &*RI, RE = end() delete &*RI++
RI->erase(), RE = end() RI++->erase()
=======================================
Note: bundle iterators are out of scope
=======================================
MachineBasicBlock::iterator, also known as
MachineInstrBundleIterator<MachineInstr>, is a wrapper to represent
MachineInstr bundles. The idea is that each operator++ takes you to the
beginning of the next bundle. Implementing a sane reverse iterator for
this is harder than for ilist. Here are the options:
- Use std::reverse_iterator<MBB::i>. Store a handle to the beginning of
the next bundle. A call to operator*() runs a loop (usually
operator--() will be called 1 time, for unbundled instructions).
Increment/decrement just works. This is the status quo.
- Store a handle to the final node in the bundle. A call to operator*()
still runs a loop, but it iterates one time fewer (usually
operator--() will be called 0 times, for unbundled instructions).
Increment/decrement just works.
- Make the ilist_sentinel<MachineInstr> *always* store that it's the
sentinel (instead of just in asserts mode). Then the bundle iterator
can sniff the sentinel bit in operator++().
I initially tried implementing the end() option as part of this commit,
but updating iterator/reverse_iterator conversion call sites was
error-prone. I have a WIP series of patches that implements the final
option.
llvm-svn: 280032
Optional.
For void functions the return type of a nonblocking call changes from
Expected<future<Optional<bool>>> to Expected<future<Error>>, and for functions
returning T the return type changes from Expected<future<Optional<T>>> to
Expected<future<Expected<T>>>.
Inner results need to be checked (since the RPC connection may have dropped
out before a result came back) and Error/Expected provide stronger checking
requirements. It also allows us to drop the crufty 'optionalToError' function and
just collapse Errors in the single-threaded call primitives.
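As a hedged sketch of the checking pattern (the call primitive and function
names here are hypothetical, not the actual RPC API):
  // Both layers must be checked: the outer Expected for call-setup
  // failure, the inner one for errors that arrive with the result.
  if (auto FutureOrErr = callNB<Add>(Channel, 1, 2)) {
    Expected<int> R = FutureOrErr->get();
    if (!R)
      logAllUnhandledErrors(R.takeError(), errs(), "RPC: ");
  } else
    logAllUnhandledErrors(FutureOrErr.takeError(), errs(), "RPC: ");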
llvm-svn: 280016
Instead of putting all possible requests into a single table, we can perform
the extremely dense lookup based on opcode and type-index in constant time
using multi-dimensional array-like things.
This roughly halves the time spent doing legalization, which was dominated by
queries against the Actions table.
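As a sketch of the idea (illustrative only, not the actual LegalizerInfo
code; names and bounds are made up):
  enum LegalizeAction { Legal, NarrowScalar, WidenScalar, Libcall };
  constexpr unsigned NumTypeIndices = 4;   // illustrative bound
  constexpr unsigned NumOpcodes = 512;     // illustrative bound
  // Dense tables: a plain indexed load replaces a search through one
  // large associative table of all possible requests.
  LegalizeAction Actions[NumTypeIndices][NumOpcodes];
  LegalizeAction getAction(unsigned Opcode, unsigned TypeIdx) {
    return Actions[TypeIdx][Opcode];       // constant time
  }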
llvm-svn: 280011
There should be no functional change here, I'm just making the implementation
of "frem" (to libcall) legalization easier for a followup.
llvm-svn: 279987
Summary: No functional changes, just refactoring to make D23947 simpler.
Reviewers: eugenis
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D23954
llvm-svn: 279982
Summary:
[Coroutines] Part 9: Add cleanup subfunction.
This patch completes coroutine heap allocation elision. Now the heap
elision example from docs/Coroutines.rst compiles and produces the expected
result (see test/Transform/Coroutines/ex3.ll).
Intrinsic Changes:
* coro.free gets a token parameter tying it to coro.id to allow reliably discovering all coro.frees associated with a particular coroutine.
* coro.id gets an extra parameter that points back to a coroutine function.
This allows checking whether a coro.id describes the enclosing function or
belongs to a different function that was later inlined.
CoroSplit now creates three subfunctions:
# f$resume - resume logic
# f$destroy - cleanup logic, followed by a deallocation code
# f$cleanup - just the cleanup code
CoroElide pass during devirtualization replaces coro.destroy with either
f$destroy or f$cleanup depending on whether heap elision is performed.
Other fixes, improvements:
* Fixed a buglet in Shape::buildFrame that was not creating coro.save
properly if a coroutine has more than one suspend point.
* Switched to using a variable-width suspend index field (no longer limited
to a 32-bit index field; it can be as small as i1 or as large as
i<whatever-size_t-is>).
Reviewers: majnemer
Subscribers: llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D23844
llvm-svn: 279971
Assuming the default FP env, we should not treat fdiv and frem any differently in terms of
trapping behavior than any other FP op. Ie, FP ops do not trap with the default FP env.
This matches how we treat these ops in IR with isSafeToSpeculativelyExecute(). There's a
similar bug in Constant::canTrap().
This bug manifests in PR29114:
https://llvm.org/bugs/show_bug.cgi?id=29114
...as a sequence of scalar divisions instead of a vector division on x86 for a <3 x float>
type.
Differential Revision: https://reviews.llvm.org/D23974
llvm-svn: 279970
switch to using one indirect stub manager per logical dylib rather than one per
input module.
LogicalDylib is a helper class used by the CompileOnDemandLayer to manage
symbol resolution between modules during lazy compilation. In particular, it
ensures that internal symbols resolve correctly even in the case where multiple
input modules contain the same internal symbol name (which must be promoted
to external hidden linkage so that functions in any given module can be split
out by lazy compilation). LogicalDylib's resolution scheme (before this commit)
required one stub-manager per input module. This made recompilation of functions
(by adding a module containing a new definition) difficult, as the stub manager
for any given symbol was bound to the module that supplied the original
definition. By using one stub manager for the whole logical dylib, symbols can
be more easily replaced, although support for doing this is not included in this
patch (it will be implemented in a follow-up).
llvm-svn: 279952
Fixed a bug in run-time checks for possible memory conflicts inside loops.
The bug is in the Low <-> High boundary calculation: the High boundary should be calculated as "last memory access pointer + element size".
Differential revision: https://reviews.llvm.org/D23176
llvm-svn: 279930
When global-isel fails on a MachineFunction MF, MF will be cleaned up
and given to SDISel.
Thanks to this fallback, we can already perform correctness testing even if
we support only a small portion of the functions in a test.
llvm-svn: 279891
This is used to communicate that the instruction selection pipeline
failed at some point.
Another way to achieve that would be to have some kind of conditional
scheduling in the PassManager, such that we only schedule a pass based
on the success/failure of another one. The property approach has the
advantage of being lightweight and solving the problem at stake.
llvm-svn: 279885
Summary:
Have the cache pass back the path to the cache entry when it
is ready to be loaded, instead of a buffer.
For gold-plugin we can simply pass this file back to gold directly,
which avoids expensive writing of a separate tmp file. Ensure
the cache entry is not deleted on cleanup by adjusting the setting
of the IsTemporary flags.
Moved the loading of the buffer into llvm-lto2 to maintain current
behavior.
Reviewers: mehdi_amini
Subscribers: llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D23946
llvm-svn: 279883
By default, this hook tells GlobalISel to abort (report a fatal error)
when it encounters an error. The alternative will be to fall back on
SDISel.
This fall back will be removed when the bring-up of GlobalISel is over.
llvm-svn: 279879
This method allows resetting the state of a MachineFunction as if it were
just created. This will be used during the bring-up of GlobalISel to
provide a way to fallback on SelectionDAG. That way, we can start doing
correctness testing even if we are not able to select all functions via
the global instruction selector.
llvm-svn: 279876
Summary:
This is obviously an interesting case because it may motivate code
restructuring or LTO.
Reporting this requires instantiation of ORE in the loop where the call
sites are first gathered. I've checked compile-time
overhead *with* -Rpass-with-hotness and the worst slow-down was 6% in
mcf and quickly tailing off. As before without -Rpass-with-hotness
there is no overhead.
Because this could be a pretty noisy diagnostic, it is currently
qualified as 'verbose'. As of this patch, 'verbose' diagnostics are
only emitted with -Rpass-with-hotness, i.e. when the output is expected
to be filtered.
Reviewers: eraman, chandlerc, davidxl, hfinkel
Subscribers: tejohnson, Prazek, davide, llvm-commits
Differential Revision: https://reviews.llvm.org/D23415
llvm-svn: 279860
We already have obtained a pointer to the underlying GlobalObject,
use it directly to find the comdat, rather than using
GlobalValue::getComdat, which would do the same thing again.
llvm-svn: 279856
MCContext already has many tasks, and separating CodeView out from it is
probably a good idea. The .cv_loc tracking was modelled on the DWARF
tracking which lived directly in MCContext.
Removes the inclusion of MCCodeView.h from MCContext.h, so now there are
only 10 build actions (instead of 265) while I hack on CodeView support.
llvm-svn: 279847
Summary:
Dead store elimination gets very expensive when large numbers of
instructions need to be analyzed. This patch limits the number of
instructions analyzed per store to the value of the
memdep-block-scan-limit parameter (which defaults to 100). This resulted
in no observed difference in performance of the generated code, and no
change in the statistics for the dead store elimination pass, but
improved compilation time on some files by more than an order of
magnitude.
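The capping idea, as a hedged sketch (illustrative names, not the in-tree
code):
  unsigned ScanBudget = BlockScanLimit; // cl::opt, defaults to 100
  for (Instruction &Inst : make_range(BB.getInstList().rbegin(),
                                      BB.getInstList().rend())) {
    if (ScanBudget-- == 0)
      break; // give up on this store rather than scan without bound
    // ... check whether Inst clobbers or reads the store's location ...
  }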
Reviewers: dexonsmith, bruno, george.burgess.iv, dberlin, reames, davidxl
Subscribers: davide, chandlerc, dberlin, davidxl, eraman, tejohnson, mbodart, llvm-commits
Differential Revision: https://reviews.llvm.org/D15537
llvm-svn: 279833
We can't mark ORE (a function pass) preserved as required by the loop
passes because that is how we ensure that the required passes like
LazyBFI are all available any time ORE is used. See the new comments in
the patch.
Instead we use it directly just like the inliner does in D22694.
As expected there is some additional overhead after removing the caching
provided by analysis passes. The worst case I measured was
LNT/CINT2006_ref/401.bzip2, which regresses by 12%. As before, this only
affects -Rpass-with-hotness and not default compilation.
llvm-svn: 279829
This function allows getting arbitrary sized block of random bytes.
Primary motivation is support for --build-id=uuid in lld.
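A hedged usage sketch, assuming the signature is roughly
std::error_code getRandomBytes(void *Buffer, size_t Size):
  uint8_t UUID[16]; // e.g. the payload for --build-id=uuid
  if (std::error_code EC = llvm::getRandomBytes(UUID, sizeof(UUID)))
    report_fatal_error("getRandomBytes failed: " + EC.message());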
Differential revision: https://reviews.llvm.org/D23671
llvm-svn: 279807
The assertion doesn't always hold true as sizeof(SDNodeBits) isn't equal
to sizeof(uint16_t) for some targets. For example, sizeof(SDNodeBits)
evaluates to 1, not 2, for ARM's APCS targets.
llvm-svn: 279797
MMI must match the function passed, and MF has a handle on MMI. Use that instead
of accepting it as separate argument. No Functional Change.
llvm-svn: 279701
Save the function in the class, and then don't pass it around. This reduces the
number of parameters and makes calls to member functions simpler.
No Functional Change.
llvm-svn: 279700
Rename AllVRegsAllocated to NoVRegs. This avoids the connotation of
running after register allocation and simply describes that no vregs are used in
a machine function. With that we can simply compute the property and do
not need to dump/parse it in .mir files.
Differential Revision: http://reviews.llvm.org/D23850
llvm-svn: 279698
tracksSubRegLiveness only depends on the Subtarget and a cl::opt, so there
is no need to change it or save/parse it in a .mir file.
Make the field const and move the initialization from LiveIntervalAnalysis
to the MachineRegisterInfo constructor. Also clean up some code and fix some
instances to use MachineRegisterInfo::subRegLivenessEnabled() instead
of TargetSubtargetInfo::enableSubRegLiveness().
llvm-svn: 279676
Summary:
This patch implements readlane/readfirstlane intrinsics.
TODO: need to define a new register class to consider the case
that the source could be a vector register or M0.
Reviewed by:
arsenm and tstellarAMD
Differential Revision:
http://reviews.llvm.org/D22489
llvm-svn: 279660
In cases where .dwo/.dwp files are guaranteed to be available, skipping
the extra online (in the .o file) inline info can save a substantial
amount of space - see the original r221306 for more details there.
llvm-svn: 279650
The register allocator can split a live interval of a register into a set
of smaller intervals. After the allocation of registers is complete, the
rewriter will modify the IR to replace virtual registers with the
corresponding physical registers. At this stage, if a register corresponding
to a subregister of a virtual register is used, the rewriter will check
if that subregister is undefined, and if so, it will add the <undef> flag
to the machine operand. The function verifying liveness of the subregister
would assume that it is undefined, unless any of the subranges of the
live interval proves otherwise.
The problem is that the live intervals created during splitting do not
have any subranges, even if the original parent interval did. This could
result in the <undef> flag placed on a register that is actually defined.
Differential Revision: http://reviews.llvm.org/D21189
llvm-svn: 279625
manager, including both plumbing and logic to handle function pass
updates.
There are three fundamentally tied changes here:
1) Plumbing *some* mechanism for updating the CGSCC pass manager as the
CG changes while passes are running.
2) Changing the CGSCC pass manager infrastructure to have support for
the underlying graph to mutate mid-pass run.
3) Actually updating the CG after function passes run.
I can separate them if necessary, but I think it's really useful to have
them together as the needs of #3 drove #2, and that in turn drove #1.
The plumbing technique is to extend the "run" method signature with
extra arguments. We provide the call graph that intrinsically is
available as it is the basis of the pass manager's IR units, and an
output parameter that records the results of updating the call graph
during an SCC pass's run. Note that "...UpdateResult" isn't a *great*
name here... suggestions very welcome.
I tried a pretty frustrating number of different data structures and such
for the innards of the update result. Every other one failed for one
reason or another. Sometimes I just couldn't keep the layers of
complexity right in my head. The thing that really worked was to just
directly provide access to the underlying structures used to walk the
call graph so that their updates could be informed by the *particular*
nature of the change to the graph.
The technique for how to make the pass management infrastructure cope
with mutating graphs was also something that took a really, really large
number of iterations to get to a place where I was happy. Here are some
of the considerations that drove the design:
- We operate at three levels within the infrastructure: RefSCC, SCC, and
Node. In each case, we are working bottom up and so we want to
continue to iterate on the "lowest" node as the graph changes. Look at
how we iterate over nodes in an SCC running function passes as those
function passes mutate the CG. We continue to iterate on the "lowest"
SCC, which is the one that continues to contain the function just
processed.
- The call graph structure re-uses SCCs (and RefSCCs) during mutation
events for the *highest* entry in the resulting new subgraph, not the
lowest. This means that it is necessary to continually update the
current SCC or RefSCC as it shifts. This is really surprising and
subtle, and took a long time for me to work out. I actually tried
changing the call graph to provide the opposite behavior, and it
breaks *EVERYTHING*. The graph update algorithms are really deeply
tied to this particular pattern.
- When SCCs or RefSCCs are split apart and refined and we continually
re-pin our processing to the bottom one in the subgraph, we need to
enqueue the newly formed SCCs and RefSCCs for subsequent processing.
Queuing them presents a few challenges:
1) SCCs and RefSCCs use wildly different iteration strategies at
a high level. We end up needing to converge them on worklist
approaches that can be extended in order to be able to handle the
mutations.
2) The order of the enqueuing needs to remain bottom-up post-order so
that we don't get surprising order of visitation for things like
the inliner.
3) We need the worklists to have set semantics so we don't duplicate
things endlessly. We don't need a *persistent* set though because
we always keep processing the bottom node!!!! This is super, super
surprising to me and took a long time to convince myself this is
correct, but I'm pretty sure it is... Once we sink down to the
bottom node, we can't re-split out the same node in any way, and
the postorder of the current queue is fixed and unchanging.
4) We need to make sure that the "current" SCC or RefSCC actually gets
enqueued here such that we re-visit it because we continue
processing a *new*, *bottom* SCC/RefSCC.
- We also need the ability to *skip* SCCs and RefSCCs that get merged
into a larger component. We even need the ability to skip *nodes* from
an SCC that are no longer part of that SCC.
This led to the design you see in the patch which uses SetVector-based
worklists. The RefSCC worklist is always empty until an update occurs
and is just used to handle those RefSCCs created by updates as the
others don't even exist yet and are formed on-demand during the
bottom-up walk. The SCC worklist is pre-populated from the RefSCC, and
we push new SCCs onto it and blacklist existing SCCs on it to get the
desired processing.
We then *directly* update these when updating the call graph as I was
never able to find a satisfactory abstraction around the update
strategy.
Finally, we need to compute the updates for function passes. This is
mostly used as an initial customer of all the update mechanisms to drive
their design to at least cover some real set of use cases. There are
a bunch of interesting things that came out of doing this:
- It is really nice to do this a function at a time because that
function is likely hot in the cache. This means we want even the
function pass adaptor to support online updates to the call graph!
- To update the call graph after arbitrary function pass mutations is
quite hard. We have to build a fairly comprehensive set of
data structures and then process them. Fortunately, some of this code
is related to the code for building the call graph in the first place.
Unfortunately, very little of it makes any sense to share because the
nature of what we're doing is so very different. I've factored out the
one part that made sense at least.
- We need to transfer these updates into the various structures for the
CGSCC pass manager. Once those were more sanely worked out, this
became relatively easier. But some of those needs necessitated changes
to the LazyCallGraph interface to make it significantly easier to
extract the changed SCCs from an update operation.
- We also need to update the CGSCC analysis manager as the shape of the
graph changes. When an SCC is merged away we need to clear analyses
associated with it from the analysis manager which we didn't have
support for in the analysis manager infrastructure. New SCCs are easy!
But then we have the case that the original SCC has its shape changed
but remains in the call graph. There we need to *invalidate* the
analyses associated with it.
- We also need to invalidate analyses after we *finish* processing an
SCC. But the analyses we need to invalidate here are *only those for
the newly updated SCC*!!! Because we only continue processing the
bottom SCC, if we split SCCs apart the original one gets invalidated
once when its shape changes and is not processed further, so its
analyses will be correct. It is the bottom SCC which continues being
processed and needs to have the "normal" invalidation done based on
the preserved analyses set.
All of this is mostly background and context for the changes here.
Many thanks to all the reviewers who helped here. Especially Sanjoy who
caught several interesting bugs in the graph algorithms, David, Sean,
and others who all helped with feedback.
Differential Revision: http://reviews.llvm.org/D21464
llvm-svn: 279618
Re-apply this patch, hopefully I will get away without any warnings
in the constructor now.
This patch removes the MachineFunctionAnalysis. Instead we keep a
map from IR Function to MachineFunction in the MachineModuleInfo.
This allows the insertion of ModulePasses into the codegen pipeline
without breaking it because the MachineFunctionAnalysis gets dropped
before a module pass.
Peak memory should stay unchanged without a ModulePass in the codegen
pipeline: Previously the MachineFunction was freed at the end of a codegen
function pipeline because the MachineFunctionAnalysis was dropped; With
this patch the MachineFunction is freed after the AsmPrinter has
finished.
Differential Revision: http://reviews.llvm.org/D23736
llvm-svn: 279602
Specifying isSSA is an extra line at best and results in invalid MI at
worst. Compute the value instead.
Differential Revision: http://reviews.llvm.org/D22722
llvm-svn: 279600
Change this pass constructor to just accept a const TargetMachine * and
use INITIALIZE_TM_PASS, that way we can get rid of the dummy
constructor. The pass will still fail when calling the default
constructor leading to TM == nullptr, this is no different than before
but is more in line what other codegen passes are doing and avoids the
dummy constructor.
llvm-svn: 279598
Summary:
In clang commit r268509 we started to invoke loop-unroll pass from the
driver even under -Os. However, we happen to not initialize optsize
thresholds properly, which is fixed with this change.
r268509 led to some big compile time regressions, because we started to
unroll some loops that we didn't unroll before. With this change I hope
to recover most of the regressions. We still are slightly slower than
before, because we do some checks here and there in loop-unrolling
before we bail out, but at least the slowdown is not that huge now.
Reviewers: hfinkel, chandlerc
Subscribers: mzolotukhin, llvm-commits
Differential Revision: https://reviews.llvm.org/D23388
llvm-svn: 279585
Add the ability to plug a cache on the LTO API.
I tried to write such that a linker implementation can
control the cache backend. This is intrusive and I'm
not totally happy with it, but I can't figure out a
better design right now.
Differential Revision: https://reviews.llvm.org/D23599
llvm-svn: 279576
I want to compute the SSA property of .mir files automatically in
upcoming patches. The problem with this is that some inputs will be
reported as static single assignment with some passes claiming not to
support SSA form. In reality, though, those passes do not support PHI
instructions => track the presence of PHI instructions separately from the
SSA property.
Differential Revision: https://reviews.llvm.org/D22719
llvm-svn: 279573
They really should have both types represented, but early variants were created
before MachineInstrs could have multiple types so they're rather ambiguous.
llvm-svn: 279567
Re-apply this commit with the deletion of a MachineFunction delegated to
a separate pass to avoid use after free when doing this directly in
AsmPrinter.
This patch removes the MachineFunctionAnalysis. Instead we keep a
map from IR Function to MachineFunction in the MachineModuleInfo.
This allows the insertion of ModulePasses into the codegen pipeline
without breaking it because the MachineFunctionAnalysis gets dropped
before a module pass.
Peak memory should stay unchanged without a ModulePass in the codegen
pipeline: Previously the MachineFunction was freed at the end of a codegen
function pipeline because the MachineFunctionAnalysis was dropped; With
this patch the MachineFunction is freed after the AsmPrinter has
finished.
Differential Revision: http://reviews.llvm.org/D23736
llvm-svn: 279564
Instructions like G_ICMP have multiple types that may need to be legalized (the
boolean output and nearly arbitrary inputs in this case). So the legalizer must
be capable of deciding what to do for each of them separately.
llvm-svn: 279554
Summary:
I assume there was a use case, so maybe this strawman patch will help
clarify whether it is legit.
In any case the current situation is not legit: a ThinLTO compilation
should not trigger an unexpected full LTO compilation.
Right now, adding a --save-temps option triggers this and makes the
number of outputs differ.
Reviewers: tejohnson
Subscribers: pcc, llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D23600
llvm-svn: 279550
Summary:
This greatly simplifies our handling of SDNode::SubclassData.
NFC, hopefully. :)
See discussion in D23035 for discussion about the design API of these
bitfields.
Reviewers: chandlerc
Subscribers: llvm-commits, rnk
Differential Revision: https://reviews.llvm.org/D23036
llvm-svn: 279537
This patch removes the MachineFunctionAnalysis. Instead we keep a
map from IR Function to MachineFunction in the MachineModuleInfo.
This allows the insertion of ModulePasses into the codegen pipeline
without breaking it because the MachineFunctionAnalysis gets dropped
before a module pass.
Peak memory should stay unchanged without a ModulePass in the codegen
pipeline: Previously the MachineFunction was freed at the end of a codegen
function pipeline because the MachineFunctionAnalysis was dropped; With
this patch the MachineFunction is freed after the AsmPrinter has
finished.
Differential Revision: http://reviews.llvm.org/D23736
llvm-svn: 279502
Separate algorithms in iplist<T> that don't depend on T into ilist_base,
and unit test them.
While I was adding unit tests for these algorithms anyway, I also added
unit tests for ilist_node_base and ilist_sentinel<T>.
To make the algorithms and unit tests easier to write, I also did the
following minor changes as a drive-by:
- encapsulate Prev/Next in ilist_node_base so that algorithms are
easier to read, and
- update ilist_node_access API to take nodes by reference.
There should be no real functionality change here.
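As a hedged sketch of the factoring (simplified, not the exact in-tree
code), an algorithm that only touches the encapsulated Prev/Next links can
live in the non-templated base and compile once rather than per T:
  class ilist_base {
  protected:
    // Link N into the list immediately before Next.
    static void insertBeforeImpl(ilist_node_base &Next, ilist_node_base &N) {
      ilist_node_base &Prev = *Next.getPrev();
      N.setNext(&Next);
      N.setPrev(&Prev);
      Prev.setNext(&N);
      Next.setPrev(&N);
    }
  };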
llvm-svn: 279484
Summary: Before the change, *Opt never actually gets updated by the end
of toNext(), so each time around, the loop has to start over from
child_begin(). This bug doesn't affect correctness, since Visited prevents
re-entering the same node; but it's slow.
Reviewers: dberris, dblaikie, dannyb
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D23649
llvm-svn: 279482
Remove all the dead code around ilist_*sentinel_traits. This is a
follow-up to gutting them as part of r279314 (originally r278974),
staged to prevent broken builds in sub-projects.
Uses were removed from clang in r279457 and lld in r279458.
llvm-svn: 279473
Philip commented on r279113 to ask for better comments as to
when to use the different versions of getName. It's also possible
to assert in the simple case that we aren't an overloaded intrinsic
as those have to use the more capable version of getName.
Thanks for the comments Philip.
llvm-svn: 279466
Assembler directives .dtprelword, .dtpreldword, .tprelword, and
.tpreldword generate relocations R_MIPS_TLS_DTPREL32, R_MIPS_TLS_DTPREL64,
R_MIPS_TLS_TPREL32, and R_MIPS_TLS_TPREL64 respectively.
The main motivation for this patch is to be able to write test cases
for checking correctness of the LLD linker's behaviour.
Differential Revision: https://reviews.llvm.org/D23669
llvm-svn: 279439
It used to be non-const for the sole purpose of custom handling of
common symbols. This has now been moved into the regular LTO handling,
so we can constify the callback.
llvm-svn: 279438
The gold-plugin was doing this internally, now the API is handling
commons correctly based on the given resolution.
Differential Revision: https://reviews.llvm.org/D23739
llvm-svn: 279417
Summary: Reduce store size to avoid leading and trailing zeros.
Reviewers: kcc, eugenis
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D23648
llvm-svn: 279379
Summary:
We are going to combine poisoning of red zones and scope poisoning.
PR27453
Reviewers: kcc, eugenis
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D23623
llvm-svn: 279373
unit for use in the PreservedAnalyses set.
This doesn't have any important functional change yet but it cleans
things up and makes the analysis substantially more efficient by
avoiding querying through the type erasure for every analysis.
I also think it makes it much easier to reason about how analyses are
preserved when walking across pass managers and across IR unit
abstractions.
Thanks to Sean and Mehdi both for the comments and suggestions.
Differential Revision: https://reviews.llvm.org/D23691
llvm-svn: 279360
- Always compile print() regardless of LLVM_ENABLE_DUMP. (We usually
only guard dump() functions with that).
- Only show the set properties to reduce output clutter.
- Remove the unused variant that even shows the unset properties.
- Fix comments
llvm-svn: 279338
Currently nodes_iterator may dereference to a NodeType* or a NodeType&. Make them all dereference to NodeType*, which is NodeRef later.
Differential Revision: https://reviews.llvm.org/D23704
Differential Revision: https://reviews.llvm.org/D23705
llvm-svn: 279326
This reverts commit r279053, reapplying r278974 after fixing PR29035
with r279104.
Note that r279312 has been committed in the meantime, and this has been
rebased on top of that. Otherwise it's identical to r278974.
Note for maintainers of out-of-tree code (that I missed in the original
message): if the new isKnownSentinel() assertion is firing from
ilist_iterator<>::operator*(), this patch has identified a bug in your
code. There are a few common patterns:
- Some IR-related APIs take an IRUnit* that might be nullptr, and pass
in an incremented iterator as an insertion point. Some old code was
using "&*++I", which in the case of end() only worked by fluke. If
the IRUnit in question inherits from ilist_node_with_parent<>, you can
use "I->getNextNode()". Otherwise, use "List.getNextNode(*I)".
- In most other cases, crashes on &*I just need to check for I==end()
before dereferencing.
- There's also occasional code that sends iterators into a function, and
then starts calling I->getOperand() (or other API). Either check for
end() before entering the function, or early exit.
Note if the static_assert with HasObsoleteCustomization is firing
for you:
- r278513 has examples of how to stop using custom sentinel traits.
- r278532 removed ilist_nextprev_traits since no one was using it. See
lld's r278469 for the only migration I needed to do.
Original commit message follows.
----
This removes the undefined behaviour (UB) in ilist/ilist_node/etc.,
mainly by removing (gutting) the ilist_sentinel_traits customization
point and canonicalizing on a single, efficient memory layout. This
fixes PR26753.
The new ilist is a doubly-linked circular list.
- ilist_node_base has two ilist_node_base*: Next and Prev. Size-of: two
pointers.
- ilist_node<T> (size-of: two pointers) is a type-safe wrapper around
ilist_node_base.
- ilist_iterator<T> (size-of: two pointers) operates on an
ilist_node<T>*, and downcasts to T* on dereference.
- ilist_sentinel<T> (size-of: two pointers) is a wrapper around
ilist_node<T> that has some extra API for list management.
- ilist<T> (size-of: two pointers) has an ilist_sentinel<T>, whose
address is returned for end().
The new memory layout matches ilist_half_embedded_sentinel_traits<T>
exactly. The Head pointer that previously lived in ilist<T> is
effectively glued to the ilist_half_node<T> that lived in
ilist_half_embedded_sentinel_traits<T>, becoming the Next and Prev in
the ilist_sentinel_node<T>, respectively. sizeof(ilist<T>) is now the
size of two pointers, and there is never any additional storage for a
sentinel.
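A minimal sketch of the layout just described (simplified; the real
headers carry more API):
  struct ilist_node_base {
    ilist_node_base *Prev, *Next; // two pointers, nothing else
  };
  template <class T> struct ilist_node : ilist_node_base {};
  template <class T> struct ilist_sentinel : ilist_node<T> { /* list API */ };
  template <class T> class ilist {
    ilist_sentinel<T> Sentinel; // &Sentinel is returned for end()
    // sizeof(ilist<T>) == two pointers; no extra sentinel storage
  };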
This is a much simpler design for a doubly-linked list, removing most of
the corner cases of list manipulation (add, remove, etc.). In follow-up
commits, I intend to move as many algorithms as possible into a
non-templated base class (ilist_base) to reduce code size.
Moreover, this fixes the UB in ilist_iterator/getNext/getPrev
operations. Previously, ilist_iterator<T> operated on a T*, even when
the sentinel was not of type T (i.e., ilist_embedded_sentinel_traits and
ilist_half_embedded_sentinel_traits). This added UB to all operations
involving end(). Now, ilist_iterator<T> operates on an ilist_node<T>*,
and only downcasts when the full type is guaranteed to be T*.
What did we lose? There used to be a crash (in some configurations) on
++end(). Curiously (via UB), ++end() would return begin() for users of
ilist_half_embedded_sentinel_traits<T>, but otherwise ++end() would
cause a nice dependable nullptr dereference, crashing instead of a
possible infinite loop. Options:
1. Lose that behaviour.
2. Keep it, by stealing a bit from Prev in asserts builds.
3. Crash on dereference instead, using the same technique.
Hans convinced me (because of the number of problems this and r278532
exposed on Windows) that we really need some assertion here, at least in
the short term. I've opted for #3 since I think it catches more bugs.
I added only a couple of unit tests to root out specific bugs I hit
during bring-up, but otherwise this is tested implicitly via the
extensive usage throughout LLVM.
Planned follow-ups:
- Remove ilist_*sentinel_traits<T>. Here I've just gutted them to
prevent build failures in sub-projects. Once I stop referring to them
in sub-projects, I'll come back and delete them.
- Add ilist_base and move algorithms there.
- Check and fix move construction and assignment.
Eventually, there are other interesting directions:
- Rewrite reverse iterators, so that rbegin().getNodePtr()==&*rbegin().
This allows much simpler logic when erasing elements during a reverse
traversal.
- Remove ilist_traits::createNode, by deleting the remaining API that
creates nodes. Intrusive lists shouldn't be creating nodes
themselves.
- Remove ilist_traits::deleteNode, by (1) asserting that lists are empty
on destruction and (2) changing API that calls it to take a Deleter
functor (intrusive lists shouldn't be in the memory management
business).
- Reconfigure the remaining callback traits (addNodeToList, etc.) to be
higher-level, pulling out a simple_ilist<T> that is much easier to
read and understand.
- Allow tags (e.g., ilist_node<T,tag1> and ilist_node<T,tag2>) so that T
can be a member of multiple intrusive lists.
llvm-svn: 279314
This spiritually reapplies r279012 (reverted in r279052) without the
r278974 parts. The differences:
- Only the HasGetNext trait exists here, so I've only cleaned up (and
tested) it. I still added HasObsoleteCustomization since I know
this will be expanding when r278974 is reapplied.
- I changed the unit tests to use static_assert to catch problems
earlier in the build.
- I added negative tests for the type traits.
Original commit message follows.
----
Change the ilist traits to use decltype instead of sizeof, and add
HasObsoleteCustomization so that additions to this list don't
need to be added in two places.
I suspect this will now work with MSVC, since the trait tested in
r278991 seems to work. If for some reason it continues to fail on
Windows I'll follow up by adding back the #ifndef _MSC_VER.
llvm-svn: 279312
This adds a G_INSERT instruction, which technically makes G_SEQUENCE redundant
(it's equivalent to a G_INSERT into an IMPLICIT_DEF). We'll leave G_SEQUENCE
for now though: it's likely to be far more common as it's a fundamental part of
legalization, so avoiding the mess and bloat of the extra IMPLICIT_DEFs is
probably worthwhile.
llvm-svn: 279306
Summary: This way they can be re-used by target-specific schedulers.
Reviewers: atrick, MatzeB, kparzysz
Subscribers: kparzysz, llvm-commits, MatzeB
Differential Revision: https://reviews.llvm.org/D23678
llvm-svn: 279305
was done to hopefully appease MSVC.
As an upside, this also implements the suggestion Sanjoy made in code
review, so two for one! =]
I'll be watching the bots to see if there are still issues.
llvm-svn: 279295
First, make sure all types involved are represented, rather than being implicit
from the register width.
Second, canonicalize all types to scalar. These operations just act in bits and
don't care about vectors.
Also standardize spelling of Indices in the MachineIRBuilder (NFC here).
llvm-svn: 279294
Unsigned addition and subtraction can reuse the instructions created to
legalize large width operations (i.e. both produce and consume a carry flag).
Signed operations and multiplies get a dedicated op-with-overflow instruction.
Once this is produced the two values are combined into a struct register (which
will almost always be merged with a corresponding G_EXTRACT as part of
legalization).
llvm-svn: 279278
Repeated inserts into AliasSetTracker have quadratic behavior - inserting a
pointer into AST is linear, since it requires walking over all "may" alias
sets and running an alias check vs. every pointer in the set.
We can avoid this by tracking the total number of pointers in "may" sets,
and when that number exceeds a threshold, declare the tracker "saturated".
This lumps all pointers into a single "may" set that aliases every other
pointer.
(This is a stop-gap solution until we migrate to MemorySSA)
This fixes PR28832.
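A hedged sketch of the saturation check (the member and function names
here are hypothetical):
  void AliasSetTracker::addPointer(Value *Ptr) {
    if (TotalMayAliasPointers > SaturationThreshold) {
      mergeAllMaySets(); // one "may" set aliasing every other pointer
      // Ptr simply lands in the single saturated set.
    }
    // ... otherwise, the usual linear walk over existing "may" sets ...
  }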
Differential Revision: https://reviews.llvm.org/D23432
llvm-svn: 279274
Summary: We will need these in AMDGPU's new SchedStrategy implmentation.
Reviewers: MatzeB, atrick
Subscribers: llvm-commits, MatzeB
Differential Revision: https://reviews.llvm.org/D23679
llvm-svn: 279270
solve completely opaque MSVC build errors. It complains about lots of
stuff with this change without giving nearly enough information to even
try to fix it.
llvm-svn: 279231
to run methods, both for transform passes and analysis passes.
This also allows the analysis manager to use a different set of extra
arguments from the pass manager where useful. Consider passes over
analysis produced units of IR like SCCs of the call graph or loops.
Passes of this nature will often want to refer to the analysis result
that was used to compute their IR units (the call graph or LoopInfo).
And for transformations, they may want to communicate special update
information to the outer pass manager. With this change, it becomes
possible to have a run method for a loop pass that looks more like:
PreservedAnalyses run(Loop &L, AnalysisManager<Loop, LoopInfo> &AM,
LoopInfo &LI, LoopUpdateRecord &UR);
And to query the analysis manager like:
AM.getResult<MyLoopAnalysis>(L, LI);
This makes accessing the known-available analyses convenient and clear,
and it makes passing customized data structures around easy.
My initial use case is going to be in updating the pass manager layers
when the analysis units of IR change. But there are more use cases here
such as having a layer that lets inner passes signal whether certain
additional passes should be run because of particular simplifications
made. Two desires for this have come up in the past: triggering
additional optimization after successfully unrolling loops, and
triggering additional inlining after collapsing indirect calls to direct
calls.
Despite adding this layer of generic extensibility, the *only* changes to
existing, simple usage are in places where we forward declare the
AnalysisManager template. We really shouldn't be doing this because of
the fragility exposed here, but currently it makes coping with the
legacy PM code easier.
Differential Revision: http://reviews.llvm.org/D21462
llvm-svn: 279227
r279217 where it fails to select the path that other compilers select.
The workaround won't be as careful to produce an error when an analysis
result is incorrect, but we can rely on non-MSVC builds to catch such
errors it seems and MSVC doesn't seem to support the alternative
techniques.
Hoping this brings the windows bots back to life. If not, will have to
revert all of this.
llvm-svn: 279225
into the AnalysisManager class template.
Back when I first added this base class there were separate analysis
managers and some plausible reason why it would be a useful factoring of
common code between them. However, after a lot of refactoring cleaning,
we now have *entirely* shared code. The base class was just an arbitrary
division between code in one class template and a separate class
template. It didn't add anything and forced lots of indirection through
"derived_this" for no real gain.
We can always factor a base CRTP class out with common code if there is
ever some *other* analysis manager that wants to share a subset of
logic. But for now, folding things into the primary template is
a non-trivial simplification with no down sides I see. It shortens the
code considerably, removes an unhelpful abstraction, and will make
subsequent patches *dramatically* less complex which enhance the
analysis manager infrastructure to effectively cope with invalidation.
llvm-svn: 279221
its own invalidate method.
Previously, the technique would assume that if a result didn't have an
invalidate method exactly matching the expected signature, it didn't have
one at all. This is in fact not the case. And we had
analyses with incorrect signatures for the invalidate method in the
tree that would be erroneously invalidated in certain cases! Yikes.
Moreover a result might legitimately want to have multiple overloads for
the invalidate method, and if one changes or a new one is needed we
again really want a compiler error. For example in the tree we had not
added the overload for a *function* IR unit to the invalidate routine
for TLI. Doh.
So a new technique for the SFINAE detection here: if the result has *any*
member spelled "invalidate" we turn off the synthesis of a default
version. We don't care if it is a member function or a member variable
or how many overloads there are. Once a result has something by that
name it must provide suitable overloads for the contexts in which it is
used. This seems much more resilient and durable.
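As a hedged sketch of the detection trick (not the in-tree spelling):
mixing in a dummy member named "invalidate" makes the merged name ambiguous
exactly when the result type already has *any* member with that name. This
only compiles if T is not final, matching the requirement mentioned below.
  #include <type_traits>
  template <typename...> struct make_void { using type = void; };
  template <typename... Ts> using void_t = typename make_void<Ts...>::type;

  struct Mixin { int invalidate; };
  template <typename T> struct Probe : T, Mixin {};

  template <typename T, typename = void>
  struct HasAnyInvalidate : std::true_type {}; // ambiguous => T has one
  template <typename T>
  struct HasAnyInvalidate<T, void_t<decltype(&Probe<T>::invalidate)>>
      : std::false_type {}; // resolves to Mixin's => T has none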
Huge props to Richard Smith who helped me figure out how on earth we
could even do this in C++. It took quite some doing. The technique is
remarkably clean however, and merely requires that the analysis results
are not *final* classes. I think that's a requirement we can live with
even if it is a bit odd.
I've fixed the two bad in-tree analysis results. And this will make my
next change which changes the API for invalidate much easier to
validate as correct.
llvm-svn: 279217
directly produce the index as the value type result.
This requires making the index movable which is straightforward. It
greatly simplifies things by allowing us to completely avoid the builder
API and the layers of abstraction inherent there. Instead both pass
managers can directly construct these when run by value. They still
won't be constructed truly eagerly thanks to the optional in the legacy
PM. The code that directly builds the index can also just share a direct
function.
A notable change here is that the result type of the analysis for the
new PM is no longer a reference type. This was really problematic when
making changes to how we handle result types to make our interface
requirements *much* more strict and precise. But I think this is an
overall improvement.
Differential Revision: https://reviews.llvm.org/D23701
llvm-svn: 279216
The ppc64 multistage bot fails on this.
This reverts commit r279124.
Also Revert "CodeGen: Add/Factor out LiveRegUnits class; NFCI" because it depends on the previous change
This reverts commit r279171.
llvm-svn: 279199
This is a little class template that just builds an inheritance chain of
empty classes. Despite how simple this is, it can be used to really
nicely create ranked overload sets. I've added a unittest as much to
document this as test it. You can pass an object of this type as an
argument to a function overload set and it will call the first viable and
enabled candidate at or below the rank of the object.
I'm planning to use this in a subsequent commit to more clearly rank
overload candidates used for SFINAE. All credit for this technique and
both lines of code here to Richard Smith who was helping me rewrite the
SFINAE check in question to much more effectively capture the intended
set of checks.
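A hedged sketch of the idiom (illustrative names):
  // An inheritance chain of empty classes: rank<2> -> rank<1> -> rank<0>.
  template <int N> struct rank : rank<N - 1> {};
  template <> struct rank<0> {};

  // Candidates are tried highest rank first: an exact rank match beats a
  // derived-to-base conversion, and SFINAE disables non-viable candidates.
  template <typename T>
  auto serialize(T &X, rank<1>) -> decltype(X.fastSerialize()) {
    return X.fastSerialize();
  }
  template <typename T>
  auto serialize(T &X, rank<0>) -> decltype(X.slowSerialize()) {
    return X.slowSerialize();
  }
  template <typename T>
  auto serialize(T &X) -> decltype(serialize(X, rank<1>())) {
    return serialize(X, rank<1>());
  }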
llvm-svn: 279197
Summary: Reduce store size to avoid leading and trailing zeros.
Reviewers: kcc, eugenis
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D23648
llvm-svn: 279178
This is a set of register units intended to track register liveness, it
is similar in spirit to LivePhysRegs.
You can also think of this as the liveness tracking parts of the
RegisterScavenger factored out into its own class.
This was proposed in http://llvm.org/PR27609
Differential Revision: http://reviews.llvm.org/D21916
llvm-svn: 279171
The names of the tablegen defs now match the names of the ISD nodes.
This makes the world a slightly saner place, as previously "fround" matched
ISD::FP_ROUND and not ISD::FROUND.
Differential Revision: https://reviews.llvm.org/D23597
llvm-svn: 279129
Re-apply r276044 with off-by-1 instruction fix for the reload placement.
This is a variant of scavengeRegister() that works for
enterBasicBlockEnd()/backward(). The benefit of the backward mode is
that it is not affected by incomplete kill flags.
This patch also changes
PrologEpilogInserter::doScavengeFrameVirtualRegs() to use the register
scavenger in backwards mode.
Differential Revision: http://reviews.llvm.org/D21885
llvm-svn: 279124
When running 'opt -O2 verify-uselistorder-nodbg.lto.bc', there are 33m allocations. 8.2m
come from std::string allocations in Intrinsic::getName(). Turns out this method only
returns a std::string because it needs to handle overloads, but that is not the common case.
This adds an overload of getName which just returns a StringRef when there are no overloads
and so saves on the allocations.
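A hedged usage sketch (the overload shapes are assumed from the
description above):
  // Non-overloaded intrinsics: no allocation, just a StringRef.
  StringRef Name = Intrinsic::getName(Intrinsic::donothing);
  // Overloaded intrinsics still need the allocating std::string form:
  SmallVector<Type *, 2> Tys = {/* overload types */};
  std::string Full = Intrinsic::getName(Intrinsic::memcpy, Tys);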
llvm-svn: 279113
This reverts commit r279086, reapplying r279084. I'm not sure what I
ran before, because the compile failure for ADTTests reproduced locally.
The problem is that TestRev is calling BidirectionalVector::rbegin()
when the BidirectionalVector is const, but rbegin() is always non-const.
I've updated BidirectionalVector::rbegin() to be callable from const.
Original commit message follows.
--
As a follow-up to r278991, add some tests that check that
decltype(reverse(R).begin()) == decltype(R.rbegin()), and get them
passing by adding std::remove_reference to has_rbegin.
I'm using static_assert instead of EXPECT_TRUE (and updated the other
has_rbegin check from r278991 in the same way) since I figure that's
more helpful.
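A hedged sketch of the fixed trait (simplified relative to the in-tree
version):
  #include <type_traits>
  #include <utility>
  template <typename...> struct make_void { using type = void; };
  template <typename... Ts> using void_t = typename make_void<Ts...>::type;

  template <typename Ty, typename = void>
  struct has_rbegin_impl : std::false_type {};
  template <typename Ty>
  struct has_rbegin_impl<Ty, void_t<decltype(std::declval<Ty &>().rbegin())>>
      : std::true_type {};

  // Strip the reference first so the trait also fires for const R &.
  template <typename Ty>
  struct has_rbegin
      : has_rbegin_impl<typename std::remove_reference<Ty>::type> {};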
llvm-svn: 279091
The original patch was breaking some buildbots due to an
incorrect ordering of function definitions, which caused some
compilers to recognize a definition but others not to.
llvm-svn: 279089
As a follow-up to r278991, add some tests that check that
decltype(reverse(R).begin()) == decltype(R.rbegin()), and get them
passing by adding std::remove_reference to has_rbegin.
I'm using static_assert instead of EXPECT_TRUE (and updated the other
has_rbegin check from r278991 in the same way) since I figure that's
more helpful.
llvm-svn: 279084
The ARMv8*-A descriptions in the ARM and AArch64 TargetParsers are incorrect
architecturally and mismatched to the backend descriptions.
RAS is an optional extension to ARMv8-A and ARMv8.1-A and mandatory in
ARMv8.2-A. Correct the ARMTargetParser descriptions which had this as enabled
by default in the earlier versions.
The FP16 and SPE extensions are optional in ARMv8.2-A and the backend defaults
them as off. They are not available as extensions to earlier ARMv8-A versions.
Correct the AArch64TargetParser which had these as enabled by default in all
ARMv8-A definitions.
These macros are only used to define preprocessor macros. There are no macros
yet as ACLE has not caught up with ARMv8.2-A, so it is not possible to add a test.
Differential Revision: https://reviews.llvm.org/D23500
llvm-svn: 279078
Summary:
Skip the merging of common symbols for ThinLTO modules, they will be
merged by the final native object link. Trying to merge the symbols and
add to a combined module will incorrectly enable the common symbol to be
internalized in the ThinLTO module. Additionally, we will not want to
create a combined module for ThinLTO distributed builds.
This fixes failures in 7 cpu2006 benchmarks from the new LTO API in
ThinLTO mode.
Reviewers: mehdi_amini
Subscribers: pcc, llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D23637
llvm-svn: 279023
Summary:
We are going to combine poisoning of red zones and scope poisoning.
PR27453
Reviewers: kcc, eugenis
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D23623
llvm-svn: 279020
Change the ilist traits to use decltype instead of sizeof, and add
HasObsoleteCustomization so that additions to this list don't need to be
added in two places.
I suspect this will now work with MSVC, since the trait tested in
r278991 seems to work. If for some reason it continues to fail on
Windows I'll follow up by adding back the #ifndef _MSC_VER.
llvm-svn: 279012
Summary: I later (after r278573) found that LoopIterator.h has some overlap with LoopBodyTraits. It's good to use LoopBodyTraits because a *Traits struct is algorithm independent.
Reviewers: anemet, nadav, mkuper
Subscribers: mzolotukhin, llvm-commits
Differential Revision: https://reviews.llvm.org/D23529
llvm-svn: 278996
Duncan found that reverse worked on mutable rbegin(), but the has_rbegin
trait didn't work with a const method. See http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160815/382890.html
for more details.
Turns out this was already solved in clang with has_getDecl. Copied that and made it work for rbegin.
This includes the tests Duncan attached to that thread, including the traits test.
llvm-svn: 278991
Since I stopped writing empty export tries, LinkEdit can end up completely
empty, which results in invalid YAML being generated.
To prevent this we skip LinkEdit data if it is empty.
llvm-svn: 278985
This will allow tail duplication and tail merging during layout to have a
shared threshold to make sure that they don't overlap. No observable change
intended.
llvm-svn: 278981
This removes the undefined behaviour (UB) in ilist/ilist_node/etc.,
mainly by removing (gutting) the ilist_sentinel_traits customization
point and canonicalizing on a single, efficient memory layout. This
fixes PR26753.
The new ilist is a doubly-linked circular list.
- ilist_node_base has two ilist_node_base*: Next and Prev. Size-of: two
pointers.
- ilist_node<T> (size-of: two pointers) is a type-safe wrapper around
ilist_node_base.
- ilist_iterator<T> (size-of: two pointers) operates on an
ilist_node<T>*, and downcasts to T* on dereference.
- ilist_sentinel<T> (size-of: two pointers) is a wrapper around
ilist_node<T> that has some extra API for list management.
- ilist<T> (size-of: two pointers) has an ilist_sentinel<T>, whose
address is returned for end().
The new memory layout matches ilist_half_embedded_sentinel_traits<T>
exactly. The Head pointer that previously lived in ilist<T> is
effectively glued to the ilist_half_node<T> that lived in
ilist_half_embedded_sentinel_traits<T>, becoming the Next and Prev in
the ilist_sentinel_node<T>, respectively. sizeof(ilist<T>) is now the
size of two pointers, and there is never any additional storage for a
sentinel.
This is a much simpler design for a doubly-linked list, removing most of
the corner cases of list manipulation (add, remove, etc.). In follow-up
commits, I intend to move as many algorithms as possible into a
non-templated base class (ilist_base) to reduce code size.
Moreover, this fixes the UB in ilist_iterator/getNext/getPrev
operations. Previously, ilist_iterator<T> operated on a T*, even when
the sentinel was not of type T (i.e., ilist_embedded_sentinel_traits and
ilist_half_embedded_sentinel_traits). This added UB to all operations
involving end(). Now, ilist_iterator<T> operates on an ilist_node<T>*,
and only downcasts when the full type is guaranteed to be T*.
What did we lose? There used to be a crash (in some configurations) on
++end(). Curiously (via UB), ++end() would return begin() for users of
ilist_half_embedded_sentinel_traits<T>, but otherwise ++end() would
cause a nice dependable nullptr dereference, crashing instead of a
possible infinite loop. Options:
1. Lose that behaviour.
2. Keep it, by stealing a bit from Prev in asserts builds.
3. Crash on dereference instead, using the same technique.
Hans convinced me (because of the number of problems this and r278532
exposed on Windows) that we really need some assertion here, at least in
the short term. I've opted for #3 since I think it catches more bugs.
I added only a couple of unit tests to root out specific bugs I hit
during bring-up, but otherwise this is tested implicitly via the
extensive usage throughout LLVM.
Planned follow-ups:
- Remove ilist_*sentinel_traits<T>. Here I've just gutted them to
prevent build failures in sub-projects. Once I stop referring to them
in sub-projects, I'll come back and delete them.
- Add ilist_base and move algorithms there.
- Check and fix move construction and assignment.
Eventually, there are other interesting directions:
- Rewrite reverse iterators, so that rbegin().getNodePtr()==&*rbegin().
This allows much simpler logic when erasing elements during a reverse
traversal.
- Remove ilist_traits::createNode, by deleting the remaining API that
creates nodes. Intrusive lists shouldn't be creating nodes
themselves.
- Remove ilist_traits::deleteNode, by (1) asserting that lists are empty
on destruction and (2) changing API that calls it to take a Deleter
functor (intrusive lists shouldn't be in the memory management
business).
- Reconfigure the remaining callback traits (addNodeToList, etc.) to be
higher-level, pulling out a simple_ilist<T> that is much easier to
read and understand.
- Allow tags (e.g., ilist_node<T,tag1> and ilist_node<T,tag2>) so that T
can be a member of multiple intrusive lists.
llvm-svn: 278974
Summary:
This is part of the "NodeType* -> NodeRef" migration. Notice that since
GraphWriter prints the object's address as its identity, I added a
static_assert that NodeRef must be a pointer type.
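The check is presumably along these lines (a sketch; the exact wording
and placement are assumptions, not quotes from the patch):
  #include <type_traits>
  // A writer that uses node addresses as identities can enforce at
  // compile time that NodeRef is a pointer type.
  template <typename NodeRef> class GraphWriterSketch {
    static_assert(std::is_pointer<NodeRef>::value,
                  "NodeRef must be a pointer: its address is the ID");
  public:
    static const void *getNodeID(NodeRef N) { return N; }
  };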
Reviewers: dblaikie
Subscribers: llvm-commits, MatzeB
Differential Revision: https://reviews.llvm.org/D23580
llvm-svn: 278966
Summary:
Looking at the implementation, GenericDomTree has more specific
requirements on NodeRef: e.g., for a NodeRef object N, N->getParent()
should compile, and NodeRef should be a pointer. We can remove the
pointer requirement,
but it seems to have little gain, given the limited use cases.
Also changed GraphTraits<Inverse<Inverse<T>>> to be more accurate.
Reviewers: dblaikie, chandlerc
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D23593
llvm-svn: 278961
This is used to mark functions with the C++11 [[ noreturn ]] or C11 _Noreturn
attributes.
Patch by Victor Leschuk!
https://reviews.llvm.org/D23167
llvm-svn: 278940
Refactored so that an LSRUse owns its fixups, as opposed to letting the
LSRInstance own them. This makes it easier to rate formulas for
LSRUses, since the fixups are available directly. The Offsets vector
has been removed since it was no longer necessary.
Added a new target hook, isFoldableMemAccessOffset(), which is used during formula
rating.
For SystemZ, this is useful to express that loads and stores with
float or vector types with a big/negative offset should be avoided in
loops. Without this, LSR will generate a lot of negative offsets that
would require extra instructions for loading the address.
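A hedged sketch of the kind of answer the hook might give for SystemZ
(the real signature and logic live in the target's implementation and
may differ):
  #include "llvm/IR/Instruction.h"
  #include "llvm/IR/Type.h"
  #include "llvm/Support/MathExtras.h"
  using namespace llvm;
  static bool isFoldableMemAccessOffsetSketch(const Instruction *I,
                                              int64_t Offset) {
    if (isUInt<12>(Offset))
      return true; // short unsigned displacement: folds into any access
    // Long or negative displacements have no FP/vector form, so folding
    // them would force extra address-materialization instructions.
    Type *Ty = I->getType();
    return !Ty->isVectorTy() && !Ty->isFloatingPointTy();
  }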
Updated tests:
test/CodeGen/SystemZ/loop-01.ll
Reviewed by: Quentin Colombet and Ulrich Weigand.
https://reviews.llvm.org/D19152
llvm-svn: 278927
In theory the indices of RC (and thus the index used for LiveRegs) may differ from the indices of OpRC.
Fixed the code to extract the correct RC index.
OpRC contains the first X consecutive elements of RC, and thus their indices are currently de facto the same; therefore a test cannot be added at this point.
Differential Revision: https://reviews.llvm.org/D23491
llvm-svn: 278923
Summary:
See D22198 for the motivation: We have a pass that uses LiveIntervals anyway,
and there is now a requirement to track a physical register that is not
usually tracked at this point of the compilation. The pass also introduces
instructions that affect this physical register, but we want to preserve
LiveIntervals.
Rather than add brittle and rarely exercised code to keep the tracking of
the physical register intact, we want to just remove the corresponding
LiveRange -- it didn't exist before anyway, and subsequent passes don't
expect it to be there.
Reviewers: MatzeB, arsenm
Subscribers: llvm-commits, MatzeB
Differential Revision: https://reviews.llvm.org/D22801
llvm-svn: 278920
can given the current __cplusplus definitions).
Without this, Clang triggers TONS of warnings about using a C++17
extension. I tried using LLVM_EXTENSION to turn these off and it doesn't
work.
Suggestions on a better approach are welcome, but at least this makes
the build usable for me again.
llvm-svn: 278909
Summary:
While NFC for now, this will allow more flexibility on the client side
to hold state necessary to back up the stream.
Also when adding caching, this class will grow in complexity.
Note I blindly modified the gold-plugin as I can't compile it.
Reviewers: tejohnson
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D23542
llvm-svn: 278907
minimal and boring form than the old pass manager's version.
This pass does the very minimal amount of work necessary to inline
functions declared as always-inline. It doesn't support a wide array of
things that the legacy pass manager did support, but is also ... about
20 lines of code. So it has that going for it. Notably things this
doesn't support:
- Array alloca merging
- To support the above, bottom-up inlining with careful history
tracking and call graph updates
- DCE of the functions that become dead after this inlining.
- Inlining through call instructions with the always_inline attribute.
Instead, it focuses on inlining functions with that attribute.
The first I've omitted because I'm hoping to just turn it off for the
primary pass manager. If that doesn't pan out, I can add it here but it
will be reasonably expensive to do so.
The second should really be handled by running global-dce after the
inliner. I don't want to re-implement the non-trivial logic necessary to
do comdat-correct DCE of functions. This means the -O0 pipeline will
have to be at least 'always-inline,global-dce', but that seems
reasonable to me. If others are seriously worried about this I'd like to
hear about it and understand why. Again, this is all solvable by
factoring that logic into a utility and calling it here, but I'd like to
wait to do that until there is a clear reason why the existing
pass-based factoring won't work.
The final point is a serious one. I can fairly easily add support for
this, but it seems both costly and a confusing construct for the use
case of the always inliner running at -O0. This attribute can of course
still impact the normal inliner easily (although I find that
a questionable re-use of the same attribute). I've started a discussion
to sort out what semantics we want here and based on that can figure out
if it makes sense to have this complexity at -O0 or not.
One other advantage of this design is that it should be quite a bit
faster due to checking for whether the function is a viable candidate
for inlining exactly once per function instead of doing it for each call
site.
Anyways, hopefully a reasonable starting point for this pass.
Differential Revision: https://reviews.llvm.org/D23299
llvm-svn: 278896
IndVarSimplify::sinkUnusedInvariants calls
BasicBlock::getFirstInsertionPt on the ExitBlock and moves instructions
before it. This can return end(), so it's not safe to dereference. Add
an iterator-based overload to Instruction::moveBefore to avoid the UB.
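The safer pattern presumably looks like this (overload signature
assumed from the description above):
  #include "llvm/IR/BasicBlock.h"
  #include "llvm/IR/Instruction.h"
  using namespace llvm;
  static void sinkIntoExit(Instruction *ToMove, BasicBlock *ExitBlock) {
    BasicBlock::iterator InsertPt = ExitBlock->getFirstInsertionPt();
    // Old (UB when InsertPt == end()): ToMove->moveBefore(&*InsertPt);
    ToMove->moveBefore(*ExitBlock, InsertPt); // iterator-based overload
  }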
llvm-svn: 278886
Rather than doing a funny dance that relies on dereferencing end() not
crashing, add some API to MachineInstrBundleIterator to get a non-const
version of the iterator.
llvm-svn: 278870
This allows you to annotate switch case fallthrough in a better way
than a "// FALLTHROUGH" comment. Eventually it would be nice to turn
on -Wimplicit-fallthrough, if we can get the code base clean.
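The commit message doesn't name the annotation, but this is presumably
the LLVM_FALLTHROUGH macro from llvm/Support/Compiler.h; usage would
look like:
  #include "llvm/Support/Compiler.h"
  static int classify(int Kind) {
    int Score = 0;
    switch (Kind) {
    case 0:
      Score += 10;
      LLVM_FALLTHROUGH; // deliberate: kind 0 also gets kind-1 handling
    case 1:
      Score += 1;
      break;
    default:
      break;
    }
    return Score;
  }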
llvm-svn: 278868
Following the discussion on D22038, this refactors a PowerPC-specific setcc -> srl(ctlz) transformation so it can be used by other targets.
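As a concrete sanity check of the transform at width 32 (a sketch, not
the target code): ctlz yields the bit width, 32, only for a zero input,
so shifting right by log2(32) = 5 produces exactly the value of (x == 0):
  #include <cstdint>
  // ctlz(0) == 32 == 1 << 5, and ctlz(x) <= 31 for x != 0, so bit 5 of
  // the count is set exactly when x == 0.
  static uint32_t isZeroViaCtlz(uint32_t X) {
    uint32_t Ctlz = X == 0 ? 32 : __builtin_clz(X); // zero-defined ctlz
    return Ctlz >> 5;
  }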
Differential Revision: https://reviews.llvm.org/D23445
llvm-svn: 278799
Summary:
Multiple APIs were taking a StringMap for the ImportLists containing
the entries for all the modules while operating on a single entry
for the current module. Instead we can pass the desired ModuleImport
directly. Also, some of the APIs were not const, I believe just to be
able to use operator[] on the StringMap.
Reviewers: tejohnson
Subscribers: llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D23537
llvm-svn: 278776
Summary: This is similar to r278752, where I found that the std::iterator<...> base can be normal.
Reviewers: dblaikie
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D23527
llvm-svn: 278753
in debug info using their stack slots instead of as an indirection of param reg + 0
offset. This is done by detecting FrameIndexSDNodes in SelectionDAG and generating
FrameIndexDbgValues for them. This ultimately generates DBG_VALUEs with stack
location operands.
Differential Revision: http://reviews.llvm.org/D23283
llvm-svn: 278703
This adds two new utility functions findLoopControlBlock and findLoopPreheader
to MachineLoop and MachineLoopInfo. These functions are refactored and taken
from the Hexagon target, as they are target-independent; thus this is intended to
be a non-functional change.
Differential Revision: https://reviews.llvm.org/D22959
llvm-svn: 278661
We are trying to prove that one group of operands is a subset of
another. We did this by populating two Sets and determining that every
element within one was inside the other.
However, this is unnecessary. We can simply construct a single set and
test if each operand is within it.
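A sketch of the single-set approach (the container choices here are
illustrative, not the patch's exact types):
  #include "llvm/ADT/ArrayRef.h"
  #include "llvm/ADT/STLExtras.h"
  #include "llvm/ADT/SmallPtrSet.h"
  #include "llvm/IR/Value.h"
  using namespace llvm;
  // Build one set from the candidate superset, then membership-test
  // each element of the other group.
  static bool isSubsetOf(ArrayRef<const Value *> Sub,
                         ArrayRef<const Value *> Super) {
    SmallPtrSet<const Value *, 16> Set(Super.begin(), Super.end());
    return all_of(Sub, [&](const Value *V) { return Set.count(V); });
  }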
llvm-svn: 278641
Summary:
Refactor the existing support into a LoopDataPrefetch implementation
class and a LoopDataPrefetchLegacyPass class that invokes it.
Add a new LoopDataPrefetchPass for the new pass manager that utilizes
the LoopDataPrefetch implementation class.
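The split presumably has this shape (a sketch with simplified
signatures, not the patch itself):
  #include "llvm/IR/Function.h"
  #include "llvm/IR/PassManager.h"
  #include "llvm/Pass.h"
  using namespace llvm;
  // Shared implementation, reusable from both pass managers.
  class LoopDataPrefetch {
  public:
    bool runImpl(Function &F /*, required analyses... */) { return false; }
  };
  class LoopDataPrefetchLegacyPass : public FunctionPass { // legacy PM
  public:
    static char ID;
    LoopDataPrefetchLegacyPass() : FunctionPass(ID) {}
    bool runOnFunction(Function &F) override {
      return LoopDataPrefetch().runImpl(F);
    }
  };
  char LoopDataPrefetchLegacyPass::ID = 0;
  class LoopDataPrefetchPass : public PassInfoMixin<LoopDataPrefetchPass> {
  public:
    PreservedAnalyses run(Function &F, FunctionAnalysisManager &AM) {
      return LoopDataPrefetch().runImpl(F) ? PreservedAnalyses::none()
                                           : PreservedAnalyses::all();
    }
  };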
Reviewers: mehdi_amini
Subscribers: sanjoy, mzolotukhin, nemanjai, llvm-commits
Differential Revision: https://reviews.llvm.org/D23483
llvm-svn: 278591
Summary:
If the backend does not define LLVM/DWARF register mappings, the associated
variables are undefined since the map initializer is called by auto-generated
TableGen routines. This patch initializes the pointers and sizes to nullptr
and zero, respectively, and checks that they are valid before searching
for a mapping.
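A hedged sketch of the guard (field names and the search loop are
illustrative; the real table lives in MCRegisterInfo and is emitted by
TableGen):
  struct DwarfLLVMRegPair { unsigned FromReg, ToReg; };
  static const DwarfLLVMRegPair *Map = nullptr; // backend may leave unset
  static unsigned MapSize = 0;                  // likewise zero
  static int getDwarfRegNumSketch(unsigned LLVMReg) {
    if (!Map || MapSize == 0)
      return -1; // no mapping defined: fail gracefully instead of crashing
    for (unsigned I = 0; I != MapSize; ++I)
      if (Map[I].FromReg == LLVMReg)
        return static_cast<int>(Map[I].ToReg);
    return -1;
  }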
Reviewers: grosbach, dschuff
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D23458
llvm-svn: 278574
Pattern match has some paths which can operate on constant instructions,
but not all. This adds a version of m_value() to return const Value* and
changes ICmp matching to use auto so that it can match both constant and
mutable instructions.
Tests also included for both mutable and constant ICmpInst matching.
This will be used in a future commit to constify ValueTracking.cpp.
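A sketch of what the change enables (assuming the new overload binds
const Value*&, as the summary suggests):
  #include "llvm/IR/Instructions.h"
  #include "llvm/IR/PatternMatch.h"
  using namespace llvm;
  using namespace PatternMatch;
  // Match an icmp on a const Value without casting away constness.
  static bool isSelfICmp(const Value *V) {
    ICmpInst::Predicate Pred;
    const Value *L, *R;
    return match(V, m_ICmp(Pred, m_Value(L), m_Value(R))) && L == R;
  }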
llvm-svn: 278570
Summary:
The BitcodeWriterPass was ported a couple years ago, and predates the
PassInfoMixin. Make BitcodeWriterPass derive from that base class.
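The result presumably has this shape (simplified sketch; the real pass
carries more options than shown, and the header name is assumed):
  #include "llvm/Bitcode/ReaderWriter.h"
  #include "llvm/IR/PassManager.h"
  using namespace llvm;
  // PassInfoMixin supplies boilerplate such as name().
  class BitcodeWriterPassSketch
      : public PassInfoMixin<BitcodeWriterPassSketch> {
    raw_ostream &OS;
  public:
    explicit BitcodeWriterPassSketch(raw_ostream &OS) : OS(OS) {}
    PreservedAnalyses run(Module &M, ModuleAnalysisManager &) {
      WriteBitcodeToFile(&M, OS);
      return PreservedAnalyses::all();
    }
  };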
Should BitcodeWriterPass be added to the PassRegistry.def file? It seems
like that is only for passes that can be added arbitrarily, e.g. via the
-passes flag to the opt tool. Whereas the bitcode writer is added
specially based on the output type (and requires an output stream and
other parameters). For now I have left it out of the PassRegistry, but
let me know if it should go there.
Finally, I was considering an NFC change of the legacy WriteBitcodePass
to BitcodeWriterLegacyPass to make its usage clearer and more consistent
with other legacy passes. WDYT?
Reviewers: mehdi_amini
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D23465
llvm-svn: 278566
No one is using the capability to implement next and prev another way
(since lld stopped doing it in r278468). Remove the customization point
by moving the API from ilist_nextprev_traits<T> to ilist_node_access.
The old traits class is still a useful and necessary API, as a
friendship target for node types that inherit privately from ilist_node.
Eventually I plan to either remove it entirely or move the template
parameters to the methods.
(Note: if there's desire to bring back customization of next/prev
pointers in the future (e.g., to pack some bits in there), I think a
traits class like this is an awkward way to accomplish it. Instead, we
should change ilist<T> to be ilist<ilist_node<T>>, and give an extra
template parameter to ilist_node.)
llvm-svn: 278532
Recursive calls to aliasCheck from alias[GEP|Select|PHI] may result in a second call to GetUnderlyingObject for a Value whose underlying object has already been computed. This patch ensures that in these situations the underlying object is not computed again, and the result of the previous call is reused.
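A sketch of the memoization described above (names and signature are
hypothetical):
  #include "llvm/ADT/DenseMap.h"
  #include "llvm/Analysis/ValueTracking.h"
  #include "llvm/IR/DataLayout.h"
  using namespace llvm;
  static Value *getUnderlyingObjectCached(
      Value *V, const DataLayout &DL, DenseMap<Value *, Value *> &Cache) {
    auto It = Cache.find(V);
    if (It != Cache.end())
      return It->second; // reuse the earlier result instead of re-walking
    return Cache[V] = GetUnderlyingObject(V, DL);
  }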
https://reviews.llvm.org/D22305
llvm-svn: 278519