CGP currently drops select's MD_prof profile data when
generating conditional branch which can lead to bad
code layout. The patch fixes the issue.
Differential Revision: http://reviews.llvm.org/D24169
llvm-svn: 280600
Because the recent change about ODR type uniquing in the context,
we can reach types defined in another module during IR linking.
This triggered some assertions in case we IR link without starting
from an empty module. To alleviate that, we can self-map metadata
defined in the destination module so that they won't be visited.
Differential Revision: https://reviews.llvm.org/D23841
llvm-svn: 280599
The only intrusive thing about SparseBitVector's usage of ilist<> was
that new was usually called externally. There were no custom traits.
It seems like the reason to switch to ilist in r41855 was to avoid
pointer invalidation, but std::list<> has that feature too. Maybe
std::list<>::emplace makes this a little more obvious than it was then.
Switch over to std::list<> and simplify the code.
llvm-svn: 280573
Inheriting from std::iterator uses more boiler-plate than manual
typedefs. Avoid that in both ilist_iterator and
MachineInstrBundleIterator.
This has the side effect of removing ilist_iterator from certain ADL
lookups in namespace std; calls to std::next need to be qualified by
"std::" that didn't have to before. The one case of this in-tree was
operating on a temporary, so I used the more compact operator++.
llvm-svn: 280570
Split out iplist_impl from iplist, and change SymbolTableList to inherit
directly from iplist_impl. This makes it more straightforward to add
new template paramaters to iplist [*]:
- iplist_impl takes a "base" list that provides the intrusive
functionality (usually simple_ilist<T>) and a traits class.
- iplist no longer takes a "Traits" template parameter. It only takes
the value_type, T, and instantiates iplist_impl with simple_ilist<T>
and ilist_traits<T>.
- SymbolTableList now inherits from iplist_impl, instead of iplist.
Note for out-of-tree code: if you have an iplist whose second template
parameter was *not* the default (i.e., not ilist_traits<YourT>), you
have three options:
- Stop using a custom traits class, and instead specialize
ilist_traits<YourT>. This is the usual thing to do.
- Specialize iplist<YourT> to pass your custom traits class into
iplist_impl.
- Create your own trivial list type that passes your custom traits class
into iplist_impl (see SymbolTableList<> for an example).
[*]: The eventual goal is to start tracking a sentinel bit on the
MachineInstr list even when LLVM_ENABLE_ABI_BREAKING_CHECKS is off,
which will enable MachineBasicBlock::reverse_iterator to have normal
list invalidation semantics that matching the new
iplist<>::reverse_iterator from r280032.
llvm-svn: 280569
Delete the dead code for Write(ilist_iterator) in the IR Verifier,
inline report(ilist_iterator) at its call sites in the MachineVerifier,
and use simple_ilist<>::iterator in SymbolTableListTraits.
The only remaining reference to ilist_iterator outside of the ilist
implementation is from MachineInstrBundleIterator. I'll get rid of that
in a follow-up.
llvm-svn: 280565
This test was using the wrong type, and so not actually testing much.
ilist_iterator constructors weren't going through ilist_node_access, so
they didn't actually work with private inheritance.
llvm-svn: 280564
1. 0xNN and NNh are accepted as valid hexadecimal numbers, but 0xNNh is not.
0xNN and NNh may come with optional U or L suffix.
2. NNb is accepted as a valid binary (base-2) number, but 0bNN is not.
NNb may come with optional U or L suffix.
Differential Revision: https://reviews.llvm.org/D22112
llvm-svn: 280555
Previously we were splitting our records at 0xFFFF bytes, which the
Microsoft tools don't like.
Should fix failure on the new Windows self-host buildbot.
This length appears in microsoft-pdb/PDB/dbi/dbiimpl.h
llvm-svn: 280522
For the store of a wide value merged from a pair of values, especially int-fp pair,
sometimes it is more efficent to split it into separate narrow stores, which can
remove the bitwise instructions or sink them to colder places.
Now the feature is only enabled on x86 target, and only store of int-fp pair is
splitted. It is possible that the application scope gets extended with perf evidence
support in the future.
Differential Revision: https://reviews.llvm.org/D22840
llvm-svn: 280505
Crash was possible if match() method
was called on object that was moved or object
created with empty constructor.
Testcases updated.
DIfferential revision: https://reviews.llvm.org/D24123
llvm-svn: 280473
r280425 | dehao | 2016-09-01 16:15:50 -0700 (Thu, 01 Sep 2016) | 9 lines
Refactor LICM pass in preparation for LoopSink pass.
Summary: LoopSink pass uses some common function in LICM. This patch refactor the LICM code to make it usable by LoopSink pass (https://reviews.llvm.org/D22778).
r280429 | dehao | 2016-09-01 16:31:25 -0700 (Thu, 01 Sep 2016) | 9 lines
Refactor LICM to expose canSinkOrHoistInst to LoopSink pass.
Summary: LoopSink pass shares the same canSinkOrHoistInst functionality with LICM pass. This patch exposes this function in preparation of https://reviews.llvm.org/D22778
llvm-svn: 280453
Summary:
This is pretty useful especially in connection with
BFI's -view-block-freq-propagation-dags. It helped me to track down the
bug that is being fixed in D24118.
While -view-block-freq-propagation-dags displays the high-level
information with static heuristics included (and block frequencies), the
new thing only shows the raw weight as presented by PGO without any of
the static estimates. This helps to distinguished what has been
measured vs. estimated.
For the sample loop in D24118, -view-block-freq-propagation-dags=integer
looks like this:
https://reviews.llvm.org/F2381352
While with -view-cfg-only you can see the underlying branch weights:
https://reviews.llvm.org/F2392296
Reviewers: dexonsmith, bogner, davidxl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D24144
llvm-svn: 280442
Summary: LoopSink pass shares the same canSinkOrHoistInst functionality with LICM pass. This patch exposes this function in preparation of https://reviews.llvm.org/D22778
Reviewers: chandlerc, davidxl, danielcdh
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D24171
llvm-svn: 280429
Summary: This is in preparation for LoopSink pass which calls replaceDominatedUsesWith to update after sinking.
Reviewers: chandlerc, davidxl, danielcdh
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D24170
llvm-svn: 280427
They're another source of generic vregs, which are going to need a type on the
definition when we remove the register width from MachineRegisterInfo.
llvm-svn: 280412
Previous change broke the C API for creating an EarlyCSE pass w/
MemorySSA by adding a bool parameter to control whether MemorySSA was
used or not. This broke the OCaml bindings. Instead, change the old C
API entry point back and add a new one to request an EarlyCSE pass with
MemorySSA.
llvm-svn: 280379
If an attribute name has special characters such as '\01', it is not
properly printed in LLVM assembly language format. Since the format
expects the special characters are printed as it is, it has to contain
escape characters to make it printable.
Before:
attributes #0 = { ... "counting-function"="^A__gnu_mcount_nc" ...
After:
attributes #0 = { ... "counting-function"="\01__gnu_mcount_nc" ...
Reviewers: hfinkel, rengolin, rjmccall, compnerd
Subscribers: nemanjai, mcrosier, hans, shenhan, majnemer, llvm-commits
Differential Revision: https://reviews.llvm.org/D23792
llvm-svn: 280357
LLVM has an @llvm.eh.dwarf.cfa intrinsic, used to lower the GCC-compatible
__builtin_dwarf_cfa() builtin. As pointed out in PR26761, this is currently
broken on PowerPC (and likely on ARM as well). Currently, @llvm.eh.dwarf.cfa is
lowered using:
ADD(FRAMEADDR, FRAME_TO_ARGS_OFFSET)
where FRAME_TO_ARGS_OFFSET defaults to the constant zero. On x86,
FRAME_TO_ARGS_OFFSET is lowered to 2*SlotSize. This setup, however, does not
work for PowerPC. Because of the way that the stack layout works, the canonical
frame address is not exactly (FRAMEADDR + FRAME_TO_ARGS_OFFSET) on PowerPC
(there is a lower save-area offset as well), so it is not just a matter of
implementing FRAME_TO_ARGS_OFFSET for PowerPC (unless we redefine its
semantics -- We can do that, since it is currently used only for
@llvm.eh.dwarf.cfa lowering, but the better to directly lower the CFA construct
itself (since it can be easily represented as a fixed-offset FrameIndex)). Mips
currently does this, but by using a custom lowering for ADD that specifically
recognizes the (FRAMEADDR, FRAME_TO_ARGS_OFFSET) pattern.
This change introduces a ISD::EH_DWARF_CFA node, which by default expands using
the existing logic, but can be directly lowered by the target. Mips is updated
to use this method (which simplifies its implementation, and I suspect makes it
more robust), and updates PowerPC to do the same.
Fixes PR26761.
Differential Revision: https://reviews.llvm.org/D24038
llvm-svn: 280350
As discussed in https://reviews.llvm.org/D22666, our current mechanism to
support -pg profiling, where we insert calls to mcount(), or some similar
function, is fundamentally broken. We insert these calls in the frontend, which
means they get duplicated when inlining, and so the accumulated execution
counts for the inlined-into functions are wrong.
Because we don't want the presence of these functions to affect optimizaton,
they should be inserted in the backend. Here's a pass which would do just that.
The knowledge of the name of the counting function lives in the frontend, so
we're passing it here as a function attribute. Clang will be updated to use
this mechanism.
Differential Revision: https://reviews.llvm.org/D22825
llvm-svn: 280347
Summary:
This change promotes the 'isTailCall(...)' member function to
TargetInstrInfo as a query interface for determining on a per-target
basis whether a given MachineInstr is a tail call instruction. We build
upon this in the XRay instrumentation pass to emit special sleds for
tail call optimisations, where we emit the correct kind of sled.
The tail call sleds look like a mix between the function entry and
function exit sleds. Form-wise, the sled comes before the "jmp"
instruction that implements the tail call similar to how we do it for
the function entry sled. Functionally, because we know this is a tail
call, it behaves much like an exit sled -- i.e. at runtime we may use
the exit trampolines instead of a different kind of trampoline.
A follow-up change to recognise these sleds will be done in compiler-rt,
so that we can start intercepting these initially as exits, but also
have the option to have different log entries to more accurately reflect
that this is actually a tail call.
Reviewers: echristo, rSerge, majnemer
Subscribers: mehdi_amini, dberris, llvm-commits
Differential Revision: https://reviews.llvm.org/D23986
llvm-svn: 280334
Older versions of clang defined __has_cpp_attribute in C mode, but
would choke on scoped attributes, as per llvm.org/PR23435. Since we
support building with clang all the way back to 3.1, we have to work
around this issue.
llvm-svn: 280326
Previously we were assuming that any visitation of types would
necessarily be against a type we had binary data for. Reasonable
assumption when were just reading PDBs and dumping them, but once
we start writing PDBs from Yaml this breaks down, because we have
no binary data yet, only Yaml, and from that we need to read the
record kind and perform the switch based on that.
So this patch does that. Instead of having the visitor switch
on the kind that is already in the CVType record, we change the
visitTypeBegin() method to return the Kind, and switch on the
returned value. This way, the default implementation can still
return the value from the CVType, but the implementation which
visits Yaml records and serializes binary PDB type records can
use the field in the Yaml as the source of the switch.
llvm-svn: 280307
This reverts commit r280268, it causes all MSVC 2013 to ICE. This
appears to have been fixed in a later MSVC 2013 update, because I cannot
reproduce it locally. That said, all upstream LLVM bots are broken right
now, so I am reverting.
Also reverts dependent change r280275, "[Hexagon] Deal with undefs when
extending live intervals".
llvm-svn: 280301
We were kind of hacking this together before by embedding the
ability to forward requests into the TypeDeserializer. When
we want to start adding more different kinds of visitor callback
interfaces though, this doesn't scale well and is very inflexible.
So introduce the notion of a pipeline, which itself implements
the TypeVisitorCallbacks interface, but which contains an internal
list of other callbacks to invoke in sequence.
Also update the existing uses of CVTypeVisitor to use this new
pipeline class for deserializing records before visiting them
with another visitor.
llvm-svn: 280293
More preparation for dropping source types from MachineInstrs: regsters coming
out of already-selected code (i.e. non-generic instructions) don't have a type,
but that information is needed so we must add it manually.
This is done via a new G_TYPE instruction.
llvm-svn: 280292
Summary:
Current implementation of LI verifier isn't ideal and fails to detect
some cases when LI is incorrect. For instance, it checks that all
recorded loops are in a correct form, but it has no way to check if
there are no more other (unrecorded in LI) loops in the function. This
patch adds a way to detect such bugs.
Reviewers: chandlerc, sanjoy, hfinkel
Subscribers: llvm-commits, silvas, mzolotukhin
Differential Revision: https://reviews.llvm.org/D23437
llvm-svn: 280280
Summary:
Use MemorySSA, if requested, to do less conservative memory dependency
checking.
This change doesn't enable the MemorySSA enhanced EarlyCSE in the
default pipelines, so should be NFC.
Reviewers: dberlin, sanjoy, reames, majnemer
Subscribers: mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D19821
llvm-svn: 280279
This is a first step towards supporting deopt value lowering and reporting entirely with the register allocator. I hope to build on this in the near future to support live-on-return semantics, but I have a use case which allows me to test and investigate code quality with just the live-in semantics so I've chosen to start there. For those curious, my use cases is our implementation of the "__llvm_deoptimize" function we bind to @llvm.deoptimize. I'm choosing not to hard code that fact in the patch and instead make it configurable via function attributes.
The basic approach here is modelled on what is done for the "Live In" values on stackmaps and patchpoints. (A secondary goal here is to remove one of the last barriers to merging the pseudo instructions.) We start by adding the operands directly to the STATEPOINT SDNode. Once we've lowered to MI, we extend the remat logic used by the register allocator to fold virtual register uses into StackMap::Indirect entries as needed. This does rely on the fact that the register allocator rematerializes. If it didn't along some code path, we could end up with more vregs than physical registers and fail to allocate.
Today, we *only* fold in the register allocator. This can create some weird effects when combined with arguments passed on the stack because we don't fold them appropriately. I have an idea how to fix that, but it needs this patch in place to work on that effectively. (There's some weird interaction with the scheduler as well, more investigation needed.)
My near term plan is to land this patch off-by-default, experiment in my local tree to identify any correctness issues and then start fixing codegen problems one by one as I find them. Once I have the live-in lowering fully working (both correctness and code quality), I'm hoping to move on to the live-on-return semantics. Note: I don't have any *known* miscompiles with this patch enabled, but I'm pretty sure I'll find at least a couple. Thus, the "experimental" tag and the fact it's off by default.
Differential Revision: https://reviews.llvm.org/D24000
llvm-svn: 280250