For MIPS, we have to encode the personality routine with an indirect
pointer to absptr; otherwise, a linker warning will be raised and the
program might crash on some early MIPS Android devices.
llvm-svn: 209907
Unordered is strictly weaker than monotonic, so if the latter doesn't have any
barriers then the former certainly shouldn't.
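For context, C++11's memory_order_relaxed maps to LLVM's monotonic ordering; a
minimal illustration (my example, not code from the patch):
#include <atomic>
int load_relaxed(const std::atomic<int> &a) {
  // This maps to an LLVM "monotonic" load; an "unordered" load is strictly
  // weaker still, so it must not require any extra barriers.
  return a.load(std::memory_order_relaxed);
}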
rdar://problem/16548260
llvm-svn: 209901
The C and C++ semantics for compare_exchange require it to return a bool
indicating success. This gets mapped to LLVM IR, which follows each cmpxchg with
an icmp of the loaded value against the expected value.
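For reference, a minimal sketch of the C++-level construct involved
(illustrative, not the patch's code):
#include <atomic>
bool try_update(std::atomic<int> &a, int expected, int desired) {
  // The returned bool is what IR models as an icmp of the loaded value
  // against 'expected', immediately after the cmpxchg.
  return a.compare_exchange_strong(expected, desired);
}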
When lowered to ldxr/stxr loops, this extra comparison is redundant: its
results are implicit in the control-flow of the function.
This commit makes two changes: it replaces that icmp with appropriate PHI
nodes, and then makes sure earlyCSE is called after expansion to actually make
use of the opportunities revealed.
I've also added -{arm,aarch64}-enable-atomic-tidy options, so that
existing fragile tests aren't perturbed too much by the change. Many
of them either rely on undef/unreachable too pervasively to be
restored to something well-defined (particularly while making sure
they test the same obscure assert from many years ago), or depend on a
particular CFG shape, which is disrupted by SimplifyCFG.
rdar://problem/16227836
llvm-svn: 209883
An address-only use of an extract element of a load can be simplified to a
load. Without this the result of the extract element is spilled to the
stack so that an address is available.
llvm-svn: 209788
No test because no in-tree targets change the bitwidth of the
setcc type depending on the bitwidth of the compared type.
Patch by Ke Bai
llvm-svn: 209771
This matches gcc's behavior. It also seems natural given that aliases
contain other properties that govern how they are accessed (linkage,
visibility, dll storage).
Clang still has to be updated to expose this feature to C.
llvm-svn: 209759
This reverts r208640 (I've just XFAILed the test) because it broke ppc64/Linux
self-hosting. Because nearly every regression test triggers a segfault, I hope
this will be easy to fix.
llvm-svn: 209747
Use a more straightforward way to represent the set of instruction
ranges where the location of a user variable is defined - vector of pairs
of instructions (defining start/end of each range),
instead of a flattened vector of instructions where some instructions
are supposed to start the range, and the rest are supposed to "clobber" it.
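A sketch of the new shape (paraphrased, not the exact declarations):
// Each range is delimited by the instruction that opens it and the
// instruction that closes it.
typedef std::pair<const MachineInstr *, const MachineInstr *> InstrRange;
typedef SmallVector<InstrRange, 4> InstrRanges;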
Simplify the code which generates actual .debug_loc entries.
No functionality change.
llvm-svn: 209698
Current implementation of calculateDbgValueHistory already creates the
keys in the expected order (user variables are listed in order of appearance),
and should do so later by contract.
No functionality change.
llvm-svn: 209690
I'm not sure exactly where/how we end up with an abstract DbgVariable
with a null DIE, but we do... looking into it & will add a test and/or
fix when I figure it out.
Currently shows up in selfhost or compiler-rt builds.
llvm-svn: 209683
After much puppetry, here's the major piece of the work to ensure that
even when a concrete definition precedes all inline definitions, an
abstract definition is still created and referenced from both concrete
and inline definitions.
Variables are still broken in this case (see comment in
dbg-value-inlined-parameter.ll test case) and will be addressed in
follow up work.
llvm-svn: 209677
A further step to correctly emitting concrete out-of-line definitions
preceding inlined instances of the same program.
To do this, emission of subprograms must be delayed until required since
we don't know which (abstract only (if there's no out of line
definition), concrete only (if there are no inlined instances), or both)
DIEs are required at the start of the module.
To reduce the test churn in the following commit that actually fixes the
bug, this commit introduces the lazy DIE construction and cleans up test
cases that are impacted by the changes in the resulting DIE ordering.
llvm-svn: 209675
This is a precursor to fixing inlined debug info where the concrete,
out-of-line definition may precede any inlined usage. To cope with this,
the attributes that may appear on the concrete definition or the
abstract definition are delayed until the end of the module. Then, if an
abstract definition was created, it is referenced (and no other
attributes are added to the out-of-line definition), otherwise the
attributes are added directly to the out-of-line definition.
In a couple of cases this causes not just reordering of attributes, but
reordering of types. When the creation of the attribute is delayed, if
that creation would create a type (such as for a DW_AT_type attribute)
then other top level DIEs may've been constructed during the delay,
causing the referenced type to be created and added after those
intervening DIEs. In the extreme case, in cross-cu-inlining.ll, this
actually causes the DW_TAG_basic_type for "int" to move from one CU to
another.
llvm-svn: 209674
Cortex-M4 only has single-precision floating point support, so any LLVM
"double" type will have been split into 2 i32s by now. Fortunately, the
consecutive-register framework turns out to be precisely what's needed to
reconstruct the double and follow AAPCS-VFP correctly!
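An affected signature, for illustration (my example):
// On Cortex-M4 (single-precision FPU only), 'x' arrives as two i32
// halves in IR, but AAPCS-VFP still requires it to occupy a consecutive
// pair of single-precision registers (i.e. d0 = s0/s1).
double scale(double x) { return x * 2.0; }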
rdar://problem/17012966
llvm-svn: 209650
Seems my previous fix was insufficient - we were still not adding the
inlined function to the abstract scope list. Which meant it wasn't
flagged as inline, didn't have nested lexical scopes in the abstract
definition, and didn't have abstract variables - so the inlined variable
didn't reference an abstract variable, instead being described
completely inline.
llvm-svn: 209602
This makes front/back symmetric with begin/end, avoiding some confusion.
Added instr_front/instr_back for the old behavior, corresponding to
instr_begin/instr_end. Audited all three in-tree users of back(), all
of them look like they don't want to look inside bundles.
Fixes an assertion (PR19815) when generating debug info on mips, where a
delay slot was bundled at the end of a branch.
llvm-svn: 209580
This seems like a simple cleanup/improved consistency, but also helps
lay the foundation to fix the bug mentioned in the test case: concrete
definitions preceding any inlined usage aren't properly split into
concrete + abstract (because they're not known to need it until it's too
late).
Once we start deferring this choice until later, we won't have the
choice to put concrete definitions for inlined subroutines in a
different scope from concrete definitions for non-inlined subroutines
(since we won't know at time-of-construction which one it'll be). This
change brings those two cases into alignment ahead of that future
change/fix.
llvm-svn: 209547
It's not really a "ScopeDIE", as such - it's the abstract function
definition's DIE. And we usually use "SP" for subprograms, rather than
"Sub".
llvm-svn: 209499
constructSubprogramDIE was already called for every subprogram in every
CU when the module was started - there's no need to call it again at
module finalization.
llvm-svn: 209372
This reverts commit r208930, r208933, and r208975.
It seems not all fission consumers are ready to handle this behavior.
Reverting until tools are brought up to spec.
llvm-svn: 209338
Committed in r209178 then reverted in r209251 due to LTO breakage,
here's a proper fix for the case of the missing subprogram DIE. The DIEs
were there, just in other compile units. Using the SPMap we can find the
right compile unit to search for and produce cross-unit references to
describe this kind of inlining.
One existing test case needed to be updated because it had a function
that wasn't in the CU's subprogram list, so it didn't appear in the
SPMap.
llvm-svn: 209335
This reverts commit r209178.
This seems to be asserting in an LTO build on some internal Apple
buildbots. No upstream reproduction (and I don't have an LLVM-aware gold
built right now to reproduce it personally) but it's a small patch & the
failure's semi-plausible so I'm going to revert first while I try to
reproduce this.
llvm-svn: 209251
Undecided whether this should include a test case - SROA produces bad
dbg.value metadata describing a value for a reference that is actually
the value of the thing the reference refers to. For now, loosening the
assert lets this not assert, but it's still bogus/wrong output...
If someone wants to tell me to add a test, I'm willing/able, just
undecided. Hopefully we'll get SROA fixed soon & we can tighten up this
assertion again.
llvm-svn: 209240
This change preserves the original algorithm of generating history
for user variables, but makes it more clear.
High-level description of the algorithm (a rough code sketch follows the steps):
Scan all the machine basic blocks and machine instructions in the order
they are emitted to the object file. Do the following:
1) If we see a DBG_VALUE instruction, add it to the history of the
corresponding user variable. Keep track of all user variables, whose
locations are described by a register.
2) If we see a regular instruction, look at all the registers it clobbers,
and terminate the location range for all variables described by these registers.
3) At the end of the basic block, terminate location ranges for all
user variables described by some register.
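A rough sketch of the scan (helper and member names here are hypothetical):
for (const MachineBasicBlock &MBB : MF) {
  for (const MachineInstr &MI : MBB) {
    if (MI.isDebugValue())
      History.addDbgValue(MI);                  // step 1
    else
      for (unsigned Reg : clobberedRegs(MI))    // step 2
        History.endRangesDescribedBy(Reg, MI);
  }
  History.endAllRegisterRanges(MBB);            // step 3
}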
Although this change shouldn't be user-visible (the contents of the .debug_loc
section should be the same), it changes some internal assumptions about the set
of instructions used to track the variable locations. Watching the bots.
llvm-svn: 209225
In refactoring DwarfUnit::isUnsignedDIType I restricted it to only work
on values with signedness (unsigned or signed), asserting on anything
else (which did uncover some bugs). But it turns out that we do need to
emit constants of signless data, such as pointer constants - only null
pointer constants are known to need this so far, but it's conceivable
that there might be non-null pointer constants at some point (hardcoded
address offsets for device drivers?).
This patch just uses 'unsigned' for signless data such as pointer
constants. Arguably we could use signless representations
(DW_FORM_dataN) instead, allowing a ternary result from isUnsignedDIType
(signed, unsigned, signless), but this seems reasonable for now.
llvm-svn: 209223
This workaround (presumably for ancient GDB) doesn't appear to be
required (GDB 7.5 seems to tolerate function definition DIEs in
namespace scope just fine).
llvm-svn: 209189
Since we visit the whole list of subprograms for each CU at module
start, this is clearly true - don't test for the case, just assert it.
A few old test cases seemed to have incomplete subprogram lists, but any
attempt to reproduce them shows full subprogram lists that even include
entities that have been completely inlined and the out of line
definition removed.
llvm-svn: 209178
When I refactored this in r208636 I accidentally caused this to be added
multiple times to each abstract subprogram (not accounting for the
deduplicating effect of the InlinedSubprogramDIEs set).
This got better in r208798 when the abstract definitions got the
attribute added to them at construction time, but still had the
redundant copies introduced in r208636.
This commit removes those excess DW_AT_inlines and relies solely on the
insertion in r208798.
llvm-svn: 209166
The check in DwarfDebug::constructScopeDIE was meant to consider inlined
subroutines as any non-top-level scope that was a subprogram. Instead of
checking "not top level scope" it was checking if the /subprogram's/
scope was non-top-level.
Fix this and beef up a test case to demonstrate some of the missing
inlined_subroutines are no longer missing.
In the course of fixing this I also found that r208748 (with this fix)
found one /extra/ inlined_subroutine in concrete_out_of_line.ll due to
two inlined_subroutines having the same inlinedAt location. The previous
implementation was collapsing these into a single inlined subroutine.
I'm not sure what the original code was that created this .ll file so
I'm not sure if this actually happens in practice today. Since we
deliberately include column information to disambiguate two calls on the
same line, that may've addressed this bug in the frontend, but it's good
to know that workaround isn't necessary for this particular case
anymore.
llvm-svn: 209165
- On ARM/ARM64 we get a vrev because the shuffle matching code is really smart. We still unroll anything that's not v4i32 though.
- On X86 we get a pshufb with SSSE3. Required more cleverness in isShuffleMaskLegal.
- On PPC we get a vperm for v8i16 and v4i32. v2i64 is unrolled.
llvm-svn: 209123
This is mostly a mechanical change changing all the call sites to the newer
chained-function construction pattern. This removes the horrible 15-parameter
constructor for the CallLoweringInfo in favour of setting properties of the call
via chained functions. No functional change beyond the removal of the old
constructors is intended.
llvm-svn: 209082
This is a preliminary step to help ease the construction of CallLoweringInfo.
Changing the construction to a chained function pattern requires that the
parameter be nullable. However, rather than copying the vector, save a pointer
instead of a reference to permit late binding of the arguments.
llvm-svn: 209080
This allows us to put dynamic initializers for weak data into the same
comdat group as the data being initialized. This is necessary for MSVC
ABI compatibility. Once we have comdats for guard variables, we can use
the combination to help GlobalOpt fire more often for weak data with
guarded initialization on other platforms.
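For illustration, a C++ construct that produces weak data with a guarded
dynamic initializer (my example, not from the patch):
struct Widget { Widget(); };
template <class T> struct Holder { static Widget Obj; };
template <class T> Widget Holder<T>::Obj;
// Any use such as &Holder<int>::Obj instantiates 'Obj' with weak linkage
// and a dynamic initializer, which now share one comdat group.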
Reviewers: nlewycky
Differential Revision: http://reviews.llvm.org/D3499
llvm-svn: 209015
I'm not sure this is how it'll be going forward (I'd rather prefer the
definition to be in the main SP mapping, for various reasons) but this
helps me understand how it is today.
llvm-svn: 209009
DIBuilder maintains this invariant and the current DwarfDebug code could
end up doing weird things if it contained declarations (such as putting
the definition DIE inside a CU that contained the declaration - this
doesn't seem like a good idea, so rather than adding logic to handle
this case we'll just ban it for now & cross that bridge when we come to
it later).
llvm-svn: 209004
This reverts commit r208934.
The patch depends on aliases to GEPs with non zero offsets. That is not
supported and fairly broken.
The good news is that GlobalAlias is being redesigned and will have support
for offsets, so this patch should be a nice match for it.
llvm-svn: 208978
This commit implements two command-line switches, -global-merge-on-external
and -global-merge-aligned, both of which default to false, so this
optimization is disabled by default for all targets.
For ARM64, some back-end behaviors need to be tuned to get this optimization
further enabled.
llvm-svn: 208934
Since type units in the dwo file are handled by a debug aware tool, they
don't need to leverage the ELF comdat grouping to implement
deduplication. Avoid creating all the .group sections for these as a
space optimization.
llvm-svn: 208930
Abstract variables should never have/use locations. In this case the
data wasn't used, so no functional change intended here, just
simplification.
llvm-svn: 208820
Many old tests using prior schemas still had some brokenness here (both
indirect arrays and arrays with single bogus elements). Fixed those up
so they don't hit the new assertions.
Also reduced nesting in some places, etc.
llvm-svn: 208817
This is just unnecessary - we only create abstract definitions when
we're inlining anyway, so there's no reason to delay this to see if
we're going to inline anything.
llvm-svn: 208798
If the function has a landingpad instruction, then the handlerdata
should be emitted even if the function has the nounwind attribute;
otherwise cantunwind is incorrectly emitted, the LSDA is unavailable,
and the following code will not work:
void test1() noexcept {
  try {
    throw_exception();
  } catch (...) {
    log_unexpected_exception();
  }
}
llvm-svn: 208791
This was reverted in r208642 due to regressions surrounding file changes
within lexical scopes causing inlining information to be lost.
The issue was in LexicalScopes::getOrCreateInlinedScope, where I was
previously testing "isLexicalBlock" which is false for
"DILexicalBlockFile" (a scope used to represent changes in the current
file name) and assuming it was then a function (breaking out of the
inlined scope path and reaching for the parent non-inlined scopes). By
inverting the condition and testing for "isSubprogram" the correct
behavior is attained.
(also found some weirdness in Clang, see r208742 when reducing this test
case - the resulting test case doesn't apply with the Clang fix, but
I've added a more realistic test case to inline-scopes.ll which does
reproduce the issue and demonstrate the fix)
llvm-svn: 208748
This allows code to statically accept a Function or a GlobalVariable, but
not an alias. This is already a cleanup by itself IMHO, but the main
reason for it is that it gives a lot more confidence that the refactoring to fix
the design of GlobalAlias is correct. That will be a followup patch.
llvm-svn: 208716
compared to 'AddrMode.BaseReg'. In the case that 'AddrMode.BaseReg' is
nullptr, 'Result' will also be nullptr, so the cast causes an assertion. We
should use dyn_cast_or_null here to check that 'Result' is not null and is an
instruction.
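In sketch form (paraphrased, not the exact code):
// Before (asserts when 'Result' is null):
//   Instruction *I = cast<Instruction>(Result);
// After:
if (auto *I = dyn_cast_or_null<Instruction>(Result)) {
  // 'Result' is known to be a non-null Instruction here.
}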
Bug found by Mats Petersson, and I reduced his IR to get a test case.
llvm-svn: 208705
The problem occurs when a non-i1 setcc is inverted. For example, an 'i8 = setcc'
will get 'xor 0xff' to invert it, which is clearly wrong when the boolean
contents are ZeroOrOne.
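Concretely (illustrative arithmetic, not the patch's code):
#include <cstdint>
// With ZeroOrOneBooleanContent an i8 boolean holds 0 or 1, so inversion
// must be 'xor 1', not 'xor 0xff'.
uint8_t bad_invert(uint8_t b)  { return b ^ 0xff; } // 0 -> 0xff, not 1
uint8_t good_invert(uint8_t b) { return b ^ 1; }    // 0 -> 1, 1 -> 0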
This patch introduces getLogicalNOT and updates SetCC legalisation to use it.
Reviewed by Hal Finkel.
llvm-svn: 208641
Right now the load may not get DCE'd because of the side-effect of updating
the base pointer.
This can happen if we lower a read-modify-write of an illegal larger type
(e.g. i48) such that the modification only affects one of the subparts (the
lower i32 part but not the higher i16 part). See the testcase.
In order to spot the dead load we need to revisit it when SimplifyDemandedBits
decided that the value of the load is masked off. This is the
CommitTargetLoweringOpt piece.
I checked compile time with ARM64 by sending SPEC bitcode files through llc.
No measurable change.
Fixes <rdar://problem/16031651>
llvm-svn: 208640
One test case had to be updated as it still had the extra indirection
for the variable list - removing the extra indirection got it back to
passing.
llvm-svn: 208608
We must validate the value type in TLI::getRegisterByName, because if we
don't and the wrong type was used with the IR intrinsic, then we'll assert
(because we won't be able to find a valid register class with which to
construct the requested copy operation). For PPC64, additionally, the type
information is necessary to decide between the 64-bit register and the 32-bit
subregister.
No functionality change.
llvm-svn: 208508
Filed as PR19712, LLVM fails to detect the right type of an enum
constant when a frontend does not provide an underlying type for the
enumeration type.
llvm-svn: 208502
And the winner by a nose is isUnsignedDIType, for no particular reason.
These two functions were just complements of each other and used in very
related code, so refactor callers to just use one of them.
llvm-svn: 208500
Doesn't seem a good reason to duplicate this code (it was more literally
duplicated prior to r208494, and while the dataN code /does/ actually
fire in this case, it doesn't seem necessary (and the DWARF standard
recommends using udata/sdata pervasively instead of dataN, so as to
indicate signedness of the values))
llvm-svn: 208495
This code looks to have become dead at some time in the past. I tried to
reproduce cases where LLVM would emit constants with dataN, but could
not. Upon inspection it seems the code doesn't do that anymore - the
only time a size is provided by isTypeSigned is when the type is signed,
and in those cases we use sdata. dataN is only used for unsigned types
and isTypeSigned doesn't provide a value for sizeInBits in that case.
Remove the dead cases/size plumbing.
llvm-svn: 208494
When using the ARM AAPCS, HFAs (Homogeneous Floating-point Aggregates) must
be passed in a block of consecutive floating-point registers, or on the stack.
This means that unused floating-point registers cannot be back-filled with
part of an HFA, however this can currently happen. This patch, along with the
corresponding clang patch (http://reviews.llvm.org/D3083) prevents this.
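A typical HFA, for illustration (my example):
// All members share one floating-point type, so this is an HFA.
struct HFA { float x, y, z, w; };
// Under AAPCS it must be passed in four consecutive FP registers or
// entirely on the stack; leftover FP registers must not be back-filled
// with part of it.
void f(float a, HFA h);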
llvm-svn: 208413
comment of the API.
Relaxes the behavior of TargetInstrInfo::commuteInstruction when
TargetInstrInfo::findCommutedOpIndices returns false.
Previously TargetInstrInfo triggered a fatal error in that situation, whereas based
on the comment in the API it should just return nullptr. Indeed the only
precondition that should be ensured is that the instruction must be commutable.
llvm-svn: 208371
(r207876 was reverted in r208131 after seeing some consistent buildbot
failure for MSVC 2012. The original commits were in r207724-r207726)
Takumi was nice enough to dig into this and locate this Microsoft
Connect issue:
http://connect.microsoft.com/VisualStudio/feedback/details/814899/forward-as-tuple-debug-implementation-error
describing a bug in MSVC2012's forward_as_tuple implementation.
Since the parameters in this instance are trivial/small, pass them by
value (using make_tuple) instead of perfectly-forwarded tuple of rvalue
references (involving the broken forward_as_tuple). Hopefully this will
satisfy MSVC2012.
llvm-svn: 208364
The old method used by X86TTI to determine partial-unrolling thresholds was
messy (because it worked by testing target features), and also would not
correctly identify the target CPU if certain target features were disabled.
After some discussions on IRC with Chandler et al., it was decided that the
processor scheduling models were the right containers for this information
(because it is often tied to special uop dispatch-buffer sizes).
This does represent a small functionality change:
- For generic x86-64 (which uses the SB model and, thus, will get some
unrolling).
- For AMD cores (because they still currently use the SB scheduling model)
- For Haswell (based on benchmarking by Louis Gerbarg, it was decided to bump
the default threshold to 50; we're working on a test case for this).
Otherwise, nothing has changed for any other targets. The logic, however, has
been moved into BasicTTI, so other targets may now also opt-in to this
functionality simply by setting LoopMicroOpBufferSize in their processor
model definitions.
llvm-svn: 208289
When reducing the bitwidth of a comparison against a constant, the
original setcc's result type was used, which was incorrect.
No test since I don't think any other in-tree targets change the
bitwidth of the setcc type depending on the bitwidth of the compared
type.
llvm-svn: 208236
1) Fix for printing debug locations for absolute paths.
2) Location printing is moved into public method DebugLoc::print() to avoid re-inventing the wheel.
Differential Revision: http://reviews.llvm.org/D3513
llvm-svn: 208177
Speculatively reverting due to a suspicious failure on a Windows
buildbot.
This reverts commit 10c37a012ea11596d44cd9059fe09c959caf30c8.
llvm-svn: 208131
This patch implements the infrastructure to use named register constructs in
programs that need access to specific registers (bare metal, kernels, etc).
So far, only the stack pointer is supported as a technology preview, but as it
is, the intrinsic can already support all non-allocatable registers from any
architecture.
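A usage sketch (GNU C global-register syntax on an ARM target; my example):
// Bare-metal code reading the stack pointer by name.
register unsigned long sp asm("sp");
unsigned long current_stack_pointer(void) {
  return sp; // lowered via the named-register read intrinsic
}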
llvm-svn: 208104
Committed initially in r207724-r207726 and reverted due to compiler-rt
crashes in r207732.
Instead, fix this harder with unordered_map and store the LexicalScopes
by value in the map. This did necessitate moving the definition of
LexicalScope above the definition of LexicalScopes.
Let's see how the buildbots/compilers tolerate unordered_map::emplace +
std::piecewise_construct + std::forward_as_tuple...
llvm-svn: 207876
Summary:
Get rid of the UserVariables set, and turn DbgValues into a MapVector
to get a fixed ordering, as suggested in review for http://reviews.llvm.org/D3573.
Test Plan: llvm regression tests
Reviewers: dblaikie
Reviewed By: dblaikie
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D3579
llvm-svn: 207720
Breaks GDB buildbot
(http://lab.llvm.org:8011/builders/clang-x86_64-ubuntu-gdb-75/builds/14517)
GCC emits DW_AT_object_pointer /everywhere/ (declaration, abstract
definition, inlined subroutine), but it looks like GCC relies on it
being somewhere other than the declaration, at least. I'll experiment
further & can hopefully still remove it from the inlined_subroutine.
This reverts commit r207705.
llvm-svn: 207719
They just don't need to be there - they're inherited from the abstract
definition. In theory I would like them to be inherited from the
declaration, but the DWARF standard doesn't quite say that... we can
probably do it anyway but I'm less confident about that so I'll leave it
for a separate commit.
llvm-svn: 207717
This effectively reverts r164326, but adds some comments and
justification and ensures we /don't/ emit the DW_AT_object_pointer on
the (abstract and concrete) definitions. (while still preserving it on
standalone definitions involving ObjC Blocks)
This does increase the size of member function declarations from 7 to 11
bytes, unfortunately, but still seems like the Right Thing to do so that
callers that see only the declaration still have the information about
the object pointer. That said, I don't know what, if any, DWARF
consumers don't have a heuristic to guess this in the case of normal
C++ member functions - perhaps we can remove it entirely.
llvm-svn: 207705
For pattern like ((x >> C1) & Mask) << C2, DAG combiner may convert it
into (x >> (C1-C2)) & (Mask << C2), which makes pattern matching of ubfx
more difficult.
For example:
Given
  %shr = lshr i64 %x, 4
  %and = and i64 %shr, 15
  %arrayidx = getelementptr inbounds [8 x [64 x i64]]* @arr, i64 0, i64 2, i64 %and
  %0 = load i64* %arrayidx
With the current shift folding, it takes 3 instructions to compute the base address:
  lsr x8, x0, #1
  and x8, x8, #0x78
  add x8, x9, x8
Using ubfx, it needs only 2:
  ubfx x8, x0, #4, #4
  add x8, x9, x8, lsl #3
This fixes PR19589.
llvm-svn: 207702
DwarfDebug.h has a SmallVector member containing a unique_ptr of an
incomplete type. MSVC doesn't have key functions, so the vtable and
dtor are emitted in AsmPrinter.cpp, where DwarfDebug's ctor is called.
AsmPrinter.cpp includes DwarfUnit.h and doesn't get a complete definition
of DwarfTypeUnit. We could fix the problem by including DwarfUnit.h in
DwarfDebug.h, but that would increase header bloat. Instead, define
~DwarfDebug out of line.
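The pattern in miniature, using standard-library types (illustrative):
#include <memory>
#include <vector>
class Incomplete; // forward declaration only
class Holder {
  std::vector<std::unique_ptr<Incomplete>> Items;
public:
  Holder();
  ~Holder(); // must be defined out of line, where Incomplete is complete
};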
llvm-svn: 207701
These were called from distinct places and had significant distinct
behavior. No need to make that a dynamic check inside the function
rather than just having two functions (refactoring some common code into
a helper function to be called from the two separate functions).
llvm-svn: 207539
Seems at some point the intent was to emit fission ranges_base as unique
per CU but the code today emits ranges_base as the start of the ranges
section for all CUs being compiled and all the ranges_base relative
addresses are relative to that. So removing this dead code and leaving
the status quo until there's a reason to change it (perhaps something's
faster if it has distinct ranges for each CU).
llvm-svn: 207464
(Clang doesn't warn here because it knows the string is benign - the
assert still checks what it's intended to - though putting the correct
parens does make clang-format format the code a little better)
llvm-svn: 207456
Since all 4 ctor calls in DwarfDebug just pass in a trivially
constructed DIE with the right tag type, sink the tag selection down
into the Dwarf*Unit ctors (removing the argument entirely from callers
in DwarfDebug) and initialize the DIE member in DwarfUnit.
llvm-svn: 207448
Now that the subtle constructScopeDIE has been refactored into two
functions - one returning memory to take ownership of, one returning a
pointer to already owning memory - push unique_ptr through more APIs.
I think this completes most of the unique_ptr ownership of DIEs.
llvm-svn: 207442
While refactoring out constructScopeDIE into two functions I realized we
were emitting DW_AT_object_pointer in the inlined subroutine when we
didn't need to (GCC doesn't, and the abstract subprogram definition has
the information already).
So here's the refactoring and the bug fix. This is one step of
refactoring to remove some subtle memory ownership semantics. It turns
out the original constructScopeDIE returned ownership in its return
value in some cases and not in others. The split into two functions now
separates those two semantics - further cleanup (unique_ptr, etc) will
follow.
llvm-svn: 207441
entry. This is in preparation for generic DW_OP_piece support.
No functional change so far.
http://reviews.llvm.org/D3373
rdar://problem/15928306
llvm-svn: 207368
Since there's no way to ensure the type unit in the .dwo and the type
unit skeleton in the .o are correlated, this cannot work.
This implementation is a bit inefficient for a few reasons, called out
in comments.
llvm-svn: 207323
Sinking addition of the declaration attribute down to where the
signature is added, so that if the signature is not added, neither is
the declaration attribute (this will come in handy when aborting type
unit construction to instead emit the type into the CU directly in some
cases).
Pull out type unit identifier hashing just to simplify the function a
little, it'll be getting longer.
llvm-svn: 207321
Otherwise the legalizer would just scalarize everything. Support for
mulhi in the targets isn't that great yet, so on most targets we get
exactly the same scalarized output. Add a test for x86 vector udiv.
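For intuition, the scalar form of the trick (illustrative; the real magic
constants come from the DAG combiner):
#include <cstdint>
// Unsigned division by 9 via multiply-high - the kind of expansion that
// can now be applied to a whole vector at once instead of per element.
uint32_t div9(uint32_t x) {
  return (uint32_t)(((uint64_t)x * 0x38E38E39u) >> 33);
}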
I had to disable the mulhi nodes on ARM because there aren't any patterns
for it. As far as I know ARM has instructions for getting the high part of
a multiply so this should be fixed.
llvm-svn: 207315
The included test case would return the incorrect results, because the expansion
of an shift with a constant shift amount of 0 would generate undefined behavior.
This is because ExpandShiftByConstant assumes that all shifts by constants with
a value of 0 have already been optimized away. This doesn't happen for opaque
constants and usually this isn't a problem, because opaque constants won't take
this code path - they are not supposed to. In the case that the opaque constant
has to be expanded by the legalizer, the legalizer would drop the opaque flag.
In this case we hit the limitations of ExpandShiftByConstant and create incorrect
code.
This commit fixes the legalizer by not dropping the opaque flag when expanding
opaque constants and adding an assertion to ExpandShiftByConstant to catch this
not supported case in the future.
This fixes <rdar://problem/16718472>
llvm-svn: 207304
This also avoids the need for subtly side-effecting calls to manifest
strings in the string table at the point where items are added to the
accelerator tables.
llvm-svn: 207281
Pulls out some more code from some of the rather monolithic DWARF
classes. Unlike the address table, the string table won't move up into
DwarfDebug - each DWARF file has its own string table (but there can be
only one address table).
llvm-svn: 207277
buildbot - do not insert debug intrinsics before phi nodes.
Debug info for optimized code: Support variables that are on the stack and
described by DBG_VALUEs during their lifetime.
Previously, when a variable was at a FrameIndex for any part of its
lifetime, this would shadow all other DBG_VALUEs and only a single
fbreg location would be emitted, which in fact is only valid for a small
range and not the entire lexical scope of the variable. The included
dbg-value-const-byref testcase demonstrates this.
This patch fixes this by:
Local:
- emitting dbg.value intrinsics for allocas that are passed by reference
- dropping all dbg.declares (they are now fully lowered to dbg.values)
SelectionDAG:
- renamed constructors for SDDbgValue for better readability.
- fix UserValue::match() to handle indirect values correctly
- not inserting MMI table entries for dbg.values that describe allocas.
- lowering dbg.values that describe allocas into *indirect* DBG_VALUEs.
CodeGenPrepare:
- leaving dbg.values for an alloca where they are (see comment)
Other:
- regenerated/updated instcombine.ll testcase and included source
rdar://problem/16679879
http://reviews.llvm.org/D3374
llvm-svn: 207269
This should reduce the chance of memory leaks like those fixed in
r207240.
There's still some unclear ownership of DIEs happening in DwarfDebug.
Pushing unique_ptr and references through more APIs should help expose
the cases where ownership is a bit fuzzy.
llvm-svn: 207263
Since this doesn't return ownership (the DIE has been added to the
specified parent already) nor return null, just return by reference.
llvm-svn: 207259
This'll make changing to unique_ptr ownership of DIEs easier since the
usages will now have '*' on them making them textually compatible
between unique_ptr and raw pointer.
llvm-svn: 207253
AllocaInst that was missing in one location.
Debug info for optimized code: Support variables that are on the stack and
described by DBG_VALUEs during their lifetime.
Previously, when a variable was at a FrameIndex for any part of its
lifetime, this would shadow all other DBG_VALUEs and only a single
fbreg location would be emitted, which in fact is only valid for a small
range and not the entire lexical scope of the variable. The included
dbg-value-const-byref testcase demonstrates this.
This patch fixes this by:
Local:
- emitting dbg.value intrinsics for allocas that are passed by reference
- dropping all dbg.declares (they are now fully lowered to dbg.values)
SelectionDAG:
- renamed constructors for SDDbgValue for better readability.
- fix UserValue::match() to handle indirect values correctly
- not inserting MMI table entries for dbg.values that describe allocas.
- lowering dbg.values that describe allocas into *indirect* DBG_VALUEs.
CodeGenPrepare:
- leaving dbg.values for an alloca where they are (see comment)
Other:
- regenerated/updated instcombine.ll testcase and included source
rdar://problem/16679879
http://reviews.llvm.org/D3374
llvm-svn: 207235
AllocaInst that was missing in one location.
Debug info for optimized code: Support variables that are on the stack and
described by DBG_VALUEs during their lifetime.
Previously, when a variable was at a FrameIndex for any part of its
lifetime, this would shadow all other DBG_VALUEs and only a single
fbreg location would be emitted, which in fact is only valid for a small
range and not the entire lexical scope of the variable. The included
dbg-value-const-byref testcase demonstrates this.
This patch fixes this by:
Local:
- emitting dbg.value intrinsics for allocas that are passed by reference
- dropping all dbg.declares (they are now fully lowered to dbg.values)
SelectionDAG:
- renamed constructors for SDDbgValue for better readability.
- fix UserValue::match() to handle indirect values correctly
- not inserting MMI table entries for dbg.values that describe allocas.
- lowering dbg.values that describe allocas into *indirect* DBG_VALUEs.
CodeGenPrepare:
- leaving dbg.values for an alloca where they are (see comment)
Other:
- regenerated/updated instcombine.ll testcase and included source
rdar://problem/16679879
http://reviews.llvm.org/D3374
llvm-svn: 207165
described by DBG_VALUEs during their lifetime.
Previously, when a variable was at a FrameIndex for any part of its
lifetime, this would shadow all other DBG_VALUEs and only a single
fbreg location would be emitted, which in fact is only valid for a small
range and not the entire lexical scope of the variable. The included
dbg-value-const-byref testcase demonstrates this.
This patch fixes this by:
Local:
- emitting dbg.value intrinsics for allocas that are passed by reference
- dropping all dbg.declares (they are now fully lowered to dbg.values)
SelectionDAG:
- renamed constructors for SDDbgValue for better readability.
- fix UserValue::match() to handle indirect values correctly
- not inserting MMI table entries for dbg.values that describe allocas.
- lowering dbg.values that describe allocas into *indirect* DBG_VALUEs.
CodeGenPrepare:
- leaving dbg.values for an alloca where they are (see comment)
Other:
- regenerated/updated instcombine-intrinsics testcase and included source
rdar://problem/16679879
http://reviews.llvm.org/D3374
llvm-svn: 207130
There's only ever one address pool, not one per DWARF output file, so
let's just have one.
(similar refactoring of the string pool to come soon)
llvm-svn: 207026
Some of these types (DwarfDebug in particular) are quite large to begin
with (and I keep forgetting whether DwarfFile is in DwarfDebug or
DwarfUnit... ) so having a few smaller files seems like goodness.
llvm-svn: 207010
For now it contains a single flag, SanitizeAddress, which enables
AddressSanitizer instrumentation of inline assembly.
Patch by Yuri Gorshenin.
llvm-svn: 206971
This prompted me to push references through most of DwarfDebug. Sorry
for the churn.
Honestly it's a bit silly that we're passing around units all over the
place like that anyway and I think it's mostly due to the DIE attribute
adding utility functions being utilities in DwarfUnit. I should have
another go at moving them out of DwarfUnit...
llvm-svn: 206925
This reverts commit r206780.
This commit was regressing gdb.opt/inline-locals.exp in the GDB 7.5 test
suite. Reverting until I can fix the issue.
llvm-svn: 206867
define below all header includes in the lib/CodeGen/... tree. While the
current modules implementation doesn't check for this kind of ODR
violation yet, it is likely to grow support for it in the future. It
also removes one layer of macro pollution across all the included
headers.
Other sub-trees will follow.
llvm-svn: 206837
while checking candidate for bit field extract.
Otherwise the value may not fit in uint64_t and this will trigger an
assertion.
This fixes PR19503.
llvm-svn: 206834
behavior based on other files defining DEBUG_TYPE, which means it cannot
define DEBUG_TYPE at all. This is actually better IMO as it forces folks
to define relevant DEBUG_TYPEs for their files. However, it requires all
files that currently use DEBUG(...) to define a DEBUG_TYPE if they don't
already. I've updated all such files in LLVM and will do the same for
other upstream projects.
This still leaves one important change in how LLVM uses the DEBUG_TYPE
macro going forward: we need to only define the macro *after* header
files have been #include-ed. Previously, this wasn't possible because
Debug.h required the macro to be pre-defined. This commit removes that.
By defining DEBUG_TYPE after the includes two things are fixed:
- Header files that need to provide a DEBUG_TYPE for some inline code
can do so by defining the macro before their inline code and undef-ing
it afterward so the macro does not escape.
- We no longer have rampant ODR violations due to including headers with
different DEBUG_TYPE definitions. This may be mostly an academic
violation today, but with modules these types of violations are easy
to check for and potentially very relevant.
Where necessary to support headers with DEBUG_TYPE, I have moved the
definitions below the includes in this commit. I plan to move the rest
of the DEBUG_TYPE macros in LLVM in subsequent commits; this one is big
enough.
The comments in Debug.h, which were hilariously out of date already,
have been updated to reflect the recommended practice going forward.
llvm-svn: 206822
The rationale for this artificial dependency seems to have been lost to the
ravages of time, it is covered by no regression tests, and has no impact on
test-suite performance numbers on either x86 or PPC.
For the test suite, on both x86 and PPC, I ran the test suite 10 times (both as
a baseline and with this change), and found no statistically-significant
changes. For PPC, I used a P7 box. For x86, I used an Intel Xeon E5430. Both
with -O3 -mcpu=native.
This was discussed on-list back in January, but I've not had a chance to run
the performance tests until today.
llvm-svn: 206795
Requires switching some vectors to lists to maintain pointer validity.
These could be changed to forward_lists (singly linked) with a bit more
work - I've left comments to that effect.
llvm-svn: 206780
This reverts commit r206707, reapplying r206704. The preceding commit
to CalcSpillWeights should have sorted out the failing buildbots.
<rdar://problem/14292693>
llvm-svn: 206766
This reverts commit r206677, reapplying my BlockFrequencyInfo rewrite.
I've done a careful audit, added some asserts, and fixed a couple of
bugs (unfortunately, they were in unlikely code paths). There's a small
chance that this will appease the failing bots [1][2]. (If so, great!)
If not, I have a follow-up commit ready that will temporarily add
-debug-only=block-freq to the two failing tests, allowing me to compare
the code path between what the failing bots and what my machines (and
the rest of the bots) are doing. Once I've triggered those builds, I'll
revert both commits so the bots go green again.
[1]: http://bb.pgr.jp/builders/ninja-x64-msvc-RA-centos6/builds/1816
[2]: http://llvm-amd64.freebsd.your.org/b/builders/clang-i386-freebsd/builds/18445
<rdar://problem/14292693>
llvm-svn: 206704
The Win64 stack unwinder gets confused when execution flow "falls through" after
a call to 'noreturn' function. This fixes the "missing epilogue" problem by
emitting a trap instruction for IR 'unreachable' on x86_x64-pc-windows.
A secondary use for it would be for anyone wanting to make double-sure that
'noreturn' functions, indeed, do not return.
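For illustration (my example):
// The call never returns, so the IR block ends with 'unreachable'; on
// x86_64-pc-windows a trap is now emitted there so no "fall through"
// confuses the unwinder.
[[noreturn]] void die();
void f() { die(); }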
llvm-svn: 206684
This reverts commit r206666, as planned.
Still stumped on why the bots are failing. Sanitizer bots haven't
turned anything up. If anyone can help me debug either of the failures
(referenced in r206666) I'll owe them a beer. (In the meantime, I'll be
auditing my patch for undefined behaviour.)
llvm-svn: 206677
This reverts commit r206628, reapplying r206622 (and r206626).
Two tests are failing only on buildbots [1][2]: i.e., I can't reproduce
on Darwin, and Chandler can't reproduce on Linux. Asan and valgrind
don't tell us anything, but we're hoping the msan bot will catch it.
So, I'm applying this again to get more feedback from the bots. I'll
leave it in long enough to trigger builds in at least the sanitizer
buildbots (it was failing for reasons unrelated to my commit last time
it was in), and hopefully a few others.... and then I expect to revert a
third time.
[1]: http://bb.pgr.jp/builders/ninja-x64-msvc-RA-centos6/builds/1816
[2]: http://llvm-amd64.freebsd.your.org/b/builders/clang-i386-freebsd/builds/18445
llvm-svn: 206666
This reverts commit r206622 and the MSVC fixup in r206626.
Apparently the remotely failing tests are still failing, despite my
attempt to fix the nondeterminism in r206621.
llvm-svn: 206628
This reverts commit r206556, effectively reapplying commit r206548 and
its fixups in r206549 and r206550.
In an intervening commit I've added target triples to the tests that
were failing remotely [1] (but passing locally). I'm hoping the mystery
is solved? I'll revert this again if the tests are still failing
remotely.
[1]: http://bb.pgr.jp/builders/ninja-x64-msvc-RA-centos6/builds/1816
llvm-svn: 206622
Rewrite the shared implementation of BlockFrequencyInfo and
MachineBlockFrequencyInfo entirely.
The old implementation had a fundamental flaw: precision losses from
nested loops (or very wide branches) compounded past loop exits (and
convergence points).
The @nested_loops testcase at the end of
test/Analysis/BlockFrequencyAnalysis/basic.ll is motivating. This
function has three nested loops, with branch weights in the loop headers
of 1:4000 (exit:continue). The old analysis gives non-sensical results:
Printing analysis 'Block Frequency Analysis' for function 'nested_loops':
---- Block Freqs ----
entry = 1.0
for.cond1.preheader = 1.00103
for.cond4.preheader = 5.5222
for.body6 = 18095.19995
for.inc8 = 4.52264
for.inc11 = 0.00109
for.end13 = 0.0
The new analysis gives correct results:
Printing analysis 'Block Frequency Analysis' for function 'nested_loops':
block-frequency-info: nested_loops
- entry: float = 1.0, int = 8
- for.cond1.preheader: float = 4001.0, int = 32007
- for.cond4.preheader: float = 16008001.0, int = 128064007
- for.body6: float = 64048012001.0, int = 512384096007
- for.inc8: float = 16008001.0, int = 128064007
- for.inc11: float = 4001.0, int = 32007
- for.end13: float = 1.0, int = 8
Most importantly, the frequency leaving each loop matches the frequency
entering it.
The new algorithm leverages BlockMass and PositiveFloat to maintain
precision, separates "probability mass distribution" from "loop
scaling", and uses dithering to eliminate probability mass loss. I have
unit tests for these types out of tree, but it was decided in the review
to make the classes private to BlockFrequencyInfoImpl, and try to shrink
them (or remove them entirely) in follow-up commits.
The new algorithm should generally have a complexity advantage over the
old. The previous algorithm was quadratic in the worst case. The new
algorithm is still worst-case quadratic in the presence of irreducible
control flow, but it's linear without it.
The key difference between the old algorithm and the new is that control
flow within a loop is evaluated separately from control flow outside,
limiting propagation of precision problems and allowing loop scale to be
calculated independently of mass distribution. Loops are visited
bottom-up, their loop scales are calculated, and they are replaced by
pseudo-nodes. Mass is then distributed through the function, which is
now a DAG. Finally, loops are revisited top-down to multiply through
the loop scales and the masses distributed to pseudo nodes.
There are some remaining flaws.
- Irreducible control flow isn't modelled correctly. LoopInfo and
MachineLoopInfo ignore irreducible edges, so this algorithm will
fail to scale accordingly. There's a note in the class
documentation about how to get closer. See also the comments in
test/Analysis/BlockFrequencyInfo/irreducible.ll.
- Loop scale is limited to 4096 per loop (2^12) to avoid exhausting
the 64-bit integer precision used downstream.
- The "bias" calculation proposed on llvmdev is *not* incorporated
here. This will be added in a follow-up commit, once comments from
this review have been handled.
llvm-svn: 206548
Summary:
This prevents the discriminator generation pass from triggering if
the DWARF version being used in the module is prior to 4.
Reviewers: echristo, dblaikie
CC: llvm-commits
Differential Revision: http://reviews.llvm.org/D3413
llvm-svn: 206507
Previously, SSPBufferSize was assigned the value of the "stack-protector-buffer-size"
attribute after all uses of SSPBufferSize. The effect was that the default
SSPBufferSize was always used during analysis. I moved the check for the
attribute before the analysis; now --param ssp-buffer-size= works correctly again.
Differential Revision: http://reviews.llvm.org/D3349
llvm-svn: 206486
Still only 32-bit ARM using it at this stage, but the promotion allows
direct testing via opt and is a reasonably self-contained patch on the
way to switching ARM64.
At this point, other targets should be able to make use of it without
too much difficulty if they want. (See ARM64 commit coming soon for an
example).
llvm-svn: 206485
This particular DAG combine is designed to kick in when both ConstantFPs will
end up being loaded via a litpool, however those nodes have a semi-legal
status, dictated by isFPImmLegal, so in some cases there wouldn't have been a
litpool in the first place. Don't try to be clever in those circumstances.
Picked up while merging some AArch64 tests.
llvm-svn: 206365
handles Intrinsic::trap if TargetOptions::TrapFuncName is set.
This fixes a bug in which the trap function was not taken into consideration
when a program was compiled without optimization (at -O0).
<rdar://problem/16291933>
llvm-svn: 206323
Implement DebugInfoVerifier, which steals verification relying on
DebugInfoFinder from Verifier.
- Adds LegacyDebugInfoVerifierPassPass, a ModulePass which wraps
DebugInfoVerifier. Uses -verify-di command-line flag.
- Change verifyModule() to invoke DebugInfoVerifier as well as
Verifier.
- Add a call to createDebugInfoVerifierPass() wherever there was a
call to createVerifierPass().
This implementation as a module pass should sidestep efficiency issues,
allowing us to turn debug info verification back on.
<rdar://problem/15500563>
llvm-svn: 206300
ARM64 suffered multiple -verify-machineinstr failures (principally over the
xsp/xzr issue) because FastISel was completely ignoring which subset of the
general-purpose registers each instruction required.
More fixes are coming in ARM64 specific FastISel, but this should cover the
generic problems.
llvm-svn: 206283
Got bored, removed some manual memory management.
Pushed references (rather than pointers) through a few APIs rather than
replacing *x with x.get().
llvm-svn: 206222
Thanks to dblaikie for updating the testcase!
Debug info: (bugfix) C++ C/Dtors can be compiled to multiple functions,
therefore, their declaration cannot have one DW_AT_linkage_name.
The specific instances however can and should have that attribute.
This patch reorders the code in DwarfUnit::getOrCreateSubprogramDIE()
to emit linkage names for C/Dtors.
rdar://problem/16362674.
llvm-svn: 206210
BasicTTI::getMemoryOpCost must explicitly check for non-simple types; setting
AllowUnknown=true with TLI->getSimpleValueType is not sufficient because, for
example, non-power-of-two vector types return non-simple EVTs (not MVT::Other).
llvm-svn: 206150
Nice to be able to just print out the Tag and have the debugger print
dwarf::DW_TAG_subprogram or whatever, rather than an int.
It's a bit finicky (for example DIDescriptor::getTag still returns
unsigned) because some places still handle real dwarf tags + our fake
tags (one day we'll remove the fake tags, hopefully).
llvm-svn: 206098
therefore, their declaration cannot have one DW_AT_linkage_name.
The specific instances however can and should have that attribute.
This patch reorders the code in DwarfUnit::getOrCreateSubprogramDIE()
to emit linkage names for C/Dtors.
rdar://problem/16362674.
llvm-svn: 206096
We had disabled use of TBAA during CodeGen (even when otherwise using AA)
because the ptrtoint/inttoptr used by CGP for address sinking caused BasicAA to
miss basic type punning that it should catch (and, thus, we'd fail to override
TBAA when we should).
However, when AA is in use during CodeGen, CGP now uses normal GEPs and
bitcasts, instead of ptrtoint/inttoptr, when doing address sinking. As a
result, BasicAA should be able to make us do the right thing in the face of
type-punning, and it seems safe to enable use of TBAA again. self-hosting seems
fine on PPC64/Linux on the P7, with TBAA enabled and -misched=shuffle.
Note: We still don't update TBAA when merging stack slots, although because
BasicAA should now catch all such cases, this is no longer a blocking issue.
Nevertheless, I plan to commit code to deal with this properly in the near
future.
llvm-svn: 206093
The current memory-instruction optimization logic in CGP, which sinks parts of
the address computation that can be adsorbed by the addressing mode, does this
by explicitly converting the relevant part of the address computation into
IR-level integer operations (making use of ptrtoint and inttoptr). For most
targets this is currently not a problem, but for targets wishing to make use of
IR-level aliasing analysis during CodeGen, the use of ptrtoint/inttoptr is a
problem for two reasons:
1. BasicAA becomes less powerful in the face of the ptrtoint/inttoptr
2. In cases where type-punning was used, and BasicAA was used
to override TBAA, BasicAA may no longer do so. (this had forced us to disable
all use of TBAA in CodeGen; something which we can now enable again)
This (use of GEPs instead of ptrtoint/inttoptr) is not currently enabled by
default (except for those targets that use AA during CodeGen), and so aside
from some PowerPC subtargets and SystemZ, there should be no change in
behavior. We may be able to switch completely away from the ptrtoint/inttoptr
sinking on all targets, but further testing is required.
I've doubled-up on a number of existing tests that are sensitive to the
address sinking behavior (including some store-merging tests that are
sensitive to the order of the resulting ADD operations at the SDAG level).
llvm-svn: 206092
This is a shared implementation class for BlockFrequencyInfo and
MachineBlockFrequencyInfo, not for BlockFrequency, a related (but
distinct) class.
No functionality change.
<rdar://problem/14292693>
llvm-svn: 206083
-fexhaustive-register-search option to allow an exhaustive search during last
chance recoloring.
This is related to PR18747
Patch by MAYUR PANDEY <mayur.p@samsung.com>.
llvm-svn: 206072
When rematerializing an instruction that defines a super register that would be
used through a physical subregister, we use the related physical super register
for the definition.
To keep the live-range information accurate, all the defined subregisters must
be marked as dead defs; otherwise the register allocator may miss some
interferences.
Working on a reduced test-case!
<rdar://problem/16582185>
llvm-svn: 206060
The TargetLowering::expandMUL() helper contains lowering code extracted
from the DAGTypeLegalizer and allows the SelectionDAGLegalizer to expand more
ISD::MUL patterns without having to use a library call.
llvm-svn: 206037
This code has been moved to a new function in the TargetLowering
class called expandMUL(). The purpose of this is to be able
to share lowering code between the SelectionDAGLegalize and
DAGTypeLegalizer classes.
No functionality change intended.
llvm-svn: 206036
Also updated as many loops as I could find using df_begin/idf_begin -
strangely I found no uses of idf_begin. Is that just used out of tree?
Also a few places couldn't use df_begin because either they used the
member functions of the depth first iterators or had specific ordering
constraints (I added a comment in the latter case).
Based on a patch by Jim Grosbach. (Jim - you just had iterator_range<T>
where you needed iterator_range<idf_iterator<T>>)
llvm-svn: 206016
This removes the -segmented-stacks command line flag in favor of a
per-function "split-stack" attribute.
Patch by Luqman Aden and Alex Crichton!
llvm-svn: 205997
As it turns out the source of the sunkaddr can be a constant, in which case
there is not an instruction to delete, causing the cleanup code introduced in
r204833 to crash. This patch adds a dynamic check to ensure the deleted value is
in fact an instruction and not a constant.
Patch by Louis Gerbarg <lgg@apple.com>
llvm-svn: 205941
FoldConstantArithmetic() only knows how to deal with a few target independent
ISD opcodes. Bail early if it sees a target-specific ISD node. These nodes do
funny things with operand types which may break the assumptions of the code
that follows, and there's no actual folding that can be done anyway. For example,
non-constant 256 bit vector shifts on X86 have a shift-amount operand that's a
128-bit v4i32 vector regardless of what the first operand type is and that breaks
the assumption that the operand types must match.
rdar://16530923
llvm-svn: 205937
sign/zero/any extensions. However, a few places were not properly checking this
property of the load and were turning an indexed load into a regular extended
load. Therefore the indexed value was lost during the process and this was
triggering an assertion.
<rdar://problem/16389332>
llvm-svn: 205923
Summary:
Local common symbols were properly inserted into the .bss section.
However, putting external common symbols in the .bss section would give
them a strong definition.
Instead, encode them as undefined, external symbols whose symbol value
is equivalent to their size.
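For reference (C semantics; a tentative definition becomes a common symbol):
int counter; // common symbol: if external, now encoded as an undefined
             // symbol rather than as a strong definition in .bss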
Reviewers: Bigcheese, rafael, rnk
CC: llvm-commits
Differential Revision: http://reviews.llvm.org/D3324
llvm-svn: 205811
Until r197284, the entry frequency was constant -- i.e., set to 2^14.
Although current ToT still has a constant entry frequency, since r197284
that has been an implementation detail (which is soon going to change).
- r204690 made the wrong assumption for the CSRCost metric. Adjust
callee-saved register cost based on entry frequency.
- r185393 made the wrong assumption (although it was valid at the
time). Update SpillPlacement.cpp::Threshold to be relative to the
entry frequency.
Since ToT still has 2^14 entry frequency, this should have no observable
functionality change.
<rdar://problem/14292693>
llvm-svn: 205789
Fixes PR16365 - Extremely slow compilation in -O1 and -O2.
The SD scheduler has a quadratic implementation of load clustering
which absolutely blows up compile time for large blocks with constant
pool loads. The MI scheduler has a better implementation of load
clustering. However, we have not done the work yet to completely
eliminate the SD scheduler. Some benchmarks still seem to benefit from
early load clustering, although maybe by chance.
As an intermediate-term fix, I just put a nice limit on the number of
DAG users to search before finding a match. With this limit there are no
binary differences in the LLVM test suite, and the PR16365 test case
does not suffer any compile time impact from this routine.
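In outline the cut-off looks like this (the constant and the predicate name
are illustrative, not the actual code):
const unsigned MaxUsersToSearch = 16; // illustrative cut-off
unsigned NumScanned = 0;
for (SDNode *User : Chain->uses()) {
  if (++NumScanned > MaxUsersToSearch)
    break; // bail out rather than scanning a huge user list
  if (isClusterableLoad(User)) // illustrative predicate
    return User;
}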
llvm-svn: 205738
When LLVM sees something like (v1iN (vselect v1i1, v1iN, v1iN)) it can
decide that the result is OK (v1i64 is legal on AArch64, for example)
but it still needs scalarising because of that v1i1. There was no code
to do this though.
AArch64 and ARM64 have DAG combines to produce efficient code and
prevent that occurring in *most* such situations, but there are edge
cases that they miss. This adds a legalization to cope with that.
llvm-svn: 205626
There were several overlapping problems here, and this solution is
closely inspired by the one adopted in AArch64 in r201381.
Firstly, scalarisation of v1i1 setcc operations simply fails if the
input types are legal. This is fixed in LegalizeVectorTypes.cpp this
time, and allows AArch64 code to be simplified slightly.
Second, vselect with such a setcc feeding into it ends up in
ScalarizeVectorOperand, where it's not handled. I experimented with an
implementation, but found that whatever DAG came out was rather
horrific. I think Hao's DAG combine approach is a good one for
quality, though there are edge cases it won't catch (to be fixed
separately).
Should fix PR19335.
llvm-svn: 205625
recoloring cut-offs are encountered and register allocation failed.
This is related to PR18747
Patch by MAYUR PANDEY <mayur.p@samsung.com>.
llvm-svn: 205601
In SelectionDAGBuilder::visitBr(), llc doesn't generate nodes for unconditional
fall-through branches for targets without a FastISel implementation (X86 has
one, but it can be disabled with "-fast-isel=false").
So for line 4 in the following testcase
1: void foo(int i){
2: switch(i){
3: default:
4: break;
5: }
6: return;
7: }
there is no corresponding line in .debug_line section, and a debugger
cannot set a breakpoint at line 4.
Fix this by always emitting a branch when we're not optimizing and add a
testcase to ensure that there's code on every line we'd want to break.
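The change in visitBr is roughly (simplified):
// Emit the branch even for a fall-through when not optimizing, so the
// break on line 4 gets its own row in .debug_line.
if (Succ0MBB != NextBlock || TM.getOptLevel() == CodeGenOpt::None)
  DAG.setRoot(DAG.getNode(ISD::BR, getCurSDLoc(), MVT::Other,
                          getControlRoot(), DAG.getBasicBlock(Succ0MBB)));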
Patch by Daniil Fukalov.
llvm-svn: 205529
While we were encoding 64-bit values (data8) in the subrange itself,
using a 32-bit type for the subrange was still confusing gdb. Oh,
and make it unsigned too.
As the comment points out, this could be pushed into the frontend so
that it would be 32 or 64 bit as appropriate, etc.
llvm-svn: 205512
I should have read that comment a little more carefully. ;)
Regression test in the works, committing in the mean time to un-break people.
llvm-svn: 205511
When a vector type legalizes to a larger vector type, and the target does not
support the associated extending load (or truncating store), then legalization
will scalarize the load (or store) resulting in an associated scalarization
cost. BasicTTI::getMemoryOpCost needs to account for this.
Between this, and r205487, PowerPC on the P7 with VSX enabled shows:
MultiSource/Benchmarks/PAQ8p/paq8p: 43% speedup
SingleSource/Benchmarks/BenchmarkGame/puzzle: 51% speedup
SingleSource/UnitTests/Vectorizer/gcc-loops 28% speedup
(some of these are new; some of these, such as PAQ8p, just reverse regressions
that VSX support would trigger)
llvm-svn: 205495
For a cast (extension, etc.), the current logic predicts a low cost if the
associated operation (keyed on the destination type) is legal (or promoted).
This is not true when the number of values required to legalize the type is
changing. For example, <8 x i16> being sign extended to <8 x i32> is not
generically cheap on PPC with VSX, even though sign extension to v4i32 is
legal, because two output v4i32 values are required compared to the single
v8i16 input value, and without custom logic in the target, this conversion will
scalarize.
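One way to detect this situation (a sketch; the helper names are approximate):
// If legalization splits the source and destination into different
// numbers of registers, the conversion is not a simple legal operation.
std::pair<unsigned, MVT> SrcLT = getTypeLegalizationCost(SrcTy);
std::pair<unsigned, MVT> DstLT = getTypeLegalizationCost(DstTy);
if (SrcLT.first != DstLT.first)
  return ScalarizedConversionCost; // illustrative: charge for scalarization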
llvm-svn: 205487
Just pass a MachineInstr reference rather than an MBB iterator.
Creating a MachineInstr& is the first thing every implementation did
anyway.
llvm-svn: 205453
I'm not sure the comment in the implementation really adds a lot of
value (it's clear that we emit zero when no symbol is provided, but it
doesn't explain why we would do that). Happy to iterate.
llvm-svn: 205386
This removes the magic-number-esque code creating/retrieving the same
label for a debug_loc entry from two places and removes the last small
piece of reusable logic from emitDebugLoc so that there will be less
duplication when refactoring it into two functions (one for debug_loc,
the other for debug_loc.dwo).
llvm-svn: 205382
Seems we didn't have any test coverage for merging... awesome. So I
added some - but hit an llvm-objdump bug while I was there. I'm choosing
not to shave that yak right now.
Code review feedback/bug catch by Adrian Prantl in r205360.
llvm-svn: 205373
No test case (this would invoke UB by examining uninitialized members,
etc, at best - and this code is apparently untested anyway - I'm about
to fix that)
Code review feedback from Adrian Prantl on r205360.
llvm-svn: 205367
This moves one case of raw text checking down into the MCStreamer
interfaces in the form of a virtual function. Even if we ultimately end
up consolidating on the one-or-many line tables issue one day, this is
nicer in the interim. This just generally streamlines a bunch of use
cases into a common code path.
llvm-svn: 205287
No other functionality changes, DIBuilder testcase is included in a paired
CFE commit.
This relaxes the assertion in isScopeRef to also accept subclasses of
DIScope.
llvm-svn: 205279
This commit updates the stackmap format to version 1 to indicate the
reorganization of several fields. This was done in order to align stackmap
entries to their natural alignment and to minimize padding.
Fixes <rdar://problem/16005902>
llvm-svn: 205254
This adds the ability to expand large (meaning with more than two unique
defined values) BUILD_VECTOR nodes in terms of SCALAR_TO_VECTOR and (legal)
vector shuffles. There is now no limit on the size we are capable of expanding
this way, although we don't currently do this for vectors with many unique
values because of the default implementation of TLI's
shouldExpandBuildVectorWithShuffles function.
There is currently no functional change to any existing targets because the new
capabilities are not used unless some target overrides the TLI
shouldExpandBuildVectorWithShuffles function. As a result, I've not included a
test case for the new functionality in this commit, but regression tests will
(at least) be added soon when I commit support for the PPC QPX vector
instruction set.
The benefit of committing this now is that it makes the
shouldExpandBuildVectorWithShuffles callback, which had to be added for other
reasons regardless, fully functional. I suspect that other targets will
also benefit from tuning the heuristic.
llvm-svn: 205243
There are two general methods for expanding a BUILD_VECTOR node:
1. Use SCALAR_TO_VECTOR on the defined scalar values and then shuffle
them together.
2. Build the vector on the stack and then load it.
Currently, we use a fixed heuristic: If there are only one or two unique
defined values, then we attempt an expansion in terms of SCALAR_TO_VECTOR and
vector shuffles (provided that the required shuffle mask is legal). Otherwise,
always expand via the stack. Even when SCALAR_TO_VECTOR is not legal, this
can still be a good idea depending on what tricks the target can play when
lowering the resulting shuffle. If the target can't do anything special,
however, and if SCALAR_TO_VECTOR is expanded via the stack, this heuristic
leads to sub-optimal code (two stack loads instead of one).
Because only the target knows whether the SCALAR_TO_VECTORs and shuffles for a
build vector of a particular type are likely to be optimal, this adds a new
TLI function: shouldExpandBuildVectorWithShuffles which takes the vector type
and the count of unique defined values. If this function returns true, then
method (1) will be used, subject to the constraint that all of the necessary
shuffles are legal (as determined by isShuffleMaskLegal). If this function
returns false, then method (2) is always used.
This commit does not enhance the current code to support expanding a
build_vector with more than two unique values using shuffles, but I'll commit
an implementation of the more-general case shortly.
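A target opts in by overriding the hook, e.g. (the policy shown here is
hypothetical):
bool MyTargetLowering::shouldExpandBuildVectorWithShuffles(
    EVT VT, unsigned DefinedValues) const {
  // Allow shuffle-based expansion for up to four unique defined values.
  return DefinedValues <= 4;
}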
llvm-svn: 205230
When the loop vectorizer vectorizes code that uses the loop induction variable,
we often end up with IR like this:
%b1 = insertelement <2 x i32> undef, i32 %v, i32 0
%b2 = shufflevector <2 x i32> %b1, <2 x i32> undef, <2 x i32> zeroinitializer
%i = add <2 x i32> %b2, <i32 2, i32 3>
If the add in this example is not legal (as is the case on PPC with VSX), it
will be scalarized, and we'll end up with a number of extract_vector_elt nodes
with the vector shuffle as the input operand, and that vector shuffle is fed by
one or more build_vector nodes. By the time that vector operations are
expanded, visitEXTRACT_VECTOR_ELT will not create new extract_vector_elt by
looking through the vector shuffle (to make sure that no illegal operations are
created), and so the extract_vector_elt -> vector shuffle -> build_vector is
never simplified to an operand of the build vector.
By looking at build_vectors through a shuffle we fix this particular situation,
preventing a vector from being built, only to be deconstructed again (for the
scalarized add) -- an expensive proposition when this all needs to be done via
the stack. We probably want a more comprehensive fix here where we look back
recursively through any shuffles to any build_vectors or scalar_to_vectors,
etc. but that can come later.
llvm-svn: 205179
When expanding EXTRACT_VECTOR_ELT and EXTRACT_SUBVECTOR using
SelectionDAGLegalize::ExpandExtractFromVectorThroughStack, we store the entire
vector and then load the piece we want. This is fine in isolation, but
generating a new store (and corresponding stack slot) for each extraction ends
up producing code of poor quality. When we scalarize a vector operation (using
SelectionDAG::UnrollVectorOp for example) we generate one EXTRACT_VECTOR_ELT
for each element in the vector. This used to generate one stored copy of the
vector for each element in the vector. Now we search the uses of the vector for
a suitable store before generating a new one, which results in much more
efficient scalarization code.
llvm-svn: 205153
Given IR like:
%bit = and %val, #imm-with-1-bit-set
%tst = icmp %bit, 0
br i1 %tst, label %true, label %false
some targets can emit just a single instruction (tbz/tbnz in the
AArch64 case). However, with ISel acting at the basic-block level, all
three instructions need to be together for this to be possible.
This adds another transformation to CodeGenPrep to expose these
opportunities, if targets opt in via the hook.
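The opt-in is a flag on TargetLowering set in the target's constructor (name
as of this change; treat as illustrative):
// Tell CodeGenPrep it may sink the and/icmp next to the branch so a
// single test-bit-and-branch instruction can be selected.
MaskAndBranchFoldingIsLegal = true;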
llvm-svn: 205086
Construct a uniform Windows target triple nomenclature which is congruent to the
Linux counterpart. The old triples are normalised to the new canonical form.
This cleans up the long-standing issue of odd naming for various Windows
environments.
There are four different environments on Windows:
MSVC: The MS ABI, MSVCRT environment as defined by Microsoft
GNU: The MinGW32/MinGW32-W64 environment which uses MSVCRT and auxiliary libraries
Itanium: The MSVCRT environment + libc++ built with Itanium ABI
Cygnus: The Cygwin environment which uses custom libraries for everything
The following spellings are now written as:
i686-pc-win32 => i686-pc-windows-msvc
i686-pc-mingw32 => i686-pc-windows-gnu
i686-pc-cygwin => i686-pc-windows-cygnus
This should be sufficiently flexible to allow us to target other Windows
environments in the future as necessary.
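For example, normalization now rewrites the old spellings (sketch):
#include "llvm/ADT/Triple.h"
using namespace llvm;

void example() {
  Triple T(Triple::normalize("i686-pc-win32"));
  // T.str() is now "i686-pc-windows-msvc"; the OS is Triple::Win32 and
  // the environment is Triple::MSVC.
}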
llvm-svn: 204977
This adds back r204781.
Original message:
Aliases are just another name for a position in a file. As such, the
regular symbol resolutions are not applied. For example, given
define void @my_func() {
ret void
}
@my_alias = alias weak void ()* @my_func
@my_alias2 = alias void ()* @my_alias
We produce without this patch:
.weak my_alias
my_alias = my_func
.globl my_alias2
my_alias2 = my_alias
That is, in the resulting ELF file my_alias, my_func and my_alias2 are
just 3 names pointing to offset 0 of .text. That is *not* the
semantics of IR linking. For example, linking in a
@my_alias = alias void ()* @other_func
would require the strong my_alias to override the weak one and
my_alias2 would end up pointing to other_func.
There is no way to represent that with aliases being just another
name, so the best solution seems to be to just disallow it, converting
a miscompile into an error.
llvm-svn: 204934
In some cases it is possible for CGP to attempt to reuse a base address from
another basic block. In those cases we have to be sure that all the address
math was either done at the same bit width, or that none of it overflowed
before it was extended.
Patch by Louis Gerbarg <lgg@apple.com>
rdar://16307442
llvm-svn: 204833
Implementing the LLVM part of the call to __builtin___clear_cache
which translates into an intrinsic @llvm.clear_cache and is lowered
by each target, either to a call to __clear_cache or nothing at all
in case the caches are unified.
Updating LangRef and adding some tests for the implemented architectures.
Other archs will have to implement the method in case this builtin
has to be compiled for them, since the default behaviour is to bail
out as unimplemented.
A Clang patch is required for the builtin to be lowered into the
llvm intrinsic. This will be done next.
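From user code the builtin looks like this (C/C++); it becomes
@llvm.clear_cache and then __clear_cache, or nothing on unified-cache targets:
// Flush the instruction cache for a freshly written code region before
// jumping to it (e.g. in a JIT).
void flush_code_region(char *Begin, char *End) {
  __builtin___clear_cache(Begin, End);
}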
llvm-svn: 204802
This reverts commit r204781.
I will follow up with the msan folks to see what they
were trying to do with aliases to weak aliases.
llvm-svn: 204784
Aliases are just another name for a position in a file. As such, the
regular symbol resolutions are not applied. For example, given
define void @my_func() {
ret void
}
@my_alias = alias weak void ()* @my_func
@my_alias2 = alias void ()* @my_alias
We produce without this patch:
.weak my_alias
my_alias = my_func
.globl my_alias2
my_alias2 = my_alias
That is, in the resulting ELF file my_alias, my_func and my_alias2 are
just 3 names pointing to offset 0 of .text. That is *not* the
semantics of IR linking. For example, linking in a
@my_alias = alias void ()* @other_func
would require the strong my_alias to override the weak one and
my_alias2 would end up pointing to other_func.
There is no way to represent that with aliases being just another
name, so the best solution seems to be to just disallow it, converting
a miscompile into an error.
llvm-svn: 204781
Implement Pass::releaseMemory() in BlockFrequencyInfo and
MachineBlockFrequencyInfo. Just delete the private implementation when
not in use. Switch to a std::unique_ptr to make the logic more clear.
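The resulting shape is roughly (member name illustrative):
class BlockFrequencyInfo : public FunctionPass {
  std::unique_ptr<BlockFrequencyInfoImpl> BFI; // illustrative member

public:
  // Free the per-function data between runs instead of holding it alive.
  void releaseMemory() override { BFI.reset(); }
};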
<rdar://problem/14292693>
llvm-svn: 204741
Usually opaque constants shouldn't be folded, unless they are simple unary
operations that don't create new constants. Such folding, however, shouldn't
drop the opaque constant flag. This commit fixes this.
Related to <rdar://problem/14774662>
llvm-svn: 204737
If GT/UGT or LT/ULT were set to expand, a comparison
with a constant would replace it with the illegal
cond code.
There are several more places later in this function that
will have the same basic problem.
Theoretically R600 should hit this problem for a test,
but for some reason it doesn't.
llvm-svn: 204727
This is a pretty straightforward translation for COFF: we just need to
stick the data in a COMDAT section marked as
IMAGE_COMDAT_SELECT_NODUPLICATES.
N.B. We must be careful to avoid sticking entities with private linkage
in COMDAT groups. COFF is pretty hostile to the renaming of entities so
we must be careful to disallow GlobalVariables with unstable names.
llvm-svn: 204703
Implement debug_loc.dwo, as well as llvm-dwarfdump support for dumping
this section.
Outlined in the DWARF5 spec and http://gcc.gnu.org/wiki/DebugFission, the
debug_loc.dwo section has more variation than the standard debug_loc,
allowing 3 different forms of entry (plus the end of list entry). GCC
seems to, and Clang certainly, only use one form, so I've just
implemented dumping support for that for now.
It wasn't immediately obvious that there was a good refactoring to share
the implementation of dumping support between debug_loc and
debug_loc.dwo, so they're separate for now - ideas welcome or I may come
back to it at some point.
As per a comment in the code, we could choose different forms that may
reduce the number of debug_addr entries we emit, but that will require
further study.
llvm-svn: 204697
This seems excessive - switching sections isn't expensive (and if it is,
we're already being wasteful, since we emitted the debug_loc section
symbol earlier anyway), and otherwise there's no work that happens in
this function when the list is empty.
llvm-svn: 204696
When register allocator's stage is RS_Spill, we choose spill over using the CSR
for the first time, if the spill cost is lower than CSRCost.
When register allocator's stage is < RS_Split, we choose pre-splitting over
using the CSR for the first time, if the cost of splitting is lower than
CSRCost.
CSRCost is set with command-line option "regalloc-csr-first-time-cost". The
default value is 0 to generate the same code as before this commit.
With a value of 15 (1 << 14 is the entry frequency), I measured performance
gain of 3% on 253.perlbmk and 1.7% on 197.parser, with instrumented PGO,
on an arm device.
rdar://16162005
llvm-svn: 204690
Factor out two functions calculateRegionSplitCost and doRegionSplit
from tryRegionSplit. These two functions will be used in coming patches.
rdar://16162005
llvm-svn: 204684
No functional change intended.
Merging up-front rather than delaying this task until later. This just
seems simpler and more efficient (avoiding growing the debug loc list
only to have to skip over those post-merged entries, etc).
llvm-svn: 204679
This is used to avoid relocations in the dwo file by allowing
DW_AT_ranges specified in debug_info.dwo to be relative to this base
address. (r204667 implements the base-relative DW_AT_ranges side of
this)
llvm-svn: 204672
This removes the debug_ranges relocations from debug_info.dwo (but
doesn't implement the DW_AT_GNU_ranges_base which is also necessary for
correct functioning)
llvm-svn: 204668
This is a pretty straightforward translation for COFF: we just need to
stick the function in a COMDAT section marked as
IMAGE_COMDAT_SELECT_NODUPLICATES.
llvm-svn: 204565
This patch renames method 'isConstantSplat' as 'getConstantSplatValue'
(mainly for consistency reasons), and rewrites its logic to ensure
that we always perform a legal 'cast<ConstantSDNode>'.
Added test shift-combine-crash.ll to verify that DAGCombiner no longer
crashes with an assertion failure in the attempt to simplify a vector shift
by a vector of all undef counts.
llvm-svn: 204536
We make sure a spill is not hoisted to a hotter outer loop by adding
a condition: hoist a spill to an outer loop if there are multiple dependents
(it can be beneficial when more than one dependent is hoisted) or
if DepSV (the hoisting source) is hotter than SV (the hoisting destination).
rdar://16268194
llvm-svn: 204522
Type units have no addresses, so there's no need for DW_AT_addr_base.
This removes another relocation from every skeletal type unit and brings
LLVM's skeletal type units in line with GCC's (containing only
GNU_dwo_name (strp), comp_dir (strp), and GNU_pubnames (flag_present)).
Cary's got some ideas about using str_index in the .o file to reduce
those last two relocations (well, replace two relocations with one
relocation (pointing to the string index) and two indices)
llvm-svn: 204506
This option caused LowerInvoke to generate code using SJLJ-based
exception handling, but there is no code left that interprets the
jmp_buf stack that the resulting code maintained (llvm.sjljeh.jblist).
This option has been obsolete for a while, and replaced by
SjLjEHPrepare.
This leaves the default behaviour of LowerInvoke, which is to convert
invokes to calls.
Differential Revision: http://llvm-reviews.chandlerc.com/D3136
llvm-svn: 204388
Use the range machinery for DW_AT_ranges and DW_AT_high/lo_pc.
This commit moves us from a single range per subprogram to extending
ranges if we are:
a) In the same section, and
b) In the same enclosing CU.
This means we have more fine grained ranges for compile units, and fewer
ranges overall when we have multiple functions in the same CU
adjacent to each other in the object file.
Also remove all of the earlier hacks around this functionality for
function sections etc. Also update all of the testcases to take into
account the merging functionality.
This recommit includes a fix for location entries in the debug_loc section:
Make sure that debug loc entries are relative to the low_pc
of the compile unit. This means that when we only have a single
range that the offset should be just relative to the low_pc
of the unit, for multiple ranges for a CU this means that we'll be
relative to 0 which we emit along with DW_AT_ranges.
This mostly shows up with linked binaries, so add a testcase with
multiple CUs so that our location is going to be offset of a CU
with a non-zero low_pc.
llvm-svn: 204377
This commit moves us from a single range per subprogram to extending
ranges if we are:
a) In the same section, and
b) In the same enclosing CU.
This means we have more fine grained ranges for compile units, and fewer
ranges overall when we have multiple functions in the same CU
adjacent to each other in the object file.
Also remove all of the earlier hacks around this functionality for
function sections etc. Also update all of the testcases to take into
account the merging functionality.
llvm-svn: 204277
This isn't a complete fix - it falls back to non-comp_dir when multiple
compile units are in play. Adding a map of comp_dir to table is part of
the more general solution, but I gave up (in the short term) when I
realized I'd also have to calculate the size of each type unit so as to
produce correct DW_AT_stmt_list attributes.
llvm-svn: 204202
When deployment target version information is available, emit it to the
target streamer for inclusion in the object file.
rdar://11337778
llvm-svn: 204191
Summary:
SLP Vectorization of intrinsics (r203707) has exposed cases where the
expansion of vector bswap is failing (PR19151).
Reviewers: hfinkel
CC: chandlerc
Differential Revision: http://llvm-reviews.chandlerc.com/D3104
llvm-svn: 204163
This allows us to catch more opportunities for ODR-based type uniquing
during LTO.
Paired commit with CFE which updates some testcases to verify the new
DIBuilder behavior.
llvm-svn: 204106
This removes an attribute (and more importantly, a relocation) from
skeleton type units and removes some unnecessary file names from the
debug_line section that remains in the .o (and linked executable) file.
There's still a few places we could shave off some more space here:
* use compilation dir of the underlying compilation unit (since all the
type units share that compilation dir - though this would be more
complicated in LTO cases where they don't (keep a map of compilation
dir->line table header?))
* Remove some of the unnecessary header fields from the line table since
they're not needed in this situation (about 12 bytes per table).
llvm-svn: 204099
When emitting assembly there's no support for emitting separate line
tables for each compilation unit - so LLVM emits .loc directives
producing a single line table.
Line tables have an implicit directory (index 0) equal to the
compilation directory (DW_AT_comp_dir) of the compilation unit that
references them.
If multiple compilation units (with possibly disparate compilation
directories) reference the same line table, we must avoid relying on
this ambiguous directory.
Achieve this by simply not setting the compilation directory on the line
table when we're in this situation (multiple units while emitting
assembly).
llvm-svn: 204094
We still do a few lookups into the line table mapping in MCContext that
could be factored out into a single lookup (rather than looking it up
once for the table label, once to set the compilation unit, once for
each time we need a file ID, etc... ) but assembly output complicates
that somewhat as we still need a virtual dispatch back to the
MCAsmStreamer in that case.
llvm-svn: 204092
Our handling of compilation directory in DwarfDebug was broken
(incorrectly using the 'last' compilation directory (that of the last
CU in the metadata list) for all function emission in any CU). By moving
this handling down into MCDwarf the issue is fixed as the compilation
dir is tracked correctly per line table.
llvm-svn: 204089
See r204027 for the precursor to this that applied to asm debug info.
This required some non-obvious API changes to handle the case of asm
output (we never go asm->asm so this didn't come up in r204027): the
modification of the file/directory name by MCDwarfLineTableHeader needed
to be reflected in the MCAsmStreamer caller so it could print the
appropriate .file directive, so those StringRef parameters are now
non-const ref (in/out) parameters rather than just const.
llvm-svn: 204069
Rather than LegalizeAction::Expand, this needs LegalizeAction::Promote to get
promoted to fp_to_sint v8f32->v8i32. This is a legal operation on AVX.
For that to work properly, we also need to teach the legalizer about the
specific promotion required here. The default vector promotion uses
bitcasting to a vector type of the same total size. We want to promote the
vector element type, effectively widening the operation and then truncating
the result. This is analogous to the current logic of how int_to_fp is
promoted.
The change also factors out some code from the int_to_fp promotion code to
ValueType::widenIntegerVectorElementType. This is now shared between
int_to_fp and fp_to_int.
There is no longer a need for the custom lowering of fp_to_sint f32->v8i16 in
X86. It can now go through the new target-independent fp_to_*int promotion
logic.
I also checked that no other target uses Promote for these ops yet, so there
shouldn't be any unexpected change in behavior.
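The factored-out helper amounts to (a free-standing sketch of the logic):
// Double each element's width while keeping the element count, so
// v8i16 widens to v8i32; the result is truncated back after the op.
EVT widenIntegerVectorElementType(LLVMContext &Ctx, EVT VT) {
  EVT EltVT = EVT::getIntegerVT(Ctx, 2 * VT.getScalarSizeInBits());
  return EVT::getVectorVT(Ctx, EltVT, VT.getVectorNumElements());
}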
Fixes <rdar://problem/16202247>
llvm-svn: 204058
- Adds support for inserting vzerouppers before tail-calls.
This is enabled implicitly by having MachineInstr::copyImplicitOps preserve
regmask operands, which allows VZeroUpperInserter to see where tail-calls use
vector registers.
- Fixes a bug that caused the previous version of this optimization to miss some
vzeroupper insertion points in loops. (Loops-with-vector-code that followed
loops-without-vector-code were mistakenly overlooked by the previous version).
- New algorithm never revisits instructions.
Fixes <rdar://problem/16228798>
llvm-svn: 204021
based on the ODR.
This adds an OdrMemberMap to DwarfDebug which is used to unique C++
member function declarations based on the unique identifier of their
containing class and their mangled name.
We can't use the usual DIRef mechanism here because DIScopes are indexed
using their entire MDNode, including decl_file and decl_line, which need
not be unique (see testcase).
Prior to this change multiple redundant member function declarations would
end up in the same uniqued DW_TAG_class_type.
llvm-svn: 203982
any lexical scopes then go ahead and turn on DW_AT_ranges for the
compile unit since we would be claiming to describe in the CU
a range for which we don't have information in the CU otherwise.
llvm-svn: 203969
issue in that the new MachineRegisterInfo bundle iterators didn't
dereference to the START of the bundle, while the old skipBundle()
method did.
llvm-svn: 203890
These linkages were introduced some time ago, but it was never very
clear what exactly their semantics were or what they should be used
for. Some investigation found these uses:
* utf-16 strings in clang.
* non-unnamed_addr strings produced by the sanitizers.
It turns out they were just working around a more fundamental problem.
For some sections a MachO linker needs a symbol in order to split the
section into atoms, and llvm had no idea that was the case. I fixed
that in r201700 and it is now safe to use the private linkage. When
the object ends up in a section that requires symbols, llvm will use a
'l' prefix instead of a 'L' prefix and things just work.
With that, these linkages were already dead, but there was a potential
future user in the objc metadata information. I am still looking at
CGObjcMac.cpp, but at this point I am convinced that linker_private
and linker_private_weak are not what they need.
The objc uses are currently split in
* Regular symbols (no '\01' prefix). LLVM already directly provides
whatever semantics they need.
* Uses of a private name (start with "\01L" or "\01l") and private
linkage. We can drop the "\01L" and "\01l" prefixes as soon as llvm
agrees with clang on L being ok or not for a given section. I have two
patches in code review for this.
* Uses of private name and weak linkage.
The last case is the one that one could think would fit one of these
linkages. That is not the case. The semantics are
* the linker will merge these symbols by *name*.
* the linker will hide them in the final DSO.
Given that the merging is done by name, any of the private (or
internal) linkages would be a bad match. They allow llvm to rename the
symbols, and that is really not what we want. From the llvm point of
view, these objects should really be (linkonce|weak)(_odr)?.
For now, just keeping the "\01l" prefix is probably the best for these
symbols. If we one day want to have a more direct support in llvm,
IMHO what we should add is not a linkage, it is just a hidden_symbol
attribute. It would be applicable to multiple linkages. For example,
on weak it would produce the current behavior we have for objc
metadata. On internal, it would be equivalent to private (and we
should then remove private).
llvm-svn: 203866
operator* on the by-operand iterators to return a MachineOperand& rather than
a MachineInstr&. At this point they almost behave like normal iterators!
Again, this requires making some existing loops more verbose, but should pave
the way for the big range-based for-loop cleanups in the future.
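So a by-operand walk now reads (sketch):
for (MachineRegisterInfo::use_iterator I = MRI.use_begin(Reg),
                                       E = MRI.use_end();
     I != E; ++I) {
  MachineOperand &MO = *I;                // the use itself
  MachineInstr &UserMI = *MO.getParent(); // reach the instruction if needed
}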
llvm-svn: 203865
This patch fixes the bug in peephole optimization that folds a load which
defines one vreg into the one and only use of that vreg. With debug info, a
DBG_VALUE that referenced the vreg was considered to be a use, preventing the
optimization. The fix is to ignore DBG_VALUEs during the optimization, and to
undef a DBG_VALUE that references a vreg that gets removed.
Patch by Trevor Smigiel!
llvm-svn: 203829
Summary:
This helps the instruction selector to lower an i64 * i64 -> i128
multiplication into a single instruction on targets which support it.
This is an update of D2973 which was reverted because of a bug reported
as PR19084.
Reviewers: t.p.northover, chapuni
Reviewed By: t.p.northover
CC: llvm-commits, alex, chapuni
Differential Revision: http://llvm-reviews.chandlerc.com/D3021
llvm-svn: 203797
for use with C++11 range-based for-loops.
The gist of phase 1 is to remove the skipInstruction() and skipBundle()
methods from these iterators, instead splitting each iterator into a version
that walks operands, a version that walks instructions, and a version that
walks bundles. This has the result of making some "clever" loops in lib/CodeGen
more verbose, but also makes their iterator invalidation characteristics much
more obvious to the casual reader. (Making them concise again in the future is a
good motivating case for a pre-incrementing range adapter!)
Phase 2 of this undertaking will consist of removing the getOperand() method,
and changing operator*() of the operand-walker to return a MachineOperand&. At
that point, it should be possible to add range views for them that work as one
might expect.
llvm-svn: 203757
On ELF and COFF an alias is just another name for a position in the file.
There is no way to refer to a position in another file, so an alias to
undefined is meaningless.
MachO currently doesn't support aliases. The spec has a N_INDR, which when
implemented will have a different set of restrictions. Adding support for
it shouldn't be harder than any other IR extension.
For now, having the IR represent what is actually possible with current
tools makes it easier to fix the design of GlobalAlias.
llvm-svn: 203705
I could fold the callers into their one call site, but the indirection
(given how verbose choosing the section is) seemed helpful.
The use of a member function pointer's a bit "tricky", but seems limited
enough, the call sites are simple/clean/clear, and there's only one use.
llvm-svn: 203619
The syntax for "cmpxchg" should now look something like:
cmpxchg i32* %addr, i32 42, i32 3 acquire monotonic
where the second ordering argument gives the required semantics in the case
that no exchange takes place. It should be no stronger than the first ordering
constraint and cannot be either "release" or "acq_rel" (since no store will
have taken place).
rdar://problem/15996804
llvm-svn: 203559
the legalization cost must be included to get an accurate
estimation of the total cost of the scalarized vector.
The inaccurate cost triggered unprofitable SLP vectorization on
32-bit X86.
Summary:
Include legalization overhead when computing scalarization cost
Reviewers: hfinkel, nadav
CC: chandlerc, rnk, llvm-commits
Differential Revision: http://llvm-reviews.chandlerc.com/D2992
llvm-svn: 203509
the stack of the analysis group because they are all immutable passes.
This is made clear by Craig's recent work to use override
systematically -- we weren't overriding anything for 'finalizePass'
because there is no such thing.
This is kind of a lame restriction on the API -- we can no longer push
and pop things, we just set up the stack and run. However, I'm not
invested in building some better solution on top of the existing
(terrifying) immutable pass and legacy pass manager.
llvm-svn: 203437
This requires a number of steps.
1) Move value_use_iterator into the Value class as an implementation
detail
2) Change it to actually be a *Use* iterator rather than a *User*
iterator.
3) Add an adaptor which is a User iterator that always looks through the
Use to the User.
4) Wrap these in Value::use_iterator and Value::user_iterator typedefs.
5) Add the range adaptors as Value::uses() and Value::users().
6) Update *all* of the callers to correctly distinguish between whether
they wanted a use_iterator (and to explicitly dig out the User when
needed), or a user_iterator which makes the Use itself totally
opaque.
Because #6 requires churning essentially everything that walked the
Use-Def chains, I went ahead and added all of the range adaptors and
switched them to range-based loops where appropriate. Also because the
renaming requires at least churning every line of code, it didn't make
any sense to split these up into multiple commits -- all of which would
touch all of the same lines of code.
The result is still not quite optimal. The Value::use_iterator is a nice
regular iterator, but Value::user_iterator is an iterator over User*s
rather than over the User objects themselves. As a consequence, it fits
a bit awkwardly into the range-based world and it has the weird
extra-dereferencing 'operator->' that so many of our iterators have.
I think this could be fixed by providing something which transforms
a range of T&s into a range of T*s, but that *can* be separated into
another patch, and it isn't yet 100% clear whether this is the right
move.
However, this change gets us most of the benefit and cleans up
a substantial amount of code around Use and User. =]
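In use, the two adaptors look like this (Visit is a hypothetical callback):
// Iterate over the Uses when the operand slot matters:
for (Use &U : V->uses())
  Visit(U.getUser(), U.getOperandNo());
// Iterate over the Users when the Use itself is irrelevant:
for (User *U : V->users())
  Visit(U);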
llvm-svn: 203364
This is already done for shifts. Allow it for rotations as well. E.g.:
(rotl:i32 x, (trunc (and y, 31))) -> (rotl:i32 x, (and (trunc y), 31))
Use the newly factored-out distributeTruncateThroughAnd.
With this patch and some X86.td tweaks we should be able to remove redundant
masking of the rotation amount like in the example above. HW implicitly
performs this masking.
The testcase will be added as part of the X86 patch.
llvm-svn: 203316
This is the new idiom:
x<<(y&31) | x>>((0-y)&31)
which is recognized as:
x ROTL (y&31)
The change refines matchRotateSub. In
Neg & (OpSize - 1) == (OpSize - Pos) & (OpSize - 1), if Pos is
Pos' & (OpSize - 1) we can just use Pos' instead of Pos.
llvm-svn: 203315
Slightly change the wording in the function comment. Originally, it could be
misunderstood as saying we turned the input into two subsequent rotates.
Better connect the comment which talks about Mask and the code which used
LoBits. Renamed variable to MaskLoBits.
llvm-svn: 203314
be split and the result type widened.
When the condition of a vselect has to be split, it makes no sense to widen the
vselect and thereby widen the condition. We end up in an endless loop of
widening (vselect result type) and splitting (condition mask type) doing this.
Instead, split both the condition and the vselect and widen the result.
I ran this over the test suite with i686 and mattr=+sse and saw no regressions.
Fixes PR18036.
llvm-svn: 203311
First: refactor out the emission of entries into the .debug_loc section
into its own routine.
Second: add a new class ByteStreamer that can be used to either emit
using an AsmPrinter or hash using DIEHash the series of bytes that
would be emitted. Use this in all of the location emission routines
for the .debug_loc section.
No functional change intended outside of a few additional comments
in verbose assembly.
llvm-svn: 203304
This helps the instruction selector to lower an i64 * i64 -> i128
multiplication into a single instruction on targets which support it.
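The pattern in question, at the source level (using Clang/GCC's __int128):
// A full 64x64->128 widening multiply; with this change the selector
// can emit a single wide-multiply instruction where one exists.
__int128 widening_mul(long long a, long long b) {
  return (__int128)a * (__int128)b;
}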
Patch by Manuel Jacob.
llvm-svn: 203230
Summary:
llvm/MC/MCSectionMachO.h and llvm/Support/MachO.h both had the same
definitions for the section flags. Instead, grab the definitions out of
support.
No functionality change.
Reviewers: grosbach, Bigcheese, rafael
Reviewed By: rafael
CC: llvm-commits
Differential Revision: http://llvm-reviews.chandlerc.com/D2998
llvm-svn: 203211
The old system was fairly convoluted:
* A temporary label was created.
* A single PROLOG_LABEL was created with it.
* A few MCCFIInstructions were created with the same label.
The semantics were that the cfi instructions were mapped to the PROLOG_LABEL
via the temporary label. The output position was that of the PROLOG_LABEL.
The temporary label itself was used only for doing the mapping.
The new CFI_INSTRUCTION has a 1:1 mapping to MCCFIInstructions and points to
one by holding an index into the CFI instructions of this function.
I did consider removing MMI.getFrameInstructions completely and having
CFI_INSTRUCTION own a MCCFIInstruction, but MCCFIInstructions have
non-trivial constructors and destructors and are somewhat big, so this setup
is probably better.
The net result is that we don't create temporary labels that are never used.
llvm-svn: 203204
This patch teaches the DAGCombiner how to fold a binary OR between two
shufflevector into a single shuffle vector when possible.
The rules are:
1. fold (or (shuf A, V_0, MA), (shuf B, V_0, MB)) -> (shuf A, B, Mask1)
2. fold (or (shuf A, V_0, MA), (shuf B, V_0, MB)) -> (shuf B, A, Mask2)
The DAGCombiner can take advantage of the fact that OR is commutative and
compute two possible shuffle masks (Mask1 and Mask2) for the resulting
shuffle node.
Before folding a dag according to either rule 1 or 2, DAGCombiner verifies
that the resulting shuffle mask is legal for the target.
DAGCombiner first tries to fold according to rule 1; if that is not
possible, it then tries to fold according to rule 2.
If both Mask1 and Mask2 are illegal then we conservatively don't fold
the OR instruction.
llvm-svn: 203156
This compiles with no changes to clang/lld/lldb with MSVC and includes
overloads to various functions which are used by those projects and llvm
which have OwningPtr's as parameters. This should allow out of tree
projects some time to move. There are also no changes to libs/Target,
which should help out of tree targets have time to move, if necessary.
llvm-svn: 203083
This is consistent with GDB ToT and reduces the number of relocations in
(type and compile) units, substantially reducing relocations and debug
size in fission + type units builds.
llvm-svn: 203082
pointed to by the attribute, rather than the form as a first
step to determining how to hash the values. No functional change
intended.
llvm-svn: 203044
This works by moving the existing code into the DIEValue hierarchy
and using the DwarfDebug pointer off of the AsmPrinter to access
any global information we need.
llvm-svn: 203033
This enables us to figure out where in the debug_loc section our
locations are so that we can eventually hash them. It also helps
remove some special case code in emission. No functional change.
llvm-svn: 203018
Before, llvm-mc would print it, but llc was assuming that it would produce
another section changing directive before one was needed. That assumption is
false with inline asm.
Fixes PR19049.
Another option would be to always create the section, but in the asm printer
avoid printing sections changes during initialization. That would work, but
* We do use the fact that llvm-mc prints it in testing. The tests can be changed
if needed.
* A quick poll on IRC suggest that most developers prefer the implicit .text to
be printed.
llvm-svn: 203001
already lived there and it is where it belongs -- this is the in-memory
debug location representation.
This is just cleanup -- Modules can actually cope with this, but that
doesn't make it right. After chatting with folks that have out-of-tree
stuff, going ahead and moving the rest of the headers seems preferable.
llvm-svn: 202960
Patchpoints already did this. Doing it for stackmaps is a convenience
for the runtime in the event that it needs a scratch register to
patch or perform a runtime call thunk.
Unlike patchpoints, we just assume the AnyRegCC calling
convention. This is the only language- and target-independent calling
convention specific to stackmaps, so this makes sense, although the
calling convention is not currently used to select the scratch registers.
llvm-svn: 202943
selection dag (PR19012)
In X86SelectionDagInfo::EmitTargetCodeForMemcpy we check with MachineFrameInfo
to make sure that ESI isn't used as a base pointer register before we choose to
emit rep movs (which clobbers ESI).
The problem is that MachineFrameInfo wouldn't know about dynamic allocas or
inline asm that clobbers the stack pointer until SelectionDAGBuilder has
encountered them.
This patch fixes the problem by checking for such things when building the
FunctionLoweringInfo.
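A heavily simplified sketch of the idea (the setter names are placeholders,
not the actual API):
for (const BasicBlock &BB : *Fn)
  for (const Instruction &I : BB) {
    // Dynamic allocas force a frame/base pointer.
    if (const AllocaInst *AI = dyn_cast<AllocaInst>(&I))
      if (!AI->isStaticAlloca())
        MFI->setHasVarSizedObjects(true);       // placeholder setter
    // Inline asm may adjust the stack pointer behind our back.
    if (const CallInst *CI = dyn_cast<CallInst>(&I))
      if (CI->isInlineAsm())
        MFI->setHasInlineAsmWithSPAdjust(true); // placeholder setter
  }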
Differential Revision: http://llvm-reviews.chandlerc.com/D2954
llvm-svn: 202930