Commit Graph

18635 Commits

David Majnemer d5ab35f265 X86: Call __main using the SelectionDAG
Synthesizing a call directly using the MI layer would confuse the frame
lowering code.  This is problematic as frame lowering is highly
sensitive to the particularities of calls, etc.

llvm-svn: 230129
2015-02-21 05:49:45 +00:00
Matthias Braun 876e7172ee LiveRangeCalc: Don't start live ranges of PHI instructions at the block begin.
Summary:
Letting them begin at the PHI instruction slightly simplifies the code
but more importantly avoids breaking the assumption that live ranges
starting at the block begin are also live at the end of the predecessor
blocks. The MachineVerifier checks that but was apparently never run in
the few instances where live ranges are calculated for machine-SSA
functions.

Reviewers: qcolombet

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D7779

llvm-svn: 230093
2015-02-20 23:43:14 +00:00
Rafael Espindola 9075f77064 Use short names for jumptable sections.
Also refactor code to remove some duplication.

llvm-svn: 230087
2015-02-20 23:28:28 +00:00
Eric Christopher d4e723f2cf Used the cached subtarget off of the MachineFunction.
llvm-svn: 230078
2015-02-20 22:36:11 +00:00
Matt Arsenault 0dc54c4dee Add generic fmad DAG node.
This allows sharing of FMA forming combines to work
with instructions that have the same semantics as a separate
multiply and add.

This is expanded by default, and only formed post-legalization,
so it shouldn't have much impact on targets that do not want it.

llvm-svn: 230070
2015-02-20 22:10:33 +00:00
Eric Christopher 9ecaa174d6 Grab the DataLayout off of the TargetMachine since that's where
it's stored.

llvm-svn: 230059
2015-02-20 20:56:39 +00:00
Eric Christopher f734a8bae7 Get the function specific subtarget.
llvm-svn: 230038
2015-02-20 18:44:17 +00:00
Eric Christopher 1df0c519fc Get the cached subtarget off the MachineFunction rather than
inquiring for a new one from the TargetMachine.

llvm-svn: 230037
2015-02-20 18:44:15 +00:00
Igor Laevsky 7fc58a4ad8 Generalize statepoint lowering to use ImmutableStatepoint. Move statepoint lowering into a separate function 'LowerStatepoint' which uses ImmutableStatepoint instead of a CallInst. Also related utility functions are changed to receive ImmutableCallSite.
Differential Revision: http://reviews.llvm.org/D7756 

llvm-svn: 230017
2015-02-20 15:28:35 +00:00
Nick Lewycky b73c041005 Fix build with gcc. This has a -Wsequence-point error on 'MII', which is a good point.
llvm-svn: 229979
2015-02-20 07:17:40 +00:00
Eric Christopher a7249ec1a7 Remove more uses of TargetMachine::getSubtargetImpl from the
AsmPrinter.

getSubtargetInfo now asserts that the MachineFunction exists.
Debug printing of register naming now uses the register info
from MCAsmInfo as that's unchanging.

llvm-svn: 229978
2015-02-20 07:16:19 +00:00
Eric Christopher 78a3f6cc4d AsmPrinter::doFinalization is at the module level and so doesn't
have access to a target specific subtarget info. Grab the module
level MCSubtargetInfo for the JumpInstrTable output stubs.

llvm-svn: 229974
2015-02-20 06:59:48 +00:00
Eric Christopher 97ea7622b5 Remove the MCInstrInfo cached variable as it was only used in a
single place and replace calls to getSubtargetImpl with calls
to get the subtarget from the MachineFunction where valid.

llvm-svn: 229971
2015-02-20 06:35:21 +00:00
Chandler Carruth 301ed0c3b4 Revert r229944: EH: Prune unreachable resume instructions during Dwarf EH preparation
This doesn't pass 'ninja check-llvm' for me. Lots of tests, including
the ones updated, fail with crashes and other explosions.

llvm-svn: 229952
2015-02-20 02:15:36 +00:00
Reid Kleckner 0b647e6cca EH: Prune unreachable resume instructions during Dwarf EH preparation
Today a simple function that only catches exceptions and doesn't run
destructor cleanups ends up containing a dead call to _Unwind_Resume
(PR20300). We can't remove these dead resume instructions during normal
optimization because inlining might introduce additional landingpads
that do have cleanups to run. Instead we can do this during EH
preparation, which is guaranteed to run after inlining.
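
A heavily simplified sketch of the idea (plain LLVM C++ API, not the actual
DwarfEHPrepare code): assuming the resume's operand is the landingpad value
itself, as in the common pattern Clang emits, a catch-only pad (no cleanup
clause) has nothing left to run once inlining is done, so its resume is dead
and can be replaced with unreachable.

  #include "llvm/ADT/SmallVector.h"
  #include "llvm/IR/Function.h"
  #include "llvm/IR/Instructions.h"
  using namespace llvm;

  static bool pruneDeadResumes(Function &F) {
    SmallVector<ResumeInst *, 8> Dead;
    for (BasicBlock &BB : F)
      if (auto *RI = dyn_cast<ResumeInst>(BB.getTerminator()))
        if (auto *LP = dyn_cast<LandingPadInst>(RI->getValue()))
          if (!LP->isCleanup())              // catch-only pad: resume is dead
            Dead.push_back(RI);
    for (ResumeInst *RI : Dead) {
      new UnreachableInst(F.getContext(), RI); // inserted before the resume
      RI->eraseFromParent();
    }
    return !Dead.empty();
  }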

Fixes PR20300.

Reviewers: majnemer

Differential Revision: http://reviews.llvm.org/D7744

llvm-svn: 229944
2015-02-20 01:00:19 +00:00
Eric Christopher cd37bf5483 This needs to be a const variable so the two sides of the ternary
operator agree on type.

llvm-svn: 229938
2015-02-20 00:03:45 +00:00
Eric Christopher 2105ae98f6 Only use the initialized MCInstrInfo if it's been initialized already
during SetupMachineFunction. This is also the single use of MII
and it'll be changing to TargetInstrInfo (which is MachineFunction
based) in the next commit here.

llvm-svn: 229931
2015-02-19 23:52:35 +00:00
Eric Christopher 7330264146 Migrate away a use of the subtarget (and TargetMachine) from
AsmPrinterDwarf since the information is on the MCRegisterInfo
via the MCContext and MMI that we already have on the AsmPrinter.

llvm-svn: 229928
2015-02-19 23:29:42 +00:00
Ahmed Bougacha 4c2b0781a5 [CodeGen] Use ArrayRef instead of std::vector&. NFC.
The former lets us use SmallVectors.  Do so in ARM and AArch64.
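
Illustrative sketch only (the helper below is made up): an ArrayRef parameter
accepts a std::vector, a SmallVector, or a plain array without copying, which
is what lets the ARM and AArch64 callers switch container types.

  #include "llvm/ADT/ArrayRef.h"
  #include "llvm/ADT/SmallVector.h"
  #include <vector>
  using namespace llvm;

  static unsigned sumAll(ArrayRef<unsigned> Vals) { // hypothetical API
    unsigned Sum = 0;
    for (unsigned V : Vals)
      Sum += V;
    return Sum;
  }

  void callers() {
    std::vector<unsigned> Vec = {1, 2, 3};
    SmallVector<unsigned, 4> SV;
    SV.push_back(4);
    (void)sumAll(Vec); // implicit conversion, no copy
    (void)sumAll(SV);  // also no copy
  }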

llvm-svn: 229925
2015-02-19 23:13:10 +00:00
Eric Christopher cbdbf39881 MCTargetOptions reside on the TargetMachine that we always have via
TargetOptions.

llvm-svn: 229917
2015-02-19 21:29:51 +00:00
Eric Christopher 457864178f Remove a call to TargetMachine::getSubtarget from the inline
asm support in the asm printer. If we can get a subtarget from
the machine function then we should do so, otherwise we can
go ahead and create a default one since we're at the module
level.

llvm-svn: 229916
2015-02-19 21:24:23 +00:00
Eric Christopher 64d35be6d6 Remove unused argument from emitInlineAsmStart.
llvm-svn: 229907
2015-02-19 19:52:25 +00:00
Eric Christopher 504f388a84 Update and remove a few calls to TargetMachine::getSubtargetImpl
out of the asm printer.

llvm-svn: 229883
2015-02-19 18:46:23 +00:00
Benjamin Kramer ea68a944a1 Demote vectors to arrays. No functionality change.
llvm-svn: 229861
2015-02-19 15:26:17 +00:00
Chandler Carruth b89464a9b6 [x86,sdag] Two interrelated changes to the x86 and sdag code.
First, don't combine bit masking into vector shuffles (even ones the
target can handle) once operation legalization has taken place. Custom
legalization of vector shuffles may exist for these patterns (making the
predicate return true) but that custom legalization may in some cases
produce the exact bit math this matches. We only really want to handle
this prior to operation legalization.

However, the x86 backend, in a fit of awesome, relied on this. What it
would do is mark VSELECTs as expand, which would turn them into
arithmetic, which this would then match back into vector shuffles, which
we would then lower properly. Amazing.

Instead, the second change is to teach the x86 backend to directly form
vector shuffles from VSELECT nodes with constant conditions, and to mark
all of the vector types we support lowering blends as shuffles as custom
VSELECT lowering. We still mark the forms which actually support
variable blends as *legal* so that the custom lowering is bypassed, and
the legal lowering can even be used by the vector shuffle legalization
(yes, i know, this is confusing. but that's how the patterns are
written).

This makes the VSELECT lowering much more sensible, and in fact should
fix a bunch of bugs with it. However, as you'll see in the test cases,
right now what it does is point out the *hilarious* deficiency of the
new vector shuffle lowering when it comes to blends. Fortunately, my
very next patch fixes that. I can't submit it yet, because that patch,
somewhat obviously, forms the exact and/or pattern that the DAG combine
is matching here! Without this patch, teaching the vector shuffle
lowering to produce the right code infloops in the DAG combiner. With
this patch alone, we produce terrible code but at least lower through
the right paths. With both patches, all the regressions here should be
fixed, and a bunch of the improvements (like using 2 shufps with no
memory loads instead of 2 andps with memory loads and an orps) will
stay. Win!

There is one other change worth noting here. We had hilariously wrong
vectorization cost estimates for vselect because we fell through to the
code path that assumed all "expand" vector operations are scalarized.
However, the "expand" lowering of VSELECT is vector bit math, most
definitely not scalarized. So now we go back to the correct if horribly
naive cost of "1" for "not scalarized". If anyone wants to add actual
modeling of shuffle costs, that would be cool, but this seems an
improvement on its own. Note the removal of 16 and 32 "costs" for doing
a blend. Even in SSE2 we can blend in fewer than 16 instructions. ;] Of
course, we don't right now because of OMG bad code, but I'm going to fix
that. Next patch. I promise.

llvm-svn: 229835
2015-02-19 10:36:19 +00:00
Reid Kleckner 7bb0738d82 Add an IR-to-IR test for dwarf EH preparation using opt
This tests the simple resume instruction elimination logic that we have
before making some changes to it.

llvm-svn: 229768
2015-02-18 23:17:41 +00:00
Andrew Kaylor 179543bb9b Style and formatting fixes for r229715
llvm-svn: 229758
2015-02-18 22:52:18 +00:00
Reid Kleckner 4dd0304e34 dos2unix the WinEH file and tests
llvm-svn: 229735
2015-02-18 19:52:46 +00:00
David Blaikie 30f2f3fc98 Remove unused member variables (-Wunused-private-field)
llvm-svn: 229722
2015-02-18 18:52:49 +00:00
Andrew Kaylor 527c5dc68d Adding implementation to outline C++ catch handlers for native Windows 64 exception handling.
Differential Revision: http://reviews.llvm.org/D7363

llvm-svn: 229715
2015-02-18 18:31:51 +00:00
Michael Kuperstein af9befa6b7 Fixes two issues in SimplifyDemandedBits of sext_in_reg:
1) We should not try to simplify if the sext has multiple uses
2) There is no need to simplify if the source value is already sign-extended.

Patch by Gil Rapaport <gil.rapaport@intel.com>

Differential Revision: http://reviews.llvm.org/D6949

llvm-svn: 229659
2015-02-18 09:43:40 +00:00
Daniel Jasper ed9eb7209e NFC: Use range-based for loops and more consistent naming.
No functional changes intended.

(I plan on doing some modifications to this function and would like to
have as few unrelated changes as possible in the patch)

llvm-svn: 229649
2015-02-18 08:19:16 +00:00
Daniel Jasper 4d7b04384e Remove experimental options to control machine block placement.
This reverts r226034. Benchmarking with those flags has not revealed
anything interesting.

llvm-svn: 229648
2015-02-18 08:18:07 +00:00
Matthias Braun 11042c8523 LiveRangeCalc: Rename some parameters from kill to use, NFC.
Those parameters did not necessarily describe kill points but just uses.

llvm-svn: 229601
2015-02-18 01:50:52 +00:00
Rafael Espindola 2d7ec9a860 Twines should be passed by const ref.
llvm-svn: 229590
2015-02-17 23:44:22 +00:00
Rafael Espindola df19519800 Add r228939 back with a fix.
The problem in the original patch was not switching back to .text after printing
an eh table.

Original message:

On ELF, put PIC jump tables in a non executable section.

Fixes PR22558.

llvm-svn: 229586
2015-02-17 23:34:51 +00:00
Duncan P. N. Exon Smith 57bab0bc97 AsmPrinter: Take range in DwarfExpression::AddExpression(), NFC
Previously `DwarfExpression::AddExpression()` relied on
default-constructing the end iterators for `DIExpression` -- once the
operands are represented explicitly via `MDExpression` (instead of via
the strange `StringRef` navigator in `DIHeaderIterator`) this won't
work.  Explicitly take an iterator for the end of the range.

llvm-svn: 229572
2015-02-17 22:30:56 +00:00
Rafael Espindola 68fa249cb5 Add r228980 back.
Add support for having multiple sections with the same name and comdat.

Using this in combination with -ffunction-sections allows LLVM to output a .o
file with multiple sections named .text. This saves space by avoiding long
unique names of the form .text.<C++ mangled name>.

llvm-svn: 229541
2015-02-17 20:48:01 +00:00
Eric Christopher a49d68e078 Make the ARM AsmPrinter independent of global subtarget
initialization. Initialize the subtarget once per function and
migrate Emit{Start|End}OfAsmFile to either use attributes on the
TargetMachine or get information from the subtarget we'd use
for assembling. One bit (getISAEncoding) touched the general
AsmPrinter and the debug output. Handle this one by passing
the function for the subprogram down and updating all callers
and users.

The top-level-ness of the ARM attribute output for assembly is,
by nature, contrary to how we'd want to do this for an LTO
situation where we have multiple CPU architectures, so this
solution is good enough for now.

llvm-svn: 229528
2015-02-17 20:02:32 +00:00
Eric Christopher ffc5ff32d1 80-column fixups.
llvm-svn: 229527
2015-02-17 20:02:28 +00:00
Sanjay Patel ab7e86e5be Canonicalize splats as build_vectors (PR22283)
This is a follow-on patch to:
http://reviews.llvm.org/D7093

That patch canonicalized constant splats as build_vectors, 
and this patch removes the constant check so we can canonicalize
all splats as build_vectors.

This fixes the 2nd test case in PR22283:
http://llvm.org/bugs/show_bug.cgi?id=22283

The unfortunate code duplication between SelectionDAG and DAGCombiner
is discussed in the earlier patch review. At least this patch is just
removing code...

This improves an existing x86 AVX test and changes codegen in an ARM test.

Differential Revision: http://reviews.llvm.org/D7389

llvm-svn: 229511
2015-02-17 16:54:32 +00:00
Benjamin Kramer 6cd780ff21 Prefer SmallVector::append/insert over push_back loops.
Same functionality, but hoists the vector growth out of the loop.
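
A minimal sketch of the pattern (hypothetical names): append() grows the
vector once for the whole range, whereas the push_back loop may reallocate
several times.

  #include "llvm/ADT/SmallVector.h"
  using namespace llvm;

  void copyAll(SmallVectorImpl<int> &Dst, const SmallVectorImpl<int> &Src) {
    // Before:
    //   for (int V : Src)
    //     Dst.push_back(V);
    // After: a single growth step for the whole range.
    Dst.append(Src.begin(), Src.end());
  }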

llvm-svn: 229500
2015-02-17 15:29:18 +00:00
Duncan P. N. Exon Smith 752d6df22d AsmPrinter: Use DIExpression default constructor, NFC
llvm-svn: 229464
2015-02-17 02:42:45 +00:00
Duncan P. N. Exon Smith b474937929 AsmPrinter: Stop creating DebugLocs
While looking at a heap profile of a clang LTO bootstrap with -g, I
noticed that 2.2% of memory in an `llvm-lto` of clang is from calling
`DebugLoc::get()` in `collectVariableInfo()` (accounting for ~40% of
memory used for `MDLocation`s).

I suspect this was introduced by r226736, whose goal was to prevent
uniquing of `DebugLoc`s (goal achieved, if so).

There's no reason we need a `DebugLoc` here at all -- it was just being
used for (in)convenient API -- so the fix is to pass the scope and
inlined-at directly to `LexicalScopes::findInlinedScope()`.

llvm-svn: 229459
2015-02-17 00:02:27 +00:00
Matthias Braun 15635c5f85 RegisterCoalescer: Don't rematerialize subregister definitions.
We cannot simply rematerialize instructions which only define a
subregister, as the final value also depends on the previous
instructions.

This fixes test/CodeGen/R600/subreg-coalescer-bug.ll with subreg
liveness enabled.

llvm-svn: 229444
2015-02-16 22:05:17 +00:00
Matthias Braun 1b901a4435 RegisterCoalescer: Do not look for regclass of IMPLICIT_DEF.
IMPLICIT_DEF is a generic instruction and has no (fixed) output register
class defined. The rematerialization code of the register coalescer
should not scan the instruction description for a register class.

This fixes a problem showing up in
test/CodeGen/R600/subreg-coalescer-crash.ll with subregister liveness
enabled.

llvm-svn: 229443
2015-02-16 22:05:12 +00:00
Mehdi Amini 3e0023b8f6 SelectionDAG: fold (fp_to_u/sint (s/uint_to_fp)) here too
Update SPARC tests to match.

From: Fiona Glaser <fglaser@apple.com>
llvm-svn: 229438
2015-02-16 21:47:58 +00:00
Matthias Braun 2eab0e30e1 RegisterCoalescer: Improve previous fix for wrong def after.
The previous fix in r225503 was needlessly complicated. The problem goes
away as well if the arguments to MergeValueNumberInto are supplied in the
correct order.
This was previously missed because the existing code already had the
wrong order, but an additional later Merge was hiding the bug for the
main live range VNI.

llvm-svn: 229424
2015-02-16 19:34:27 +00:00
Aaron Ballman f17583ee9c MSVC 2013 supports std::forward_as_tuple, while MSVC 2012 did not; so we can move to using the improved API.
llvm-svn: 229414
2015-02-16 18:21:19 +00:00
Andrew Trick 05938a5481 AArch64: Safely handle the incoming sret call argument.
This adds a safe interface to the machine independent InputArg struct
for accessing the index of the original (IR-level) argument. When a
non-native return type is lowered, we generate the hidden
machine-level sret argument on-the-fly. Before this fix, we were
representing this argument as OrigArgIndex == 0, which is an outright
lie. In particular this crashed in the AArch64 backend where we
actually try to access the type of the original argument.

Now we use a sentinel value for machine arguments that have no
original argument index. AArch64, ARM, Mips, and PPC now check for this
case before accessing the original argument.

Fixes <rdar://19792160> Null pointer assertion in AArch64TargetLowering

llvm-svn: 229413
2015-02-16 18:10:47 +00:00
Michael Kuperstein fc3e626752 Fix quoting of #pragma comment for MS compat, LLVM part.
For #pragma comment(linker, ...) MSVC expects the comment string to be quoted, but for #pragma comment(lib, ...) the compiler itself quotes the library name.
Since this distinction disappears by the time the directive reaches the backend, move quoting for the "lib" version to the frontend.

Differential Revision: http://reviews.llvm.org/D7652

llvm-svn: 229375
2015-02-16 11:57:17 +00:00
Aaron Ballman f9a1897c72 Removing LLVM_DELETED_FUNCTION, as MSVC 2012 was the last reason for requiring the macro. NFC; LLVM edition.
llvm-svn: 229340
2015-02-15 22:54:22 +00:00
Chandler Carruth 1b5285dd57 [SDAG] Teach the SelectionDAG to canonicalize vector shuffles of splats
directly into blends of the splats.

These patterns show up even very late in the vector shuffle lowering
where we don't have any chance for DAG combining to kick in, and
blending is a tremendously simpler operation to model. By coercing the
shuffle into a blend we can much more easily match and lower shuffles of
splats.

Immediately with this change there are significantly more blends being
matched in the x86 vector shuffle lowering.

llvm-svn: 229308
2015-02-15 12:18:12 +00:00
Chandler Carruth 499d7332c5 [x86] Fix PR22377, a regression with the new vector shuffle legality
test.

This was just a matter of the DAG combine for vector shuffles being too
aggressive. This is a bit of a grey area, but I think generally if we
can re-use intermediate shuffles, we should. Certainly, given the test
cases I have available, this seems like the right call.

llvm-svn: 229285
2015-02-15 07:01:10 +00:00
Duncan P. N. Exon Smith 70eb9c5ae5 CodeGen: Canonicalize access to function attributes, NFC
Canonicalize access to function attributes to use the simpler API.

getAttributes().getAttribute(AttributeSet::FunctionIndex, Kind)
  => getFnAttribute(Kind)

getAttributes().hasAttribute(AttributeSet::FunctionIndex, Kind)
  => hasFnAttribute(Kind)

Also, add `Function::getFnStackAlignment()`, and canonicalize:

getAttributes().getStackAlignment(AttributeSet::FunctionIndex)
  => getFnStackAlignment()
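
A small usage sketch of the canonical forms listed above (the attribute kind
chosen here is just an example):

  #include "llvm/IR/Function.h"
  using namespace llvm;

  bool isNoInline(const Function &F) {
    // was: F.getAttributes().hasAttribute(AttributeSet::FunctionIndex, ...)
    return F.hasFnAttribute(Attribute::NoInline);
  }

  unsigned stackAlign(const Function &F) {
    // was: F.getAttributes().getStackAlignment(AttributeSet::FunctionIndex)
    return F.getFnStackAlignment();
  }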

llvm-svn: 229208
2015-02-14 01:44:41 +00:00
Matthias Braun 33cc10724d Revert "On ELF, put PIC jump tables in a non executable section."
This reverts commit r228939.

The commit broke something in the output of exception handling tables on
darwin x86-64.

llvm-svn: 229203
2015-02-14 01:16:54 +00:00
Reid Kleckner 2d5fb68ee0 Unify the two EH personality classification routines I wrote
We only need one.

llvm-svn: 229193
2015-02-14 00:21:02 +00:00
Andrea Di Biagio b14ae8692d [CodeGenPrepare] Removed duplicate logic. SimplifyCFG already knows how to speculate calls to cttz/ctlz.
SimplifyCFG now knows how to speculate calls to intrinsic cttz/ctlz that are
'cheap' for the target. Therefore, some of the logic in CodeGenPrepare
that was originally added at revision 224899 can now be removed.

This patch is basically a no-functional-change patch. It removes the duplicated
logic in CodeGenPrepare and converts all the existing target-specific tests
for cttz/ctlz into SimplifyCFG tests.

Differential Revision: http://reviews.llvm.org/D7608

llvm-svn: 229105
2015-02-13 14:15:48 +00:00
Arnaud A. de Grandmaison a7c90d8487 [PBQP] Conservatively allocatable nodes can be spilled and give a better solution
Although such nodes are allocatable, the cost of spilling may be less than
the cost of allocating to a register, so spilling the node may provide a better solution.
The assert does not account for this case, so remove it for now.

llvm-svn: 229103
2015-02-13 12:04:42 +00:00
Chandler Carruth 30d69c2e36 [PM] Remove the old 'PassManager.h' header file at the top level of
LLVM's include tree and the use of using declarations to hide the
'legacy' namespace for the old pass manager.

This undoes the primary modules-hostile change I made to keep
out-of-tree targets building. I sent an email inquiring about whether
this would be reasonable to do at this phase and people seemed fine with
it, so making it a reality. This should allow us to start bootstrapping
with modules to a certain extent along with making it easier to mix and
match headers in general.

The updates to any code for users of LLVM are very mechanical. Switch
from including "llvm/PassManager.h" to "llvm/IR/LegacyPassManager.h".
Qualify the types which now produce compile errors with "legacy::". The
most common ones are "PassManager", "PassManagerBase", and
"FunctionPassManager".

llvm-svn: 229094
2015-02-13 10:01:29 +00:00
Chandler Carruth 71f308adb7 Re-sort #include lines using my handy dandy ./utils/sort_includes.py
script. This is in preparation for changes to lots of include lines.

llvm-svn: 229088
2015-02-13 09:09:03 +00:00
Chandler Carruth d99f427e31 Revert a series of commits starting at r228886 which is triggering some
regressions for LLDB on Linux. Rafael indicated on lldb-dev that we
should just go ahead and revert these but that he wasn't at a computer.
The patches backed out are as follows:

r228980: Add support for having multiple sections with the same name and ...
r228889: Invert the section relocation map.
r228888: Use the existing SymbolTableIndex intsead of doing a lookup.
r228886: Create the Section -> Rel Section map when it is first needed.

These patches look pretty nice to me, so hoping it's not too hard to get
them reinstated. =D

llvm-svn: 229080
2015-02-13 07:52:39 +00:00
Rafael Espindola b6a812ebb1 Add support for having multiple sections with the same name and comdat.
Using this in combination with -ffunction-sections allows LLVM to output a .o
file with multiple sections named .text. This saves space by avoiding long
unique names of the form .text.<C++ mangled name>.

llvm-svn: 228980
2015-02-12 23:29:51 +00:00
Hal Finkel 271e9f2870 [SDAG] Don't try to use FP_EXTEND/FP_ROUND for int<->fp promotions
The PowerPC backend has long promoted some floating-point vector operations
(such as select) to integer vector operations. Unfortunately, this behavior was
broken by r216555. When using FP_EXTEND/FP_ROUND for promotions, we must check
that both the old and new types are floating-point types. Otherwise, we must
use BITCAST as we did prior to r216555 for everything.

llvm-svn: 228969
2015-02-12 22:43:52 +00:00
Rafael Espindola 203c5b9f39 On ELF, put PIC jump tables in a non executable section.
Fixes PR22558.

llvm-svn: 228939
2015-02-12 17:46:49 +00:00
Rafael Espindola 29786d4c16 Put each jump table in an independent section if the function is in one, too.
This allows the linker to GC both, fixing pr22557.

llvm-svn: 228937
2015-02-12 17:16:46 +00:00
Benjamin Kramer 5f6a907288 MathExtras: Bring Count(Trailing|Leading)Ones and CountPopulation in line with countTrailingZeros
Update all callers.
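
A small usage sketch of the renamed helpers (the values are arbitrary):

  #include "llvm/Support/MathExtras.h"
  #include <cstdint>

  void example() {
    uint32_t X = 0x0000ff0f;
    (void)llvm::countTrailingOnes(X);          // 4
    (void)llvm::countLeadingOnes(0xfff00000u); // 12
    (void)llvm::countPopulation(X);            // 12
  }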

llvm-svn: 228930
2015-02-12 15:35:40 +00:00
Ahmed Bougacha 24433a7005 [CodeGen] Don't blindly combine (fp_round (fp_round x)) to (fp_round x).
We used to do this DAG combine, but it's not always correct:
If the first fp_round isn't a value-preserving truncation, it might
introduce a tie in the second fp_round that wouldn't occur in the
single-step fp_round we want to fold to.
In other words, double rounding isn't the same as rounding.
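
A decimal analogy of why this matters (plain C++, not the DAG combine itself):
rounding in two steps can land on a tie that a single rounding never sees, so
the two-step result can differ from rounding once.

  #include <cmath>
  #include <cstdio>

  int main() {
    double x = 9.46;
    double once  = std::round(x);                       // 9
    double twice = std::round(std::round(x * 10) / 10); // 9.46 -> 9.5 -> 10
    std::printf("once=%g twice=%g\n", once, twice);
  }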

Differential Revision: http://reviews.llvm.org/D7571

llvm-svn: 228911
2015-02-12 06:15:29 +00:00
Jonas Paulsson bf8d0cc699 Fix SelectionDAG compile time issue with alias analysis.
Add new token factor node and its users to worklist if alias analysis is
turned on, in DAGCombiner::visitTokenFactor(). Alias analysis may cause
a lot of new token factors to be inserted into the DAG, and they need to
be optimized to avoid significant slow-downs.

Reviewed by Hal Finkel.

llvm-svn: 228841
2015-02-11 16:10:31 +00:00
Rafael Espindola 25d2c20c0c Don't repeat name in comment and clang-format a function.
llvm-svn: 228831
2015-02-11 14:44:17 +00:00
Arnaud A. de Grandmaison de79026d5e [PBQP] Cautiously update edge costs in the solver
The NodeMetadata are maintained in an incremental way. When an edge between
2 nodes has its cost updated, in the course of graph reduction for example,
the NodeMetadata need first to have the old edge cost removed, then the new
edge cost added. Only once the NodeMetadata have been fully updated does it
become safe to consider promoting the nodes to the
ConservativelyAllocatable or OptimallyReducible sets. Previously, this
promotion was occurring right after removing the old cost, and this was
breaking the assumption that a ConservativelyAllocatable node should not be
spilled.

This patch also adds asserts to:
 - enforce the invariant that a node's reduction cannot be downgraded,
 - ensure that only nodes that are not provably allocatable or optimally reducible can be spilled.

llvm-svn: 228816
2015-02-11 08:25:36 +00:00
Zachary Turner 3bd47cee78 Use ADDITIONAL_HEADER_DIRS in all LLVM CMake projects.
This allows IDEs to recognize the entire set of header files for
each of the core LLVM projects.

Differential Revision: http://reviews.llvm.org/D7526
Reviewed By: Chris Bieneman

llvm-svn: 228798
2015-02-11 03:28:02 +00:00
Reid Kleckner 96d011315a Don't promote asynch EH invokes of nounwind functions to calls
If the landingpad of the invoke is using a personality function that
catches asynchronous exceptions, then it can catch a trap.

Also add some landingpads to invalid LLVM IR test cases that lack them.

Over-the-shoulder reviewed by David Majnemer.

llvm-svn: 228782
2015-02-11 01:23:16 +00:00
Petar Jovanovic d9f52043b1 Fix makeLibCall argument (signed) in SoftenFloatRes_XINT_TO_FP function
The isSigned argument of the makeLibCall function was hard-coded to false
(unsigned). This caused zero extension on MIPS64 soft float.
As the result SingleSource/Benchmarks/Stanford/FloatMM test and
SingleSource/UnitTests/2005-07-17-INT-To-FP test failed. 
The solution was to use the proper argument.

Patch by Strahinja Petrovic.

Differential Revision: http://reviews.llvm.org/D7292

llvm-svn: 228765
2015-02-10 23:30:14 +00:00
Adrian Prantl ca7e470221 Debug Info: Support variables that are described by more than one MMI
table entry. This happens when SROA splits up an alloca and the resulting
allocas cannot be lowered to SSA values because their address is passed
to a function.

Fixes PR22502.

llvm-svn: 228764
2015-02-10 23:18:28 +00:00
Adrian Prantl d49691f779 Fix indentation.
llvm-svn: 228763
2015-02-10 23:18:15 +00:00
Andrew Kaylor 78b53dbcc1 Adding support for llvm.eh.begincatch and llvm.eh.endcatch intrinsics and beginning the documentation of native Windows exception handling.
Differential Revision: http://reviews.llvm.org/D7398

llvm-svn: 228733
2015-02-10 19:52:43 +00:00
Jonas Paulsson a25a3f4fea Two comment typo fixes in lib/CodeGen/SelectionDAG/DAGCombiner.cpp.
llvm-svn: 228700
2015-02-10 15:34:29 +00:00
Jonas Paulsson afa6813816 Bugfix for missed dependency from store to load in buildSchedGraph().
Background: When handling underlying objects for a store, the vector
of previous mem uses, mapped to the same Value, is afterwards cleared
(regardless of ThisMayAlias). This means that during handling of the
next store using the same Value, adjustChainDeps() must be called,
otherwise a dependency might be missed.

For example, three spill/reload (NonAliasing) memory accesses using
the same Value 'a', with different offsets:

    SU(2): store  @a
    SU(1): store  @a, Offset:1
    SU(0): load   @a

In this case we have:

* SU(1) does not need a dep against SU(0). Therefore, SU(0) ends up in
  RejectMemNodes and is removed from the mem-uses list (AliasMemUses
  or NonAliasMemUses), as this list is cleared.

* SU(2) needs a dep against SU(0). Therefore, SU(2) must check
  RejectMemNodes by calling adjustChainDeps().

Previously, for store SUs, adjustChainDeps() was only called if
MayAlias was true, missing the SU(2) to SU(0) dependency in the case
above. The fix is to always call adjustChainDeps(), regardless of
MayAlias, since this applies both for AliasMemUses and
NonAliasMemUses.

No testcase found for any in-tree target.

llvm-svn: 228686
2015-02-10 13:03:32 +00:00
Chandler Carruth b65d61a2e8 [x86] Fix PR22524: the DAG combiner was incorrectly handling illegal
nodes when folding bitcasts of constants.

We can't fold things and then check after-the-fact whether it was legal.
Once we have formed the DAG node, arbitrary other nodes may have been
collapsed to it. There is no easy way to go back. Instead, we need to
test for the specific folding cases we're interested in and ensure those
are legal first.

This could in theory make this less powerful for bitcasting from an
integer to some vector type, but AFAICT, that can't actually happen in
the SDAG so it's fine. Now, we *only* whitelist specific int->fp and
fp->int bitcasts for post-legalization folding. I've added the test case
from the PR.

(Also as a note, this does not appear to be in 3.6, no backport needed)

llvm-svn: 228656
2015-02-10 02:25:56 +00:00
Adrian Prantl 27bd01f71c Debug info: Use DW_OP_bit_piece instead of DW_OP_piece in the
intermediate representation. This
- increases consistency by using the same granularity everywhere
- allows for pieces < 1 byte
- DW_OP_piece didn't actually allow storing an offset.

Part of PR22495.

llvm-svn: 228631
2015-02-09 23:57:15 +00:00
Duncan P. N. Exon Smith b407bb2789 DebugInfo: Remove DW_TAG_constant
Remove handling for DW_TAG_constant.  We started producing it in
r110656, but reverted that in r110876 without dropping the support.
Finish the job.

llvm-svn: 228623
2015-02-09 22:48:04 +00:00
Benjamin Kramer a9591b5880 Move DebugLocs around instead of copying.
llvm-svn: 228491
2015-02-07 12:28:15 +00:00
Bruce Mitchener 7e575ed1ea Add more DWARF 5 language constants.
Differential Revision: http://reviews.llvm.org/D7430

llvm-svn: 228487
2015-02-07 06:35:30 +00:00
Quentin Colombet a8cb36ea43 [LiveIntervalAnalysis] Speed up creation of live ranges for physical registers
by using a segment set.

The patch addresses a compile-time performance regression in the LiveIntervals
analysis pass (see http://llvm.org/bugs/show_bug.cgi?id=18580). This regression
is especially critical when compiling long functions. Our analysis had shown
that most of the time is taken by the generation of live intervals for physical
registers. Insertions in the middle of the array of live ranges cause quadratic
algorithmic complexity, which is apparently the main reason for the slow-down. 

Overview of changes:
- The patch introduces an additional std::set<Segment>* member in LiveRange for
  storing segments during the phase of initial creation. The set is used if this
  member is not NULL; otherwise everything works the old way (a conceptual
  sketch of this set-then-flush idea follows the list below).
- The set of operations on LiveRange used during initial creation (i.e. used by
  createDeadDefs and extendToUses) have been reimplemented to use the segment
  set if it is available.
- After a live range is created the contents of the set are flushed to the
  segment vector, because the set is not as efficient as the vector for the
  later uses of the live range. After the flushing, the set is deleted and
  cannot be used again.
- The set is only for live ranges computed in
  LiveIntervalAnalysis::computeLiveInRegUnits() and getRegUnit() but not in
  computeVirtRegs(), because it did not bring any performance benefits to
  computeVirtRegs() and for some examples even brought a slowdown.
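
A conceptual sketch of the set-then-flush idea referenced above (plain C++,
not the actual LiveRange code; all names are made up):

  #include <algorithm>
  #include <set>
  #include <vector>

  struct Segment {
    unsigned Start, End;
    bool operator<(const Segment &O) const { return Start < O.Start; }
  };

  struct LiveRangeSketch {
    std::vector<Segment> Segments;         // long-term storage used everywhere else
    std::set<Segment> *BuildSet = nullptr; // non-null only during initial creation

    void addSegment(const Segment &S) {
      if (BuildSet)
        BuildSet->insert(S); // O(log n), even for out-of-order insertions
      else
        Segments.insert(
            std::upper_bound(Segments.begin(), Segments.end(), S), S);
    }

    void flush() { // called once initial creation is finished
      if (!BuildSet)
        return;
      Segments.assign(BuildSet->begin(), BuildSet->end());
      delete BuildSet;
      BuildSet = nullptr;
    }
  };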

Patch by Vaidas Gasiunas <vaidas.gasiunas@sap.com>

Differential Revision: http://reviews.llvm.org/D6013

llvm-svn: 228421
2015-02-06 18:42:41 +00:00
Matthias Braun 7dc96dea0a LiveInterval: Fix SubRange memory leak.
llvm-svn: 228405
2015-02-06 17:28:47 +00:00
Arnaud A. de Grandmaison 77f62d45ef [PBQP] Fix comment wording. NFC
llvm-svn: 228390
2015-02-06 11:28:16 +00:00
Daniel Jasper 4bb224decb Small cleanup of MachineLICM.cpp
Specifically:
- Calculate the loop pre-header once at the start of HoistOutOfLoop, so:
  - We don't DFS-walk the MachineDomTree if we aren't going to do anything
  - Don't call getCurPreheader for each Scope
- Don't needlessly use a do-while loop
- Use early exit for Scopes.size() == 0

No functional changes intended.

llvm-svn: 228350
2015-02-05 22:39:46 +00:00
Ahmed Bougacha e892d13d90 [CodeGen] Add hook/combine to form vector extloads, enabled on X86.
The combine that forms extloads used to be disabled on vector types,
because "None of the supported targets knows how to perform load and
sign extend on vectors in one instruction."

That's not entirely true, since at least SSE4.1 X86 knows how to do
those sextloads/zextloads (with PMOVS/ZX).
But there are several aspects to getting this right.
First, vector extloads are controlled by a profitability callback.
For instance, on ARM, several instructions have folded extload forms,
so it's not always beneficial to create an extload node (and trying to
match extloads is a whole 'nother can of worms).

The interesting optimization enables folding of s/zextloads to illegal
(splittable) vector types, expanding them into smaller legal extloads.

It's not ideal (it introduces some legalization-like behavior in the
combine) but it's better than the obvious alternative: form illegal
extloads, and later try to split them up.  If you do that, you might
generate extloads that can't be split up, but have a valid ext+load
expansion.  At vector-op legalization time, it's too late to generate
this kind of code, so you end up forced to scalarize. It's better to
just avoid creating egregiously illegal nodes.

This optimization is enabled unconditionally on X86.

Note that the splitting combine is happy with "custom" extloads. As
is, this bypasses the actual custom lowering, and just unrolls the
extload. But from what I've seen, this is still much better than the
current custom lowering, which does some kind of unrolling at the end
anyway (see for instance load_sext_4i8_to_4i64 on SSE2, and the added
FIXME).

Also note that the existing combine that forms extloads is now also
enabled on legal vectors.  This doesn't have a big effect on X86
(because sext+load is usually combined to sext_inreg+aextload).
On ARM it fires on some rare occasions; that's for a separate commit.

Differential Revision: http://reviews.llvm.org/D6904

llvm-svn: 228325
2015-02-05 18:31:02 +00:00
Rafael Espindola f8d662aa4b Add a FIXME.
Thanks to Eric for the suggestion.

llvm-svn: 228300
2015-02-05 14:57:47 +00:00
Rafael Espindola a092f17580 Don't try to make sections in comdats SHF_MERGE.
Parts of llvm were not expecting it, and we wouldn't print
the entity size of the section.

Given what comdats are used for, having SHF_MERGE sections would be
just a small improvement, so just disable it for now.

Fixes pr22463.

llvm-svn: 228196
2015-02-04 21:27:24 +00:00
Matthias Braun 26e7ea6267 MachineCSE: Clear dead-def flag on CSE.
In case CSE reuses a previously unused register, the dead-def flag has to
be cleared on the def operand, as exposed by the arm64-cse.ll test.

This fixes PR22439 and the corresponding rdar://19694987

Differential Revision: http://reviews.llvm.org/D7395

llvm-svn: 228178
2015-02-04 19:35:16 +00:00
Michael Kuperstein cd63c5fa73 Fixes a bug in vector load legalization that confused bits and bytes.
Differential Revision: http://reviews.llvm.org/D7400

llvm-svn: 228168
2015-02-04 18:54:01 +00:00
Owen Anderson 21b1788ad0 Remove a gross usage of environment variables in MachineVerifier, replacing it with support for setting the -verify-machineinstrs flag via an environment variable in LIT.
This preserves the handy functionality of force-enabling the MachineVerifier, without the need to embed usage of environment variables in LLVM client applications.

llvm-svn: 228079
2015-02-04 00:02:59 +00:00
Arnaud A. de Grandmaison 10797c5707 [PBQP] Provide more information in the debug prints
Based on a patch by Jonas Paulsson

llvm-svn: 228068
2015-02-03 23:40:24 +00:00
Eric Christopher 36fe028a2a Only access TLOF via the TargetMachine, not TargetLowering.
llvm-svn: 227949
2015-02-03 07:22:52 +00:00
Lang Hames d48bf3f912 [PBQP Regalloc] Pre-spill vregs that have no legal physregs.
The PBQP::RegAlloc::MatrixMetadata class assumes that matrices have at least two
rows/columns (for the spill option plus at least one physreg). This patch
ensures that that invariant is met by pre-spilling vregs that have no physreg
options so that no node (and no corresponding edges) need be added to the PBQP
graph.

This fixes a bug in an out-of-tree target that was identified by Jonas Paulsson.
Thanks for tracking this down Jonas!

llvm-svn: 227942
2015-02-03 06:14:06 +00:00
Rafael Espindola 6ffb1d7e3c Move simple case earlier and use a continue.
llvm-svn: 227841
2015-02-02 19:22:51 +00:00
Adrian Prantl 9013c4b3b1 Debug Info: Relax assertion in isUnsignedDIType() to allow floats to be
described by integer constants. This is a bit ugly, but if the source
language allows arbitrary type casting, the debug info must follow suit.

For example:
  void foo() {
    float a;
    *(int *)&a = 0;
  }
For the curious: SROA replaces the float alloca with an i32 alloca, which
is then optimized away and described via dbg.value(i32 0, ...).

llvm-svn: 227827
2015-02-02 18:31:58 +00:00
Michael Kuperstein 13fbd45263 [X86] Convert esp-relative movs of function arguments to pushes, step 2
This moves the transformation introduced in r223757 into a separate MI pass.
This allows it to cover many more cases (not only cases where there must be a 
reserved call frame), and perform rudimentary call folding. It still doesn't 
have a heuristic, so it is enabled only for optsize/minsize, with stack 
alignment <= 8, where it ought to be a fairly clear win.

(Re-commit of r227728)

Differential Revision: http://reviews.llvm.org/D6789

llvm-svn: 227752
2015-02-01 16:56:04 +00:00
Michael Kuperstein e86aa9a8a4 Revert r227728 due to bad line endings.
llvm-svn: 227746
2015-02-01 16:15:07 +00:00
Chandler Carruth c956ab6603 [multiversion] Switch the TTI queries from TargetMachine to Subtarget
now that we have a correct and cached subtarget specific to the
function.

Also, finish providing a cached per-function subtarget in the core
LLVMTargetMachine -- that layer hadn't switched over yet.

The only use of the TargetMachine was to re-lookup a subtarget for
a particular function to work around the fact that TTI was immutable.
Now that it is per-function and we have a cached subtarget, use it.

This still leaves a few interfaces with real warts on them where we were
passing Function objects through the TTI interface. I'll remove these
and clean their usage up in subsequent commits now that this isn't
necessary.

llvm-svn: 227738
2015-02-01 14:22:17 +00:00
Chandler Carruth c340ca839c [multiversion] Remove the cached TargetMachine pointer from the
intermediate TTI implementation template and instead query up to the
derived class for both the TargetMachine and the TargetLowering.

Most of the derived types had a TLI cached already and there is no need
to store a less precisely typed target machine pointer.

This will in turn make it much cleaner to look up the TLI via
a per-function subtarget instead of the generic subtarget, and it will
pave the way toward pulling the subtarget used for unroll preferences
into the same form once we are *always* using the function to look up
the correct subtarget.

llvm-svn: 227737
2015-02-01 14:01:15 +00:00
Chandler Carruth 8b04c0d26a [multiversion] Switch all of the targets over to use the
TargetIRAnalysis access path directly rather than implementing getTTI.

This even removes getTTI from the interface. It's more efficient for
each target to just register a precise callback that creates their
specific TTI.

As part of this, all of the targets which are building their subtargets
individually per-function now build their TTI instance with the function
and thus look up the correct subtarget and cache it. NVPTX, R600, and
XCore currently don't leverage this functionality, but it's trivial for
them to add it now.

llvm-svn: 227735
2015-02-01 13:20:00 +00:00
Chandler Carruth 5ec2b1d11a [multiversion] Implement the old pass manager's TTI wrapper pass in
terms of the new pass manager's TargetIRAnalysis.

Yep, this is one of the nicer bits of the new pass manager's design.
Passes can in many cases operate in a vacuum and so we can just nest
things when convenient. This is particularly convenient here as I can
now consolidate all of the TargetMachine logic on this analysis.

The most important change here is that this pushes the function we need
TTI for all the way into the TargetMachine, and re-creates the TTI
object for each function rather than re-using it for each function.
We're now prepared to teach the targets to produce function-specific TTI
objects with specific subtargets cached, etc.

One piece of feedback I'd love here is whether it's worth renaming any of
this stuff. None of the names really seem that awesome to me at this
point, but TargetTransformInfoWrapperPass is particularly ... odd.
TargetIRAnalysisWrapper might make more sense. I would want to do that
rename separately anyways, but let me know what you think.

llvm-svn: 227731
2015-02-01 12:26:09 +00:00
Chandler Carruth fdb9c573f7 [multiversion] Thread a function argument through all the callers of the
getTTI method used to get an actual TTI object.

No functionality changed. This just threads the argument and ensures
code like the inliner can correctly look up the callee's TTI rather than
using a fixed one.

The next change will use this to implement per-function subtarget usage
by TTI. The changes after that should eliminate the need for FTTI as that
will have become the default.

llvm-svn: 227730
2015-02-01 12:01:35 +00:00
Michael Kuperstein bd57186c76 [X86] Convert esp-relative movs of function arguments to pushes, step 2
This moves the transformation introduced in r223757 into a separate MI pass.
This allows it to cover many more cases (not only cases where there must be a 
reserved call frame), and perform rudimentary call folding. It still doesn't 
have a heuristic, so it is enabled only for optsize/minsize, with stack 
alignment <= 8, where it ought to be a fairly clear win.

Differential Revision: http://reviews.llvm.org/D6789

llvm-svn: 227728
2015-02-01 11:44:44 +00:00
Chandler Carruth 93dcdc47db [PM] Switch the TargetMachine interface from accepting a pass manager
base to which it adds a single analysis pass, to instead return the
type-erased TargetTransformInfo object constructed for that TargetMachine.

This removes all of the pass variants for TTI. There is now a single TTI
*pass* in the Analysis layer. All of the Analysis <-> Target
communication is through the TTI's type erased interface itself. While
the diff is large here, it is nothing more than code motion to make
types available in a header file for use in a different source file
within each target.

I've tried to keep all the doxygen comments and file boilerplate in line
with this move, but let me know if I missed anything.

With this in place, the next step to making TTI work with the new pass
manager is to introduce a really simple new-style analysis that produces
a TTI object via a callback into this routine on the target machine.
Once we have that, we'll have the building blocks necessary to accept
a function argument as well.

llvm-svn: 227685
2015-01-31 11:17:59 +00:00
Chandler Carruth 705b185f90 [PM] Change the core design of the TTI analysis to use a polymorphic
type erased interface and a single analysis pass rather than an
extremely complex analysis group.

The end result is that the TTI analysis can contain a type erased
implementation that supports the polymorphic TTI interface. We can build
one from a target-specific implementation or from a dummy one in the IR.

I've also factored all of the code into "mix-in"-able base classes,
including CRTP base classes to facilitate calling back up to the most
specialized form when delegating horizontally across the surface. These
aren't as clean as I would like and I'm planning to work on cleaning
some of this up, but I wanted to start by putting it into the right form.

There are a number of reasons for this change, and this particular
design. The first and foremost reason is that an analysis group is
complete overkill, and the chaining delegation strategy was so opaque,
confusing, and high overhead that TTI was suffering greatly for it.
Several of the TTI functions had failed to be implemented in all places
because the chaining-based delegation meant there was no checking for this.
A few other functions were implemented with incorrect delegation.
The message to me was very clear working on this -- the delegation and
analysis group structure was too confusing to be useful here.

The other reason of course is that this is a *much* more natural fit for
the new pass manager. This will lay the ground work for a type-erased
per-function info object that can look up the correct subtarget and even
cache it.

Yet another benefit is that this will significantly simplify the
interaction of the pass managers and the TargetMachine. See the future
work below.

The downside of this change is that it is very, very verbose. I'm going
to work to improve that, but it is somewhat an implementation necessity
in C++ to do type erasure. =/ I discussed this design really extensively
with Eric and Hal prior to going down this path, and afterward showed
them the result. No one was really thrilled with it, but there doesn't
seem to be a substantially better alternative. Using a base class and
virtual method dispatch would make the code much shorter, but as
discussed in the update to the programmer's manual and elsewhere,
a polymorphic interface feels like the more principled approach even if
this is perhaps the least compelling example of it. ;]

Ultimately, there is still a lot more to be done here, but this was the
huge chunk that I couldn't really split things out of because this was
the interface change to TTI. I've tried to minimize all the other parts
of this. The follow up work should include at least:

1) Improving the TargetMachine interface by having it directly return
   a TTI object. Because we have a non-pass object with value semantics
   and an internal type erasure mechanism, we can narrow the interface
   of the TargetMachine to *just* do what we need: build and return
   a TTI object that we can then insert into the pass pipeline.
2) Make the TTI object be fully specialized for a particular function.
   This will include splitting off a minimal form of it which is
   sufficient for the inliner and the old pass manager.
3) Add a new pass manager analysis which produces TTI objects from the
   target machine for each function. This may actually be done as part
   of #2 in order to use the new analysis to implement #2.
4) Work on narrowing the API between TTI and the targets so that it is
   easier to understand and less verbose to type erase.
5) Work on narrowing the API between TTI and its clients so that it is
   easier to understand and less verbose to forward.
6) Try to improve the CRTP-based delegation. I feel like this code is
   just a bit messy and exacerbating the complexity of implementing
   the TTI in each target.

Many thanks to Eric and Hal for their help here. I ended up blocked on
this somewhat more abruptly than I expected, and so I appreciate getting
it sorted out very quickly.

Differential Revision: http://reviews.llvm.org/D7293

llvm-svn: 227669
2015-01-31 03:43:40 +00:00
Alexey Samsonov 964f4dbaae Fix memory leak in WinEHPrepare introduced in r227405.
This leak was detected by ASan bootstrap of LLVM.

llvm-svn: 227625
2015-01-30 22:07:05 +00:00
Reid Kleckner 2a2235087f Update comments to use unreachable instead of llvm.trap, as implemented now
llvm-svn: 227502
2015-01-29 22:32:26 +00:00
Rafael Espindola ba31e27f0a Compute the ELF SectionKind from the flags.
Any code creating an MCSectionELF knows ELF and already provides the flags.

SectionKind is an abstraction used by common code that uses a plain
MCSection.

Use the flags to compute the SectionKind. This removes a lot of
guessing and boilerplate from the MCSectionELF construction.

llvm-svn: 227476
2015-01-29 17:33:21 +00:00
Rafael Espindola 093dcc43f7 Use isMergeableConst now that it is sane.
llvm-svn: 227441
2015-01-29 14:23:28 +00:00
Rafael Espindola 33804cac96 Remove MergeableConst.
Only the specific ones (MergeableConst4, MergeableConst8, MergeableConst16) are
handled specially.

llvm-svn: 227440
2015-01-29 14:12:41 +00:00
Benjamin Kramer eb63e4d6f8 EHPrepare: Remove leftover initialization code for DomTrees.
While there, modernize some loops. NFC.

llvm-svn: 227436
2015-01-29 13:26:50 +00:00
Rafael Espindola e2d4b2df39 Use enum values. NFC.
llvm-svn: 227435
2015-01-29 13:25:44 +00:00
Rafael Espindola fad3901095 Don't create multiple mergeable sections with -fdata-sections.
ELF has support for sections that can be split into fixed size or
null terminated entities.

Since these sections can be split by the linker, it is not necessary
to split them in codegen.

This reduces the combined .o size in a llvm+clang build from
202,394,570 to 173,819,098 bytes.

The time for linking clang with gold (on a VM, on a laptop) goes
from 2.250089985 to 1.383001792 seconds.

The flip side is that the size of rodata in clang goes from 10,926,785
to 10,929,345 bytes.

The increase seems to be because of http://sourceware.org/bugzilla/show_bug.cgi?id=17902.

llvm-svn: 227431
2015-01-29 12:43:28 +00:00
Chandler Carruth a1a622746e Remove an unused private field added r227405 to fix a Clang warning.
llvm-svn: 227415
2015-01-29 02:44:53 +00:00
Reid Kleckner 8fcb0ad8a6 Remove unused variable
llvm-svn: 227408
2015-01-29 00:55:44 +00:00
Reid Kleckner 1185fced3d Add a Windows EH preparation pass that zaps resumes
If the personality is not a recognized MSVC personality function, this
pass delegates to the dwarf EH preparation pass. This chaining supports
people on *-windows-itanium or *-windows-gnu targets.

Currently this recognizes some personalities used by MSVC and turns
resume instructions into traps to avoid link errors.  Even if cleanups
are not used in the source program, LLVM requires the frontend to emit a
code path that resumes unwinding after an exception.  Clang does this,
and we get unreachable resume instructions. PR20300 covers cleaning up
these unreachable calls to resume.

Reviewers: majnemer

Differential Revision: http://reviews.llvm.org/D7216

llvm-svn: 227405
2015-01-29 00:41:44 +00:00
Manuel Jacob 6f508c578b Add nullptr checks for TargetSelectionDAGInfo in SelectionDAG.
TSI is not guaranteed to be non-null in SelectionDAG.

llvm-svn: 227397
2015-01-28 23:50:40 +00:00
Philip Reames 23cf2e2f97 Remove gc.root's performCustomLowering
This is a refactoring to restructure the single user of performCustomLowering as a specific lowering pass and remove the custom lowering hook entirely.

Before this change, the LowerIntrinsics pass (note to self: rename!) was essentially acting as a pass manager, but without being structured in terms of passes. Instead, it proxied calls to a set of GCStrategies internally. This adds a lot of conceptual complexity (i.e. GCStrategies are stateful!) for very little benefit. Since there's been interest in keeping the ShadowStackGC working, I extracted its custom lowering pass into a dedicated pass and just added that to the pass order. It will only run for functions which opt in to that gc.

I wasn't able to find an easy way to preserve the runtime registration of custom lowering functionality. Given that no user of this exists that I'm aware of, I made the choice to just remove that. If someone really cares, we can look at restoring it via dynamic pass registration in the future.

Note that despite the large diff, none of the lowering code actually changes. I added the framing needed to make it a pass and renamed the class, but that's it.

Differential Revision: http://reviews.llvm.org/D7218

llvm-svn: 227351
2015-01-28 19:28:03 +00:00
Rafael Espindola a05b3b73a4 Simplify code. NFC.
llvm-svn: 227333
2015-01-28 17:54:19 +00:00
Hal Finkel 34c94d5caa Correct the AggressiveAntiDepBreaker's handling of subregisters defining super registers
As the AggressiveAntiDepBreaker iterates backward through a scheduling region,
we must leave super registers live through subregister definitions so that all
relevant subregister definitions are renamed together. The problem was that we
were also discarding sub-register use locations as the sub-registers are
redefined. The result is that we'd rename the super register along with some,
but not all, subregister definitions.

  R0_D = {R0_L, R1_L}
  R0_L = {R0_S, R1_S}

  %R0_L<def> = TRLi9 16, pred:8, pred:%noreg
  %R1_L<def> = LSRLrr %R1_L<kill>, %R0_S, pred:8, pred:%noreg
  %R0_L<def> = LSRLrr %R2_L, %R0_S, pred:8, pred:%noreg, %R0_L<imp-use,kill>
  %R1_L<def> = ANDLri %R1_L<kill>, 2047, pred:8, pred:%noreg
  %R0_L<def> = ANDLri %R0_L<kill>, 2047, pred:8, pred:%noreg
  %R4_D<def> = ASRDrr %R0_D<kill>, %R6_S

  Anti:   %R4_D<def> = ASRDrr %R0_D<kill>, %R6_S
   Def Groups: R4_D=g213->g215(via R4_S)->g214(via R4_L)->g216(via R5_S)->g216(via R4_L)->g217(via R5_L)
   Use Groups: R0_D=g0->g218(last-use) R1_L->g219(last-use) R6_S=g204->g220(last-use)
  Anti:   %R0_L<def> = ANDLri %R0_L<kill>, 2047, pred:8, pred:%noreg
   Def Groups: R0_L=g208->g209(via R0_S)->g218(via R0_D)->g210(via R1_S)->g210(via R0_D)
   Antidep reg: R0_L (real dependency)
   Use Groups: R0_L=g210->g224(last-use) R0_S->g225(last-use) R1_S->g226(last-use)
  Anti:   %R1_L<def> = ANDLri %R1_L<kill>, 2047, pred:8, pred:%noreg
   Def Groups: R1_L=g219->g210(via R0_D)
   Antidep reg: R1_L (real dependency)
   Use Groups: R1_L=g210->g229(last-use)
  Anti:   %R0_L<def> = LSRLrr %R2_L, %R0_S, pred:8, pred:%noreg, %R0_L<imp-use,kill>
   Def Groups: R0_L=g224->g225(via R0_S)->g210(via R0_D)->g226(via R1_S)->g226(via R0_D)
   Antidep reg: R0_L Use Groups: R2_L=g192 R0_S=g226->g230(last-use) R0_L=g226->g231(last-use) R1_S->g232(last-use)
  Anti:   %R1_L<def> = LSRLrr %R1_L<kill>, %R0_S, pred:8, pred:%noreg
   Def Groups: R1_L=g229->g226(via R0_D)
   Antidep reg: R1_L Use Groups: R1_L=g226->g233(last-use) R0_S=g230
  Anti:   %R0_L<def> = TRLi9 16, pred:8, pred:%noreg
   Def Groups: R0_L=g231->g230(via R0_S)->g226(via R0_D)->g232(via R1_S)->g232(via R0_D)
   Antidep reg: R0_L
   Rename Candidates for Group g232:
    R0_D: elcInt64Regs :: R0_D R1_D R2_D R3_D R4_D R5_D R8_D R9_D R10_D R11_D R12_D R13_D R14_D R15_D R16_D R17_D R18_D R19_D R20_D R21_D R22_D R23_D R24_D R25_D
    R0_L: elcIntRegs :: R0_L R1_L R2_L R3_L R4_L R5_L R8_L R9_L R10_L R11_L R12_L R13_L R14_L R15_L R16_L R17_L R18_L R19_L R20_L R21_L R22_L R23_L R24_L R25_L
    R0_S: elcShrtRegs elcShrtRegs :: R0_S R1_S R2_S R3_S R4_S R5_S R8_S R9_S R10_S R11_S R12_S R13_S R14_S R15_S R16_S R17_S R18_S R19_S R20_S R21_S R22_S R23_S R24_S R25_S
   Find Registers: [R12_D: R12_D R12_L R12_S]
   Breaking anti-dependence edge on R0_L: R0_D->R12_D(1 refs) R0_L->R12_L(2 refs) R0_S->R12_S(2 refs)
   Use Groups:
  ...

  %R12_L<def> = TRLi9 16, pred:8, pred:%noreg
  %R1_L<def> = LSRLrr %R1_L<kill>, %R12_S, pred:8, pred:%noreg
  %R0_L<def> = LSRLrr %R2_L<kill>, %R12_S, pred:8, pred:%noreg, %R12_L<imp-use>
  %R1_L<def> = ANDLri %R1_L<kill>, 2047, pred:8, pred:%noreg
  %R0_L<def> = ANDLri %R0_L<kill>, 2047, pred:8, pred:%noreg
  %R4_D<def> = ASRDrr %R12_D<kill>, %R6_S

With this change, we now produce:

  Anti:   %R4_D<def> = ASRDrr %R0_D<kill>, %R6_S
   Def Groups: R4_D=g213->g215(via R4_S)->g214(via R4_L)->g216(via R5_S)->g216(via R4_L)->g217(via R5_L)
   Use Groups: R0_D=g0->g218(last-use) R1_L->g219(last-use) R6_S=g204->g220(last-use)
  Anti:   %R0_L<def> = ANDLri %R0_L<kill>, 2047, pred:8, pred:%noreg
   Def Groups: R0_L=g208->g209(via R0_S)->g218(via R0_D)->g210(via R1_S)->g210(via R0_D)
   Antidep reg: R0_L (real dependency)
   Use Groups: R0_L=g210
  Anti:   %R1_L<def> = ANDLri %R1_L<kill>, 2047, pred:8, pred:%noreg
   Def Groups: R1_L=g219->g210(via R0_D)
   Antidep reg: R1_L (real dependency)
   Use Groups: R1_L=g210
  Anti:   %R0_L<def> = LSRLrr %R2_L, %R0_S, pred:8, pred:%noreg, %R0_L<imp-use,kill>
   Def Groups: R0_L=g210->g210(via R0_D)->g210(via R0_D)
   Antidep reg: R0_L Use Groups: R2_L=g192 R0_S=g210 R0_L=g210
  Anti:   %R1_L<def> = LSRLrr %R1_L<kill>, %R0_S, pred:8, pred:%noreg
   Def Groups: R1_L=g210->g210(via R0_D)
   Antidep reg: R1_L Use Groups: R1_L=g210 R0_S=g210
  Anti:   %R0_L<def> = TRLi9 16, pred:8, pred:%noreg
   Def Groups: R0_L=g210->g210(via R0_D)->g210(via R0_D)
   Antidep reg: R0_L
   Rename Candidates for Group g210:
    R0_D: elcInt64Regs :: R0_D R1_D R2_D R3_D R4_D R5_D R8_D R9_D R10_D R11_D R12_D R13_D R14_D R15_D R16_D R17_D R18_D R19_D R20_D R21_D R22_D R23_D R24_D R25_D
    R0_L: elcIntRegs elcIntAIRegs elcIntRegs elcIntRegs elcIntRegs elcIntRegs :: R0_L R1_L R2_L R3_L R4_L R5_L R8_L R9_L R10_L R11_L R12_L R13_L R14_L R15_L R16_L R17_L R18_L R19_L R20_L R21_L R22_L R23_L R24_L R25_L
    R1_L: elcIntRegs elcIntRegs elcIntRegs elcIntRegs elcIntRegs :: R0_L R1_L R2_L R3_L R4_L R5_L R8_L R9_L R10_L R11_L R12_L R13_L R14_L R15_L R16_L R17_L R18_L R19_L R20_L R21_L R22_L R23_L R24_L R25_L
    R0_S: elcShrtRegs elcShrtRegs :: R0_S R1_S R2_S R3_S R4_S R5_S R8_S R9_S R10_S R11_S R12_S R13_S R14_S R15_S R16_S R17_S R18_S R19_S R20_S R21_S R22_S R23_S R24_S R25_S
   Find Registers: [R12_D: R12_D R12_L R13_L R12_S]
   Breaking anti-dependence edge on R0_L: R0_D->R12_D(1 refs) R0_L->R12_L(7 refs) R1_L->R13_L(5 refs) R0_S->R12_S(2 refs)
   Use Groups:
  ...

  %R12_L<def> = TRLi9 16, pred:8, pred:%noreg
  %R13_L<def> = LSRLrr %R13_L<kill>, %R12_S, pred:8, pred:%noreg
  %R12_L<def> = LSRLrr %R2_L<kill>, %R12_S<kill>, pred:8, pred:%noreg, %R12_L<imp-use,kill>
  %R13_L<def> = ANDLri %R13_L<kill>, 2047, pred:8, pred:%noreg
  %R12_L<def> = ANDLri %R12_L<kill>, 2047, pred:8, pred:%noreg
  %R4_D<def> = ASRDrr %R12_D, %R6_S, %R12_L<imp-def>, %R12_S<imp-def>, %R13_S<imp-def>

As this example demonstrates, the result is still somewhat unfortunate: there
is actually no need to rename the super register in this case (it is fully
covered by later subregister definitions), but we don't seem to track enough
information here to exploit that either.

Thanks to Daniil Troshkov for reporting the issue. The debug outputs in this
commit message are from Daniil.

llvm-svn: 227311
2015-01-28 14:44:14 +00:00
Chandler Carruth b81dfa6378 [LPM] Stop using the string based preservation API. It is an
abomination.

For starters, this API is incredibly slow. In order to look up the name
of a pass, it must take a memory fence to acquire a pointer to the
managed static pass registry, and then potentially acquire locks while
it consults this registry for information about what passes exist by
that name. This stops the world of LLVMs in your process no matter
how little they cared about the result.

To make this more joyful, you'll note that we are preserving many passes
which *do not exist* any more, or are not even analyses which one might
wish to have preserved. This means we do all the work only to say
"nope" with no error to the user.

String-based APIs are a *bad idea*. String-based APIs that cannot
produce any meaningful error are an even worse idea. =/

I have a patch that simply removes this API completely, but I'm hesitant
to commit it as I don't really want to perniciously break out-of-tree
users of the old pass manager. I'd rather they just have to migrate to
the new one at some point. If others disagree and would like me to kill
it with fire, just say the word. =]

llvm-svn: 227294
2015-01-28 04:57:56 +00:00
David Blaikie e245228903 Add description to assert
llvm-svn: 227291
2015-01-28 02:43:15 +00:00
David Blaikie fa1a3c7cf5 PR22356: DebugInfo: Handle the size of a member where the type of that member is a typedef (or other sugar) of a declaration.
llvm-svn: 227290
2015-01-28 02:34:53 +00:00
Quentin Colombet 308b171318 Revert r227242 - Merge vector stores into wider vector stores (PR21711).
This commit creates an infinite loop in DAG combine in the LLVM test-suite
for aarch64 with mcpu=cyclone (just having neon may be enough to expose this).

llvm-svn: 227272
2015-01-27 23:58:01 +00:00
Sanjay Patel b1ca4e48d4 remove function names from comments; NFC
llvm-svn: 227256
2015-01-27 22:26:56 +00:00
Sanjay Patel 6b280777b7 fix typos; NFC
llvm-svn: 227253
2015-01-27 22:16:52 +00:00
Sanjay Patel bcf62f2fa2 Merge vector stores into wider vector stores (PR21711)
This patch resolves part of PR21711 ( http://llvm.org/bugs/show_bug.cgi?id=21711 ).

The 'f3' test case in that report presents a situation where we have two 128-bit
stores extracted from a 256-bit source vector. 

Instead of producing this:

vmovaps %xmm0, (%rdi)
vextractf128    $1, %ymm0, 16(%rdi)

This patch merges the 128-bit stores into a single 256-bit store:

vmovups %ymm0, (%rdi)
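
As a rough C-level illustration of the pattern (names are made up and this is
not the exact 'f3' test from the PR; requires AVX):

  #include <immintrin.h>

  // The kind of pattern the combine targets: two 128-bit stores extracted
  // from one 256-bit source, mergeable into a single 256-bit store.
  void store_halves(float *p, __m256 v) {
    _mm_storeu_ps(p,     _mm256_castps256_ps128(v));    // low 128 bits
    _mm_storeu_ps(p + 4, _mm256_extractf128_ps(v, 1));  // high 128 bits
  }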

Differential Revision: http://reviews.llvm.org/D7208

llvm-svn: 227242
2015-01-27 20:50:27 +00:00
Manuel Jacob 8220addad7 Add a FIXME in SelectionDAGBuilder before an assert that is valid only on X86.
When lowering memcpy, memset or memmove, this assert checks whether the pointer
operands are in an address space < 256 which means "user defined address space"
on X86.  However, this notion of "user defined address space" does not exist
for other targets.

llvm-svn: 227191
2015-01-27 13:14:35 +00:00
Eric Christopher 337262068f Replace some uses of getSubtargetImpl with the cached version
off of the MachineFunction or with the version that takes a
Function reference as an argument.

llvm-svn: 227185
2015-01-27 08:48:42 +00:00
Eric Christopher 7592b0c5e6 Have the PBQP register allocator use the subtarget on the MachineFunction.
(and remove an extraneous private).

llvm-svn: 227181
2015-01-27 08:27:06 +00:00
Eric Christopher ad6bedb43e Fix build failure with pointer vs reference.
NB: Saving files after editing helps.
llvm-svn: 227178
2015-01-27 08:00:42 +00:00
Eric Christopher 2c63549386 Update a few calls to getSubtarget<> to either be getSubtargetImpl
when we didn't need the cast to the base class or the cached version
off of the subtarget.

llvm-svn: 227176
2015-01-27 07:54:39 +00:00
Eric Christopher 3d4276f053 The subtarget is cached on the MachineFunction. Access it directly.
llvm-svn: 227173
2015-01-27 07:31:29 +00:00
Eric Christopher 349d5886e5 MachineRegisterInfo can access TII off of the MachineFunction's
subtarget and so doesn't need the TargetMachine or to access via
getSubtargetImpl. Update all callers.

llvm-svn: 227160
2015-01-27 01:15:16 +00:00
Eric Christopher 4e048eeb2a Migrate AtomicExpandPass and DwarfEHPrepare to using a Function-ized getSubtargetImpl.
llvm-svn: 227159
2015-01-27 01:04:42 +00:00
Eric Christopher fccff37b53 Migrate CodeGenPrepare to use the Function based getSubtarget
code.

llvm-svn: 227157
2015-01-27 01:01:38 +00:00
Eric Christopher 2b214e7ad3 Grab the TargetLowering info from the DAG rather than querying for
a subtarget.

llvm-svn: 227156
2015-01-27 01:01:36 +00:00
Eric Christopher b11a1b7b2c Cache the lookup of TargetLowering in the atomic expand pass.
llvm-svn: 227121
2015-01-26 19:45:40 +00:00
Ahmed Bougacha 9a9e1a59ce [SelectionDAG] Fix assert message copypasta. NFC.
llvm-svn: 227119
2015-01-26 19:31:42 +00:00
Eric Christopher 8b7706517c Move DataLayout back to the TargetMachine from TargetSubtargetInfo
derived classes.

Since global data alignment, layout, and mangling are often based on the
DataLayout, move it to the TargetMachine. This ensures that global
data is going to be laid out and mangled consistently if the subtarget
changes on a per function basis. Prior to this all targets(*) have
had subtarget dependent code moved out and onto the TargetMachine.

*One target hasn't been migrated as part of this change: R600. The
R600 port has, as a subtarget feature, the size of pointers and
this affects global data layout. I've currently hacked in a FIXME
to enable progress, but the port needs to be updated to either pass
the 64-bitness to the TargetMachine, or fix the DataLayout to
avoid subtarget dependent features.

llvm-svn: 227113
2015-01-26 19:03:15 +00:00
Philip Reames 56a03938f7 Revert GCStrategy ownership changes
This change reverts the interesting parts of 226311 (and 227046).  This change introduced two problems, and I've been convinced that an alternate approach is preferable anyway.

The bugs were:
- Registry appears to require all users be within the same linkage unit.  After this change, asking for "statepoint-example" in Transform/ would sometimes get you nullptr, whereas asking the same question in CodeGen would return the right GCStrategy.  The correct long term fix is to get rid of the utter hack which is Registry, but I don't have time for that right now.  227046 appears to have been an attempt to fix this, but I don't believe it does so completely.
- GCMetadataPrinter::finishAssembly was being called more than once per GCStrategy.  Each Strategy was being added to the GCModuleInfo multiple times.

Once I get time again, I'm going to split GCModuleInfo into the gc.root specific part and a GCStrategy owning Analysis pass.  I'm probably also going to kill off the Registry.  Once that's done, I'll move the new GCStrategyAnalysis and all built in GCStrategies into Analysis.  (As originally suggested by Chandler.)  This will accomplish my original goal of being able to access GCStrategy from Transform/ without adding all of the builtin GCs to IR/.

llvm-svn: 227109
2015-01-26 18:26:35 +00:00
Adrian Prantl 40cb819c6f Debug info: Fix PR22296 by omitting the DW_AT_location if we lost the
physical register that is described in a DBG_VALUE.

In the testcase the DBG_VALUE describing "p5" becomes unavailable
because the register holding its address is clobbered, and we (currently)
aren't smart enough to realize that the value is rematerialized immediately
after the DBG_VALUE and/or actually lives in a stack slot.

llvm-svn: 227056
2015-01-25 19:04:08 +00:00
Elena Demikhovsky a3232f764e Implemented cost model for masked load/store operations.
llvm-svn: 227035
2015-01-25 08:44:46 +00:00
Saleem Abdulrasool fc07c72674 CodeGen: drive-by formatting clean ups
Minor tweaks to whitespace formatting that I noticed was off.  NFC.

llvm-svn: 227014
2015-01-24 20:19:45 +00:00
Andrea Di Biagio 8381475a75 [DAG] Fix wrong canonicalization performed on shuffle nodes.
This fixes a regression introduced by r226816.
When replacing a splat shuffle node with a constant build_vector,
make sure that the new build_vector has a valid number of elements.

Thanks to Patrik Hagglund for reporting this problem and providing a
small reproducer.

llvm-svn: 227002
2015-01-24 11:54:29 +00:00
Reid Kleckner 3d4638b391 Fix assertion when C++ EH filters are present in functions using SEH
Should fix PR22305.

llvm-svn: 226969
2015-01-23 23:51:25 +00:00
Adrian Prantl 70f2a736db Address more review comments for DIExpression::iterator.
- input_iterator
- define an operator->
- make constructors private where possible

llvm-svn: 226967
2015-01-23 23:40:47 +00:00
Adrian Prantl 4bd0a0c3ed Move the accessor functions from DIExpression::iterator into a wrapper
DIExpression::Operand, so we can write range-based for loops.

Thanks to David Blaikie for the idea.

llvm-svn: 226939
2015-01-23 21:24:41 +00:00
Reid Kleckner 5cc1569c54 Classify functions by EH personality type rather than using the triple
This mostly reverts commit r222062 and replaces it with a new enum. At
some point this enum will grow at least for other MSVC EH personalities.

Also beefs up the way we were sniffing the personality function.
Previously we would emit the Itanium LSDA despite using
__C_specific_handler.

Reviewers: majnemer

Differential Revision: http://reviews.llvm.org/D6987

llvm-svn: 226920
2015-01-23 18:49:01 +00:00
Adrian Prantl 577feba44b Debug Info / PR22309: Allow union types to be emitted as unsigned constants.
llvm-svn: 226919
2015-01-23 18:01:39 +00:00
Mehdi Amini 5059813c2d DAGCombine: always constant fold FMA when target disable FP exceptions
Summary: When trying to constant fold an FMA in the DAG, getNode()
fails to fold the FMA if an operand is not finite. In this case, this
patch allows the constant folding if !TLI->hasFloatingPointExceptions().

Reviewers: resistor

Reviewed By: resistor

Subscribers: hfinkel, llvm-commits

Differential Revision: http://reviews.llvm.org/D6912

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 226901
2015-01-23 07:07:20 +00:00
Jan Vesely 6269e3ca2f SelectionDAG: Add KnownBits and SignBits computation for EXTRACT_ELEMENT
v2: use getZExtValue
    add missing break
    codestyle

v3: add few more comments

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Matt Arsenault <Matthew.Arsenault@amd.com>
llvm-svn: 226880
2015-01-22 23:42:41 +00:00
Ramkumar Ramachandra 75a4f35b26 Intrinsics: introduce llvm_any_ty aka ValueType Any
Specifically, gc.result benefits from this greatly. Instead of:

gc.result.int.*
gc.result.float.*
gc.result.ptr.*
...

We now have a gc.result.* that can specialize to literally any type.

Differential Revision: http://reviews.llvm.org/D7020

llvm-svn: 226857
2015-01-22 20:14:38 +00:00
Sanjay Patel 37c41c1d2c merge consecutive stores of extracted vector elements (PR21711)
This is a 2nd try at the same optimization as http://reviews.llvm.org/D6698. 
That patch was checked in at r224611, but reverted at r225031 because it
caused a failure outside of the regression tests.

The cause of the crash was not recognizing consecutive stores that have mixed
source values (loads and vector element extracts), so this patch adds a check
to bail out if any store value is not coming from a vector element extract.

This patch also refactors the shared logic of the constant source and vector
extracted elements source cases into a helper function.
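
A rough C-level sketch of the store pattern this targets (illustrative only,
using clang/GCC vector extensions; not a test case from the patch):

  typedef float v4f32 __attribute__((vector_size(16)));

  // Consecutive scalar stores whose values are element extracts from the
  // same vector; these are candidates for merging into one wider store.
  void store_low_two(float *p, v4f32 v) {
    p[0] = v[0];  // extractelement 0 + store
    p[1] = v[1];  // extractelement 1 + store at the next address
  }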

Differential Revision: http://reviews.llvm.org/D6850
 

llvm-svn: 226845
2015-01-22 18:21:26 +00:00
David Blaikie e7d473461e Revert "PR21408: Workaround the appearance of duplicate variables due to problems when inlining two calls to the same function from the same call site."
The underlying bug has been fixed in r226736 so there's no need to
workaround this anymore.

This reverts commit r220923.

llvm-svn: 226842
2015-01-22 17:49:59 +00:00
Adrian Prantl 2585a98d38 Rename DIExpressionIterator to DIExpression::iterator.
Addresses review feedback from Duncan.

llvm-svn: 226835
2015-01-22 16:55:20 +00:00
Michael Kuperstein 25e34d11f3 [DAGCombine] Produce better code for constant splats
This solves PR22276.
Splats of constants would sometimes produce redundant shuffles, sometimes ridiculously so (see the PR for details). Fold these shuffles into BUILD_VECTORs early on instead.
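
For illustration only (clang vector extensions; not the test from the PR), a
constant splat expressed as an explicit shuffle:

  typedef int v4i32 __attribute__((vector_size(16)));

  // A splat of a constant written as a shuffle: the kind of pattern this
  // change folds into a constant BUILD_VECTOR instead of leaving a
  // redundant shuffle around.
  v4i32 splat42() {
    v4i32 c = {42, 0, 0, 0};
    return __builtin_shufflevector(c, c, 0, 0, 0, 0);  // broadcast lane 0
  }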

Differential Revision: http://reviews.llvm.org/D7093

Fixed recommit of r226811.

llvm-svn: 226816
2015-01-22 13:07:28 +00:00
Michael Kuperstein ff74032018 Revert r226811, MSVC accepts code sane compilers don't.
llvm-svn: 226814
2015-01-22 12:48:07 +00:00
Michael Kuperstein 84fad3e5c9 [DAGCombine] Produce better code for constant splats
This solves PR22276.
Splats of constants would sometimes produce redundant shuffles, sometimes ridiculously so (see the PR for details). Fold these shuffles into BUILD_VECTORs early on instead.

Differential Revision: http://reviews.llvm.org/D7093

llvm-svn: 226811
2015-01-22 12:37:23 +00:00
Elena Demikhovsky 150d9f3187 Fixed a bug in type legalizer for masked load/store intrinsics.
The problem occurs when, after vectorization, we have the type
<2 x i32>. This type is promoted to <2 x i64> and then requires
additional effort to expand loads and truncate stores.
I added EXPAND / TRUNCATE attributes to the masked load/store
SDNodes. The code now contains additional shuffles.
I've prepared changes in the cost estimation for masked memory
operations, it will be submitted separately.

llvm-svn: 226808
2015-01-22 12:07:59 +00:00
Elena Demikhovsky 94cfbbab33 Fixed a comment
llvm-svn: 226806
2015-01-22 10:01:36 +00:00
Elena Demikhovsky 9c26462a27 Fixed a bug in narrowing store operation.
Type MVT::i1 became legal in KNL, but a store operation can't be narrowed to this type,
since the size of the VT (1 bit) is not equal to its actual store size (8 bits).

Added a test provided by David (dag@cray.com)

llvm-svn: 226805
2015-01-22 09:39:08 +00:00
Reid Kleckner f690f50519 Win64 SEH: Emit the constant 1 for catch-all into xdata
llvm-svn: 226767
2015-01-22 02:27:44 +00:00
Adrian Prantl 531641a0c6 Make DwarfExpression use the new DIExpressionIterator. NFC.
llvm-svn: 226748
2015-01-22 00:00:59 +00:00
Tim Northover 3007ba0ab3 DAGCombine: fold (or (and X, M), (and X, N)) -> (and X, (or M, N))
It can help with argument juggling on some targets, and is generally a good
idea.
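
A scalar illustration of the identity behind the fold (illustrative only, not
from the patch itself):

  // (x & m) | (x & n) == x & (m | n): the two ANDs collapse into one.
  unsigned or_of_ands(unsigned x, unsigned m, unsigned n) {
    return (x & m) | (x & n);  // combinable to x & (m | n)
  }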

llvm-svn: 226740
2015-01-21 23:17:19 +00:00
Matthias Braun c1988f384c LiveIntervalAnalysis: Mark subregister defs as undef when we determined they are only reading a dead superregister value
This was not necessary before as this case can only be detected when the
liveness analysis is at subregister level.

llvm-svn: 226733
2015-01-21 22:55:13 +00:00
Matthias Braun 311730ac78 LiveIntervalAnalysis: Factor out code to update liveness on vreg def removal
This cleans up code and is more in line with the general philosophy of
modifying LiveIntervals through LiveIntervalAnalysis instead of changing
them directly.

This also fixes a case where SplitEditor::removeBackCopies() would miss
the subregister ranges.

llvm-svn: 226690
2015-01-21 19:02:30 +00:00
Matthias Braun cfb8ad29b5 LiveIntervalAnalysis: Factor out code to update liveness on physreg def removal
This cleans up code and is more in line with the general philosophy of
modifying LiveIntervals through LiveIntervalAnalysis instead of changing
them directly.

llvm-svn: 226687
2015-01-21 18:50:21 +00:00
Matthias Braun 1002baf7b9 LiveIntervalAnalysis: Remove unused pruneValue() variant.
llvm-svn: 226686
2015-01-21 18:45:57 +00:00
Tim Northover cf3d80fedb Revert "DAGCombine: fold (or (and X, M), (and X, N)) -> (and X, (or M, N))"
It hadn't gone through review yet, but was still on my local copy.

This reverts commit r226663

llvm-svn: 226665
2015-01-21 15:48:52 +00:00
Tim Northover 85cd2791c9 DAGCombine: fold (or (and X, M), (and X, N)) -> (and X, (or M, N))
llvm-svn: 226663
2015-01-21 15:43:28 +00:00
Daniel Jasper 6b77455f81 Prevent binary-tree deterioration in sparse switch statements.
This addresses part of llvm.org/PR22262. Specifically, it prevents
considering the densities of sub-ranges that have fewer than
TLI.getMinimumJumpTableEntries() elements. Those densities won't help
jump tables.

This is not a complete solution but works around the most pressing
issue.
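
A rough sketch of the kind of switch this is about (made-up example, not from
the PR): a dense cluster plus a few far-away outliers, where only the dense
cluster is worth a jump table:

  int classify(int x) {
    switch (x) {
    case 0: case 1: case 2: case 3: case 4: case 5:
      return 1;        // dense sub-range: good jump table candidate
    case 1000:
      return 2;        // sparse outliers: too few cases for a table,
    case 100000:
      return 3;        // and they should not skew the density metric
    default:
      return 0;
    }
  }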

Review: http://reviews.llvm.org/D7070
llvm-svn: 226600
2015-01-20 19:43:33 +00:00
Daniel Jasper d106b734cf Factor out a splitSwitchCase() function so that it can be reused.
This is in preparation for a fix to llvm.org/PR22262. One of the ideas
here is to first find a good jump table range and then split
before and after it. Thereby, we don't need to use the
split-based-on-density heuristic at all, which can make the "binary
tree" deteriorate in various cases.

Also some minor cleanups.

No functional changes.

llvm-svn: 226551
2015-01-20 08:57:44 +00:00
Chandler Carruth 10f28f26fd [PM] Replace the Pass argument in MergeBasicBlockIntoOnlyPred with
a DominatorTree argument as that is the analysis that it wants to
update.

This removes the last non-loop utility function in Utils/ which accepts
a raw Pass argument.

llvm-svn: 226537
2015-01-20 01:37:09 +00:00
Adrian Prantl 5883af3faa Remove support for DIVariable's FlagIndirectVariable and expect
frontends to use a DIExpression with a DW_OP_deref instead.

This is not only a much more natural place for this information; there
is also a technical reason: The FlagIndirectVariable is used to mark a
variable that is turned into a reference by virtue of the calling
convention; this happens for example to aggregate return values.
The inliner, for example, may actually need to undo this indirection to
correctly represent the value in its new context. This is impossible to
implement because the DIVariable can't be safely modified. We can however
safely construct a new DIExpression on the fly.

llvm-svn: 226476
2015-01-19 17:57:29 +00:00
Rafael Espindola 12ca34f53f Bring r226038 back.
No change in this commit, but clang was changed to also produce trivial comdats when
needed.

Original message:

Don't create new comdats in CodeGen.

This patch stops the implicit creation of comdats during codegen.

Clang now sets the comdat explicitly when it is required. With this patch clang and gcc
now produce the same result in pr19848.

llvm-svn: 226467
2015-01-19 15:16:06 +00:00
Chandler Carruth 37df2cfbf8 [PM] Remove the Pass argument from all of the critical edge splitting
APIs and replace it and numerous booleans with an option struct.

The critical edge splitting API has a really large surface of flags and
so it seems worth burning a small option struct / builder. This struct
can be constructed with the various preserved analyses and then flags
can be flipped in a builder style.
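
A minimal sketch of what such an option struct / builder can look like (member
and method names here are illustrative, not necessarily the exact
CriticalEdgeSplittingOptions API):

  class DominatorTree;
  class LoopInfo;

  struct EdgeSplitOptions {
    DominatorTree *DT = nullptr;
    LoopInfo *LI = nullptr;
    bool MergeIdenticalEdges = false;

    EdgeSplitOptions(DominatorTree *DT = nullptr, LoopInfo *LI = nullptr)
        : DT(DT), LI(LI) {}
    EdgeSplitOptions &setMergeIdenticalEdges() {
      MergeIdenticalEdges = true;
      return *this;
    }
  };
  // e.g. SplitCriticalEdge(TI, SuccIdx, EdgeSplitOptions(DT, LI).setMergeIdenticalEdges());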

The various users are now responsible for directly passing along their
analysis information. This should be enough for the critical edge
splitting to work cleanly with the new pass manager as well.

This API is still pretty crufty and could be cleaned up a lot, but in this
change I've focused on just threading an option struct rather than a pass
through the API.

llvm-svn: 226456
2015-01-19 12:09:11 +00:00
Michael Kuperstein 54c61edee7 [MIScheduler] Slightly better handling of constrainLocalCopy when both source and dest are local
This fixes PR21792.

Differential Revision: http://reviews.llvm.org/D6823

llvm-svn: 226433
2015-01-19 07:30:47 +00:00
David Blaikie 9459832ebd std::unique_ptrify the MCStreamer argument to createAsmPrinter
llvm-svn: 226414
2015-01-18 20:29:04 +00:00
Mehdi Amini 37f316afaf Improve DAG combine pass on certain IR vector patterns
Loading 2 2x32-bit float vectors into the bottom half of a 256-bit vector
produced suboptimal code in AVX2 mode with certain IR combinations.

In particular, the IR optimizer folded 2f32 + 2f32 -> 4f32, 4f32 + 4f32
(undef) -> 8f32 into a 2f32 + 2f32 -> 8f32, which seems more canonical,
but then mysteriously generated rather bad code; the movq/movhpd combination
didn't match.
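
For reference, the shape of the folded pattern (illustrative only, clang
vector extensions, not the committed test case): two 2xfloat values widened
into the low half of an 8xfloat whose upper half is undefined:

  typedef float v2f32 __attribute__((vector_size(8)));
  typedef float v8f32 __attribute__((vector_size(32)));

  // a and b end up in lanes 0-3; lanes 4-7 are left undefined (-1 indices).
  v8f32 concat_low(v2f32 a, v2f32 b) {
    return __builtin_shufflevector(a, b, 0, 1, 2, 3, -1, -1, -1, -1);
  }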

The problem lay in the BUILD_VECTOR optimization path. The 2f32 inputs
would get promoted to 4f32 by the type legalizer, eventually resulting
in a BUILD_VECTOR on two 4f32 into an 8f32. The BUILD_VECTOR then, recognizing
these were both half the output size, concatted them and then produced
a shuffle. However, the resulting concat + shuffle was more complex than
it should be; in the case where the upper half of the output is undef, we
probably want to generate shuffle + concat instead.

This enhancement causes the vector_shuffle combine step to recognize this
suboptimal pattern and correct it. I included it there instead of in BUILD_VECTOR
in case the same suboptimal pattern occurs for other reasons.

This results in the optimizer correctly producing the optimal movq + movhpd
sequence for all three variations on this IR, even with AVX2.

I've included a test case.

Radar link: rdar://problem/19287012
Fix for PR 21943.

From: Fiona Glaser <fglaser@apple.com>
llvm-svn: 226360
2015-01-17 01:35:56 +00:00
Matthias Braun 7618b2b23d RegisterCoalescer: Cleanup and improved comment for a subtle detail.
llvm-svn: 226353
2015-01-17 00:33:13 +00:00
Matthias Braun 0eb940aed0 RegisterCoalescer: Cleanup by factoring out a common expression
llvm-svn: 226352
2015-01-17 00:33:11 +00:00
Matthias Braun e2fa081615 RegisterCoalescer: Cleanup comment style
- Consistently put comments above the function declaration, not the
  definition. To achieve this some duplicate comments got merged and
  some comment parts describing implementation details got moved into their
  functions.
- Consistently use doxygen comments above functions.
- Do not use doxygen comments inside functions.

llvm-svn: 226351
2015-01-17 00:33:09 +00:00
Matthias Braun fc6ef3a270 RegisterCoalescer: Drive-by typo + whitespace fix
llvm-svn: 226350
2015-01-17 00:33:06 +00:00
Philip Reames 287987ca13 Update a comment
Be a bit more explicit about the fact that addrspace(1) is not reserved.

llvm-svn: 226344
2015-01-16 23:21:07 +00:00
Philip Reames 36319538d0 clang-format all the GC related files (NFC)
Nothing interesting here...

llvm-svn: 226342
2015-01-16 23:16:12 +00:00
Philip Reames 2b45395876 Move ownership of GCStrategy objects to LLVMContext
Note: This change ended up being slightly more controversial than expected.  Chandler has tentatively okayed this for the moment, but I may be revisiting this in the near future after we settle some high level questions.

Rather than have the GCStrategy object owned by the GCModuleInfo - which is an immutable analysis pass used mainly by gc.root - have it be owned by the LLVMContext. This simplifies the ownership logic (i.e. can you have two instances of the same strategy at once?), but more importantly, allows us to access the GCStrategy in the middle end optimizer. To this end, I add an accessor through Function which becomes the canonical way to get at a GCStrategy instance.

In the near future, this will allow me to move some of the checks from http://reviews.llvm.org/D6808 into the Verifier itself, and to introduce optimization legality predicates for some of the recent additions to InstCombine. (These will follow as separate changes.)

Differential Revision: http://reviews.llvm.org/D6811

llvm-svn: 226311
2015-01-16 20:07:33 +00:00
Philip Reames 7de640a876 Remove gc.root's findCustomSafePoints mechanism
Searching all of the existing gc.root implementations I'm aware of (all three of them), there was exactly one use of this mechanism, and that was to implement a performance improvement that should have been applied to the default lowering.

Having this function requires a dependency on a CodeGen class (MachineFunction) in a class which is otherwise completely independent of CodeGen. I could solve this differently, but given that I see absolutely no value in preserving this mechanism, I'm going to just get rid of it.

Note: This is the first time I'm intentionally breaking previously supported gc.root functionality. Given that 3.6 has branched, I believe this is a good time to do this.

Differential Revision: http://reviews.llvm.org/D7004

llvm-svn: 226305
2015-01-16 19:33:28 +00:00
Timur Iskhodzhanov 60b721363c Revert r226242 - Revert Revert Don't create new comdats in CodeGen
This breaks AddressSanitizer (ninja check-asan) on Windows

llvm-svn: 226251
2015-01-16 08:38:45 +00:00
Rafael Espindola 67a79e72f5 Revert "Revert Don't create new comdats in CodeGen"
This reverts commit r226173, adding r226038 back.

No change in this commit, but clang was changed to also produce trivial comdats for
constructors, destructors and vtables when needed.

Original message:

Don't create new comdats in CodeGen.

This patch stops the implicit creation of comdats during codegen.

Clang now sets the comdat explicitly when it is required. With this patch clang and gcc
now produce the same result in pr19848.

llvm-svn: 226242
2015-01-16 02:22:55 +00:00
Hal Finkel 5ef58eb86d Revert "r226086 - Revert "r226071 - [RegisterCoalescer] Remove copies to reserved registers""
Reapply r226071 with fixes. Two fixes:

 1. We need to manually remove the old and create the new 'dead defs'
    associated with physical register definitions when we move the definition of
    the physical register from the copy point to the point of the original vreg def.

    This problem was picked up by the machine instruction verifier, and could trigger a
    verification failure on test/CodeGen/X86/2009-02-12-DebugInfoVLA.ll, so I've
    turned on the verifier in the tests.

 2. When moving the def point of the phys reg up, we need to make sure that it
    is neither defined nor read in between the two instructions. We don't, however,
    extend the live ranges of phys reg defs to cover uses, so just checking for
    live-range overlap between the pair interval and the phys reg aliases won't
    pick up reads. As a result, we manually iterate over the range and check for
    reads.

    A test soon to be committed to the PowerPC backend will test this change.

Original commit message:

[RegisterCoalescer] Remove copies to reserved registers

This allows the RegisterCoalescer to join "non-flipped" range pairs with a
physical destination register -- which allows the RegisterCoalescer to remove
copies like this:

<vreg> = something (maybe a load, for example)
... (things that don't use PHYSREG)
PHYSREG = COPY <vreg>

(with all of the restrictions normally applied by the RegisterCoalescer: having
compatible register classes, etc. )

Previously, the RegisterCoalescer handled only the opposite case (copying
*from* a physical register). I don't handle the problem fully here, but try to
get the common case where there is only one use of <vreg> (the COPY).

An upcoming commit to the PowerPC backend will make this pattern much more
common on PPC64/ELF systems.

llvm-svn: 226200
2015-01-15 20:32:09 +00:00
Philip Reames 66c9fb0d52 Style cleanup of old gc.root lowering code
Use static functions for helpers rather than static member functions.  a) this changes the linking (minor at best), and b) this makes it obvious no object state is involved.

llvm-svn: 226198
2015-01-15 19:49:25 +00:00
Philip Reames b87144160e clang-format GCStrategy.cpp & GCRootLowering.cpp (NFC)
llvm-svn: 226196
2015-01-15 19:39:17 +00:00
Philip Reames f27f373895 Split GCStrategy.cpp into two files (NFC)
This is preparation for an update to http://reviews.llvm.org/D6811.  GCStrategy.cpp will hopefully be moving into IR/, whereas the lowering logic needs to stay in CodeGen/.

llvm-svn: 226195
2015-01-15 19:29:42 +00:00
Timur Iskhodzhanov f5adf13fac Revert Don't create new comdats in CodeGen
It breaks AddressSanitizer on Windows.

llvm-svn: 226173
2015-01-15 16:14:34 +00:00
Mehdi Amini fa546b29a0 Fix SelectionDAG -view-*-dags filtering
llvm-svn: 226163
2015-01-15 12:03:32 +00:00
Alexander Kornienko 8c0809c7f8 Replace size method call of containers to empty method where appropriate
This patch was generated by a clang tidy checker that is being open sourced.
The documentation of that checker is the following:

/// The emptiness of a container should be checked using the empty method
/// instead of the size method. It is not guaranteed that size is a
/// constant-time function, and it is generally more efficient and also shows
/// clearer intent to use empty. Furthermore some containers may implement the
/// empty method but not implement the size method. Using empty whenever
/// possible makes it easier to switch to another container in the future.
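
The transformation, sketched on a std::vector (illustrative, not one of the
changed call sites):

  #include <vector>

  bool has_work(const std::vector<int> &queue) {
    // before: return queue.size() > 0;   (size() is not guaranteed to be O(1))
    return !queue.empty();                // after: cheaper and clearer intent
  }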

Patch by Gábor Horváth!

llvm-svn: 226161
2015-01-15 11:41:30 +00:00
Chandler Carruth b98f63dbdb [PM] Separate the TargetLibraryInfo object from the immutable pass.
The pass is really just a means of accessing a cached instance of the
TargetLibraryInfo object, and this way we can re-use that object for the
new pass manager as its result.

Lots of delta, but nothing interesting happening here. This is the
common pattern that is developing to allow analyses to live in both the
old and new pass manager -- a wrapper pass in the old pass manager
emulates the separation intrinsic to the new pass manager between the
result and pass for analyses.

llvm-svn: 226157
2015-01-15 10:41:28 +00:00
Hal Finkel dd669615dd Revert "r226071 - [RegisterCoalescer] Remove copies to reserved registers"
Reverting this while I investigate some bad behavior this is causing. As a
possibly-related issue, adding -verify-machineinstrs to one of the test cases
now fails because of this change:

  llc test/CodeGen/X86/2009-02-12-DebugInfoVLA.ll -march=x86-64 -o - -verify-machineinstrs

*** Bad machine code: No instruction at def index ***
- function:    foo
- basic block: BB#0 return (0x10007e21f10) [0B;736B)
- liverange:   [128r,128d:9)[160r,160d:8)[176r,176d:7)[336r,336d:6)[464r,464d:5)[480r,480d:4)[624r,624d:3)[752r,752d:2)[768r,768d:1)[784r,784d:0)  0@784r 1@768r 2@752r 3@624r 4@480r 5@464r 6@336r 7@176r 8@160r 9@128r
- register:    %DS
Valno #3 is defined at 624r

*** Bad machine code: Live segment doesn't end at a valid instruction ***
- function:    foo
- basic block: BB#0 return (0x10007e21f10) [0B;736B)
- liverange:   [128r,128d:9)[160r,160d:8)[176r,176d:7)[336r,336d:6)[464r,464d:5)[480r,480d:4)[624r,624d:3)[752r,752d:2)[768r,768d:1)[784r,784d:0)  0@784r 1@768r 2@752r 3@624r 4@480r 5@464r 6@336r 7@176r 8@160r 9@128r
- register:    %DS
[624r,624d:3)
LLVM ERROR: Found 2 machine code errors.

where 624r corresponds exactly to the interval combining change:

624B    %RSP<def> = COPY %vreg16; GR64:%vreg16
        Considering merging %vreg16 with %RSP
                RHS = %vreg16 [608r,624r:0)  0@608r
                updated: 608B   %RSP<def> = MOV64rm <fi#3>, 1, %noreg, 0, %noreg; mem:LD8[%saved_stack.1]
        Success: %vreg16 -> %RSP
        Result = %RSP

llvm-svn: 226086
2015-01-15 03:08:59 +00:00
Chandler Carruth 62d4215baa [PM] Move TargetLibraryInfo into the Analysis library.
While the term "Target" is in the name, it doesn't really have to do
with the LLVM Target library -- this isn't an abstraction which LLVM
targets generally need to implement or extend. It has much more to do
with modeling the various runtime libraries on different OSes and with
different runtime environments. The "target" in this sense is the more
general sense of a target of cross compilation.

This is in preparation for porting this analysis to the new pass
manager.

No functionality changed, and updates inbound for Clang and Polly.

llvm-svn: 226078
2015-01-15 02:16:27 +00:00
NAKAMURA Takumi 95b3880dd0 Win64Exception.cpp: Try to fix crash for x64 EH. "Per" might be null there.
llvm-svn: 226077
2015-01-15 02:15:21 +00:00
Hal Finkel 8299646236 [RegisterCoalescer] Remove copies to reserved registers
This allows the RegisterCoalescer to join "non-flipped" range pairs with a
physical destination register -- which allows the RegisterCoalescer to remove
copies like this:

<vreg> = something (maybe a load, for example)
... (things that don't use PHYSREG)
PHYSREG = COPY <vreg>

(with all of the restrictions normally applied by the RegisterCoalescer: having
compatible register classes, etc. )

Previously, the RegisterCoalescer handled only the opposite case (copying
*from* a physical register). I don't handle the problem fully here, but try to
get the common case where there is only one use of <vreg> (the COPY).

An upcoming commit to the PowerPC backend will make this pattern much more
common on PPC64/ELF systems.

llvm-svn: 226071
2015-01-15 01:25:28 +00:00
Ramkumar Ramachandra dba7329ebb [GC] CodeGenPrep transform: simplify offsetable relocate
The transform is somewhat involved, but the basic idea is simple: find
derived pointers that have been offset from the base pointer using gep
and replace the relocate of the derived pointer with a gep to the
relocated base pointer (with the same offset).

llvm-svn: 226060
2015-01-14 23:27:07 +00:00
Reid Kleckner e80a0a7572 Use MMI->getPersonality() instead of MMI->getPersonalities()[MMI->getPersonalityIndex()]
Also nuke the comment about supporting multiple personalities in a
single function, aka PR1414. That's just crazy.

llvm-svn: 226052
2015-01-14 22:47:54 +00:00
Matthias Braun 96a319588a MachineVerifier: Allow undef reads if a matching superreg is defined.
Summary:
Some pseudo instruction expansions break down a wide register use into
multiple uses of smaller sub registers. If the super register was
partially undefined the broken down sub registers may be completely
undefined now leading to MachineVerifier complaints. Unfortunately
liveness information to add the required dead flags is not easily
(cheaply) available when expanding pseudo instructions.

This commit changes the verifier to be quiet if there is an additional
implicit use of a super register. Pseudo instruction expanders can use
this to mark cases where partially defined values get potentially broken
into completely undefined ones.

Differential Revision: http://reviews.llvm.org/D6973

llvm-svn: 226047
2015-01-14 22:25:14 +00:00
Rafael Espindola fad1639a12 Don't create new comdats in CodeGen.
This patch stops the implicit creation of comdats during codegen.

Clang now sets the comdat explicitly when it is required. With this patch clang and gcc
now produce the same result in pr19848.

llvm-svn: 226038
2015-01-14 20:55:48 +00:00
Chandler Carruth e3288147f0 [MBP] Add flags to disable the BadCFGConflict check in MachineBlockPlacement.
Some benchmarks have shown that this could lead to a potential
performance benefit, so this adds some flags to try to help measure the
difference.

A possible explanation: in diamond-shaped CFGs (A followed by either
B or C, both followed by D), putting both B and C in between A and
D leads to the code being less dense than it could be. Either
B or C always has to be skipped, increasing the chance of cache misses, etc.
Moving either B or C to after D might be beneficial on average.
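
The shape in question, as a tiny C++ example (purely illustrative):

  // A: branch to either B or C; both fall through to D. Laid out as A,B,C,D,
  // one of B or C is always skipped over.
  int diamond(bool cond, int x) {
    int r;           // A
    if (cond)
      r = x + 1;     // B
    else
      r = x - 1;     // C
    return r * 2;    // D
  }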

In the long run, though, we should probably do a better job of analyzing the
basic block and branch probabilities to move the correct one of B or
C to after D. But even if we don't use this in the long run, it is
a good baseline for benchmarking.

Original patch authored by Daniel Jasper with test tweaks and a second
flag added by me.

Differential Revision: http://reviews.llvm.org/D6969

llvm-svn: 226034
2015-01-14 20:19:29 +00:00
Reid Kleckner 9b5eaf0d5a Emit the Itanium LSDA for unknown EH personalities on Win64
This fixes lots of generic CodeGen tests that use __gcc_personality_v0.
This suggests that using ExceptionHandling::MSVC was a mistake, and we
should instead classify each function by personality function. This
would, for example, allow us to LTO a binary containing uses of SEH and
Itanium EH.

llvm-svn: 226019
2015-01-14 18:50:10 +00:00
Reid Kleckner b57c1dc0f7 Remove dead code for llvm.eh.selector in the old EH model
llvm-svn: 226018
2015-01-14 18:49:39 +00:00
Chandler Carruth d9903888d9 [cleanup] Re-sort all the #include lines in LLVM using
utils/sort_includes.py.

I clearly haven't done this in a while, so more changed than usual. This
even uncovered a missing include from the InstrProf library that I've
added. No functionality changed here, just mechanical cleanup of the
include order.

llvm-svn: 225974
2015-01-14 11:23:27 +00:00
Mehdi Amini d8976b8ed3 SelectionDAG: add a -filter-view-dags option to llc
This option takes the name of the basic block you want to visualize
with -view-*-dags

Differential Revision: http://reviews.llvm.org/D6948

llvm-svn: 225953
2015-01-14 06:03:18 +00:00
Mehdi Amini 648eff1695 DAG Combiner: Fold SelectCC When Cond is UNDEF
In case folding a node ends up with a NaN as an operand for the select,
the folding of the condition of the selectcc node returns "UNDEF".

Differential Revision: http://reviews.llvm.org/D6889

llvm-svn: 225952
2015-01-14 05:45:24 +00:00
Mehdi Amini 7b068f6ba4 Add assertions for out of bound index in ComputeLinearIndex
llvm-svn: 225951
2015-01-14 05:38:48 +00:00
Mehdi Amini 8923cc5470 Fold a loop for array processing in ComputeLinearIndex
When processing an array, every Elt has the same layout, so it is
useless to recursively call ComputeLinearIndex on each element.
Just do it once and multiply by the number of elements.
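
The idea in isolation (a generic sketch, not the actual ComputeLinearIndex
signature):

  // For a homogeneous array every element contributes the same count, so
  // recurse once and scale, instead of recursing NumElts times.
  struct Ty {
    bool IsArray;
    unsigned NumElts;     // valid when IsArray
    const Ty *EltTy;      // valid when IsArray
    unsigned ScalarCount; // valid for non-array leaves
  };

  unsigned linearCount(const Ty &T) {
    if (T.IsArray)
      return T.NumElts * linearCount(*T.EltTy);  // one recursive call + multiply
    return T.ScalarCount;
  }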
    
Differential Revision: http://reviews.llvm.org/D6832

llvm-svn: 225949
2015-01-14 05:33:01 +00:00
JF Bastien eeea8970b4 Revert "Insert random noops to increase security against ROP attacks (llvm)"
This reverts commit:
http://reviews.llvm.org/D3392

llvm-svn: 225948
2015-01-14 05:24:33 +00:00
Matt Arsenault bd22342322 Implement new way of expanding extloads.
Now that the source and destination types can be specified,
allow doing an expansion that doesn't use an EXTLOAD of the
result type. Try to do a legal extload to an intermediate type
and extend that if possible.

This generalizes the special case custom lowering of extloads
R600 has been using to work around this problem.

This also happens to fix a bug that would incorrectly use more
aligned loads than should be used.

llvm-svn: 225925
2015-01-14 01:35:17 +00:00
JF Bastien dcdd5ad252 Insert random noops to increase security against ROP attacks (llvm)
A pass that adds random noops to X86 binaries to introduce diversity with the goal of increasing security against most return-oriented programming attacks.

Command line options:
  -noop-insertion // Enable noop insertion.
  -noop-insertion-percentage=X // X% of assembly instructions will have a noop prepended (default: 50%, requires -noop-insertion)
  -max-noops-per-instruction=X // Randomly generate X noops per instruction. ie. roll the dice X times with probability set above (default: 1). This doesn't guarantee X noop instructions.

In addition, the following 'quick switch' in clang enables basic diversity using default settings (currently: noop insertion and schedule randomization; it is intended to be extended in the future).
  -fdiversify

This is the llvm part of the patch.
clang part: D3393

http://reviews.llvm.org/D3392
Patch by Stephen Crane (@rinon)

llvm-svn: 225908
2015-01-14 01:07:26 +00:00
Hal Finkel 665026838b Adjust ScheduleDAGSDNodes::RegDefIter for patchpoints
PATCHPOINT is a strange pseudo-instruction. Depending on how it is used, and
whether or not the AnyReg calling convention is being used, it might or might
not define a value. However, its TableGen definition says that it defines one
value, and so when it doesn't, the code in ScheduleDAGSDNodes::RegDefIter
becomes confused and the code that uses the RegDefIter will try to get the
register class of the MVT::Other type associated with the PATCHPOINT's chain
result (under certain circumstances).

This will be covered by the PPC64 PatchPoint test cases once that support is
re-committed.

llvm-svn: 225907
2015-01-14 01:07:03 +00:00
Reid Kleckner 0a57f65514 CodeGen support for x86_64 SEH catch handlers in LLVM
This adds handling for ExceptionHandling::MSVC, used by the
x86_64-pc-windows-msvc triple. It assumes that filter functions have
already been outlined in either the frontend or the backend. Filter
functions are used in place of the landingpad catch clause type info
operands. In catch clause order, the first filter to return true will
catch the exception.

The C specific handler table expects the landing pad to be split into
one block per handler, but LLVM IR uses a single landing pad for all
possible unwind actions. This patch papers over the mismatch by
synthesizing single instruction BBs for every catch clause to fill in
the EH selector that the landing pad block expects.

Missing functionality:
- Accessing data in the parent frame from outlined filters
- Cleanups (from __finally) are unsupported, as they will require
  outlining and parent frame access
- Filter clauses are unsupported, as there's no clear analogue in SEH

In other words, this is the minimal set of changes needed to write IR to
catch arbitrary exceptions and resume normal execution.

Reviewers: majnemer

Differential Revision: http://reviews.llvm.org/D6300

llvm-svn: 225904
2015-01-14 01:05:27 +00:00
Adrian Prantl 7813d9c979 Debug Info: Implement DwarfCompileUnit::addComplexAddress() using
DIEDwarfExpression (and get rid of a bunch of redundant code).

NFC

llvm-svn: 225900
2015-01-14 01:01:30 +00:00
Adrian Prantl ad768c3719 Debug Info: Emitting a register in DwarfExpression may fail. Report the
status in a bool and let the users deal with the error.

NFC.

llvm-svn: 225899
2015-01-14 01:01:28 +00:00
Adrian Prantl 658676c3ea Debug Info: Move DIEDwarfExpression into DwarfExpression.h because it
needs to be accessed from both DwarfCompileUnit.cpp and DwarfUnit.cpp.

NFC.

llvm-svn: 225898
2015-01-14 01:01:22 +00:00
Eric Christopher 6e30cd95cb Migrate ABIName to MCTargetOptions so that it can be shared between
the TargetMachine level and the MC level.

llvm-svn: 225891
2015-01-14 00:50:31 +00:00
Adrian Prantl 8efadbf868 Debug Info: Don't bother emitting DW_AT_frame_base if the function has
no frame register. "Tested" via an assertion triggered by DwarfExpression.

llvm-svn: 225858
2015-01-14 00:15:16 +00:00
Adrian Prantl 1411577ad9 Revert "Debug Info: Bail out of AddMachineRegPiece() if MachineReg is not a"
This reverts commit r225852, it was a bad idea.

MachineReg should always be a physical register. If it isn't this DebugLoc
shouldn't have been created in the first place.

llvm-svn: 225857
2015-01-14 00:15:12 +00:00
Adrian Prantl e8e0bac270 Debug Info: Bail out of AddMachineRegPiece() if MachineReg is not a
physical register. The call to getMinimalPhysRegClass() later on asserts
on this condition.

llvm-svn: 225852
2015-01-13 23:39:15 +00:00
Adrian Prantl 092d9489ed Debug Info: Move the complex expression handling (=the remainder) of
emitDebugLocValue() into DwarfExpression.

Ought to be NFC, but it actually uncovered a bug in the debug-loc-asan.ll
testcase. The testcase checks that the address of variable "y" is stored
at [RSP+16], which also lines up with the comment.
It also check(ed) that the *value* of "y" is stored in RDI before that,
but that is actually incorrect, since RDI is the very value that is
stored in [RSP+16]. Here's the assembler output:

	movb	2147450880(%rcx), %r8b
	#DEBUG_VALUE: bar:y <- RDI
	cmpb	$0, %r8b
	movq	%rax, 32(%rsp)          # 8-byte Spill
	movq	%rsi, 24(%rsp)          # 8-byte Spill
	movq	%rdi, 16(%rsp)          # 8-byte Spill
.Ltmp3:
	#DEBUG_VALUE: bar:y <- [RSP+16]

Fixed the comment to spell out the correct register and the check to
expect an address rather than a value.

Note that the range that is emitted for the RDI location was and is still
wrong, it claims to begin at the function prologue, but really it should
start where RDI is first assigned.

llvm-svn: 225851
2015-01-13 23:39:11 +00:00
Adrian Prantl 0a3bfdbd37 cleanup.
llvm-svn: 225848
2015-01-13 23:11:51 +00:00
Adrian Prantl 172ab66a11 Document, cleanup, and clang-format DwarfExpression.h
llvm-svn: 225847
2015-01-13 23:11:07 +00:00
Adrian Prantl 8995f5c92f Debug Info: Turn DIExpression::getFrameRegister() into an isFrameRegister()
function.

NFC.

llvm-svn: 225846
2015-01-13 23:10:43 +00:00
Matthias Braun f50ab43214 DAGCombiner: simplify by using condition variables; NFC
llvm-svn: 225836
2015-01-13 22:17:46 +00:00
Matt Arsenault bf0db918b2 R600: Implement getRecipEstimate
This requires a new hook to prevent expanding sqrt in terms
of rsqrt and reciprocal. v_rcp_f32, v_rsq_f32, and v_sqrt_f32 are
all the same rate, so this expansion would just double the number
of instructions and cycles.

llvm-svn: 225828
2015-01-13 20:53:23 +00:00
Hal Finkel c4ee2c5188 [StackMaps] Use CurrentFnSymForSize
When computing the call-site offset, use AP.CurrentFnSymForSize instead of
AP.CurrentFnSym. There should be no change for other targets, but this is
necessary for generating valid expressions for PPC64/ELF.

llvm-svn: 225807
2015-01-13 17:48:07 +00:00
Hal Finkel 0ad96c818c [StackMaps] Mark in CallLoweringInfo when lowering a patchpoint
While, generally speaking, the process of lowering arguments for a patchpoint
is the same as lowering a regular indirect call, on some targets it may not be
exactly the same. Targets may not, for example, want to add additional register
dependencies that apply only to making cross-DSO calls through linker stubs,
may not want to load additional registers out of function descriptors, and may
not want to add additional side-effect-causing instructions that cannot be
removed later with the call itself being generated.

The PowerPC target will use this in a future commit (for all of the reasons
stated above).

llvm-svn: 225806
2015-01-13 17:48:04 +00:00
Hal Finkel df87f9383b [StackMaps] Allow the target to pre-process the live-out mask
Some targets, PowerPC for example, have pseudo-registers (such as the one used to
represent the rounding mode) that don't have DWARF register numbers or a
register class. These are used only for internal dependency tracking, and
should not appear in the recorded live-outs. This adds a callback allowing the
target to pre-process the live-out mask in order to remove these kinds of
registers so that the StackMaps code does not complain about them and/or
attempt to include them in the output.

This will be used by the PowerPC target in a future commit.

llvm-svn: 225805
2015-01-13 17:47:59 +00:00
Olivier Sallenave 325096980b Added TLI hook for isFPExtFree. Some of the FMA combine heuristics are now guarded with that hook.
llvm-svn: 225795
2015-01-13 15:06:36 +00:00
Mehdi Amini 22e59748ef Peephole opt needs optimizeSelect() to keep track of newly created MIs
The peephole optimizer scans a basic block forward. At some point it
needs to answer the question "given a pointer to an MI in the current
BB, is it located before or after the current instruction?".
To do this, it keeps a set of the MIs already seen during the scan;
if an MI is not in the set, it is assumed to come after.
This means that newly created MIs have to be inserted into the set as well.

This commit passes the set as an argument to the target-dependent 
optimizeSelect() so that it can properly update the set with the 
(potentially) newly created MIs.

llvm-svn: 225772
2015-01-13 07:07:13 +00:00
Reid Kleckner 3542ace6ef Rename llvm.recoverframeallocation to llvm.framerecover
This name is less descriptive, but it sort of puts things in the
'llvm.frame...' namespace, relating it to frameallocate and
frameaddress. It also avoids using "allocate" and "allocation" together.

llvm-svn: 225752
2015-01-13 01:51:34 +00:00
Reid Kleckner e9b8931873 Add the llvm.frameallocate and llvm.recoverframeallocation intrinsics
These intrinsics allow multiple functions to share a single stack
allocation from one function's call frame. The function with the
allocation may only perform one allocation, and it must be in the entry
block.

Functions accessing the allocation call llvm.recoverframeallocation with
the function whose frame they are accessing and a frame pointer from an
active call frame of that function.

These intrinsics are very difficult to inline correctly, so the
intention is that they be introduced rarely, or at least very late
during EH preparation.

Reviewers: echristo, andrew.w.kaylor

Differential Revision: http://reviews.llvm.org/D6493

llvm-svn: 225746
2015-01-13 00:48:10 +00:00
Matt Arsenault a982e4f82b Combine fcmp + select to fminnum / fmaxnum if no nans and legal
Also require unsafe FP math for now since there isn't a way to
test for signed zeros.
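
For illustration (not the committed test), the source-level pattern this
matches:

  // With no-NaNs (and, for now, unsafe FP math because of signed zeros),
  // this fcmp + select has fminf semantics and can become a single fminnum.
  float select_min(float a, float b) {
    return a < b ? a : b;
  }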

llvm-svn: 225744
2015-01-13 00:43:00 +00:00
Adrian Prantl 66f2595845 Debug Info: Move support for constants into DwarfExpression.
Move the declaration of DebugLocDwarfExpression into DwarfExpression.h
because it needs to be accessed from AsmPrinterDwarf.cpp and DwarfDebug.cpp

NFC.

llvm-svn: 225734
2015-01-13 00:04:06 +00:00
Adrian Prantl a4c30d6509 Make DwarfExpression store the AsmPrinter instead of the TargetMachine.
NFC.

llvm-svn: 225731
2015-01-12 23:36:56 +00:00
Adrian Prantl 9cffbd8daa remove extra semicolon
llvm-svn: 225730
2015-01-12 23:36:50 +00:00
Reid Kleckner bba20f06de musttail: Only set the inreg flag for fastcall and vectorcall
Otherwise we'll attempt to forward ECX, EDX, and EAX for cdecl and
stdcall thunks, leaving us with no scratch registers for indirect call
targets.

Fixes PR22052.

llvm-svn: 225729
2015-01-12 23:28:23 +00:00
Adrian Prantl 337e360279 Run clang-format on the parts of AsmPrinterDwarf where it improves the
readability.

llvm-svn: 225726
2015-01-12 23:03:23 +00:00
Adrian Prantl 0fec811d7b Debug Info: Add a virtual destructor to DwarfExpression.
Thanks Chandler for noticing!

llvm-svn: 225724
2015-01-12 22:59:28 +00:00
Adrian Prantl 0d5df0ac1c Untwine this expression. Thanks to David for noticing!
llvm-svn: 225720
2015-01-12 22:39:14 +00:00
Adrian Prantl 0e6ffb9d0d Debug Info: Implement DwarfUnit::addRegisterOpPiece() using DwarfExpression.
NFC.

llvm-svn: 225717
2015-01-12 22:37:16 +00:00
Adrian Prantl 00dbc2a7d3 Debug Info: Implement DwarfUnit::addRegisterOffset using DwarfExpression.
No functional change.

llvm-svn: 225707
2015-01-12 22:19:26 +00:00
Adrian Prantl b16d9ebb0c Debug info: Factor out the creation of DWARF expressions from AsmPrinter
into a new class DwarfExpression that can be shared between AsmPrinter
and DwarfUnit.

This is the first step towards unifying the two entirely redundant
implementations of dwarf expression emission in DwarfUnit and AsmPrinter.

Almost no functional change — Testcases were updated because asm comments
that used to be on two lines now appear on the same line, which is
actually preferable.

llvm-svn: 225706
2015-01-12 22:19:22 +00:00
Matthias Braun f5d931f716 RegisterCoalescer: Turn some impossible conditions into asserts
This is a fixed version of reverted r225500. It fixes the too early
if() continue; of the last patch and adds a comment to the unorthodox
loop.

llvm-svn: 225652
2015-01-12 19:10:17 +00:00
Ahmed Bougacha e03bef7543 [SimplifyLibCalls] Factor out fortified libcall handling.
This lets us remove the CGP duplicate.

Differential Revision: http://reviews.llvm.org/D6541

llvm-svn: 225640
2015-01-12 17:22:43 +00:00
Joerg Sonnenberger 8a36a8e5d4 Revert r225500, it leads to infinite loops.
llvm-svn: 225590
2015-01-10 21:49:36 +00:00
Lang Hames 1e923ec122 Recommit r224935 with a fix for the ObjC++/AArch64 bug that that revision
introduced.

A test case for the bug was already committed in r225385.

Patch by Rafael Espindola.

llvm-svn: 225534
2015-01-09 18:55:42 +00:00
Matthias Braun 7e87384592 RegisterCoalescer: Fix removeCopyByCommutingDef with subreg liveness
The code that eliminated additional coalescable copies in
removeCopyByCommutingDef() used MergeValueNumberInto() which internally
may merge A into B or B into A. In this case A and B had different Def
points, so we have to reset ValNo.Def to the intended one after merging.

llvm-svn: 225503
2015-01-09 03:01:31 +00:00
Matthias Braun ea399e59cf RegisterCoalescer: Some cleanup in removeCopyByCommutingDef(), NFC
llvm-svn: 225502
2015-01-09 03:01:28 +00:00
Matthias Braun 55586a2f2d RegisterCoalescer: No need to set kill flags, they are recompute later anyway
llvm-svn: 225501
2015-01-09 03:01:26 +00:00
Matthias Braun 6588b145fc RegisterCoalescer: Turn some impossible conditions into asserts
llvm-svn: 225500
2015-01-09 03:01:23 +00:00
Hal Finkel 0ce7f372e5 [DAGCombine] Remainder of fix to r225380 (More FMA folding opportunities)
As pointed out by Aditya (and Owen), when we elide an FP extend to form an FMA,
we need to extend the incoming operands so that the resulting node will really
be legal. This is currently enabled only for PowerPC, and it happens to work
there regardless, but this should fix the functionality for everyone else
should anyone else wish to use it.

llvm-svn: 225492
2015-01-09 01:29:29 +00:00
Hal Finkel 33ead6f901 Partial fix to r225380 (More FMA folding opportunities)
As pointed out by Aditya (and Owen), there are two things wrong with this code.
First, it adds patterns which elide FP extends when forming FMAs, and that might
not be profitable on all targets (it belongs behind the pre-existing
aggressive-FMA-formation flag). This is fixed by this change.

Second, the resulting nodes might have operands of different types (the
extensions need to be re-added). That will be fixed in the follow-up commit.

llvm-svn: 225485
2015-01-09 00:45:54 +00:00
Hal Finkel 0709f5160f [MachineLICM] A command-line option to hoist even cheap instructions
Add a command-line option to enable hoisting even cheap instructions (in
low-register-pressure situations). This is turned off by default, but has
proved useful for testing purposes.

llvm-svn: 225470
2015-01-08 22:10:48 +00:00
Duncan P. N. Exon Smith e90f1165d8 CodeGen: Use handy new-fangled post-increment, NFC
Drive-by cleanup; I noticed this when reviewing the patch that became
r225466.

llvm-svn: 225468
2015-01-08 21:07:55 +00:00
Duncan P. N. Exon Smith 5914a97af8 CodeGen: Use range-based for loops, NFC
Patch by Ramkumar Ramachandra!

llvm-svn: 225466
2015-01-08 20:44:33 +00:00
Elena Demikhovsky 285fbd551a Masked Load/Store - fixed a bug in type legalization.
llvm-svn: 225441
2015-01-08 12:29:19 +00:00
Michael Kuperstein 698ea3b488 Fix include ordering, NFC.
llvm-svn: 225439
2015-01-08 11:59:43 +00:00
Michael Kuperstein 8c65e31a5a Move SPAdj logic from PEI into the targets (NFC)
PEI tries to keep track of how much starting or ending a call sequence adjusts the stack pointer by, so that it can resolve frame-index references. Currently, it takes a very simplistic view of how SP adjustments are done - both FrameStartOpcode and FrameDestroyOpcode adjust it exactly by the amount written in its first argument.

This view is in fact incorrect for some targets (e.g. due to stack re-alignment, or because it may want to adjust the stack pointer in multiple steps). However, that doesn't cause breakage, because most targets (the only in-tree exception appears to be 32-bit ARM) rely on being able to simplify the call frame pseudo-instructions earlier, so this code is never hit. 

Moving the computation into TargetInstrInfo allows targets to override the way the adjustment is computed if they need to have a non-zero SPAdj.
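
A hedged sketch of what such a target hook can look like; the class name is hypothetical and the signature and sign convention are approximations rather than the exact code of this patch:

// Returns the number of bytes by which this call-frame pseudo moves the
// stack pointer, and 0 for any other instruction.
int MyTargetInstrInfo::getSPAdjust(const MachineInstr *MI) const {
  if (MI->getOpcode() == getCallFrameSetupOpcode())
    return (int)MI->getOperand(0).getImm();   // space reserved for the call
  if (MI->getOpcode() == getCallFrameDestroyOpcode())
    return -(int)MI->getOperand(0).getImm();  // space released afterwards
  return 0;                                   // sign convention assumed here
}

A target with stack re-alignment or multi-step adjustments can override this to report the real effect on SP.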

Differential Revision: http://reviews.llvm.org/D6863

llvm-svn: 225437
2015-01-08 11:04:38 +00:00
Quentin Colombet a799e2e014 [RegAllocGreedy] Introduce a late pass to repair broken hints.
A broken hint is a copy where both ends are assigned different colors. When a
variable gets evicted in the neighborhood of such copies, it is likely we can
reconcile some of them.


** Context **

Copies are inserted during the register allocation via splitting. These split
points are required to relax the constraints on the allocation problem. When
such a point is inserted, both ends of the copy would not share the same color
with respect to the current allocation problem. When variables get evicted,
the allocation problem becomes different and some split point may not be
required anymore. However, the related variables may already have been colored.

This usually shows up in the assembly with a pattern like this:
def A
...
save A to B
def A
use A
restore A from B
...
use B

Whereas we could simply have done:
def B
...
def A
use A
...
use B


** Proposed Solution **

A variable having a broken hint is marked for late recoloring if and only if
selecting a register for it evicts another variable. Indeed, if no eviction
happens, it is pointless to look for recoloring opportunities, as it means the
situation is the same as in the initial allocation problem, where we had to
break the hint.

Finally, when everything has been allocated, we look for recoloring
opportunities for all the identified candidates.
The recoloring is performed very late to rely on accurate copy cost (all
involved variables are allocated).
The recoloring is simple, unlike last-chance recoloring. It propagates the
color of the broken hint to all its copy-related variables. If the color is
available for them, the recoloring uses it; otherwise it gives up on that hint,
even if a more complex coloring would have worked.

The recoloring happens only if it is profitable. The profitability is evaluated
using the expected frequency of the copies of the currently recolored variable
with a) its current color and b) the target color. If a) is greater than or
equal to b), then the recoloring is profitable and it happens.
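
Stated as a stand-alone helper, this test boils down to a frequency comparison; the following is only a sketch with hypothetical names, not the pass's actual code:

#include "llvm/Support/BlockFrequency.h"

// Recolor a broken hint only when the copies are at least as frequent (hot)
// with the current color as they would be with the hinted color.
static bool isHintRecoloringProfitable(llvm::BlockFrequency CurrentColorCost,
                                       llvm::BlockFrequency HintedColorCost) {
  return CurrentColorCost.getFrequency() >= HintedColorCost.getFrequency();
}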


** Example **

Consider the following example:
BB1:
  a =
  b =
BB2:
  ...
   = b
   = a
Let us assume b gets split:
BB1:
  a =
  b =
BB2:
  c = b
  ...
  d = c
  = d
  = a
Because of how the allocation works, b, c, and d may be assigned different
colors. Now, assume a gets evicted to make room for c, and that b and d were
assigned something different from a.
We end up with:
BB1:
  a =
  st a, SpillSlot
  b =
BB2:
  c = b
  ...
  d = c
  = d
  e = ld SpillSlot
  = e
It is likely that we can assign the same register to b, c, and d,
getting rid of 2 copies.


** Performances **

Both ARM64 and x86_64 show performance improvements of up to 3% for the
llvm-testsuite + externals with Os and O3. There are also a few regressions that
come from the (in)accuracy of the block frequency estimate.

<rdar://problem/18312047>

llvm-svn: 225422
2015-01-08 01:16:39 +00:00
Ahmed Bougacha 2b6917b020 [SelectionDAG] Allow targets to specify legality of extloads' result
type (in addition to the memory type).

The *LoadExt* legalization handling used to only have one type, the
memory type.  This forced users to assume that as long as the extload
for the memory type was declared legal, and the result type was legal,
the whole extload was legal.

However, this isn't always the case.  For instance, on X86, with AVX,
this is legal:
    v4i32 load, zext from v4i8
but this isn't:
    v4i64 load, zext from v4i8
Whereas v4i64 is (arguably) legal, even without AVX2.

Note that the same thing was done a while ago for truncstores (r46140),
but I assume no one needed it yet for extloads, so here we go.

Calls to getLoadExtAction were changed to add the value type, found
manually in the surrounding code.

Calls to setLoadExtAction were mechanically changed, by wrapping the
call in a loop, to match previous behavior.  The loop iterates over
the MVT subrange corresponding to the memory type (FP vectors, etc...).
I also pulled neighboring setTruncStoreActions into some of the loops;
those shouldn't make a difference, as the additional types are illegal.
(e.g., i128->i1 truncstores on PPC.)
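
As a hedged illustration of the new two-type form (the types and actions below are made up, not taken from a particular target), a target's TargetLowering constructor can now distinguish the two AVX cases mentioned above:

// Result type and memory type are specified independently.
setLoadExtAction(ISD::ZEXTLOAD, MVT::v4i32, MVT::v4i8, Legal);   // v4i8 -> v4i32 zext load is fine
setLoadExtAction(ISD::ZEXTLOAD, MVT::v4i64, MVT::v4i8, Expand);  // v4i8 -> v4i64 is not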

No functional change intended.

Differential Revision: http://reviews.llvm.org/D6532

llvm-svn: 225421
2015-01-08 00:51:32 +00:00
Matthias Braun 9d7bc0874c RegisterCoalescer: Do not remove IMPLICIT_DEFS if they are required for subranges.
The register coalescer used to remove implicit_defs when they are
covered by the main range anyway. With subreg liveness tracking we can't
do that anymore in places where the IMPLICIT_DEF is required as begin of
a subregister liverange.

llvm-svn: 225416
2015-01-08 00:21:23 +00:00
Matthias Braun d55e6ddacf RegisterCoalescer: Fix valuesIdentical() in some subrange merge cases.
I got confused and assumed SrcIdx/DstIdx of the CoalescerPair are
subregister indices in SrcReg/DstReg, but they are actually subregister
indices of the coalesced register that get you back to SrcReg/DstReg
when applied.

Fixed the bug, improved comments and simplified code accordingly.

Testcase by Tom Stellard!

llvm-svn: 225415
2015-01-07 23:58:38 +00:00
Matthias Braun 4fe686af00 LiveInterval: Implement feedback by Quentin Colombet.
llvm-svn: 225413
2015-01-07 23:35:11 +00:00
Adrian Prantl d88af278b9 Update a comment.
llvm-svn: 225399
2015-01-07 21:35:13 +00:00
Ahmed Bougacha 67dd2d25a3 [CodeGen] Use MVT iterator_ranges in legality loops. NFC intended.
A few loops do trickier things than just iterating on an MVT subset,
so I'll leave them be for now.
Follow-up of r225387.
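
A hedged sketch of the pattern these loops now take inside a target's TargetLowering constructor; the specific types and actions are illustrative only:

for (MVT VT : MVT::integer_valuetypes()) {
  // Same action for every integer result type, keyed on the i1 memory type.
  setLoadExtAction(ISD::EXTLOAD,  VT, MVT::i1, Promote);
  setLoadExtAction(ISD::SEXTLOAD, VT, MVT::i1, Promote);
  setLoadExtAction(ISD::ZEXTLOAD, VT, MVT::i1, Promote);
}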

llvm-svn: 225392
2015-01-07 21:27:10 +00:00
Olivier Sallenave 0451532996 More FMA folding opportunities.
llvm-svn: 225380
2015-01-07 20:54:17 +00:00
Adrian Prantl 3dd48c6fde Debug info: Allow aggregate types to be described by constants.
llvm-svn: 225378
2015-01-07 20:48:58 +00:00
Olivier Sallenave e64ad7cedd Test commit
llvm-svn: 225368
2015-01-07 19:45:17 +00:00
Philip Reames 352fb93773 Add a missing file from 225365
llvm-svn: 225366
2015-01-07 19:13:28 +00:00
Philip Reames 4ac17a3026 Introduce an example statepoint GC strategy
This change includes the most basic possible GCStrategy for a GC which is using the statepoint lowering code. At the moment, this GCStrategy doesn't really do much - aside from actually generating correct stackmaps, that is - but I went ahead and added a few extra correctness checks as proof of concept. It's mostly here to provide documentation on how to do one, and to provide a point for various optimization legality hooks I'd like to add going forward. (For context, see the TODOs in InstCombine around gc.relocate.)

Most of the validation logic added here as proof of concept will soon move into the Verifier.  That move is dependent on http://reviews.llvm.org/D6811

There was discussion in the review thread about addrspace(1) being reserved for something.  I'm going to follow up on a separate llvmdev thread.  If needed, I'll update all the code at once.

Note that I am deliberately not making a GCStrategy required to use gc.statepoints with this change. I want to give folks out of tree - including myself - a chance to migrate. In a week or two, I'll make having a GCStrategy be required for gc.statepoints. To this end, I added the gc tag to one of the test cases but not others.
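
A hedged sketch of what declaring and registering such a strategy can look like; the class name, registry name, and flag fields below are assumptions, and this is not the in-tree example strategy:

#include "llvm/CodeGen/GCStrategy.h"
using namespace llvm;

namespace {
// A do-nothing strategy that opts in to gc.statepoint-based lowering.
class SketchStatepointGC : public GCStrategy {
public:
  SketchStatepointGC() {
    UseStatepoints = true;  // assumed flag: request statepoint lowering
    UsesMetadata = false;   // no GCRoot-style stack map metadata
  }
};
} // end anonymous namespace

static GCRegistry::Add<SketchStatepointGC>
    X("sketch-statepoint", "sketch of a statepoint-based GC strategy");

A function would then select it with gc "sketch-statepoint" in its IR definition.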

Differential Revision: http://reviews.llvm.org/D6808

llvm-svn: 225365
2015-01-07 19:07:50 +00:00
Jonas Paulsson fcf0cba88c New method SDep::isNormalMemoryOrBarrier() in ScheduleDAGInstrs.cpp.
Used to iterate over previously added memory dependencies in
adjustChainDeps() and iterateChainSucc().

SDep::isCtrl() was previously used in these places, but that also gives
anti and output edges. The code may be worse if these are followed,
because MisNeedChainEdge() will conservatively return true since a
non-memory instruction has no memory operands, and a false chain dep
will be added. It is also unnecessary since all memory accesses of
interest will be reached by memory dependencies, and there is a budget
limit for the number of edges traversed.
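
As a hedged sketch, the new predicate amounts to the following, written here as a free function over SDep rather than the member the commit adds:

#include "llvm/CodeGen/ScheduleDAG.h"

// True for plain memory order edges and for barrier edges, but not for the
// anti/output edges that SDep::isCtrl() would also accept.
static bool isNormalMemoryOrBarrierSketch(const llvm::SDep &Dep) {
  return Dep.isNormalMemory() || Dep.isBarrier();
}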

This problem was found on an out-of-tree target with enabled alias
analysis. No test case for an in-tree target has been found.

Reviewed by Hal Finkel.

llvm-svn: 225351
2015-01-07 13:38:29 +00:00
Jonas Paulsson bf408bbe38 Fix typos in comment and option help texts.
For -enable-aa-sched-mi and -use-tbaa-in-sched-mi.

llvm-svn: 225350
2015-01-07 13:20:57 +00:00
Lang Hames 66f755f84f Revert r224935 "Refactor duplicated code. No intended functionality change."
This is affecting the behavior of some ObjC++ / AArch64 test cases on Darwin.
Reverting to get the bots green while I track down the source of the changed
behavior.

llvm-svn: 225311
2015-01-06 23:04:36 +00:00
Mehdi Amini f3721bf619 SelectionDAGBuilder: move constant initialization out of loop
No semantic change intended.

Reviewers: resistor

Differential Revision: http://reviews.llvm.org/D6834

llvm-svn: 225278
2015-01-06 18:20:04 +00:00
Andrea Di Biagio f807a6f297 [CodeGenPrepare] Improved logic to speculate calls to cttz/ctlz.
This patch improves the logic added at revision 224899 (see review D6728) that
teaches the backend when it is profitable to speculate calls to cttz/ctlz.

The original algorithm conservatively avoided speculating more than one
instruction from a basic block in a control flow graph modelling an if-statement.
In particular, the only allowed instruction (excluding the terminator) was a
call to cttz/ctlz. However, there are cases where we could be less conservative
and still be able to speculate a call to cttz/ctlz.

With this patch, CodeGenPrepare now tries to speculate a cttz/ctlz if the
result is zero extended/truncated in the same basic block, and the zext/trunc
instruction is "free" for the target.

Added new test cases to CodeGen/X86/cttz-ctlz.ll

Differential Revision: http://reviews.llvm.org/D6853

llvm-svn: 225274
2015-01-06 17:41:18 +00:00
Frederic Riss e541e0b327 Make DIE.h a public CodeGen header.
dsymutil would like to use all the AsmPrinter/MCStreamer infrastructure
to stream out the DWARF. In order to do so, it will reuse the DIE object
and so this header needs to be public.

The interface exposed here has some corners that cannot be used without a
DwarfDebug object, but clients that want to stream Dwarf can just avoid
these.

Differential Revision: http://reviews.llvm.org/D6695

llvm-svn: 225208
2015-01-05 21:29:41 +00:00
Craig Topper d3c02f177a Replace several 'assert(false' with 'llvm_unreachable' or fold a condition into the assert.
llvm-svn: 225160
2015-01-05 10:15:49 +00:00
Hal Finkel 5772566ed6 [PowerPC/BlockPlacement] Allow target to provide a per-loop alignment preference
The existing code provided for specifying a global loop alignment preference.
However, the preferred loop alignment might depend on the loop itself. For
recent POWER cores, loops between 5 and 8 instructions should have 32-byte
alignment (while the others are better with 16-byte alignment) so that the
entire loop will fit in one i-cache line.

To support this, getPrefLoopAlignment has been made virtual, and can be
provided with an optional MachineLoop* so the target can inspect the loop
before answering the query. The default behavior, as before, is to return the
value set with setPrefLoopAlignment. MachineBlockPlacement now queries the
target for each loop instead of only once per function. There should be no
functional change for other targets.
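
A hedged sketch of a per-loop override; the class name, the instruction counting, and the assumption that the returned value is log2(bytes) are all approximations, not the PPC heuristic itself:

unsigned MyTargetLowering::getPrefLoopAlignment(MachineLoop *ML) const {
  if (!ML)
    return TargetLowering::getPrefLoopAlignment(nullptr);  // global default
  unsigned NumInstrs = 0;
  for (MachineBasicBlock *MBB : ML->getBlocks())
    NumInstrs += MBB->size();
  // Assumed units: log2(bytes); 5 = 32-byte, 4 = 16-byte alignment.
  return (NumInstrs >= 5 && NumInstrs <= 8) ? 5 : 4;
}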

llvm-svn: 225117
2015-01-03 17:58:24 +00:00
Alexey Samsonov 553185ee4b Revert "merge consecutive stores of extracted vector elements"
This reverts commit r224611. This change causes crashes
in X86 DAG->DAG Instruction Selection.

llvm-svn: 225031
2014-12-31 00:40:28 +00:00
David Blaikie aeaa5bf55e DebugInfo: Omit is_stmt from line table entries on the same line.
GCC does this for non-zero discriminators and since GCC doesn't produce
column info, that was the only place it comes up there. For LLVM, since
we can emit discriminators and/or column info, it makes more sense to
invert the condition and just test for changes in line number.

This should resolve at least some of the GDB 7.5 test suite failures
created by recent Clang changes that increase the location fidelity
(which, since Clang defaults to including column info on Linux,
created a bunch of cases that confused GDB).

In theory we could do this better/differently by grouping actual source
statements together in a similar manner to the way lexical scopes are
handled but given that GDB isn't really in a position to consume that (&
users are probably somewhat used to different lines being different
'statements') this seems the safest and cheapest change. (I'm concerned
that doing this 'right' would bloat the debugloc data even further -
something Duncan's working hard to address)

llvm-svn: 225011
2014-12-30 22:47:13 +00:00
Peter Collingbourne 7ef497b1f5 x86_64: Fix calls to __morestack under the large code model.
Under the large code model, we cannot assume that __morestack lives within
2^31 bytes of the call site, so we cannot use pc-relative addressing. We
cannot perform the call via a temporary register, as the rax register may
be used to store the static chain, and all other suitable registers may be
either callee-save or used for parameter passing. We cannot use the stack
at this point either because __morestack manipulates the stack directly.

To avoid these issues, perform an indirect call via a read-only memory
location containing the address.

This solution is not perfect, as it assumes that the .rodata section
is laid out within 2^31 bytes of each function body, but this seems to
be sufficient for JIT.

Differential Revision: http://reviews.llvm.org/D6787

llvm-svn: 225003
2014-12-30 20:05:19 +00:00
Michael Kuperstein c43b063358 [COFF] Don't try to add quotes to already quoted linker directives
If a linker directive is already quoted, don't try to quote it again, otherwise it creates a mess.
This pops up in places like:
#pragma comment(linker,"\"/foo bar'\"")
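
A hedged sketch of the quoting rule as a stand-alone helper (names assumed; not the in-tree function):

#include "llvm/ADT/StringRef.h"
#include "llvm/ADT/Twine.h"
#include <string>

// Leave already-quoted directives alone; wrap everything else in quotes.
static std::string quoteLinkerDirective(llvm::StringRef Directive) {
  if (Directive.startswith("\"") && Directive.endswith("\""))
    return Directive.str();
  return ("\"" + Directive + "\"").str();
}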

Differential Revision: http://reviews.llvm.org/D6792

llvm-svn: 224998
2014-12-30 19:23:48 +00:00
Rafael Espindola bed67f3adc Refactor duplicated code.
No intended functionality change.

llvm-svn: 224935
2014-12-29 15:18:31 +00:00
Andrea Di Biagio 22ee3f63b9 [CodeGenPrepare] Teach when it is profitable to speculate calls to @llvm.cttz/ctlz.
If the control flow is modelling an if-statement where the only instruction in
the 'then' basic block (excluding the terminator) is a call to cttz/ctlz,
CodeGenPrepare can try to speculate the cttz/ctlz call and simplify the control
flow graph.

Example:
\code
entry:
  %cmp = icmp eq i64 %val, 0
  br i1 %cmp, label %end.bb, label %then.bb

then.bb:
  %c = tail call i64 @llvm.cttz.i64(i64 %val, i1 true)
  br label %end.bb

end.bb:
  %cond = phi i64 [ %c, %then.bb ], [ 64, %entry]
\code

In this example, basic block %then.bb is taken if value %val is not zero.
Also, the phi node in %end.bb takes the size in bits of %val (64)
only if %val is equal to zero.

With this patch, CodeGenPrepare will try to hoist the call to cttz from %then.bb
into basic block %entry only if cttz is cheap to speculate for the target.

Added two new hooks in TargetLowering.h to let targets customize the behavior
(i.e. decide whether it is cheap or not to speculate calls to cttz/ctlz). The
two new methods are 'isCheapToSpeculateCtlz' and 'isCheapToSpeculateCttz'.
By default, both methods return 'false'.
On X86, method 'isCheapToSpeculateCtlz' returns true only if the target has
LZCNT. Method 'isCheapToSpeculateCttz' only returns true if the target has BMI.
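
A hedged sketch of how a target opts in; the class and the subtarget feature queries are hypothetical stand-ins:

// In the target's TargetLowering subclass:
bool MyTargetLowering::isCheapToSpeculateCttz() const {
  return Subtarget->hasBMI();    // hypothetical feature query
}
bool MyTargetLowering::isCheapToSpeculateCtlz() const {
  return Subtarget->hasLZCNT();  // hypothetical feature query
}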

Differential Revision: http://reviews.llvm.org/D6728

llvm-svn: 224899
2014-12-28 11:07:35 +00:00
Elena Demikhovsky 87700a734d Scalarizer for masked load and store intrinsics.
Masked vector intrinsics are a part of common LLVM IR, but they are only really supported on AVX2 and AVX-512 targets. I added code that translates the masked intrinsics for all other targets. The masked vector intrinsic is converted to a chain of scalar operations inside conditional basic blocks.

http://reviews.llvm.org/D6436

llvm-svn: 224897
2014-12-28 08:54:45 +00:00
Timur Iskhodzhanov b6fa52f274 Band-aid fix for PR22032: don't emit DWARF debug info if AddressSanitizer is enabled on Windows
llvm-svn: 224860
2014-12-26 17:00:51 +00:00
David Majnemer 25b383ac66 Silence GCC's -Wparentheses warning
No functionality change intended.

llvm-svn: 224833
2014-12-25 10:03:23 +00:00
Elena Demikhovsky fb81b93e17 Masked Load/Store - Changed the order of parameters in intrinsics.
No functional changes.
The documentation is coming.

llvm-svn: 224829
2014-12-25 07:49:20 +00:00
David Majnemer 2913eca4e2 CodeGen: Error on redefinitions instead of asserting
It's possible to have a prior definition of a symbol in module asm.
Raise an error instead of crashing.

llvm-svn: 224828
2014-12-24 23:06:55 +00:00
David Majnemer 8e92dfee20 CodeGen: Allow aliases to be overridden by variables
llvm-svn: 224827
2014-12-24 22:44:29 +00:00
David Majnemer 58cb80c940 MC: Label definitions are permitted after .set directives
.set directives may be overridden by other .set directives as well as
label definitions.

This fixes PR22019.

llvm-svn: 224811
2014-12-24 10:27:50 +00:00
Matthias Braun 51ca510094 LiveInterval: Remove accidentally committed debug code.
llvm-svn: 224807
2014-12-24 02:35:07 +00:00
Matthias Braun dbcca0dbb4 LiveInterval: Introduce createMainRangeFromSubranges().
This function constructs the main liverange by merging all subranges if
subregister liveness tracking is available. This should be slightly
faster to compute instead of performing the liveness calculation again
for the main range. More importantly it avoids cases where the main
liverange would cover positions where no subrange was live. These cases
happened for partial definitions where the actual defined part was dead
and only the undefined parts were used later.

The register coalescing requires that every part covered by the main
live range has at least one subrange live.

I also expect this function to become useful later for places where the
subranges are modified in a way that makes it hard to correctly fix the
main liverange (e.g. in the machine scheduler); we can simply reconstruct
it from the subranges then.

llvm-svn: 224806
2014-12-24 02:11:51 +00:00
Matthias Braun 7030dda8d5 RegisterCoalescer: With subrange liveness there may be no RedefVNI for unused lanes.
llvm-svn: 224805
2014-12-24 02:11:48 +00:00
Matthias Braun 36768c684f LiveRangeEdit: Check for completely empty subranges after removing ValNos.
Completely empty subranges are not allowed and must be removed when
subreg liveness is enabled.

llvm-svn: 224804
2014-12-24 02:11:46 +00:00
Matthias Braun f603c88d13 LiveIntervalAnalysis: Fix performance bug that I introduced in r224663.
Without a reference the code did not remember when moving the iterators
of the subranges/registerunit ranges forward and instead would scan from
the beginning again at the next position.

llvm-svn: 224803
2014-12-24 02:11:43 +00:00
Adrian Prantl 3026a54aa2 Debug Info: In symmetry to DW_TAG_pointer_type, do not emit the byte size
of a DW_TAG_ptr_to_member_type.
This restores the behavior from before r224780-r224781.

llvm-svn: 224799
2014-12-24 01:17:51 +00:00
Mehdi Amini d38920891e Always assert in DAGCombine and not only when -debug is enabled
Right now DAG Combine checks the validity of the returned type
only when -debug is given on the command line. However, usually
the test cases in the validation do not use -debug.
An assert-enabled build should always check this.

llvm-svn: 224779
2014-12-23 18:59:02 +00:00
Michael Kuperstein f4536ea6e8 [DagCombine] Improve DAGCombiner BUILD_VECTOR when it has two sources of elements
This partially fixes PR21943.

For AVX, we go from:

vmovq   (%rsi), %xmm0
vmovq   (%rdi), %xmm1
vpermilps       $-27, %xmm1, %xmm2 ## xmm2 = xmm1[1,1,2,3]
vinsertps       $16, %xmm2, %xmm1, %xmm1 ## xmm1 = xmm1[0],xmm2[0],xmm1[2,3]
vinsertps       $32, %xmm0, %xmm1, %xmm1 ## xmm1 = xmm1[0,1],xmm0[0],xmm1[3]
vpermilps       $-27, %xmm0, %xmm0 ## xmm0 = xmm0[1,1,2,3]
vinsertps       $48, %xmm0, %xmm1, %xmm0 ## xmm0 = xmm1[0,1,2],xmm0[0]

To the expected:

vmovq   (%rdi), %xmm0
vmovhpd (%rsi), %xmm0, %xmm0
retq

Fixing this for AVX2 is still open.

Differential Revision: http://reviews.llvm.org/D6749

llvm-svn: 224759
2014-12-23 08:59:45 +00:00
Reid Kleckner ce0093344f Make musttail more robust for vector types on x86
Previously I tried to plug musttail into the existing vararg lowering
code. That turned out to be a mistake, because non-vararg calls use
significantly different register lowering, even on x86. For example, AVX
vectors are usually passed in registers to normal functions and memory
to vararg functions.  Now musttail uses a completely separate lowering.

Hopefully this can be used as the basis for non-x86 perfect forwarding.

Reviewers: majnemer

Differential Revision: http://reviews.llvm.org/D6156

llvm-svn: 224745
2014-12-22 23:58:37 +00:00
Quentin Colombet 84f89ccd45 [CodeGenPrepare] Handle properly the promotion of operands when this does not
generate instructions.

Fixes PR21978.
Related to <rdar://problem/18310086>

llvm-svn: 224717
2014-12-22 18:11:52 +00:00
Rafael Espindola 366e5c1bf1 The leak detector is dead, long live asan and valgrind.
In recent times asan and valgrind have found way more memory management bugs
in llvm than the special purpose leak detector.

llvm-svn: 224703
2014-12-22 13:00:36 +00:00
Saleem Abdulrasool 90c224a143 CodeGen: minor style tweaks to SSP
Clean up some style related things in the StackProtector CodeGen.  NFC.

llvm-svn: 224693
2014-12-21 21:52:38 +00:00
Matt Arsenault 22b4c256e1 Enable (sext x) == C --> x == (trunc C) combine
Extend the existing code which handles this for zext. This makes this
more useful for targets with ZeroOrNegativeOne BooleanContent and
obsoletes a custom combine SI uses for i1 setcc (sext(i1), 0, setne)
since the constant will now be shrunk to i1.

llvm-svn: 224691
2014-12-21 16:48:42 +00:00
Saleem Abdulrasool 57b5fe57e3 CodeGen: constify and use range loop for SSP
Use range-based for loop and constify the iterators.  NFC.

llvm-svn: 224683
2014-12-20 21:37:51 +00:00
Matthias Braun 714c494ca1 LiveIntervalAnalysis: No kill flags for partially undefined uses.
We must not add kill flags when reading a vreg with some undefined
subregisters, if subreg liveness tracking is enabled.  This is because
the register allocator may reuse these undefined subregisters for other
values which are not killed.

llvm-svn: 224664
2014-12-20 01:54:50 +00:00
Matthias Braun 7f8dece1d7 LiveIntervalAnalysis: cleanup addKills(), NFC
- Use more const modifiers
- Use references for things that can't be nullptr
- Improve some variable names

llvm-svn: 224663
2014-12-20 01:54:48 +00:00
Reid Kleckner f2acbbaf22 EH: Sink computation of local PadMap variable into function that uses it
No functionality change.

llvm-svn: 224635
2014-12-19 22:30:08 +00:00
Reid Kleckner 93acac6cfc Add the ExceptionHandling::MSVC enumeration
It is intended to be used for a family of personality functions that
have similar IR preparation requirements. Typically when interoperating
with MSVC personality functions, bits of functionality need to be
outlined from the main function into helper functions. There is also
usually more than one landing pad per invoke, which does not match the
LLVM IR landingpad representation.

None of this is implemented yet. This change just adds a new enum that
is active for *-windows-msvc and delegates to the EH removal preparation
pass.  No functionality change for other targets.
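
A hedged sketch of how such an enumerator is typically consulted; the surrounding code is illustrative, not taken from the patch:

// MAI is the MCAsmInfo for a *-windows-msvc triple.
if (MAI.getExceptionHandlingType() == llvm::ExceptionHandling::MSVC) {
  // Run the MSVC-flavoured EH preparation (outlining, landing pad rewriting).
}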

llvm-svn: 224625
2014-12-19 22:19:48 +00:00
Sanjay Patel 0428a5786e merge consecutive stores of extracted vector elements
Add a path to DAGCombiner::MergeConsecutiveStores() 
to combine multiple scalar stores when the store operands
are extracted vector elements. This is a partial fix for
PR21711 ( http://llvm.org/bugs/show_bug.cgi?id=21711 ).

For the new test case, codegen improves from:

   vmovss  %xmm0, (%rdi)
   vextractps      $1, %xmm0, 4(%rdi)
   vextractps      $2, %xmm0, 8(%rdi)
   vextractps      $3, %xmm0, 12(%rdi)
   vextractf128    $1, %ymm0, %xmm0
   vmovss  %xmm0, 16(%rdi)
   vextractps      $1, %xmm0, 20(%rdi)
   vextractps      $2, %xmm0, 24(%rdi)
   vextractps      $3, %xmm0, 28(%rdi)
   vzeroupper
   retq

To:

   vmovups	%ymm0, (%rdi)
   vzeroupper
   retq

Patch reviewed by Nadav Rotem.

Differential Revision: http://reviews.llvm.org/D6698

llvm-svn: 224611
2014-12-19 20:23:41 +00:00
Matthias Braun aeb50b3805 RegisterCoalescer: rewrite eliminateUndefCopy().
This also fixes problems with undef copies of subregisters. I can't
attach a testcase for that as none of the targets in trunk has
subregister liveness tracking enabled.

llvm-svn: 224560
2014-12-19 01:39:46 +00:00
Adrian Prantl cf44e7870b Explain why LLVM is emitting a DW_AT_containing_type inside of a class.
llvm-svn: 224555
2014-12-19 00:01:20 +00:00
Matthias Braun 15abf3743c LiveIntervalAnalysis: Cleanup computeDeadValues
- This also fixes a bug introduced in r223880 where values were not
  correctly marked as Dead anymore.
- Cleanup computeDeadValues(): split up SubRange code variant, simplify
  arguments.

llvm-svn: 224538
2014-12-18 19:58:52 +00:00
Eric Christopher 661f2d1ca1 Add a new string member to the TargetOptions struct for the name
of the abi we should be using. For targets that don't use the
option there's no change, otherwise this allows external users
to set the ABI via string and avoid some of the -backend-option
pain in clang.

Use this option to move the ABI for the ARM port from the
Subtarget to the TargetMachine and update the testcases
accordingly since it's no longer valid to set via -mattr.

llvm-svn: 224492
2014-12-18 02:20:58 +00:00
Matthias Braun 0a410f6243 RegisterCoalescer: Fix stripCopies() picking up main range instead of subregister range
This fixes a problem where stripCopies() would switch to values in the
main liverange when it crossed a copy instruction. However when joining
subranges we need to stay in the respective subregister ranges.

llvm-svn: 224461
2014-12-17 21:25:20 +00:00
Matthias Braun 8142efa8ea ExecutionDepsFix: Correctly handle wide registers.
The ExecutionDepsFix pass previously mapped each register to zero or one
registers of the register class it was called with (and is therefore
simulating liveness for). This was problematic for cases involving wider
registers like Q0 on ARM where ExecutionDepsFix gets invoked for the Dxx
registers. In these cases the wide register would get mapped to the last
matching D register, while it should have been all matching D registers.
This commit changes the AliasMap to use a SmallVector to map registers
to potentially multiple destination regclass registers. This is required
to avoid regressions with subregister liveness tracking enabled.

llvm-svn: 224447
2014-12-17 19:13:47 +00:00
Michael Kuperstein 047b1a0400 [DAGCombine] Slightly improve lowering of BUILD_VECTOR into a shuffle.
This handles the case of a BUILD_VECTOR being constructed out of elements extracted from a vector twice the size of the result vector. Previously this was always scalarized. Now, we try to construct a shuffle node that feeds on extract_subvectors.

This fixes PR15872 and provides a partial fix for PR21711.

Differential Revision: http://reviews.llvm.org/D6678

llvm-svn: 224429
2014-12-17 12:32:17 +00:00
Toma Tabacu a23f13c3b0 [mips] Set GCC-compatible MIPS assembler options before inline asm blocks.
Summary:
When generating MIPS assembly, LLVM always overrides the default assembler options by emitting the '.set noreorder', '.set nomacro' and '.set noat' directives,
while GCC uses the default options if an assembly-level function contains inline assembly code.

This becomes a problem when the code generated by LLVM is interleaved with inline assembly which assumes GCC-like assembler options (from Linux, for example).

This patch fixes these conflicts by setting the appropriate assembler options at the beginning of an inline asm block and popping them at the end.

Reviewers: dsanders

Reviewed By: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D6637

llvm-svn: 224425
2014-12-17 10:56:16 +00:00
Matthias Braun f4a72cd06e RegisterCoalescer: Sprinkle some const modifiers.
llvm-svn: 224409
2014-12-17 02:18:13 +00:00
Quentin Colombet fc2201e922 [CodeGenPrepare] Reapply r224351 with a fix for the assertion failure:
The type promotion helper does not support vector types, so it does not
kick in in such cases.

Original commit message:
[CodeGenPrepare] Move sign/zero extensions near loads using type promotion.

This patch extends the optimization in CodeGenPrepare that moves a sign/zero
extension near a load when the target can combine them. The optimization may
promote any operations between the extension and the load to make that possible.

Although this optimization may be beneficial for all targets, in particular
AArch64, this is enabled for X86 only as I have not benchmarked it for other
targets yet.


** Context **

Most targets feature extended loads, i.e., loads that perform a zero or sign
extension for free. In that context it is interesting to expose such pattern in
CodeGenPrepare so that the instruction selection pass can form such loads.
Sometimes, this pattern is blocked because of instructions between the load and
the extension. When those instructions are promotable to the extended type, we
can expose this pattern.


** Motivating Example **

Let us consider an example:
define void @foo(i8* %addr1, i32* %addr2, i8 %a, i32 %b) {
  %ld = load i8* %addr1
  %zextld = zext i8 %ld to i32
  %ld2 = load i32* %addr2
  %add = add nsw i32 %ld2, %zextld
  %sextadd = sext i32 %add to i64
  %zexta = zext i8 %a to i32
  %addza = add nsw i32 %zexta, %zextld
  %sextaddza = sext i32 %addza to i64
  %addb = add nsw i32 %b, %zextld
  %sextaddb = sext i32 %addb to i64
  call void @dummy(i64 %sextadd, i64 %sextaddza, i64 %sextaddb)
  ret void
}

As it is, this IR generates the following assembly on x86_64:
[...]
  movzbl  (%rdi), %eax   # zero-extended load
  movl  (%rsi), %esi     # plain load
  addl  %eax, %esi       # 32-bit add
  movslq  %esi, %rdi     # sign extend the result of add
  movzbl  %dl, %edx      # zero extend the first argument
  addl  %eax, %edx       # 32-bit add
  movslq  %edx, %rsi     # sign extend the result of add
  addl  %eax, %ecx       # 32-bit add
  movslq  %ecx, %rdx     # sign extend the result of add
[...]
The throughput of this sequence is 7.45 cycles on Ivy Bridge according to IACA.

Now, by promoting the additions to form more extended loads we would generate:
[...]
  movzbl  (%rdi), %eax   # zero-extended load
  movslq  (%rsi), %rdi   # sign-extended load
  addq  %rax, %rdi       # 64-bit add
  movzbl  %dl, %esi      # zero extend the first argument
  addq  %rax, %rsi       # 64-bit add
  movslq  %ecx, %rdx     # sign extend the second argument
  addq  %rax, %rdx       # 64-bit add
[...]
The throughput of this sequence is 6.15 cycles on Ivy Bridge according to IACA.

This kind of sequence happens a lot in code using 32-bit indexes on 64-bit
architectures.

Note: The throughput numbers are similar on Sandy Bridge and Haswell.


** Proposed Solution **

To avoid the penalty of all these sign/zero extensions, we merge them in the
loads at the beginning of the chain of computation by promoting all the chain of
computation on the extended type. The promotion is done if and only if we do not
introduce new extensions, i.e., if we do not degrade the code quality.
To achieve this, we extend the existing “move ext to load” optimization with the
promotion mechanism introduced to match larger patterns for addressing mode
(r200947).
The idea of this extension is to perform the following transformation:
ext(promotableInst1(...(promotableInstN(load))))
=>
promotedInst1(...(promotedInstN(ext(load))))

The promotion mechanism in that optimization is enabled by a new TargetLowering
switch, which is off by default. In other words, by default, the optimization
performs the “move ext to load” optimization as it was before this patch.


** Performance **

Configuration: x86_64: Ivy Bridge fixed at 2900MHz running OS X 10.10.
Tested Optimization Levels: O3/Os
Tests: llvm-testsuite + externals.
Results:
- No regression beside noise.
- Improvements:
CINT2006/473.astar:  ~2%
Benchmarks/PAQ8p: ~2%
Misc/perlin: ~3%

The results are consistent for both O3 and Os.

<rdar://problem/18310086>

llvm-svn: 224402
2014-12-17 01:36:17 +00:00
David Blaikie 8b979f01c6 PR21875: codegen for non-type template parameters of nullptr_t type
llvm-svn: 224399
2014-12-17 00:43:22 +00:00
Reid Kleckner 04b69f89aa Revert "[CodeGenPrepare] Move sign/zero extensions near loads using type promotion."
This reverts commit r224351. It causes assertion failures when building
ICU.

llvm-svn: 224397
2014-12-17 00:29:23 +00:00
Hans Wennborg 224cb82a39 SelectionDAG switch lowering: use 'unsigned' to count destination popularity
SwitchInst::getNumCases() returns unsigned, so using uint64_t to count cases
seems unnecessary.

Also fix a missing CHECK in the test case.

llvm-svn: 224393
2014-12-16 23:41:59 +00:00
Sanjay Patel 7129c10cae merge consecutive loads that are offset from a base address
SelectionDAG::isConsecutiveLoad() was not detecting consecutive loads
when the first load was offset from a base address. 

This patch recognizes that pattern and subtracts the offset before comparing
the second load to see if it is consecutive.

The codegen change in the new test case improves from:

vmovsd	32(%rdi), %xmm0
vmovsd	48(%rdi), %xmm1 
vmovhpd	56(%rdi), %xmm1, %xmm1
vmovhpd	40(%rdi), %xmm0, %xmm0
vinsertf128	$1, %xmm1, %ymm0, %ymm0

To:

vmovups	32(%rdi), %ymm0

An existing test case is also improved from:

vmovsd	(%rdi), %xmm0
vmovsd	16(%rdi), %xmm1
vmovsd	24(%rdi), %xmm2
vunpcklpd	%xmm2, %xmm0, %xmm0 ## xmm0 = xmm0[0],xmm2[0]
vmovhpd	8(%rdi), %xmm1, %xmm3

To:

vmovsd	(%rdi), %xmm0
vmovsd	16(%rdi), %xmm1
vmovhpd	24(%rdi), %xmm0, %xmm0
vmovhpd	8(%rdi), %xmm1, %xmm1

This patch fixes PR21771 ( http://llvm.org/bugs/show_bug.cgi?id=21771 ).

Differential Revision: http://reviews.llvm.org/D6642

llvm-svn: 224379
2014-12-16 21:57:18 +00:00
Matt Arsenault dd3b77d64c Move lowerConstant to AsmPrinter
This was a static function before, and NVPTX duplicated it
because it wasn't exposed.

llvm-svn: 224354
2014-12-16 19:16:14 +00:00
Quentin Colombet d5e57b731f [CodeGenPrepare] Move sign/zero extensions near loads using type promotion.
This patch extends the optimization in CodeGenPrepare that moves a sign/zero
extension near a load when the target can combine them. The optimization may
promote any operations between the extension and the load to make that possible.

Although this optimization may be beneficial for all targets, in particular
AArch64, this is enabled for X86 only as I have not benchmarked it for other
targets yet.


** Context **

Most targets feature extended loads, i.e., loads that perform a zero or sign
extension for free. In that context it is interesting to expose such pattern in
CodeGenPrepare so that the instruction selection pass can form such loads.
Sometimes, this pattern is blocked because of instructions between the load and
the extension. When those instructions are promotable to the extended type, we
can expose this pattern.


** Motivating Example **

Let us consider an example:
define void @foo(i8* %addr1, i32* %addr2, i8 %a, i32 %b) {
  %ld = load i8* %addr1
  %zextld = zext i8 %ld to i32
  %ld2 = load i32* %addr2
  %add = add nsw i32 %ld2, %zextld
  %sextadd = sext i32 %add to i64
  %zexta = zext i8 %a to i32
  %addza = add nsw i32 %zexta, %zextld
  %sextaddza = sext i32 %addza to i64
  %addb = add nsw i32 %b, %zextld
  %sextaddb = sext i32 %addb to i64
  call void @dummy(i64 %sextadd, i64 %sextaddza, i64 %sextaddb)
  ret void
}

As it is, this IR generates the following assembly on x86_64:
[...]
  movzbl  (%rdi), %eax   # zero-extended load
  movl  (%rsi), %esi     # plain load
  addl  %eax, %esi       # 32-bit add
  movslq  %esi, %rdi     # sign extend the result of add
  movzbl  %dl, %edx      # zero extend the first argument
  addl  %eax, %edx       # 32-bit add
  movslq  %edx, %rsi     # sign extend the result of add
  addl  %eax, %ecx       # 32-bit add
  movslq  %ecx, %rdx     # sign extend the result of add
[...]
The throughput of this sequence is 7.45 cycles on Ivy Bridge according to IACA.

Now, by promoting the additions to form more extended loads we would generate:
[...]
  movzbl  (%rdi), %eax   # zero-extended load
  movslq  (%rsi), %rdi   # sign-extended load
  addq  %rax, %rdi       # 64-bit add
  movzbl  %dl, %esi      # zero extend the first argument
  addq  %rax, %rsi       # 64-bit add
  movslq  %ecx, %rdx     # sign extend the second argument
  addq  %rax, %rdx       # 64-bit add
[...]
The throughput of this sequence is 6.15 cycles on Ivy Bridge according to IACA.

This kind of sequence happens a lot in code using 32-bit indexes on 64-bit
architectures.

Note: The throughput numbers are similar on Sandy Bridge and Haswell.


** Proposed Solution **

To avoid the penalty of all these sign/zero extensions, we merge them in the
loads at the beginning of the chain of computation by promoting all the chain of
computation on the extended type. The promotion is done if and only if we do not
introduce new extensions, i.e., if we do not degrade the code quality.
To achieve this, we extend the existing “move ext to load” optimization with the
promotion mechanism introduced to match larger patterns for addressing mode
(r200947).
The idea of this extension is to perform the following transformation:
ext(promotableInst1(...(promotableInstN(load))))
=>
promotedInst1(...(promotedInstN(ext(load))))

The promotion mechanism in that optimization is enabled by a new TargetLowering
switch, which is off by default. In other words, by default, the optimization
performs the “move ext to load” optimization as it was before this patch.


** Performance **

Configuration: x86_64: Ivy Bridge fixed at 2900MHz running OS X 10.10.
Tested Optimization Levels: O3/Os
Tests: llvm-testsuite + externals.
Results:
- No regression beside noise.
- Improvements:
CINT2006/473.astar:  ~2%
Benchmarks/PAQ8p: ~2%
Misc/perlin: ~3%

The results are consistent for both O3 and Os.

<rdar://problem/18310086>

llvm-svn: 224351
2014-12-16 19:09:03 +00:00
Aaron Ballman 0d6a010c13 Fixing -Wsign-compare warnings; NFC.
llvm-svn: 224337
2014-12-16 14:04:11 +00:00
Matthias Braun 1aed6ffa35 LiveRangeCalc: Rewrite subrange calculation
This changes subrange calculation to calculate subranges sequentially
instead of in parallel. The code is easier to understand that way and
addresses the code review issues raised about LiveOutData being
hard to understand/needing more comments by removing them :)

llvm-svn: 224313
2014-12-16 04:03:38 +00:00
Adrian Prantl b9fa945d51 ARM/AArch64: Attach the FrameSetup MIFlag to CFI instructions.
Debug info marks the first instruction without the FrameSetup flag
as being the end of the function prologue. Any CFI instructions in the
middle of the function prologue would cause debug info to end the prologue
too early and worse, attach the line number of the CFI instruction, which
incidentally is often 0.

llvm-svn: 224294
2014-12-16 00:20:49 +00:00
Matthias Braun c3a72c2e5f Revert "LiveRangeCalc: Rewrite subrange calculation"
Revert until I find out why non-subreg enabled targets break.

This reverts commit 6097277eefb9c5fb35a7f493c783ee1fd1b9d6a7.

llvm-svn: 224278
2014-12-15 21:36:35 +00:00
Matthias Braun 0352201e15 LiveRangeCalc: Rewrite subrange calculation
This changes subrange calculation to calculate subranges sequentially
instead of in parallel. The code is easier to understand that way and
addresses the code review issues raised about LiveOutData being
hard to understand/needing more comments by removing them :)

llvm-svn: 224272
2014-12-15 21:16:21 +00:00
Matthias Braun 42fab34ffb LiveRangeCalc: use more range based for loops; NFC
llvm-svn: 224263
2014-12-15 19:40:46 +00:00
Michael Ilseman addddc441f Silence more static analyzer warnings.
Add in definedness checks for shift operators, null checks when
pointers are assumed by the code to be non-null, and explicit
unreachables.

llvm-svn: 224255
2014-12-15 18:48:43 +00:00
Akira Hatanaka 7ba78302b5 Rename argument strings of codegen passes to avoid collisions with command line
options.

This commit changes the command line arguments (PassInfo::PassArgument) of two
passes, MachineFunctionPrinter and MachineScheduler, to avoid collisions with
command line options that have the same argument strings.

This bug manifests when the PassList construct (defined in opt.cpp) is used
in a tool that links with codegen passes. To reproduce the bug, paste the
following lines into llc.cpp and run llc.

#include "llvm/IR/LegacyPassNameParser.h"
static llvm::cl::list<const llvm::PassInfo*, bool, llvm::PassNameParser>
PassList(llvm::cl::desc("Optimizations available:"));

rdar://problem/19212448

llvm-svn: 224186
2014-12-13 04:52:04 +00:00
Andrea Di Biagio d65fd9facd Reapply "[MachineScheduler] Fix for PR21807: minor code difference building with/without -g."
This reapplies r224118 with a fix for test 'misched-code-difference-with-debug.ll'.
That test was failing on some buildbots because it was x86 specific but it was
missing a target triple.
Added an explicit triple to test misched-code-difference-with-debug.ll.

llvm-svn: 224126
2014-12-12 15:09:58 +00:00
Andrea Di Biagio 5634a54efc Revert: [MachineScheduler] Fix for PR21807: minor code difference building with/without -g.
Test 'misched-code-difference-with-debug.ll' was failing on some buildbots.

llvm-svn: 224121
2014-12-12 13:34:03 +00:00
Andrea Di Biagio 01236e3eca [MachineScheduler] Fix for PR21807: minor code difference building with/without -g.
This patch fixes the issue reported as PR21807. There was a minor difference
in the generated code depending on the -g flag.

The cause was that with -g the machine scheduler used a different
scheduling strategy. This decision was based on the number of instructions
in a schedule region and included debug instructions in that count.

This patch fixes the issue in MISched and provides a test.

Patch by Russell Gallop!

llvm-svn: 224118
2014-12-12 12:41:22 +00:00
Ekaterina Romanova 90ff20d8f5 A fix for PR21176.
DW_OP_const <const> doesn't describe a constant value, but a value at a constant address. 
The proper way to describe a constant value is DW_OP_constu <const>, DW_OP_stack_value. 
Added DW_OP_stack_value to the stack. 

Marked incorrect-variable-debugloc1.ll to xfail for PowerPC64, while the failure (PR21881)
is being investigated. 

llvm-svn: 224098
2014-12-12 05:11:47 +00:00
Philip Reames 60de8b29f7 Comment and minor code cleanup for GCStrategy (NFC)
Updating comments to reflect the current state of the world after my recent changes to ownership structure and generally better describe what a GCStrategy is and how it works.

llvm-svn: 224086
2014-12-12 00:49:03 +00:00
Matt Arsenault 810cb62962 Add target hook for whether it is profitable to reduce load widths
Add an option to disable optimization to shrink truncated larger type
loads to smaller type loads. On SI this prevents using scalar load
instructions in some cases, since there are no scalar extloads.
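
A hedged sketch of a target override; the hook name and parameter list are assumptions based on later LLVM versions, and the 32-bit cut-off is only illustrative:

// Refuse to shrink loads below 32 bits, since (as described above) there may
// be no cheap scalar extload for the narrower type.
bool MyTargetLowering::shouldReduceLoadWidth(SDNode *Load,
                                             ISD::LoadExtType ExtTy,
                                             EVT NewVT) const {
  return NewVT.getStoreSizeInBits() >= 32;
}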

llvm-svn: 224084
2014-12-12 00:00:24 +00:00
Duncan P. N. Exon Smith d6f8e4b03c CodeGen: Stop using LeakDetector for MachineInstr
Since `MachineInstr` is required to have a trivial destructor, it cannot
remove itself from `LeakDetection`.  Remove the calls.

As it happens, this requirement is because `MachineFunction` allocates
all `MachineInstr`s in a custom allocator; when the `MachineFunction` is
destroyed they're dropped of the edge.  There's no benefit to detecting
leaks.

llvm-svn: 224061
2014-12-11 21:51:37 +00:00
Matthias Braun 7e37a5f523 [CodeGen] Add print and verify pass after each MachineFunctionPass by default
Previously print+verify passes were added in a very unsystematic way, which is
annoying when debugging, as you miss intermediate steps, and allows bugs to stay
unnoticed when no verification is performed.

To make this change practical I added the possibility to explicitly disable
verification. I used this option in all places where no verification was
performed previously (because a lot of places actually don't pass the
MachineVerifier).
In the long term these problems should be fixed properly and verification
enabled after each pass. I'll enable some more verification in subsequent
commits.

This is the 2nd attempt at this after realizing that PassManager::add() may
actually delete the pass.

llvm-svn: 224059
2014-12-11 21:26:47 +00:00
Rafael Espindola 01c73610d0 This reverts commit r224043 and r224042.
check-llvm was failing.

llvm-svn: 224045
2014-12-11 20:03:57 +00:00
Matthias Braun a7c82a9f1d [CodeGen] Add print and verify pass after each MachineFunctionPass by default
Previously print+verify passes were added in a very unsystematic way, which is
annoying when debugging, as you miss intermediate steps, and allows bugs to stay
unnoticed when no verification is performed.

To make this change practical I added the possibility to explicitly disable
verification. I used this option in all places where no verification was
performed previously (because a lot of places actually don't pass the
MachineVerifier).
In the long term these problems should be fixed properly and verification
enabled after each pass. I'll enable some more verification in subsequent
commits.

llvm-svn: 224042
2014-12-11 19:42:05 +00:00
Matthias Braun a4e932db16 [CodeGen] Let MachineVerifierPass own its banner string
llvm-svn: 224041
2014-12-11 19:41:51 +00:00
Patrik Hagglund cb06a36c9a Bugfix in InlineSpiller::traceSiblingValue().
Properly determine whether or not a phi was added by splitting.
Check against the current VNInfo of OrigLI instead of against the
OrigVNI argument.

Patch provided by Jonas Paulsson. Reviewed by Quentin Colombet.

llvm-svn: 224009
2014-12-11 10:40:17 +00:00
Ekaterina Romanova 75fd123967 Reverting commit 223981, because the test that I added (incorrect-variable-debugloc1.ll) failed for llvm-ppc64.
The test is failing for llvm-ppc64 because for this platform the location list is not being generated at all (most likely because of a bug in PPC code optimization or generation). I will file a bug against the PPC compiler, but meanwhile, until the PPC bug is fixed, I will have to revert my change.

llvm-svn: 224000
2014-12-11 06:22:35 +00:00
Philip Reames 1e30897497 GCStrategy should not own GCFunctionInfo
This change moves the ownership and access of GCFunctionInfo (the object which describes the safepoints associated with a function compiled under GCRoot) to GCModuleInfo. Previously, this was owned by GCStrategy, which was in turn owned by GCModuleInfo. This made GCStrategy module specific, which is 'surprising' given its name and other purposes.

There are a few more changes needed, but we're getting towards the point where we can reuse GCStrategy for gc.statepoint as well.

p.s. The style of this code ends up being a mess. I was trying to move code around without otherwise changing much. Once I get the ownership structure rearranged, I will go through and fix up spacing, naming, comments, etc.

Differential Revision: http://reviews.llvm.org/D6587

llvm-svn: 223994
2014-12-11 01:47:23 +00:00
Matthias Braun 09afa1ea74 LiveInterval: Use range based for loops for subregister ranges.
llvm-svn: 223991
2014-12-11 00:59:06 +00:00
Ekaterina Romanova ceeaba7932 A fix for PR21176.
DW_OP_const <const> doesn't describe a constant value, but a value at a constant address.
The proper way to describe a constant value is DW_OP_constu <const>, DW_OP_stack_value.

Added DW_OP_stack_value to the stack.

-This line, and those below, will be ignored--

M    lib/CodeGen/AsmPrinter/DwarfDebug.cpp
A    test/DebugInfo/incorrect-variable-debugloc1.ll

llvm-svn: 223981
2014-12-10 23:19:56 +00:00
Matthias Braun 96761959d4 LiveInterval: Use more range based for loops for value numbers and segments.
llvm-svn: 223978
2014-12-10 23:07:54 +00:00
Aaron Ballman e5a2a0c9a8 Silencing a -Wsequence-point warning, and the resulting undefined behavior. NFC.
llvm-svn: 223926
2014-12-10 14:14:54 +00:00
Matthias Braun 96d7732b08 MachineVerifier: Allow physreg use if just a subreg is defined.
We can't mark partially undefined registers, so we have to allow reading
a register in the machine verifier if just parts of a register are
defined.

llvm-svn: 223896
2014-12-10 01:13:13 +00:00
Matthias Braun 21554d9b30 MachineVerifier: Allow LiveInterval segments to end at a partial write.
In the subregister liveness tracking case we do not create implicit
reads on partial register writes anymore, still we need to produce a new
SSA value for partial writes so the live segment has to end.

llvm-svn: 223895
2014-12-10 01:13:11 +00:00
Matthias Braun 279f83645c VirtRegMap: Improve block live-in info if subregister liveness is available.
llvm-svn: 223894
2014-12-10 01:13:08 +00:00
Matthias Braun d70caaf5a5 VirtRegMap: No implicit defs/uses for super registers with subreg liveness tracking.
Adding the implicit defs/uses to the superregisters is semantically questionable
but was not dangerous before as the register allocator never assigned the same
register to two overlapping LiveIntervals even when the actually live
subregisters do not overlap. With subregister liveness tracking enabled this
does actually happen and leads to subsequent bugs if we don't stop adding
the superregister defs/uses.

llvm-svn: 223892
2014-12-10 01:13:04 +00:00
Matthias Braun 587e27415d LiveRegMatrix: Respect subregister liveness when allocating registers.
llvm-svn: 223891
2014-12-10 01:13:01 +00:00
Matthias Braun a0f0c1f013 LiveIntervalUnion: Allow specification of liverange when unifying/extracting.
This allows it to add subregister ranges into the union.

llvm-svn: 223890
2014-12-10 01:12:59 +00:00
Matthias Braun 14f764c872 RegisterCoalescer: Preserve subregister liveranges.
llvm-svn: 223888
2014-12-10 01:12:52 +00:00
Matthias Braun 2079aa9140 LiveInterval: Add removeEmptySubRanges().
llvm-svn: 223887
2014-12-10 01:12:40 +00:00
Matthias Braun 8970d847c4 LiveIntervalAnalysis: Add subregister aware variants pruneValue().
llvm-svn: 223886
2014-12-10 01:12:36 +00:00
Matthias Braun e3d3b88cb9 Add a flag to enable/disable subregister liveness.
llvm-svn: 223884
2014-12-10 01:12:30 +00:00
Matthias Braun e5f861b781 LiveIntervalAnalysis: Adapt repairIntervalsInRange() to subregister liveness.
llvm-svn: 223883
2014-12-10 01:12:26 +00:00
Matthias Braun fe896c703c LiveRangeEdit: Adapt eliminateDeadDef() to subregister liveness.
llvm-svn: 223882
2014-12-10 01:12:23 +00:00
Matthias Braun 7044d69e87 LiveIntervalAnalysis: Adapt handleMove() to subregister ranges.
llvm-svn: 223881
2014-12-10 01:12:20 +00:00
Matthias Braun 20e1f38a41 LiveIntervalAnalysis: Update SubRanges in shrinkToUses().
llvm-svn: 223880
2014-12-10 01:12:18 +00:00
Matthias Braun 2f66232bde LiveIntervalAnalysis: Compute subregister ranges.
llvm-svn: 223878
2014-12-10 01:12:12 +00:00
Matthias Braun 3f1d8fdd33 LiveInterval: Add support to track liveness of subregisters.
This code adds the required data structures. Algorithms to compute it follow.

llvm-svn: 223877
2014-12-10 01:12:10 +00:00
Matthias Braun e62c207092 LiveInterval: Add a 'covers' operation to LiveRange.
llvm-svn: 223876
2014-12-10 01:12:06 +00:00
Philip Reames de226055ca Remove the Module pointer from GCStrategy and GCMetadataPrinter
In the current implementation, GCStrategy is a part of the ownership structure for the gc metadata which describes a Module. It also contains a reference to the module in question. As a result, GCStrategy instances are essentially Module specific.

I plan to transition away from this design. Instead, a GCStrategy will be owned by the LLVMContext. It will be a lightweight policy object which contains no information about the Modules or Functions involved, but can be easily reached given a Function.

The first step in this transition is to remove the direct Module reference from GCStrategy. This also requires removing the single user of this reference, the GCMetadataPrinter hierarchy. In theory, this will allow the lifetime of the printers to be scoped to the LLVMContext as well, but in practice, I'm not actually changing that. (Yet?)

An alternate design would have been to move the direct Module reference into the GCMetadataPrinter and change the keying of the owning maps to explicitly key off both GCStrategy and Module. I'm open to doing it that way instead, but didn't see much value in preserving the per Module association for GCMetadataPrinters.

The next change in this sequence will be to start unwinding the intertwined ownership between GCStrategy, GCModuleInfo, and GCFunctionInfo.

Differential Revision: http://reviews.llvm.org/D6566

llvm-svn: 223859
2014-12-09 23:57:54 +00:00
Duncan P. N. Exon Smith 5bf8fef580 IR: Split Metadata from Value
Split `Metadata` away from the `Value` class hierarchy, as part of
PR21532.  Assembly and bitcode changes are in the wings, but this is the
bulk of the change for the IR C++ API.

I have a follow-up patch prepared for `clang`.  If this breaks other
sub-projects, I apologize in advance :(.  Help me compile it on Darwin and
I'll try to fix it.  FWIW, the errors should be easy to fix, so it may
be simpler to just fix it yourself.

This breaks the build for all metadata-related code that's out-of-tree.
Rest assured the transition is mechanical and the compiler should catch
almost all of the problems.

Here's a quick guide for updating your code:

  - `Metadata` is the root of a class hierarchy with three main classes:
    `MDNode`, `MDString`, and `ValueAsMetadata`.  It is distinct from
    the `Value` class hierarchy.  It is typeless -- i.e., instances do
    *not* have a `Type`.

  - `MDNode`'s operands are all `Metadata *` (instead of `Value *`).

  - `TrackingVH<MDNode>` and `WeakVH` referring to metadata can be
    replaced with `TrackingMDNodeRef` and `TrackingMDRef`, respectively.

    If you're referring solely to resolved `MDNode`s -- post graph
    construction -- just use `MDNode*`.

  - `MDNode` (and the rest of `Metadata`) have only limited support for
    `replaceAllUsesWith()`.

    As long as an `MDNode` is pointing at a forward declaration -- the
    result of `MDNode::getTemporary()` -- it maintains a side map of its
    uses and can RAUW itself.  Once the forward declarations are fully
    resolved RAUW support is dropped on the ground.  This means that
    uniquing collisions on changing operands cause nodes to become
    "distinct".  (This already happened fairly commonly, whenever an
    operand went to null.)

    If you're constructing complex (non self-reference) `MDNode` cycles,
    you need to call `MDNode::resolveCycles()` on each node (or on a
    top-level node that somehow references all of the nodes).  Also,
    don't do that.  Metadata cycles (and the RAUW machinery needed to
    construct them) are expensive.

  - An `MDNode` can only refer to a `Constant` through a bridge called
    `ConstantAsMetadata` (one of the subclasses of `ValueAsMetadata`).

    As a side effect, accessing an operand of an `MDNode` that is known
    to be, e.g., `ConstantInt`, takes three steps: first, cast from
    `Metadata` to `ConstantAsMetadata`; second, extract the `Constant`;
    third, cast down to `ConstantInt`.

    The eventual goal is to introduce `MDInt`/`MDFloat`/etc. and have
    metadata schema owners transition away from using `Constant`s when
    the type isn't important (and they don't care about referring to
    `GlobalValue`s).

    In the meantime, I've added transitional API to the `mdconst`
    namespace that matches semantics with the old code, in order to
    avoid adding the error-prone three-step equivalent to every call
    site.  If your old code was:

        MDNode *N = foo();
        bar(isa             <ConstantInt>(N->getOperand(0)));
        baz(cast            <ConstantInt>(N->getOperand(1)));
        bak(cast_or_null    <ConstantInt>(N->getOperand(2)));
        bat(dyn_cast        <ConstantInt>(N->getOperand(3)));
        bay(dyn_cast_or_null<ConstantInt>(N->getOperand(4)));

    you can trivially match its semantics with:

        MDNode *N = foo();
        bar(mdconst::hasa               <ConstantInt>(N->getOperand(0)));
        baz(mdconst::extract            <ConstantInt>(N->getOperand(1)));
        bak(mdconst::extract_or_null    <ConstantInt>(N->getOperand(2)));
        bat(mdconst::dyn_extract        <ConstantInt>(N->getOperand(3)));
        bay(mdconst::dyn_extract_or_null<ConstantInt>(N->getOperand(4)));

    and when you transition your metadata schema to `MDInt`:

        MDNode *N = foo();
        bar(isa             <MDInt>(N->getOperand(0)));
        baz(cast            <MDInt>(N->getOperand(1)));
        bak(cast_or_null    <MDInt>(N->getOperand(2)));
        bat(dyn_cast        <MDInt>(N->getOperand(3)));
        bay(dyn_cast_or_null<MDInt>(N->getOperand(4)));

  - A `CallInst` -- specifically, intrinsic instructions -- can refer to
    metadata through a bridge called `MetadataAsValue`.  This is a
    subclass of `Value` where `getType()->isMetadataTy()`.

    `MetadataAsValue` is the *only* class that can legally refer to a
    `LocalAsMetadata`, which is a bridged form of non-`Constant` values
    like `Argument` and `Instruction`.  It can also refer to any other
    `Metadata` subclass.
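
    For illustration (this helper is mine, not part of the patch; assume the
    usual IR headers), bridging a local value into a metadata-typed call
    operand now takes two steps:

        // Value -> Metadata -> Value: the intrinsic's operand list holds
        // Value*, so the Metadata has to be re-wrapped before it can be used.
        Value *wrapLocal(LLVMContext &Ctx, Instruction *I) {
          Metadata *MD = LocalAsMetadata::get(I);   // bridge the local Value
          return MetadataAsValue::get(Ctx, MD);     // wrap it back as a Value
        }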

(I'll break all your testcases in a follow-up commit, when I propagate
this change to assembly.)

llvm-svn: 223802
2014-12-09 18:38:53 +00:00
Juergen Ributzka 8bda738221 [CGP] Rewrite pattern match for splitBranchCondition to work with Values instead.
Rewrite the pattern match code to also work with Values instead of only
Instructions. Also remove the no-longer-needed matcher (m_Instruction).

llvm-svn: 223797
2014-12-09 17:50:10 +00:00
Juergen Ributzka 194350a936 Revert "Move function to obtain branch weights into the BranchInst class. NFC."
This reverts commit r223784 and copies the 'ExtractBranchMetadata' to CodeGenPrepare.

llvm-svn: 223795
2014-12-09 17:32:12 +00:00
Juergen Ributzka c1bbcbbd32 [CodeGenPrepare] Split branch conditions into multiple conditional branches.
This optimization transforms code like:
bb1:
  %0 = icmp ne i32 %a, 0
  %1 = icmp ne i32 %b, 0
  %or.cond = or i1 %0, %1
  br i1 %or.cond, label %TrueBB, label %FalseBB

into multiple branch instructions like:

bb1:
  %0 = icmp ne i32 %a, 0
  br i1 %0, label %TrueBB, label %bb2
bb2:
  %1 = icmp ne i32 %b, 0
  br i1 %1, label %TrueBB, label %FalseBB

This optimization is already performed by SelectionDAG, but not by FastISel.
FastISel cannot perform this optimization, because it cannot generate new
MachineBasicBlocks.

Performing this optimization at CodeGenPrepare time makes it available to both
SelectionDAG and FastISel, and the implementation in SelectionDAG could then be
removed. There are currently a few differences in codegen for X86 and PPC, so
this commit only enables it for FastISel.

Reviewed by Jim Grosbach

This fixes rdar://problem/19034919.

llvm-svn: 223786
2014-12-09 16:36:13 +00:00
Owen Anderson 558012a3fc Fix a few instances found in SelectionDAG where we were not handling F16 at parity with F32 and F64.
llvm-svn: 223760
2014-12-09 06:50:39 +00:00
Hal Finkel c8cf2b88bc Handle early-clobber registers in the aggressive anti-dep breaker
The aggressive anti-dep breaker, used by the PowerPC backend during post-RA
scheduling (but is available to all targets), did not handle early-clobber MI
operands (at all). When constructing the list of available registers for the
replacement of some def operand, check the using instructions, and remove
registers assigned to early-clobbered defs from the set.

Fixes PR21452.

llvm-svn: 223727
2014-12-09 01:00:59 +00:00
Tom Stellard 3e01d47d98 MISched: Fix moving stores across barriers
This fixes an issue with ScheduleDAGInstrs::buildSchedGraph
where stores without an underlying object would not be added
as a predecessor to the current BarrierChain.

llvm-svn: 223717
2014-12-08 23:36:48 +00:00
Justin Bogner 61ba2e3996 InstrProf: An intrinsic and lowering for instrumentation based profiling
Introduce the ``llvm.instrprof_increment`` intrinsic and the
``-instrprof`` pass. These provide the infrastructure for writing
counters for profiling, as in clang's ``-fprofile-instr-generate``.

The implementation of the instrprof pass is ported directly out of the
CodeGenPGO classes in clang, and with the followup in clang that rips
that code out to use these new intrinsics this ends up being NFC.

Doing the instrumentation this way opens some doors in terms of
improving the counter performance. For example, this will make it
simple to experiment with alternate lowering strategies, and allows us
to try handling profiling specially in some optimizations if we want
to.

Finally, this drastically simplifies the frontend and puts all of the
lowering logic in one place.
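
A hedged sketch of what a frontend would emit (the helper name and the way
the arguments are obtained are illustrative, not part of this patch):

  // Emit llvm.instrprof_increment for counter `Index` of one function.
  // NamePtr is an i8* to the function's name string; FuncHash and
  // NumCounters come from the frontend's profiling bookkeeping.
  void emitCounterIncrement(Module &M, IRBuilder<> &Builder, Value *NamePtr,
                            uint64_t FuncHash, uint32_t NumCounters,
                            uint32_t Index) {
    Function *Inc =
        Intrinsic::getDeclaration(&M, Intrinsic::instrprof_increment);
    Builder.CreateCall4(Inc, NamePtr, Builder.getInt64(FuncHash),
                        Builder.getInt32(NumCounters),
                        Builder.getInt32(Index));
  }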

llvm-svn: 223672
2014-12-08 18:02:35 +00:00
Hans Wennborg 08de833c1c SelectionDAG switch lowering: Replace unreachable default with most popular case.
This can significantly reduce the size of the switch, allowing for more
efficient lowering.

I also worked with the idea of exploiting unreachable defaults by
omitting the range check for jump tables, but always ended up with a
non-negligible binary size increase. It might be worth looking into this some more.

SimplifyCFG currently does this transformation, but I'm working towards changing
that so we can optimize harder based on unreachable defaults.

Differential Revision: http://reviews.llvm.org/D6510

llvm-svn: 223566
2014-12-06 01:28:50 +00:00
Eric Christopher d1fb7e4590 These two calls were grabbing the same register info. Unify them.
llvm-svn: 223502
2014-12-05 19:23:55 +00:00
Ahmed Bougacha 55e3c2d9cf [CodeGenPrepare] Use variables for reused values. NFC.
llvm-svn: 223491
2014-12-05 18:04:40 +00:00
Hal Finkel 66d7791176 Revert "r223440 - Consider subregs when calling MI::registerDefIsDead for phys deps"
Reverting this because, while it fixes the problem in the reduced test case, it
does not fix the problem in the full test case from the bug report.

llvm-svn: 223442
2014-12-05 02:07:35 +00:00
Hal Finkel d013d99fe0 Consider subregs when calling MI::registerDefIsDead for phys deps
The scheduling dependency graph is built bottom-up within each scheduling
region, and ScheduleDAGInstrs::addPhysRegDeps is called to add output/anti
dependencies, based on physical registers, to the SUs for instructions
based on those that come before them.

In the test case, we start before post-RA scheduling with a block that looks
like this:

...
	INLINEASM <...
andc $0,$0,$2
stdcx. $0,0,$3
bne- 1b
> [sideeffect] [mayload] [maystore] [attdialect], $0:[regdef-ec:G8RC], %X6<earlyclobber,def,dead>, $1:[mem], %X3<kill>, $2:[reguse:G8RC], %X5<kill>, $3:[reguse:G8RC], %X3, $4:[mem], %X3, $5:[clobber], %CC<earlyclobber,imp-def,dead>, <<badref>>
	...
	%X4<def,dead> = ANDIo8 %X4<kill>, 1, %CR0<imp-def,dead>, %CR0GT<imp-def>
	...
	%R29<def> = ISEL %R3<undef>, %R4<kill>, %CR0GT<kill>

where it is relevant that %CC is an alias to %CR0, and that %CR0GT is a
subregister of %CR0. However, for post-RA scheduling, no dependency was added
to prevent the INLINEASM from being scheduled in between the ANDIo8 and the
ISEL (which communicate via the %CR0GT register).

In ScheduleDAGInstrs::addPhysRegDeps, when called for the %CC operand, we'd
iterate over all of its aliases (which include %CC itself and also %CR0), and
look for previously-encountered defs of those registers. We'd find the ANDIo8,
but decide not to add a dependency between the INLINEASM and the ANDIo8 because
both the INLINEASM's def of %CC is dead, and also the ANDIo8 def of %CR0 is
dead. This ignores, however, that ANDIo8 has a non-dead def of %CR0GT, a
subregister of %CR0, and thus a dependency still must exist.

To fix this problem, when calling registerDefIsDead on the SU with the def, we
also check all subregisters for possible non-dead defs, and add the dependency
if any are found.
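
Roughly, the added check looks like this (a simplified sketch, not the exact
code in the aggressive anti-dep breaker):

  // Treat Reg's def as dead only if no subregister of Reg has a live def on
  // the same instruction.
  static bool defIsReallyDead(const MachineInstr *MI, unsigned Reg,
                              const TargetRegisterInfo *TRI) {
    if (!MI->registerDefIsDead(Reg, TRI))
      return false;
    for (MCSubRegIterator SubRegs(Reg, TRI); SubRegs.isValid(); ++SubRegs)
      if (MI->definesRegister(*SubRegs, TRI) &&
          !MI->registerDefIsDead(*SubRegs, TRI))
        return false;   // e.g. %CR0 is dead but %CR0GT is live
    return true;
  }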

Fixes PR21742.

llvm-svn: 223440
2014-12-05 01:57:22 +00:00
Adrian Prantl ab255fcd09 Cleanup: Calls to getDwarfRegNum() may actually fail, if there is
no DWARF register number mapping, or if the register was a virtual
register that was never materialized. Previously, we would just emit a
bogus location; after this patch we don't emit a location at all, thanks to
an early exit.

After my bugfix in r223401 today, this doesn't actually happen on any
target that I tested this with, but it's still preferable to make the
possibility of a failure explicit.

llvm-svn: 223428
2014-12-05 01:02:46 +00:00
Adrian Prantl da7e03f1bf Simplify implementation and testcase of r223401 based on feedback from dblaikie.
llvm-svn: 223405
2014-12-04 22:58:41 +00:00
Adrian Prantl a3ae0b3b5b Debug info: If the RegisterCoalescer::reMaterializeTrivialDef() is
eliminating all uses of a vreg, update any DBG_VALUE describing that vreg
to point to the rematerialized register instead.

llvm-svn: 223401
2014-12-04 22:29:04 +00:00
Patrik Hagglund d06de4b954 Use DomTree in MachineSink to sink over diamonds.
According to a previous FIXME comment we now not only look at MBB
successors, but also handle code sinking past them:

  x = computation
  if () {} else {}
  use x

The instruction could be sunk over the whole diamond for the
if/then/else (or loop, etc), allowing it to be sunk into other blocks
after that.

Modified a test added in r204522, since one fewer spill is now present.

Minor fixes in comments.

Patch provided by Jonas Paulsson. Reviewed by Hal Finkel.

llvm-svn: 223350
2014-12-04 10:36:42 +00:00
Simon Pilgrim be24ab367b [InstCombine] Minor optimization for bswap with binary ops
Added instcombine optimizations for BSWAP with AND/OR/XOR ops:

OP( BSWAP(x), BSWAP(y) ) -> BSWAP( OP(x, y) )
OP( BSWAP(x), CONSTANT ) -> BSWAP( OP(x, BSWAP(CONSTANT) ) )
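
A sketch of the IR-level rewrite for the AND case (the helper and its
plumbing are illustrative, using the PatternMatch matchers, not the actual
patch):

  static Value *foldAndOfBSwaps(IRBuilder<> &Builder, Module *M,
                                Value *Op0, Value *Op1) {
    Value *X, *Y;
    if (!match(Op0, m_BSwap(m_Value(X))) || !match(Op1, m_BSwap(m_Value(Y))))
      return nullptr;
    Value *NewAnd = Builder.CreateAnd(X, Y);                    // OP(x, y)
    Function *Bswap =
        Intrinsic::getDeclaration(M, Intrinsic::bswap, NewAnd->getType());
    return Builder.CreateCall(Bswap, NewAnd);                   // BSWAP(OP(x, y))
  }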

Since it's just a one-liner, I've also added BSWAP to the DAGCombiner equivalent:

fold (OP (bswap x), (bswap y)) -> (bswap (OP x, y))

Refactored bswap-fold tests to use FileCheck instead of just checking that the bswaps had gone.

Differential Revision: http://reviews.llvm.org/D6407

llvm-svn: 223349
2014-12-04 09:44:01 +00:00
Elena Demikhovsky f1de34b84d Masked Load / Store Intrinsics - the CodeGen part.
I'm recommiting the codegen part of the patch.
The vectorizer part will be send to review again.

Masked Vector Load and Store Intrinsics.
Introduced new target-independent intrinsics in order to support masked vector loads and stores. The loop vectorizer optimizes loops containing conditional memory accesses by generating these intrinsics for existing targets AVX2 and AVX-512. The vectorizer asks the target about availability of masked vector loads and stores.
Added SDNodes for masked operations and lowering patterns for X86 code generator.
Examples:
<16 x i32> @llvm.masked.load.v16i32(i8* %addr, <16 x i32> %passthru, i32 4 /* align */, <16 x i1> %mask)
declare void @llvm.masked.store.v8f64(i8* %addr, <8 x double> %value, i32 4, <8 x i1> %mask)

Scalarizer for other targets (not AVX2/AVX-512) will be done in a separate patch.

http://reviews.llvm.org/D6191

llvm-svn: 223348
2014-12-04 09:40:44 +00:00
Matt Arsenault 4e27343eec Allow target to specify prefix for labels
Use the MCAsmInfo instead of the DataLayout, and allow
specifying a custom prefix for labels specifically. HSAIL
requires that labels begin with @, but global symbols with &.

llvm-svn: 223323
2014-12-04 00:06:57 +00:00
Quentin Colombet 079aba733a [RegAllocFast] Handle implicit definitions conservatively.
Prior to this commit, physical registers defined implicitly were considered free
right after their definition, i.e., like dead definitions. Therefore, their uses
had to immediately follow their definitions; otherwise the related register might
be reused to allocate a virtual register.

This commit fixes this assumption by keeping implicit definitions alive until
they are actually used. The downside is that if the implicit definition was dead
(and not marked as such), we block an otherwise available register. This is
however conservatively correct and makes the fast register allocator much more
robust in particular regarding the scheduling of the instructions.

Fixes PR21700.

llvm-svn: 223317
2014-12-03 23:38:08 +00:00
Peter Collingbourne 51d2de7b9e Prologue support
Patch by Ben Gamari!

This redefines the `prefix` attribute introduced previously and
introduces a `prologue` attribute.  There are three primary use cases
that these attributes aim to serve:

  1. Function prologue sigils

  2. Function hot-patching: Enable the user to insert `nop` operations
     at the beginning of the function which can later be safely replaced
     with a call to some instrumentation facility

  3. Runtime metadata: Allow a compiler to insert data for use by the
     runtime during execution. GHC is one example of a compiler that
     needs this functionality for its tables-next-to-code functionality.

Previously `prefix` served cases (1) and (2) quite well by allowing the user
to introduce arbitrary data at the entrypoint but before the function
body. Case (3), however, was poorly handled by this approach as it
required that prefix data was valid executable code.

Here we redefine the notion of prefix data to instead be data which
occurs immediately before the function entrypoint (i.e. the symbol
address). Since prefix data now occurs before the function entrypoint,
there is no need for the data to be valid code.

The previous notion of prefix data now goes under the name "prologue
data" to emphasize its duality with the function epilogue.

The intention here is to handle cases (1) and (2) with prologue data and
case (3) with prefix data.
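
For illustration (the constants and helper below are made up, not part of
this patch), the two kinds of data are attached through separate Function
setters:

  // Prologue data must be valid code (e.g. hot-patch nops, case 2); prefix
  // data sits just before the symbol and can be arbitrary bytes (case 3).
  void attachPatchableEntry(Function &F, Constant *RuntimeMetadata) {
    LLVMContext &Ctx = F.getContext();
    uint8_t Nops[] = {0x66, 0x90};        // one x86 two-byte nop, illustrative
    F.setPrologueData(ConstantDataArray::get(Ctx, Nops));
    F.setPrefixData(RuntimeMetadata);
  }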

References
----------

This idea arose out of discussions[1] with Reid Kleckner in response to a
proposal to introduce the notion of symbol offsets to enable handling of
case (3).

[1] http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-May/073235.html

Test Plan: testsuite

Differential Revision: http://reviews.llvm.org/D6454

llvm-svn: 223189
2014-12-03 02:08:38 +00:00
Hal Finkel bbdee93638 [PowerPC] Implement readcyclecounter for PPC32
We've long supported readcyclecounter on PPC64, but it is easier there (the
read of the 64-bit time-base register can be accomplished via a single
instruction). This now provides an implementation for PPC32 as well. On PPC32,
the time-base register is still 64 bits, but can only be read 32 bits at a time
via two separate SPRs. The ISA manual explains how to do this properly (it
involves re-reading the upper bits and looping if the counter has wrapped while
being read).

This requires PPC to implement a custom integer splitting legalization for the
READCYCLECOUNTER node, turning it into a target-specific SDAG node, which then
gets turned into a pseudo-instruction, which is then expanded to the necessary
sequence (which has three SPR reads, the comparison and the branch).

Thanks to Paul Hargrove for pointing out to me that this was still unimplemented.

llvm-svn: 223161
2014-12-02 22:01:00 +00:00
Philip Reames 72fbe7a6f0 Restructure some assertion checking based on post commit feedback by Aaron and Tom.
llvm-svn: 223150
2014-12-02 21:01:48 +00:00
Philip Reames f814a511da Appease a build bot complaining about an unused variable that's used in an assertion.
llvm-svn: 223142
2014-12-02 19:28:57 +00:00
Philip Reames 1a1bdb22bf [Statepoints 3/4] Statepoint infrastructure for garbage collection: SelectionDAGBuilder
This is the third patch in a small series.  It contains the CodeGen support for lowering the gc.statepoint intrinsic sequences (223078) to the STATEPOINT pseudo machine instruction (223085).  The change also includes the set of helper routines and classes for working with gc.statepoints, gc.relocates, and gc.results since the lowering code uses them.  

With this change, gc.statepoints should be functionally complete.  The documentation will follow in the fourth change, and there will likely be some cleanup changes, but interested parties can start experimenting now.

I'm not particularly happy with the amount of code or complexity involved with the lowering step, but at least it's fairly well isolated.  The statepoint lowering code is split into its own files and anyone not working on the statepoint support itself should be able to ignore it.

During the lowering process, we currently spill aggressively to stack. This is not entirely ideal (and we have plans to do better), but it's functional, relatively straightforward, and closely matches the implementations of the patchpoint intrinsics.  Most of the complexity comes from trying to keep relocated copies of values in the same stack slots across statepoints.  Doing so avoids the insertion of pointless load and store instructions to reshuffle the stack.  The current implementation isn't as effective as I'd like, but it is functional and 'good enough' for many common use cases.

In the long term, I'd like to figure out how to integrate the statepoint lowering with the register allocator.  In principle, we shouldn't need to eagerly spill at all.  The register allocator should do any spilling required and the statepoint should simply record that fact.  Depending on how challenging that turns out to be, we may invest in a smarter global stack slot assignment mechanism as a stopgap measure.

Reviewed by: atrick, ributzka

llvm-svn: 223137
2014-12-02 18:50:36 +00:00
Ahmed Bougacha 54b7d334c7 [MachineCSE] Clear kill-flag on registers imp-def'd by the CSE'd instruction.
Go through implicit defs of CSMI and MI, and clear the kill flags on
their uses in all the instructions between CSMI and MI.
We might have made some of the kill flags redundant, consider:
  subs  ... %NZCV<imp-def>        <- CSMI
  csinc ... %NZCV<imp-use,kill>   <- this kill flag isn't valid anymore
  subs  ... %NZCV<imp-def>        <- MI, to be eliminated
  csinc ... %NZCV<imp-use,kill>
Since we eliminated MI, and reused a register imp-def'd by CSMI
(here %NZCV), that register, if it was killed before MI, should have
that kill flag removed, because its lifetime was extended.
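
Roughly, the cleanup looks like this (a simplified sketch assuming CSMI and
MI sit in the same basic block, not the exact code):

  // Clear stale kill flags of Reg on every instruction between CSMI and MI.
  static void clearKillsBetween(MachineInstr *CSMI, MachineInstr *MI,
                                unsigned Reg, const TargetRegisterInfo *TRI) {
    MachineBasicBlock::iterator I(CSMI), E(MI);
    for (++I; I != E; ++I)
      for (MachineOperand &MO : I->operands())
        if (MO.isReg() && MO.isUse() && MO.isKill() &&
            TRI->regsOverlap(MO.getReg(), Reg))
          MO.setIsKill(false);   // Reg now lives past this point
  }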

Also, add an exhaustive testcase for the motivating example.

Reviewed by: Juergen Ributzka <juergen@apple.com>

llvm-svn: 223133
2014-12-02 18:09:51 +00:00
Philip Reames 0365f1a376 [Statepoints 2/4] Statepoint infrastructure for garbage collection: MI & x86-64 Backend
This is the second patch in a small series.  This patch contains the MachineInstruction and x86-64 backend pieces required to lower Statepoints.  It does not include the code to actually generate the STATEPOINT machine instruction and as a result, the entire patch is currently dead code.  I will be submitting the SelectionDAG parts within the next 24-48 hours.  Since those pieces are by far the most complicated, I wanted to minimize the size of that patch.  That patch will include the tests which exercise the functionality in this patch.  The entire series can be seen as one combined whole in http://reviews.llvm.org/D5683.

The STATEPOINT pseudo node is generated after all gc values are explicitly spilled to stack slots.  The purpose of this node is to wrap an actual call instruction while recording the spill locations of the meta arguments used for garbage collection and other purposes.  The STATEPOINT is modeled as modifying all of those locations to prevent backend optimizations from forwarding the value from before the STATEPOINT to after the STATEPOINT.  (Doing so would break relocation semantics for collectors which wish to relocate roots.)

The implementation of STATEPOINT is closely modeled on PATCHPOINT.  Eventually, much of the code in this patch will be removed.  The long term plan is to merge the functionality provided by statepoints and patchpoints.  Merging their implementations in the backend is likely to be a good starting point.

Reviewed by: atrick, ributzka

llvm-svn: 223085
2014-12-01 22:52:56 +00:00
Ahmed Bougacha fb6eeb74c5 [MachineVerifier] Accept a MBB with a single landing pad successor.
The MachineVerifier used to check that there was always exactly one
unconditional branch to a non-landingpad (normal) successor.
If that normal successor to an invoke BB is unreachable, it seems
reasonable to only have one successor, the landing pad.
On targets other than AArch64 (and on AArch64 with a different testcase),
the branch folder turns the branch to the landing pad into a fallthrough.
The MachineVerifier, which relies on AnalyzeBranch, is unable to check
the condition, and doesn't complain. However, it does in this specific
testcase, where the branch to the landing pad remained.
Make the MachineVerifier accept it.

llvm-svn: 223059
2014-12-01 18:43:53 +00:00
Hans Wennborg 5bef5b522b Revert r223049, r223050 and r223051 while investigating test failures.
I didn't foresee affecting the Clang test suite :/

llvm-svn: 223054
2014-12-01 17:36:43 +00:00
Hans Wennborg 1571336fb2 SelectionDAG switch lowering: Replace unreachable default with most popular case.
This can significantly reduce the size of the switch, allowing for more
efficient lowering.

I also worked with the idea of exploiting unreachable defaults by
omitting the range check for jump tables, but always ended up with a
non-negligible binary size increase. It might be worth looking into this some more.

llvm-svn: 223049
2014-12-01 17:08:32 +00:00
Akira Hatanaka b9991a2656 [stack protector] Set edge weights for newly created basic blocks.
This commit fixes a bug in stack protector pass where edge weights were not set
when new basic blocks were added to lists of successor basic blocks.

Differential Revision: http://reviews.llvm.org/D5766

llvm-svn: 222987
2014-12-01 04:27:03 +00:00
Hans Wennborg 6dfb041ffc Switch lowering: reformat some for loops etc. NFC
llvm-svn: 222962
2014-11-29 21:24:12 +00:00
Hans Wennborg 6c42d1a5de Switch lowering: Fix broken 'Figure out which block is next' code
This doesn't seem to have worked in a long time, but other optimizations
would clean it up.

llvm-svn: 222961
2014-11-29 21:17:05 +00:00
Simon Pilgrim 2bfd9129f4 Target triple OS detection tidyup. NFC
Use Triple::isOS*() helpers where possible.

llvm-svn: 222960
2014-11-29 19:18:21 +00:00
Duncan P. N. Exon Smith 9bc81fbe92 Revert "Masked Vector Load and Store Intrinsics."
This reverts commit r222632 (and follow-up r222636), which caused a host
of LNT failures on an internal bot.  I'll respond to the commit on the
list with a reproduction of one of the failures.

Conflicts:
	lib/Target/X86/X86TargetTransformInfo.cpp

llvm-svn: 222936
2014-11-28 21:29:14 +00:00
Elena Demikhovsky 6de72e5b0c Converted back to Unix format (after my last commit 222632)
llvm-svn: 222636
2014-11-23 15:21:53 +00:00
Elena Demikhovsky 9e5089a938 Masked Vector Load and Store Intrinsics.
Introduced new target-independent intrinsics in order to support masked vector loads and stores. The loop vectorizer optimizes loops containing conditional memory accesses by generating these intrinsics for existing targets AVX2 and AVX-512. The vectorizer asks the target about availability of masked vector loads and stores.
Added SDNodes for masked operations and lowering patterns for X86 code generator.
Examples:
<16 x i32> @llvm.masked.load.v16i32(i8* %addr, <16 x i32> %passthru, i32 4 /* align */, <16 x i1> %mask)
declare void @llvm.masked.store.v8f64(i8* %addr, <8 x double> %value, i32 4, <8 x i1> %mask)

Scalarizer for other targets (not AVX2/AVX-512) will be done in a separate patch.

http://reviews.llvm.org/D6191

llvm-svn: 222632
2014-11-23 08:07:43 +00:00
Manman Ren f0a582bada Debug Info: revert r222195, r222210 and r222239.
This is no longer needed after David's fix at r222377 + r222485.
rdar://18958417

llvm-svn: 222563
2014-11-21 19:55:23 +00:00
Manman Ren c98ec0e70a [Objective-C] Support a new special module flag that will be put into the
objc_imageinfo struct.

rdar://17954668

llvm-svn: 222558
2014-11-21 19:24:55 +00:00
Sanjay Patel eb4a4d5aeb Don't repeat class/function/variable names in comments. NFC.
llvm-svn: 222555
2014-11-21 18:58:38 +00:00
Sanjay Patel b06441aded Less space; NFC
llvm-svn: 222546
2014-11-21 18:05:59 +00:00
Andrea Di Biagio 0225b5bf6f [DAG] Teach how to turn a build_vector into a shuffle if some of the operands are zero.
Before this patch, the DAGCombiner only tried to convert build_vector dag nodes
into shuffles if all operands were either extract_vector_elt or undef.

This patch improves that logic and teaches the DAGCombiner how to deal with
build_vector dag nodes where one or more operands are zero. A build_vector
dag node with some zero operands is turned into a shuffle only if the resulting
shuffle mask is legal for the target.

llvm-svn: 222536
2014-11-21 14:32:06 +00:00
Andrea Di Biagio 26e8f4d166 [DAG] Refactor the shuffle combining logic in DAGCombiner. NFC.
This patch simplifies the logic that combines a pair of shuffle nodes into
a single shuffle if there is a legal mask. Also added comments to better
describe the algorithm. No functional change intended.

llvm-svn: 222522
2014-11-21 11:33:07 +00:00
Hao Liu 44e5d7a131 DAGCombiner: Allow the DAGCombiner to combine multiple FDIVs with the same divisor into FMULs by the reciprocal.
E.g., ( a / D; b / D ) -> ( recip = 1.0 / D; a * recip; b * recip)

A hook is added to allow the target to control whether it wants to do such a combine.

Reviewed in http://reviews.llvm.org/D6334

llvm-svn: 222510
2014-11-21 06:39:58 +00:00
Matthias Braun 745ea43687 RegisterCoalescer: Improve debug messages
- Show "Considering..." message after flipping so you actually see the final
  destination vreg as destination.
- Add a message on final join, so you can grep for "Success" messages to obtain
  a list of which register got merged with which.

llvm-svn: 222382
2014-11-19 19:46:17 +00:00
Matthias Braun d2f4c77800 Add a print and verify pass after the RegisterCoalescer
llvm-svn: 222381
2014-11-19 19:46:15 +00:00
Matthias Braun 47760d9667 MachineVerifier: Report register for bad liveranges
llvm-svn: 222380
2014-11-19 19:46:13 +00:00
Matthias Braun 9f87d75060 Introduce register dump helper
llvm-svn: 222379
2014-11-19 19:46:11 +00:00
Simon Pilgrim 3ac3b251a9 [X86][SSE] pslldq/psrldq byte shifts/rotation for SSE2
This patch builds on http://reviews.llvm.org/D5598 to perform byte rotation shuffles (lowerVectorShuffleAsByteRotate) on pre-SSSE3 (palignr) targets - pre-SSSE3 is only enabled on i8 and i16 vector targets where it is a more definite performance gain.

I've also added a separate byte shift shuffle (lowerVectorShuffleAsByteShift) that makes use of the ability of the SLLDQ/SRLDQ instructions to implicitly shift in zero bytes to avoid the need to create a zero register if we had used palignr.

Differential Revision: http://reviews.llvm.org/D5699

llvm-svn: 222340
2014-11-19 10:06:49 +00:00
David Blaikie 70573dcd9f Update SetVector to rely on the underlying set's insert to return a pair<iterator, bool>
This is to be consistent with StringSet and ultimately with the standard
library's associative container insert function.

This led to updating SmallSet::insert to return pair<iterator, bool>,
and then to update SmallPtrSet::insert to return pair<iterator, bool>,
and then to update all the existing users of those functions...
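
A quick sketch of the new contract (the helper is illustrative):

  // insert() now mirrors std::set::insert.
  static bool markVisited(SmallPtrSet<const Value *, 8> &Visited,
                          const Value *V) {
    auto Res = Visited.insert(V);   // std::pair<iterator, bool>
    return Res.second;              // true iff V was newly inserted
  }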

llvm-svn: 222334
2014-11-19 07:49:26 +00:00
David Blaikie 5106ce7897 Remove StringMap::GetOrCreateValue in favor of StringMap::insert
Having two ways to do this doesn't seem terribly helpful and
consistently using the insert version (which we already have) seems like
it'll make the code easier to understand for anyone working with standard
data structures. (I also updated many references to the Entry's
key and value to use first() and second instead of getKey{Data,Length,}
and get/setValue - for similar consistency)

Also removes the GetOrCreateValue functions so there's less surface area
to StringMap to fix/improve/change/accommodate move semantics, etc.
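
As a hedged example (names are mine), the old GetOrCreateValue(Name) idiom
becomes a plain insert:

  static unsigned &lookupOrCreate(StringMap<unsigned> &Counts, StringRef Name) {
    // insert() returns the existing entry unchanged if Name is present.
    StringMap<unsigned>::iterator It =
        Counts.insert(std::make_pair(Name, 0u)).first;
    return It->second;   // It->first() is the key, It->second the value
  }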

llvm-svn: 222319
2014-11-19 05:49:42 +00:00
Owen Anderson b5a259935c Fix an incorrect chain operand when expanding INSERT_VECTOR operations through the stack.
Patch by Daniil Troshkov!

llvm-svn: 222254
2014-11-18 20:50:19 +00:00
Frederic Riss fdccfc1e19 Allow DwarfCompileUnit::constructImportedEntityDIE to instantiate a GlobalVariable DIE.
Usually global variables are in a retain list and instantiated before
any call to constructImportedEntityDIE is made. This isn't true for
forward declarations though.
The testcase for this change is generated by a clang patched to emit
such forward declarations (patch at http://reviews.llvm.org/D6173
which will land soon). The updated testcase tests more than just
global variables, it now tests every type of 'using' clause we
support.

llvm-svn: 222217
2014-11-18 02:46:11 +00:00
Manman Ren 554865da5b Debug Info: In DIBuilder, the context field of a global variable is updated to
use DIScopeRef.

A paired commit at clang will follow to show cases where we will use an
identifier for the context of a global variable.

rdar://18958417

llvm-svn: 222195
2014-11-18 00:29:08 +00:00
Oliver Stannard d29db9b949 Fix optimisations of SELECT_CC which assumed result is boolean
Some optimisations in DAGCombiner cause miscompilations for targets that use
TargetLowering::UndefinedBooleanContent, because they assume that the results
of a SELECT_CC node are boolean values, and can be safely ANDed, ORed and
XORed. These optimisations are only valid for targets that use
ZeroOrOneBooleanContent or ZeroOrNegativeOneBooleanContent.

This is a follow-up to D6210/r221693.

llvm-svn: 222123
2014-11-17 10:49:31 +00:00
Craig Topper f98c606479 Add missing semicolon from r222118.
llvm-svn: 222119
2014-11-17 05:58:26 +00:00
Craig Topper cf0444ba2a Move register class name strings to a single array in MCRegisterInfo to reduce static table size and number of relocation entries.
Indices into the table are stored in each MCRegisterClass instead of a pointer. A new method, getRegClassName, is added to MCRegisterInfo and TargetRegisterInfo to lookup the string in the table.

llvm-svn: 222118
2014-11-17 05:50:14 +00:00
Craig Topper 6438fc3d05 Replace a couple asserts with static_asserts.
llvm-svn: 222114
2014-11-17 00:26:50 +00:00
Craig Topper 7f416c8acb Convert some EVTs to MVTs where only a SimpleValueType is needed.
llvm-svn: 222109
2014-11-16 21:17:18 +00:00
Andrea Di Biagio e13a0b81f4 [DAG] Improved target independent vector shuffle folding logic.
This patch teaches the DAGCombiner how to combine shuffles according to rules:
   shuffle(shuffle(A, Undef, M0), B, M1) -> shuffle(B, A, M2)
   shuffle(shuffle(A, B, M0), B, M1) -> shuffle(B, A, M2)
   shuffle(shuffle(A, B, M0), A, M1) -> shuffle(B, A, M2)

llvm-svn: 222090
2014-11-15 22:56:25 +00:00
Reid Kleckner c2291f3905 Rename EH related stuff to be more precise
Summary:
The current "WinEH" exception handling type is more about Itanium-style
LSDA tables layered on top of the Windows native unwind info format
instead of .eh_frame tables or EHABI unwind info. Use the name
"ItaniumWinEH" to better reflect the hybrid nature of the design.

Also rename isExceptionHandlingDWARF to usesItaniumLSDAForExceptions,
since the LSDA is part of the Itanium C++ ABI document, and not the
DWARF standard.

Reviewers: echristo

Subscribers: llvm-commits, compnerd

Differential Revision: http://reviews.llvm.org/D6279

llvm-svn: 222062
2014-11-14 23:31:07 +00:00
Reid Kleckner 283bc2ed28 Allow the use of functions as typeinfo in landingpad clauses
This is one step towards supporting SEH filter functions in LLVM.

llvm-svn: 221954
2014-11-14 00:35:50 +00:00
Reid Kleckner 971c3ea67b Use nullptr instead of NULL for variadic sentinels
Windows defines NULL to 0, which when used as an argument to a variadic
function, is not a null pointer constant. As a result, Clang's
-Wsentinel fires on this code. Using '0' would be wrong on most 64-bit
platforms, but both MSVC and Clang make it work on Windows. Sidestep the
issue with nullptr.
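
A small illustration (the variadic function here is made up):

  void addAll(const char *First, ...);  // expects a null-pointer sentinel

  void example() {
    addAll("a", "b", NULL);     // NULL is the int 0: wrong width in varargs
    addAll("a", "b", nullptr);  // a genuine null pointer constant, no warning
  }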

llvm-svn: 221940
2014-11-13 22:55:19 +00:00
Aditya Nandakumar 3053155652 We can get the TLOF from the TargetMachine - so the constructor no longer requires a TargetLoweringObjectFile to be passed.
llvm-svn: 221926
2014-11-13 21:29:21 +00:00
Aditya Nandakumar a27193297f This patch changes the ownership of TLOF from TargetLoweringBase to TargetMachine so that different subtargets could share the TLOF effectively
llvm-svn: 221878
2014-11-13 09:26:31 +00:00
Frederic Riss 0f7abef2cf Add an assert and a test that verify r221709's fix.
llvm-svn: 221854
2014-11-13 03:20:23 +00:00
Quentin Colombet f5485bb008 [CodeGenPrepare] Handle zero extensions in the TypePromotionHelper.
Prior to this patch the TypePromotionHelper was promoting only sign extensions.
Supporting zero extensions changes:
- How constants are extended.
- How sign extensions, zero extensions, and truncates are composed together.
- How the type of the extended operation is recorded. Now we need to know the
  kind of the extension as well as its type.

Each change is fairly small, unlike the diff.
Most of the diff is comments/variable renaming to say "extension" instead of
"sign extension".

The performance improvements on the test suite are within the noise.

Related to <rdar://problem/18310086>.

llvm-svn: 221851
2014-11-13 01:44:51 +00:00
Frederic Riss 3a6b354b3e Fix emission of Dwarf accelerator table when there are multiple CUs.
The DIE offset in the accel tables is an offset relative to the start
of the debug_info section, but we were encoding the offset to the
start of the containing CU.

llvm-svn: 221837
2014-11-12 23:48:14 +00:00
Ahmed Bougacha 026600d967 [CodeGenPrepare] Replace other uses of EVT::getEVT with TL::getValueType.
r221820 fixed a problem (PR21548) where an iPTR was used in TLI legality checks,
which isn't valid and resulted in a failed assertion.
The solution was to lower pointer types into the correct target's VT, by
using TL::getValueType instead of EVT::getEVT.

This commit changes 3 other uses of EVT::getEVT, but without any tests:
- One of these non-lowered EVTs is passed to allowsMisalignedMemoryAccesses,
which goes into the target's TL implementation and doesn't cause any problem (yet).
- Two others are passed to TLI.isOperationLegalOrCustom:
  - one only looks at extensions, so doesn't concern pointers.
  - one only looks at binary operators, so also isn't a problem.

The latter might some day be exposed to pointers and cause the same assert as
the original PR, because there's a comment hinting at also supporting cast ops.

For consistency, update all of them and be done with it.
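
To illustrate the difference (the opcode and helper are illustrative, not
from the patch):

  // EVT::getEVT(PtrTy) yields MVT::iPTR, which TLI legality checks reject;
  // TLI.getValueType(PtrTy) yields the target's real pointer VT (e.g. i64).
  static bool storeIsLegal(const TargetLowering &TLI, Type *Ty) {
    EVT VT = TLI.getValueType(Ty);
    return TLI.isOperationLegalOrCustom(ISD::STORE, VT);
  }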

llvm-svn: 221827
2014-11-12 23:05:03 +00:00
Ahmed Bougacha 0788d49a40 [CodeGenPrepare][AArch64] Fix a TLI legality check on iPTR to use a lowered instead.
Fixes PR21548.  Related to PR20474.

llvm-svn: 221820
2014-11-12 22:16:55 +00:00
Timur Iskhodzhanov 0e76a16200 Temporary fix for PR21528 - use mangled C++ function names in COFF debug info to un-break ASan on Windows
llvm-svn: 221813
2014-11-12 20:21:20 +00:00
Timur Iskhodzhanov a11b32b7e5 [COFF] Make it clearer that the symbols subsection holds function display name rather than just name
llvm-svn: 221812
2014-11-12 20:10:09 +00:00
Duncan P. N. Exon Smith de36e8040f Revert "IR: MDNode => Value"
Instead, we're going to separate metadata from the Value hierarchy.  See
PR21532.

This reverts commit r221375.
This reverts commit r221373.
This reverts commit r221359.
This reverts commit r221167.
This reverts commit r221027.
This reverts commit r221024.
This reverts commit r221023.
This reverts commit r220995.
This reverts commit r220994.

llvm-svn: 221711
2014-11-11 21:30:22 +00:00
Tom Roeder 6312f4a422 Fix build break: remove unused variable in FCFI.
llvm-svn: 221710
2014-11-11 21:26:33 +00:00
Frederic Riss 8ad4f498fb Totally forget deallocated SDNodes in SDDbgInfo.
What would happen before that commit is that the SDDbgValues associated with
a deallocated SDNode would be marked Invalidated, but SDDbgInfo would keep
a map entry keyed by the SDNode pointer pointing to this list of invalidated
SDDbgNodes. As the memory gets reused, the list might get wrongly associated
with another new SDNode. As the SDDbgValues are cloned when they are transferred,
this can lead to an exponential number of SDDbgValues being produced during
DAGCombine like in http://llvm.org/bugs/show_bug.cgi?id=20893

Note that the previous behavior wasn't really buggy as the invalidation made
sure that the SDDbgValues wouldn't be used. This commit can be considered a
memory optimization and as such is really hard to validate in a unit-test.

llvm-svn: 221709
2014-11-11 21:21:08 +00:00
Tom Roeder eb7a303d1b Add Forward Control-Flow Integrity.
This commit adds a new pass that can inject checks before indirect calls to
make sure that these calls target known locations. It supports three types of
checks and, at compile time, it can take the name of a custom function to call
when an indirect call check fails. The default failure function ignores the
error and continues.

This pass incidentally moves the function JumpInstrTables::transformType from
private to public and makes it static (with a new argument that specifies the
table type to use); this is so that the CFI code can transform function types
at call sites to determine which jump-instruction table to use for the check at
that site.

Also, this removes support for jumptables in ARM, pending further performance
analysis and discussion.

Review: http://reviews.llvm.org/D4167
llvm-svn: 221708
2014-11-11 21:08:02 +00:00
Oliver Stannard 8c2c67e63c LLVM incorrectly folds xor into select
LLVM replaces the SelectionDAG pattern (xor (set_cc cc x y) 1) with
(set_cc !cc x y), which is only correct when the xor has type i1.
Instead, we should check that the constant operand to the xor is all
ones.

llvm-svn: 221693
2014-11-11 17:36:01 +00:00
Saleem Abdulrasool d2c5d7f6da Transforms: address some late comments
We already use the llvm namespace.  Remove the unnecessary prefix.  Use the
StringRef::equals method to compare with C strings rather than instantiating
std::strings.

Addresses late review comments from David Majnemer.

llvm-svn: 221564
2014-11-08 00:00:50 +00:00
Saleem Abdulrasool 5898e09057 Transform: add SymbolRewriter pass
This introduces the symbol rewriter. This is an IR->IR transformation that is
implemented as a CodeGenPrepare pass. This allows for the transparent
adjustment of the symbols during compilation.

It provides a clean, simple, elegant solution for symbol interpositioning. This
technique is often used, for example by the various sanitizers and by performance
analysis tools.

This is controlled via a map file with a custom YAML syntax that indicates the
source-to-destination mapping, so as to avoid requiring the compiler to know the
exact details of the source-to-destination transformations.

llvm-svn: 221548
2014-11-07 21:32:08 +00:00
Lang Hames cdd9077f3a [RegAlloc] Kill off the trivial spiller - nobody is using it any more.
llvm-svn: 221474
2014-11-06 19:12:38 +00:00
Rafael Espindola 26cfbea738 Compute the correct jump table entries on 32 bit windows.
On 32-bit Windows we use label differences, and .set does not suppress
relocations, a combination that was not used before r220256.

This fixes PR21497.

llvm-svn: 221456
2014-11-06 14:39:49 +00:00
Rafael Espindola 83f0ea8982 Add three other sections when L symbols are allowed.
llvm-svn: 221436
2014-11-06 05:01:21 +00:00
Rafael Espindola bf77ed6826 Allow L symbols in no_dead_strip sections.
If a section cannot be dead stripped, it is safe to use L symbols, since
the linker will keep all of it in the end.

llvm-svn: 221431
2014-11-06 02:42:03 +00:00
Duncan P. N. Exon Smith c5754a65e6 IR: MDNode => Value: NamedMDNode::getOperator()
Change `NamedMDNode::getOperator()` from returning `MDNode *` to
returning `Value *`.  To reduce boilerplate at some call sites, add a
`getOperatorAsMDNode()` for named metadata that's expected to only
return `MDNode` -- for now, that's everything, but debug node named
metadata (such as llvm.dbg.cu and llvm.dbg.sp) will soon change.  This
is part of PR21433.

Note that there's a follow-up patch to clang for the API change.

llvm-svn: 221375
2014-11-05 18:16:03 +00:00
Andrea Di Biagio ce46b97b48 [X86] Teach method 'isVectorClearMaskLegal' how to check for legal blend masks.
This patch improves the folding of vector AND nodes into blend operations for
targets that feature SSE4.1. A vector AND node where one of the operands is
a constant build_vector with elements that are either zero or all-ones can be
converted into a blend.

This allows for example to simplify the following code:

define <4 x i32> @test(<4 x i32> %A, <4 x i32> %B) {
  %1 = and <4 x i32> %A, <i32 0, i32 0, i32 0, i32 -1>
  %2 = and <4 x i32> %B, <i32 -1, i32 -1, i32 -1, i32 0>
  %3 = or <4 x i32> %1, %2
  ret <4 x i32> %3
}

Before this patch llc (-mcpu=corei7) generated:
        andps  LCPI1_0(%rip), %xmm0, %xmm0
        andps  LCPI1_1(%rip), %xmm1, %xmm1
        orps   %xmm1, %xmm0, %xmm0
        retq

With this patch we generate a single 'vpblendw'.

llvm-svn: 221343
2014-11-05 13:04:14 +00:00
Craig Topper 12f0d9ef2c Improve logic that decides if its profitable to commute when some of the virtual registers involved have uses/defs chains connecting them to physical register. Fix up the tests that this change improves.
llvm-svn: 221336
2014-11-05 06:43:02 +00:00
David Blaikie 3a443c29b9 Provide gmlt-like inline scope information in the skeleton CU to facilitate symbolication without needing the .dwo files
Clang -gsplit-dwarf self-host -O0, binary increases by 0.0005%, -O2,
binary increases by 25%.

A large binary inside Google, split-dwarf, -O0, and other internal flags
(GDB index, etc) increases by 1.8%, optimized build is 35%.

The size impact may be somewhat greater in .o files (I haven't measured
that much - since the linked executable -O0 numbers seemed low enough)
due to relocations. These relocations could be removed if we taught the
llvm-symbolizer to handle indexed addressing in the .o file (GDB can't
cope with this just yet, but GDB won't be reading this info anyway).
Also debug_ranges could be shared between .o and .dwo, though ideally
debug_ranges would get a schema that could used index(+offset)
addressing, and move to the .dwo file, then we'd be back to sharing
addresses in the address pool again.

But for now, these sizes seem small enough to go ahead with this.

Verified that no other DW_TAGs are produced into the .o file other than
subprograms and inlined_subroutines.

llvm-svn: 221306
2014-11-04 22:12:25 +00:00
David Blaikie 9bfd7a9f43 Move cross-unit DIE caching to the DwarfFile level, so it doesn't interfere with fission-gmlt data and produce skeleton<>full unit cross referencing.
llvm-svn: 221305
2014-11-04 22:12:18 +00:00
Arnaud A. de Grandmaison a11cab3120 [PBQP] Callee saved regs should have a higher cost than scratch regs
Registers are not all equal. Some are not allocatable (infinite cost),
some have to be preserved but can be used, and some others are just free
to use.

Ensure there is a cost hierarchy reflecting this fact, so that the
allocator will favor scratch registers over callee-saved registers.

llvm-svn: 221293
2014-11-04 20:51:29 +00:00
Arnaud A. de Grandmaison 829dd81377 [PBQP] Tweak spill costs and coalescing benefits
This patch improves how the different costs (register, interference, spill
and coalescing) relate to each other. The assumption is now that:
 - coalescing (or any other "side effect" of reg alloc) is negative, and
   instead of being derived from a spill cost, they use the block
   frequency info.
 - spill costs are in the [MinSpillCost:+inf( range
 - register or interference costs are in [0.0:MinSpillCost( or +inf

The current MinSpillCost is set to 10.0, which is a random value high
enough that the current constraint builders do not need to worry about it
when setting costs. It would however be worth adding a normalization
step for register and interference costs as the last step in the
constraint builder chain to ensure they are not greater than MinSpillCost
(unless this makes sense for some architectures). This would work well
with the current builder pipeline, where all costs are tweaked relative
to each other, but could grow above MinSpillCost if the pipeline is
deep enough.

The current heuristic is tuned to depend on the number of uses of
a live interval rather than on the density of uses, as used by the greedy
allocator. This heuristic provides a few percent improvement on a number
of benchmarks (eembc, spec, ...) and will definitely need to change once
spill placement is implemented: the current spill placement is really
inefficient, so making the cost proportional to the number of uses is a
clear win.

llvm-svn: 221292
2014-11-04 20:51:24 +00:00
David Majnemer b925715c56 CodeGen: Enable DWARF emission for MS ABI targets
This is experimental, just barely enough to get things to not
immediately combust.

A note for those who are curious:
Only lld can successfully link the object files; other linkers truncate
the section names, making the debug sections illegible to debuggers.

Even with this in mind, we believe we are having trouble with SECREL
relocations.

llvm-svn: 221245
2014-11-04 08:03:31 +00:00
Sanjoy Das e839965faa The patchpoint lowering logic would crash with live constants equal to
the tombstone or empty keys of a DenseMap<int64_t, T>.  This patch
fixes the issue (and adds a test case).
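
For context (a sketch assuming the usual DenseMapInfo specialization for
64-bit integer keys; the helper is mine):

  // DenseMap cannot store an entry whose key equals either reserved value,
  // so a live constant that happens to collide needs special handling.
  static bool collidesWithReservedKeys(int64_t C) {
    return C == DenseMapInfo<int64_t>::getEmptyKey() ||
           C == DenseMapInfo<int64_t>::getTombstoneKey();
  }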

llvm-svn: 221214
2014-11-04 00:59:21 +00:00
Sanjoy Das 429c9cae09 Change logic in StackMaps::recordStackMapOpers to use the isInt<32>
predicate instead of bitwise operations.

This is not a functional change.
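
For illustration (the helper is mine; isInt<> lives in
llvm/Support/MathExtras.h):

  // isInt<32>(x) reads as "x is representable as a signed 32-bit value",
  // replacing manual shift/mask tricks.
  static bool fitsIn32BitField(int64_t Offset) {
    return isInt<32>(Offset);
  }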

llvm-svn: 221209
2014-11-04 00:06:57 +00:00
David Blaikie 5b02a19f90 Use common range handling for the CU's ranges
This generalizes the range handling for ranges in both the skeleton and
full unit, laying the foundation for the addition of more ranges (rather
than just the CU's special case) in the skeleton CU with fission+gmlt.

llvm-svn: 221202
2014-11-03 23:10:59 +00:00
David Blaikie 542616d47c Push the CURangeList down into the skeleton CU (where available) rather than the full CU
So that it may be shared between skeleton/full compile unit, for CU
ranges and other ranges to be added for fission+gmlt.

(at some point we might want some kind of object shared between the
skeleton and full compile units for all those things we only want one of
in that scope, rather than having the full unit always look through to
the skeleton... - alternatively, we might be able to have the skeleton
pointer (or another, separate pointer) point to the skeleton or to the
unit itself in non-fission, so we don't have to special case its
absence)

llvm-svn: 221186
2014-11-03 21:52:56 +00:00
David Blaikie ce343492ee Add DwarfCompileUnit::BaseAddress to track the base address used by relative addressing in debug_ranges and debug_loc
This is one of a few steps to generalize range handling to include the
CU range (thus the CU's range list will be moved into the range list
list, losing track of the base address in the process), which means
generalizing ranges from both the skeleton and full unit under fission.

And... then I can used that generalized support for ranges in
fission+gmlt where there'll be a bunch more ranges in the skeleton.

llvm-svn: 221182
2014-11-03 21:15:30 +00:00
Paul Robinson ad06e430ce Normally an 'optnone' function goes through fast-isel, which does not
call DAGCombiner. But we ran into a case (on Windows) where the
calling convention causes argument lowering to bail out of fast-isel,
and we end up in CodeGenAndEmitDAG() which does run DAGCombiner.
So, we need to make DAGCombiner check for 'optnone' after all.

Commit includes the test that found this, plus another one that got
missed in the original optnone work.

llvm-svn: 221168
2014-11-03 18:19:26 +00:00
David Blaikie 077ad48447 Cleanup some unused or trivial functions in DwarfCompileUnit
llvm-svn: 221164
2014-11-03 17:10:38 +00:00
David Blaikie bc532b44a0 Sink DwarfUnit::CURanges into DwarfCompileUnit
llvm-svn: 221161
2014-11-03 16:40:43 +00:00
Oliver Stannard cf6bfb1dd0 Revert r221150, as it broke sanitizer tests
llvm-svn: 221151
2014-11-03 12:19:03 +00:00
Oliver Stannard 652ec6ee89 Emit .eh_frame with relocations to functions, rather than sections
When LLVM emits DWARF call frame information, it currently creates a local,
section-relative symbol in the code section, which is pointed to by a
relocation on the .eh_frame section. However, for C++ we emit some functions in
section groups, and the SysV ABI has some rules to make it easier to remove
these sections
(http://www.sco.com/developers/gabi/latest/ch4.sheader.html#section_group_rules):

  A symbol table entry with STB_LOCAL binding that is defined relative to one
  of a group's sections, and that is contained in a symbol table section that is
  not part of the group, must be discarded if the group members are discarded.
  References to this symbol table entry from outside the group are not allowed.

This means that we need to use the function symbol for the relocation, not a
temporary symbol.

There was a comment in the code claiming that the local symbol was used to
avoid creating a relocation, but a relocation must be created anyway as the
code and CFI are in different sections.

llvm-svn: 221150
2014-11-03 12:02:51 +00:00
David Blaikie 89a26f012a Sink range list handling down from DwarfUnit into its only use, in DwarfCompileUnit.
llvm-svn: 221123
2014-11-03 02:41:49 +00:00
David Blaikie 4aa49b2054 Formatting
llvm-svn: 221095
2014-11-02 08:52:37 +00:00
David Blaikie cafd962d97 Add DwarfUnit::isDwoUnit and use it to generalize string creation
Currently we only need to emit skeleton strings into the CU header and
we do this by explicitly calling "addLocalString". With gmlt-in-fission,
we'll be emitting a bunch of other strings from other codepaths where
it's not statically known that these strings will be local or not.

Introduce a virtual function to indicate whether this unit is a DWO unit
or not (I'm not sure if we have a good term for this, the
opposite/alternative to 'skeleton' unit) and use that to generalize the
string emission logic so that strings can be correctly emitted in both
the skeleton and dwo unit when in split dwarf mode.

And to demonstrate that this works, switch the existing special callers
of addLocalString in the skeleton builder to addString - and they still
work. Yay.

llvm-svn: 221094
2014-11-02 08:51:37 +00:00
David Blaikie 279c451c0b Remove the last mention of LineTablesOnly from DwarfUnit, sinking it into DwarfCompileUnit
This is a useful distinction/invariant/delination to make because
LineTablesOnly mode is never relevant to type units, so it's clear that
we're not doing weird line-tables-only-with-types by making this API
choice.

It also lays the foundations nicely for adding gmlt-like data to fission
skeleton CUs while limiting the effects to CUs and not TUs.

llvm-svn: 221093
2014-11-02 08:18:06 +00:00
David Blaikie 3363a57c8e Sink DwarfUnit::applySubprogramAttributesToDefinition into DwarfCompileUnit
llvm-svn: 221092
2014-11-02 08:09:09 +00:00
David Blaikie 978020807a Sink DwarfUnit::addExpr into DwarfCompileUnit
llvm-svn: 221090
2014-11-02 07:11:55 +00:00
David Blaikie 8c485b5d74 Fix the build from the last commit
llvm-svn: 221089
2014-11-02 07:08:12 +00:00
David Blaikie 02a6333ba7 Sink DwarfUnit::applyVariableAttributes into DwarfCompileUnit
llvm-svn: 221088
2014-11-02 07:06:51 +00:00
David Blaikie 4bc0881ac7 Sink DwarfUnit::addLocationList down into DwarfCompileUnit
llvm-svn: 221087
2014-11-02 07:03:19 +00:00
David Blaikie 77895fb276 Sink DwarfUnit::addComplexAddress down into DwarfCompileUnit
llvm-svn: 221086
2014-11-02 06:58:44 +00:00
David Blaikie f7435ee6ce Push DwarfUnit::addAddress down into DwarfCompileUnit
llvm-svn: 221085
2014-11-02 06:46:40 +00:00
David Blaikie 7d48be2b7b Sink DwarfUnit::addVariableAddress into DwarfCompileUnit since type units don't have variables
llvm-svn: 221084
2014-11-02 06:37:23 +00:00
David Blaikie 192b45c1ef DebugInfo: Sink accelerator table lists down (GlobalNames/Types) into DwarfCompileUnit
llvm-svn: 221083
2014-11-02 06:16:39 +00:00
David Blaikie 98cf172175 Add DwarfUnit::addGlobalType to match DwarfUnit::addGlobalName
(these will shortly become virtual, with a null implementation in
DwarfUnit (since type units don't have accelerator tables in the current
schema) and the current implementation down in DwarfCompileUnit, moving
the actual maps there too)

llvm-svn: 221082
2014-11-02 06:06:14 +00:00
David Blaikie 871c2d9d63 DebugInfo: Refactor index type DIE initialization by rolling it into the accessor
llvm-svn: 221080
2014-11-02 03:09:13 +00:00
David Blaikie ce47366150 Be sure to initialize DwarfCompileUnit::LabelBegin now that it may be skipped in initSection
llvm-svn: 221079
2014-11-02 02:40:26 +00:00
David Blaikie b6726a9ece Don't bother creating LabelBegin for .dwo units
This would help catch cases where we might otherwise try to reference a
dwo CU label, which would be weird - because without relocations in the
dwo file it's not generally meaningful to talk about the CU offsets
there (or, if it is, we can do so in absolute terms without using a
relocation to compute it).

llvm-svn: 221078
2014-11-02 02:26:24 +00:00
David Blaikie 27e35f2302 Drop DwarfCompileUnit::getLocalLabel* in favor of just mapping through the skeleton explicitly.
Confusing to do this two different ways - I'm not too wedded to either
one, but here goes.

llvm-svn: 221076
2014-11-02 01:21:43 +00:00
David Blaikie f4bdc31271 Sink DwarfUnit::LabelBegin down into DwarfCompileUnit since that's the only place it's needed.
llvm-svn: 221075
2014-11-02 01:21:40 +00:00
David Blaikie ae57e66e36 Sink dwarf unit length emission down into DwarfUnit::emitHeader
This allows the CU label to be emitted only for compile units, as
they're the only ones that need it (so they can be referenced from
pubnames)

llvm-svn: 221072
2014-11-01 23:59:23 +00:00
David Blaikie 983bfea0d0 Remove DwarfUnit::LabelEnd in favor of computing the length of the section directly
This was a compile-unit specific label (unused in type units) and seems
unnecessary anyway when we can more easily directly compute the size of
the compile unit.

llvm-svn: 221067
2014-11-01 23:07:14 +00:00
David Blaikie a34568b220 Sink DwarfUnit::SectionSym into DwarfCompileUnit as it's only needed/used there.
llvm-svn: 221062
2014-11-01 20:06:28 +00:00
David Blaikie a5437b6bf6 Make DwarfCompileUnit::Skeleton more narrowly typed (DwarfCompileUnit* instead of DwarfUnit*) now that it's specific to DwarfCompileUnit anyway.
llvm-svn: 221060
2014-11-01 19:26:05 +00:00
David Blaikie 7cbf58af15 Sink DwarfUnit::Skeleton down into DwarfCompileUnit
Type units no longer have skeletons and it's misleading to be able to
query for a type unit's skeleton (it might incorrectly lead one to
conclude that if a unit doesn't have a skeleton it's not in a .dwo
file... ).

llvm-svn: 221055
2014-11-01 18:18:07 +00:00
David Blaikie f6dac29a83 Sink DwarfDebug::AbstractSPDies down into DwarfFile
This is the first big step to allowing gmlt-like inline scope
information in the skeleton CU. While this commit doesn't change the
functionality, it leaves only a small step to calling
"constructAbstractSubprogramDIE" on both the InfoHolder and the
SkeletonHolder (when in use), which will at least create the abstract
SP DIEs in that case, though still not the other subprograms.

llvm-svn: 221051
2014-11-01 17:21:26 +00:00
David Blaikie 2f08093dda Remove unused function
llvm-svn: 221037
2014-11-01 01:15:26 +00:00
David Blaikie a8be0a794f And... fix the build some more.
llvm-svn: 221036
2014-11-01 01:15:24 +00:00
David Blaikie 1998a2e789 Just iterate the DwarfCompileUnits rather than trying to filter them out of the list of all units.
llvm-svn: 221034
2014-11-01 01:11:19 +00:00
David Blaikie bbad0bbb2f Add '*' to auto variable that is a pointer, as per the coding conventions.
llvm-svn: 221033
2014-11-01 01:03:39 +00:00
David Blaikie a33cd6aa27 Add DwarfCompileUnit::getSkeleton that returns DwarfCompileUnit* to avoid having to cast from DwarfUnit* on every call.
llvm-svn: 221031
2014-11-01 00:50:34 +00:00
Duncan P. N. Exon Smith 3872d0084c IR: MDNode => Value: Instruction::getMetadata()
Change `Instruction::getMetadata()` to return `Value` as part of
PR21433.

Update most callers to use `Instruction::getMDNode()`, which wraps the
result in a `cast_or_null<MDNode>`.

llvm-svn: 221024
2014-11-01 00:10:31 +00:00
David Blaikie 1d96cc2ff3 Sink some of DwarfDebug::collectDeadVariables down into DwarfCompileUnit.
llvm-svn: 221010
2014-10-31 22:30:30 +00:00
David Blaikie 49be5b357b Sink most of DwarfDebug::constructAbstractSubprogramScopeDIE into DwarfCompileUnit
llvm-svn: 221005
2014-10-31 21:57:02 +00:00
Quentin Colombet c32615dfef [CodeGenPrepare] Move extractelement close to store if they can be combined.
This patch adds an optimization in CodeGenPrepare to move an extractelement
right before a store when the target can combine them.
The optimization may promote any scalar operations to vector operations in the
way to make that possible.


** Context **

Some targets use different register files for both vector and scalar operations.
This means that transitioning from one domain to another may incur copy from one
register file to another. These copies are not coalescable and may be expensive.
For example, according to the scheduling model, on cortex-A8 a vector to GPR
move is 20 cycles.


** Motivating Example **

Let us consider an example:
define void @foo(<2 x i32>* %addr1, i32* %dest) {
 %in1 = load <2 x i32>* %addr1, align 8
 %extract = extractelement <2 x i32> %in1, i32 1
 %out = or i32 %extract, 1
 store i32 %out, i32* %dest, align 4
 ret void
}

As it is, this IR generates the following assembly on armv7:
  vldr    d16, [r0]         @ vector load
  vmov.32 r0, d16[1]        @ cross-register-file copy: 20 cycles
  orr     r0, r0, #1        @ scalar bitwise or
  str     r0, [r1]          @ scalar store
  bx      lr

Whereas we could generate much faster code:
  vldr     d16, [r0]           @ vector load
  vorr.i32 d16, #0x1           @ vector bitwise or
  vst1.32  {d16[1]}, [r1:32]   @ vector extract + store
  bx       lr

Half of the computation done in the vector is useless, but this allows us to
get rid of the expensive cross-register-file copy.


** Proposed Solution **

To avoid this cross-register-copy penalty, we promote the scalar operations to
vector operations. The penalty will be removed if we manage to promote the whole
chain of computation in the vector domain.
Currently, we do that only when the chain of computation ends by a store and the
target is able to combine an extract with a store.

Stores are the most likely candidates, because other instructions produce values
that would need to be promoted and so extracted at some point [1]. Moreover,
it is customary for targets to feature stores that perform a vector extract (see
AArch64 and X86, for instance).

The proposed implementation relies on the TargetTransformInfo to decide whether
or not it is beneficial to promote a chain of computation in the vector domain.
Unfortunately, this interface is rather inaccurate for this level of detail and,
although this optimization may be beneficial for X86 and AArch64, the inaccuracy
will lead to the optimization being too aggressive.
Basically in TargetTransformInfo, everything that is legal has a cost of 1,
whereas, even if a vector type is legal, usually a vector operation is slightly
more expensive than its scalar counterpart. That will lead to too many
promotions that may not be counterbalanced by the saving of the
cross-register-file copy. For instance, on AArch64 this penalty is just 4
cycles.

For now, the optimization is just enabled for ARM prior to v8, since those
processors have a larger penalty on cross-register-file copies, and the scope is
limited to basic blocks. Because of these two factors, we limit the effects of
the inaccuracy. Indeed, I did not want to build up a fancy cost model with block
frequency and everything on top of that.

[1] We can imagine targets that can combine an extractelement with instructions
other than just stores. If we want to go in that direction, the current
interfaces must be augmented and, moreover, I think this becomes a global isel
problem.

Differential Revision: http://reviews.llvm.org/D5921

<rdar://problem/14170854>

llvm-svn: 220978
2014-10-31 17:52:53 +00:00
David Blaikie fb2efde341 Correct assert text from r220923
Noticed in post-commit review by Adrian Prantl.

llvm-svn: 220967
2014-10-31 16:45:36 +00:00
Hao Liu e02b1a068f PR20557: Fix the bug where a bogus cpu parameter crashes llc on the AArch64 backend.
Initial patch by Oleg Ranevskyy.

llvm-svn: 220945
2014-10-31 02:35:34 +00:00
Ahmed Bougacha 9f336c4ec5 [SelectionDAG] When scalarizing trunc, don't assert for legal operands.
r212242 introduced a legalizer hook, originally to let AArch64 widen
v1i{32,16,8} rather than scalarize, because the legalizer expected, when
scalarizing the result of a conversion operation, to already have
scalarized the operands.  On AArch64, v1i64 is legal, so that commit
ensured operations such as v1i32 = trunc v1i64 wouldn't assert.

It did that by choosing to widen v1 types whenever possible.  However,
v1i1 types, for which there's no legal widened type, would still trigger
the assert.

This commit fixes that, by only scalarizing a trunc's result when the
operand has already been scalarized, and introducing an extract_elt
otherwise.  
This is similar to r205625.

Fixes PR20777.

llvm-svn: 220937
2014-10-30 23:46:50 +00:00
Louis Gerbarg e8f9c78247 Fix incorrect invariant check in DAG Combine
Earlier this summer I fixed an issue where we were incorrectly combining
multiple loads that had different constraints such as alignment, invariance,
temporality, etc. Apparently in one case I made a copy-paste error and swapped
alignment and invariance.

Tests included.

rdar://18816719

llvm-svn: 220933
2014-10-30 22:21:03 +00:00
David Blaikie 76fd43c653 PR21408: Work around the appearance of duplicate variables due to problems when inlining two calls to the same function from the same call site.
llvm-svn: 220923
2014-10-30 20:20:11 +00:00
NAKAMURA Takumi f51a34ec1f Whitespace.
llvm-svn: 220857
2014-10-29 15:23:11 +00:00
David Blaikie ff468a5e0b Minimize the scope of some variables, NFC.
llvm-svn: 220759
2014-10-28 02:57:26 +00:00
Lang Hames 5fe30ca56f [PBQP] Unique allowed-sets for nodes in the PBQP graph and use pairs of these
sets as keys into a cache of interference matrix values in the Interference
constraint adder.

Creating interference matrices was one of the large remaining time-sinks in
PBQP. Caching them reduces the total compile time (when using PBQP) on the
nightly test suite by ~10%.

llvm-svn: 220688
2014-10-27 17:44:25 +00:00
David Blaikie df1cca3a24 Remove some unnecessary casts.
llvm-svn: 220658
2014-10-26 23:37:04 +00:00
Frederic Riss 987fe22789 Sink DwarfUnit::constructImportedEntityDIE into DwarfCompileUnit.
So that it has access to getOrCreateGlobalVariableDIE. If we ever support
describing using directives in C++ classes (thus requiring support in type
units), it will certainly use another mechanism anyway.

Differential Revision: http://reviews.llvm.org/D5975

llvm-svn: 220594
2014-10-24 21:31:09 +00:00
Matt Arsenault 4b7bd2d52f Fix copy paste comment
llvm-svn: 220581
2014-10-24 18:13:10 +00:00
David Blaikie 80e5b1ebd1 DebugInfo: Sink DwarfDebug::ScopeVariables down into DwarfFile
(part of refactoring to allow subprogram emission in both the skeleton
and main units to enable -gmlt-like data to be included in the skeleton
for live inlined backtracing purposes)

llvm-svn: 220578
2014-10-24 17:57:34 +00:00
David Blaikie 0dc623c232 Remove DwarfDebug::FirstCU as it has no use
It was only being used as a flag to identify the lack of debug info from
within endModule - use the section labels for that instead.

llvm-svn: 220575
2014-10-24 17:53:38 +00:00
Sanjay Patel 957efc23bb Use rsqrt (X86) to speed up reciprocal square root calcs
This is a first step for generating SSE rsqrt instructions for
reciprocal square root calcs when fast-math is allowed.

For now, be conservative and only enable this for AMD btver2
where performance improves significantly - for example, 29%
on llvm/projects/test-suite/SingleSource/Benchmarks/BenchmarkGame/n-body.c
(if we convert the data type to single-precision float).

This patch adds a two-constant version of the Newton-Raphson
refinement algorithm to DAGCombiner that can be selected by any target
via a parameter returned by getRsqrtEstimate().
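
A minimal scalar sketch of the refinement being formed (illustrative only; the
function names below are hypothetical and not part of this patch or of the
DAGCombiner API):

```
// One Newton-Raphson step for 1/sqrt(a): est' = est * (1.5 - 0.5 * a * est * est).
// The constants 1.5 and -0.5 are the "two constants" mentioned above.
float refineRsqrt(float a, float est) {
  return est * (1.5f - 0.5f * a * est * est);
}

// Refine a hardware-provided estimate as many times as the target asks for.
float fastRsqrt(float a, float hwEstimate, unsigned steps) {
  float est = hwEstimate;
  for (unsigned i = 0; i != steps; ++i)
    est = refineRsqrt(a, est);
  return est;
}
```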

See PR20900 for more details:
http://llvm.org/bugs/show_bug.cgi?id=20900

Differential Revision: http://reviews.llvm.org/D5658

llvm-svn: 220570
2014-10-24 17:02:16 +00:00
Marcello Maggioni 2259400f8e Added reset of LexicalScope in LiveDebugVariables reset function.
llvm-svn: 220545
2014-10-24 02:46:50 +00:00
Timur Iskhodzhanov 2bc90fdbdc Fix PR21189 -- Emit symbol subsection required to debug LLVM-built binaries with VS2012+
Reviewed at http://reviews.llvm.org/D5772

llvm-svn: 220544
2014-10-24 01:27:45 +00:00
David Blaikie f4c504e03c DebugInfo: Remove DwarfDebug::addScopeVariable now that it's just a trivial wrapper
llvm-svn: 220542
2014-10-24 00:43:47 +00:00
Ahmed Bougacha 7daf3b89f9 [SelectionDAG] Teach the vector scalarizer about FP conversions.
This adds support for legalization of instructions of the form:

  [fp_conv] <1 x i1> %op to <1 x double>

where fp_conv is one of fpto[us]i, [us]itofp.  This used to assert
because they were simply missing from the vector operand scalarizer.

A similar problem arose in r190830, with trunc instead.

Fixes PR20778.

Differential Revision: http://reviews.llvm.org/D5810

llvm-svn: 220533
2014-10-23 22:49:25 +00:00
Ahmed Bougacha e05afd4d1e Update comment and fix typos in assert message. (NFC)
llvm-svn: 220531
2014-10-23 22:40:34 +00:00
Tim Northover e4c7be56bf ScheduleDAG: record PhysReg dependencies represented by CopyFromReg nodes
x86's CMPXCHG -> EFLAGS consumer wasn't being recorded as a real EFLAGS
dependency because it was represented by a pair of CopyFromReg(EFLAGS) ->
CopyToReg(EFLAGS) nodes. ScheduleDAG was expecting the source to be an
implicit-def on the instruction, where the result numbers in the DAG and the
Uses list in TableGen matched up precisely.

The Copy notation seems much more robust, so this patch extends ScheduleDAG
rather than refactoring x86.

Should fix PR20376.

llvm-svn: 220529
2014-10-23 22:31:48 +00:00
David Blaikie 1dd573db45 DebugInfo: Remove DwarfDebug::CurrentFnArguments since we have to handle argument ordering of other arguments (abstract arguments) in the same way and already have code for that too.
While refactoring this code I was confused by both the name I had
introduced (addNonArgumentVariable... but it has all this logic to
handle argument numbering and keep things in order?) and by the
redundancy. Seems when I fixed the misordered inlined argument handling,
I didn't realize it was mostly redundant with the argument ordering code
(which I may've also written, I'm not sure). So let's just rely on the
more general case.

The only oddity in output this produces is that it means when we emit
all the variables for the current function, we don't track when we've
finished the argument variables and are about to start the local
variables and insert DW_AT_unspecified_parameters (for varargs
functions) there. Instead it ends up after the local variables, scopes,
etc. But this isn't invalid and doesn't cause DWARF consumers problems
that I know of... so we'll just go with that because it makes the code
nice & simple.

(though, let's see what the buildbots have to say about this - *crosses
fingers*)

There will be some cleanup commits to follow to remove the now trivial
wrappers, etc.

llvm-svn: 220527
2014-10-23 22:27:50 +00:00
David Blaikie 3b5c84008d DebugInfo: Sink DwarfDebug::addNonArgumentScopeVariable into DwarfFile.
llvm-svn: 220520
2014-10-23 22:04:30 +00:00
David Blaikie ff4181adec DebugInfo: Remove DwarfDebug::addCurrentFnArgument declaration now that it's moved to DwarfFile.
llvm-svn: 220515
2014-10-23 21:53:17 +00:00
David Blaikie 49cfc8ca8e DebugInfo: Simplify/tidy/correct global variable decl/def emission handling.
This fixes a bug (introduced by fixing the IR emitted from Clang where
the definition of a static member would be scoped within the class,
rather than within its lexical decl context) where the definition of a
static variable would be placed inside a class.

It also improves source fidelity by scoping static class member
definitions inside the lexical decl context in which they are written
(e.g.: namespace n { class foo { static int i; }; int foo::i; } - the
definition of 'i' will be within the namespace 'n' in the DWARF output
now).

Lastly, and the original goal, this reduces debug info size slightly
(and makes debug info easier to read, etc) by placing the definitions of
non-member global variables within their namespace, rather than using a
separate namespace-scoped declaration along with a definition at global
scope.

Based on patches and discussion with Frédéric.

llvm-svn: 220497
2014-10-23 19:12:43 +00:00
David Blaikie bcfa8a56b1 Remove explicit (void) use of DwarfFile::DD that was accidentally left in r220452.
Caught in post-commit review by Frédéric.

llvm-svn: 220487
2014-10-23 16:12:58 +00:00
David Blaikie f299947bfa [DebugInfo] Sink DwarfDebug::addCurrentFnArgument down into DwarfFile.
Variable handling will be sunk into DwarfFile so that abstract variables
and the like can be shared across multiple CUs (to handle cross-CU
inlining, for example).

llvm-svn: 220453
2014-10-23 00:16:05 +00:00
David Blaikie 2b22b1e9a2 [DebugInfo] Add DwarfDebug& to DwarfFile.
Use the DwarfDebug in one function that previously took it as a
parameter, and lay the foundation for using this for other operations
coming soon.

llvm-svn: 220452
2014-10-23 00:16:03 +00:00
David Blaikie 263a008525 [DebugInfo] Remove LexicalScopes::isCurrentFunctionScope and CSE a use of LexicalScopes::getCurrentFunctionScope
Now that we're sure the only root (non-abstract) scope is the current
function scope, there's no need for isCurrentFunctionScope, the property
can be tested directly instead.

llvm-svn: 220451
2014-10-23 00:06:27 +00:00
Benjamin Kramer 7ad22403fb Strength reduce constant-sized vectors into arrays. No functionality change.
llvm-svn: 220412
2014-10-22 19:55:26 +00:00
Matt Arsenault 8226ca29e2 Fix typo
llvm-svn: 220353
2014-10-22 00:28:59 +00:00
Matt Arsenault 7c93690be0 Add minnum / maxnum codegen
llvm-svn: 220342
2014-10-21 23:01:01 +00:00
Arnaud A. de Grandmaison 5c7fe7e97b Pacify bots and simplify r220321
llvm-svn: 220335
2014-10-21 21:50:49 +00:00
Arnaud A. de Grandmaison a61262f989 [PBQP] Teach PassConfig to tell if the default register allocator is used.
This enables targets to adapt their pass pipeline to the register
allocator in use. For example, with the AArch64 backend, using PBQP
with the cortex-a57, the FPLoadBalancing pass is no longer necessary.

llvm-svn: 220321
2014-10-21 20:47:22 +00:00
Arnaud A. de Grandmaison d3648d0cbe [PBQP] Fix coalescing benefits
As coalescing registers is a benefit, the cost should be improved (i.e. made smaller) when coalescing is possible.

llvm-svn: 220302
2014-10-21 16:24:15 +00:00
Rafael Espindola c606bfe660 Fix a bit of confusion about .set and produce more readable assembly.
Every target we support has support for assembly that looks like

a = b - c
.long a

What is special about MachO is that the above combination suppresses the
production of a relocation.

With this change we avoid producing the intermediary labels when they don't
add any value.

llvm-svn: 220256
2014-10-21 01:17:30 +00:00
Rafael Espindola 74dd8547db Make AsmPrinter::EmitLabelOffsetDifference a static helper and simplify.
It had exactly one caller in a position where we know hasSetDirective is true.

llvm-svn: 220250
2014-10-21 00:25:49 +00:00
Philip Reames 5a3f5f751b Introduce enum values for previously defined metadata types. (NFC)
Our metadata scheme lazily assigns IDs to string metadata, but we have a mechanism to preassign them as well.  Using a preassigned ID is helpful since we get compile-time type checking, and avoid some (minimal) string construction and comparison.  This change adds enum values for three existing metadata types:
+    MD_nontemporal = 9, // "nontemporal"
+    MD_mem_parallel_loop_access = 10, // "llvm.mem.parallel_loop_access"
+    MD_nonnull = 11 // "nonnull"

I went through and updated various uses as well.  I made no attempt to get all uses; I focused on the ones which were easily greppable and easy to translate.  For example, there were several items in LoopInfo.cpp I chose not to update.
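
A hedged sketch of the kind of update this enables (the function names below are mine, purely for illustration, not code from this change):

```
#include "llvm/IR/Instruction.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Metadata.h"

using namespace llvm;

// Before: the kind ID is looked up by string on every call.
MDNode *getNonNullMDByString(const Instruction &I) {
  return I.getMetadata("nonnull");
}

// After: the preassigned enum value gives compile-time checking of the kind
// name and skips the string construction/comparison.
MDNode *getNonNullMDByID(const Instruction &I) {
  return I.getMetadata(LLVMContext::MD_nonnull);
}
```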

llvm-svn: 220248
2014-10-21 00:13:20 +00:00
Lang Hames ad0962aec5 [PBQP] Replace the interference-constraints algorithm with a faster version
loosely based on linear scan.

On x86-64 this is good for a ~2% drop in compile time on the nightly test suite.

llvm-svn: 220143
2014-10-18 17:26:07 +00:00
Pete Cooper 230332f4fe Check for dynamic alloca's when selecting lifetime intrinsics.
TL;DR: Indexing maps with [] creates missing entries.

The long version:

When selecting lifetime intrinsics, we index the *static* alloca map with the AllocaInst we find for that lifetime.  Trouble is, we don't first check to see if this is a dynamic alloca.

On the attached example, this causes a dynamic alloca to create an entry in the static map, and returns 0 (the default) as the frame index for that lifetime.  0 was used for the frame index of the stack protector, which, given that it now has a lifetime, is coloured and merged with other stack slots.

PEI would later trigger an assert because it expects the stack protector to not be dead.

This fix ensures that we only get frame indices for static allocas, i.e., those in the map.  Dynamic ones are effectively dropped, which is suboptimal, but at least isn't completely broken.
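
The underlying pitfall is plain C++ map behavior; a minimal standalone sketch (not the actual FastISel code) of the difference between indexing and probing:

```
#include <cassert>
#include <map>

int main() {
  std::map<int, int> FrameIndices; // stand-in for the static-alloca map

  // operator[] default-constructs a missing entry, so this "lookup" silently
  // inserts {42, 0} and returns 0 -- the bug pattern described above.
  int BogusFI = FrameIndices[42];
  assert(BogusFI == 0 && FrameIndices.size() == 1);

  // find() only probes: a dynamic alloca that was never registered is
  // detected instead of being handed frame index 0.
  if (FrameIndices.find(7) == FrameIndices.end()) {
    // Not a static alloca; drop the lifetime marker rather than mis-color it.
  }
  return 0;
}
```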

rdar://problem/18672951

llvm-svn: 220099
2014-10-17 22:59:33 +00:00
Juergen Ributzka ad2363f9ee [Stackmaps] Enable invoking the patchpoint intrinsic.
Patch by Kevin Modzelewski
Reviewers: atrick, ributzka
Reviewed By: ributzka
Subscribers: llvm-commits, reames

Differential Revision: http://reviews.llvm.org/D5634

llvm-svn: 220055
2014-10-17 17:39:00 +00:00
Jan Vesely af62cf4db0 SelectionDAG: Add sext_inreg optimizations
v2: use dyn_cast
    fixup comments
v3: use cast

Reviewed-by: Matt Arsenault <arsenm2@gmail.com>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 220044
2014-10-17 14:45:25 +00:00
Juergen Ributzka fd4633e1a5 Reduce code duplication between patchpoint and non-patchpoint lowering. NFC.
This is in preparation for another patch that makes patchpoints invokable.

Reviewers: atrick, ributzka
Reviewed By: ributzka
Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5657

llvm-svn: 219967
2014-10-16 21:26:35 +00:00
Robin Morisset e2de06bef6 Erase fence insertion from SelectionDAGBuilder.cpp (NFC)
Summary:
Backends can use setInsertFencesForAtomic to signal to the middle-end that
monotonic is the only memory ordering they can accept for
stores/loads/rmws/cmpxchg. The code lowering those accesses with a stronger
ordering to fences + monotonic accesses is currently living in
SelectionDAGBuilder.cpp. In this patch I propose moving this logic out of it
for several reasons:
- There is lots of redundancy to avoid: extremely similar logic already
  exists in AtomicExpand.
- The current code in SelectionDAGBuilder does not use any target hooks; it
  does the same transformation for every backend that requires it
- As a result it is plain *unsound*, as it was apparently designed for ARM.
  It happens to mostly work for the other targets because they are extremely
  conservative, but Power for example had to switch to AtomicExpand to be
  able to use lwsync safely (see r218331).
- Because it produces IR-level fences, it cannot be made sound! This is noted
  in the C++11 standard (section 29.3, page 1140):
```
Fences cannot, in general, be used to restore sequential consistency for atomic
operations with weaker ordering semantics.
```
It can also be seen in the following example (called IRIW in the literature):
```
atomic<int> x = y = 0;
int r1, r2, r3, r4;
Thread 0:
  x.store(1);
Thread 1:
  y.store(1);
Thread 2:
  r1 = x.load();
  r2 = y.load();
Thread 3:
  r3 = y.load();
  r4 = x.load();
```
r1 = r3 = 1 and r2 = r4 = 0 is impossible as long as the accesses are all seq_cst.
But if they are lowered to monotonic accesses, no amount of fences can prevent it.
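
(For reference, a self-contained C++11 rendering of that IRIW example; this is purely illustrative and not code from the patch.)

```
#include <atomic>
#include <cassert>
#include <thread>

std::atomic<int> x(0), y(0);
int r1, r2, r3, r4;

int main() {
  std::thread t0([] { x.store(1, std::memory_order_seq_cst); });
  std::thread t1([] { y.store(1, std::memory_order_seq_cst); });
  std::thread t2([] { r1 = x.load(std::memory_order_seq_cst);
                      r2 = y.load(std::memory_order_seq_cst); });
  std::thread t3([] { r3 = y.load(std::memory_order_seq_cst);
                      r4 = x.load(std::memory_order_seq_cst); });
  t0.join(); t1.join(); t2.join(); t3.join();

  // With seq_cst accesses, all threads must agree on one total order of the
  // two stores, so this outcome is forbidden. With merely monotonic (relaxed)
  // accesses plus fences, it can still be observed on some hardware.
  assert(!(r1 == 1 && r2 == 0 && r3 == 1 && r4 == 0));
  return 0;
}
```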

This patch does three things (I could cut it into parts, but then some of them
would not be tested/testable; please tell me if you would prefer that):
- it provides a default implementation for emitLeadingFence/emitTrailingFence in
terms of IR-level fences, that mimic the original logic of SelectionDAGBuilder.
As we saw above, this is unsound, but the best that can be done without knowing
the targets well (and there is a comment warning about this risk).
- it then switches Mips/Sparc/XCore to use AtomicExpand, relying on this default
implementation (that exactly replicates the logic of SelectionDAGBuilder, so no
functional change)
- it finally erase this logic from SelectionDAGBuilder as it is dead-code.

Ideally, each target would define its own override for emitLeading/TrailingFence
using target-specific fences, but I do not know the Sparc/Mips/XCore memory model
well enough to do this, and they appear to be dealing fine with the ARM-inspired
default expansion for now (probably because they are overly conservative, as
Power was). If anyone wants to compile fences more aggressively on these
platforms, the long comment should make it clear why they should first override
emitLeading/TrailingFence.

Test Plan: make check-all, no functional change

Reviewers: jfb, t.p.northover

Subscribers: aemerson, llvm-commits

Differential Revision: http://reviews.llvm.org/D5474

llvm-svn: 219957
2014-10-16 20:34:57 +00:00
Eric Christopher 2181fb2ff3 Avoid caching the MachineFunction; we don't use it outside of
runOnMachineFunction.

llvm-svn: 219847
2014-10-15 21:06:25 +00:00
Rafael Espindola 7b61ddfa6e Simplify handling of --noexecstack by using getNonexecutableStackSection.
llvm-svn: 219799
2014-10-15 16:12:52 +00:00
Jingyue Wu 2954280f6a [MachineSink] Use the real post dominator tree
Summary:
Fixes a FIXME in MachineSinking. Instead of using the simple heuristics in
isPostDominatedBy, use the real MachinePostDominatorTree and MachineLoopInfo.
The old heuristics caused instructions to sink unnecessarily, and might create
register pressure.

This is the second try of the fix. The first one (D4814) caused a performance
regression due to failing to sink instructions out of loops (PR21115). This
patch fixes PR21115 by sinking an instruction from a deeper loop to a shallower
one regardless of whether the target block post-dominates the source.

Thanks Alexey Volkov for reporting PR21115! 

Test Plan:
Added a NVPTX codegen test to verify that our change prevents the backend from
over-sinking. It also shows the unnecessary register pressure caused by
over-sinking.

Added an X86 test to verify we can sink instructions out of loops regardless of
the dominance relationship. This test is reduced from Alexey's test in PR21115.

Updated an affected test in X86.

Also ran SPEC CINT2006 and llvm-test-suite for compilation time and runtime
performance. Results are attached separately in the review thread.

Reviewers: Jiangning, resistor, hfinkel

Reviewed By: hfinkel

Subscribers: hfinkel, bruno, volkalexey, llvm-commits, meheff, eliben, jholewinski

Differential Revision: http://reviews.llvm.org/D5633

llvm-svn: 219773
2014-10-15 03:27:43 +00:00
Gerolf Hoflehner a4c96d02a2 [AAarch64] Optimize CSINC-branch sequence
Peephole optimization that generates a single conditional branch
for csinc-branch sequences like in the examples below. This is
possible when the csinc sets or clears a register based on a condition
code and the branch checks that register. Also the condition
code may not be modified between the csinc and the original branch.

Examples:

1. Convert csinc w9, wzr, wzr, <CC>;tbnz w9, #0, 0x44
   to b.<invCC>

2. Convert csinc w9, wzr, wzr, <CC>; tbz w9, #0, 0x44
   to b.<CC>


rdar://problem/18506500

llvm-svn: 219742
2014-10-14 23:07:53 +00:00
Rafael Espindola 76936ebc49 Remove unused member variable.
Fixes pr20904.

llvm-svn: 219706
2014-10-14 18:53:16 +00:00
David Blaikie 3dfe4788ae DebugInfo: Ensure that all debug location scope chains from instructions within a function, lead to the function itself.
Let me tell you a tale...

Originally committed in r211723 after discovering a nasty case of weird
scoping due to inlining, this was reverted in r211724 after it fired in
ASan/compiler-rt.

(minor diversion where I accidentally committed/reverted again in
r211871/r211873)

After further testing and fixing bugs in ArgumentPromotion (r211872) and
Inlining (r212065) it was recommitted in r212085. Reverted in r212089
after the sanitizer buildbots still showed problems.

Fixed another bug in ArgumentPromotion (r212128) found by this
assertion.

Recommitted in r212205, reverted in r212226 after it crashed some more
on sanitizer buildbots.

Fix clang some more in r212761.

Recommitted in r212776, reverted in r212793. ASan failures.
Recommitted in r213391, reverted in r213432, trying to reproduce flakey
ASan build failure.

Fixed bugs in r213805 (ArgPromo + DebugInfo), r213952
(LiveDebugVariables strips dbg_value intrinsics in functions not
described by debug info).

Recommitted in r214761, reverted in r214999, flakey failure on Windows
buildbot.

Fixed DeadArgElimination + DebugInfo bug in r219210.

Recommitted in r219215, reverted in r219512, failure on ObjC++ atomic
properties in the test-suite on Darwin.

Fixed ObjC++ atomic properties issue in Clang in r219690.

[This commit is provided 'as is' with no hope that this is the last time
I commit this change either expressed or implied]

llvm-svn: 219702
2014-10-14 18:22:52 +00:00
David Blaikie 24026502d5 Revert "Fix stuff... again."
Accidental commit.

This reverts commit r219693.

llvm-svn: 219695
2014-10-14 17:13:09 +00:00
David Blaikie e75f963c61 Revert some parts of r196288 that were confusing and untested.
If we figure out why they should be here, let's add some testing of some
kind so we can better demonstrate why it's needed.

llvm-svn: 219694
2014-10-14 17:12:02 +00:00
David Blaikie 27549023b0 Fix stuff... again.
llvm-svn: 219693
2014-10-14 17:11:59 +00:00
Eric Christopher 307c2cb26f Remove unnecessary TargetMachine.h includes.
llvm-svn: 219672
2014-10-14 07:22:08 +00:00
Eric Christopher 6062180203 Grab the subtarget and subtarget dependent variables off of
MachineFunction rather than TargetMachine.

llvm-svn: 219671
2014-10-14 07:22:00 +00:00
Eric Christopher b66367a891 Grab the subtarget and subtarget dependent variables off of
MachineFunction rather than TargetMachine.

llvm-svn: 219670
2014-10-14 07:17:23 +00:00
Eric Christopher 92b4bcbbee Instead of the TargetMachine cache the MachineFunction
and TargetRegisterInfo in the peephole optimizer. This
makes it easier to grab subtarget dependent variables off
of the MachineFunction rather than the TargetMachine.

llvm-svn: 219669
2014-10-14 07:17:20 +00:00
Eric Christopher eb9e87f6e3 Access subtarget specific variables off of the MachineFunction's
cached subtarget and not the TargetMachine.

llvm-svn: 219668
2014-10-14 07:00:33 +00:00
Eric Christopher 99556d77ef Access the subtarget off of the MachineFunction via the DAG
scheduler or via the SelectionDAG if available. Otherwise
grab the subtarget off of the MachineFunction by going up
the parent chain.

llvm-svn: 219666
2014-10-14 06:56:25 +00:00
Eric Christopher b65c7b919c Remove the use and member variable of the TargetMachine from
MachineLICM as we can get the same data off of the MachineFunction.

llvm-svn: 219663
2014-10-14 06:26:57 +00:00
Eric Christopher 20c98938bb Have MachineInstrBundle use the MachineFunction for subtarget
access rather than the TargetMachine.

llvm-svn: 219662
2014-10-14 06:26:55 +00:00
Eric Christopher d3fa440d08 Access the subtarget off of the MachineFunction rather than
through the TargetMachine.

llvm-svn: 219661
2014-10-14 06:26:53 +00:00
Eric Christopher 2a321f74f0 Remove the TargetMachine from DFAPacketizer since it was only
being used to grab subtarget specific things that we can grab
from the MachineFunction anyhow.

llvm-svn: 219650
2014-10-14 01:03:16 +00:00
Eric Christopher 1c5fce0ebb Migrate another set of getSubtargetImpl away.
llvm-svn: 219636
2014-10-13 21:57:44 +00:00
Adrian Prantl 049d21caea Add an assertion about the integrity of the iterator.
Broken parent scope pointers in inlined DIVariables can cause
ensureAbstractVariableIsCreated to insert new abstract scopes, thus
invalidating the iterator in this loop and leading to hard-to-debug
crashes. Useful when manually reducing IR for testcases.

llvm-svn: 219628
2014-10-13 20:44:58 +00:00
Adrian Prantl 13c58820f8 constify the getters in SDNodeDbgValue.
llvm-svn: 219627
2014-10-13 20:43:47 +00:00
Chad Rosier df82a33d42 Refactor debug statement and remove dead argument. NFC.
llvm-svn: 219626
2014-10-13 19:46:39 +00:00
Benjamin Kramer 7000ca3f55 Modernize old-style static asserts. NFC.
llvm-svn: 219588
2014-10-12 17:56:40 +00:00
David Blaikie 325c5757aa Revert "DebugInfo: Ensure that all debug location scope chains from instructions within a function, lead to the function itself."
This invariant is violated (& the assertions fire) on some Objective C++
in the test-suite. Reverting while I investigate.

This reverts commit r219215.

llvm-svn: 219523
2014-10-10 18:46:21 +00:00
Hal Finkel 7a87f8a670 [MiSched] Fix a logic error in tryPressure()
Fixes a logic error in the MachineScheduler found by Steve Montgomery (and
confirmed by Andy). This has gone unfixed for months because the fix has been
found to introduce some small performance regressions. However, Andy has
recommended that, at this point, we fix this to avoid further dependence on the
incorrect behavior (and then follow up separately on any regressions), and I
agree.

Fixes PR18883.

llvm-svn: 219512
2014-10-10 17:06:20 +00:00
David Blaikie 7d6f29d1ee Simplify a few uses of DwarfDebug::SPMap
llvm-svn: 219510
2014-10-10 16:59:52 +00:00
Timur Iskhodzhanov 2cf8a1ded8 Reorder functions in WinCodeViewLineTables.cpp [NFC]
This helps read the comments and understand the code in a natural order

llvm-svn: 219508
2014-10-10 16:05:32 +00:00
Benjamin Kramer 2c99e413ba Reduce double set lookups. NFC.
llvm-svn: 219505
2014-10-10 15:32:50 +00:00
Timur Iskhodzhanov 7edfc5948b Fix a small typo, NFC
llvm-svn: 219492
2014-10-10 12:52:58 +00:00
David Blaikie 4191cbce8c Sink the per-CU part of DwarfDebug::finishSubprogramDefinitions into DwarfCompileUnit.
llvm-svn: 219477
2014-10-10 06:39:29 +00:00
David Blaikie 58410f241e Sink most of DwarfDebug::constructAbstractSubprogramScopeDIE down into DwarfCompileUnit.
llvm-svn: 219476
2014-10-10 06:39:26 +00:00
David Blaikie 9ab48849ad Avoid unnecessary map lookup/insertion.
llvm-svn: 219466
2014-10-10 03:09:38 +00:00
Sanjay Patel 3d497cd778 Improve sqrt estimate algorithm (fast-math)
This patch changes the fast-math implementation for calculating sqrt(x) from:
y = 1 / (1 / sqrt(x))
to:
y = x * (1 / sqrt(x))

This has 2 benefits: less code / faster code and one less estimate instruction 
that may lose precision.

The only target that will be affected (until http://reviews.llvm.org/D5658 is approved)
is PPC. The difference in codegen for PPC is 2 fewer flops for a single-precision sqrtf
or vector sqrtf and 4 fewer flops for a double-precision sqrt.
We also eliminate a constant load and extra register usage.

Differential Revision: http://reviews.llvm.org/D5682

llvm-svn: 219445
2014-10-09 21:26:35 +00:00
Sanjay Patel 6d28da10e5 delete function names from comments
llvm-svn: 219444
2014-10-09 21:24:46 +00:00
David Blaikie 73cc705a37 Remove unused parameter
llvm-svn: 219440
2014-10-09 20:36:27 +00:00
David Blaikie 78b65b6f2c Sink DwarfDebug::createAndAddScopeChildren down into DwarfCompileUnit.
llvm-svn: 219437
2014-10-09 20:26:15 +00:00
David Blaikie 1d072348cf Sink DwarfDebug::constructSubprogramScopeDIE down into DwarfCompileUnit
llvm-svn: 219436
2014-10-09 20:21:36 +00:00
David Blaikie 8b2fdb83c5 Sink DwarfDebug::createScopeChildrenDIE down into DwarfCompileUnit.
llvm-svn: 219422
2014-10-09 18:24:28 +00:00
Lang Hames 8f31f448c5 [PBQP] Replace PBQPBuilder with composable constraints (PBQPRAConstraint).
This patch removes the PBQPBuilder class and its subclasses and replaces them
with a composable constraints class: PBQPRAConstraint. This allows constraints
that are only required for optimisation (e.g. coalescing, soft pairing) to be
mixed and matched.

This patch also introduces support for target writers to supply custom
constraints for their targets by overriding a TargetSubtargetInfo method:

std::unique_ptr<PBQPRAConstraints> getCustomPBQPConstraints() const;

This patch should have no effect on allocations.

llvm-svn: 219421
2014-10-09 18:20:51 +00:00
David Blaikie 4a1a44e3bf Sink DwarfDebug.cpp::constructVariableDIE into DwarfCompileUnit.
llvm-svn: 219419
2014-10-09 17:56:39 +00:00
David Blaikie ee7df55306 Move DwarfUnit::constructVariableDIE down to DwarfCompileUnit, since it's only needed there.
llvm-svn: 219418
2014-10-09 17:56:36 +00:00
David Blaikie 0fbf8bdb08 Sink DwarfDebug::constructLexicalScopeDIE into DwarfCompileUnit
llvm-svn: 219414
2014-10-09 17:08:42 +00:00
David Blaikie a09bd0a15a Missing reformatting
llvm-svn: 219413
2014-10-09 17:08:38 +00:00
David Blaikie 01b48a84dc Sink DwarfDebug::constructInlinedScopeDIE into DwarfCompileUnit
This introduces access to the AbstractSPDies map from DwarfDebug so
DwarfCompileUnit can access it. Eventually this'll sink down to
DwarfFile, but it'll still be generically accessible - not much
encapsulation to provide it. (constructInlinedScopeDIE could stay
further up, in DwarfFile to avoid exposing this - but I don't think
that's particularly better)

llvm-svn: 219411
2014-10-09 16:50:53 +00:00
Eric Christopher edba30c434 Remove more calls to getSubtargetImpl from the schedulers and
remove cached or unnecessary TargetMachines.

llvm-svn: 219387
2014-10-09 06:28:06 +00:00
Eric Christopher 143f02c47d Remove unused argument to CreateTargetScheduleState and change
the TargetMachine to a TargetSubtargetInfo since everything
we wanted is off of that.

llvm-svn: 219382
2014-10-09 01:59:35 +00:00
Eric Christopher caf275126e Remove uses of getSubtargetImpl from ResourcePriorityQueue and
replace them with calls off of the MachineFuncton.

llvm-svn: 219381
2014-10-09 01:59:31 +00:00
Eric Christopher 147c2ea05a Remove the uses of getSubtargetImpl from InstrEmitter and remove
the now unused TargetMachine variable.

llvm-svn: 219379
2014-10-09 01:35:29 +00:00
Eric Christopher 85de8f98a9 Use the subtarget on the dag to get TargetFrameLowering rather
than off the target machine.

llvm-svn: 219378
2014-10-09 01:35:27 +00:00
Eric Christopher 2ae2de7562 Remove uses of the TargetMachine from FunctionLoweringInfo
via caching TargetLowering and using the MachineFunction.

llvm-svn: 219375
2014-10-09 00:57:31 +00:00
David Blaikie de12375c96 Push DwarfDebug::attachRangesOrLowHighPC down into DwarfCompileUnit
llvm-svn: 219372
2014-10-09 00:21:42 +00:00
David Blaikie 524002004d Sink DwarfDebug::addScopeRangeList down into DwarfCompileUnit
(& add a few accessors/make a couple of things public for this - it's a
bit of a toss-up, but I think I prefer it this way, keeping some more of
the meaty code down in DwarfCompileUnit - if only to make for smaller
implementation files, etc)

I think we could simplify range handling a bit if we removed the range
lists from each unit and just put a single range list on DwarfDebug,
similar to address pooling.

llvm-svn: 219370
2014-10-09 00:11:39 +00:00
Eric Christopher 40cba91ad1 Remove unnecessary include.
llvm-svn: 219368
2014-10-08 23:38:40 +00:00
Eric Christopher f55d4714d2 Use both the cached TLI and the subtarget off of the DAG in
the DAG combiner.

llvm-svn: 219367
2014-10-08 23:38:39 +00:00
Eric Christopher 4e3d6ded99 Remove getSubtargetImpl calls from FastISel, we can get it from
the MachineFunction where it's already cached.

llvm-svn: 219366
2014-10-08 23:38:33 +00:00
David Blaikie e5feec502d Sink DwarfUnit::addSectionDelta into DwarfCompileUnit, the only place it's needed.
llvm-svn: 219364
2014-10-08 23:30:05 +00:00
David Blaikie 33702a31e8 Reformat some stuff I missed in recent previous commits
llvm-svn: 219356
2014-10-08 23:09:42 +00:00
David Blaikie 6c0ee4ece3 Sink and coalesce DwarfDebug.cpp::addSectionLabel and DwarfUnit::addSectionLabel down into DwarfCompileUnit::addSectionLabel
llvm-svn: 219351
2014-10-08 22:46:27 +00:00
Eric Christopher ffcbe9b048 Remove dead call to getTypeToTransformTo. The result is
unused.

llvm-svn: 219347
2014-10-08 22:25:45 +00:00
David Blaikie f76aeaec66 DebugInfo: The rest of pushing DwarfDebug::constructScopeDIE down into DwarfCompileUnit
Funnily enough, I copied it, but didn't actually remove the original in
r219345. Let's do that.

llvm-svn: 219346
2014-10-08 22:23:10 +00:00
David Blaikie 9c65b1355c Push DwarfDebug::constructScopeDIE down into DwarfCompileUnit
One of many steps to generalize subprogram emission to both the DWO and
non-DWO sections (to emit -gmlt-like data under fission). Once the
functions are pushed down into DwarfCompileUnit some of the data
structures will be pushed at least into DwarfFile so that they can be
unique per-file, allowing emission to both files independently.

llvm-svn: 219345
2014-10-08 22:20:02 +00:00
Eric Christopher 1e845f269b Remove a bunch of getSubtargetImpl calls since we already have
a cached TLI instance.

llvm-svn: 219342
2014-10-08 21:08:32 +00:00
Timur Iskhodzhanov 5fcaeebb72 Fix COFF section index relocation: it should be 16 bits, not 32
Original patch by Andrey Guskov!
http://reviews.llvm.org/D5651

llvm-svn: 219327
2014-10-08 18:01:49 +00:00
Eric Christopher 58a2461368 Use the TargetLowering information we already have on the
SelectionDAG in SelectionDAGBuilder rather than going through
the TargetMachine for lookup.

llvm-svn: 219292
2014-10-08 09:50:54 +00:00
Eric Christopher d2670a34c9 Grab the TargetRegisterInfo off of the subtarget from the
MachineFunction rather than a lookup on the TargetMachine
to avoid unnecessary lookups.

llvm-svn: 219291
2014-10-08 09:50:52 +00:00
Eric Christopher 000ef037d4 Replace calls to get the subtarget and TargetFrameLowering with
cached variables and a single call in the constructor.

llvm-svn: 219287
2014-10-08 08:46:34 +00:00
Eric Christopher 51bedaf223 Use cached subtarget rather than looking it up on the
TargetMachine again.

llvm-svn: 219285
2014-10-08 07:51:41 +00:00
Eric Christopher b17140de35 Cache TargetLowering on SelectionDAGISel and update previous
calls to getTargetLowering() with the cached variable.

llvm-svn: 219284
2014-10-08 07:32:17 +00:00
Eric Christopher 60eb343e83 Cache SelectionDAGISel TargetInstrInfo lookups on the class and
propagate. Also use the TargetSubtargetInfo and the MachineFunction
and move TargetRegisterInfo query closer to uses.

llvm-svn: 219273
2014-10-08 01:58:03 +00:00
Eric Christopher 896044822e Reset the target options and optimization level as the first
thing we do inside selection dag. This code needs to be
migrated to queries on the function rather than global
data, but this organizes things before we start grabbing
the subtarget.

llvm-svn: 219271
2014-10-08 01:58:01 +00:00
Eric Christopher 8d07f44560 Have the selection dag grab TargetLowering off of the subtarget
inside init rather than have it passed in as an argument.

llvm-svn: 219270
2014-10-08 01:57:58 +00:00
Eric Christopher d9636c1dcf Have SelectionDAG's subtarget TargetSelectionDAGInfo be set
during init rather than construction time.

llvm-svn: 219262
2014-10-08 00:32:59 +00:00
Sanjay Patel 25d3c1cf61 typos
llvm-svn: 219221
2014-10-07 17:38:33 +00:00
Sanjay Patel eb0cc1bbf3 typos
llvm-svn: 219220
2014-10-07 17:36:50 +00:00
David Blaikie ff669d1723 DebugInfo: Ensure that all debug location scope chains from instructions within a function, lead to the function itself.
Let me tell you a tale...

Originally committed in r211723 after discovering a nasty case of weird
scoping due to inlining, this was reverted in r211724 after it fired in
ASan/compiler-rt.

(minor diversion where I accidentally committed/reverted again in
r211871/r211873)

After further testing and fixing bugs in ArgumentPromotion (r211872) and
Inlining (r212065) it was recommitted in r212085. Reverted in r212089
after the sanitizer buildbots still showed problems.

Fixed another bug in ArgumentPromotion (r212128) found by this
assertion.

Recommitted in r212205, reverted in r212226 after it crashed some more
on sanitizer buildbots.

Fix clang some more in r212761.

Recommitted in r212776, reverted in r212793. ASan failures.
Recommitted in r213391, reverted in r213432, trying to reproduce flakey
ASan build failure.

Fixed bugs in r213805 (ArgPromo + DebugInfo), r213952
(LiveDebugVariables strips dbg_value intrinsics in functions not
described by debug info).

Recommitted in r214761, reverted in r214999, flakey failure on Windows
buildbot.

Fixed DeadArgElimination + DebugInfo bug in r219210.

Recommitting and hoping that's the last of it.

[That one burned down, fell over, then sank into the swamp.]

llvm-svn: 219215
2014-10-07 16:56:20 +00:00
Hal Finkel 9808595319 [DAGCombine] Remove SIGN_EXTEND-related inf-loop
The patch's author points out that, despite the function's documentation,
getSetCCResultType is only used to get the SETCC result type (with one
here-removed problematic exception). In one case, getSetCCResultType was being
used to get the predicate type to use for a SELECT node, and then
SIGN_EXTENDing (or truncating) to get the input predicate to match that type.
Unfortunately, this was happening inside visitSIGN_EXTEND, and creating new
SIGN_EXTEND nodes was causing an infinite loop. In addition, this behavior was
wrong if a target was not using ZeroOrNegativeOneBooleanContent. Lastly, the
extension/truncation seems unnecessary here: SELECT is defined as:

  Select(COND, TRUEVAL, FALSEVAL). If the type of the boolean COND is not i1
  then the high bits must conform to getBooleanContents.

So here we remove this use of getSetCCResultType and update
getSetCCResultType's documentation to reflect its actual uses.

Patch by deadal nix!

llvm-svn: 219141
2014-10-06 20:19:47 +00:00
Sanjay Patel 7bc9185ab5 Fast-math fold: x / (y * sqrt(z)) -> x * (rsqrt(z) / y)
The motivation is to recognize code such as this from /llvm/projects/test-suite/SingleSource/Benchmarks/BenchmarkGame/n-body.c:

float distance = sqrt(dx * dx + dy * dy + dz * dz);
float mag = dt / (distance * distance * distance);

Without this patch, we don't match the sqrt as a reciprocal sqrt, so for PPC the new testcase in this patch produces:

   addis 3, 2, .LCPI4_2@toc@ha
   lfs 4, .LCPI4_2@toc@l(3)
   addis 3, 2, .LCPI4_1@toc@ha
   lfs 0, .LCPI4_1@toc@l(3)
   fcmpu 0, 1, 4
   beq 0, .LBB4_2
# BB#1:
   frsqrtes 4, 1
   addis 3, 2, .LCPI4_0@toc@ha
   lfs 5, .LCPI4_0@toc@l(3)
   fnmsubs 13, 1, 5, 1
   fmuls 6, 4, 4
   fmadds 1, 13, 6, 5
   fmuls 1, 4, 1
   fres 4, 1                <--- reciprocal of reciprocal square root
   fnmsubs 1, 1, 4, 0
   fmadds 4, 4, 1, 4
.LBB4_2:
   fmuls 1, 4, 2
   fres 2, 1
   fnmsubs 0, 1, 2, 0
   fmadds 0, 2, 0, 2
   fmuls 1, 3, 0
   blr

After the patch, this simplifies to:

frsqrtes 0, 1
addis 3, 2, .LCPI4_1@toc@ha
fres 5, 2
lfs 4, .LCPI4_1@toc@l(3)
addis 3, 2, .LCPI4_0@toc@ha
lfs 7, .LCPI4_0@toc@l(3)
fnmsubs 13, 1, 4, 1
fmuls 6, 0, 0
fnmsubs 2, 2, 5, 7
fmadds 1, 13, 6, 4
fmadds 2, 5, 2, 5
fmuls 0, 0, 1
fmuls 0, 0, 2
fmuls 1, 3, 0
blr

Differential Revision: http://reviews.llvm.org/D5628

llvm-svn: 219139
2014-10-06 19:31:18 +00:00
Benjamin Kramer 6bf8af5de9 DbgValueHistoryCalculator: Store modified registers in a BitVector instead of std::set.
And iterate over the smaller map instead of the larger set first.  Reduces the time spent in
calculateDbgValueHistory by 30-40%.
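
Roughly the shape of the change, as a hedged standalone sketch (not the actual DbgValueHistoryCalculator code):

```
#include "llvm/ADT/BitVector.h"
#include <set>

// Before: clobbered registers tracked in an ordered set; every insert and
// lookup is a tree operation with node allocations.
void clobberSet(std::set<unsigned> &Clobbered, unsigned Reg) {
  Clobbered.insert(Reg);
}
bool isClobbered(const std::set<unsigned> &Clobbered, unsigned Reg) {
  return Clobbered.count(Reg) != 0;
}

// After: one bit per register; set/test are constant time and allocation-free
// once the vector has been sized to the number of target registers.
void clobberBV(llvm::BitVector &Clobbered, unsigned Reg) {
  Clobbered.set(Reg);
}
bool isClobbered(const llvm::BitVector &Clobbered, unsigned Reg) {
  return Clobbered.test(Reg);
}
```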

llvm-svn: 219123
2014-10-06 15:31:04 +00:00
David Blaikie febfafd13a DebugInfo: Sink constructImportedEntityDIE down into DwarfUnit from DwarfDebug.
It was just calling a bunch of DwarfUnit functions anyway, as can be
seen by the simplification of removing "TheCU" from all the function
calls in the implementation.

llvm-svn: 219103
2014-10-06 05:37:24 +00:00
Chandler Carruth daa1ff985c [x86, dag] Teach the DAG combiner to prune inputs to a vector_shuffle
that are unused.

This allows the combiner to delete math feeding shuffles where the math
isn't actually necessary. This improves some of the vperm2x128 tests
that regressed when the vector shuffle lowering started actually
generating vperm instructions rather than forcibly decomposing them.

Sadly, this isn't enough to get this *really* right because we still
form a completely unnecessary permutation. To fix that, we also need to
fold shuffles which just rearrange concatenated or inserted subvectors.

llvm-svn: 219086
2014-10-05 19:14:34 +00:00
David Blaikie 60b8662ea7 Remove unused map
This became unnecessary/unused in r208636

llvm-svn: 219085
2014-10-05 16:31:13 +00:00
Benjamin Kramer 2e52f02864 Make AAMDNodes ctor and operator bool (!!!) explicit, mop up bugs and weirdness exposed by it.
llvm-svn: 219068
2014-10-04 22:44:29 +00:00
Benjamin Kramer c6cc58e703 Remove unnecessary copying or replace it with moves in a bunch of places.
NFC.

llvm-svn: 219061
2014-10-04 16:55:56 +00:00
David Blaikie cda2aa823e Sink DwarfDebug::updateSubprogramScopeDIE into DwarfCompileUnit
This requires exposing some of the current function state from
DwarfDebug. I hope there's not too much of that to expose as I go
through all the functions, but it still seems nicer to expose singular
data down to multiple consumers, than have consumers expose raw mapping
data structures up to DwarfDebug for building subprograms.

Part of a series of refactoring to allow subprograms in both the
skeleton and dwo CUs under Fission.

llvm-svn: 219060
2014-10-04 16:24:00 +00:00
David Blaikie 8945219dc9 Reformatting accidentally left out of r219057
llvm-svn: 219059
2014-10-04 16:00:26 +00:00
David Blaikie 14499a7d68 Sink DwarfDebug::attachLowHighPC into DwarfCompileUnit
One of many things to sink down into DwarfCompileUnit to allow handling
of subprograms in both the skeleton and dwo CU under Fission.

llvm-svn: 219058
2014-10-04 15:58:47 +00:00
David Blaikie 37c5231051 Move DwarfCompileUnit from DwarfUnit.h to its own header (DwarfCompileUnit.h)
In preparation for sinking all the subprogram emission code down from
DwarfDebug into DwarfCompileUnit, this will avoid bloating
DwarfUnit.h/cpp greatly and make concerns a bit more clear/isolated.

(sinking this handling down is part of the work to handle emitting
minimal subprograms for -gmlt-like data into the skeleton CU under
fission)

llvm-svn: 219057
2014-10-04 15:49:50 +00:00
Duncan P. N. Exon Smith 176b691d32 Revert "Revert "DI: Fold constant arguments into a single MDString""
This reverts commit r218918, effectively reapplying r218914 after fixing
an Ocaml bindings test and an Asan crash.  The root cause of the latter
was a tightened-up check in `DILexicalBlock::Verify()`, so I'll file a
PR to investigate who requires the loose check (and why).

Original commit message follows.

--

This patch addresses the first stage of PR17891 by folding constant
arguments together into a single MDString.  Integers are stringified and
a `\0` character is used as a separator.

Part of PR17891.

Note: I've attached my testcases upgrade scripts to the PR.  If I've
just broken your out-of-tree testcases, they might help.

llvm-svn: 219010
2014-10-03 20:01:09 +00:00
Adam Nemet ff63a2dc51 [ISel] Keep matching state consistent when folding during X86 address match
In the X86 backend, matching an address is initiated by the 'addr' complex
pattern and its friends.  During this process we may reassociate and-of-shift
into shift-of-and (FoldMaskedShiftToScaledMask) to allow folding of the
shift into the scale of the address.

However, as demonstrated by the testcase, this can trigger CSE not only of the
shift and the AND, which the code is prepared for, but also of the underlying load
node.  In the testcase this node is sitting in the RecordedNode and MatchScope
data structures of the matcher and becomes a deleted node upon CSE.  Returning
from the complex pattern function, we try to access it again hitting an assert
because the node is no longer a load even though this was checked before.

Now obviously changing the DAG this late is bending the rules but I think it
makes sense somewhat.  Outside of addresses we prefer and-of-shift because it
may lead to smaller immediates (FoldMaskAndShiftToScale is an even better
example because it creates a non-canonical node).  We currently don't recognize
addresses during DAGCombiner where arguably this canonicalization should be
performed.  On the other hand, having this in the matcher allows us to cover
all the cases where an address can be used in an instruction.

I've also talked a little bit to Dan Gohman on llvm-dev who added the RAUW for
the new shift node in FoldMaskedShiftToScaledMask.  This RAUW is responsible
for initiating the recursive CSE on users
(http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-September/076903.html) but it
is not strictly necessary since the shift is hooked into the visited user.  Of
course it's safer to keep the DAG consistent at all times (e.g. for accurate
number of uses, etc.).

So rather than changing the fundamentals, I've decided to continue along the
previous patches and detect the CSE.  This patch installs a very targeted
DAGUpdateListener for the duration of a complex-pattern match and updates the
matching state accordingly.  (Previous patches used HandleSDNode to detect the
CSE but that's not practical here).  The listener is only installed on X86.
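
Roughly, the mechanism looks like the sketch below (simplified and illustrative; the class name and bookkeeping here are not the exact code added in this patch):

```
#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/CodeGen/SelectionDAG.h"

using namespace llvm;

// Installed for the duration of a complex-pattern match; SelectionDAG notifies
// it whenever CSE/RAUW deletes or replaces a node, so the matcher can tell
// that a recorded node (e.g. the load) is no longer valid.
class MatchStateUpdater : public SelectionDAG::DAGUpdateListener {
  SmallPtrSet<const SDNode *, 8> DeletedNodes;

public:
  explicit MatchStateUpdater(SelectionDAG &DAG)
      : SelectionDAG::DAGUpdateListener(DAG) {}

  void NodeDeleted(SDNode *N, SDNode * /*Replacement*/) override {
    DeletedNodes.insert(N);
  }

  bool wasDeleted(const SDNode *N) const { return DeletedNodes.count(N); }
};
```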

I tested that there is no measurable overhead due to this while running
through the spec2k BC files with llc.  The only thing we pay for is the
creation of the listener.  The callback never ever triggers in spec2k since
this is a corner case.

Fixes rdar://problem/18206171

llvm-svn: 219009
2014-10-03 20:00:34 +00:00
Benjamin Kramer e12a6bac32 Eliminate some deep std::vector copies. NFC.
llvm-svn: 218999
2014-10-03 18:33:16 +00:00
Renato Golin 4e31ae1051 Revert 202433 - Provide a target override for the latest regalloc heuristic
That commit was introduced in order to help investigate a problem in ARM
codegen breaking from commit 202304 (Add a limit to the heuristic that register
allocates instructions in local order). Recent analysis indicated that the
problem no longer exists, so I'm reverting this change.

See PR18996.

llvm-svn: 218981
2014-10-03 12:20:53 +00:00
Chandler Carruth 7425c8c279 Fix the threshold added in r186434 (a re-apply of r185393) and updated
to be a ManagedStatic in r218163 to not be a global variable written to and
read from within the innards of SpillPlacement.

This will fix a really scary race condition for anyone that has two
copies of LLVM running spill placement concurrently. Yikes!

This will also fix a really significant compile time hit that r218163
caused because the spill placement threshold read is actually in the
*very* hot path of this code. The memory fence on each read was showing
up as huge compile time regressions when spilling is responsible for
most of the compile time. For example, optimizing sanitized code showed
over 50% compile time regressions here. =/

llvm-svn: 218921
2014-10-02 22:23:14 +00:00
Duncan P. N. Exon Smith 786cd049fc Revert "DI: Fold constant arguments into a single MDString"
This reverts commit r218914 while I investigate some bots.

llvm-svn: 218918
2014-10-02 22:15:31 +00:00
Duncan P. N. Exon Smith 571f97bd90 DI: Fold constant arguments into a single MDString
This patch addresses the first stage of PR17891 by folding constant
arguments together into a single MDString.  Integers are stringified and
a `\0` character is used as a separator.

Part of PR17891.

Note: I've attached my testcases upgrade scripts to the PR.  If I've
just broken your out-of-tree testcases, they might help.

llvm-svn: 218914
2014-10-02 21:56:57 +00:00
Adrian Prantl 87b7eb9d0f Move the complex address expression out of DIVariable and into an extra
argument of the llvm.dbg.declare/llvm.dbg.value intrinsics.

Previously, DIVariable was a variable-length field that has an optional
reference to a Metadata array consisting of a variable number of
complex address expressions. In the case of OpPiece expressions this is
wasting a lot of storage in IR, because when an aggregate type is, e.g.,
SROA'd into all of its n individual members, the IR will contain n copies
of the DIVariable, all alike, only differing in the complex address
reference at the end.

By making the complex address into an extra argument of the
dbg.value/dbg.declare intrinsics, all of the pieces can reference the
same variable and the complex address expressions can be uniqued across
the CU, too.
Down the road, this will allow us to move other flags, such as
"indirection" out of the DIVariable, too.

The new intrinsics look like this:
declare void @llvm.dbg.declare(metadata %storage, metadata %var, metadata %expr)
declare void @llvm.dbg.value(metadata %storage, i64 %offset, metadata %var, metadata %expr)

This patch adds a new LLVM-local tag to DIExpressions, so we can detect
and pretty-print DIExpression metadata nodes.

What this patch doesn't do:

This patch does not touch the "Indirect" field in DIVariable; but moving
that into the expression would be a natural next step.

http://reviews.llvm.org/D4919
rdar://problem/17994491

Thanks to dblaikie and dexonsmith for reviewing this patch!

Note: I accidentally committed a bogus older version of this patch previously.
llvm-svn: 218787
2014-10-01 18:55:02 +00:00
Adrian Prantl b458dc2eee Revert r218778 while investigating buldbot breakage.
"Move the complex address expression out of DIVariable and into an extra"

llvm-svn: 218782
2014-10-01 18:10:54 +00:00
Adrian Prantl 25a7174e7a Move the complex address expression out of DIVariable and into an extra
argument of the llvm.dbg.declare/llvm.dbg.value intrinsics.

Previously, DIVariable was a variable-length field that has an optional
reference to a Metadata array consisting of a variable number of
complex address expressions. In the case of OpPiece expressions this is
wasting a lot of storage in IR, because when an aggregate type is, e.g.,
SROA'd into all of its n individual members, the IR will contain n copies
of the DIVariable, all alike, only differing in the complex address
reference at the end.

By making the complex address into an extra argument of the
dbg.value/dbg.declare intrinsics, all of the pieces can reference the
same variable and the complex address expressions can be uniqued across
the CU, too.
Down the road, this will allow us to move other flags, such as
"indirection" out of the DIVariable, too.

The new intrinsics look like this:
declare void @llvm.dbg.declare(metadata %storage, metadata %var, metadata %expr)
declare void @llvm.dbg.value(metadata %storage, i64 %offset, metadata %var, metadata %expr)

This patch adds a new LLVM-local tag to DIExpressions, so we can detect
and pretty-print DIExpression metadata nodes.

What this patch doesn't do:

This patch does not touch the "Indirect" field in DIVariable; but moving
that into the expression would be a natural next step.

http://reviews.llvm.org/D4919
rdar://problem/17994491

Thanks to dblaikie and dexonsmith for reviewing this patch!

llvm-svn: 218778
2014-10-01 17:55:39 +00:00
Jingyue Wu fd47fb9976 Revert r216862 due to a performance regression
Reported by Alexey Volkov in PR21115

llvm-svn: 218771
2014-10-01 15:22:13 +00:00
David Blaikie 32b0f365a2 Implement DW_TAG_subrange_type with DW_AT_count rather than DW_AT_upper_bound
This allows proper disambiguation of unbounded arrays and arrays of zero
bound ("struct foo { int x[]; };" and "struct foo { int x[0]; }"). GCC
instead produces an upper bound of -1 in the latter situation, but count
seems tidier. This way lower_bound is provided if it's not the language
default and count is provided if the count is known, otherwise it's
omitted. Simple.

If someone wants to look at rdar://problem/12566646 and see if this
change is acceptable to that bug/fix, that might be helpful (see the
empty-and-one-elem-array.ll test case which cites that radar).

llvm-svn: 218726
2014-10-01 00:56:55 +00:00
David Blaikie 6cca8109ab Omit DW_AT_inline under -gmlt to save a little more space.
llvm-svn: 218719
2014-09-30 23:29:16 +00:00
David Blaikie 1cae849c04 DebugInfo: Sink the code emitting DW_AT_APPLE_omit_frame_ptr down to a more common spot.
No functional change. Pre-emptive refactoring before I start pushing
some of this subprogram creation down into DWARFCompileUnit so I can
build different subprograms in the skeleton unit from the dwo unit for
adding -gmlt-like data to the skeleton.

llvm-svn: 218713
2014-09-30 22:32:49 +00:00
David Blaikie e1c79749ca Disable the -gmlt optimization implemented in r218129 under Darwin due to issues with dsymutil.
r218129 omits DW_TAG_subprograms which have no inlined subroutines when
emitting -gmlt data. This makes -gmlt very low cost for -O0 builds.

Darwin's dsymutil reasonably considers a CU empty if it has no
subprograms (which occurs with the above optimization in -O0 programs
without any force_inline function calls) and drops the line table, CU,
and everything in this situation, making backtraces impossible.

Until dsymutil is modified to account for this, disable this
optimization on Darwin to preserve the desired functionality.
(see r218545, which should be reverted after this patch, for other
discussion/details)

Footnote:
In the long term, it doesn't look like this scheme (of simplified debug
info to describe inlining to enable backtracing) is tenable, it is far
too size inefficient for optimized code (the DW_TAG_inlined_subprograms,
even once compressed, are nearly twice as large as the line table
itself (also compressed)) and we'll be considering things like Cary's
two level line table proposal to encode all this information directly in
the line table.

llvm-svn: 218702
2014-09-30 21:28:32 +00:00
Sanjay Patel ab7f460bca Use the target-specified iteration count to opt out of any further refinement of an estimate. NFC.
llvm-svn: 218700
2014-09-30 20:44:23 +00:00
Sanjay Patel 8fde95cb2b Split the estimate() interface into separate functions for each type. NFC.
It was hacky to use an opcode as a switch because it won't always match
(rsqrte != sqrte), and it looks like we'll need to add more special casing
per arch than I had hoped for. Eg, x86 will prefer a different NR estimate
implementation. ARM will want to use its 'step' instructions. There also
don't appear to be any new estimate instructions in any arch in a long,
long time. Altivec vloge and vexpte may have been the first and last in
that field...

llvm-svn: 218698
2014-09-30 20:28:48 +00:00
Andrea Di Biagio c7c524129b [DAG] Check in advance if a build_vector has a legal type before attempting to convert it into a shuffle.
Currently, the DAG Combiner only tries to convert type-legal build_vector nodes
into shuffles. This patch simply moves the logic that checks if a
build_vector has a legal value type up before we even start analyzing the
operands. This allows us to exit early from method
'visitBUILD_VECTOR' if the node type is known to be illegal.

No functional change intended.

llvm-svn: 218677
2014-09-30 15:30:22 +00:00
Matt Arsenault 93ffe58f90 Add MachineOperand::ChangeToFPImmediate and setFPImm
llvm-svn: 218579
2014-09-28 19:24:59 +00:00
James Molloy 463db9a77c [AArch64] Redundant store instructions should be removed as dead code
If there is a store followed by a store of the same value to the same location, then one of the two stores is dead (a no-op) and can be removed.

This problem is found in spec2006-197.parser.

For example,
  stur    w10, [x11, #-4]
  stur    w10, [x11, #-4]
Then one of the two stur instructions can be removed.

Patch by David Xu!

llvm-svn: 218569
2014-09-27 17:02:54 +00:00
Sanjay Patel bdf1e38856 Refactor reciprocal and reciprocal square root estimate into target-independent functions (part 2).
This is purely refactoring. No functional changes intended. PowerPC is the only target
that is currently using this interface.

The ultimate goal is to allow targets other than PowerPC (certainly X86 and Aarch64) to turn this:

z = y / sqrt(x)

into:

z = y * rsqrte(x)

And:

z = y / x

into:

z = y * rcpe(x)

using whatever HW magic they can use. See http://llvm.org/bugs/show_bug.cgi?id=20900 .

There is one hook in TargetLowering to get the target-specific opcode for an estimate instruction
along with the number of refinement steps needed to make the estimate usable.
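
As a standalone illustration (not code from the patch), this is the kind of Newton-Raphson step such a refinement count describes; the 0.7f seed merely stands in for a coarse hardware rsqrte result:

  #include <cmath>
  #include <cstdio>

  // One Newton-Raphson step for r ~= 1/sqrt(x): r' = r * (1.5 - 0.5 * x * r * r).
  static float refineRsqrt(float x, float r) {
    return r * (1.5f - 0.5f * x * r * r);
  }

  int main() {
    float x = 2.0f;
    float r = 0.7f;                      // stand-in for a rough hardware estimate
    for (int step = 0; step < 2; ++step) // the "refinement steps" the hook reports
      r = refineRsqrt(x, r);
    std::printf("refined %.7f vs 1/sqrt(x) %.7f\n", r, 1.0f / std::sqrt(x));
    return 0;
  }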

Differential Revision: http://reviews.llvm.org/D5484

llvm-svn: 218553
2014-09-26 23:01:47 +00:00
David Xu 418da223dd Revert patch of r218493
llvm-svn: 218494
2014-09-26 02:28:03 +00:00
David Xu 64f661ee0b Redundant store instructions should be removed as dead code
llvm-svn: 218493
2014-09-26 02:02:09 +00:00
Eric Christopher 3976f78247 Move resetTargetOptions from taking a MachineFunction to a Function
since we are accessing the TargetMachine that we're a member
function of.

llvm-svn: 218489
2014-09-26 01:28:10 +00:00
Bruno Cardoso Lopes d04f7596e7 [MachineSink+PGO] Teach MachineSink to use BlockFrequencyInfo
Machine Sink uses loop depth information to select between successors BBs to
sink machine instructions into, where BBs within smaller loop depths are
preferable.  This patch adds support for choosing between successors by using
profile information from BlockFrequencyInfo instead, whenever the information
is available.

Tested it under SPEC2006 train (average of 30 runs for each program); ~1.5%
execution speedup in average on x86-64 darwin.

<rdar://problem/18021659>

llvm-svn: 218472
2014-09-25 23:14:26 +00:00
Tom Stellard 529efcf9d0 SelectionDAG: Remove #if NDEBUG from check for a post-isel hook
The InstrEmitter will skip the check of MI.hasPostISelHook()
before calling AdjustInstrPostInstrSelection() when NDEBUG
is not defined.

This was added in r140228, and I'm not sure if it is intentional or not,
but it is a likely source for bugs, because it means with
Release+Asserts builds you can forget to set the hasPostISelHook
flag on TableGen definitions and AdjustInstrPostInstrSelection() will
still be called.

llvm-svn: 218458
2014-09-25 18:59:22 +00:00
Robin Morisset 810739d174 Lower idempotent RMWs to fence+load
Summary:
I originally tried doing this specifically for X86 in the backend in D5091,
but it was rather brittle and generally running too late to be general.
Furthermore, other targets may want to implement similar optimizations.
So I reimplemented it at the IR-level, fitting it into AtomicExpandPass
as it interacts with that pass (which could not be cleanly done before
at the backend level).

This optimization relies on a new target hook, which is only used by X86
for now, as the correctness of the optimization on other targets remains
an open question. If it is found correct on other targets, it should be
trivial to enable for them.

Details of the optimization are discussed in D5091.
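
A rough C++-level sketch of the idea (the helper names here are made up, and the fence+load rewrite is only valid on targets where the hook says so, X86 for now):

  #include <atomic>
  #include <cassert>

  // An RMW whose operand cannot change the stored value (or/add/sub/xor with 0,
  // and with ~0) only needs the loaded value plus the ordering guarantees.
  static bool isIdempotentRMW(char op, unsigned operand) {
    switch (op) {
    case '|': case '+': case '-': case '^': return operand == 0;
    case '&':                               return operand == ~0u;
    default:                                return false;
    }
  }

  // Where the lowering is known sound, the RMW can become a fence plus a load
  // instead of, e.g., A.fetch_or(0, std::memory_order_seq_cst).
  static unsigned loweredFetchOrZero(std::atomic<unsigned> &A) {
    std::atomic_thread_fence(std::memory_order_seq_cst);
    return A.load(std::memory_order_seq_cst);
  }

  int main() {
    std::atomic<unsigned> A{42};
    assert(isIdempotentRMW('|', 0) && !isIdempotentRMW('+', 1));
    assert(loweredFetchOrZero(A) == 42);
    return 0;
  }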

Test Plan: make check-all + a new test

Reviewers: jfb

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5422

llvm-svn: 218455
2014-09-25 17:27:43 +00:00
Jiangning Liu 3b096172cf Clear PreferredExtendType in each function-specific state FunctionLoweringInfo.
llvm-svn: 218364
2014-09-24 03:22:56 +00:00
Robin Morisset 6dbbbc28b0 [X86] Make wide loads be managed by AtomicExpand
Summary:
AtomicExpand already had logic for expanding wide loads and stores on LL/SC
architectures, and for expanding wide stores on CmpXchg architectures, but
not for wide loads on CmpXchg architectures. This patch fills this hole,
and makes use of this new feature in the X86 backend.
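
A self-contained C++ sketch of the general trick (not the pass itself): an atomic load of a type that only has compare-and-swap support can be emulated with a cmpxchg whose compare and new values are the same arbitrary constant.

  #include <atomic>
  #include <cassert>

  static long long wideLoadViaCmpXchg(std::atomic<long long> &A) {
    long long Expected = 0;
    // If A happens to hold 0 the exchange stores 0 back (no visible change);
    // otherwise it fails and Expected is updated to the value actually seen.
    A.compare_exchange_strong(Expected, 0, std::memory_order_seq_cst,
                              std::memory_order_seq_cst);
    return Expected;
  }

  int main() {
    std::atomic<long long> A{0x123456789abcdef0LL};
    assert(wideLoadViaCmpXchg(A) == 0x123456789abcdef0LL);
    return 0;
  }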

Only one functional change: we now lose the SynchScope attribute.
It is regrettable, but I have another patch that I will submit soon that will
solve this for all of AtomicExpand (it seemed better to split it apart as it
is a different concern).

Test Plan: make check-all (lots of tests for this functionality already exist)

Reviewers: jfb

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5404

llvm-svn: 218332
2014-09-23 20:59:25 +00:00
Robin Morisset dedef3325f Add AtomicExpandPass::bracketInstWithFences, and use it whenever getInsertFencesForAtomic would trigger in SelectionDAGBuilder
Summary:
The goal is to eventually remove all the code related to getInsertFencesForAtomic
in SelectionDAGBuilder as it is wrong (designed for ARM, not really portable, works
mostly by accident because the backends are overly conservative), and repeats the
same logic that goes in emitLeading/TrailingFence.

In this patch, I make AtomicExpandPass insert the fences as it knows better
where to put them. Because this requires getting the fences and not just
passing an IRBuilder around, I had to change the return type of
emitLeading/TrailingFence.
This code only triggers on ARM for now. Because it is earlier in the pipeline
than SelectionDAGBuilder, it triggers and lowers atomic accesses to monotonic so
SelectionDAGBuilder does not add barriers anymore on ARM.

If this patch is accepted I plan to implement emitLeading/TrailingFence for all
backends that call setInsertFencesForAtomic(true), which will allow both making them
less conservative and simplifying SelectionDAGBuilder once they are all using
this interface.

This should not cause any functional change, so the existing tests are used
and not modified.

Test Plan: make check-all, benefits from existing tests of atomics on ARM

Reviewers: jfb, t.p.northover

Subscribers: aemerson, llvm-commits

Differential Revision: http://reviews.llvm.org/D5179

llvm-svn: 218329
2014-09-23 20:31:14 +00:00
Lang Hames d5f496d57c [MCJIT] Nuke MachineRelocation and MachineCodeEmitter. Now that the old JIT is
gone they're no longer needed.

llvm-svn: 218320
2014-09-23 18:08:47 +00:00
Sanjay Patel 6a42292795 Use SDValue bool operator to reduce code. No functional change.
llvm-svn: 218314
2014-09-23 16:24:20 +00:00
David Majnemer 597be2ded6 MC: ReadOnlyWithRel section kinds should map to rdata in COFF
Don't consider ReadOnlyWithRel a writable section kind in COFF; such sections
really belong in .rdata.

llvm-svn: 218268
2014-09-22 20:39:23 +00:00
Sanjay Patel b67bd262ea Refactor reciprocal square root estimate into target-independent function; NFC.
This is purely a plumbing patch. No functional changes intended.

The ultimate goal is to allow targets other than PowerPC (certainly X86 and Aarch64) to turn this:

z = y / sqrt(x)

into:

z = y * rsqrte(x)

using whatever HW magic they can use. See http://llvm.org/bugs/show_bug.cgi?id=20900 .

The first step is to add a target hook for RSQRTE, take the already target-independent code selfishly hoarded by PPC, and put it into DAGCombiner.

Next steps:

    The code in DAGCombiner::BuildRSQRTE() should be refactored further; tests that exercise that logic need to be added.
    Logic in PPCTargetLowering::BuildRSQRTE() should be hoisted into DAGCombiner.
    X86 and AArch64 overrides for TargetLowering.BuildRSQRTE() should be added.

Differential Revision: http://reviews.llvm.org/D5425

llvm-svn: 218219
2014-09-21 15:19:15 +00:00
Sanjay Patel d649235fc3 mop up: "Don’t duplicate function or class name at the beginning of the comment."
llvm-svn: 218218
2014-09-21 14:48:16 +00:00
Sanjay Patel 69df41e92e mop up: "Don’t duplicate function or class name at the beginning of the comment."
llvm-svn: 218194
2014-09-20 22:39:16 +00:00
David Majnemer b8dbebb31c MC: Treat ReadOnlyWithRel and ReadOnlyWithRelLocal as ReadOnly for COFF
A problem with our old behavior becomes observable under x86-64 COFF
when we need a read-only GV which has an initializer which is referenced
using a relocation: we would mark the section as writable.  Marking the
section as writable interferes with section merging.

This fixes PR21009.

llvm-svn: 218179
2014-09-20 07:31:46 +00:00
Peter Collingbourne 975726345c Fix crash with an insertvalue that produces an empty object.
llvm-svn: 218171
2014-09-20 00:10:47 +00:00
Chris Bieneman 1a98490ce5 Converting SpillPlacement's BlockFrequency threshold to a ManagedStatic to avoid static constructors and destructors.
llvm-svn: 218163
2014-09-19 22:46:28 +00:00
David Blaikie 3a7ce252cc Omit DW_TAG_subprograms for subprograms without inlined subroutines when producing -gmlt data
To reduce the size of -gmlt data, skip the subprograms without any
inlined subroutines. Since we've now got the ability to make these
determinations in the backend (funnily enough - we added the flag so we
wouldn't produce ranges under -gmlt, but with this change we use the
flag, but go back to producing ranges under -gmlt).

Instead, just produce CU ranges to inform the consumer which parts of
the code are described by this CU's line table. Tools could inspect the
line table directly to compute the range, but the CU ranges only seem to
be about 0.5% of object/executable size, so I'm not too worried about
teaching llvm-symbolizer that trick just yet - it's certainly a possible
piece of future work.

Update an llvm-symbolizer test just to demonstrate that this schema is
acceptable there (if it wasn't, the compiler-rt tests would catch this,
but good to have an in-llvm-tree test for llvm-symbolizer's behavior
here)

Building the clang binary with -gmlt with this patch reduces the total
size of object files by 5.1% (5.56% without ranges) without compression
and the executable by 4.37% (4.75% without ranges).

llvm-svn: 218129
2014-09-19 17:03:16 +00:00
Frederic Riss 9ba9efff56 Change DwarfCompileUnit::createGlobalVariable to getOrCreateGlobalVariable.
Summary:
This will allow requesting the creation of a forward-declared variable
at its point of use (for imported declarations, this will be
DwarfDebug::constructImportedEntityDIE) rather than having to put the
forward decl in a retention list.

Note that getOrCreateGlobalVariable returns the actual definition DIE when the
routine creates a declaration and a definition DIE. If you agree this is the
right behavior, then I'll have a followup patch that registers the definition
in the DIE map instead of the declaration as it is today (this 'breaks' only
one test, where we test that the imported entity is the declaration). I'm
not sure what's best here, but it's easy enough for a consumer to follow the
DW_AT_specification link to get to the declaration, whereas it takes more
work to find the actual definition from a declaration DIE.

Reviewers: echristo, dblaikie, aprantl

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5381

llvm-svn: 218126
2014-09-19 15:12:03 +00:00
Hal Finkel 62ac736faa Optionally enable more-aggressive FMA formation in DAGCombine
The heuristic used by DAGCombine to form FMAs checks that the FMUL has only one
use, but this is overly-conservative on some systems. Specifically, if the FMA
and the FADD have the same latency (and the FMA does not compete for resources
with the FMUL any more than the FADD does), there is no need for the
restriction, and furthermore, forming the FMA leaving the FMUL can still allow
for higher overall throughput and decreased critical-path length.
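
A small standalone example (not from the patch) of what is being traded off: the fused form rounds once, and when the FMUL result has another use it simply stays alive next to the FMA.

  #include <cmath>
  #include <cstdio>

  int main() {
    double a = 1.0 / 3.0, b = 3.0, c = -1.0;
    double mul = a * b;               // the FMUL, kept alive by this second use
    double separate = mul + c;        // FMUL + FADD: two roundings
    double fused = std::fma(a, b, c); // FMA: a single rounding
    std::printf("separate=%.17g fused=%.17g mul=%.17g\n", separate, fused, mul);
    return 0;
  }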

Here we add a new TLI callback, enableAggressiveFMAFusion, false by default, to
elide the hasOneUse check. This is enabled for PowerPC by default, as most
PowerPC systems will benefit.

Patch by Olivier Sallenave, thanks!

llvm-svn: 218120
2014-09-19 11:42:56 +00:00
Jiangning Liu ffbc690933 Optimize sext/zext insertion algorithm in back-end.
With this optimization, we will not always insert a zext for values crossing
basic blocks, but will insert a sext if the users of a value crossing a basic
block have a preference for a signed predicate.

llvm-svn: 218101
2014-09-19 05:30:35 +00:00
David Blaikie 03c3dbeb62 Omit DW_AT_frame_base under -gmlt for size
llvm-svn: 218100
2014-09-19 04:55:05 +00:00
David Blaikie 0b9438b1c1 Describe the -gmlt optimization committed in the previous revision.
llvm-svn: 218099
2014-09-19 04:47:46 +00:00
David Blaikie 73b65d236c Omit all the extra static attributes on subprograms in -gmlt
This omission will be done in a fancier manner once we're dealing with
"put gmlt in the skeleton CUs under fission" - it'll have to be
conditional on the kind of CU we're emitting into (skeleton or gmlt).

llvm-svn: 218098
2014-09-19 04:30:36 +00:00
Hans Wennborg c0f0c511db Fix an it's vs. its typo.
llvm-svn: 218093
2014-09-19 01:14:56 +00:00
Frederic Riss 0baab0cded Revert part of r218041.
The patch moved some logic around in an attempt to generate potentially more
DW_AT_declaration attributes. The patch was flawed though and it stopped
generating the attribute in some cases.

llvm-svn: 218060
2014-09-18 16:41:04 +00:00
Frederic Riss be26dfb595 Always emit DW_AT_declaration attribute when the variable isn't a definition.
Summary:
This doesn't show up today as we don't emit declaration-only variables. This
will be tested when the followup patches implementing import of forward-declared
entities land in clang.

Reviewers: echristo, dblaikie, aprantl

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5382

llvm-svn: 218041
2014-09-18 09:38:23 +00:00
Eric Christopher d85ffb1fc0 Add a new pass FunctionTargetTransformInfo. This pass serves as a
shim between the TargetTransformInfo immutable pass and the Subtarget
via the TargetMachine and Function. Migrate a single call from
BasicTargetTransformInfo as an example and provide shims where TargetMachine
begins taking a Function to determine the subtarget.

No functional change.

llvm-svn: 218004
2014-09-18 00:34:14 +00:00
Robin Morisset 25c8e318e4 [X86] Use the generic AtomicExpandPass instead of X86AtomicExpandPass
This required a new hook called hasLoadLinkedStoreConditional to know whether
to expand atomics to LL/SC (ARM, AArch64, in a future patch Power) or to
CmpXchg (X86).

Apart from that, the new code in AtomicExpandPass is mostly moved from
X86AtomicExpandPass. The main result of this patch is to get rid of that
pass, which had lots of code duplicated with AtomicExpandPass.

llvm-svn: 217928
2014-09-17 00:06:58 +00:00
Quentin Colombet ac55b15bf4 [CodeGenPrepare][AddressingModeMatcher] The promotion mechanism was expecting
instructions when truncate, sext, or zext were created. Fix that.

llvm-svn: 217926
2014-09-16 22:36:07 +00:00
Owen Anderson bfc80a45a7 Add back a fallback case for targets that do not or cannot implement getNoopForMachoTarget().
llvm-svn: 217899
2014-09-16 20:28:00 +00:00
Hal Finkel cc4f31d3d7 Fix BasicTTI::getCmpSelInstrCost to deal with illegal vector types
The default implementation of getCmpSelInstrCost, which provides the cost of
icmp/fcmp/select instructions, did not deal sensibly with illegal vector types
that were scalarized. We'd ask for the legalization cost of the vector type,
which would return something like (4, f64) given an input of <4 x double>, and
we'd then check the TLI status of the ISD opcode on that scalar type. This would
result in querying (ISD::VSELECT, f64), for example. Amusingly enough,
ISD::VSELECT on scalar types is marked as Legal by default (as with most other
operations), and most backends never change this because VSELECT is never
generated on scalars. However, seeing the resulting operation as Legal, we'd
neglect to add the scalarization cost before returning. The result is that we'd
grossly under-estimate the cost of cmps/selects on illegal vector types.

Now, if type legalization clearly results in scalarization, we skip the early
return and add the scalarization cost.

llvm-svn: 217859
2014-09-16 04:35:50 +00:00
David Blaikie ba656e1d7c DebugInfo: Add comment describing the need to disable address pool usage in skeleton units.
Post commit review from Eric Christopher.

llvm-svn: 217842
2014-09-15 22:41:25 +00:00
Sanjay Patel d4f4c4e416 Replace repeated null checks with an assert. NFC.
Without a vector to hold the created ops, these 
functions don't have any use.

llvm-svn: 217831
2014-09-15 21:52:51 +00:00
Juergen Ributzka d111d29f90 [FastISel] Move optimizeCmpPredicate to FastISel base class. NFC.
Make the optimizeCmpPredicate function available to all targets.

llvm-svn: 217822
2014-09-15 20:47:13 +00:00
Sanjay Patel bb29221129 Replace dead links to "Hacker's Delight" with general references. NFC.
llvm-svn: 217814
2014-09-15 19:47:44 +00:00
Rafael Espindola 6865d6f08a Fix a lot of confusion around inserting nops on empty functions.
On MachO, and MachO only, we cannot have a truly empty function since that
breaks the linker logic for atomizing the section.

When we are emitting a frame pointer, the presence of an unreachable will
create a cfi instruction pointing past the last instruction. This is perfectly
fine. The FDE information encodes the pc range it applies to. If some tool
cannot handle this, we should explicitly say which bug we are working around
and only work around it when it is actually relevant (not for ELF for example).

Given the unreachable we could omit the .cfi_def_cfa_register, but then
again, we could also omit the entire function prologue if we wanted to.

llvm-svn: 217801
2014-09-15 18:32:58 +00:00
Quentin Colombet 9dcb724d31 [CodeGenPrepare][AddressingModeMatcher] Fix a think-o for the sext(zext) -> zext promotion
introduced in r217629.
We were returning the old sext instead of the new zext as the promoted instruction!

Thanks Joerg Sonnenberger for the test case.

llvm-svn: 217800
2014-09-15 18:26:58 +00:00
Yaron Keren 66b0cebf7f In DwarfEHPrepare, after all passes are run, RewindFunction may be a dangling
pointer to a dead function. To make sure it's valid, doFinalization resets
RewindFunction to nullptr, just as the constructor does, so it will be found
again on the next run.

llvm-svn: 217737
2014-09-14 20:36:28 +00:00
Owen Anderson e68ca8d4ba Allow targets to custom legalize vector insertion and extraction.
llvm-svn: 217711
2014-09-12 22:16:11 +00:00
Owen Anderson ec4f873d34 Remove an unnecessary restriction. MIsNeedChainEdge() should be checked even when scheduler AliasAnalysis is not
enabled.  A good chunk of MIsNeedChainEdge() is logic that is valid and should be applied even for targets
that are not using alias analysis.

llvm-svn: 217706
2014-09-12 21:17:55 +00:00
Benjamin Kramer 6d527ef9d6 Legalizer: Use the scalar bit width when promoting bit counting instrs on
vectors.

e.g. when promoting ctlz from <2 x i32> to <2 x i64> we have to fixup
the result by 32 bits, not 64. PR20917.
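
A quick standalone check of the arithmetic (using the GCC/Clang count-leading-zeros builtins):

  #include <cassert>
  #include <cstdint>

  int main() {
    uint32_t x = 0x0000ff00u;
    // Counting leading zeros of a 32-bit value in a widened 64-bit element
    // over-counts by the scalar width difference (64 - 32 = 32), not by 64.
    assert(__builtin_clzll((uint64_t)x) - 32 == __builtin_clz(x));
    return 0;
  }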

llvm-svn: 217671
2014-09-12 12:50:27 +00:00
Quentin Colombet b2c5c6dde3 [CodeGenPrepare] Teach the addressing mode matcher how to promote zext.
I.e., teach it about 'sext (zext a to ty) to ty2' => zext a to ty2.
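
A standalone C++ illustration of why the promotion is sound: the inner zext clears the sign bit of the intermediate value, so the outer sext can only shift in zeros.

  #include <cassert>
  #include <cstdint>

  int main() {
    uint8_t a = 0xEF;                                   // high bit set in the narrow type
    int64_t SextOfZext = (int64_t)(int32_t)(uint32_t)a; // sext(zext a to i32) to i64
    int64_t DirectZext = (int64_t)(uint64_t)a;          // zext a to i64
    assert(SextOfZext == DirectZext && DirectZext == 0xEF);
    return 0;
  }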

llvm-svn: 217629
2014-09-11 21:22:14 +00:00
David Blaikie 6741bb09bb Remove the unused string section symbol parameter from DwarfFile::emitStrings
And since it /looked/ like the DwarfStrSectionSym was unused, I tried
removing it - but then it turned out that DwarfStringPool was
reconstructing the same label (and expecting it to have already been
emitted) and uses that.

So I kept it around, but wanted to pass it in to users - since it seemed
a bit silly for DwarfStringPool to have it passed in and returned but
itself have no use for it. The only two users don't handle strings in
both .dwo and .o files so they only ever need the one symbol - no need
to keep it (and have an unused symbol) in the DwarfStringPool used for
fission/.dwo.

Refactor a bunch of accelerator table usage to remove duplication so I
didn't have to touch 4-5 callers.

llvm-svn: 217628
2014-09-11 21:12:48 +00:00
Matt Arsenault 8239eaab99 Add DAG combine for shl + add of constants.
Do
 (shl (add x, c1), c2) -> (add (shl x, c2), c1 << c2)

This is already done for multiplies, but since multiplies
by powers of two are turned into shifts, we also need
to handle it here.

This might want checks for isLegalAddImmediate to avoid
transforming an add of a legal immediate with one that isn't.
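
The combine rests on a plain wrap-around identity, checked here in standalone C++:

  #include <cassert>
  #include <cstdint>

  int main() {
    uint32_t x = 0xdeadbeefu, c1 = 7, c2 = 3;
    // (x + c1) << c2  ==  (x << c2) + (c1 << c2)  in modular arithmetic.
    assert(((x + c1) << c2) == ((x << c2) + (c1 << c2)));
    return 0;
  }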

llvm-svn: 217610
2014-09-11 17:34:19 +00:00
Sanjay Patel 7bd228a82e Combine fmul vector FP constants when unsafe math is allowed.
This is an extension of the change made with r215820:
http://llvm.org/viewvc/llvm-project?view=revision&revision=215820

That patch allowed combining of splatted vector FP constants that are multiplied.

This patch allows combining non-uniform vector FP constants too by relaxing the
check on the type of vector. Also, canonicalize a vector fmul in the
same way that we already do for scalars - if only one operand of the fmul is a
constant, make it operand 1. Otherwise, we miss potential folds.
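
A small standalone example of the fold being enabled (constants chosen as powers of two so the equality below is exact even under strict IEEE rules; in general the combine needs unsafe math):

  #include <cassert>

  int main() {
    double x = 1.234;
    // (x * C1) * C2 can become x * (C1 * C2); with this patch the same fold
    // applies lane-wise to vector constants that are not splats.
    assert((x * 2.0) * 4.0 == x * (2.0 * 4.0));
    return 0;
  }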

This fold is also done by -instcombine, but it's possible that extra
fmuls may have been generated during lowering.

Differential Revision: http://reviews.llvm.org/D5254

llvm-svn: 217599
2014-09-11 15:45:27 +00:00
David Xu f7aff68fe3 Build correct vector filled with undef nodes
llvm-svn: 217570
2014-09-11 05:10:28 +00:00
Adrian Prantl 1383d6f808 Cleanup: Use the appropriate API for accessing the DIVariable of a
DBG_VALUE intrinsic.

llvm-svn: 217533
2014-09-10 18:52:29 +00:00
Sanjay Patel b653de1ada Rename getMaximumUnrollFactor -> getMaxInterleaveFactor; also rename option names controlling this variable.
"Unroll" is not the appropriate name for this variable. Clang already uses 
the term "interleave" in pragmas and metadata for this.

Differential Revision: http://reviews.llvm.org/D5066

llvm-svn: 217528
2014-09-10 17:58:16 +00:00
Yuri Gorshenin 3939dec1f7 [asan-assembly-instrumentation] Added CFI directives to the generated instrumentation code.
Summary: [asan-assembly-instrumentation] Added CFI directives to the generated instrumentation code.

Reviewers: eugenis

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5189

llvm-svn: 217482
2014-09-10 09:45:49 +00:00
David Blaikie d0f103775a Sink PrevCU updating into DwarfUnit::addRange to ensure consistency
So that the two operations in DwarfDebug couldn't get separated (because
I accidentally separated them in some work in progress), put them
together. While we're here, move DwarfUnit::addRange to
DwarfCompileUnit, since it's not relevant to type units.

llvm-svn: 217468
2014-09-09 23:13:01 +00:00
David Blaikie deb174fed5 Remove DwarfDebug::PrevSection, PrevCU is sufficient for handling address range holes.
PrevSection/PrevCU are used to detect holes in the address range of a CU
to ensure the DW_AT_ranges does not include those holes. When we see a
function with no debug info, though it may be in the same range as the
prior and subsequent functions, there should be a gap in the CU's
ranges. By setting PrevCU to null in that case, the range would not be
extended to cover the gap.

llvm-svn: 217466
2014-09-09 22:56:36 +00:00
Patrik Hagglund 57d315b7c1 [MachineSinking] Conservatively clear kill flags after coalescing.
This solves the problem of having a kill flag inside a loop
with a definition of the register prior to the loop:

%vreg368<def> ...

Inside loop:

        %vreg520<def> = COPY %vreg368
        %vreg568<def,tied1> = add %vreg341<tied0>, %vreg520<kill>

=> was coalesced into =>

        %vreg568<def,tied1> = add %vreg341<tied0>, %vreg368<kill>

MachineVerifier then complained:
*** Bad machine code: Virtual register killed in block, but needed live out. ***

The kill flag for %vreg368 is incorrect, and is cleared by this patch.

This is similar to the clearing done at the end of
MachineSinking::SinkInstruction().

Patch provided by Jonas Paulsson.

Reviewed by Quentin Colombet and Juergen Ributzka.

llvm-svn: 217427
2014-09-09 07:47:00 +00:00
Hans Wennborg 18f0a986c1 Fast-ISel: Remove dead code after falling back from selecting call instructions (PR20863)
Previously, fast-isel would not clean up after failing to select a call
instruction, because it would have called flushLocalValueMap() which moves
the insertion point, making SavedInsertPt in selectInstruction() invalid.

Fixing this by making SavedInsertPt a member variable, and having
flushLocalValueMap() update it.

This removes some redundant code at -O0, and more importantly fixes PR20863.

Differential Revision: http://reviews.llvm.org/D5249

llvm-svn: 217401
2014-09-08 20:24:10 +00:00
Sanjay Patel 394c333e3e Group unsafe fmul math folds together for easier reading. No functional change.
llvm-svn: 217399
2014-09-08 20:16:42 +00:00
Sanjay Patel f4b7a6b030 Fix the FIXME that was just added in r217390 - remove a bunch of redundant fold permutations.
The testcases for these folds already exist in test/CodeGen/X86/fp-fast.ll.

llvm-svn: 217393
2014-09-08 18:22:51 +00:00
Sanjay Patel 8170dea280 group unsafe math folds together for easier reading
Also added a FIXME regarding redundant folds for non-canonicalized constants.

llvm-svn: 217390
2014-09-08 17:32:19 +00:00
Chad Rosier 3528c1e4c6 [AArch64] Improve AA to remove unneeded edges in the AA MI scheduling graph.
Patch by Sanjin Sijaric <ssijaric@codeaurora.org>!
Phabricator Review: http://reviews.llvm.org/D5103

llvm-svn: 217371
2014-09-08 14:43:48 +00:00
David Blaikie c42f9ac01c DebugInfo: Do not use DW_FORM_GNU_addr_index in skeleton CUs, GDB 7.8 errors on this.
It's probably not a huge deal to not do this - if we could, maybe the
address could be reused by a subprogram low_pc and avoid an extra
relocation, but it's just one per CU at best.

llvm-svn: 217338
2014-09-07 17:31:42 +00:00
Sanjay Patel 75cc90eddc Allow vector fsub ops with constants to get the same optimizations as scalars.
This problem is bigger than just fsub, but this is the minimum fix to solve
fneg for PR20556 ( http://llvm.org/bugs/show_bug.cgi?id=20556 ), and we solve
zero subtraction with the same change.
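
For reference, fneg has traditionally been modeled as 'fsub -0.0, x'; a standalone check of why -0.0 (and not +0.0) is the right identity, which is the zero-subtraction case mentioned above:

  #include <cassert>
  #include <cmath>

  int main() {
    double x = 0.0;
    assert(std::signbit(-0.0 - x));                  // -0.0 - 0.0 == -0.0: a true fneg
    assert(!std::signbit(0.0 - x));                  //  0.0 - 0.0 == +0.0: not an fneg
    assert(-0.0 - 5.0 == -5.0 && 0.0 - 5.0 == -5.0); // identical for nonzero x
    return 0;
  }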

llvm-svn: 217286
2014-09-05 22:26:22 +00:00
Sanjay Patel f4b8debc1c clean up; NFC
llvm-svn: 217278
2014-09-05 20:55:46 +00:00
Rafael Espindola b582372e87 Revert "Disable the fix for pr20793 because of a gnu ld bug."
This reverts commit r217211.

Both the bfd ld and gold outputs were valid. They were using a Rela relocation,
so the value present in the relocated location was not used, which caused me
to misread the output.

llvm-svn: 217264
2014-09-05 18:03:38 +00:00
Adrian Prantl e5e8ce64de Set the parent pointer of cloned DBG_VALUE instructions correctly.
Fixes PR20523.

When spilling variables onto the stack, spillVirtReg() is setting the
parent pointer of the cloned DBG_VALUE intrinsic for the stack location
to the parent pointer of the original intrinsic. MachineInstr parent
pointers should however always point to the parent basic block.

MBB is shadowing the MBB member variable. The instruction still ends up
being inserted into the right basic block, because it's inserted after MI
which serves as the iterator.

I failed at constructing a reliable testcase for this, see
http://llvm.org/bugs/show_bug.cgi?id=20523 for a large testcases.

llvm-svn: 217260
2014-09-05 17:10:10 +00:00
Rafael Espindola 7eb3b06ca2 Disable the fix for pr20793 because of a gnu ld bug.
llvm-svn: 217211
2014-09-05 00:14:12 +00:00
Rafael Espindola 7c7d7b92df Refactor to avoid code duplication. NFC.
llvm-svn: 217207
2014-09-05 00:02:50 +00:00
Rafael Espindola c4b4253f7c Fix pr20793.
With this patch the third field of llvm.global_ctors is also used on ELF.

llvm-svn: 217202
2014-09-04 23:03:58 +00:00
Reid Kleckner 7c4059eb89 MC Win64: Put unwind info for COMDAT code into the same COMDAT group
Summary:
This fixes a long standing issue where we would emit many little .text
sections and only one .pdata and .xdata section. Now we generate one
.pdata / .xdata pair per .text section and associate them correctly.

Fixes PR19667.

Reviewers: majnemer

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5181

llvm-svn: 217176
2014-09-04 17:42:03 +00:00
Juergen Ributzka 4bea494569 Revert r216803 "[MachineSinking] Clear kill flag of all operands at all their uses."
This reverts commit r216803, because it might have broken the buildbot.
The issue is tracked in PR20842.

llvm-svn: 217120
2014-09-04 02:07:36 +00:00
Robin Morisset ed3d48f161 Refactor AtomicExpandPass and add a generic isAtomic() method to Instruction
Summary:
Split shouldExpandAtomicInIR() into different versions for Stores/Loads/RMWs/CmpXchgs.
Makes runOnFunction cleaner (no more redundant checking/casting), and will help moving
the X86 backend to this pass.

This requires a way of easily detecting which instructions are atomic.
I followed the pattern of mayReadFromMemory, mayWriteOrReadMemory, etc., in making
isAtomic() a method of Instruction implemented by a switch on the opcodes.
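
A rough sketch of that shape in standalone C++ (the enum and names below are invented for illustration, not LLVM's):

  #include <cassert>

  enum class Opc { Load, Store, AtomicRMW, AtomicCmpXchg, Fence, Add, Mul };

  static bool isAtomicOpc(Opc O, bool HasOrdering = false) {
    switch (O) {
    case Opc::AtomicRMW:
    case Opc::AtomicCmpXchg:
    case Opc::Fence:
      return true;
    case Opc::Load:
    case Opc::Store:
      return HasOrdering; // loads/stores only count when they carry an ordering
    default:
      return false;
    }
  }

  int main() {
    assert(isAtomicOpc(Opc::Fence) && !isAtomicOpc(Opc::Add));
    assert(isAtomicOpc(Opc::Load, /*HasOrdering=*/true));
    return 0;
  }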

Test Plan: make check

Reviewers: jfb

Subscribers: mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D5035

llvm-svn: 217080
2014-09-03 21:29:59 +00:00
Robin Morisset a47cb411dc Use target-dependent emitLeading/TrailingFence instead of the target-independent insertLeading/TrailingFence (in AtomicExpandPass)
Fixes two latent bugs:
- There was no fence inserted before expanded seq_cst load (unsound on Power)
- There was only a fence release before seq_cst stores (again unsound, in particular on Power)
    It is not even clear if this is correct on ARM swift processors (where release fences are
    DMB ishst instead of DMB ish). This behaviour is currently preserved on ARM Swift
    as it is not clear whether it is incorrect. I would love to get documentation stating
    whether it is correct or not.
These two bugs were not triggered because Power is not (yet) using this pass, and these
behaviours happen to be (mostly?) working on ARM
(although they completely butchered the semantics of the llvm IR).
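
The corrected shape, sketched in standalone C++ (this mirrors the hardware-level lowering being described; it is not claimed as an exact C++ memory-model identity):

  #include <atomic>

  static int bracketedSeqCstLoad(std::atomic<int> &A) {
    std::atomic_thread_fence(std::memory_order_seq_cst); // leading fence (was missing)
    int V = A.load(std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_seq_cst); // trailing fence
    return V;
  }

  int main() {
    std::atomic<int> A{7};
    return bracketedSeqCstLoad(A) == 7 ? 0 : 1;
  }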

See:
http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-August/075821.html
for an example of the problems that can be caused by the second of these bugs.

I couldn't see a way of fixing these in a completely target-independent way without
adding lots of unnecessary fences on ARM, hence the target-dependent parts of this
patch.

This patch implements the new target-dependent parts only for ARM (the default
of not doing anything is enough for AArch64), other architectures will use this
infrastructure in later patches.

llvm-svn: 217076
2014-09-03 21:01:03 +00:00
Juergen Ributzka 88e32517c4 [FastISel][tblgen] Rename tblgen generated FastISel functions. NFC.
This is the final round of renaming. This changes tblgen to emit lower-case
function names for FastEmitInst_* and FastEmit_*, and updates all its uses
in the source code.

Reviewed by Eric

llvm-svn: 217075
2014-09-03 20:56:59 +00:00
Juergen Ributzka 5b8bb4d7dd [FastISel] Rename public visible FastISel functions. NFC.
This commit renames the following public FastISel functions:
LowerArguments -> lowerArguments
SelectInstruction -> selectInstruction
TargetSelectInstruction -> fastSelectInstruction
FastLowerArguments -> fastLowerArguments
FastLowerCall -> fastLowerCall
FastLowerIntrinsicCall -> fastLowerIntrinsicCall
FastEmitZExtFromI1 -> fastEmitZExtFromI1
FastEmitBranch -> fastEmitBranch
UpdateValueMap -> updateValueMap
TargetMaterializeConstant -> fastMaterializeConstant
TargetMaterializeAlloca -> fastMaterializeAlloca
TargetMaterializeFloatZero -> fastMaterializeFloatZero
LowerCallTo -> lowerCallTo

Reviewed by Eric

llvm-svn: 217074
2014-09-03 20:56:52 +00:00
Eric Christopher b68e25330b Remove resetSubtargetFeatures as it is unused.
llvm-svn: 217071
2014-09-03 20:36:31 +00:00
Juergen Ributzka 7a76c2409e [FastISel] Some long overdue spring cleaning of FastISel.
Things got a little bit messy over the years and it is time for a little bit of
spring cleaning.

This first commit is focused on the FastISel base class itself. It doxyfies all
comments, C++11fies the code where it makes sense, renames internal methods to
adhere to the coding standard, and clang-formats the files.

Reviewed by Eric

llvm-svn: 217060
2014-09-03 18:46:45 +00:00
Eric Christopher 79cc1e3ae7 Reinstate "Nuke the old JIT."
Approved by Jim Grosbach, Lang Hames, Rafael Espindola.

This reinstates commits r215111, 215115, 215116, 215117, 215136.

llvm-svn: 216982
2014-09-02 22:28:02 +00:00
Hal Finkel 445dda5c4a Add pass-manager flags to use CFL AA
Add -use-cfl-aa (and -use-cfl-aa-in-codegen) to add CFL AA in the default pass
managers (for easy testing).

llvm-svn: 216978
2014-09-02 22:12:54 +00:00
Juergen Ributzka 7e998fb5e6 [FastISel] Provide the option to skip target-independent instruction selection. NFC.
This allows the target to disable target-independent instruction selection and
jump directly into the target-dependent instruction selection code.

This can be beneficial for targets, such as AArch64, which could emit much
better code, but never got a chance to do so, because the target-independent
instruction selector was able to find an instruction sequence.

llvm-svn: 216947
2014-09-02 21:07:44 +00:00
Matt Arsenault c1a71217b3 Fix interference caused by fmul 2, x -> fadd x, x
If an fmul was introduced by lowering, it wouldn't be folded
into a multiply by a constant since the earlier combine would
have replaced the fmul with the fadd.
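
The rewrite itself is value-preserving, which is why the combine performs it; a standalone check:

  #include <cassert>

  int main() {
    const double xs[] = {0.3, -1.75, 6.02e23};
    for (double x : xs)
      assert(x * 2.0 == x + x); // fmul 2, x and fadd x, x agree bit-for-bit
    return 0;
  }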

llvm-svn: 216932
2014-09-02 19:02:53 +00:00
Reid Kleckner 0b2bccc3cd CodeGen: Handle va_start in the entry block
Also fix a small copy-paste bug in X86ISelLowering where Chain should
have been used in place of DAG.getEntryToken().

Fixes PR20828.

llvm-svn: 216929
2014-09-02 18:42:44 +00:00
Matt Arsenault 965de3050f Fix comment and unnecessary check for FP build_vectors.
This was copy-paste from the integer version, but
FP build_vectors don't truncate.

llvm-svn: 216928
2014-09-02 18:33:51 +00:00
Pete Cooper 1175945710 Change MCSchedModel to be a struct of statically initialized data.
This removes static initializers from the backends which generate this data, and also makes this struct match the other Tablegen generated structs in behaviour

Reviewed by Andy Trick and Chandler C

llvm-svn: 216919
2014-09-02 17:43:54 +00:00
David Blaikie 505e1b829f unique_ptrify PBQPBuilder::build
llvm-svn: 216918
2014-09-02 17:42:01 +00:00
Hal Finkel e19006ea22 Enable splitting indexing from loads with TargetConstants
When I recommitted r208640 (in r216898) I added an exclusion for TargetConstant
offsets, as there is no guarantee that a backend can handle them on generic
ADDs (even if it generates them during address-mode matching) -- and,
specifically, applying this transformation directly with TargetConstants caused
a self-hosting failure on PPC64. Ignoring all TargetConstants, however, is less
than ideal. Instead, for non-opaque constants, we can convert them into regular
constants for use with the generated ADD (or SUB).

llvm-svn: 216908
2014-09-02 16:05:23 +00:00
Hal Finkel 51e6fa2201 Revert "Revert '[DAGCombiner] Split up an indexed load if only the base pointer value is live'"
I reverted r208640 in r209747 because r208640 broke self-hosting on PPC64. The
underlying cause of the failure is that pre-inc loads with increments
represented by ISD::TargetConstants were being transformed into ISD:::ADDs with
ISD::TargetConstant operands. PPC doesn't have a pattern for those, and so they
were selected as invalid r+r adds.

This recommits r208640, rebased and with an exclusion for ISD::TargetConstant
increments. This behavior seems correct, although in the future we might want
to ask the target to split out the indexing that uses ISD::TargetConstants.

Unfortunately, I don't yet have small test case where the relevant invalid
'add' instruction is not itself dead (and thus eliminated by
DeadMachineInstructionElim -- sometimes bugpoint is too good at removing things)

Original commit message (by Adam Nemet):

Right now the load may not get DCE'd because of the side-effect of updating
the base pointer.

This can happen if we lower a read-modify-write of an illegal larger type
(e.g. i48) such that the modification only affects one of the subparts (the
lower i32 part but not the higher i16 part).  See the testcase.

In order to spot the dead load we need to revisit it when SimplifyDemandedBits
decided that the value of the load is masked off.  This is the
CommitTargetLoweringOpt piece.

I checked compile time with ARM64 by sending SPEC bitcode files through llc.
No measurable change.

Fixes <rdar://problem/16031651>

llvm-svn: 216898
2014-09-02 06:24:04 +00:00
Saleem Abdulrasool d1a4ed6a7c CodeGen: indicate Windows unwind data format
The structures for Windows unwinding are shared across multiple platforms.
Indicate the encoding to be used for the particular target.  Use this to switch
the unwind emitter instantiated by the AsmPrinter.

llvm-svn: 216895
2014-09-01 23:48:39 +00:00
Saleem Abdulrasool 0fba7b5856 CodeGen: split out the Win64Exception emitter
Move the Windows unwind information emitter into a separate header.  This is not
related to DWARF based emission.  NFC.

llvm-svn: 216894
2014-09-01 23:48:34 +00:00
Patrik Hagglund 296acbfe4f Fix in InlineSpiller to make the rematerialization loop also consider
implicit uses of the whole register when a sub register is defined.

Now the same iterator is used in the rematerialization loop as in the
spill loop later.

Patch provided by Mikael Holmen.

This fix was proposed and reviewed by Quentin Colombet,
http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-August/076135.html.

Unfortunately, this error in the rematerialization code has only been
seen in a large test case for an out-of-tree target, and is probably
hard to reproduce on an in-tree target. Therefore, no testcase is
provided.

llvm-svn: 216873
2014-09-01 11:04:07 +00:00
Jingyue Wu 5208cc5dbe [MachineSink] Use the real post dominator tree
Summary:
Fixes a FIXME in MachineSinking. Instead of using the simple heuristics
in isPostDominatedBy, use the real MachinePostDominatorTree. The old
heuristics caused instructions to sink unnecessarily, and might create
register pressure.

Test Plan:
Added a NVPTX codegen test to verify that our change is in effect. It also
shows the unnecessary register pressure caused by over-sinking. Updated
affected tests in AArch64 and X86.

Reviewers: eliben, meheff, Jiangning

Reviewed By: Jiangning

Subscribers: jholewinski, aemerson, mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D4814

llvm-svn: 216862
2014-09-01 03:47:25 +00:00
David Blaikie 6a150a87ef DebugInfo: Elide lexical scopes which only contain other (inline or lexical) scopes.
DW_TAG_lexical_scopes inform debuggers about the instruction range for
which a given variable (or imported declaration/module/etc) is valid. If
the scope doesn't itself contain any such entities, it's a waste of
space and should be omitted.

We were correctly doing this for entirely empty leaves, but not for
intermediate nodes.

Reduces total (not just debug sections) .o file size for a bootstrap
-gmlt LLVM by 22% and bootstrap -gmlt clang executable by 13%. The wins
for a full -g build will be less as a % (and in absolute terms), but
should still be substantial - with some of that win being fewer
relocations, thus more substantially reducing link times than fewer bytes
alone would have.

llvm-svn: 216861
2014-08-31 21:26:22 +00:00
David Blaikie 3fbf3b8546 DebugInfo: Move argument creation up into the caller that's unambiguously handling the subprogram scope (replacing a conditional with an assertion in the process)
llvm-svn: 216845
2014-08-31 18:04:28 +00:00
David Blaikie 2812746238 Delay adding imported entity DIEs to the lexical scope, streamlining the check for "this scope has nothing in it"
This makes the emptiness of the scope with regard to variables and
nested scopes the same as with regard to imported entities. Just
check if we had nothing at all before we build the node.

llvm-svn: 216840
2014-08-31 05:46:17 +00:00
David Blaikie 8912df12ff Modify DwarfDebug::constructImportedEntityDIE to return rather than insert into the scope
Another step towards improving lexical_scope handling

llvm-svn: 216839
2014-08-31 05:41:15 +00:00
David Blaikie e0e8a3baa0 Refactor constructImportedEntityDIE(DwarfUnit, DIImportedEntity) to return a DIE rather than inserting it into a specified context.
First of many steps to improve lexical scope construction (to omit
trivial lexical scopes - those without any direct variables). To that
end it's easier not to create imported entities directly into the
lexical scope node, but to build them, then add them if necessary.

llvm-svn: 216838
2014-08-31 05:32:06 +00:00
David Blaikie cd4b8a2560 Simplify expression using container's front() rather than begin()->
llvm-svn: 216833
2014-08-31 02:14:26 +00:00
Josh Klontz e1900dc920 [PATCH][Interpreter] Add missing FP intrinsic lowering.
Summary:
This extends the work done in [1], adding missing intrinsic lowering for floor, trunc, round and copysign.

[1] http://comments.gmane.org/gmane.comp.compilers.llvm.cvs/199372

Test Plan: Extended `test/ExecutionEngine/Interpreter/intrinsics.ll` to test the additional missing intrinsics. All tests pass.

Reviewers: dexonsmith

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5120

llvm-svn: 216827
2014-08-30 18:33:35 +00:00
Craig Topper 6dc4a8bc2c Fix some cases where StringRef was being passed by const reference. Remove const from some other StringRefs since it's implicitly const already.
llvm-svn: 216820
2014-08-30 16:48:02 +00:00
Juergen Ributzka 00d78221ab [MachineSinking] Clear kill flag of all operands at all their uses.
When sinking an instruction it might be moved past the original last use of one
of its operands. This last use has the kill flag set and the verifier will
obviously complain about this.

Before Machine Sinking (AArch64):
%vreg3<def> = ASRVXr %vreg1, %vreg2<kill>
%XZR<def> = SUBSXrs %vreg4, %vreg1<kill>, 160, %NZCV<imp-def>
...

After Machine Sinking:
%XZR<def> = SUBSXrs %vreg4, %vreg1<kill>, 160, %NZCV<imp-def>
...
%vreg3<def> = ASRVXr %vreg1, %vreg2<kill>

This fix clears all the kill flags in all instructions that use the same operands
as the instruction that is being sunk.

This fixes rdar://problem/18180996.

llvm-svn: 216803
2014-08-29 23:48:03 +00:00
Adrian Prantl daedfda892 Debug info: Add a new explicit DIDescriptor flag for the "public" access
specifier and change the default behavior to only emit the
DW_AT_accessibility(public) attribute when the isPublic() is explicitly
set.

rdar://problem/18154959

llvm-svn: 216799
2014-08-29 22:44:07 +00:00
David Blaikie 6ba88e0f08 Revert accidentally committed patches r216787-r216789
Rushed when I realized I hadn't committed the FreeDeleter for a clang
change I'd committed, and didn't check that I had things lying around in
my client.

Apologies for the noise.

llvm-svn: 216792
2014-08-29 22:10:52 +00:00
David Blaikie a19833d8f7 Omit DW_AT_artificial, DW_AT_external, and similar attributes under -gmlt
llvm-svn: 216789
2014-08-29 22:05:29 +00:00
David Blaikie 85d8bcd000 Omit dwarf::DW_AT_frame_base under -gmlt
llvm-svn: 216788
2014-08-29 22:05:27 +00:00
David Blaikie 71fffa950a Stuff
llvm-svn: 216787
2014-08-29 22:05:26 +00:00
Robin Morisset 039781ef26 Fix typos in comments, NFC
Summary: Just fixing comments, no functional change.

Test Plan: N/A

Reviewers: jfb

Subscribers: mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D5130

llvm-svn: 216784
2014-08-29 21:53:01 +00:00
Reid Kleckner dccd0cbec3 Add a const and munge some comments
llvm-svn: 216781
2014-08-29 21:42:21 +00:00
Reid Kleckner 16e5541211 musttail: Forward regparms of variadic functions on x86_64
Summary:
If a variadic function body contains a musttail call, then we copy all
of the remaining register parameters into virtual registers in the
function prologue. We track the virtual registers through the function
body, and add them as additional registers to pass to the call. Because
this is all done in virtual registers, the register allocator usually
gives us good code. If the function does a call, however, it will have
to spill and reload all argument registers (ew).

Forwarding regparms on x86_32 is not implemented because most compilers
don't support varargs in 32-bit with regparms.

Reviewers: majnemer

Subscribers: aemerson, llvm-commits

Differential Revision: http://reviews.llvm.org/D5060

llvm-svn: 216780
2014-08-29 21:42:08 +00:00
Frederic Riss 5732301afb Use DwarfDebug::attachLowHighPC for the compilation unit DIE.
llvm-svn: 216719
2014-08-29 09:00:26 +00:00
Job Noorman 9b31bd6bb0 Do not assume the value passed to memset is an i32.
The code in SelectionDAG::getMemset for some reason assumes the value passed to
memset is an i32. This breaks the generated code for targets that only have
registers smaller than 32 bits because the value might get split into multiple
registers by the calling convention. See the test for the MSP430 target included
in the patch for an example.

This patch ensures that nothing is assumed about the type of the value. Instead,
the type is taken from the selected overload of the llvm.memset intrinsic.

llvm-svn: 216716
2014-08-29 08:23:53 +00:00
Sanjay Patel ccd267683b Move FNEG next to FABS and make them more similar, so it's easier that they can be refactored. NFC.
llvm-svn: 216688
2014-08-28 21:51:37 +00:00
Rafael Espindola b43d51de95 On MachO, don't put non-private constants in mergeable sections.
On MachO, putting a symbol that doesn't start with a 'L' or 'l' in one of the
__TEXT,__literal* sections prevents the linker from merging the context of the
section.

Since private GVs are the ones that get mangled to start with 'L' or 'l', we now
only put those on the __TEXT,__literal* sections.

llvm-svn: 216682
2014-08-28 20:13:31 +00:00
Frederic Riss 9e32475f18 Constify MCSymbol* parameters to DwarfDebug::attachLowHighPC.
llvm-svn: 216681
2014-08-28 19:09:29 +00:00
Owen Anderson 3eb910b404 Do not introduce new shuffle patterns after operation legalization if SHUFFLE_VECTOR
was marked custom.  The target independent DAG combine has no way to know if
the shuffles it is introducing are ones that the target could support or not.

llvm-svn: 216678
2014-08-28 17:49:58 +00:00
Sanjay Patel 50cbfc5138 Janitorial services: "Don’t duplicate function or class name at the beginning of the comment."
llvm-svn: 216674
2014-08-28 16:29:51 +00:00
Sanjay Patel e28d57d9d8 Remove local TLI vars that are just duplicates of the class var. No functional change.
llvm-svn: 216673
2014-08-28 16:01:50 +00:00
Sanjay Patel 78614bf0e8 Use local vars to improve readability. No functional change.
Completes what was started in r216611 and r216623. 
Used const refs instead of pointers; not sure if one is preferable to the other.

llvm-svn: 216672
2014-08-28 15:53:16 +00:00
Arnaud A. de Grandmaison efd4363172 [PBQP] Only output debug information when requested
llvm-svn: 216660
2014-08-28 10:15:47 +00:00
Juergen Ributzka 31328168bb [FastISel] Undo phi node updates when falling-back to SelectionDAG.
The included test case would fail, because the MI PHI node would have two
operands from the same predecessor.

This problem occurs when a switch instruction couldn't be selected. This always
happens, because there is no default switch support for FastISel to begin with.

The problem was that FastISel would first add the operand to the PHI nodes and
then fall-back to SelectionDAG, which would then in turn add the same operands
to the PHI nodes again.

This fix removes these duplicate PHI node operands by resetting the
PHINodesToUpdate to its original state before FastISel tried to select the
instruction.

This fixes <rdar://problem/18155224>.

llvm-svn: 216640
2014-08-28 02:06:55 +00:00
Juergen Ributzka 4f1a54a41a [FastISel]
Currently instructions are folded very aggressively for AArch64 into the memory
operation, which can lead to the use of killed operands:
  %vreg1<def> = ADDXri %vreg0<kill>, 2
  %vreg2<def> = LDRBBui %vreg0, 2
  ... = ... %vreg1 ...

This usually happens when the result is also used by another non-memory
instruction in the same basic block, or any instruction in another basic block.

This fix teaches hasTrivialKill to not only check the LLVM IR that the value has
a single use, but also to check if the register that represents that value has
already been used. This can happen when the instruction with the use was folded
into another instruction (in this particular case a load instruction).

This fixes rdar://problem/18142857.

llvm-svn: 216634
2014-08-28 00:09:46 +00:00
Sanjay Patel 159f127f63 Use local variable in visitFADD. No functional change.
llvm-svn: 216623
2014-08-27 21:42:42 +00:00
Sanjay Patel ae402a35a0 Group unsafe-math optimizations for fsub into one block. No functional change.
llvm-svn: 216616
2014-08-27 20:57:52 +00:00
Juergen Ributzka 833bc681e3 [FastISel] Fix a potential bug in FastEmitInst_ri
FastEmitInst_ri was constraining the first operand without checking if it is
a virtual register. Use constrainOperandRegClass as all the other
FastEmitInst_* functions.

llvm-svn: 216613
2014-08-27 20:47:33 +00:00
Sanjay Patel a828f2ba46 Use local variable to improve readability.
No functional change intended.

llvm-svn: 216611
2014-08-27 20:40:31 +00:00
Rafael Espindola 3560ff2c1f Return a std::unique_ptr when creating a new MemoryBuffer.
llvm-svn: 216583
2014-08-27 20:03:13 +00:00
Oliver Stannard 89d1542840 Teach the AArch64 backend about v4f16 and v8f16
This teaches the AArch64 backend to deal with the operations on v4f16 and
v8f16 which are exposed by NEON intrinsics, plus the add, sub, mul and div
operations.

llvm-svn: 216555
2014-08-27 16:16:04 +00:00
Chandler Carruth 74ec9e19ee [SDAG] Re-instate r215611 with a fix to a pesky X86 DAG combine.
This combine is essentially combining target-specific nodes back into target
independent nodes that it "knows" will be combined yet again by a target
independent DAG combine into a different set of target-independent nodes that
are legal (not custom though!) and thus "ok". This seems... deeply flawed. The
crux of the problem is that we don't combine un-legalized shuffles that are
introduced by legalizing other operations, and thus we don't see a very
profitable combine opportunity. So the backend just forces the input to that
combine to re-appear.

However, for this to work, the conditions detected to re-form the unlegalized
nodes must be *exactly* right. Previously, failing this would have caused poor
code (if you're lucky) or a crasher when we failed to select instructions.
After r215611 we would fall back into the legalizer. In some cases, this just
"fixed" the crasher by produces bad code. But in the test case added it caused
the legalizer and the dag combiner to iterate forever.

The fix is to make the alignment checking in the x86 side of things match the
alignment checking in the generic DAG combine exactly. This isn't really a
satisfying or principled fix, but it at least make the code work as intended.
It also highlights that it would be nice to detect the availability of under
aligned loads for a given type rather than bailing on this optimization. I've
left a FIXME to document this.

Original commit message for r215611, which covers the rest of the change:
  [SDAG] Fix a case where we would iteratively legalize a node during
  combining by replacing it with something else but not re-process the
  node afterward to remove it.

  In a truly remarkable stroke of bad luck, this would (in the test case
  attached) end up getting some other node combined into it without ever
  getting re-processed. By adding it back on to the worklist, in addition
  to deleting the dead nodes more quickly we also ensure that if it
  *stops* being dead for any reason it makes it back through the
  legalizer. Without this, the test case will end up failing during
  instruction selection due to an and node with a type we don't have an
  instruction pattern for.

It took many million runs of the shuffle fuzz tester to find this.

llvm-svn: 216537
2014-08-27 11:22:16 +00:00
Craig Topper e1d1294853 Simplify creation of a bunch of ArrayRefs by using None, makeArrayRef or just letting them be implicitly created.
llvm-svn: 216525
2014-08-27 05:25:25 +00:00
Craig Topper 3af9722529 Fix some cases where ArrayRefs were being passed by reference. Also remove 'const' from some other ArrayRef uses since it's implicitly const already.
llvm-svn: 216524
2014-08-27 05:25:00 +00:00
David Blaikie c13bc97e58 Remove type unit skeletons. GDB no longer needs them & this saves a heap of space.
llvm-svn: 216521
2014-08-27 05:04:14 +00:00
Dylan Noblesmith 17f05a3fc6 CodeGen/LiveVariables: use vector::assign()
Address review comments.

llvm-svn: 216426
2014-08-26 02:03:25 +00:00
Rafael Espindola 3fd1e9933f Modernize raw_fd_ostream's constructor a bit.
Take a StringRef instead of a "const char *".
Take a "std::error_code &" instead of a "std::string &" for error.

A create static method would be even better, but this patch is already a bit too
big.

llvm-svn: 216393
2014-08-25 18:16:47 +00:00