Commit Graph

89803 Commits

Author SHA1 Message Date
Chaoren Lin 49317f2d90 Use llvm:Twine instead of std::to_string.
std::to_string is not available from the Android NDK.

Reviewers: lhames, ovyalov, chandlerc

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D19638

llvm-svn: 267829
2016-04-28 00:49:37 +00:00
Bryan Chan 893110ecaf [SystemZ] Support Swift Calling Convention
Summary:
Port rL265480, rL264754, rL265997 and rL266252 to SystemZ, in order to enable the Swift port on the architecture. SwiftSelf and SwiftError are assigned to R10 and R9, respectively, which are normally callee-saved registers. For more information, see:

RFC: Implementing the Swift calling convention in LLVM and Clang
https://groups.google.com/forum/#!topic/llvm-dev/epDd2w93kZ0

Reviewers: kbarton, manmanren, rjmccall, uweigand

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D19414

llvm-svn: 267823
2016-04-28 00:17:23 +00:00
Peter Collingbourne edf8432480 LTO: Don't bother trying to mangle unnamed globals, as they can't be preserved with MustPreserveSymbols.
Summary: Should fix sanitizer-windows bot.

Reviewers: joker.eph

Subscribers: llvm-commits, joker.eph

Differential Revision: http://reviews.llvm.org/D19635

llvm-svn: 267820
2016-04-27 23:48:11 +00:00
Zachary Turner 1822af542f Parse module information from DBI stream.
This gets more data out of the DBI strema of the PDB.  In
particular it extracts the metadata for the list of modules
(compilands) that this PDB contains info about, and adds support
for dumping these fields to llvm-pdbdump.

Differential Revision: http://reviews.llvm.org/D19570
Reviewed By: ruiu

llvm-svn: 267818
2016-04-27 23:41:42 +00:00
Quentin Colombet 12b69919a2 [ImplicitNullChecks] Properly update the live-in of the block of the memory operation.
We basically replace:
HoistBB:
cond_br NullBB, NotNullBB

NullBB:
  ...

NotNullBB:
  <reg> = load

into
HoistBB
<reg> = load_faulting_op NullBB
uncond_br NotNullBB

NullBB:
  ...

NotNullBB: ## <reg> is now live-in of NotNullBB
  ...

This partially fixes the machine verifier error for
test/CodeGen/X86/implicit-null-check.ll, but it still fails because
of the implicit CFG structure.

llvm-svn: 267817
2016-04-27 23:26:40 +00:00
Rong Xu 6e34c490ff [PGO] Promote indirect calls to conditional direct calls with value-profile
This patch implements the transformation that promotes indirect calls to
conditional direct calls when the indirect-call value profile meta-data is
available.

Differential Revision: http://reviews.llvm.org/D17864

llvm-svn: 267815
2016-04-27 23:20:27 +00:00
Sanjay Patel facf45a82f [SimplifyCFG] propagate branch metadata when creating select
There's no existing test for this path, and I don't know how to expose
it in a regression test, but I'm assuming there's some reason this
path exists. 

llvm-svn: 267813
2016-04-27 23:14:12 +00:00
Lang Hames f88174dd80 [RuntimeDyld] Propagate another dropped error in RuntimeDyldELF.
This should fix the PPC64 bots.

llvm-svn: 267810
2016-04-27 22:54:03 +00:00
Mitch Bodart e60465ddf7 [X86] Enable the post-RA-scheduler for clang's default 32-bit cpu.
For compilations with no explicit cpu specified, this exhibits
nice gains on Silvermont, with neutral performance on big cores.

Differential Revision: http://reviews.llvm.org/D19138

llvm-svn: 267809
2016-04-27 22:52:35 +00:00
Quentin Colombet bf200688de [X86][FastISel] Make sure we use the right register class when we select stores.
llvm-svn: 267806
2016-04-27 22:33:42 +00:00
Colin LeMahieu a3782da3e3 [Hexagon] Merging nops in to previous packet rather than always creating a new one.
llvm-svn: 267798
2016-04-27 21:37:44 +00:00
Quentin Colombet d6dbec4c6f [X86] Fix the lowering of TLS calls.
The callseq_end node must be glued with the TLS calls, otherwise,
the generic code will miss the uses of the returned value and will
mark it dead.
Moreover, TLSCall 64-bit pseudo must not set an implicit-use on RDI,
the pseudo uses the symbol address at this point not RDI and the
lowering will do the right thing.

llvm-svn: 267797
2016-04-27 21:37:37 +00:00
Colin LeMahieu 485d905510 [MCAssembler] Allow backend to finalize layout post-relaxation.
Differential revision: http://reviews.llvm.org/D19429

llvm-svn: 267796
2016-04-27 21:26:13 +00:00
Rong Xu af5aebaa32 [PGO] Prohibit address recording if the function is both internal and COMDAT
Differential Revision: http://reviews.llvm.org/D19515

llvm-svn: 267792
2016-04-27 21:17:30 +00:00
Matt Arsenault 0547b016b1 AMDGPU: Account for globals in AMDGPUPromoteAlloca pass
Patch by Bas Nieuwenhuizen

llvm-svn: 267791
2016-04-27 21:05:08 +00:00
Lang Hames bc38ea9596 [RuntimeDyld] Add missing include - <string> is requried for std::to_string.
This should fix the compile error that showed up in build:
http://lab.llvm.org:8011/builders/lldb-x86_64-ubuntu-14.04-buildserver/builds/6754/

llvm-svn: 267790
2016-04-27 20:54:49 +00:00
Lang Hames 09a74c46ec [RuntimeDyld] Propagate Errors from findPPC64TOCSection.
llvm-svn: 267789
2016-04-27 20:51:58 +00:00
Ahmed Bougacha 65572afea8 [ARM] Set AddPristinesAndCSRs to expandCMP_SWAP LivePhysRegs.
We run after PEI.
Found via inspection; no obvious testcase.

Follow-up to r266679.

llvm-svn: 267781
2016-04-27 20:33:07 +00:00
Ahmed Bougacha 5a3bf6a4a9 [AArch64] Set AddPristinesAndCSRs to expandCMP_SWAP LivePhysRegs.
We run after PEI.
Found via inspection; no obvious testcase.

Follow-up to r266339.

llvm-svn: 267780
2016-04-27 20:33:05 +00:00
Ahmed Bougacha 9e71425f54 [AArch64] Set correct successors in CMPXCHG pseudo expansion.
transferSuccessors() would LoadCmpBB a successor of DoneBB,
whereas it should be a successor of the original MBB.

Follow-up to r266339.

Unfortunately, it's tricky to catch this in the verifier.

llvm-svn: 267779
2016-04-27 20:33:02 +00:00
Ahmed Bougacha b4af107239 [ARM] Set correct successors in CMPXCHG pseudo expansion.
transferSuccessors() would LoadCmpBB a successor of DoneBB, whereas
it should be a successor of the original MBB.

The testcase changes are caused by Thumb2SizeReduction, which
was previously confused by the broken CFG.

Follow-up to r266679.

Unfortunately, it's tricky to catch this in the verifier.

llvm-svn: 267778
2016-04-27 20:32:54 +00:00
Lang Hames 8959531c51 [RuntimeDyld] Plumb Error/Expected through the internals of RuntimeDyld.
Also replaces a number of calls to report_fatal_error with Error returns.

The plumbing will make it easier to return errors originating in libObject.

Replacing report_fatal_errors with Error returns will give JIT clients the
opportunity to recover gracefully when the JIT is unable to produce/relocate
code, as well as providing meaningful error messages that can be used to file
bug reports.

llvm-svn: 267776
2016-04-27 20:24:48 +00:00
Than McIntosh a541320908 Fix build failure under NDEBUG.
llvm-svn: 267774
2016-04-27 20:07:02 +00:00
Kevin B. Smith c378a99ba5 [X86]: Quit promoting 16 bit loads to 32 bit.
Differential Revision: http://reviews.llvm.org/D19592

llvm-svn: 267773
2016-04-27 19:58:03 +00:00
Kostya Serebryany 0e0bcc4bdb [libFuzzer] disable leak detection if we have tried it for 1000 times w/o finding a leak [part 2]
llvm-svn: 267771
2016-04-27 19:52:56 +00:00
Kostya Serebryany 7018a1aaa4 [libFuzzer] disable leak detection if we have tried it for 1000 times w/o finding a leak
llvm-svn: 267770
2016-04-27 19:52:34 +00:00
Andrew Kaylor 289bd5f684 Add optimization bisect opt-in calls for PowerPC passes
Differential Revision: http://reviews.llvm.org/D19554

llvm-svn: 267769
2016-04-27 19:39:32 +00:00
David Majnemer 0c80e2eac6 [CodeGenPrepare] Don't sink a cast past its user
The sink cast machinery is supposed to sink casts as close to their user
as possible.  However, an EH pad is the first instruction in it's basic
block.  Don't sink if the user is an EH pad.

This fixes PR27536.

llvm-svn: 267767
2016-04-27 19:36:38 +00:00
Than McIntosh 1b60168576 Refactor debugging code, NFC.
Summary:
Refactor debugging routines to reduce code duplication. Remove a couple
of #include's that were not needed. Don't require MachineDominator as a
prereq for this pass (not needed).

These changes split off from http://reviews.llvm.org/D18827.

Reviewers: wmi, gbiv, qcolombet

Subscribers: llvm-commits, davidxl, jevinskie

Differential Revision: http://reviews.llvm.org/D18992

llvm-svn: 267766
2016-04-27 19:26:25 +00:00
Justin Lebar 7cdbce5946 [NVPTX] Run NVVMReflect at the beginning of IR passes.
Summary:
Currently the NVVMReflect pass is run at the beginning of our backend
passes.  But really, it should be run as early as possible, as it's
simply resolving an "if" statement in code.  So copy it into
TargetMachine::addEarlyAsPossiblePasses.

We still run it at the beginning of the backend passes, since it's
needed for correctness when lowering to nvptx.

(Specifically, NVVMReflect changes each call to the __nvvm_reflect
function or llvm.nvvm.reflect intrinsic into an integer constant, based
on the pass's configuration.  Clearly we miss many optimization
opportunities if we perform this transformation at the beginning of
codegen.)

Reviewers: rnk

Subscribers: tra, llvm-commits, jholewinski

Differential Revision: http://reviews.llvm.org/D18616

llvm-svn: 267765
2016-04-27 19:13:37 +00:00
Ahmed Bougacha ace97c1f7d [LIR] Set attributes on memset_pattern16.
"inferattrs" will deduce the attribute, but it will be too late for
many optimizations. Set it ourselves when creating the call.

Differential Revision: http://reviews.llvm.org/D17598

llvm-svn: 267762
2016-04-27 19:04:50 +00:00
Ahmed Bougacha 7f97193dd7 [LIR] Reuse variable. NFCI.
llvm-svn: 267761
2016-04-27 19:04:46 +00:00
Ahmed Bougacha 44c19876c7 [InferAttrs] Mark memset_pattern16 params nocapture.
Differential Revision: http://reviews.llvm.org/D19471

llvm-svn: 267760
2016-04-27 19:04:43 +00:00
Ahmed Bougacha b0624a2cb4 [TLI] Unify LibFunc attribute inference. NFCI.
Now the pass is just a tiny wrapper around the util. This lets us reuse
the logic elsewhere (done here for BuildLibCalls) instead of duplicating
it.

The next step is to have something like getOrInsertLibFunc that also
sets the attributes.

Differential Revision: http://reviews.llvm.org/D19470

llvm-svn: 267759
2016-04-27 19:04:40 +00:00
Ahmed Bougacha d765a82b54 [TLI] Unify LibFunc signature checking. NFCI.
I tried to be as close as possible to the strongest check that
existed before; cleaning these up properly is left for future work.

Differential Revision: http://reviews.llvm.org/D19469

llvm-svn: 267758
2016-04-27 19:04:35 +00:00
Ahmed Bougacha 220c4010bf [TLI] Fix indentation. NFC.
llvm-svn: 267757
2016-04-27 19:04:29 +00:00
Sjoerd Meijer 41beee6575 Clean up to avoid compiler warnings for casting away const qualifiers.
Differential Revision: http://reviews.llvm.org/D19598

llvm-svn: 267753
2016-04-27 18:35:02 +00:00
Chad Rosier 03e1647d19 Revert "[AMDGPU][llvm-mc] Add support of TTMP quads. Rework M0 exclusion for SMRD."
This reverts commit r267733 due to a -Werror,-Wunused-function error.

llvm-svn: 267752
2016-04-27 18:29:11 +00:00
Matthew Simpson 622b95be7b [LV] Reallow positive-stride interleaved load groups with gaps
We previously disallowed interleaved load groups that may cause us to
speculatively access memory out-of-bounds (r261331). We did this by ensuring
each load group had an access corresponding to the first and last member.
Instead of bailing out for these interleaved groups, this patch enables us to
peel off the last vector iteration, ensuring that we execute at least one
iteration of the scalar remainder loop. This solution was proposed in the
review of the previous patch.

Differential Revision: http://reviews.llvm.org/D19487

llvm-svn: 267751
2016-04-27 18:21:36 +00:00
Arch D. Robison aca7c412b4 [SLPVectorizer] Refactor where MinVecRegSize and MaxVecRegSize live.
This is the first of two commits for extending SLP Vectorizer to deal with aggregates.
This commit merely refactors existing logic.

http://reviews.llvm.org/D14185

llvm-svn: 267748
2016-04-27 17:46:25 +00:00
Gerolf Hoflehner 50426191d7 [DAGCombiner] Follow coding convention for function name (NFC)
llvm-svn: 267745
2016-04-27 17:27:16 +00:00
Marcin Koscielnicki 7efdca5622 [Mips] Add support for llvm.thread.pointer intrinsic.
This will be used to implement __builtin_thread_pointer in clang.

Differential Revision: http://reviews.llvm.org/D19569

llvm-svn: 267743
2016-04-27 17:21:49 +00:00
Reid Kleckner 7f0ae15e9d Silence a -Wdangling-else
llvm-svn: 267737
2016-04-27 16:46:33 +00:00
Matthew Simpson 47bd3994b7 Add parentheses to silence buildbot warning
llvm-svn: 267734
2016-04-27 16:25:04 +00:00
Artem Tamazov 3896f8f83d [AMDGPU][llvm-mc] Add support of TTMP quads. Rework M0 exclusion for SMRD.
Added support of TTMP quads.
Reworked M0 exclusion machinery for SMRD and similar instructions
to enable usage of TTMP registers in those instructions as destinations.
Tests added.

Differential Revision: http://reviews.llvm.org/D19342

llvm-svn: 267733
2016-04-27 16:20:23 +00:00
Reid Kleckner 0336cc05e7 [PDB] Fix function names for private symbols in PDBs
Summary:
llvm-symbolizer wants to get linkage names of functions for historical
reasons. Linkage names are only recorded in the PDB for public symbols,
and the linkage name is apparently stored separately in some "public
symbol" record. We had a workaround in PDBContext which would look for
such symbols when the user requested linkage names.

However, when given an address that was truly in a private function and
public funciton, we would accidentally find nearby public symbols and
return those function names. The fix is to look for both function
symbols and public symbols and only prefer the public symbol name if the
addresses of the symbols agree.

Fixes PR27492

Reviewers: zturner

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D19571

llvm-svn: 267732
2016-04-27 16:10:29 +00:00
Nicolai Haehnle f66bdb5ea8 AMDGPU/SI: Add llvm.amdgcn.s.waitcnt.all intrinsic
Summary:
So it appears that to guarantee some of the ordering requirements of a GLSL
memoryBarrier() executed in the shader, we need to emit an s_waitcnt.

(We can't use an s_barrier, because memoryBarrier() may appear anywhere in
the shader, in particular it may appear in non-uniform control flow.)

Reviewers: arsenm, mareko, tstellarAMD

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D19203

llvm-svn: 267729
2016-04-27 15:46:01 +00:00
Matthew Simpson e5dfb08fcb [TTI] Add hook for vector extract with extension
This change adds a new hook for estimating the cost of vector extracts followed
by zero- and sign-extensions. The motivating example for this change is the
SMOV and UMOV instructions on AArch64. These instructions move data from vector
to general purpose registers while performing the corresponding extension
(sign-extend for SMOV and zero-extend for UMOV) at the same time. For these
operations, TargetTransformInfo can assume the extensions are free and only
report the cost of the vector extract. The SLP vectorizer has been updated to
make use of the new hook.

Differential Revision: http://reviews.llvm.org/D18523

llvm-svn: 267725
2016-04-27 15:20:21 +00:00
Artem Tamazov 5cd55b1784 [AMDGPU][llvm-mc] s_getreg/setreg* - Support symbolic names of hardware registers.
Possibility to specify code of hardware register kept.
Disassemble to symbolic name, if name is known.
Tests updated/added.

Differential Revision: http://reviews.llvm.org/D19335

llvm-svn: 267724
2016-04-27 15:17:03 +00:00
Nico Weber e69b9548b8 Revert r267649, it caused PR27539.
llvm-svn: 267723
2016-04-27 15:16:54 +00:00
Teresa Johnson df5ef8711f [ThinLTO] Refine fix to avoid renaming of uses in inline assembly.
Summary:
Refine the workaround from r266877 that attempts to prevent
renaming of locals in inline assembly, so that in addition to looking
for a llvm.used local value, that there is at least one inline assembly
call in the module. Otherwise, debug functions added to the llvm.used
can block importing/exporting unnecessarily.

Reviewers: joker.eph

Subscribers: llvm-commits, joker.eph

Differential Revision: http://reviews.llvm.org/D19573

llvm-svn: 267717
2016-04-27 14:19:38 +00:00
Teresa Johnson 02e98331c0 [ThinLTO] Use valueid instead of bitcode offsets in combined index file
Summary:
With the removal of support for lazy parsing of combined index summary
records (e.g. r267344), we no longer need to include the summary record
bitcode offset in the VST entries for definitions. Change the combined
index format to be similar to the per-module index format in using value
ids to cross-reference from the summary record to the VST entry (rather
than the summary record bitcode offset to cross-reference in the other
direction).

The visible changes are:
1) Add the value id to the combined summary records
2) Remove the summary offset from the combined VST records, which has
the following effects:
- No longer need the VST_CODE_COMBINED_GVDEFENTRY record, as all
  combined index VST entries now only contain the value id and
  corresponding GUID.
- No longer have duplicate VST entries in the case where there are
  multiple definitions of a symbol (e.g. weak/linkonce), as they all
  have the same value id and GUID.

An implication of #2 above is that in order to hook up an alias to the
correct aliasee based on the value id of the aliasee recorded in the
combined index alias record, we need to scan the entries in the index
for that GUID to find the one from the same module (i.e. the case where
there are multiple entries for the aliasee). But the reader no longer
has to maintain a special map to hook up the alias/aliasee.

Reviewers: joker.eph

Subscribers: joker.eph, llvm-commits

Differential Revision: http://reviews.llvm.org/D19481

llvm-svn: 267712
2016-04-27 13:28:35 +00:00
Artur Pilipenko 345f01481b NFC. Introduce Value::getPointerDerferecnceableBytes
Extract a part of isDereferenceableAndAlignedPointer functionality to Value::getPointerDerferecnceableBytes. Currently it's a NFC, but in future I'm going to accumulate all the logic about value dereferenceability in this function similarly to Value::getPointerAlignment function (D16144).

Reviewed By: reames

Differential Revision: http://reviews.llvm.org/D17572

llvm-svn: 267708
2016-04-27 12:51:01 +00:00
Zlatko Buljan de0bbe6d1c [mips][microMIPS] Add CodeGen support for SUBU16, SUB, SUBU, DSUB and DSUBU instructions
Differential Revision: http://reviews.llvm.org/D16676

llvm-svn: 267694
2016-04-27 11:31:44 +00:00
Zlatko Buljan 29813620bc [mips][microMIPS] Add CodeGen support for SLL16, SRL16, SLL, SLLV, SRA, SRAV, SRL and SRLV instructions
Differential Revision: http://reviews.llvm.org/D17989

llvm-svn: 267693
2016-04-27 11:02:23 +00:00
Artur Pilipenko 9bb6beabf4 isSafeToLoadUnconditionally support queries without a context
This is required to use this function from isSafeToSpeculativelyExecute

Reviewed By: hfinkel

Differential Revision: http://reviews.llvm.org/D16231

llvm-svn: 267692
2016-04-27 11:00:48 +00:00
Artur Pilipenko c97eac6555 Use DL preferred alignment for alloca in Value::getPointerAlignment
Teach Value::getPointerAlignment that allocas with no explicit alignment are aligned to preferred alignment of the allocated type.

Reviewed By: hfinkel

Differential Revision: http://reviews.llvm.org/D17569

llvm-svn: 267689
2016-04-27 10:42:29 +00:00
Adam Nemet d2fa414718 [LoopDist] Add llvm.loop.distribute.enable loop metadata
Summary:
D19403 adds a new pragma for loop distribution.  This change adds
support for the corresponding metadata that the pragma is translated to
by the FE.

As part of this I had to rethink the flag -enable-loop-distribute.  My
goal was to be backward compatible with the existing behavior:

  A1. pass is off by default from the optimization pipeline
  unless -enable-loop-distribute is specified

  A2. pass is on when invoked directly from opt (e.g. for unit-testing)

The new pragma/metadata overrides these defaults so the new behavior is:

  B1. A1 + enable distribution for individual loop with the pragma/metadata

  B2. A2 + disable distribution for individual loop with the pragma/metadata

The default value whether the pass is on or off comes from the initiator
of the pass.  From the PassManagerBuilder the default is off, from opt
it's on.

I moved -enable-loop-distribute under the pass.  If the flag is
specified it overrides the default from above.

Then the pragma/metadata can further modifies this per loop.

As a side-effect, we can now also use -enable-loop-distribute=0 from opt
to emulate the default from the optimization pipeline.  So to be precise
this is the new behavior:

  C1. pass is off by default from the optimization pipeline
  unless -enable-loop-distribute or the pragma/metadata enables it

  C2. pass is on when invoked directly from opt
  unless -enable-loop-distribute=0 or the pragma/metadata disables it

Reviewers: hfinkel

Subscribers: joker.eph, mzolotukhin, llvm-commits

Differential Revision: http://reviews.llvm.org/D19431

llvm-svn: 267672
2016-04-27 05:28:18 +00:00
Vaivaswatha Nagaraj 08efb0efcd [Cloning] cloneLoopWithPreheader(): add assert to ensure no sub-loops
Summary:
cloneLoopWithPreheader() does not update LoopInfo for sub-loop of
the original loop being cloned. Add assert to ensure no sub-loops for loop being cloned.

Reviewers: anemet, ashutosh.nema, hfinkel

Subscribers: mzolotukhin, llvm-commits

Differential Revision: http://reviews.llvm.org/D15922

llvm-svn: 267671
2016-04-27 05:25:09 +00:00
Craig Topper de4318b928 [Support][X86] Add a few more Intel model numbers to getHostCPUName for airmont and knl.
llvm-svn: 267670
2016-04-27 05:17:00 +00:00
Craig Topper e7d743ccf8 [Support][X86] Change the case values in the Intel family 6 code to hex so its easier to compare with Intel's docs. NFC
llvm-svn: 267669
2016-04-27 05:16:58 +00:00
Mehdi Amini c7b950171d Revert "Support "preserving" the summary information when using setModule() API in LTOCodeGenerator"
This reverts commit r267665.
ASAN shows that there is a use of undefined value.

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 267668
2016-04-27 05:11:44 +00:00
Craig Topper 0e2f14fa83 [Support][X86] Add a couple more Broadwell CPU models numbers to getHostCPUName.
llvm-svn: 267666
2016-04-27 04:40:03 +00:00
Mehdi Amini 360ed847bc Support "preserving" the summary information when using setModule() API in LTOCodeGenerator
Another attempt at r267655...

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 267665
2016-04-27 04:24:10 +00:00
Mehdi Amini a1b8b6cd56 Revert "Support "preserving" the summary information when using setModule() API in LTOCodeGenerator"
This reverts commit r267657, r267656, and r267655.
The test does not pass on multiple bots, I'm unsure why yet but let's unbreak them.

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 267664
2016-04-27 03:34:28 +00:00
Evgeny Stupachenko 23ce61b663 The patch fixes PR27392.
Summary:
 It is incorrect to compare TripCount (which is BECount + 1)
  with extraiters (or Count) to check if we should enter unrolled
  loop or not, because TripCount can potentially overflow
  (when BECount is max unsigned integer).
 While comparing BECount with (Count - 1) is overflow safe and
  therefore correct.

Reviewer: hfinkel

Differential Revision: http://reviews.llvm.org/D19256

From: Evgeny Stupachenko <evstupac@gmail.com>
llvm-svn: 267662
2016-04-27 03:04:54 +00:00
Philip Reames c67651dd70 [LVI] Delete stale and misleading comment.
llvm-svn: 267661
2016-04-27 03:03:15 +00:00
Chuang-Yu Cheng 8676c3d599 [ppc64] fix bug in prologue that mfocrf's cr operand should be explict state instead of implicit
This fixes PR27414

Reviewers: kbarton mgrang tjablin

http://reviews.llvm.org/D19255

llvm-svn: 267660
2016-04-27 02:59:28 +00:00
Ahmed Bougacha 9a0c9adade [X86] Set AddPristinesAndCSRs to FixupBW LivePhysRegs. NFC.
We run after PEI, so we need to AddPristinesAndCSRs.
In practice, that makes no difference here, because we only ask about
liveness of super-registers of defined GR8/GR16 registers, so they
can't be pristine. Still, it's the correct thing to do.

Thanks to Quentin for noticing!

Follow-up to r267495.

llvm-svn: 267658
2016-04-27 01:51:38 +00:00
Mehdi Amini e2a65fe5ec Support "preserving" the summary information when using setModule() API in LTOCodeGenerator
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 267655
2016-04-27 01:46:48 +00:00
Sanjoy Das 5253a089ba Fix typo in comment; NFC
llvm-svn: 267653
2016-04-27 01:44:31 +00:00
Ahmed Bougacha 19a2ee591a [X86] Don't assume that MMX extractelts are from index 0.
It's probably the case for all 3 MMX users out there, but with
hand-crafted IR, you can trigger selection failures. Fix that.

llvm-svn: 267652
2016-04-27 01:35:29 +00:00
Ahmed Bougacha e68363a03c [X86] Re-enable MMX i32 extractelt combine.
This effectively adds back the extractelt combine removed by r262358:
the direct case can still occur (because x86_mmx is special, see
r262446), but it's the indirect case that's now superseded by the
generic combine.

llvm-svn: 267651
2016-04-27 01:35:25 +00:00
Cong Hou 6f879d9eb1 Detects the SAD pattern on X86 so that much better code will be emitted once the pattern is matched.
Differential revision: http://reviews.llvm.org/D14840

llvm-svn: 267649
2016-04-27 01:29:18 +00:00
Philip Reames 2ab964e263 [LVI] Add a comment explaining a subtle piece of code
Or at least, I didn't understand the implications the first several times I read it it.

llvm-svn: 267648
2016-04-27 01:02:25 +00:00
Mehdi Amini b4e1e8297b ThinLTO: do not promote GlobalVariable that have a specific section.
Differential Revision: http://reviews.llvm.org/D18298

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 267646
2016-04-27 00:32:13 +00:00
Matt Arsenault ba437c67d2 SLSR: Use UnknownAddressSpace instead of 0 for pure arithmetic.
In the case where isLegalAddressingMode is used for cases
not related to addressing modes, such as pure adds and muls,
it should not be using address space 0. LSR already passes -1
as the address space in these cases.

llvm-svn: 267645
2016-04-27 00:32:09 +00:00
Mehdi Amini da168fbc2e LTOCodeGenerator: turns linkonce(_odr) into weak_(odr) when present "MustPreserve" set
Summary:
If the linker requested to preserve a linkonce function, we should
honor this even if we drop all uses.

Reviewers: dexonsmith

Subscribers: llvm-commits, joker.eph

Differential Revision: http://reviews.llvm.org/D19527

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 267644
2016-04-27 00:32:02 +00:00
Adam Nemet 61399ac424 [LoopDist] Split main class. NFC
This splits out the per-loop functionality from the Pass class.

With this the fact whether the loop is forced-distribute with the new
metadata/pragma can be cached in the per-loop class rather than passed
around.

llvm-svn: 267643
2016-04-27 00:31:03 +00:00
Philip Reames 3f83dbeed9 [LVI] Reduce compile time by lazily scanning blocks if needed
When encountering a non-local pointer, LVI would eagerly scan the block for dereferences of the given object to prove the pointer to be non null.  That's all well and good, but *then* we'd go recurse through our input blocks.  As a result, we could end up scanning each and every block we traverse, even if the final definition was obviously non null or we found a constant value somewhere up the chain.  The previous code papered over this by using the isKnownNonNull routine from value tracking.  This made the duplication less painful in the common case.

Instead, we know do the block scan only *after* we've gotten the recursive results back.  This lets us stop scanning individual blocks as soon as we've determined it to be non-null in any predecessor block and use our usual merge rules to propagate that information cheaply through successor blocks.  For a pointer which can be found non-null, this does strictly less work and sometimes substaintially so.

Note that the case where we *can't* prove something non-null is still the really expensive case.  We end up scanning each and every block looking for a dereference and never end up finding one.

llvm-svn: 267642
2016-04-27 00:30:55 +00:00
Quentin Colombet ddad5aa152 [MachineInstrBundle] Actually set the PartialDeadDef flag only when the register
is defined!

The users were checking the proper thing (Defined + PartialDeadDef), but the
information may have been wrong for other use cases, so fix that.

llvm-svn: 267641
2016-04-27 00:16:29 +00:00
Andrew Kaylor d9974cc913 Add optimization bisect opt-in calls for SystemZ passes
Differential Revision: http://reviews.llvm.org/D19562

llvm-svn: 267636
2016-04-26 23:49:41 +00:00
Andrew Kaylor 87b10dd7b3 Add optimization bisect opt-in calls for NVPTX passes
Differential Revision: http://reviews.llvm.org/D19518

llvm-svn: 267635
2016-04-26 23:44:31 +00:00
Quentin Colombet 4ff3cfb673 [X86] Make sure it is safe to clobber EFLAGS, if need be, when choosing
the prologue.

Do not use basic blocks that have EFLAGS live-in as prologue if we need
to realign the stack. Realigning the stack uses AND instruction and this
clobbers EFLAGS.

An other alternative would have been to save and restore EFLAGS around
the stack realignment code, but this is likely inefficient.

Fixes PR27531.

llvm-svn: 267634
2016-04-26 23:44:14 +00:00
Justin Bogner c2bf63d29d PM: Port Reassociate to the new pass manager
llvm-svn: 267631
2016-04-26 23:39:29 +00:00
Justin Bogner cb8a21c88e Reassociate: Convert another functor into a lambda. NFC
Also move the explanatory comment with it.

llvm-svn: 267628
2016-04-26 23:32:00 +00:00
Philip Reames f105db4fc3 [LVI] Cut short search if we know we can't return a useful result
Previously we were recursing on our operands for unary and binary operators regardless of whether we knew how to reason about the operator in question.  This has the effect of doing a potentially large amount of work, only to throw it away.  By checking whether the operation is one LVI can handle, we can cut short the search and return the (overdefined) answer more quickly.  The quality of the results produced should not change.

llvm-svn: 267626
2016-04-26 23:27:33 +00:00
Sanjay Patel 29dea0d230 [SimplifyCFG] propagate branch metadata when creating select
llvm-svn: 267624
2016-04-26 23:15:48 +00:00
Quentin Colombet 2b3a4e787e [X86] Teach the expansion of copy instructions how to do proper liveness.
When the simple analysis provided by MachineBasicBlock::computeRegisterLiveness
fails, fall back on the LivePhysReg utility.

llvm-svn: 267623
2016-04-26 23:14:32 +00:00
Quentin Colombet 08e79990a0 [MachineBasicBlock] Take advantage of the partially dead information.
Thanks to that information we wouldn't lie on a register being live whereas it
is not.

llvm-svn: 267622
2016-04-26 23:14:29 +00:00
Quentin Colombet 3f19245015 [MachineInstrBundle] Improvement the recognition of dead definitions.
Now, it is possible to know that partial definitions are dead definitions and
recognize that clobbered registers are also dead.

llvm-svn: 267621
2016-04-26 23:14:24 +00:00
Philip Reames 053c2a6f25 [LVI] Apply transfer rule for overdefine inputs for binary operators
As pointed out by John Regehr over in http://reviews.llvm.org/D19485, LVI was being incredibly stupid about applying its transfer rules.  Rather than gathering local facts from the expression itself, it was simply giving up entirely if one of the inputs was overdefined.  This greatly impacts the precision of the overall analysis and makes it far more fragile as well.

This patch builds on 267609 which did the same thing for unary casts.

llvm-svn: 267620
2016-04-26 23:10:35 +00:00
Jingyue Wu c1b9d47b3b [NVPTX] Fix some usages of CodeGenOpt::None.
NVPTXLowerKernelArgs is required for correctness, so it should not be guarded
by CodeGenOpt::None.

NVPTXPeephole is optimization only, so it should be skipped when
CodeGenOpt::None.

llvm-svn: 267619
2016-04-26 22:59:25 +00:00
Philip Reames e5030e85ea [LVI] A better fix for the assertion error introduced by 267609
Essentially, I was using the wrong size function.  For types which were sized, but not primitive, I wasn't getting a useful size for the operand and failed an assert.  I fixed this, and also added a guard that the input is a sized type.  Test case is for the original mistake.  I'm not sure how to actually exercise the sized type check.

llvm-svn: 267618
2016-04-26 22:52:30 +00:00
Philip Reames d5c62a0aad [LVI] Speculative fix for assertion seen in clang bots
I'll clean this up and add a test case shortly.  I want to make sure this does actually fix the bots; if not, I'll revert.

llvm-svn: 267617
2016-04-26 22:31:53 +00:00
Sanjay Patel d2d2aa52cd [LowerExpectIntrinsic] make default likely/unlikely ratio bigger
We need the default ratio to be sufficiently large that it triggers transforms 
based on block frequency info (BFI) and plays well with the recently introduced
BranchProbability used by CGP.

Differential Revision: http://reviews.llvm.org/D19435

llvm-svn: 267615
2016-04-26 22:23:38 +00:00
Justin Bogner 90744d215b Reassociate: Simplify using lambdas. NFC
llvm-svn: 267614
2016-04-26 22:22:18 +00:00
Philip Reames 38c87c2e50 [LVI] Infer local facts from unary expressions
As pointed out by John Regehr over in http://reviews.llvm.org/D19485, LVI was being incredibly stupid about applying its transfer rules. Rather than gathering local facts from the expression itself, it was simply giving up entirely if one of the inputs was overdefined. This greatly impacts the precision of the overall analysis and makes it far more fragile as well.

This patch implements only the unary operation case. Once this is in, I'll implement the same for the binary operations.

Differential Revision: http://reviews.llvm.org/D19492

llvm-svn: 267609
2016-04-26 21:48:16 +00:00
Andrew Kaylor 2bee5ef462 Optimization bisect support in X86-specific passes
Differential Revision: http://reviews.llvm.org/D19439

llvm-svn: 267608
2016-04-26 21:44:24 +00:00
Ahmed Bougacha 128f8732a5 [CodeGen] Add getBuildVector and getSplatBuildVector helpers. NFCI.
Differential Revision: http://reviews.llvm.org/D17176

llvm-svn: 267606
2016-04-26 21:15:30 +00:00
David Majnemer abb9f55c80 Revert "[SimplifyLibCalls] sprintf doesn't copy null bytes"
The destination buffer that sprintf uses is restrict qualified, we do
not need to worry about derived pointers referenced via format
specifiers.

This reverts commit r267580.

llvm-svn: 267605
2016-04-26 21:04:47 +00:00
Zachary Turner c3c4e15697 Remove more unused variables.
llvm-svn: 267598
2016-04-26 20:32:35 +00:00
Elena Demikhovsky 308a7eb0d2 Masked Store in Loop Vectorizer - bugfix
Fixed a bug in loop vectorization with conditional store.

Differential Revision: http://reviews.llvm.org/D19532

llvm-svn: 267597
2016-04-26 20:18:04 +00:00
Justin Bogner 4563a06cee PM: Port Internalize to the new pass manager
llvm-svn: 267596
2016-04-26 20:15:52 +00:00
Zachary Turner 7756127077 [llvm-pdbdump] Fix version reading on big endian systems.
llvm-svn: 267595
2016-04-26 19:48:18 +00:00
Andrew Kaylor 5b444a21df Add optimization bisect opt-in calls for Hexagon passes
Differential Revision: http://reviews.llvm.org/D19509

llvm-svn: 267593
2016-04-26 19:46:28 +00:00
Zachary Turner ff788aa0ee Fix warnings and -Werror build on clang.
llvm-svn: 267589
2016-04-26 19:24:10 +00:00
Zachary Turner 53a65ba5c9 Parse and dump PDB DBI Stream Header Information
The DBI stream contains a lot of bookkeeping information for other
streams. In particular it contains information about section contributions
and linked modules. This patch is a first attempt at parsing some of the
information out of the DBI stream. It currently only parses and dumps the
headers of the DBI stream, so none of the module data or section
contribution data is pulled out.

This is just a proof of concept that we understand the basic properties of
the DBI stream's metadata, and followup patches will try to extract more
detailed information out.

Differential Revision: http://reviews.llvm.org/D19500
Reviewed By: majnemer, ruiu

llvm-svn: 267585
2016-04-26 18:42:34 +00:00
Krzysztof Parzyszek 4773f647bd [Tail duplication] Handle source registers with subregisters
When a block is tail-duplicated, the PHI nodes from that block are
replaced with appropriate COPY instructions. When those PHI nodes
contained use operands with subregisters, the subregisters were
dropped from the COPY instructions, resulting in incorrect code.

Keep track of the subregister information and use this information
when remapping instructions from the duplicated block.

Differential Revision: http://reviews.llvm.org/D19337

llvm-svn: 267583
2016-04-26 18:36:34 +00:00
Tim Northover 4397837be2 Reapply: "ARM: put correct symbol index on indirect pointers in __thread_ptr.""
A latent bug in llvm-objdump used the wrong format specifier on 32-bit
targets, causing the test to fail. This fixes the issue.

llvm-svn: 267582
2016-04-26 18:29:16 +00:00
David Majnemer 8cd77baebc [SimplifyLibCalls] sprintf doesn't copy null bytes
sprintf doesn't read or copy the terminating null byte from it's string
operands.  sprintf will append it's own after processing all of the
format specifiers.

This fixes PR27526.

llvm-svn: 267580
2016-04-26 18:16:49 +00:00
Manman Ren 1c3f65a18c Swift Calling Convention: use %RAX for sret.
We don't need to copy the sret argument into %rax upon return.
rdar://25671494

llvm-svn: 267579
2016-04-26 18:08:06 +00:00
Konstantin Zhuravlyov 71515e57f9 [AMDGPU] Move reserved vgpr count for trap handler usage to SIMachineFunctionInfo + minor commenting changes
Differential Revision: http://reviews.llvm.org/D19537

llvm-svn: 267573
2016-04-26 17:24:40 +00:00
Sanjay Patel d66607bd8c [CodeGenPrepare] use branch weight metadata to decide if a select should be turned into a branch
This is part of solving PR27344:
https://llvm.org/bugs/show_bug.cgi?id=27344

CGP should undo the SimplifyCFG transform for the same reason that earlier patches have used this
same mechanism: it's possible that passes between SimplifyCFG and CGP may be able to optimize the
IR further with a select in place.

For the TLI hook default, >99% taken or not taken is chosen as the default threshold for a highly
predictable branch. Even the most limited HW branch predictors will be correct on this branch almost
all the time, so even a massive mispredict penalty perf loss would be overcome by the win from all
the times the branch was predicted correctly.

As a follow-up, we could make the default target hook less conservative by using the SchedMachineModel's
MispredictPenalty. Or we could just let targets override the default by implementing the hook with that
and other target-specific options. Note that trying to statically determine mispredict rates for 
close-to-balanced profile weight data is generally impossible if the HW is sufficiently advanced. Ie, 
50/50 taken/not-taken might still be 100% predictable.

Finally, note that this patch as-is will not solve PR27344 because the current __builtin_unpredictable()
branch weight default values are 4 and 64. A proposal to change that is in D19435.

Differential Revision: http://reviews.llvm.org/D19488

llvm-svn: 267572
2016-04-26 17:11:17 +00:00
Zachary Turner ce36c1f2ec Fix build broken due to order of initialization problem.
llvm-svn: 267571
2016-04-26 16:57:53 +00:00
Zachary Turner f34e01624a Refactor some more PDB reading code into DebugInfoPDB.
Differential Revision: http://reviews.llvm.org/D19445
Reviewed By: David Majnemer

llvm-svn: 267564
2016-04-26 16:20:00 +00:00
Konstantin Zhuravlyov 1d99c4d03c [AMDGPU] Reserve VGPRs for trap handler usage if instructed
Differential Revision: http://reviews.llvm.org/D19235

llvm-svn: 267563
2016-04-26 15:43:14 +00:00
Nico Weber fa7f4898a9 Use gcc's rules for parsing gcc-style response files
In gcc, \ escapes every character in response files. It is true that this makes
it harder to mention Windows files in rsp files, but not doing this means clang
disagrees with gcc, and also disagrees with the shell (on non-Windows) which
rsp file quoting is supposed to match. clang isn't free to choose what to do
here.

In general, the idea for response files is to take bits of your command line
and write them to a file unchanged, and have things work the same way. Since
the command line would've been interpreted by the shell, things in the rsp file
need to be subject to the same shell quoting rules.

People who want to put Windows-style paths in their response files either need
to do any of:
* escape their backslashes
* or use clang-cl which uses cl.exe/cmd.exe quoting rules
* pass --rsp-quoting=windows to clang to tell it to use
  cl.exe/cmd.exe quoting rules for response files.

Fixes PR27464.
http://reviews.llvm.org/D19417

llvm-svn: 267556
2016-04-26 13:53:56 +00:00
Sam Kolton 3025e7f25f [AMDGPU] Assembler: basic support for SDWA instructions
Support for SDWA instructions for VOP1 and VOP2 encoding.
Not done yet:
  - converters for support optional operands and modifiers
  - VOPC
  - sext() modifier
  - intrinsics
  - VOP2b (see vop_dpp.s)
  - V_MAC_F32 (see vop_dpp.s)

Differential Revision: http://reviews.llvm.org/D19360

llvm-svn: 267553
2016-04-26 13:33:56 +00:00
Andrey Turetskiy b405606432 [X86] PR27502: Fix the LEA optimization pass.
Handle MachineBasicBlock as a memory displacement operand in the LEA optimization pass.

Differential Revision: http://reviews.llvm.org/D19409

llvm-svn: 267551
2016-04-26 12:18:12 +00:00
Marcin Koscielnicki 834381f19c [Sparc] Fix build error introduced by rL267545.
llvm-svn: 267549
2016-04-26 10:43:47 +00:00
Marcin Koscielnicki 0cfb612413 [PowerPC] Add support for llvm.thread.pointer
Differential Revision: http://reviews.llvm.org/D19304

llvm-svn: 267546
2016-04-26 10:37:22 +00:00
Marcin Koscielnicki 33571e2c41 [SPARC] [SSP] Add support for LOAD_STACK_GUARD.
This fixes PR22248 on sparc.

Differential Revision: http://reviews.llvm.org/D19386

llvm-svn: 267545
2016-04-26 10:37:14 +00:00
Marcin Koscielnicki fafb44951a [SPARC] Add support for llvm.thread.pointer.
Differential Revision: http://reviews.llvm.org/D19387

llvm-svn: 267544
2016-04-26 10:37:01 +00:00
Mehdi Amini aa309b1a81 ThinLTOCodeGenerator: preserve linkonce when in "MustPreserved" set
If the linker specifically requested for a linkonce to be preserved,
we need to make sure we won't drop it even if all the uses in the
current module disappear.

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 267543
2016-04-26 10:35:01 +00:00
Renato Golin 5a55a029c0 Revert "ARM: put correct symbol index on indirect pointers in __thread_ptr."
This reverts commit r267488, as it broke some ARM buildbots.

llvm-svn: 267541
2016-04-26 10:02:02 +00:00
Chuang-Yu Cheng 0600e8d759 [ppc64] Reenable sibling call optimization on ppc64 since fixed tsan library tail-call issue
print-stack-trace.cc test failure of compiler-rt has been fixed by
r266869 (http://reviews.llvm.org/D19148), so reenable sibling call
optimization on ppc64

Reviewers: nemanjai kbarton
llvm-svn: 267527
2016-04-26 07:38:24 +00:00
Craig Topper c5551bfc26 [AArch64] Expand v1i64 and v2i64 ctlz.
The default is legal, which results in 'Cannot select' errors.

llvm-svn: 267522
2016-04-26 05:26:51 +00:00
Craig Topper d8d6be4f99 [ARM] Expand vector ctlz_zero_undef so it becomes ctlz.
The default is Legal, which results in 'Cannot select' errors.

llvm-svn: 267521
2016-04-26 05:04:37 +00:00
Craig Topper edb4a6ba98 [ARM] Expand v1i64 and v2i64 ctlz.
The default is legal, which results in 'Cannot select' errors.

llvm-svn: 267520
2016-04-26 05:04:33 +00:00
Dehao Chen 5d6d4841ed Tune basic block annotation algorithm.
Summary:
Instead of using maximum IR weight as the basic block weight, this patch uses the voting algorithm to find the most likely weight for the basic block. This can effectively avoid the cases when some IRs are annotated incorrectly due to code motion of the profiled binary.

This patch also updates propagate.ll unittest to include discriminator in the input file so that it is testing something meaningful.

Reviewers: davidxl, dnovillo

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D19301

llvm-svn: 267519
2016-04-26 04:59:11 +00:00
Hal Finkel e4c0c1679b [SimplifyCFG] Preserve !llvm.mem.parallel_loop_access when merging
When SimplifyCFG merges identical instructions from both sides of a diamond, it
can preserve !llvm.mem.parallel_loop_access (as it does with most of the other
metadata). There's no real data or control dependency change in this case.

llvm-svn: 267515
2016-04-26 02:06:06 +00:00
Hal Finkel 411d31ad72 [LoopVectorize] Don't consider conditional-load dereferenceability for marked parallel loops
I really thought we were doing this already, but we were not. Given this input:

void Test(int *res, int *c, int *d, int *p) {
  for (int i = 0; i < 16; i++)
    res[i] = (p[i] == 0) ? res[i] : res[i] + d[i];
}

we did not vectorize the loop. Even with "assume_safety" the check that we
don't if-convert conditionally-executed loads (to protect against
data-dependent deferenceability) was not elided.

One subtlety: As implemented, it will still prefer to use a masked-load
instrinsic (given target support) over the speculated load. The choice here
seems architecture specific; the best option depends on how expensive the
masked load is compared to a regular load. Ideally, using the masked load still
reduces unnecessary memory traffic, and so should be preferred. If we'd rather
do it the other way, flipping the order of the checks is easy.

The LangRef is updated to make explicit that llvm.mem.parallel_loop_access also
implies that if conversion is okay.

Differential Revision: http://reviews.llvm.org/D19512

llvm-svn: 267514
2016-04-26 02:00:36 +00:00
Dan Gohman f456290fca [WebAssembly] Account for implicit operands when computing operand indices.
llvm-svn: 267511
2016-04-26 01:40:56 +00:00
David Majnemer 30ffc4ce45 [SROA] Don't falsely report that changes have occured
We would report that the function changed despite creating no new
allocas or performing any promotion.

This fixes PR27316.

llvm-svn: 267507
2016-04-26 01:05:00 +00:00
Andrew Kaylor 1aa3cf7d18 Reverting Thumb2SizeReduction opt bisect change to fix failing buildbots.
llvm-svn: 267506
2016-04-26 00:56:36 +00:00
Sanjay Patel a31b0c0ece [CodeGenPrepare] don't convert an unpredictable select into control flow
Suggested in the review of D19488:
http://reviews.llvm.org/D19488

llvm-svn: 267504
2016-04-26 00:47:39 +00:00
Junmo Park 3c65acf87e Remove MinLatency in SchedMachineModel. NFC.
Summary:
We don't use MinLatency any more since r184032.

Reviewers: atrick, hfinkel, mcrosier

Differential Revision: http://reviews.llvm.org/D19474

llvm-svn: 267502
2016-04-26 00:37:46 +00:00
Justin Bogner 1a07501379 PM: Port GlobalOpt to the new pass manager
llvm-svn: 267499
2016-04-26 00:28:01 +00:00
Justin Bogner d2f3d0a79d PM: Convert the logic for GlobalOpt into static functions. NFC
Pass all of the state we need around as arguments, so that these
functions are easier to reuse. There is one part of this that is
unusual: we pass around a functor to look up a DomTree for a function.
This will be a necessary abstraction when we try to use this code in
both the legacy and the new pass manager.

llvm-svn: 267498
2016-04-26 00:27:56 +00:00
Ahmed Bougacha 5cf735a5b1 [X86] Use LivePhysRegs in X86FixupBWInsts.
Kill-flags, which computeRegisterLiveness uses, are not reliable.
LivePhysRegs is.

Differential Revision: http://reviews.llvm.org/D19472

llvm-svn: 267495
2016-04-26 00:00:48 +00:00
Sanjay Patel 82059090d3 Add check for "branch_weights" with prof metadata
While we're here, fix the comment and variable names to make it
clear that these are raw weights, not percentages.

llvm-svn: 267491
2016-04-25 23:15:16 +00:00
James Y Knight 51208eaccc [Sparc] Fix double-float fabs and fneg on little endian CPUs.
The SparcV8 fneg and fabs instructions interestingly come only in a
single-float variant. Since the sign bit is always the topmost bit no
matter what size float it is, you simply operate on the high
subregister, as if it were a single float.

However, the layout of double-floats in the float registers is reversed
on little-endian CPUs, so that the high bits are in the second
subregister, rather than the first.

Thus, this expansion must check the endianness to use the correct
subregister.

llvm-svn: 267489
2016-04-25 22:54:09 +00:00
Tim Northover cbba0aba16 ARM: put correct symbol index on indirect pointers in __thread_ptr.
Otherwise the linker has no idea what should be resolved.

llvm-svn: 267488
2016-04-25 22:36:07 +00:00
Andrew Kaylor 736efc894d Fix build warning
llvm-svn: 267487
2016-04-25 22:27:30 +00:00
Andrew Kaylor 7de74af929 Add optimization bisect opt-in calls for AMDGPU passes
Differential Revision: http://reviews.llvm.org/D19450

llvm-svn: 267485
2016-04-25 22:23:44 +00:00
Amaury Sechet 9bbda191ba Reformat LLVMConstPointerNull. NFC
llvm-svn: 267484
2016-04-25 22:23:35 +00:00
Arch D. Robison be0490a6e8 Optimize store of "bitcast" from vector to aggregate.
This patch is what was the "instcombine" portion of D14185, with an additional 
test added (see julia_pseudovec in test/Transforms/InstCombine/insert-val-extract-elem.ll). 
The patch causes instcombine to replace sequences of extractelement-insertvalue-store 
that act essentially like a bitcast followed by a store.

Differential review: http://reviews.llvm.org/D14260

llvm-svn: 267482
2016-04-25 22:22:39 +00:00
Philip Reames 1918384155 [LVI] Make a precondition explicit rather than handling a case which never happens [NFC]
llvm-svn: 267481
2016-04-25 22:21:24 +00:00
Andrew Kaylor a2b9111ef7 Add optimization bisect opt-in calls for ARM passes
Differential Revision: http://reviews.llvm.org/D19449

llvm-svn: 267480
2016-04-25 22:01:04 +00:00
Andrew Kaylor 1ac98bb088 Add optimization bisect opt-in calls for AArch64 passes
Differential Revision: http://reviews.llvm.org/D19394

llvm-svn: 267479
2016-04-25 21:58:52 +00:00
Krzysztof Parzyszek 1711f2d8bd Add accidentally deleted "break"
llvm-svn: 267476
2016-04-25 21:28:52 +00:00
Lang Hames 1fa0e0e006 [ORC] clang-format code that was touched in r267457. NFC.
Commit r267457 made a lot of type-substitutions threw off code formatting and
alignment. This patch should tidy those changes up.

llvm-svn: 267475
2016-04-25 21:21:20 +00:00
Tim Northover 5c3140f745 ARM: put extern __thread stubs in a special section.
The linker needs to know that the symbols are thread-local to do its job
properly.

llvm-svn: 267473
2016-04-25 21:12:04 +00:00
Teresa Johnson c851d216e2 [ThinLTO] Introduce typedef for commonly-used map type (NFC)
Add a typedef for the std::map<GlobalValue::GUID, GlobalValueSummary *>
map that is passed around to identify summaries for values defined in a
particular module. This shortens up declarations in a variety of places.

llvm-svn: 267471
2016-04-25 21:09:51 +00:00
Krzysztof Parzyszek 3e28229000 [Hexagon] Few fixes for exception handling
llvm-svn: 267469
2016-04-25 21:05:19 +00:00
Quentin Colombet abe2d016cf Re-apply r267206 with a fix for the encoding problem: when the immediate of
log2(Mask) is smaller than 32, we must use the 32-bit variant because the 64-bit
variant cannot encode it. Therefore, set the subreg part accordingly.

[AArch64] Fix optimizeCondBranch logic.

The opcode for the optimized branch does not depend on the size
of the activate bits in the AND masks, but the AND opcode itself.
Indeed, we need to use a X or W variant based on the AND variant
not based on whether the mask fits into the related variant.
Otherwise, we may end up using the W variant of the optimized branch
for 64-bit register inputs!

This fixes the last make check verifier issues for AArch64: PR27479.

llvm-svn: 267465
2016-04-25 20:54:08 +00:00
Etienne Bergeron 50f02aa3fa Cleanup redundant expression in InstCombineAndOrXor.
Summary:
The expression is redundant on both side of operator |.

detected by : http://reviews.llvm.org/D19451

Reviewers: rnk, majnemer

Subscribers: cfe-commits

Differential Revision: http://reviews.llvm.org/D19459

llvm-svn: 267458
2016-04-25 20:15:33 +00:00
Lang Hames ef5a0ee2c3 [ORC] Thread Error/Expected through the RPC library.
This replaces use of std::error_code and ErrorOr in the ORC RPC support library
with Error and Expected. This required updating the OrcRemoteTarget API, Client,
and server code, as well as updating the Orc C API.

This patch also fixes several instances where Errors were dropped.

llvm-svn: 267457
2016-04-25 19:56:45 +00:00
Matt Arsenault 074ea2851c AMDGPU/SI: Optimize adjacent s_nop instructions
Use the operand for how long to wait. This is somewhat
distasteful, since it would be better to just emit s_nop
with the right argument in the first place. This would require
changing TII::insertNoop to emit N operands, which would be easy.
Slightly more problematic is the post-RA scheduler and hazard recognizer
represent nops as a single null node, and would require inventing
another way of representing N nops.

llvm-svn: 267456
2016-04-25 19:53:22 +00:00
Kostya Serebryany 9ba19182be [libFuzzer] remove dead code
llvm-svn: 267455
2016-04-25 19:41:45 +00:00
Matt Arsenault 99c14524ec AMDGPU: Implement addrspacecast
llvm-svn: 267452
2016-04-25 19:27:24 +00:00
Matt Arsenault 48ab526f12 AMDGPU: Add queue ptr intrinsic
llvm-svn: 267451
2016-04-25 19:27:18 +00:00
Matt Arsenault dfaf4261ab AMDGPU: Add DAG to debug dump
Also reorder case to match enum order

llvm-svn: 267449
2016-04-25 19:27:09 +00:00
Philip Reames 3bb2832900 [LVI] Clarify comments describing the lattice values
There has been much recent confusion about the partition in the lattice between constant and non-constant values.  Hopefully, documenting this will prevent confusion going forward.

llvm-svn: 267440
2016-04-25 18:48:43 +00:00
Philip Reames 6671577eb3 [LVI] Split solveBlockValueConstantRange into two [NFC]
This function handled both unary and binary operators.  Cloning and specializing leads to much easier to follow code with minimal duplicatation.

llvm-svn: 267438
2016-04-25 18:30:31 +00:00
Krzysztof Parzyszek e8e754da74 [Hexagon] Register save/restore functions do not follow regular conventions
Do not mark them as modifying any of the volatile registers by default.

llvm-svn: 267433
2016-04-25 17:49:44 +00:00
Zachary Turner 0a43efea95 Resubmit "Refactor raw pdb dumper into library"
This fixes a number of endianness issues as well as an ODR
violation that hopefully causes everything to be happy.

llvm-svn: 267431
2016-04-25 17:38:08 +00:00
Chad Rosier e2cbd13e56 [ValueTracking] Improve isImpliedCondition when the dominating cond is false.
llvm-svn: 267430
2016-04-25 17:23:36 +00:00
Jacques Pienaar 522031bd05 [lanai] Expand findClosestSuitableAluInstr check to consider offset register.
Previously findClosestSuitableAluInstr was only considering the base register when checking the current instruction for suitability. Expand check to consider the offset if the offset is a register.

llvm-svn: 267424
2016-04-25 16:41:21 +00:00
Marcin Koscielnicki 1c1af6ef77 [PR27390] [CodeGen] Reject indexed loads in CombinerDAG.
visitAND, when folding and (load) forgets to check which output of
an indexed load is involved, happily folding the updated address
output on the following testcase:

target datalayout = "e-m:e-i64:64-n32:64"
target triple = "powerpc64le-unknown-linux-gnu"

%typ = type { i32, i32 }

define signext i32 @_Z8access_pP1Tc(%typ* %p, i8 zeroext %type) {
  %b = getelementptr inbounds %typ, %typ* %p, i64 0, i32 1
  %1 = load i32, i32* %b, align 4
  %2 = ptrtoint i32* %b to i64
  %3 = and i64 %2, -35184372088833
  %4 = inttoptr i64 %3 to i32*
  %_msld = load i32, i32* %4, align 4
  %zzz = add i32 %1,  %_msld
  ret i32 %zzz
}

Fix this by checking ResNo.

I've found a few more places that currently neglect to check for
indexed load, and tightened them up as well, but I don't have test
cases for them.  In fact, they might not be triggerable at all,
at least with current targets.  Still, better safe than sorry.

Differential Revision: http://reviews.llvm.org/D19202

llvm-svn: 267420
2016-04-25 15:43:44 +00:00
Hrvoje Varga c2dd5d223a [mips][microMIPS] Revert commit r267137
Commit r267137 was the reason for failing tests in LLVM test suite.

llvm-svn: 267419
2016-04-25 15:40:08 +00:00
Zlatko Buljan b43d4bcbd5 [mips][microMIPS] Revert commit r266977
Commit r266977 was reason for failing LLVM test suite with error message: fatal error: error in backend: Cannot select: t17: i32 = rotr t2, t11 ...

llvm-svn: 267418
2016-04-25 15:34:57 +00:00
Etienne Bergeron 06c14ec31e Fix incorrect redundant expression in target AMDGPU.
Summary:
The expression is detected as a redundant expression.
Turn out, this is probably a bug.

```
/home/etienneb/llvm/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp:306:26: warning: both side of operator are equivalent [misc-redundant-expression]
  if (isSMRD(*FirstLdSt) && isSMRD(*FirstLdSt)) {
```

Reviewers: rnk, tstellarAMD

Subscribers: arsenm, cfe-commits

Differential Revision: http://reviews.llvm.org/D19460

llvm-svn: 267415
2016-04-25 15:06:33 +00:00
David Majnemer dd21523653 [WinEH] Update SplitAnalysis::computeLastSplitPoint to cope with multiple EH successors
We didn't have logic to correctly handle CFGs where there was more than
one EH-pad successor (these are novel with WinEH).
There were situations where a register was live in one exceptional
successor but not another but the code as written would only consider
the first exceptional successor it found.

This resulted in split points which were insufficiently early if an
invoke was present.

This fixes PR27501.

N.B.  This removes getLandingPadSuccessor.

llvm-svn: 267412
2016-04-25 14:31:32 +00:00
Silviu Baranga 82d04260b7 [ARM] Add support for the X asm constraint
Summary:
This patch adds support for the X asm constraint.

To do this, we lower the constraint to either a "w" or "r" constraint
depending on the operand type (both constraints are supported on ARM).

Fixes PR26493

Reviewers: t.p.northover, echristo, rengolin

Subscribers: joker.eph, jgreenhalgh, aemerson, rengolin, llvm-commits

Differential Revision: http://reviews.llvm.org/D19061

llvm-svn: 267411
2016-04-25 14:29:18 +00:00
Artem Tamazov d6468666b5 [AMDGPU][llvm-mc] s_getreg/setreg* - Add hwreg(...) syntax.
Added hwreg(reg[,offset,width]) syntax.
Default offset = 0, default width = 32.
Possibility to specify 16-bit immediate kept.
Added out-of-range checks.
Disassembling is always to hwreg(...) format.
Tests updated/added.

Differential Revision: http://reviews.llvm.org/D19329

llvm-svn: 267410
2016-04-25 14:13:51 +00:00
Anna Thomas 95f68aa7eb Test commit: modified comment. NFC
llvm-svn: 267406
2016-04-25 13:58:05 +00:00
Chad Rosier 3d75f8ce9e Typo. NFC.
llvm-svn: 267399
2016-04-25 13:25:14 +00:00
Krzysztof Parzyszek e6ee481bdf [Hexagon] Correctly set "Flags" in ELF header
llvm-svn: 267397
2016-04-25 12:49:47 +00:00
James Molloy eb040cc55f [GlobalOpt] Allow constant globals to be SRA'd
The current logic assumes that any constant global will never be SRA'd. I presume this is because normally constant globals can be pushed into their uses and deleted. However, that sometimes can't happen (which is where you really want SRA, so the elements that can be eliminated, are!).

There seems to be no reason why we can't SRA constants too, so let's do it.

llvm-svn: 267393
2016-04-25 10:48:29 +00:00
Igor Kudrin ed99a96f06 [Coverage] Restore the correct count value after processing a nested region in case of combined regions.
If several regions cover the same area of code, we have to restore
the combined value for that area when return from a nested region.

This patch achieves that by combining regions before calling buildSegments.

Differential Revision: http://reviews.llvm.org/D18610

llvm-svn: 267390
2016-04-25 09:43:37 +00:00
Silviu Baranga 795c629ec9 [SCEV] Improve the run-time checking of the NoWrap predicate
Summary:
This implements a new method of run-time checking the NoWrap
SCEV predicates, which should be easier to optimize and nicer
for targets that don't correctly handle multiplication/addition
of large integer types (like i128).

If the AddRec is {a,+,b} and the backedge taken count is c,
the idea is to check that |b| * c doesn't have unsigned overflow,
and depending on the sign of b, that:

   a + |b| * c >= a (b >= 0) or
   a - |b| * c <= a (b <= 0)

where the comparisons above are signed or unsigned, depending on
the flag that we're checking.

The advantage of doing this is that we avoid extending to a larger
type and we avoid the multiplication of large types (multiplying
i128 can be expensive).

Reviewers: sanjoy

Subscribers: llvm-commits, mzolotukhin

Differential Revision: http://reviews.llvm.org/D19266

llvm-svn: 267389
2016-04-25 09:27:16 +00:00
Marcin Koscielnicki a44d44cb2e [PowerPC] [PR27387] Disallow r0 for ADD8TLS.
ADD8TLS, a variant of add instruction used for initial-exec TLS,
currently accepts r0 as a source register.  While add itself supports
r0 just fine, linker can relax it to a local-exec sequence, converting
it to addi - which doesn't support r0.

Differential Revision: http://reviews.llvm.org/D19193

llvm-svn: 267388
2016-04-25 09:24:34 +00:00
Mehdi Amini bf4513b9aa Run GlobalOpt before emitting the bitcode for ThinLTO
This is motivated by reducing the size of the IR and thus reduce
compile time.

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 267385
2016-04-25 08:47:49 +00:00
Mehdi Amini f72ca86b71 ThinLTO: Move createNameAnonFunctionPass insertion in PassManagerBuilder (NFC)
It is just code motion, but makes more sense this way.

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 267384
2016-04-25 08:47:37 +00:00
Craig Topper 03734c7ce1 [X86] Replace a SmallVector used to pass 2 values to an ArrayRef parameter with a fixed size array. NFC
llvm-svn: 267377
2016-04-25 04:30:29 +00:00
Junmo Park 884455e9bd Minor code cleanups. NFC.
llvm-svn: 267375
2016-04-25 01:40:54 +00:00
Adrian Prantl 93035c8f47 Verifier: Verify that each inlinable callsite of a debug-info-bearing function
in a debug-info-bearing function has a debug location attached to it. Failure to
do so causes an "!dbg attachment points at wrong subprogram for function"
assertion failure when the inliner sets up inline scope info.

rdar://problem/25878916

This reaplies r267320 without changes after fixing an issue in the OpenMP IR
generator in clang.

llvm-svn: 267370
2016-04-24 22:23:13 +00:00
Saleem Abdulrasool 9611518646 ARM: fix __chkstk Frame Setup on WoA
This corrects the MI annotations for the stack adjustment following the __chkstk
invocation.  We were marking the original SP usage as a Def rather than Kill.
The (new) assigned value is the definition, the original reference is killed.

Adjust the ISelLowering to mark Kills and FrameSetup as well.

This partially resolves PR27480.

llvm-svn: 267361
2016-04-24 20:12:48 +00:00
Simon Pilgrim 4c564ad4dd Tweak comments to make it clear that these combines are for SSE scalar instructions.
llvm-svn: 267360
2016-04-24 19:31:56 +00:00
Simon Pilgrim 4b5462f119 [InstCombine][SSE] Reduce DIVSS/DIVSD to FDIV if only first element is required
As discussed on D19318, if we only demand the first element of a DIVSS/DIVSD intrinsic, then reduce to a FDIV call. This matches the existing FADD/FSUB/FMUL patterns.

llvm-svn: 267359
2016-04-24 18:35:59 +00:00
Simon Pilgrim 83020942d3 [InstCombine][SSE] Demanded vector elements for scalar intrinsics (Part 2 of 2)
Split from D17490. This patch improves support for determining the demanded vector elements through SSE scalar intrinsics:

1 - demanded vector element support for unary and some extra binary scalar intrinsics (RCP/RSQRT/SQRT/FRCZ and ADD/CMP/DIV/ROUND).

2 - addss/addsd get simplified to a fadd call if we aren't interested in the pass through elements

3 - if we don't need the lowest element of a scalar operation then just use the first argument (the pass through elements) directly

We can add support for propagating demanded elements through any equivalent packed SSE intrinsics in a future patch (these wouldn't use the pass through patterns).

Differential Revision: http://reviews.llvm.org/D19318

llvm-svn: 267357
2016-04-24 18:23:14 +00:00
Simon Pilgrim 424da1637a [InstCombine][SSE] Demanded vector elements for scalar intrinsics (Part 1 of 2)
This patch improves support for determining the demanded vector elements through SSE scalar intrinsics:

1 - recognise that we only need the lowest element of the second input for binary scalar operations (and all the elements of the first input)

2 - recognise that the roundss/roundsd intrinsics use the lowest element of the second input and the remaining elements from the first input

Differential Revision: http://reviews.llvm.org/D17490

llvm-svn: 267356
2016-04-24 18:12:42 +00:00
Simon Pilgrim 1c9a9f255c [InstCombine] Avoid updating argument demanded elements in separate passes.
As discussed on D17490, we should attempt to update an intrinsic's arguments demanded elements in one pass if we can.

llvm-svn: 267355
2016-04-24 17:57:27 +00:00
Nick Lewycky d595d4265b Fix typo in comment. NFC
llvm-svn: 267354
2016-04-24 17:55:57 +00:00
Nick Lewycky af50837a31 Remove emacs mode markers from .cpp files. NFC
.cpp files are unambiguously C++, you only need the mode markers on .h files.

llvm-svn: 267353
2016-04-24 17:55:41 +00:00
Simon Pilgrim 2f6097d113 [X86][InstCombine] Tidyup VPERMILVAR -> shufflevector conversion to helper function. NFCI.
llvm-svn: 267352
2016-04-24 17:23:46 +00:00
Simon Pilgrim c0c56e747a [X86][InstCombine] Tidyup PSHUFB -> shufflevector conversion to helper function. NFCI.
llvm-svn: 267351
2016-04-24 17:00:34 +00:00
Simon Pilgrim dd748b83aa [X86][SSE] getTargetShuffleMaskIndices - dropped (unused) UNDEF handling
We aren't currently making use of this in any successful mask decode and its actually incorrect as it inserts the wrong number of SM_SentinelUndef mask elements.

llvm-svn: 267350
2016-04-24 16:49:53 +00:00
Simon Pilgrim 7c25ef92a3 [X86][SSE] Use range loop. NFCI.
llvm-svn: 267349
2016-04-24 16:33:35 +00:00
Craig Topper efea6dbc6c [Lanai] Use EVT::getEVTString() to print a type as a string instead of an enum encoding value.
llvm-svn: 267348
2016-04-24 16:30:51 +00:00
Simon Pilgrim f379a6c684 [X86][XOP] Fixed VPPERM permute op decoding (PR27472).
Fixed issue with VPPERM target shuffle mask decoding that was incorrectly masking off the 3-bit permute op with a 2-bit mask.

llvm-svn: 267346
2016-04-24 15:05:04 +00:00
Duncan P. N. Exon Smith 03a04a58ea BitcodeReader: Delay metadata parsing until reading a function body
There's hardly any functionality change here.  Instead of calling
materializeMetadata on the first call to materialize(GlobalValue*), wait
until the first one that's actually going to do something.  Noticed by
inspection; I don't have a concrete case where this makes a difference.

Added an assertion in materializeMetadata to be sure this (or a future
change) doesn't delay materializeMetadata after function-level metadata.

llvm-svn: 267345
2016-04-24 15:04:28 +00:00
Teresa Johnson 28e457bccd [ThinLTO] Remove GlobalValueInfo class from index
Summary:
Remove the GlobalValueInfo and change the ModuleSummaryIndex to directly
reference summary objects. The info structure was there to support lazy
parsing of the combined index summary objects, which is no longer
needed and not supported.

Reviewers: joker.eph

Subscribers: joker.eph, llvm-commits

Differential Revision: http://reviews.llvm.org/D19462

llvm-svn: 267344
2016-04-24 14:57:11 +00:00
Simon Pilgrim 9f5697ef68 [X86][SSE] Improved support for decoding target shuffle masks through bitcasts
Reused the ability to split constants of a type wider than the shuffle mask to work with masks generated from scalar constants transfered to xmm.

This fixes an issue preventing PSHUFB target shuffle masks decoding rematerialized scalar constants and also exposes the XOP VPPERM bug described in PR27472.

llvm-svn: 267343
2016-04-24 14:53:54 +00:00
Marcin Koscielnicki aef3b5b5e2 [SystemZ] [SSP] Add support for LOAD_STACK_GUARD.
This fixes PR22248 on s390x.  The previous attempt at this was D19101,
which was before LOAD_STACK_GUARD existed.  Compared to the previous
version, this always emits a rather ugly block of 4 instructions, involving
a thread pointer load that can't be shared with other potential users.
However, this is necessary for SSP - spilling the guard value (or thread
pointer used to load it) is counter to the goal, since it could be
overwritten along with the frame it protects.

Differential Revision: http://reviews.llvm.org/D19363

llvm-svn: 267340
2016-04-24 13:57:49 +00:00
Aaron Ballman c79c2be7d2 Silence two C4806 warnings ('|': unsafe operation: no value of type 'bool' promoted to type 'const unsigned int' can equal the given constant). The fact that they trigger with this code seems like it may be a bug, but the warning itself is still generally useful enough to retain it for now.
llvm-svn: 267337
2016-04-24 13:03:20 +00:00
Duncan P. N. Exon Smith 5892c6b94b BitcodeReader: Fix some holes in upgrade from r267296
Add tests for some missing cases to bitcode upgrade in r267296.

  - DICompositeType with an 'elements:' field, which will cause it to be
    involved in a cycle after the upgrade.

  - A DIDerivedType that references a class in 'extraData:'.

I updated test/Bitcode/dityperefs-3.8.ll with the missing cases and
regenerated test/Bitcode/dityperefs-3.8.ll.bc.

llvm-svn: 267332
2016-04-24 06:52:01 +00:00
Craig Topper dbc981f71f [X86] Merge LowerCTLZ and LowerCTLZ_ZERO_UNDEF into a single function that branches internally for the one difference, allowing the rest of the code to be common. NFC
llvm-svn: 267331
2016-04-24 06:27:39 +00:00
Craig Topper 6469a39f51 [X86] Node need to check if AVX512 is supported when lowering vector CTLZ. The CTLZ operation is only Custom for vectors if AVX512 is enabled so if a vector gets here AVX512 is implied. NFC
llvm-svn: 267330
2016-04-24 06:27:35 +00:00
Mehdi Amini ca2c54e04e Add "hasSection" flag in the Summary
Reviewers: tejohnson

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D19405

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 267329
2016-04-24 05:31:43 +00:00
Gerolf Hoflehner 01b3a6184a [MachineCombiner] Support for floating-point FMA on ARM64 (re-commit r267098)
The original patch caused crashes because it could derefence a null pointer
for SelectionDAGTargetInfo for targets that do not define it.

Evaluates fmul+fadd -> fmadd combines and similar code sequences in the
machine combiner. It adds support for float and double similar to the existing
integer implementation. The key features are:

- DAGCombiner checks whether it should combine greedily or let the machine
combiner do the evaluation. This is only supported on ARM64.
- It gives preference to throughput over latency: the heuristic used is
to combine always in loops. The targets decides whether the machine
combiner should optimize for throughput or latency.
- Supports for fmadd, f(n)msub, fmla, fmls patterns
- On by default at O3 ffast-math

llvm-svn: 267328
2016-04-24 05:14:01 +00:00
Craig Topper e78eac1d31 [X86] Remove isel patterns for selecting tzcnt/lzcnt from cmove/ne+cttz/ctlz. These are folded by DAG combine now.
llvm-svn: 267326
2016-04-24 04:38:34 +00:00
Craig Topper 36c133159a [CodeGen] Teach DAG combine to fold select_cc seteq X, 0, sizeof(X), ctlz_zero_undef(X) -> ctlz(X). InstCombine already does this for IR and X86 pattern matches this during isel.
A follow up commit will remove the X86 patterns to allow this to be tested.

llvm-svn: 267325
2016-04-24 04:38:32 +00:00
Craig Topper beb77bd89f Fix an assertion that can never fire because the condition ANDed with the string is just true or 1.
llvm-svn: 267324
2016-04-24 04:38:29 +00:00
Adrian Prantl da99dced5c Revert "Verifier: Verify that each inlinable callsite of a debug-info-bearing function"
This reverts commit r267320 while investigating an OpenMP buildbot failure.

llvm-svn: 267322
2016-04-24 03:47:37 +00:00
Adrian Prantl 971481db98 Verifier: Verify that each inlinable callsite of a debug-info-bearing function
in a debug-info-bearing function has a debug location attached to it. Failure to
do so causes an "!dbg attachment points at wrong subprogram for function"
assertion failure when the inliner sets up inline scope info.

rdar://problem/25878916

llvm-svn: 267320
2016-04-24 03:23:02 +00:00
Mehdi Amini c3ed48c1bd Reorganize GlobalValueSummary with a "Flags" bitfield.
Right now it only contains the LinkageType, but will be extended
with "hasSection", "isOptSize", "hasInlineAssembly", etc.

Differential Revision: http://reviews.llvm.org/D19404

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 267319
2016-04-24 03:18:18 +00:00
Mehdi Amini 8fe6936e18 Add a version field in the bitcode for the summary
Differential Revision: http://reviews.llvm.org/D19456

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 267318
2016-04-24 03:18:11 +00:00
Mehdi Amini 059464fe36 Add an internalization step to the ThinLTOCodeGenerator
Keeping as much as possible internal/private is
known to help the optimizer. Let's try to benefit from
this in ThinLTO.
Note: this is early work, but is enough to build clang (and
all the LLVM tools). I still need to write some lit-tests...

Differential Revision: http://reviews.llvm.org/D19103

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 267317
2016-04-24 03:18:01 +00:00
Craig Topper 855d182656 Fix a couple assertions that can never fire because they just contained the text string which always evaluates to true. Add a ! so they'll evaluate to false.
llvm-svn: 267312
2016-04-24 02:01:25 +00:00
Craig Topper 601b6c69bc [X86] Fix patterns that turn cmove/cmovne+ctlz/cttz into lzcnt/tzcnt instructions. Only one of the conditions should be valid for each pattern, not both. Update tests accordingly.
llvm-svn: 267311
2016-04-24 02:01:22 +00:00
Davide Italiano cefdf238e9 [RuntimeDyldELF] Handle GOTPCRELX/REX_GOTPCRELX.
llvm-svn: 267309
2016-04-24 01:36:37 +00:00
Davide Italiano f59b0da654 [MC/ELF] Implement support for GOTPCRELX/REX_GOTPCRELX.
The option to control the emission of the new relocations
is -relax-relocations (blatantly copied from GNU as).
It can't be enabled by default because it breaks relatively
recent versions of ld.bfd/ld.gold (late 2015).

llvm-svn: 267307
2016-04-24 01:03:57 +00:00
Mehdi Amini ae64eafd31 Store and emit original name in combined index
Summary:
As discussed in D18298, some local globals can't
be renamed/promoted (because they have a section, or because
they are referenced from inline assembly).
To be able to detect naming collision, we need to keep around
the "GUID" using their original name without taking the linkage
into account.

Reviewers: tejohnson

Subscribers: joker.eph, llvm-commits

Differential Revision: http://reviews.llvm.org/D19454

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 267304
2016-04-23 23:38:17 +00:00
Mehdi Amini cb87494f4c Always traverse GlobalVariable initializer when computing the export list
Summary:
We are always importing the initializer for a GlobalVariable.
So if a GlobalVariable is in the export-list, we pull in any
refs as well.

Reviewers: tejohnson

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D19102

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 267303
2016-04-23 23:29:24 +00:00
Duncan P. N. Exon Smith c5225dfbea DebugInfo: Change DIBuilder to make distinct DIGlobalVariables
A long overdue change to make DIGlobalVariable distinct.  Much like
DISubprogram definitions (changed in r246098), it isn't logical to
unique DIGlobalVariable definitions from two different compile units.

(Longer-term, we should also find a way to reverse the link between
GlobalVariable and DIGlobalVariable, and between DIGlobalVariable and
DICompileUnit, so that debug info to do with optimized-out globals
disappears.  Admittedly it's harder than with Function/DISubprogram,
since global variables may be constant-folded and the debug info should
still describe that somehow.)

llvm-svn: 267301
2016-04-23 22:29:09 +00:00
Davide Italiano 4652c59568 [MC/ELF] Pass Fixup to getRelocType64.
In preparation for other changes.

llvm-svn: 267300
2016-04-23 22:26:31 +00:00
Duncan P. N. Exon Smith 69bdc8afa9 BitcodeReader: Avoid std::vector with non-movable types from r267296
r267298 didn't quite fix the build errors.  Use SmallVector instead of
std::vector, the latter of which I think is trying to maintain a strong
exception safety guarantee.

http://lab.llvm.org:8011/builders/lldb-amd64-ninja-freebsd11/builds/6228

llvm-svn: 267299
2016-04-23 21:36:59 +00:00
Duncan P. N. Exon Smith ece57ddd56 BitcodeReader: Avoid non-moving std::piecewise_construct from r267296
Not exactly sure why the host tries to use a copy constructor here, but
it's easy enough to work around it.

http://lab.llvm.org:8011/builders/lldb-amd64-ninja-freebsd11/builds/6227

llvm-svn: 267298
2016-04-23 21:23:41 +00:00
Duncan P. N. Exon Smith a59d3e5af8 DebugInfo: Remove MDString-based type references
Eliminate DITypeIdentifierMap and make DITypeRef a thin wrapper around
DIType*.  It is no longer legal to refer to a DICompositeType by its
'identifier:', and DIBuilder no longer retains all types with an
'identifier:' automatically.

Aside from the bitcode upgrade, this is mainly removing logic to resolve
an MDString-based reference to an actualy DIType.  The commits leading
up to this have made the implicit type map in DICompileUnit's
'retainedTypes:' field superfluous.

This does not remove DITypeRef, DIScopeRef, DINodeRef, and
DITypeRefArray, or stop using them in DI-related metadata.  Although as
of this commit they aren't serving a useful purpose, there are patchces
under review to reuse them for CodeView support.

The tests in LLVM were updated with deref-typerefs.sh, which is attached
to the thread "[RFC] Lazy-loading of debug info metadata":

  http://lists.llvm.org/pipermail/llvm-dev/2016-April/098318.html

llvm-svn: 267296
2016-04-23 21:08:00 +00:00
Sanjay Patel dc88bd6e1f replace duplicated static functions for profile metadata access with BranchInst member function; NFCI
llvm-svn: 267295
2016-04-23 20:01:22 +00:00
Renato Golin 179d1f5dad Revert "[AArch64] Fix optimizeCondBranch logic."
This reverts commit r267206, as it broke self-hosting on AArch64.

llvm-svn: 267294
2016-04-23 19:30:52 +00:00
Sanjay Patel 85ce0f1f1f improve documentation comments; NFC
llvm-svn: 267292
2016-04-23 16:31:48 +00:00
Craig Topper 7e5fad66f3 [CodeGen] When promoting CTTZ operations to larger type, don't insert a select to detect if the input is zero to return the original size instead of the extended size. Instead just set the first bit in the zero extended part.
llvm-svn: 267280
2016-04-23 05:20:47 +00:00
Duncan P. N. Exon Smith 30805b2417 BitcodeWriter: Emit uniqued subgraphs after all distinct nodes
Since forward references for uniqued node operands are expensive (and
those for distinct node operands are cheap due to
DistinctMDOperandPlaceholder), minimize forward references in uniqued
node operands.

Moreover, guarantee that when a cycle is broken by a distinct node, none
of the uniqued nodes have any forward references.  In
ValueEnumerator::EnumerateMetadata, enumerate uniqued node subgraphs
first, delaying distinct nodes until all uniqued nodes have been
handled.  This guarantees that uniqued nodes only have forward
references when there is a uniquing cycle (since r267276 changed
ValueEnumerator::organizeMetadata to partition distinct nodes in front
of uniqued nodes as a post-pass).

Note that a single uniqued subgraph can hit multiple distinct nodes at
its leaves.  Ideally these would themselves be emitted in post-order,
but this commit doesn't attempt that; I think it requires an extra pass
through the edges, which I'm not convinced is worth it (since
DistinctMDOperandPlaceholder makes forward references quite cheap
between distinct nodes).

I've added two testcases:

  - test/Bitcode/mdnodes-distinct-in-post-order.ll is just like
    test/Bitcode/mdnodes-in-post-order.ll, except with distinct nodes
    instead of uniqued ones.  This confirms that, in the absence of
    uniqued nodes, distinct nodes are still emitted in post-order.

  - test/Bitcode/mdnodes-distinct-nodes-break-cycles.ll is the minimal
    example where a naive post-order traversal would cause one uniqued
    node to forward-reference another.  IOW, it's the motivating test.

llvm-svn: 267278
2016-04-23 04:59:22 +00:00
Duncan P. N. Exon Smith 498b4977ba Avoid MSVC failure with default arguments in lambdas from r267270
http://lab.llvm.org:8011/builders/clang-x64-ninja-win7/builds/11700

llvm-svn: 267277
2016-04-23 04:52:47 +00:00
Duncan P. N. Exon Smith 1483fff271 BitcodeWriter: Emit distinct nodes before uniqued nodes
When an operand of a distinct node hasn't been read yet, the reader can
use a DistinctMDOperandPlaceholder.  This is much cheaper than forward
referencing from a uniqued node.  Change
ValueEnumerator::organizeMetadata to partition distinct nodes and
uniqued nodes to reduce the overhead of cycles broken by distinct nodes.

Mehdi measured this for me; this removes most of the RAUW from the
importing step of -flto=thin, even after a WIP patch that removes
string-based DITypeRefs (introducing many more cycles to the metadata
graph).

llvm-svn: 267276
2016-04-23 04:42:39 +00:00
Teresa Johnson c814e0c1fd Address comments.
llvm-svn: 267274
2016-04-23 04:31:20 +00:00
Teresa Johnson 37687f3911 Refactor bitcode writer into classes (NFC)
Summary:
As discussed in on the mailing list yesterday, I have refactored
BitcodeWriter.cpp to use classes to manage the bitcode writing process,
instead of passing around long lists of parameters between static
functions. See:
  http://lists.llvm.org/pipermail/llvm-dev/2016-April/098610.html

I created a parent BitcodeWriter class to own the BitstreamWriter,
write the header, and contain the main entry point into the writing
process. There are two derived classes, one for writing a module and one
for writing a combined index file (for ThinLTO), which manage the
writing process specific to those bitcode file types.

I also changed the functions to conform to LLVM coding standards
(lowercase function name first letter). The only two routines that still
start with an uppercase letter are the two external interfaces, which
can be fixed as a follow-on (I wanted to keep this round just within
BitcodeWriter.cpp).

Reviewers: dexonsmith, joker.eph

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D19447

llvm-svn: 267273
2016-04-23 04:30:47 +00:00
Duncan P. N. Exon Smith e9f85c482b Avoid ternery statement to please g++ after r267270, NFC
http://bb.pgr.jp/builders/cmake-llvm-x86_64-linux/builds/36074

llvm-svn: 267272
2016-04-23 04:23:57 +00:00
Duncan P. N. Exon Smith d9bbdce715 ValueEnumerator: Use std::find_if, NFC
Mehdi's pattern recognition pulled this one out.  This is cleaner with
std::find_if than with the strange helper function that took an iterator
by reference and updated it.

llvm-svn: 267271
2016-04-23 04:22:38 +00:00
Duncan P. N. Exon Smith 4b1bc647f0 BitcodeReader: Avoid referencing unresolved nodes from distinct ones
Each reference to an unresolved MDNode is expensive, since the RAUW
support in MDNode uses a separate allocation and side map.  Since
a distinct MDNode doesn't require its operands on creation (unlike
uniuqed nodes, there's no need to check for structural equivalence),
use nullptr for any of its unresolved operands.  Besides reducing the
burden on MDNode maps, this can avoid allocating temporary MDNodes in
the first place.

We need some way to track operands.  Invent DistinctMDOperandPlaceholder
for this purpose, which is a Metadata subclass that holds an ID and
points at its single user.  DistinctMDOperandPlaceholder::replaceUseWith
is just like RAUW, but its name highlights that there is only ever
exactly one use.

There is no support for moving (or, obviously, copying) these.  Move
support would be possible but expensive; leaving it unimplemented
prevents user error.  In the BitcodeReader I originally considered
allocating on a BumpPtrAllocator and keeping a vector of pointers to
them, and then I realized that std::deque implements exactly this.

A couple of obvious follow-ups:

  - Change ValueEnumerator to emit distinct nodes first to take more
    advantage of this optimization.  (How convenient... I think I might
    have a couple of patches for this.)

  - Change DIBuilder and its consumers (like CGDebugInfo in clang) to
    use something like this when constructing debug info in the first
    place.

llvm-svn: 267270
2016-04-23 04:15:56 +00:00
Duncan P. N. Exon Smith 30ab4b4788 BitcodeReader: Consistently use IsDistinct, NFC
Consistently use the IsDistinct variable and start relying on it in
GET_OR_DISTINCT.  This change has NFC, but prepares for using IsDistinct
to optimize the behaviour of the getMD() and getMDOrNull() helpers.

llvm-svn: 267268
2016-04-23 04:01:57 +00:00
Duncan P. N. Exon Smith 004509dca1 BitcodeReader: Use getMD/getMDOrNull helpers consistently, almost NFC
The only functionality change was removing an error check from the
BitcodeReader (and an assertion from DILocation::getImpl) that is
already caught by Verifier::visitDILocation.  The Verifier is a better
place for this anyway, and being inconsistent with other subclasses of
MDNode isn't serving anyone.

llvm-svn: 267267
2016-04-23 03:55:14 +00:00
Craig Topper 6e6a1f0a5b [Hexagon] Set ctlz_zero_undef/cttz_zero_undef to Expand so LegalizeDAG will convert them to ctlz/cttz. Remove the now unneccessary isel patterns. NFC
llvm-svn: 267266
2016-04-23 02:49:31 +00:00
Craig Topper 6f8b8e4c45 [NVPTX] Set ctlz_zero_undef to Expand so LegalizeDAG will convert it to ctlz. Remove the now unneccessary isel patterns. NFC
llvm-svn: 267265
2016-04-23 02:49:29 +00:00
Craig Topper b297b6b0c9 [WebAssembly] Set ctlz_zero_undef/cttz_zero_undef to Expand so LegalizeDAG will convert them to ctlz/cttz. Remove the now unneccessary isel patterns. NFC
llvm-svn: 267264
2016-04-23 02:49:25 +00:00
Amaury Sechet b130f43bfb Style fix in Core.h / Core.cpp. NFC
llvm-svn: 267257
2016-04-23 00:12:45 +00:00