Commit Graph

97308 Commits

Author SHA1 Message Date
Craig Topper a363d42973 [AVX-512] Put the AVX-512 sections of the load folding tables into mostly alphabetical order. This is consistent with the older sections of the table. NFC
llvm-svn: 287956
2016-11-25 23:21:34 +00:00
David Majnemer d5648c7a7d Replace some callers of setTailCall with setTailCallKind
We were a little sloppy with adding tailcall markers.  Be more
consistent by using setTailCallKind instead of setTailCall.

llvm-svn: 287955
2016-11-25 22:35:09 +00:00
Marek Olsak 79c05871a2 AMDGPU/SI: Add back reverted SGPR spilling code, but disable it
suggested as a better solution by Matt

llvm-svn: 287942
2016-11-25 17:37:09 +00:00
Simon Pilgrim c5fb167df0 Use SDValue helpers instead of explicitly going via SDValue::getNode(). NFCI
llvm-svn: 287941
2016-11-25 17:25:21 +00:00
Simon Pilgrim 8e8ae7219f Use SDValue helper instead of explicitly going via SDValue::getNode(). NFCI
llvm-svn: 287940
2016-11-25 17:19:53 +00:00
Craig Topper 88071b37ab [AVX-512] Add support for changing VSHUFF64x2 to VSHUFF32x4 when its feeding a vselect with 32-bit element size.
Summary:
Shuffle lowering may have widened the element size of a i32 shuffle to i64 before selecting X86ISD::SHUF128. If this shuffle was used by a vselect this can prevent us from selecting masked operations.

This patch detects this and changes the element size to match the vselect.

I don't handle changing integer to floating point or vice versa as its not clear if its better to push such a bitcast to the inputs of the shuffle or to the user of the vselect. So I'm ignoring that case for now.

Reviewers: delena, zvi, RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D27087

llvm-svn: 287939
2016-11-25 16:48:05 +00:00
Craig Topper 1e48829747 [AVX-512] Add VPERMT2* and VPERMI2* instructions to load folding tables.
llvm-svn: 287937
2016-11-25 16:33:53 +00:00
Marek Olsak e3895bfb47 Revert "AMDGPU: Implement SGPR spilling with scalar stores"
This reverts commit 4404d0d6e354e80dd7f8f0a0e12d8ad809cf007e.

llvm-svn: 287936
2016-11-25 16:03:34 +00:00
Marek Olsak dad553a5cf Revert "AMDGPU: Fix MMO when splitting spill"
This reverts commit 79d4f8b8b1ce430c3d5dac4fc72a9eebaed24fe1.

llvm-svn: 287935
2016-11-25 16:03:27 +00:00
Marek Olsak 8cbbf65361 Revert "AMDGPU: Fix adding extra implicit def of register"
This reverts commit e834ce5976567575621901fb967b8018b9916d71.

llvm-svn: 287934
2016-11-25 16:03:22 +00:00
Marek Olsak 713e6fc531 Revert "AMDGPU: Fix not setting kill flag on temp reg when spilling"
This reverts commit 057bbbe4ae170247ba37f08f2e70ef185267d1bb.

llvm-svn: 287933
2016-11-25 16:03:19 +00:00
Marek Olsak a45dae458d Revert "AMDGPU: Make m0 unallocatable"
This reverts commit 124ad83dae04514f943902446520c859adee0e96.

llvm-svn: 287932
2016-11-25 16:03:15 +00:00
Marek Olsak ea848df84c Revert "AMDGPU: Remove m0 spilling code"
This reverts commit f18de36554eb22416f8ba58e094e0272523a4301.

llvm-svn: 287931
2016-11-25 16:03:06 +00:00
Marek Olsak 18a95bcb3c Revert "AMDGPU: Preserve m0 value when spilling"
This reverts commit a5a179ffd94fd4136df461ec76fb30f04afa87ce.

llvm-svn: 287930
2016-11-25 16:03:02 +00:00
Abhilash Bhandari 54e5a1a4da [Loop Unswitch] Patch to selective unswitch only the reachable branch instructions.
Summary:
The iterative algorithm for Loop Unswitching may render some of the branches unreachable in the unswitched loops.
Given the exponential nature of the algorithm, this is quite an overhead.
This patch fixes this problem by selectively unswitching only those branches within a loop that are reachable from the loop header.

Reviewers: Michael Zolothukin, Anna Thomas, Weiming Zhao.
Subscribers: llvm-commits.

Differential Revision: http://reviews.llvm.org/D26299

llvm-svn: 287925
2016-11-25 14:07:44 +00:00
Simon Dardis c08af6db5b [mips] Correct jal expansion for local symbols in .local directives.
This patch corrects the behaviour of code such as:

   .local foo
   jal foo
foo:
to use the correct jal expansion when writing ELF files.

Patch by: Daniel Sanders

Reviewers: zoran.jovanovic, seanbruno, vkalintiris

Differential Revision: https://reviews.llvm.org/D24722

llvm-svn: 287918
2016-11-25 11:06:43 +00:00
Craig Topper d4091494d3 [X86] Invert an 'if' and early out to fix a weird indentation. NFCI
llvm-svn: 287909
2016-11-25 02:29:24 +00:00
Craig Topper a46936185a [X86] Size a SmallVector to the worst case mask size for a 512-bit shuffle. NFCI
llvm-svn: 287908
2016-11-25 02:29:21 +00:00
Craig Topper 8c4cdf06db [DAGCombine] Teach DAG combine that if both inputs of a vselect are the same, then the condition doesn't matter and the vselect can be removed.
Selects with scalar condition already handle this correctly.

llvm-svn: 287904
2016-11-24 21:48:52 +00:00
Serge Rogatch a331133e6d Test commit access.
llvm-svn: 287898
2016-11-24 18:51:47 +00:00
Simon Pilgrim f1ee930db0 Fix unused variable warning
llvm-svn: 287889
2016-11-24 15:24:47 +00:00
Benjamin Kramer fc54e35d94 [X86] Don't round trip a unique_ptr through a raw pointer for assignment.
No functional change.

llvm-svn: 287888
2016-11-24 15:17:39 +00:00
Simon Pilgrim 9c71e07276 [X86][SSE] Improve UINT_TO_FP v2i32 -> v2f64
Vectorize UINT_TO_FP v2i32 -> v2f64 instead of scalarization (albeit still on the SIMD unit).

The codegen matches that generated by legalization (and is in fact used by AVX for UINT_TO_FP v4i32 -> v4f64), but has to be done in the x86 backend to account for legalization via 4i32.

Differential Revision: https://reviews.llvm.org/D26938

llvm-svn: 287886
2016-11-24 15:12:56 +00:00
Simon Pilgrim 841d7ca463 [X86][AVX512] Add support for v2i64 fptosi/fptoui/sitofp/uitofp on AVX512DQ-only targets
Use 512-bit instructions with subvector insertion/extraction like we do in a number of similar circumstances

llvm-svn: 287882
2016-11-24 14:46:55 +00:00
Simon Pilgrim 7c26a6f9ef [X86][AVX512DQVL] Add awareness of vcvtqq2ps and vcvtuqq2ps implicit zeroing of upper 64-bits of xmm result
llvm-svn: 287878
2016-11-24 14:02:30 +00:00
Simon Pilgrim ab323ec411 [X86][AVX512DQVL] Add support for v2i64 -> v2f32 SINT_TO_FP/UINT_TO_FP lowering
llvm-svn: 287877
2016-11-24 13:38:59 +00:00
Nikolai Bozhenov 3a8d108b2b [x86] Fixing PR28755 by precomputing the address used in CMPXCHG8B
The bug arises during register allocation on i686 for
CMPXCHG8B instruction when base pointer is needed. CMPXCHG8B
needs 4 implicit registers (EAX, EBX, ECX, EDX) and a memory address,
plus ESI is reserved as the base pointer. With such constraints the only
way register allocator would do its job successfully is when the addressing
mode of the instruction requires only one register. If that is not the case
- we are emitting additional LEA instruction to compute the address.

It fixes PR28755.

Patch by Alexander Ivchenko <alexander.ivchenko@intel.com>

Differential Revision: https://reviews.llvm.org/D25088

llvm-svn: 287875
2016-11-24 13:23:35 +00:00
Nikolai Bozhenov bb64aa14a3 [x86] Minor refactoring of X86TargetLowering::EmitInstrWithCustomInserter
Move the definitions of three variables out of the switch.

Patch by Alexander Ivchenko <alexander.ivchenko@intel.com>

Differential Revision: https://reviews.llvm.org/D25192

llvm-svn: 287874
2016-11-24 13:15:49 +00:00
Nikolai Bozhenov a2dabed3b6 [x86] Rewrite getAddressFromInstr helper function
- It does not modify the input instruction
- Second operand of any address is always an Index Register,
  make sure we actually check for that, instead of a check for
  an immediate value

Patch by Alexander Ivchenko <alexander.ivchenko@intel.com>

Differential Revision: https://reviews.llvm.org/D24938

llvm-svn: 287873
2016-11-24 13:05:43 +00:00
Simon Pilgrim a3af79678e [X86] Generalize CVTTPD2DQ/CVTTPD2UDQ and CVTDQ2PD/CVTUDQ2PD opcodes. NFCI
Replace the CVTTPD2DQ/CVTTPD2UDQ and CVTDQ2PD/CVTUDQ2PD opcodes with general versions.

This is an initial step towards similar FP_TO_SINT/FP_TO_UINT and SINT_TO_FP/UINT_TO_FP lowering to AVX512 CVTTPS2QQ/CVTTPS2UQQ and CVTQQ2PS/CVTUQQ2PS with illegal types.

Differential Revision: https://reviews.llvm.org/D27072

llvm-svn: 287870
2016-11-24 12:13:46 +00:00
Peter Collingbourne debb6f6cc1 Object: Add IRObjectFile::getTargetTriple().
This lets us remove a use of IRObjectFile::getModule() in llvm-nm.

Differential Revision: https://reviews.llvm.org/D27074

llvm-svn: 287846
2016-11-24 01:13:09 +00:00
Peter Collingbourne e32baa0c3e Object: Simplify the IRObjectFile symbol iterator implementation.
Change the IRObjectFile symbol iterator to be a pointer into a vector of
PointerUnions representing either IR symbols or asm symbols.

This change is in preparation for a future change for supporting multiple
modules in an IRObjectFile. Although it causes an increase in memory
consumption, we can deal with that issue separately by introducing a bitcode
symbol table.

Differential Revision: https://reviews.llvm.org/D26928

llvm-svn: 287845
2016-11-24 00:41:05 +00:00
Matt Arsenault 7b54dd039e AMDGPU: Preserve m0 value when spilling
llvm-svn: 287844
2016-11-24 00:26:50 +00:00
Matt Arsenault 94b32ffe8e TRI: Add hook to pass scavenger during frame elimination
The scavenger was not passed if requiresFrameIndexScavenging was
enabled. I need to be able to test for the availability of an
unallocatable register here, so I can't create a virtual register for
it.

It might be better to just always use the scavenger and stop
creating virtual registers.

llvm-svn: 287843
2016-11-24 00:26:47 +00:00
Matt Arsenault 5ee3325358 AMDGPU: Remove m0 spilling code
Since m0 isn't allocatable it should never be spilled anymore.

llvm-svn: 287842
2016-11-24 00:26:44 +00:00
Matt Arsenault 9e5c7b1031 AMDGPU: Make m0 unallocatable
m0 may need to be written for spill code, so
we don't want general code uses relying on the
value stored in it.

This introduces a few code quality regressions where copies
from m0 are not coalesced into copies of a copy of m0.

llvm-svn: 287841
2016-11-24 00:26:40 +00:00
Davide Italiano 8812f28f47 [lib/LTO] Rename few instances of Lto to LTO.
llvm-svn: 287840
2016-11-24 00:23:09 +00:00
Greg Clayton e65439797a Rely on a single DWARF version instead of having two copies
This patch makes AsmPrinter less reliant on DwarfDebug by relying on the DWARF version in the AsmPrinter's MCStreamer's MCContext. This allows us to remove the redundant DWARF version from DwarfDebug. It also lets us change code that used to access the AsmPrinter's DwarfDebug just to get to the DWARF version by changing the DWARF version accessor on AsmPrinter so that it grabs the version from its MCStreamer's MCContext.

Differential Revision: https://reviews.llvm.org/D27032

llvm-svn: 287839
2016-11-23 23:30:37 +00:00
Eugene Zelenko 570e39a25c [DebugInfo] Fix some Clang-tidy modernize-use-default and Include What You Use warnings; other minor fixes (NFC).
Per Zachary Turner and Mehdi Amini suggestion to make only post-commit reviews.

llvm-svn: 287838
2016-11-23 23:16:32 +00:00
Simon Pilgrim 3ce6a545c7 [X86][SSE] Add awareness of (v)cvtpd2dq and vcvtpd2udq implicit zeroing of upper 64-bits of xmm result
We've already added the equivalent for (v)cvttpd2dq (rL284459) and vcvttpd2udq

llvm-svn: 287835
2016-11-23 22:35:06 +00:00
Nicolai Haehnle 934470f536 [SelectionDAG] Early-out in TargetLowering::expandMUL (NFC)
Summary: Reduce indentation level; preparation for D24956.

Reviewers: efriedma

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D27063

llvm-svn: 287831
2016-11-23 22:14:20 +00:00
Matt Arsenault a24d84beb9 AMDGPU: Cleanup immediate folding code
Move code down to use, reorder to avoid hard to follow
immediate folding logic.

llvm-svn: 287818
2016-11-23 21:51:07 +00:00
Matt Arsenault 391c3ea9bc AMDGPU: Fix debug printing
The uint8_t was printed as a char which didn't really work.

llvm-svn: 287817
2016-11-23 21:51:05 +00:00
Matt Arsenault 997a9abf4c AMDGPU: Fix not setting kill flag on temp reg when spilling
llvm-svn: 287808
2016-11-23 21:00:12 +00:00
Matt Arsenault dd0cb2a3e5 AMDGPU: Fix adding extra implicit def of register
In the scalar case, there's no reason to add an additional
def of the same register.

llvm-svn: 287807
2016-11-23 21:00:10 +00:00
Matt Arsenault 2669a76f01 AMDGPU: Fix MMO when splitting spill
The size and offset were wrong. The size of the object was
being used for the size of the access, when here it is really
being split into 4-byte accesses. The underlying object size
is set in the MachinePointerInfo, which also didn't have the
offset set.

llvm-svn: 287806
2016-11-23 20:52:53 +00:00
Haicheng Wu 731b04ca43 [LoopUnroll] Move code to exit early. NFC.
Just to save some compilation time.

Differential Revision: https://reviews.llvm.org/D26784

llvm-svn: 287800
2016-11-23 19:39:26 +00:00
Daniel Berlin 4056253c4d Revert "[Triple] Add Facebook vendor"
This reverts commit r287684

Objections on the review thread had not been addressed to
prior to commit.  I asked the committer to revert, but i expect they
are gone for the US holiday or something.

llvm-svn: 287798
2016-11-23 19:03:54 +00:00
Michael Kuperstein 47eb85a003 [X86] Allow folding of stack reloads when loading a subreg of the spilled reg
We did not support subregs in InlineSpiller:foldMemoryOperand() because targets
may not deal with them correctly.

This adds a target hook to let the spiller know that a target can handle
subregs, and actually enables it for x86 for the case of stack slot reloads.
This fixes PR30832.

Differential Revision: https://reviews.llvm.org/D26521

llvm-svn: 287792
2016-11-23 18:33:49 +00:00
Chandler Carruth dab4eae274 [PM] Change the static object whose address is used to uniquely identify
analyses to have a common type which is enforced rather than using
a char object and a `void *` type when used as an identifier.

This has a number of advantages. First, it at least helps some of the
confusion raised in Justin Lebar's code review of why `void *` was being
used everywhere by having a stronger type that connects to documentation
about this.

However, perhaps more importantly, it addresses a serious issue where
the alignment of these pointer-like identifiers was unknown. This made
it hard to use them in pointer-like data structures. We were already
dodging this in dangerous ways to create the "all analyses" entry. In
a subsequent patch I attempted to use these with TinyPtrVector and
things fell apart in a very bad way.

And it isn't just a compile time or type system issue. Worse than that,
the actual alignment of these pointer-like opaque identifiers wasn't
guaranteed to be a useful alignment as they were just characters.

This change introduces a type to use as the "key" object whose address
forms the opaque identifier. This both forces the objects to have proper
alignment, and provides type checking that we get it right everywhere.
It also makes the types somewhat less mysterious than `void *`.

We could go one step further and introduce a truly opaque pointer-like
type to return from the `ID()` static function rather than returning
`AnalysisKey *`, but that didn't seem to be a clear win so this is just
the initial change to get to a reliably typed and aligned object serving
is a key for all the analyses.

Thanks to Richard Smith and Justin Lebar for helping pick plausible
names and avoid making this refactoring many times. =] And thanks to
Sean for the super fast review!

While here, I've tried to move away from the "PassID" nomenclature
entirely as it wasn't really helping and is overloaded with old pass
manager constructs. Now we have IDs for analyses, and key objects whose
address can be used as IDs. Where possible and clear I've shortened this
to just "ID". In a few places I kept "AnalysisID" to make it clear what
was being identified.

Differential Revision: https://reviews.llvm.org/D27031

llvm-svn: 287783
2016-11-23 17:53:26 +00:00
Alina Sbirlea a3d2f703a5 [LoadStoreVectorizer] Enable vectorization of stores in the presence of an aliasing load
Summary:
The "getVectorizablePrefix" method would give up if it found an aliasing load for a store chain.
In practice, the aliasing load can be treated as a memory barrier and all stores that precede it
are a valid vectorizable prefix.
Issue found by volkan in D26962. Testcase is a pruned version of the one in the original patch.

Reviewers: jlebar, arsenm, tstellarAMD

Subscribers: mzolotukhin, wdng, nhaehnle, anna, volkan, llvm-commits

Differential Revision: https://reviews.llvm.org/D27008

llvm-svn: 287781
2016-11-23 17:43:15 +00:00
Nirav Dave cf34556330 [DAG] Improve loads-from-store forwarding to handle TokenFactor
Forward store values to matching loads down through token
factors. Factored from D14834.

Reviewers: jyknight, hfinkel

Subscribers: hfinkel, nemanjai, llvm-commits

Differential Revision: https://reviews.llvm.org/D26080

llvm-svn: 287773
2016-11-23 16:48:35 +00:00
John Brawn 150addb45c [DAGCombiner] Fix infinite loop in vector mul/shl combining
We have the following DAGCombiner transformations:
 (mul (shl X, c1), c2) -> (mul X, c2 << c1)
 (mul (shl X, C), Y) -> (shl (mul X, Y), C)
 (shl (mul x, c1), c2) -> (mul x, c1 << c2)
Usually the constant shift is optimised by SelectionDAG::getNode when it is
constructed, by SelectionDAG::FoldConstantArithmetic, but when we're dealing
with vectors and one of those vector constants contains an undef element
FoldConstantArithmetic does not fold and we enter an infinite loop.

Fix this by making FoldConstantArithmetic use getNode to decide how to fold each
vector element, the same as FoldConstantVectorArithmetic does, and rather than
adding the constant shift to the work list instead only apply the transformation
if it's already been folded into a constant, as if it's not we're going to loop
endlessly. Additionally add missing NoOpaques to one of those transformations,
which I noticed when writing the tests for this.

Differential Revision: https://reviews.llvm.org/D26605

llvm-svn: 287766
2016-11-23 16:05:51 +00:00
Nemanja Ivanovic 10fc3cfc63 [PowerPC] Remove InstAlias definitions that cause incorrect assembly
In rL283190, I added some InstAlias definitions to generate extended mnemonics
for some uses of the XXPERMDI instruction. However, when the assembler matches
these extended mnemonics, it matches the new instruction in situations where it
should match the old one.
This patch removes these definitions and accomplishes that by defining these
mnemonics with additional instructions that are isCodeGenOnly.

Fixes PR31127.

llvm-svn: 287765
2016-11-23 15:51:52 +00:00
Simon Pilgrim 4e9b9cbee9 [X86][AVX512] Add support for v4i64 fptosi/fptoui/sitofp/uitofp on AVX512DQ-only targets
Use 512-bit instructions with subvector insertion/extraction like we do in a number of similar circumstances

llvm-svn: 287762
2016-11-23 14:01:18 +00:00
Elena Demikhovsky 09375d98b8 Type legalization for compressstore and expandload intrinsics.
Implemented widening (v2f32) and splitting (v16f64).
On splitting, I use "popcnt" to calculate memory increment. 
More type legalization work will come in the next patches.

llvm-svn: 287761
2016-11-23 13:58:24 +00:00
Simon Pilgrim 03cd8f887c [CostModel][X86] Add missing AVX512DQ v8i64 fptosi/sitofp costs
llvm-svn: 287760
2016-11-23 13:42:09 +00:00
Benjamin Kramer 8a3c49897f [MD5] Use write32le instead of spelling it out with shifts.
No functionality change intended.

llvm-svn: 287757
2016-11-23 11:49:28 +00:00
Craig Topper f57e17def0 [AVX-512] Remove intrinsics for valignd/q and autoupgrade them to native shuffles.
llvm-svn: 287744
2016-11-23 06:54:55 +00:00
Zvi Rackover 14aba43ea9 [X86] Simplify lowerVectorShuffleAsBitMask to handle only integer VT's
Summary: This function is only called with integer VT arguments, so remove code that handles FP vectors.

Reviewers: RKSimon, craig.topper, delena, andreadb

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D26985

llvm-svn: 287743
2016-11-23 06:45:25 +00:00
Rui Ueyama c464fadc64 Fix builbots.
llvm-svn: 287735
2016-11-23 03:58:12 +00:00
Kuba Mracek 06995e866b [xray] Add XRay support for Mach-O in CodeGen
Currently, XRay only supports emitting the XRay table (xray_instr_map) on ELF binaries. Let's add Mach-O support.

Differential Revision: https://reviews.llvm.org/D26983

llvm-svn: 287734
2016-11-23 02:07:04 +00:00
Rui Ueyama 877c26c844 Add convenient functions to compute hashes of byte vectors.
In many sitautions, you just want to compute a hash for one chunk
of data. This patch adds convenient functions for that purpose.

Differential Revision: https://reviews.llvm.org/D26988

llvm-svn: 287726
2016-11-23 00:46:09 +00:00
Justin Lebar 6c0f25aec6 [StructurizeCFG] Refactor OrderNodes.
Summary:
No need to copy the RPOT vector before using it.  Switch from std::map
to SmallDenseMap.  Get rid of an unused variable (TempVisited).  Get rid
of a typedef, RNVector, which is now used only once.

Differential Revision: https://reviews.llvm.org/D26997

llvm-svn: 287721
2016-11-22 23:14:11 +00:00
Justin Lebar 23aaf60277 [StructurizeCFG] Add whitespace in getAnalysisUsage.
Summary:
"addRequired" and "addPreserved" look very similar when squished up next
to each other -- without the newline this code looked to me like it was
addRequired'ing DominatorTreeWrapperPass twice.

Reviewers: arsenm

Subscribers: wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D26996

llvm-svn: 287720
2016-11-22 23:14:07 +00:00
Justin Lebar 820db74c1e [StructurizeCFG] Remove unnecessary "using" in class.
Reviewers: arsenm

Subscribers: wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D26995

llvm-svn: 287719
2016-11-22 23:13:49 +00:00
Justin Lebar 73c4baf3a3 [StructurizeCFG] Merge the two constructors into one.
Reviewers: arsenm

Subscribers: wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D26994

llvm-svn: 287718
2016-11-22 23:13:44 +00:00
Justin Lebar 1b60d70025 [StructurizeCFG] Use a for-each loop instead of iterators in runOnRegion.
Summary:

Reviewers: arsenm

Subscribers: wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D26993

llvm-svn: 287717
2016-11-22 23:13:37 +00:00
Justin Lebar c7445d5731 [StructurizeCFG] Make hasOnlyUniformBranches a non-member function.
Summary: Lets us get rid of one member variable too.

Reviewers: arsenm

Subscribers: wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D26992

llvm-svn: 287716
2016-11-22 23:13:33 +00:00
Sanjay Patel 1e6ca44a8e add and use isBitwiseLogicOp() helper function; NFCI
llvm-svn: 287712
2016-11-22 22:54:36 +00:00
Dehao Chen 554f500ae2 Before sample pgo annotation, do not inline a function that has no debug info. (NFC)
If there is no debug info in the callee, inlining it will not help annotator. This avoids infinite loop as reported in PR/31119.

llvm-svn: 287710
2016-11-22 22:50:01 +00:00
Davide Italiano e7ffae9dea [SCCP] Remove code in visitBinaryOperator (and add tests).
We visit and/or, we try to derive a lattice value for the
instruction even if one of the operands is overdefined.
If the non-overdefined value is still 'unknown' just return and wait
for ResolvedUndefsIn to "plug in" the correct value. This simplifies
the logic a bit. While I'm here add tests for missing cases.

llvm-svn: 287709
2016-11-22 22:11:25 +00:00
Matthias Braun 7f423442d1 TargetSubtargetInfo: Move implementation to lib/CodeGen; NFC
TargetSubtargetInfo is filled with CodeGen specific interfaces nowadays
(getInstrInfo(), getFrameLowering(), getSelectionDAGInfo()) most of the
tuning flags like enablePostRAScheduler(), getAntiDepBreakMode(),
enableRALocalReassignment(), ... also do not seem to be universal enough
to make sense outside of CodeGen.

Differential Revision: https://reviews.llvm.org/D26948

llvm-svn: 287708
2016-11-22 22:09:03 +00:00
Sanjay Patel e359eaaf70 [InstCombine] change bitwise logic type to eliminate bitcasts
In PR27925:
https://llvm.org/bugs/show_bug.cgi?id=27925

...we proposed adding this fold to eliminate a bitcast. In D20774, there was 
some concern about changing the type of a bitwise op as well as creating 
bitcasts that might not be free for a target. However, if we're strictly 
eliminating an instruction (by limiting this to one-use ops), then we should 
be able to do this in InstCombine.

But we're cautiously restricting the transform for now to vector types to
avoid possible backend problems. A transform to make sure the logic op is
legal for the target should be added to reverse this transform and improve
codegen.

Differential Revision: https://reviews.llvm.org/D26641

llvm-svn: 287707
2016-11-22 22:05:48 +00:00
Chandler Carruth 9eb857cb84 [LCG] Add a previously missing assert about the relationship of RefSCCs.
No intended change, everything seems to be in working order already.

llvm-svn: 287705
2016-11-22 21:40:10 +00:00
Vyacheslav Klochkov 9a630dfb57 Fixed the lost FastMathFlags in GVN(Global Value Numbering).
Reviewer: Hal Finkel.
Differential Revision: https://reviews.llvm.org/D26952

llvm-svn: 287700
2016-11-22 20:52:53 +00:00
Rui Ueyama 2b4ba04d57 Remove PDBFileBuilder::build() and related functions.
PDBFileBuilder supports two different ways to create files.
One is PDBFileBuilder::commit. That function takes a filename
and write a result to the file. The other is PDBFileBuilder::build.
That returns a new PDBFile object.

This patch removes the latter because no one is using it and
in a real life situation we are very unlikely to need it.
Even if you need it, it'd be easy to write a new PDB to a memory
buffer and read it back.

Removing PDBFileBuilder::build enables us to remove other classes
build transitively.

Differential Revision: https://reviews.llvm.org/D26987

llvm-svn: 287697
2016-11-22 20:32:22 +00:00
Vyacheslav Klochkov 68a677ae5b Fixed the lost FastMathFlags in Reassociate optimization.
Reviewer: Hal Finkel.
Differential Revision: https://reviews.llvm.org/D26957

llvm-svn: 287695
2016-11-22 20:23:04 +00:00
Paul Robinson f428c9b298 Restructure DwarfDebug::beginInstruction(). [NFC]
Will help a pending patch.

Differential Revision: http://reviews.llvm.org/D26982

llvm-svn: 287686
2016-11-22 19:46:51 +00:00
Shoaib Meenai 5497121613 [Triple] Add Facebook vendor
Add a compiler vendor for Facebook, to enable future vendor-specific
behavior.

Differential Revision: https://reviews.llvm.org/D25136

llvm-svn: 287684
2016-11-22 19:36:26 +00:00
Chandler Carruth bae595b742 [LCG] Add utilities to compute parent and ascestor relationships between
SCCs.

These will be fairly expensive routines to call and might be abused in
real code, but are quite useful when debugging or in asserts and are
reasonable and well formed properties to query.

I've used one of them in an assert that was requested in a code review
here. In subsequent commits I'll start using these routines more
heavily, for example in unittests etc. But this at least gets the
groundwork in place.

Differential Revision: https://reviews.llvm.org/D25506

llvm-svn: 287682
2016-11-22 19:23:31 +00:00
Simon Dardis 6efb8dd2e3 [mips] seb, seh instruction aliases
Add the single operand form.

Reviewers: vkalintiris

Differential Revision: https://reviews.llvm.org/D26961

llvm-svn: 287681
2016-11-22 19:17:23 +00:00
Nemanja Ivanovic b8e30d6db6 [PowerPC] Emit VMX loads/stores for aligned ops to avoid adding swaps on LE
This patch corresponds to review:
https://reviews.llvm.org/D26861

It also fixes PR30730.

Committing on behalf of Lei Huang.

llvm-svn: 287679
2016-11-22 19:02:07 +00:00
Simon Pilgrim 4aa876ca7c [X86][SSE] Combine UNPCKL(FHADD,FHADD) -> FHADD for v2f64 shuffles.
This occurs during UINT_TO_FP v2f64 lowering. 

We can easily generalize this to other horizontal ops (FHSUB, PACKSS, PACKUS) as required - we are doing something similar with PACKUS in lowerV2I64VectorShuffle

llvm-svn: 287676
2016-11-22 17:50:06 +00:00
Vasileios Kalintiris 04dc211e6a [mips] Add support for unaligned load/store macros.
Add missing unaligned store macros (ush/usw) and fix the exisiting
implementation of the unaligned load macros in order to generate
identical expansions with the GNU assembler.

llvm-svn: 287646
2016-11-22 16:43:49 +00:00
Tim Northover b64fb453ea CodeGen: simplify TargetMachine::getSymbol interface. NFC.
No-one actually had a mangler handy when calling this function, and
getSymbol itself went most of the way towards getting its own mangler
(with a local TLOF variable) so forcing all callers to supply one was
just extra complication.

llvm-svn: 287645
2016-11-22 16:17:20 +00:00
Zvi Rackover 9a355219d1 [X86] Change lowerBuildVectorToBitOp() to take a BuildVectorSDNode. NFC.
llvm-svn: 287644
2016-11-22 15:33:28 +00:00
Zvi Rackover 0aa1c32d14 [X86] Remove dead code from LowerVectorBroadcast
Summary: Splat vectors are canonicalized to BUILD_VECTOR's so the code can be simplified. NFC-ish.

Reviewers: craig.topper, delena, RKSimon, andreadb

Subscribers: RKSimon, llvm-commits

Differential Revision: https://reviews.llvm.org/D26678

llvm-svn: 287643
2016-11-22 15:17:52 +00:00
Chad Rosier ecc77273a0 [AArch64] Set the max interleave factor for Falkor.
llvm-svn: 287642
2016-11-22 14:25:02 +00:00
Chad Rosier 2abc29c593 [AArch64] Maximize 80-column. NFC.
llvm-svn: 287640
2016-11-22 14:12:09 +00:00
Simon Pilgrim 72e43570b7 [SelectionDAG] ComputeNumSignBits of TRUNCATE operations
Add basic ComputeNumSignBits support for TRUNCATE ops for cases where the source's number of sign bits overlaps with the truncated size.

Improves X86 SIGN_EXTEND_IN_REG vector cases which were needlessly sign extending boolean vector results.

Differential Revision: https://reviews.llvm.org/D26851

llvm-svn: 287635
2016-11-22 11:29:19 +00:00
Coby Tayree 49b3733d57 [AVX512][inline-asm] Fix AVX512 inline assembly instruction resolution when the size qualifier of a memory operand is not specified explicitly.
This commit handles cases where the size qualifier of an indirect memory reference operand in Intel syntax is missing (e.g. "vaddps xmm1, xmm2, [a]").

GCC will deduce the size qualifier for AVX512 vector and broadcast memory operands based on the possible matches:
"vaddps xmm1, xmm2, [a]" matches only “XMMWORD PTR” qualifier.
"vaddps xmm1, xmm2, [a]{1to4}" matches only “DWORD PTR” qualifier.

This is different from the current behavior of LLVM, which deduces the size qualifier based on the size of the memory operand.
For "vaddps xmm1, xmm2, [a]"
"char a;" will imply "BYTE PTR" qualifier
"short a;" will imply "WORD PTR" qualifier.

This commit aligns LLVM to GCC’s behavior.

This is the LLVM part of the review.
The Clang part of the review: https://reviews.llvm.org/D26587

Differential Revision: https://reviews.llvm.org/D26586

llvm-svn: 287630
2016-11-22 09:30:29 +00:00
Adam Nemet de33651bd9 Rename option to -lto-pass-remarks-output
The new option -pass-remarks-output broke LLVM_LINK_LLVM_DYLIB because
of the duplicate option name with opt.

llvm-svn: 287627
2016-11-22 07:35:14 +00:00
Craig Topper 3dcf45f08d [X86] Remove alternate CodeGenOnly version of (v)movq that declared the load size as i128mem. Change all uses to the use the i64mem version.
I'm sure this caused the load size to misprint in Intel syntax output. We were also inconsistent about which patterns used which instruction between VEX and EVEX.

There are two different reg/reg versions of movq, one from a GPR and one from the lower 64-bits of an XMM register. This changes the loading folding table to use the single i64mem memory form for folding both cases. But we need to use TB_NO_REVERSE to prevent a duplicate entry in the unfolding table.

llvm-svn: 287622
2016-11-22 05:31:43 +00:00
Craig Topper cada9f2275 [AVX-512] Add support for commuting VPERMT2(B/W/D/Q/PS/PD) to/from VPERMI2(B/W/D/Q/PS/PD).
Summary:
The index and one of the table operands can be swapped by changing the opcode to the other version. Neither of these operands are the one that can load from memory so this can't be used to increase memory folding opportunities.

We need to handle the unmasked forms and the kz forms. Since the load operand isn't being commuted we can commute the load and broadcast instructions too.

Reviewers: igorb, delena, Ayal, Farhana, RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D25652

llvm-svn: 287621
2016-11-22 04:57:34 +00:00
Saleem Abdulrasool 9b106ea072 MC: ensure that we have a section before accessing it
We would attempt to access the symbol section without ensuring that the symbol
was not absolute.  When the assembler referenced relocation is not evaluated to
the absolute, but when we record the relocation, we would query the section.
Because the symbol is absolute, it does not have a section associated with it,
triggering an assertion.  Just be more careful about the access of the section.

Addresses PR31064!

llvm-svn: 287619
2016-11-22 04:32:54 +00:00
Craig Topper da22267055 [AVX-512] Add support for changing the element size of PALIGNR/VALIGND/VALIGNQ shuffles if they feed a vselect with a different type
Summary:
Shuffle lowering widens the element size of a shuffle if elements are contiguous. This is sometimes help because wider element types have more shuffle options. If the shuffle is one of the arguments to a vselect this shuffle widening can introduce a bitcast between the vselect and the shuffle. This will prevent isel from selecting a masked operation. If the shuffle can be written equally efficiently with a different element size to match the vselect type we should change the shuffle type to allow masking.

This patch does this conversion for all VALIGND/VALIGNQ sizes. It also supports turning 128-bit PALIGNR into VALIGND/VALIGNQ. This fixes the case shown in PR31018.

I plan to add support for more operations in future patches.

Reviewers: RKSimon, zvi, delena

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D26902

llvm-svn: 287612
2016-11-22 03:51:53 +00:00
Peter Collingbourne 435890a4fe Object: Make SymbolicFile::symbol_{begin,end}() virtual and remove unnecessary wrappers.
llvm-svn: 287611
2016-11-22 03:38:40 +00:00
Stanislav Mekhanoshin ae0f6620e4 [AMDGPU] Fix multiple vreg definitions in si-lower-control-flow
Differential Revision: https://reviews.llvm.org/D26939

llvm-svn: 287608
2016-11-22 01:42:34 +00:00
Peter Collingbourne 0a4fc46321 Analysis: gep inbounds (gep inbounds (...)) is inbounds.
Differential Revision: https://reviews.llvm.org/D26441

llvm-svn: 287604
2016-11-22 01:03:40 +00:00
Matt Arsenault b30d2aca58 DAG: Ignore call site attributes when emitting target intrinsic
A target intrinsic may be defined as possibly reading memory,
but the call site may have additional knowledge that it doesn't read
memory. The intrinsic lowering will expect the pessimistic
assumption of the intrinsic definition, so the chain should
still be used.

llvm-svn: 287593
2016-11-21 22:56:42 +00:00
Geoff Berry e0bf52f394 [AArch64LoadStoreOptimizer] Don't treat write to XZR/WZR as a clobber.
Summary:
When searching for load/store instructions to pair/merge don't treat
writes to WZR/XZR as clobbers since they don't change the value read
from WZR/XZR (which is always 0).

Reviewers: mcrosier, junbuml, jmolloy, t.p.northover

Subscribers: aemerson, llvm-commits, rengolin

Differential Revision: https://reviews.llvm.org/D26921

llvm-svn: 287592
2016-11-21 22:51:10 +00:00
Justin Lebar 3e50a5be8f [CodeGenPrepare] Don't sink non-cheap addrspacecasts.
Summary:
Previously, CGP would unconditionally sink addrspacecast instructions,
even going so far as to sink them into a loop.

Now we check that the cast is "cheap", as defined by TLI.

We introduce a new "is-cheap" function to TLI rather than using
isNopAddrSpaceCast because some GPU platforms want the ability to ask
for non-nop casts to be sunk.

Reviewers: arsenm, tra

Subscribers: jholewinski, wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D26923

llvm-svn: 287591
2016-11-21 22:49:15 +00:00
Justin Lebar 838c7f5a85 [CodeGenPrepare] Rewrite a loop in terms of llvm::none_of. NFC.
Reviewers: arsenm

Subscribers: wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D26924

llvm-svn: 287590
2016-11-21 22:49:11 +00:00
Eli Friedman c0bba1a96d [LoopReroll] Make root-finding more aggressive.
Allow using an instruction other than a mul or phi as the base for
root-finding. For example, the included testcase includes a loop
which requires using a getelementptr as the base for root-finding.

Differential Revision: https://reviews.llvm.org/D26529

llvm-svn: 287588
2016-11-21 22:35:34 +00:00
Sanjay Patel 3b0bafee63 [InstCombine] canonicalize min/max constant to select's false value
This is a first step towards canonicalization and improved folding/codegen
for integer min/max as discussed here:
http://lists.llvm.org/pipermail/llvm-dev/2016-November/106868.html

Here, we're just matching the simplest min/max patterns and adjusting the
icmp predicate while swapping the select operands.

I've included FIXME tests in test/Transforms/InstCombine/select_meta.ll
so it's easier to see how this might be extended (corresponds to the TODO
comment in the code). That's also why I'm using matchSelectPattern()
rather than a simpler check; once the backend is patched, we can just 
remove some of the restrictions to allow the obfuscated min/max patterns
in the FIXME tests to be matched.

Differential Revision: https://reviews.llvm.org/D26525

llvm-svn: 287585
2016-11-21 22:04:14 +00:00
Evgeny Stupachenko 8efbe6acae LSR debug fix.
Summary:
Dump instruction instead of address.
Reviewers: hfinkel

Differential Revision: http://reviews.llvm.org/D26877

From: Evgeny Stupachenko <evstupac@gmail.com>
llvm-svn: 287584
2016-11-21 21:55:03 +00:00
Sanjay Patel c89911ba02 fix formatting; NFC
llvm-svn: 287582
2016-11-21 21:48:36 +00:00
Reid Kleckner 01660a3d2a [asan] Make ASan compatible with linker dead stripping on Windows
Summary:
This is similar to what was done for Darwin in rL264645 /
http://reviews.llvm.org/D16737, but it uses COFF COMDATs to achive the
same result instead of relying on new custom linker features.

As on MachO, this creates one metadata global per instrumented global.
The metadata global is placed in the custom .ASAN$GL section, which the
ASan runtime will iterate over during initialization. There are no other
references to the metadata, so normal linker dead stripping would
discard it. However, the metadata is put in a COMDAT group with the
instrumented global, so that it will be discarded if and only if the
instrumented global is discarded.

I didn't update the ASan ABI version check since this doesn't affect
non-Windows platforms, and the WinASan ABI isn't really stable yet.

Implementing this for ELF will require extending LLVM IR and MC a bit so
that we can use non-COMDAT section groups.

Reviewers: pcc, kcc, mehdi_amini, kubabrecka

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D26770

llvm-svn: 287576
2016-11-21 20:40:37 +00:00
Simon Dardis 43115a1ce4 [mips] seq macro support
This patch adds the seq macro.

This partially resolves PR/30381.

Thanks to Sean Bruno for reporting the issue!

Reviewers: zoran.jovanovic, vkalintiris, seanbruno

Differential Revision: https://reviews.llvm.org/D24607

llvm-svn: 287573
2016-11-21 20:30:41 +00:00
Krzysztof Parzyszek 73c8a9bc2f Check proper live range in extendPHIRanges
The function extendPHIRanges checks the main range of the original live
interval, even when dealing with a subrange. This could also lead to an
assert when the subrange is not live at the extension point, but the
main range is. To avoid this, check the corresponding subrange of the
original live range, instead of always checking the main range.

Review (as a part of a bigger set of changes):
https://reviews.llvm.org/D26359

llvm-svn: 287571
2016-11-21 20:24:12 +00:00
Marcin Koscielnicki 6af8e6c3d5 [TLI] Fix breakage introduced by D21739.
The initialize function has an early return for AMDGPU targets.  If taken,
the ShouldExtI32* initialization code will not be executed, resulting in
invalid values in the corresponding fields.  Fix this by moving the code
to the top of the function.

llvm-svn: 287570
2016-11-21 20:20:39 +00:00
Shoaib Meenai 106e05a0e8 [AsmPrinter] Enable codeview for windows-itanium
Enable codeview emission for windows-itanium targets. Co-opt an existing
test (which is derived from a C source file and should therefore be
identical across the Itanium and MS ABIs).

Differential Revision: https://reviews.llvm.org/D26693

llvm-svn: 287567
2016-11-21 20:13:32 +00:00
Mandeep Singh Grang 73f0095d71 [MemorySSA] Fix for non-determinism in codegen
This patch fixes the non-determinism caused due to iterating SmallPtrSet's
which was uncovered due to the experimental "reverse iteration order " patch:
https://reviews.llvm.org/D26718

The following unit tests failed because of the undefined order of iteration.
LLVM :: Transforms/Util/MemorySSA/cyclicphi.ll
LLVM :: Transforms/Util/MemorySSA/many-dom-backedge.ll
LLVM :: Transforms/Util/MemorySSA/many-doms.ll
LLVM :: Transforms/Util/MemorySSA/phi-translation.ll

Reviewers: dberlin, mgrang

Subscribers: dberlin, llvm-commits, david2050

Differential Revision: https://reviews.llvm.org/D26704

llvm-svn: 287563
2016-11-21 19:33:02 +00:00
Simon Pilgrim 5662074ba3 [VectorLegalizer] Remove EVT::getSizeInBits code duplications. NFCI.
We were calling SVT.getSizeInBits() several times in a row - just call it once and reuse the result.

llvm-svn: 287556
2016-11-21 18:24:44 +00:00
Jun Bum Lim 82f55c5446 [CodeGenPrep] Skip merging empty case blocks
Summary: Merging an empty case block into the header block of switch could cause
ISel to add COPY instructions in the header of switch, instead of the case
block, if the case block is used as an incoming block of a PHI. This could
potentially increase dynamic instructions, especially when the switch is in a
loop. I added a test case which was reduced from the benchmark I was targetting.

Reviewers: t.p.northover, mcrosier, manmanren, wmi, davidxl

Subscribers: qcolombet, danielcdh, hfinkel, mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D22696

llvm-svn: 287553
2016-11-21 16:47:28 +00:00
Coby Tayree 94ddbb4a04 small fixup which enables the issuing of the aforementioned instruction (w/o operands), on MS/Intel syntax.
Differential Revision: https://reviews.llvm.org/D26913

llvm-svn: 287548
2016-11-21 15:50:56 +00:00
Yaxun Liu 02f75f31e0 Fix known zero bits for addrspacecast.
Currently LLVM assumes that a pointer addrspacecasted to a different addr space is equivalent to trunc or zext bitwise, which is not true. For example, in amdgcn target, when a null pointer is addrspacecasted from addr space 4 to 0, its value is changed from i64 0 to i32 -1.

This patch teaches LLVM not to assume known bits of addrspacecast instruction to its operand.

Differential Revision: https://reviews.llvm.org/D26803

llvm-svn: 287545
2016-11-21 15:42:31 +00:00
Simon Pilgrim 49d7eda968 [SelectionDAG] Add ComputeNumSignBits support for CONCAT_VECTORS opcode
llvm-svn: 287541
2016-11-21 14:36:19 +00:00
Simon Pilgrim b7bbaa669b [X86][SSE] Allow PACKSS to be used to truncate any type of all/none sign bits input
At the moment we only use truncateVectorCompareWithPACKSS with direct vector comparison results (just one example of a known all/none signbits input).

This change relaxes the direct matching of a SETCC opcode by moving the logic up into SelectionDAG::ComputeNumSignBits and accepting any input with a known splatted signbit.

llvm-svn: 287535
2016-11-21 12:05:49 +00:00
Marcin Koscielnicki 1c2bd1e9f3 [InstrProfiling] Mark __llvm_profile_instrument_target last parameter as i32 zeroext if appropriate.
On some architectures (s390x, ppc64, sparc64, mips), C-level int is passed
as i32 signext instead of plain i32.  Likewise, unsigned int may be passed
as i32, i32 signext, or i32 zeroext depending on the platform.  Mark
__llvm_profile_instrument_target properly (its last parameter is unsigned
int).

This (together with the clang change) makes compiler-rt profile testsuite pass
on s390x.

Differential Revision: http://reviews.llvm.org/D21736

llvm-svn: 287534
2016-11-21 11:57:19 +00:00
Marcin Koscielnicki 5ae2c526db [TLI] Add functions determining if int parameters/returns should be zeroext/signext.
On some architectures (s390x, ppc64, sparc64, mips), C-level int is passed
as i32 signext instead of plain i32.  Likewise, unsigned int may be passed
as i32, i32 signext, or i32 zeroext depending on the platform.  Add this
information to TargetLibraryInfo, to be used whenever some LLVM pass
inserts a compiler-rt call to a function involving int parameters
or returns.

Differential Revision: http://reviews.llvm.org/D21739

llvm-svn: 287533
2016-11-21 11:57:11 +00:00
Michael Zuckerman 8462faeaba Fixing a small typo (A->U).
This seem to fixes PR30992.

-         HasAVX512 ? X86::VMOVAPSZ128rm_NOVLX 
+         HasAVX512 ? X86::VMOVUPSZ128rm_NOVLX 

llvm-svn: 287532
2016-11-21 11:52:11 +00:00
Craig Topper 9f2d632ee7 [AVX-512] Add EVEX form of VMOVZPQILo2PQIZrm to load folding tables to match SSE and AVX.
llvm-svn: 287523
2016-11-21 07:51:31 +00:00
Alexei Starovoitov 7ab125dbf3 [bpf] fix dwarf elf relocs and line numbers
- teach RelocVisitor to recognize bpf relocations
- fix AsmInfo->PointerSize to make sure dwarf is emitted correctly
- add a test for the above

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
llvm-svn: 287521
2016-11-21 06:21:23 +00:00
Craig Topper 0dfc09372f [X86] Remove duplicate instructions for (v)movq and replace with patterns on other instructions. NFC
llvm-svn: 287519
2016-11-21 04:07:56 +00:00
Dean Michael Berris 31761f300d [XRay][AArch64] Implemented a test for the compile-time sleds emitted, and fixed a bug in the jump instruction
This patch adds a test for the assembly code emitted with XRay
instrumentation. It also fixes a bug where the operand of a jump
instruction must be not the number of bytes to jump over, but rather the
number of 4-byte instructions.

Author: rSerge

Reviewers: dberris, rengolin

Differential Revision: https://reviews.llvm.org/D26805

llvm-svn: 287516
2016-11-21 03:01:43 +00:00
Davide Italiano 2ae76dd239 [GlobalSplit] Port to the new pass manager.
llvm-svn: 287511
2016-11-21 00:28:23 +00:00
Simon Dardis 1dcb911061 [mips] Restrict tail call optimization
The tail call optimization was being used without proper consideration of
ABI requirements for saving and restoring the GP. This patch restricts tail
call optimization to functions within the same translation unit.

Reviewers: vkalintiris

Differential Revision: https://reviews.llvm.org/D24763

llvm-svn: 287505
2016-11-20 21:23:08 +00:00
Coby Tayree 99a6639047 The 'vpmultishiftqb' instruction was implemented falsely, this patch amend it.
More specifically - (MS dialect) broadcasting variants were implemented falsely.

Differential Revision: https://reviews.llvm.org/D26257

llvm-svn: 287501
2016-11-20 17:19:55 +00:00
Coby Tayree 97e9cf62f4 Some instructions were missing, other implemented falsely. this patch aims at amending those issues. full list:
vcvtps2pd
vcvtudq2pd
vcvtps2qq
vcvttps2qq
vcvtps2uqq
vcvttps2uqq

variants are:

[Dst]XMM(zero-masked/merge-masked/unmasked)
[Src]Mem64

Differential Revision: https://reviews.llvm.org/D26799

llvm-svn: 287500
2016-11-20 17:09:56 +00:00
Simon Pilgrim 5fadce4a3f [X86][AVX512] Combine unary + zero target shuffles to VPERMV3 with a zero vector where possible
llvm-svn: 287497
2016-11-20 16:11:36 +00:00
Simon Pilgrim 5401bae523 [X86][AVX512] Add support for VBMI VPERMV3 target shuffle combines
llvm-svn: 287496
2016-11-20 15:24:38 +00:00
Simon Pilgrim 3f40412e0f [X86][AVX512] Add support for VBMI VPERMV target shuffle combines
llvm-svn: 287495
2016-11-20 15:05:45 +00:00
Simon Pilgrim c17e1b74b8 [X86][AVX512VL] Removed duplicate operation action
Basic AVX512F already declared uint_to_fp v4i32 as legal

llvm-svn: 287493
2016-11-20 14:19:29 +00:00
Simon Pilgrim 3f10e9953d Strip trailing whitespace
llvm-svn: 287492
2016-11-20 14:05:23 +00:00
Simon Pilgrim 096b6d4f81 [X86][AVX512F] Add support for uint_to_fp v2i32 to v2f64 on AVX512F-only targets
Use 512-bit instructions (we already do something similar for uint_to_fp v4i32 to v4f64)

llvm-svn: 287491
2016-11-20 14:03:23 +00:00
Simon Pilgrim f2fbf43704 Fix comment typos. NFC.
Identified by Pedro Giffuni in PR27636.

llvm-svn: 287490
2016-11-20 13:47:59 +00:00
Simon Pilgrim 7d18a70dac Fix spelling mistakes in Transforms comments. NFC.
Identified by Pedro Giffuni in PR27636.

llvm-svn: 287488
2016-11-20 13:19:49 +00:00
Simon Pilgrim 7a6b6d5656 Fix spelling mistakes in SelectionDAG comments. NFC.
Identified by Pedro Giffuni in PR27636.

llvm-svn: 287487
2016-11-20 13:14:57 +00:00
Simon Pilgrim fbd2221de5 Fix comment typos. NFC.
Identified by Pedro Giffuni in PR27636.

llvm-svn: 287486
2016-11-20 13:10:51 +00:00
Oren Ben Simhon c0f073b67f [X86] RegCall - Handling long double arguments
The change is part of RegCall calling convention support for LLVM.
Long double (f80) requires special treatment as the first f80 parameter is saved in FP0 (floating point stack).
This review present the change and the corresponding tests.

Differential Revision: https://reviews.llvm.org/D26151

llvm-svn: 287485
2016-11-20 11:06:07 +00:00
Coby Tayree 179ff0e541 [X86][InlineAsm]Test commit.
Fixing a wrong comment on X86AsmParser.cpp::ParseZ: "true" --> "false"

Differential Revision: https://reviews.llvm.org/D26797

llvm-svn: 287484
2016-11-20 09:31:11 +00:00
Serge Pavlov f258ff1fa9 Fix file name resolution in nested response files
If a response file in construct `@file` was specified by relative name,
constructs `@file` nested within it were resolved incorrectly if the
flag RelativeNames in call to ExpandResponseFile was set to true.
This feature is used in configuration files, tests for it are in
respective change (see D24933).

llvm-svn: 287482
2016-11-20 06:25:07 +00:00
Alexei Starovoitov e6ddac0def [bpf] add BPF disassembler
add BPF disassembler, so tools like llvm-objdump can be used:
$ llvm-objdump -d -no-show-raw-insn ./sockex1_kern.o

./sockex1_kern.o:	file format ELF64-BPF

Disassembly of section socket1:
bpf_prog1:
       0:	r6 = r1
       8:	r0 = *(u8 *)skb[23]
      10:	*(u32 *)(r10 - 4) = r0
      18:	r1 = *(u32 *)(r6 + 4)
      20:	if r1 != 4 goto 8
      28:	r2 = r10
      30:	r2 += -4

ld_imm64 (the only 16-byte insn) and special ld_abs/ld_ind instructions
had to be treated in a special way. The decoders for the rest of the insns
are automatically generated.

Add tests to cover new functionality.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
llvm-svn: 287477
2016-11-20 02:25:00 +00:00
Rui Ueyama e5669cecde Attempt to fix big-endian buildbots.
llvm-svn: 287476
2016-11-20 01:41:28 +00:00
Rui Ueyama 567d9c4b8f Style fix. NFC.
llvm-svn: 287475
2016-11-20 01:15:56 +00:00
Rui Ueyama 218072a989 Fix buildbot.
llvm-svn: 287474
2016-11-20 01:13:22 +00:00
Rui Ueyama fe33661ab0 SHA1: unroll loop in hashBlock.
This code is taken from public domain.
https://github.com/jsonn/src/blob/trunk/common/lib/libc/hash/sha1/sha1.c

I wrote a sha1 command and ran it on my Xeon E5-2680 v2 2.80GHz machine.
Here is a result. The new hash function is 37% faster than before.

 Performance counter stats for './llvm-sha1-old /ssd/build/bin/lld' (10 runs):

       6640.503687 task-clock (msec)         #    1.001 CPUs utilized            ( +-  0.03% )
                54 context-switches          #    0.008 K/sec                    ( +-  5.03% )
                 5 cpu-migrations            #    0.001 K/sec                    ( +- 31.73% )
           183,803 page-faults               #    0.028 M/sec                    ( +-  0.00% )
    18,527,954,113 cycles                    #    2.790 GHz                      ( +-  0.03% )
     4,993,237,485 stalled-cycles-frontend   #   26.95% frontend cycles idle     ( +-  0.11% )
   <not supported> stalled-cycles-backend
    50,217,149,423 instructions              #    2.71  insns per cycle
                                             #    0.10  stalled cycles per insn  ( +-  0.00% )
     6,094,322,337 branches                  #  917.750 M/sec                    ( +-  0.00% )
        11,778,239 branch-misses             #    0.19% of all branches          ( +-  0.01% )

       6.634017401 seconds time elapsed                                          ( +-  0.03% )

 Performance counter stats for './llvm-sha1-new /ssd/build/bin/lld' (10 runs):

       4167.062720 task-clock (msec)         #    1.001 CPUs utilized            ( +-  0.02% )
                52 context-switches          #    0.012 K/sec                    ( +- 16.45% )
                 7 cpu-migrations            #    0.002 K/sec                    ( +- 32.20% )
           183,804 page-faults               #    0.044 M/sec                    ( +-  0.00% )
    11,626,611,958 cycles                    #    2.790 GHz                      ( +-  0.02% )
     4,491,897,976 stalled-cycles-frontend   #   38.63% frontend cycles idle     ( +-  0.05% )
   <not supported> stalled-cycles-backend
    24,320,180,617 instructions              #    2.09  insns per cycle
                                             #    0.18  stalled cycles per insn  ( +-  0.00% )
     1,574,674,576 branches                  #  377.886 M/sec                    ( +-  0.00% )
        11,769,693 branch-misses             #    0.75% of all branches          ( +-  0.00% )

       4.163251552 seconds time elapsed                                          ( +-  0.02% )

Differential Revision: https://reviews.llvm.org/D26890

llvm-svn: 287473
2016-11-20 01:03:22 +00:00
Saleem Abdulrasool a577509f0a Demangle: remove references to allocator for default allocator
The demangler had stopped using a custom allocator but had not been updated to
remove the use of the explicit allocator passing.  This removes that as we do
not need to do anything special here anymore.  This just makes the code more
compact.  NFC.

llvm-svn: 287472
2016-11-20 00:20:27 +00:00
Saleem Abdulrasool 54ec3f9cf8 Demangle: remove unnecessary typedef for std::vector
We could create a local typedef for std::vector called Vector.  Inline the use
of std::vector rather than use the typedef.  NFC.

llvm-svn: 287471
2016-11-20 00:20:25 +00:00
Saleem Abdulrasool be1fd54f85 Demangle: replace custom typedef for std::string with std::string
We created a local typedef for `std::basic_string<char, std::char_traits<char>>`
which is just `std::string`.  Remove the local typedef and propagate the type
information through the rest of the demangler.  NFC.

llvm-svn: 287470
2016-11-20 00:20:23 +00:00
Saleem Abdulrasool 0da9050976 Demangle: use direct member initialization (NFC)
Prefer direct member initialization over the explicit out-of-line initialization
for the construction of the local type.  NFC.

llvm-svn: 287469
2016-11-20 00:20:20 +00:00
Benjamin Kramer ffd3715d16 Give some helper classes/functions internal linkage. NFC.
llvm-svn: 287462
2016-11-19 20:44:26 +00:00
Simon Pilgrim a14e0cb852 [X86][SSE] Improve PSHUFB lowering from either input
Canonicalization may leave the zeroable vector in the first input.

llvm-svn: 287461
2016-11-19 20:41:48 +00:00
Simon Pilgrim 623a7c57b5 [X86][AVX512] Add VPERMV/VPERMV3 v64i8 byte shuffles on avx512vbmi targets
llvm-svn: 287459
2016-11-19 20:12:34 +00:00
Mehdi Amini fec2158292 [ThinLTO] Fix crash when importing an opaque type
It seems that because ThinLTO does not import the full module,
some invariant of the type mapper are broken.

In Monolithic LTO, we import every globals: when calling
IRLinker::copyFunctionProto() on @foo(), we end-up calling
TypeMapTy::get(FTy) on the type of @foo(), which will map
%0 and record the destination as opaque.

ThinLTO skips this because @foo is not imported and goes directly
to the next stage.

Next we call computeTypeMapping() that map the types for each
globals, and ends up checking for type isomorphism, and may add
type mapping. However it doesn't record if there was an opaque
destination type that was resolved.

Instead of lazily "discovering" opaque type in the destination
module on the go, we change the TypeFinder to eagerly record all
types and not only the named ones.

Differential Revision: https://reviews.llvm.org/D26840

llvm-svn: 287453
2016-11-19 18:44:16 +00:00
Mehdi Amini 19f176b982 [ThinLTO] Implement -pass-remarks-output in ThinLTOCodeGenerator
Summary:
This will also be added to the LTO API, right now this will
bring ThinLTO on par with Monolithic LTO on Darwin.

Reviewers: anemet

Subscribers: tejohnson, llvm-commits

Differential Revision: https://reviews.llvm.org/D26886

llvm-svn: 287450
2016-11-19 18:20:05 +00:00
Mehdi Amini 6f40836823 Change setDiagnosticsOutputFile to take a unique_ptr from a raw pointer (NFC)
Summary:
This makes it explicit that ownership is taken. Also replace all `new`
with make_unique<> at call sites.

Reviewers: anemet

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D26884

llvm-svn: 287449
2016-11-19 18:19:41 +00:00
Craig Topper 893ea9fb2c [X86] Simplify some code a little by removing a dulicate variable and combinining two if statements. NFCI
llvm-svn: 287443
2016-11-19 17:33:17 +00:00
Daniel Sanders c95590bc45 Try again to fix unused variable warning on lld-x86_64-darwin13 after r287439.
The previous attempt didn't work. I assume LLVM_ATTRIBUTE_UNUSED isn't
available on that machine.

llvm-svn: 287442
2016-11-19 14:47:41 +00:00
Daniel Sanders c6d1986a84 Try to fix unused variable warning on lld-x86_64-darwin13 after r287439.
Whether the variable is used or not depends on NDEBUG.

llvm-svn: 287440
2016-11-19 13:50:32 +00:00
Daniel Sanders 72db2a390a Check that emitted instructions meet their predicates on all targets except ARM, Mips, and X86.
Summary:
* ARM is omitted from this patch because this check appears to expose bugs in this target.
* Mips is omitted from this patch because this check either detects bugs or deliberate
  emission of instructions that don't satisfy their predicates. One deliberate
  use is the SYNC instruction where the version with an operand is correctly
  defined as requiring MIPS32 while the version without an operand is defined
  as an alias of 'SYNC 0' and requires MIPS2.
* X86 is omitted from this patch because it doesn't use the tablegen-erated
  MCCodeEmitter infrastructure.

Patches for ARM and Mips will follow.

Depends on D25617

Reviewers: tstellarAMD, jmolloy

Subscribers: wdng, jmolloy, aemerson, rengolin, arsenm, jyknight, nemanjai, nhaehnle, tstellarAMD, llvm-commits

Differential Revision: https://reviews.llvm.org/D25618

llvm-svn: 287439
2016-11-19 13:05:44 +00:00
Dylan McKay 1a55f201ef [AVR] Remove a bunch of unused variables
llvm-svn: 287416
2016-11-19 01:33:42 +00:00
Dylan McKay 19270f3438 [AVR] Remove a variable which was unused in release mode
In release mode where assertions are not enabled, this caused an 'unused
variable' warning.

llvm-svn: 287414
2016-11-19 01:14:44 +00:00
Konstantin Zhuravlyov aefee42e0f [AMDGPU] Change frexp.exp intrinsic to return i16 for f16 input
Differential Revision: https://reviews.llvm.org/D26862

llvm-svn: 287389
2016-11-18 22:31:08 +00:00
Simon Pilgrim e40900dddd [SelectionDAG] Add knowbits support for CONCAT_VECTOR opcode
llvm-svn: 287387
2016-11-18 22:21:22 +00:00
Michael Zolotukhin 5020c9971b [LoopSimplify] Preserve LCSSA when removing edges from unreachable blocks.
This fixes PR30454.

llvm-svn: 287379
2016-11-18 21:01:12 +00:00
Mehdi Amini bf4d8d033b Revert "Add link-time detection of LLVM_ABI_BREAKING_CHECKS mismatch"
This reverts commit r287352, LLDB CI is broken.

llvm-svn: 287374
2016-11-18 20:02:34 +00:00
Matthias Braun db39fd6c53 Statistic/Timer: Include timers in PrintStatisticsJSON().
Differential Revision: https://reviews.llvm.org/D25588

llvm-svn: 287370
2016-11-18 19:43:24 +00:00
Matthias Braun 9f15a79e5d Timer: Track name and description.
The previously used "names" are rather descriptions (they use multiple
words and contain spaces), use short programming language identifier
like strings for the "names" which should be used when exporting to
machine parseable formats.

Also removed a unused TimerGroup from Hexxagon.

Differential Revision: https://reviews.llvm.org/D25583

llvm-svn: 287369
2016-11-18 19:43:18 +00:00
Geoff Berry b51774ac8c [MIRPrinter] Print raw branch probabilities as expected by MIRParser
Fixes PR28751.

Reviewers: MatzeB, qcolombet

Subscribers: mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D26775

llvm-svn: 287368
2016-11-18 19:37:24 +00:00
Matt Arsenault afe614cb38 AMDGPU: Fix unused variable warning
llvm-svn: 287362
2016-11-18 18:33:36 +00:00
Adam Nemet e9bd022c41 [LTO] Add option to generate optimization records
It is used to drive this from the clang driver via -mllvm.

Same option name is used as in opt.

Differential Revision: https://reviews.llvm.org/D26832

llvm-svn: 287356
2016-11-18 18:06:28 +00:00
Hans Wennborg aeacdc258b IRMover: Avoid accidentally mapping types from the destination module (PR30799)
During Module linking, it's possible for SrcM->getIdentifiedStructTypes();
to return types that are actually defined in the destination module
(DstM). Depending on how the bitcode file was read,
getIdentifiedStructTypes() might do a walk over all values, including
metadata nodes, looking for types. In my case, a debug info metadata
node was shared between the two modules, and it referred to a type
defined in the destination module (see test case).

Differential Revision: https://reviews.llvm.org/D26212

llvm-svn: 287353
2016-11-18 17:33:05 +00:00
Mehdi Amini c311528516 Add link-time detection of LLVM_ABI_BREAKING_CHECKS mismatch
Summary:
LLVM will define a symbol, either EnableABIBreakingChecks or
DisableABIBreakingChecks depending on the configuration setting for
LLVM_ABI_BREAKING_CHECKS.

The llvm-config.h header will add weak references to these symbols in
every clients that includes this header. This should ensure that
a mismatch triggers a link failure (or a load time failure for DSO).

On MSVC, the pragma "detect_mismatch" is used instead.

Reviewers: rnk, jroelofs

Subscribers: llvm-commits, mgorny

Differential Revision: https://reviews.llvm.org/D26841

llvm-svn: 287352
2016-11-18 17:28:10 +00:00
Ehsan Amiri 395be572f0 [PPC] limit line width to 80 characters
NFC. Forgot to fix this in the original commit.

llvm-svn: 287350
2016-11-18 16:24:27 +00:00
Simon Dardis 0e2ee3b4b9 [mips][msa] Implement f16 support
The MIPS MSA ASE provides instructions to convert to and from half precision
floating point. This patch teaches the MIPS backend to treat f16 as a legal
type and how to promote such values to f32 for the usual set of operations.

As a result of this, the fexup[lr].w intrinsics no longer crash LLVM during
type legalization.

Reviewers: zoran.jovanvoic, vkalintiris

Differential Revision: https://reviews.llvm.org/D26398

llvm-svn: 287349
2016-11-18 16:17:44 +00:00
Tom Stellard df613198c0 GlobalISel: Fix unconditional fallback with global isel abort is disabled
Reviewers: t.p.northover, ab, qcolombet

Subscribers: mehdi_amini, vkalintiris, wdng, dberris, llvm-commits, rovka

Differential Revision: https://reviews.llvm.org/D26765

llvm-svn: 287344
2016-11-18 14:14:35 +00:00
Tom Stellard 01e65d2cfc AMDGPU/SI: Remove zero_extend patterns for i16 ops selected to 32-bit insts
Summary:
The 32-bit instructions don't zero the high 16-bits like the 16-bit
instructions do.

Reviewers: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye

Differential Revision: https://reviews.llvm.org/D26828

llvm-svn: 287342
2016-11-18 13:53:34 +00:00
Florian Hahn 77382be56b [simplifycfg][loop-simplify] Preserve loop metadata in 2 transformations.
insertUniqueBackedgeBlock in lib/Transforms/Utils/LoopSimplify.cpp now
propagates existing llvm.loop metadata to newly the added backedge.

llvm::TryToSimplifyUncondBranchFromEmptyBlock in lib/Transforms/Utils/Local.cpp
now propagates existing llvm.loop metadata to the branch instructions in the
predecessor blocks of the empty block that is removed.

Differential Revision: https://reviews.llvm.org/D26495

llvm-svn: 287341
2016-11-18 13:12:07 +00:00
Simon Pilgrim 7938bd666e Cleanup function with clang-format. NFCI.
llvm-svn: 287340
2016-11-18 12:16:18 +00:00
Nicolai Haehnle ce2b589df5 AMDGPU: Fix legalization of MUBUF instructions in shaders
Summary:
The addr64-based legalization is incorrect for MUBUF instructions with idxen
set as well as for BUFFER_LOAD/STORE_FORMAT_* instructions.  This affects
e.g.  shaders that access buffer textures.

Since we never actually need the addr64-legalization in shaders, this patch
takes the easy route and keys off the calling convention.  If this ever
affects (non-OpenGL) compute, the type of legalization needs to be chosen
based on some TSFlag.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98664

Reviewers: arsenm, tstellarAMD

Subscribers: kzhuravl, wdng, yaxunl, tony-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D26747

llvm-svn: 287339
2016-11-18 11:55:52 +00:00
Simon Pilgrim dcd8433597 Fix spelling mistakes in MIPS target comments. NFC.
Identified by Pedro Giffuni in PR27636.

llvm-svn: 287338
2016-11-18 11:53:36 +00:00
Ehsan Amiri ff0942e6ea [Power9] Add patterns for vnegd, vnegw
Exploit new instructions by adding patterns to .td file.
https://reviews.llvm.org/D26551

llvm-svn: 287334
2016-11-18 11:05:55 +00:00
Simon Pilgrim e995a8088d Fix spelling mistakes in AMDGPU target comments. NFC.
Identified by Pedro Giffuni in PR27636.

llvm-svn: 287333
2016-11-18 11:04:02 +00:00
Simon Pilgrim fd8bf984f4 Fix typo in comment. NFC.
Identified by Pedro Giffuni in PR27636.

llvm-svn: 287331
2016-11-18 10:52:12 +00:00
Ehsan Amiri 85818684c6 [PPC][DAGCombine] Convert SETCC to subtract when the result is zero extended
When we see a SETCC whose only users are zero extend operations, we can replace
it with a subtraction. This results in doing all calculations in GPRs and
avoids CR use.

Currently we do this only for ULT, ULE, UGT and UGE condition codes. There are
ways that this can be extended. For example for signed condition codes. In that
case we will be introducing additional sign extend instructions, so more careful
profitability analysis may be required.

Another direction to extend this is for equal, not equal conditions. Also when
users of SETCC are any_ext or sign_ext, we might be able to do something 
similar.

llvm-svn: 287329
2016-11-18 10:41:44 +00:00
Craig Topper 1de753f7f5 [InstCombine][AVX-512] Teach InstCombineCalls how to handle the intrinsics for variable shift with 16-bit elements.
This is a straightforward extension of the existing support for 32/64-bit element types. Just needed to add the additional instrinsics to the switches.

llvm-svn: 287316
2016-11-18 06:04:33 +00:00
Craig Topper 02b5a1b50f [AVX-512] Replace masked 16-bit element variable shift intrinsics with new unmasked versions and selects.
The same thing was done to 32-bit and 64-bit element sizes previously.

This will allow us to support these shuffls in InstCombineCalls along with the other variable shift intrinsics.

llvm-svn: 287312
2016-11-18 05:04:44 +00:00
Matt Arsenault eff1ad8d8e AMDGPU: Move redundant setting of inst properties
llvm-svn: 287311
2016-11-18 04:42:59 +00:00
Matt Arsenault 742deb2495 AMDGPU: Fix crash on illegal type for inlineasm
There are still crashes on non-MVT types in other
places.

llvm-svn: 287310
2016-11-18 04:42:57 +00:00
Peter Collingbourne 63e10c9c96 Object: Simplify; remove unnecessary use of unique_ptr.
llvm-svn: 287305
2016-11-18 03:20:36 +00:00
Matthias Braun 637488dbf8 MachineOperand: Add dump() method
llvm-svn: 287302
2016-11-18 02:40:40 +00:00
Alexei Starovoitov 8f9f8210c1 convert bpf assembler to look like kernel verifier output
since bpf instruction set was introduced people learned to
read and understand kernel verifier output whereas llvm asm
output stayed obscure and unknown. Convert llvm to emit
assembler text similar to kernel to avoid this discrepancy

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
llvm-svn: 287300
2016-11-18 02:32:35 +00:00
Craig Topper 07f1c15995 [AVX-512] Support FCOPYSIGN for v16f32 and v8f64
Summary:
This extends FCOPYSIGN support to 512-bit vectors.

I've also added tests to show what the 128-bit and 256-bit cases look like with broadcast loads.

Reviewers: delena, zvi, RKSimon, spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D26791

llvm-svn: 287298
2016-11-18 02:25:34 +00:00
Simon Pilgrim 6ba672e542 Fix spelling mistakes in Hexagon target comments. NFC.
Identified by Pedro Giffuni in PR27636.

llvm-svn: 287248
2016-11-17 19:21:20 +00:00
Simon Pilgrim 9d15fb3c10 Fix spelling mistakes in X86 target comments. NFC.
Identified by Pedro Giffuni in PR27636.

llvm-svn: 287247
2016-11-17 19:03:05 +00:00
Anna Zaks 9cd5ed1241 [asan] Turn on Mach-O global metadata liveness tracking by default
This patch turns on the metadata liveness tracking since all known issues
have been resolved. The future has been implemented in
https://reviews.llvm.org/D16737 and enables support of dead code stripping
option on Mach-O platforms.

As part of enabling the feature, I also plan on reverting the following
patch to compiler-rt:

http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160704/369910.html

Differential Revision: https://reviews.llvm.org/D26772

llvm-svn: 287235
2016-11-17 16:55:40 +00:00
Konstantin Zhuravlyov 0a1a7b6b23 Revert "AMDGPU: Enable ConstrainCopy DAG mutation"
This reverts commit r287146.

This breaks few conformance tests.

llvm-svn: 287233
2016-11-17 16:41:49 +00:00
Daniil Fukalov 4c3322cc84 [SCEV] limit recursion depth of CompareSCEVComplexity
Summary:
CompareSCEVComplexity goes too deep (50+ on a quite a big unrolled loop) and runs almost infinite time.

Added cache of "equal" SCEV pairs to earlier cutoff of further estimation. Recursion depth limit was also introduced as a parameter.

Reviewers: sanjoy

Subscribers: mzolotukhin, tstellarAMD, llvm-commits

Differential Revision: https://reviews.llvm.org/D26389

llvm-svn: 287232
2016-11-17 16:07:52 +00:00
Simon Pilgrim 67ef3b984a Wdocumentation fix
llvm-svn: 287224
2016-11-17 12:21:45 +00:00
Simon Pilgrim 8eca5520dc [X86][SSE] Improve lowering of vXi64 multiply with known zero 32-bit halves
vXi64 multiplication is lowered into 3 calls of vpmuludq with the upper/lower 32-bit halves.

If any of these halves are zero then we can remove individual calls. Although there was isBuildVectorAllZeros code to do this I don't think it ever worked (maybe just for constant folded cases that don't seem to be tested for any longer).

This requires additional X86ISD support for computeKnownBitsForTargetNode, so far I've just added support for X86ISD::VZEXT (VPMOVZX* - helping the AVX2+ cases).

Partial fix for PR30845

Differential Revision: https://reviews.llvm.org/D26590

llvm-svn: 287223
2016-11-17 12:14:49 +00:00
Simon Pilgrim c4d733cd6a Fix spelling in comment. NFC.
llvm-svn: 287222
2016-11-17 12:03:05 +00:00
Pablo Barrio c41e856f53 [ARM] Relax restriction on variadic functions for tailcall optimization
Summary:
Variadic functions can be treated in the same way as normal functions
with respect to the number and types of parameters.

Reviewers: grosbach, olista01, t.p.northover, rengolin

Subscribers: javed.absar, aemerson, llvm-commits

Differential Revision: https://reviews.llvm.org/D26748

llvm-svn: 287219
2016-11-17 10:56:58 +00:00
Oren Ben Simhon 489d6eff4f [X86] RegCall - Handling v64i1 in 32/64 bit target
Register Calling Convention defines a new behavior for v64i1 types.
This type should be saved in GPR.
However for 32 bit machine we need to split the value into 2 GPRs (because each is 32 bit).

Differential Revision: https://reviews.llvm.org/D26181

llvm-svn: 287217
2016-11-17 09:59:40 +00:00
Sanjoy Das 43ccb38bb5 Delete dead code and add asserts instead; NFC
llvm-svn: 287214
2016-11-17 07:29:43 +00:00
Sanjoy Das 4a8fe09040 [ImplicitNullCheck] Fix an edge case where we were hoisting incorrectly
ImplicitNullCheck keeps track of one instruction that the memory
operation depends on that it also hoists with the memory operation.
When hoisting this dependency, it would sometimes clobber a live-in
value to the basic block we were hoisting the two things out of.  Fix
this by explicitly looking for such dependencies.

I also noticed two redundant checks on `MO.isDef()` in IsMIOperandSafe.
They're redundant since register MachineOperands are either Defs or Uses
-- there is no third kind.  I'll change the checks to asserts in a later
commit.

llvm-svn: 287213
2016-11-17 07:29:40 +00:00
Craig Topper 05b0fcd168 [X86] Fix formatting. NFC
llvm-svn: 287211
2016-11-17 05:59:55 +00:00
Dean Michael Berris 3234d3a4bd [XRay] Support AArch64 in LLVM
This patch adds XRay support in LLVM for AArch64 targets.
This patch is one of a series:

Clang: https://reviews.llvm.org/D26415
compiler-rt: https://reviews.llvm.org/D26413

Author: rSerge

Reviewers: rengolin, dberris

Subscribers: amehsan, aemerson, llvm-commits, iid_iunknown

Differential Revision: https://reviews.llvm.org/D26412

llvm-svn: 287209
2016-11-17 05:15:37 +00:00
Chris Bieneman 05c279fc4b [CMake] NFC. Updating CMake dependency specifications
This patch updates a bunch of places where add_dependencies was being explicitly called to add dependencies on intrinsics_gen to instead use the DEPENDS named parameter. This cleanup is needed for a patch I'm working on to add a dependency debugging mode to the build system.

llvm-svn: 287206
2016-11-17 04:36:50 +00:00
Konstantin Zhuravlyov d709efb0da [AMDGPU] Custom lower f16 = fp_round f64
llvm-svn: 287203
2016-11-17 04:28:37 +00:00
Konstantin Zhuravlyov 3f0cdc7a11 [AMDGPU] Promote f16/i16 conversions to f32/i32
llvm-svn: 287201
2016-11-17 04:00:46 +00:00
Konstantin Zhuravlyov 662e01dfbe [AMDGPU] Expand `br_cc` for f16
Differential Revision: https://reviews.llvm.org/D26732

llvm-svn: 287199
2016-11-17 03:49:01 +00:00
Dehao Chen 41d72a8632 Use profile info to adjust loop unroll threshold.
Summary:
For flat loop, even if it is hot, it is not a good idea to unroll in runtime, thus we set a lower partial unroll threshold.
For hot loop, we set a higher unroll threshold and allows expensive tripcount computation to allow more aggressive unrolling.

Reviewers: davidxl, mzolotukhin

Subscribers: sanjoy, mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D26527

llvm-svn: 287186
2016-11-17 01:17:02 +00:00
Peter Collingbourne f72a8d4e08 Introduce GlobalSplit pass.
This pass splits globals into elements using inrange annotations on
getelementptr indices.

Differential Revision: https://reviews.llvm.org/D22295

llvm-svn: 287178
2016-11-16 23:40:26 +00:00
Dylan McKay 017a55b092 [AVR] Wrap all methods in the pseudo expansion pass in an anon namespace
The '-fpermissive' compiler flag complains if the template
specializations used in the class are used in a different namespace.

llvm-svn: 287176
2016-11-16 23:06:14 +00:00
Dylan McKay 5810c7ee6e [AVR] Remove unused method from AVRTargetMachine
llvm-svn: 287173
2016-11-16 22:48:30 +00:00
Sanjay Patel 066139a3ec [x86] allow FP-logic ops when one operand is FP and result is FP
We save an inter-register file move this way. If there's any CPU where
the FP logic is slower, we could transform this back to int-logic in 
MachineCombiner.

This helps, but doesn't solve, PR6137:
https://llvm.org/bugs/show_bug.cgi?id=6137

The 'andn' test shows that we're missing a pattern match to
recognize the xor with -1 constant as a 'not' op.

llvm-svn: 287171
2016-11-16 22:34:05 +00:00
Ahmed Bougacha f33f91af24 [AsmParser] Avoid recursing when lexing ';'. NFC.
This should prevent stack overflows in non-optimized builds on
.ll files with lots of consecutive commented-out lines.

Instead of recursing into LexToken(), continue into a 'while (true)'.

llvm-svn: 287170
2016-11-16 22:25:05 +00:00
Ahmed Bougacha bd6ce9a247 [CodeGen] Pass references, not pointers, to MMI helpers. NFC.
While there, rename them to follow the coding style.

llvm-svn: 287169
2016-11-16 22:25:03 +00:00
Ahmed Bougacha 996961a461 Revert "Get GlobalISel to build on Linux after r286407"
This reverts commit r286962.

We want to avoid depending on SelectionDAG, and AddLandingPadInfo
lives in CodeGen now.

llvm-svn: 287168
2016-11-16 22:24:59 +00:00
Ahmed Bougacha 456dce8a84 [CodeGen] Pull MMI helpers from FunctionLoweringInfo to MMI. NFC.
They're not SelectionDAG- or FunctionLoweringInfo-specific.  They
are, however, specific to building MMI from IR.
We could make them members, but it's nice having MMI be a "simple" data
structure and this logic kept separate.

This also lets us reuse them from GlobalISel.

llvm-svn: 287167
2016-11-16 22:24:56 +00:00
Ahmed Bougacha 2b4c127531 [CodeGen] Cleanup MachineModuleInfo doxygen comments. NFC.
Remove redundant names and only keep header comments.

llvm-svn: 287166
2016-11-16 22:24:53 +00:00
Dylan McKay a789f40002 [AVR] Add the pseudo instruction expansion pass
Summary:
A lot of the pseudo instructions are required because LLVM assumes that
all integers of the same size as the pointer size are legal. This means
that it will not currently expand 16-bit instructions to their 8-bit
variants because it thinks 16-bit types are legal for the operations.

This also adds all of the CodeGen tests that required the pass to run.

Reviewers: arsenm, kparzysz

Subscribers: wdng, mgorny, modocache, llvm-commits

Differential Revision: https://reviews.llvm.org/D26577

llvm-svn: 287162
2016-11-16 21:58:04 +00:00
Peter Collingbourne 7d0c869b86 X86: Simplify X86ISD::Wrapper operand checks. NFCI.
We only ever create TargetConstantPool, TargetJumpTable, TargetExternalSymbol,
TargetGlobalAddress, TargetGlobalTLSAddress, MCSymbol and TargetBlockAddress
nodes as operands of X86ISD::Wrapper nodes, so we can remove one check and
invert the other.

Also update the documentation comment for X86ISD::Wrapper.

Differential Revision: https://reviews.llvm.org/D26731

llvm-svn: 287160
2016-11-16 21:48:59 +00:00
Sanjoy Das df4b162e4d [ImplicitNullChecks] Do not not handle call MachineInstrs
We don't track callee clobbered registers correctly, so avoid hoisting
across calls.

Note: for this bug to trigger we need a `readonly` call target, since we
already have logic to not hoist across potentially storing instructions
either.

llvm-svn: 287159
2016-11-16 21:45:22 +00:00
Peter Collingbourne 7a74803abf Bitcode: Introduce initial multi-module reader API.
Implement getLazyBitcodeModule() and parseBitcodeFile() in terms of it.

Differential Revision: https://reviews.llvm.org/D26719

llvm-svn: 287156
2016-11-16 21:44:45 +00:00
Tim Northover 397f9d9d05 ARM: fix CodeGen for 64-bit shifts.
One half of the shifts obviously needed conditional selection based on whether
the shift amount is more than 32-bits, but leaving the other half as the
natural shift isn't acceptable either: it's undefined behaviour to shift a
32-bit value by more than 31.

llvm-svn: 287149
2016-11-16 20:54:28 +00:00
Rong Xu 66827427e1 Make block placement deterministic
We fail to produce bit-to-bit matching stage2 and stage3 compiler in PGO
bootstrap build. The reason is because LoopBlockSet is of SmallPtrSet type
whose iterating order depends on the pointer value.

This patch fixes this issue by changing to use SmallSetVector.

Differential Revision: http://reviews.llvm.org/D26634

llvm-svn: 287148
2016-11-16 20:50:06 +00:00
Sanjay Patel 80baf69cb5 [InstCombine] replace unreachable with assert and remove unreachable code; NFCI
llvm-svn: 287147
2016-11-16 20:40:02 +00:00
Matt Arsenault 3b36bb1d87 AMDGPU: Enable ConstrainCopy DAG mutation
This fixes a probably unintended divergence from the default
scheduler behavior.

llvm-svn: 287146
2016-11-16 20:35:23 +00:00
Sanjay Patel 1b9560ffd6 [InstCombine] fix formatting and add FIXMEs to foldOperationIntoSelectOperand(); NFC
llvm-svn: 287145
2016-11-16 20:18:34 +00:00
Geoff Berry 8301c645c8 [AArch64] Handle vector types in replaceZeroVectorStore.
Summary:
Extend replaceZeroVectorStore to handle more vector type stores,
floating point zero vectors and set alignment more accurately on split
stores.

This is a follow-up change to r286875.

This change fixes PR31038.

Reviewers: MatzeB

Subscribers: mcrosier, aemerson, llvm-commits, rengolin

Differential Revision: https://reviews.llvm.org/D26682

llvm-svn: 287142
2016-11-16 19:35:19 +00:00
Mandeep Singh Grang 000ce9a686 [LoopVectorize] Fix for non-determinism in codegen
Summary: This patch fixes issues in codegen uncovered due to https://reviews.llvm.org/D26718

Reviewers: mssimpso

Subscribers: llvm-commits, mzolotukhin

Differential Revision: https://reviews.llvm.org/D26727

llvm-svn: 287135
2016-11-16 18:53:17 +00:00
Tom Stellard 0d162b1c4f AMDGPU/SI: Avoid creating unnecessary copies in the SIFixSGPRCopies pass
Summary:
1. Don't try to copy values to and from the same register class.
2. Replace copies with of registers with immediate values with v_mov/s_mov
   instructions.

The main purpose of this change is to make MachineSink do a better job of
determining when it is beneficial to split a critical edge, since the pass
assumes that copies will become move instructions.

This prevents a regression in uniform-cfg.ll if we enable critical edge
splitting for AMDGPU.

Reviewers: arsenm

Subscribers: arsenm, kzhuravl, llvm-commits

Differential Revision: https://reviews.llvm.org/D23408

llvm-svn: 287131
2016-11-16 18:42:17 +00:00
Sanjay Patel 4ce99d4d24 fix comment formatting; NFC
llvm-svn: 287127
2016-11-16 18:09:44 +00:00
Sanjay Patel 7f3d51f840 [x86] add fake scalar FP logic instructions to ReplaceableInstrs to save some bytes
We can replace "scalar" FP-bitwise-logic with other forms of bitwise-logic instructions. 
Scalar SSE/AVX FP-logic instructions only exist in your imagination and/or the bowels of 
compilers, but logically equivalent int, float, and double variants of bitwise-logic 
instructions are reality in x86, and the float variant may be a shorter instruction 
depending on which flavor (SSE or AVX) of vector ISA you have...so just prefer float all 
the time.

This is a preliminary step towards solving PR6137:
https://llvm.org/bugs/show_bug.cgi?id=6137

Differential Revision:
https://reviews.llvm.org/D26712

llvm-svn: 287122
2016-11-16 17:42:40 +00:00
Reid Kleckner 3a83e76811 [sancov] Name the global containing the main source file name
If the global name doesn't start with __sancov_gen, ASan will insert
unecessary red zones around it.

llvm-svn: 287117
2016-11-16 16:50:43 +00:00
Daniil Fukalov e870398e48 test commit, changed tab to spaces, NFC
llvm-svn: 287116
2016-11-16 16:41:40 +00:00
Pekka Jaaskelainen 8483cf0ae8 Add a little endian variant of TCE.
llvm-svn: 287111
2016-11-16 15:22:23 +00:00
Simon Pilgrim b57dd17142 [X86][AVX512] Autoupgrade lossless i32/u32 to f64 conversion intrinsics with generic IR
Both the (V)CVTDQ2PD (i32 to f64) and (V)CVTUDQ2PD (u32 to f64) conversion instructions are lossless and can be safely represented as generic SINT_TO_FP/UINT_TO_FP calls instead of x86 intrinsics without affecting final codegen.

LLVM counterpart to D26686

Differential Revision: https://reviews.llvm.org/D26736

llvm-svn: 287108
2016-11-16 14:48:32 +00:00
Simon Dardis 8ca1cbccc6 [mips] Fix unsigned/signed type error
MipsFastISel uses a a class to represent addresses with a signed member
to represent the offset. MipsFastISel::emitStore, emitLoad and computeAddress
all treated the offset as being positive. In cases where the offset was
actually negative and a frame pointer was used, this would cause the constant
synthesis routine to crash as it would generate an unexpected instruction
sequence when frame indexes are replaced.

Reviewers: vkalintiris

Differential Revision: https://reviews.llvm.org/D26192

llvm-svn: 287099
2016-11-16 11:29:07 +00:00
Simon Dardis 7b7cb8d9dd [mips] not instruction alias
This patch adds the single operand form of the not alias to microMIPS and
MIPS along with additional tests.

This partially resolves PR/30381.

Thanks to Sean Bruno for reporting the issue!

llvm-svn: 287097
2016-11-16 11:04:49 +00:00
Pavel Labath 0c20e05e89 Remove TimeValue class
Summary:
All uses have been replaced by appropriate std::chrono types, and the class is
now unused.

Reviewers: zturner, mehdi_amini

Subscribers: llvm-commits, mgorny

Differential Revision: https://reviews.llvm.org/D26447

llvm-svn: 287094
2016-11-16 10:46:48 +00:00
Ayman Musa 4d60243bfd [X86][AVX512] Removing llvm x86 intrinsics for _mm_mask_move_{ss|sd} intrinsics.
Differential Revision: https://reviews.llvm.org/D26128

llvm-svn: 287087
2016-11-16 09:00:28 +00:00
Craig Topper 6910fa0ef4 [X86] Remove the scalar intrinsics for fadd/fsub/fdiv/fmul
Summary: These intrinsics have been unused for clang for a while. This patch removes them. We auto upgrade them to extractelements, a scalar operation and then an insertelement. This matches the sequence used by clangs intrinsic file.

Reviewers: zvi, delena, RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D26660

llvm-svn: 287083
2016-11-16 05:24:10 +00:00
Konstantin Zhuravlyov bf998c7003 [AMDGPU] Refactor v_mac_{f16, f32} patterns into a class NFC
Differential Revision: https://reviews.llvm.org/D26711

llvm-svn: 287077
2016-11-16 03:39:12 +00:00
Matthias Braun 3d51cf0a2c AArch64: Use DeadRegisterDefinitionsPass before regalloc.
Doing this before register allocation reduces register pressure as we do
not even have to allocate a register for those dead definitions.

Differential Revision: https://reviews.llvm.org/D26111

llvm-svn: 287076
2016-11-16 03:38:27 +00:00
Konstantin Zhuravlyov 2a87a42035 [AMDGPU] Handle f16 select{_cc}
- Select `select` to `v_cndmask_b32`
- Expand `select_cc`
- Refactor patterns

Differential Revision: https://reviews.llvm.org/D26714

llvm-svn: 287074
2016-11-16 03:16:26 +00:00
Quentin Colombet fb9b0cdcfe [RegAllocGreedy] Record missed hint for late recoloring.
In https://reviews.llvm.org/D25347, Geoff noticed that we still have
useless copy that we can eliminate after register allocation. At the
time the allocation is chosen for those copies, they are not useless
but, because of changes in the surrounding code, later on they might
become useless.
The Greedy allocator already has a mechanism to deal with such cases
with a late recoloring. However, we missed to record the some of the
missed hints.

This commit fixes that.

llvm-svn: 287070
2016-11-16 01:07:12 +00:00
Rui Ueyama fb1e6d22a3 Align Modi and FileInfo substreams on 32-byte offsets.
This is required by DbiStream, but DbiStreamBuilder didn't align
these substreams, so the output of DbiSTreamBuilder couldn't be
read by DbiStream.

Test will be added to LLD.

llvm-svn: 287067
2016-11-16 00:59:27 +00:00
Vyacheslav Klochkov b3dc774a99 Fixed the lost FastMathFlags for CALL operations in SLPVectorizer.
Reviewer: Michael Zolotukhin.
Differential Revision: https://reviews.llvm.org/D26575

llvm-svn: 287064
2016-11-16 00:55:50 +00:00
Justin Lebar 2860573529 [BypassSlowDivision] Handle division by constant numerators better.
Summary:
We don't do BypassSlowDivision when the denominator is a constant, but
we do do it when the numerator is a constant.

This patch makes two related changes to BypassSlowDivision when the
numerator is a constant:

 * If the numerator is too large to fit into the bypass width, don't
   bypass slow division (because we'll never run the smaller-width
   code).

 * If we bypass slow division where the numerator is a constant, don't
   OR together the numerator and denominator when determining whether
   both operands fit within the bypass width.  We need to check only the
   denominator.

Reviewers: tra

Subscribers: llvm-commits, jholewinski

Differential Revision: https://reviews.llvm.org/D26699

llvm-svn: 287062
2016-11-16 00:44:47 +00:00
Justin Lebar 583b8687eb [BypassSlowDivision] Simplify partially-tautological if statement.
if (A || (B && A)) --> if (A).

llvm-svn: 287061
2016-11-16 00:44:43 +00:00
Rui Ueyama 507013180e Fix Modi and File count if there are more than 65535 modules/files.
These numbers are intended to be capped at 65535, but
`std::max<uint16_t>(UINT16_MAX, N)` always returns N for any N because
the expression is the same as `std::max((uint16_t)UINT16_MAX, (uint16_t)N)`.

llvm-svn: 287060
2016-11-16 00:38:33 +00:00
Joerg Sonnenberger 8c1a9ac52b Always use relative jump table encodings on PowerPC64.
For the default, small and medium code model, use the existing
difference from the jump table towards the label. For all other code
models, setup the picbase and use the difference between the picbase and
the block address.

Overall, this results in smaller data tables at the expensive of one or
two more arithmetic operation at the jump site. Given that we only create
jump tables with a lot more than two entries, it is a net win in size.
For larger code models the assumption remains that individual functions
are no larger than 2GB.

Differential Revision: https://reviews.llvm.org/D26336

llvm-svn: 287059
2016-11-16 00:37:30 +00:00
Jan Vesely e8cc395e4f AMDGPU/GCN: Exit early in hazard recognizer if there is no vreg argument
wbinvl.* are vector instruction that do not sue vector registers.

v2: check only M?BUF instructions

Differential Revision: https://reviews.llvm.org/D26633

llvm-svn: 287056
2016-11-15 23:55:15 +00:00
Filipe Cabecinhas ec350b71fa [AddressSanitizer] Add support for (constant-)masked loads and stores.
This patch adds support for instrumenting masked loads and stores under
ASan, if they have a constant mask.

isInterestingMemoryAccess now supports returning a mask to be applied to
the loads, and instrumentMop will use it to generate additional checks.

Added tests for v4i32 v8i32, and v4p0i32 (~v4i64) for both loads and
stores (as well as a test to verify we don't add checks to non-constant
masks).

Differential Revision: https://reviews.llvm.org/D26230

llvm-svn: 287047
2016-11-15 22:37:30 +00:00
Peter Collingbourne bc9a574657 Object: replace backslashes with slashes in embedded relative thin archive paths on Windows.
This makes these thin archives portable between *nix and Windows.

Differential Revision: https://reviews.llvm.org/D26696

llvm-svn: 287038
2016-11-15 21:36:35 +00:00
Chad Rosier 201fc1ed26 [AArch64] Add support for Qualcomm's Falkor CPU.
Differential Revision: https://reviews.llvm.org/D26673

llvm-svn: 287036
2016-11-15 21:34:12 +00:00
Tom Stellard d23de360db AMDGPU/SI: Fix pattern for i16 = sign_extend i1
Reviewers: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye

Differential Revision: https://reviews.llvm.org/D26670

llvm-svn: 287035
2016-11-15 21:25:56 +00:00
Kostya Serebryany 9d6dc7b164 [sanitizer-coverage] make sure asan does not instrument coverage guards (reported in https://github.com/google/oss-fuzz/issues/84)
llvm-svn: 287030
2016-11-15 21:12:50 +00:00
Kuba Brecka a3ccf3ddbd Fix llvm-symbolizer to correctly sort a symbol array and calculate symbol sizes
Sometimes, llvm-symbolizer gives wrong results due to incorrect sizes of some symbols. The reason for that was an incorrectly sorted array in computeSymbolSizes. The comparison function used subtraction of unsigned types, which is incorrect. Let's change this to return explicit -1 or 1.

Differential Revision: https://reviews.llvm.org/D26537

llvm-svn: 287028
2016-11-15 21:07:03 +00:00
Tim Northover ed55a05b01 GlobalISel: remove unused variable to silence warning.
llvm-svn: 287027
2016-11-15 21:06:07 +00:00
Matt Arsenault d4bb5e4831 AMDGPU: Enable store clustering
Also respect the TII hook for these like the generic code does
in case we want a flag later to disable this.

llvm-svn: 287021
2016-11-15 20:22:55 +00:00
Haicheng Wu faee2b71a7 [AArch64] Lower multiplication by a constant int to shl+add+shl
Lower a = b * C where C = (2^n + 1) * 2^m to

add     w0, w0, w0, lsl n
lsl     w0, w0, m

Differential Revision: https://reviews.llvm.org/D229245

llvm-svn: 287019
2016-11-15 20:16:48 +00:00
Matt Arsenault 3666629837 AMDGPU: Analyze mubuf with immediate soffset
Fixes giving up on clustering common addr64 accesses with
constant 0 soffset.

llvm-svn: 287018
2016-11-15 20:14:27 +00:00
Matt Arsenault 12c53897f3 AMDGPU: Fix return after else
llvm-svn: 287015
2016-11-15 19:58:54 +00:00
Wei Mi 37c4aaaf52 Revert r286999 which caused buildbot test failures. Some testcases need to be made target specific.
llvm-svn: 287014
2016-11-15 19:42:05 +00:00
Matt Arsenault 92b355b1a9 AMDGPU: Replace assert(false) with unreachable
llvm-svn: 287013
2016-11-15 19:34:37 +00:00
Stanislav Mekhanoshin ea91cca593 [AMDGPU] Add wave barrier builtin
The wave barrier represents the discardable barrier. Its main purpose is to
carry convergent attribute, thus preventing illegal CFG optimizations. All lanes
in a wave come to convergence point simultaneously with SIMT, thus no special
instruction is needed in the ISA. The barrier is discarded during code generation.

Differential Revision: https://reviews.llvm.org/D26585

llvm-svn: 287007
2016-11-15 19:00:15 +00:00
Wei Mi 7ccf7651c0 [LSR] Allow formula containing Reg for SCEVAddRecExpr related with outerloop.
In RateRegister of existing LSR, if a formula contains a Reg which is a SCEVAddRecExpr,
and this SCEVAddRecExpr's loop is an outerloop, the formula will be marked as Loser
and dropped.

Suppose we have an IR that %for.body is outerloop and %for.body2 is innerloop. LSR only
handle inner loop now so only %for.body2 will be handled.

Using the logic above, formula like
reg(%array) + reg({1,+, %size}<%for.body>) + 1*reg({0,+,1}<%for.body2>) will be dropped
no matter what because reg({1,+, %size}<%for.body>) is a SCEVAddRecExpr type reg related
with outerloop. Only formula like
reg(%array) + 1*reg({{1,+, %size}<%for.body>,+,1}<nuw><nsw><%for.body2>) will be kept
because the SCEVAddRecExpr related with outerloop is folded into the initial value of the
SCEVAddRecExpr related with current loop.

But in some cases, we do need to share the basic induction variable
reg{0 ,+, 1}<%for.body2> among LSR Uses to reduce the final total number of induction
variables used by LSR, so we don't want to drop the formula like
reg(%array) + reg({1,+, %size}<%for.body>) + 1*reg({0,+,1}<%for.body2>) unconditionally.

From the existing comment, it tries to avoid considering multiple level loops at the same time.
However, existing LSR only handles innermost loop, so for any SCEVAddRecExpr with a loop other
than current loop, it is an invariant and will be simple to handle, and the formula doesn't have
to be dropped.

Differential Revision: https://reviews.llvm.org/D26429

llvm-svn: 286999
2016-11-15 18:35:53 +00:00
Pawel Bylica c3f6c97f71 Integer legalization: fix MUL expansion
Summary:
This fixes the runtime results produces by the fallback multiplication expansion introduced in r270720.

For tests I created a fuzz tester that compares the results with Boost.Multiprecision.

Reviewers: hfinkel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D26628

llvm-svn: 286998
2016-11-15 18:29:24 +00:00
Zaara Syeda a19c9e60e9 vector load store with length (left justified) llvm portion
llvm-svn: 286993
2016-11-15 17:54:19 +00:00
Sanjay Patel 73d1d35d21 fix formatting; NFC
llvm-svn: 286989
2016-11-15 17:47:13 +00:00
Wei Mi d2948cef70 [IndVars] Change the order to compute WidenAddRec in widenIVUse.
When both WidenIV::getWideRecurrence and WidenIV::getExtendedOperandRecurrence
return non-null but different WideAddRec, if getWideRecurrence is called
before getExtendedOperandRecurrence, we won't bother to call
getExtendedOperandRecurrence again. But As we know it is possible that after
SCEV folding, we cannot prove the legality using the SCEVAddRecExpr returned
by getWideRecurrence. Meanwhile if getExtendedOperandRecurrence returns non-null
WideAddRec, we know for sure that it is legal to do widening for current instruction.
So it is better to put getExtendedOperandRecurrence before getWideRecurrence, which
will increase the chance of successful widening.

Differential Revision: https://reviews.llvm.org/D26059

llvm-svn: 286987
2016-11-15 17:34:52 +00:00
Diana Picus 895c6aa6fd [ARM] GlobalISel: Remove unused members. NFCI
This silences some warnings that I didn't see with my host compiler.

llvm-svn: 286981
2016-11-15 16:42:10 +00:00
Craig Topper c7486af9c9 [AVX-512] Add AVX-512 vector shift intrinsics to memory santitizer.
Just needed to add the intrinsics to the exist switch. The code is generic enough to support the wider vectors with no changes.

llvm-svn: 286980
2016-11-15 16:27:33 +00:00
Simon Pilgrim ceffb43b1b [X86][SSE] Improve SINT_TO_FP of boolean vector results (signum)
This patch helps avoids poor legalization of boolean vector results (e.g. 8f32 -> 8i1 -> 8i16) that feed into SINT_TO_FP by inserting an early SIGN_EXTEND and so help improve the truncation logic.

This is not necessary for AVX512 targets where boolean vectors are legal - AVX512 manages to lower ( sint_to_fp vXi1 ) into some form of ( select mask, 1.0f , 0.0f ) in most cases.

Fix for PR13248

Differential Revision: https://reviews.llvm.org/D26583

llvm-svn: 286979
2016-11-15 16:24:40 +00:00
Pablo Barrio 4f80c93a2e Revert "[JumpThreading] Unfold selects that depend on the same condition"
This reverts commit ac54d0066c478a09c7cd28d15d0f9ff8af984afc.

llvm-svn: 286976
2016-11-15 15:42:23 +00:00
Pablo Barrio 5f782bb048 Revert "[JumpThreading] Prevent non-deterministic use lists"
This reverts commit f2c2f5354070469dac253373c66527ca971ddc66.

llvm-svn: 286975
2016-11-15 15:42:17 +00:00
Diana Picus 90f0a84943 [ARM] Make sure GlobalISel is only initialized once. NFCI
Move some code inside the proper 'if' block to make sure it is only run once,
when the subtarget is first created. Things can still break if we use different
ARM target machines or if we have functions with different 'target-cpu' or
'target-features', we should fix that too in the future.

llvm-svn: 286974
2016-11-15 15:38:15 +00:00
Robert Lougher b0905209dd [LoopVectorizer] When estimating reg usage, unused insts may "end" another use
The register usage algorithm incorrectly treats instructions whose value is
not used within the loop (e.g. those that do not produce a value).

The algorithm first calculates the usages within the loop.  It iterates over
the instructions in order, and records at which instruction index each use
ends (in fact, they're actually recorded against the next index, as this is
when we want to delete them from the open intervals).

The algorithm then iterates over the instructions again, adding each
instruction in turn to a list of open intervals.  Instructions are then
removed from the list of open intervals when they occur in the list of uses
ended at the current index.

The problem is, instructions which are not used in the loop are skipped.
However, although they aren't used, the last use of a value may have been
recorded against that instruction index.  In this case, the use is not deleted
from the open intervals, which may then bump up the estimated register usage.

This patch fixes the issue by simply moving the "is used" check after the loop
which erases the uses at the current index.

Differential Revision: https://reviews.llvm.org/D26554

llvm-svn: 286969
2016-11-15 14:27:33 +00:00
Tony Jiang 5f850cd1b1 [PowerPC] Implement BE VSX load/store builtins - llvm portion.
This patch implements all the overloads for vec_xl_be and vec_xst_be. On BE,
they behaves exactly the same with vec_xl and vec_xst, therefore they are
simply implemented by defining a matching macro. On LE, they are implemented
by defining new builtins and intrinsics. For int/float/long long/double, it
is just a load (lxvw4x/lxvd2x) or store(stxvw4x/stxvd2x). For char/char/short,
we also need some extra shuffling before or after call the builtins to get the
desired BE order. For int128, simply call vec_xl or vec_xst.

llvm-svn: 286967
2016-11-15 14:25:56 +00:00
Diana Picus 3776e76201 Get GlobalISel to build on Linux after r286407
r286407 has introduced calls to llvm::AddLandingPadInfo, which lives in the
SelectionDAG component. Add it to LLVMBuild to avoid linker failures on Linux.

llvm-svn: 286962
2016-11-15 14:11:11 +00:00
Zvi Rackover 6f76f46d2c [X86][FastISel] Assert that we are dealing with arithmetic with overflow intrinsics. NFC
llvm-svn: 286961
2016-11-15 13:50:35 +00:00
Sam Kolton c01faa383f [AMDGPU] TableGen: change individual instruction flags to bit type from bits<1>
Summary: This is needed to be able to use this flags in InstrMappings.

Reviewers: tstellarAMD, vpykhtin

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye

Differential Revision: https://reviews.llvm.org/D26666

llvm-svn: 286960
2016-11-15 13:39:07 +00:00
Zvi Rackover f0b9b57bd3 [X86][FastISel] Fix lowering of overflow result on AVX512 targets
Summary:
    Fix a case where the overflow value of type i1, which is legal on AVX512, was assigned to a VK1 register class.
    We always want this value to be assigned to a GPR since the overflow return value is lowered to a SETO instruction.

    Fixes pr30981.

    Reviewers: mkuper, igorb, craig.topper, guyblank, qcolombet

    Subscribers: qcolombet, llvm-commits

    Differential Revision: https://reviews.llvm.org/D26620

llvm-svn: 286958
2016-11-15 13:29:23 +00:00
Florian Hahn 4b4dc172e7 Test commit, remove trailing space.
This commit is used to test commit access.

llvm-svn: 286957
2016-11-15 13:28:42 +00:00
Joerg Sonnenberger 1a7eec68a9 Introduce TLI predicative for base-relative Jump Tables.
For 64bit ABIs it is common practice to use relative Jump Tables with
potentially different relocation bases.  As the logic for the jump table
itself doesn't depend on the relocation base, make it easier for targets
to use the generic logic. Start by dropping the now redundant MIPS logic.

Differential Revision: https://reviews.llvm.org/D26578

llvm-svn: 286951
2016-11-15 12:39:46 +00:00
Javed Absar f043dac25d [ARM] Add machine scheduler for Cortex-R52
This patch adds the Sched Machine Model for Cortex-R52.

Details of the pipeline and descriptions are in comments
in file ARMScheduleR52.td included in this patch.

Reviewers: rengolin, jmolloy

Differential Revision: https://reviews.llvm.org/D26500

llvm-svn: 286949
2016-11-15 11:34:54 +00:00
Asaf Badouh b573553424 DAGCombiner: fix combine of trunc and select
bugzilla:
https://llvm.org/bugs/show_bug.cgi?id=29002
pr29002

Differential Revision: https://reviews.llvm.org/D26449


 

llvm-svn: 286938
2016-11-15 07:55:22 +00:00
Matt Arsenault 1c8d933881 TableGen: Add operator !or
llvm-svn: 286936
2016-11-15 06:49:28 +00:00
Zvi Rackover 76dbf26599 [X86][GlobalISel] Add minimal call lowering support to the IRTranslator
Summary:
    Add basic functionality to support call lowering for X86.
    Currently only supports functions which return void and take zero arguments.
    Inspired by commit 286573.

Reviewers: ab, qcolombet, t.p.northover

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D26593

llvm-svn: 286935
2016-11-15 06:34:33 +00:00
Craig Topper e6915b85ed [X86] Add LLVM version number for each intrinsic handled by auto upgrade for age tracking.
One day we'd like to remove some of this autoupgrade support and it will be easier if we know how long some of it has been around.

Differential Revision: https://reviews.llvm.org/D26321

llvm-svn: 286933
2016-11-15 05:04:51 +00:00
Matt Arsenault c79dc70d50 AMDGPU: Fix f16 fabs/fneg
llvm-svn: 286931
2016-11-15 02:25:28 +00:00
Rui Ueyama 6b77ad3546 Simplify identify_magic.
This patch defines a memcmp-ish helper function to simplify identify_magic.

llvm-svn: 286928
2016-11-15 01:57:05 +00:00
Greg Clayton 6f6e4dbd5d Improve DWARF parsing speed by improving DWARFAbbreviationDeclaration
This patch gets a DWARF parsing speed improvement by having DWARFAbbreviationDeclaration instances know if they have a fixed byte size. If an abbreviation has a fixed byte size that can be calculated given a DWARFUnit, then parsing a DIE becomes two steps: parse ULEB128 abbrev code, and then add constant size to the offset.

This patch also adds a fixed byte size to each DWARFAbbreviationDeclaration::AttributeSpec so that attributes can quickly skip their values if needed without the need to lookup the fixed for size.

Notable improvements:

- DWARFAbbreviationDeclaration::findAttributeIndex() now returns an Optional<uint32_t> instead of a uint32_t and we no longer have to look for the magic -1U return value
- Optional<uint32_t> DWARFAbbreviationDeclaration::findAttributeIndex(dwarf::Attribute attr) const;
- DWARFAbbreviationDeclaration now has a getAttributeValue() function that extracts an attribute value given a DIE offset that takes advantage of the DWARFAbbreviationDeclaration::AttributeSpec::ByteSize
- bool DWARFAbbreviationDeclaration::getAttributeValue(const uint32_t DIEOffset, const dwarf::Attribute Attr, const DWARFUnit &U, DWARFFormValue &FormValue) const;
- A DWARFAbbreviationDeclaration instance can return a fixed byte size for itself so DWARF parsing is faster:
- Optional<size_t> DWARFAbbreviationDeclaration::getFixedAttributesByteSize(const DWARFUnit &U) const;
- Any functions that used to take a "const DWARFUnit *U" that would crash if U was NULL now take a "const DWARFUnit &U" and are only called with a valid DWARFUnit

Differential Revision: https://reviews.llvm.org/D26567

llvm-svn: 286924
2016-11-15 01:23:06 +00:00
Rui Ueyama e97c34cb60 Fix -Wswitch.
llvm-svn: 286920
2016-11-15 00:58:50 +00:00
Rui Ueyama 2d02166b43 Add a file magic for CL.exe's object file created with /GL.
This patch makes it possible to identify object files created by CL.exe
with /GL option. Such file contains Microsoft proprietary intermediate
code instead of target machine code to do LTO.

I need this to print out user-friendly error message from LLD.

Differential Revision: https://reviews.llvm.org/D26645

llvm-svn: 286919
2016-11-15 00:54:54 +00:00
Matt Arsenault 81da114e65 AMDGPU: Set hasExtraSrcRegAllocReq on v_div_scale_*
This doesn't solve any problems I know about, but this should have
more conservative assumptions about the operands'

llvm-svn: 286913
2016-11-15 00:05:42 +00:00
Matt Arsenault 972034bda9 AMDGPU: Fix formatting of 1/2pi immediate
llvm-svn: 286912
2016-11-15 00:04:33 +00:00
Tom Stellard 9c884e495c MIRParser: Add support for parsing vreg reg alloc hints
Reviewers: qcolombet, MatzeB

Subscribers: wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D26573

llvm-svn: 286911
2016-11-15 00:03:14 +00:00
Evandro Menezes 9fc54826e0 [AArch64] Compute the Newton series for reciprocals natively
Implement the Newton series for square root, its reciprocal and reciprocal
natively using the specialized instructions in AArch64 to perform each
series iteration.

Differential revision: https://reviews.llvm.org/D26518

llvm-svn: 286907
2016-11-14 23:29:01 +00:00
Peter Collingbourne 3cb86272fc Linker: Remove unnecessary call to copyMetadata in IRLinker::linkGlobalVariable.
This was causing us to create duplicate metadata on global variables.
Debug info test case by Adrian Prantl, additional test cases by me.

Fixes PR31012.

Differential Revision: https://reviews.llvm.org/D26622

llvm-svn: 286905
2016-11-14 23:18:38 +00:00
Tom Stellard 11e60ff7da RegAllocGreedy: Properly initialize this pass, so that -run-pass will work
Reviewers: qcolombet, MatzeB

Subscribers: wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D26572

llvm-svn: 286895
2016-11-14 21:50:13 +00:00
Kuba Brecka ddfdba3b01 [tsan] Add support for C++ exceptions into TSan (call __tsan_func_exit during unwinding), LLVM part
This adds support for TSan C++ exception handling, where we need to add extra calls to __tsan_func_exit when a function is exitted via exception mechanisms. Otherwise the shadow stack gets corrupted (leaked). This patch moves and enhances the existing implementation of EscapeEnumerator that finds all possible function exit points, and adds extra EH cleanup blocks where needed.

Differential Revision: https://reviews.llvm.org/D26177

llvm-svn: 286893
2016-11-14 21:41:13 +00:00
Kevin Enderby 22fc007809 Add a checkSymbolTable() method to the MachOObjectFile class.
The philosophy of the error checking in libObject for Mach-O files
is that the constructor will check the load commands so for their
tables the offsets and sizes are properly contained in the file.
But there is no checking of the entries of any of the tables.

For the contents of the tables themselves the methods accessing
the contents of the entries return errors as needed.  In some
cases this however makes it difficult or cumbersome to produce
a good error message which would include the tool name, file name,
archive member, and name of the architecture of a slice of a universal file
the error occurred in.

So idea is that there will be a method to check a table which can
be called up front before using it allowing a good error message
to be produced before a table is used.  And if only verification of
the Mach-O file and its tables are wanted a new possible method
checkAllTables() could be added to call all of the methods to
check all the tables at some time when such methods exist.

The checkSymbolTable() is the first of such methods to check
one of the Mach-O file tables.  This method initially will used in
llvm-objdump’s DisassembleMachO() routine before it gets the
section and symbol information.  As if there are problems with
the symbol table currently the error is first encountered by the
bool operator() in the SymbolSorter() struct which passed to
std::sort().  In this case there is no context as to the file name
the symbol which results a poor error message:

LLVM ERROR: truncated or malformed object (bad string index: 22 for symbol at index 1)

with the added call to the checkSymbolTable() method the
error message includes the tool name and file name:

llvm-objdump: 'macho-invalid-symbol-strx': truncated or malformed object (bad string table index: 22 past the end of string table, for symbol at index 1)
llvm-svn: 286887
2016-11-14 20:57:04 +00:00
Krzysztof Parzyszek b16a4e5869 [Hexagon] Give a predicate function a more meaningful name
Change "orisadd" to "IsOrAdd" to follow the naming conventions, and
change "isOrAdd" in the C++ code to "isOrEquivalentToAdd".

llvm-svn: 286886
2016-11-14 20:53:09 +00:00
Tim Northover 3d38c38826 ARM: try to fix GCC 4.8 compilation again after r286881.
llvm-svn: 286882
2016-11-14 20:31:53 +00:00
Tim Northover 46a6f0fbf0 Recommit: ARM: sort register lists by encoding in push/pop instructions.
For example we were producing

    push {r8, r10, r11, r4, r5, r7, lr}

This is misleading (r4, r5 and r7 are actually pushed before the rest), and
other components (stack folding recently) often forget to deal with the extra
complexity coming from the different order, leading to miscompiles. Finally, we
warn about our own code in -no-integrated-as mode without this, which is really
not a good idea.

Fixed usage of std::sort so that we (hopefully) use instantiations that
actually exist in GCC 4.8.

llvm-svn: 286881
2016-11-14 20:28:24 +00:00
Geoff Berry e8de67abad [AArch64] Change some pointers to references. NFC.
Follow-up change to r286875.

llvm-svn: 286879
2016-11-14 19:59:11 +00:00
Geoff Berry 526c50588d [AArch64] Split 0 vector stores into scalar store pairs.
Summary:
Replace a splat of zeros to a vector store by scalar stores of WZR/XZR.
The load store optimizer pass will merge them to store pair stores.
This should be better than a movi to create the vector zero followed by
a vector store if the zero constant is not re-used, since one
instructions and one register live range will be removed.

For example, the final generated code should be:

  stp xzr, xzr, [x0]

instead of:

  movi v0.2d, #0
  str q0, [x0]

Reviewers: t.p.northover, mcrosier, MatzeB, jmolloy

Subscribers: aemerson, rengolin, llvm-commits

Differential Revision: https://reviews.llvm.org/D26561

llvm-svn: 286875
2016-11-14 19:39:04 +00:00
Geoff Berry def4bfa9d9 [AArch64] Factor out transform code from split16BStore. NFC.
llvm-svn: 286874
2016-11-14 19:39:00 +00:00
Teresa Johnson 4fef68cb8d [ThinLTO] Only promote exported locals as marked in index
Summary:
We have always speculatively promoted all renamable local values
(except const non-address taken variables) for both the exporting
and importing module. We would then internalize them back based on
the ThinLink results if they weren't actually exported. This is
inefficient, and results in unnecessary renames. It also meant we
had to check the non-renamability of a value in the summary, which
was already checked during function importing analysis in the ThinLink.

Made renameModuleForThinLTO (which does the promotion/renaming) instead
use the index when exporting, to avoid unnecessary renames/promotions.
For importing modules, we can simply promoted all values as any local
we import by definition is exported and needs promotion.

This required changes to the method used by the FunctionImport pass
(only invoked from 'opt' for testing) and when invoked from llvm-link,
since neither does a ThinLink. We simply conservatively mark all locals
in the index as promoted, which preserves the current aggressive
promotion behavior.

I also needed to change an llvm-lto based test where we had previously
been aggressively promoting values that weren't importable (aliasees),
but now will not promote.

Reviewers: mehdi_amini

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D26467

llvm-svn: 286871
2016-11-14 19:21:41 +00:00
Kostya Serebryany 6c77811a29 [libFuzzer] replace 'auto' with 'auto *' to better follow the LLVM style
llvm-svn: 286870
2016-11-14 19:21:38 +00:00
Daniel Sanders 08714cdee4 Revert: r286868 - Test commit
llvm-svn: 286869
2016-11-14 19:10:56 +00:00
Daniel Sanders 12432e0b04 Test commit
llvm-svn: 286868
2016-11-14 19:09:33 +00:00
Tim Northover 1b66f39cf2 Revert "ARM: sort register lists by encoding in push/pop instructions."
This reverts commit 286866. It broke a bot, something to do with exactly which
templates std::sort accepts.

llvm-svn: 286867
2016-11-14 19:05:28 +00:00
Tim Northover e908ea844c ARM: sort register lists by encoding in push/pop instructions.
For example we were producing

    push {r8, r10, r11, r4, r5, r7, lr}

This is misleading (r4, r5 and r7 are actually pushed before the rest), and
other components (stack folding recently) often forget to deal with the extra
complexity coming from the different order, leading to miscompiles. Finally, we
warn about our own code in -no-integrated-as mode without this, which is really
not a good idea.

llvm-svn: 286866
2016-11-14 19:02:17 +00:00
Sean Fertile a435e07de8 [PPC] Add intrinsic mapping to the xscvhpsp instruction
add an intrinsic to expose the 'VSX Scalar Convert Half-Precision to
Single-Precision' instruction.

Differential review: https://reviews.llvm.org/D26536

llvm-svn: 286862
2016-11-14 18:43:59 +00:00
Changpeng Fang 8236fe103f AMDGPU/SI: Support data types other than V4f32 in image intrinsics
Summary:
  Extend image intrinsics to support data types of V1F32 and V2F32.

  TODO: we should define a mapping table to change the opcode for data type of V2F32 but just one channel is active,
  even though such case should be very rare.

Reviewers:
  tstellarAMD

Differential Revision:
  http://reviews.llvm.org/D26472

llvm-svn: 286860
2016-11-14 18:33:18 +00:00
Bob Wilson d7bef6972d Use _Unwind_Backtrace on Apple platforms.
Darwin's backtrace() function does not work with sigaltstack (which was
enabled when available with r270395) — it does a sanity check to make
sure that the current frame pointer is within the expected stack area
(which it is not when using an alternate stack) and gives up otherwise.
The alternative of _Unwind_Backtrace seems to work fine on macOS, so use
that when backtrace() fails. Note that we then use backtrace_symbols_fd()
with the addresses from _Unwind_Backtrace, but I’ve tested that and it
also seems to work fine. rdar://problem/28646552

llvm-svn: 286851
2016-11-14 17:56:18 +00:00
Adrian Prantl 1f9ac96cb1 Typo
llvm-svn: 286845
2016-11-14 17:26:32 +00:00
Teresa Johnson 3624bdf60a Restore "[ThinLTO] Prevent exporting of locals used/defined in module level asm"
This restores the rest of r286297 (part was restored in r286475).
Specifically, it restores the part requiring adding a dependency from
the Analysis to Object library (downstream use changed to correctly
model split BitReader vs BitWriter libraries).

Original description of this part of patch follows:

Module level asm may also contain defs of values. We need to prevent
export of any refs to local values defined in module level asm (e.g. a
ref in normal IR), since that also requires renaming/promotion of the
local. To do that, the summary index builder looks at all values in the
module level asm string that are not marked Weak or Global, which is
exactly the set of locals that are defined. A summary is created for
each of these local defs and flagged as NoRename.

This required adding handling to the BitcodeWriter to look at GV
declarations to see if they have a summary (rather than skipping them
all).

Finally, added an assert to IRObjectFile::CollectAsmUndefinedRefs to
ensure that an MCAsmParser is available, otherwise the module asm parse
would silently fail. Initialized the asm parser in the opt tool for use
in testing this fix.

Fixes PR30610.

llvm-svn: 286844
2016-11-14 17:12:32 +00:00
Sumanth Gundapaneni d428cf8b5f [Hexagon] Remove unsafe load instructions that affect Stack Slot Coloring
The Stack slot coloring pass removes a store that is followed by a load
that deal with the same stack slot. The function isLoadFromStackSlot
is supposed to consider the loads that have no side-effects. This
patch fixed the issue by removing the unsafe loads from this function
Eg:
%vreg0<def> = L2_loadruh_io <fi#15>, 0
S2_storeri_io <fi#15>, 0, %vreg0

In this case, we load an unsigned extended half word and store this in to
the same stack slot. The Stack slot coloring pass considers safe to remove
the store. This patch marked all the non-vector byte and half word loads as
unsafe.

llvm-svn: 286843
2016-11-14 17:11:00 +00:00
Teresa Johnson d5033a4576 [ThinLTO] Make inline assembly handling more efficient in summary
Summary:
The change in r285513 to prevent exporting of locals used in
inline asm added all locals in the llvm.used set to the reference
set of functions containing inline asm. Since these locals were marked
NoRename, this automatically prevented importing of the function.

Unfortunately, this caused an explosion in the summary reference lists
in some cases. In my particular example, it happened for a large protocol
buffer generated C++ file, where many of the generated functions
contained an inline asm call. It was exacerbated when doing a ThinLTO
PGO instrumentation build, where the PGO instrumentation included
thousands of private __profd_* values that were added to llvm.used.

We really only need to include a single llvm.used local (NoRename) value
in the reference list of a function containing inline asm to block it
being imported. However, it seems cleaner to add a flag to the summary
that explicitly describes this situation, which is what this patch does.

Reviewers: mehdi_amini

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D26402

llvm-svn: 286840
2016-11-14 16:40:19 +00:00
Simon Pilgrim 779da8e5ea [CostModel][X86] Added mul costs for vXi8 vectors
More realistic v16i8/v32i8/v64i8 MUL costs - we have to extend to vXi16, use PMULLW and then truncate the result

llvm-svn: 286838
2016-11-14 15:54:24 +00:00
Simon Pilgrim 27fed8e5d6 [X86][AVX] Fixed v16i16/v32i8 ADD/SUB costs on AVX1 subtargets
Add explicit v16i16/v32i8 ADD/SUB costs, matching the costs of v4i64/v8i32 - they were missing for some reason.

This has side effects on the LV max bandwidth tests (AVX1 now prefers 128-bit vectors vs AVX2 which still prefers 256-bit)

llvm-svn: 286832
2016-11-14 14:45:16 +00:00
Sean Fertile adda5b2d2b [PPC] add intrinsics for vec extract exp/significand and vec test data class.
Differential Revision: https://reviews.llvm.org/D26272

llvm-svn: 286829
2016-11-14 14:42:37 +00:00
Simon Pilgrim 475b40dab8 Remove redundant condition (PR28352) NFCI.
We were already testing is the op was not a leaf, so need to then test if it was a leaf (added it to the assert instead).

llvm-svn: 286817
2016-11-14 12:00:46 +00:00
James Molloy 6df8f27c95 [InlineCost] Remove skew when calculating call costs
When calculating the cost of a call instruction we were applying a heuristic penalty as well as the cost of the instruction itself.

However, when calculating the benefit from inlining we weren't discounting the equivalent penalty for the call instruction that would be removed! This caused skew in the calculation and meant we wouldn't inline in the following, trivial case:

  int g() {
    h();
  }
  int f() {
    g();
  }

llvm-svn: 286814
2016-11-14 11:14:41 +00:00
Simon Pilgrim 3d8482a88d Remove redundant condition (PR28800) NFCI.
'A || (!A && B)' is equivalent to 'A || B':

(LoopCycle > DefCycle) || (LoopCycle <= DefCycle && LoopStage <= DefStage)
-->
(LoopCycle > DefCycle) || (LoopStage <= DefStage)

llvm-svn: 286811
2016-11-14 10:40:23 +00:00
Diana Picus bda7276120 GlobalISel: Fix indentation. NFC
llvm-svn: 286808
2016-11-14 10:25:43 +00:00
Pablo Barrio 7ce2c5ecaf [JumpThreading] Prevent non-deterministic use lists
Summary:
Unfolding selects was previously done with the help of a vector
of pointers that was then sorted to be able to remove duplicates.
As this sorting depends on the memory addresses, it was
non-deterministic. A SetVector is used now so that duplicates are
removed without the need of sorting first.

Reviewers: mgrang, efriedma

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D26450

llvm-svn: 286807
2016-11-14 10:24:26 +00:00
Eric Fiselier c79c9d8536 Add explicit (void) cast to unused unique_ptr::release() results
Summary:
This patch adds explicit `(void)` casts to discarded `release()` calls to suppress -Wunused-result.

This patch fixes *all* warnings are generated as a result of [applying `[[nodiscard]]`  within libc++](https://reviews.llvm.org/D26596).
Similar fixes were applied to Clang in r286796.

Reviewers: chandlerc, dberris

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D26598

llvm-svn: 286797
2016-11-14 07:26:17 +00:00
Saleem Abdulrasool d6da74f22b Demangle: only demangle mangled symbols
Only attempt to demangle symbols which have the itanium C++ prefix of `_Z`.
This ensures that we do not treat any symbol name as a managled named.  We would
previously treat a C function `f` as a mangled name and decode that to `float`
incorrectly.

While it is easy to add tests for this, Mehdi recommended against introducing
tests for the demangler as libc++abi should cover the testing.

llvm-svn: 286795
2016-11-14 04:54:47 +00:00
Craig Topper 8f85ad1755 [AVX-512] Add suffixless aliases for EVEX encoded vcvtsi2ss/vcvtsi2sd/vcvtusi2ss/vcvtusi2sd. This matches the VEX behavior.
Fixes another problem from PR28850.

llvm-svn: 286790
2016-11-14 02:46:58 +00:00
Craig Topper b8596e4d1d [X86] Cleanup 'x' and 'y' mnemonic suffixes for vcvtpd2dq/vcvttpd2dq/vcvtpd2ps and similar instructions.
-Don't print the 'x' suffix for the 128-bit reg/mem VEX encoded instructions in Intel syntax. This is consistent with the EVEX versions.
-Don't print the 'y' suffix for the 256-bit reg/reg VEX encoded instructions in Intel or AT&T syntax. This is consistent with the EVEX versions.
-Allow the 'x' and 'y' suffixes to be used for the reg/mem forms when we're assembling using Intel syntax.
-Allow the 'x' and 'y' suffixes on the reg/reg EVEX encoded instructions in Intel or AT&T syntax. This is consistent with what VEX was already allowing.

This should fix at least some of PR28850.

llvm-svn: 286787
2016-11-14 01:53:29 +00:00
Craig Topper 353e59b6d6 [AVX-512] Remove and autoupgrade masked dword/qword variable shift intrinsics to the new unmasked versions and selects.
llvm-svn: 286786
2016-11-14 01:53:22 +00:00
Sanjay Patel cfcc42bdc2 [ValueTracking] recognize even more variants of smin/smax
Similar to:
https://reviews.llvm.org/rL285499
https://reviews.llvm.org/rL286318

We can't minimally expose this in IR tests because we don't have min/max intrinsics,
but the difference is visible in codegen because SelectionDAGBuilder::visitSelect() 
uses matchSelectPattern().

We're not canonicalizing these patterns in IR (yet), so I don't expect there to be any
regressions as noted here:
http://lists.llvm.org/pipermail/llvm-dev/2016-November/106868.html

llvm-svn: 286776
2016-11-13 20:04:52 +00:00
Craig Topper ba13703bb3 [AVX-512] Fix a disassembler failure for AVX-512 vcmpss/vcmpsd with an immediate larger than 32. Fix the same bug with VLX vcmpps/vcmppd.
Fixes PR24941.

llvm-svn: 286775
2016-11-13 19:58:18 +00:00
Sanjay Patel 819f096f09 [ValueTracking] move min/max matching to helper function; NFCI
llvm-svn: 286772
2016-11-13 19:30:19 +00:00
Craig Topper 987dad2bc3 [X86][IR] Reduce the number of full string comparisons in the code that autoupgrades masked shift intrinsics.
llvm-svn: 286768
2016-11-13 19:09:56 +00:00
Matt Arsenault dc45274d54 AMDGPU: Implement SGPR spilling with scalar stores
nThis avoids the nasty problems caused by using
memory instructions that read the exec mask while
spilling / restoring registers used for control flow
masking, but only for VI when these were added.

This always uses the scalar stores when enabled currently,
but it may be better to still try to spill to a VGPR
and use this on the fallback memory path.

The cache also needs to be flushed before wave termination
if a scalar store is used.

llvm-svn: 286766
2016-11-13 18:20:54 +00:00
Igor Breger e2399f9e0e revert commit r286761, some builds failed on Win platforms
llvm-svn: 286765
2016-11-13 15:48:11 +00:00
Ayman Musa c09b3769ae [X86][AVX512] Removing llvm x86 intrinsics for _mm_mask_move_{ss|sd} intrinsics.
Differential Revision: https://reviews.llvm.org/D26128

llvm-svn: 286761
2016-11-13 14:51:25 +00:00
Ayman Musa 46af8f9c6f [X86][AVX512] Add patterns for all variants of VMOVSS/VMOVSD instructions.
Differential Revision: https://reviews.llvm.org/D26022

llvm-svn: 286758
2016-11-13 14:29:32 +00:00
Craig Topper b4173a5a70 [InstCombine][AVX-512] Teach InstCombineCalls to handle the new unmasked AVX-512 variable shift intrinsics.
llvm-svn: 286755
2016-11-13 07:26:19 +00:00