Commit Graph

98364 Commits

Author SHA1 Message Date
Mehdi Amini 19ef4fad91 Use lazy-loading of Metadata in MetadataLoader when importing is enabled (NFC)
Summary:
This is a relatively simple scheme: we use the index emitted in the
bitcode to avoid loading all the global metadata. Instead we load
the index of their positions in the bitcode so that we can load each
of them individually. Materializing the global metadata block in this
condition only triggers loading the named metadata, and the ones
referenced from there (transitively). When materializing a function,
metadata from the global block are loaded lazily as they are
referenced.
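A minimal C++ sketch of the idea, assuming a simplified model. `MDNodeSketch` and `LazyMetadataLoaderSketch` are hypothetical names: a `std::vector` stands in for the bitcode stream and a `std::map` stands in for the emitted index. This is not the MetadataLoader implementation. It only shows records being decoded on first reference and cached afterwards:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a decoded metadata node: a payload plus the IDs
// of the operands it references.
struct MDNodeSketch {
  std::string Payload;
  std::vector<unsigned> OperandIDs;
};

// Hypothetical lazy loader: only the index (ID -> record position) is known up
// front; each record is decoded the first time it is requested, then cached.
class LazyMetadataLoaderSketch {
public:
  LazyMetadataLoaderSketch(std::vector<MDNodeSketch> Recs,
                           std::map<unsigned, std::size_t> Idx)
      : Records(std::move(Recs)), Index(std::move(Idx)) {}

  // Decodes a single node on demand. Operands are NOT loaded recursively,
  // matching limitation 2) in the commit message.
  const MDNodeSketch &get(unsigned ID) {
    auto It = Loaded.find(ID);
    if (It != Loaded.end())
      return *It->second;
    std::size_t Pos = Index.at(ID); // position of the record in the "bitcode"
    ++NumDecoded;
    auto &Slot = Loaded[ID];
    Slot = std::make_unique<MDNodeSketch>(Records[Pos]);
    return *Slot;
  }

  std::size_t numDecoded() const { return NumDecoded; }

private:
  std::vector<MDNodeSketch> Records;     // stands in for the bitcode stream
  std::map<unsigned, std::size_t> Index; // ID -> record position
  std::map<unsigned, std::unique_ptr<MDNodeSketch>> Loaded;
  std::size_t NumDecoded = 0;
};
```

Caching decoded nodes in a map keyed by ID is simply the most direct choice for a sketch; a real loader would avoid per-node allocations.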

Two main current limitations are:

1) Global values other than functions are not materialized on demand,
so we need to eagerly load METADATA_GLOBAL_DECL_ATTACHMENT records
(and their transitive dependencies).
2) When we load a single metadata, we don't recurse on the operands;
instead we use a placeholder or a temporary metadata. Unfortunately,
temporary nodes are very expensive. This is why lazy loading is not
always enabled, only for importing.

These two limitations can be lifted in a subsequent improvement if
needed.

With this change, the total link time of opt with ThinLTO and Debug
Info enabled goes down from 282s to 224s (~20%).

Reviewers: pcc, tejohnson, dexonsmith

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D28113

llvm-svn: 291027
2017-01-04 22:54:33 +00:00
Mehdi Amini 867aad1359 Change BitstreamCursor::skipRecord to return the record code (NFC)
llvm-svn: 291026
2017-01-04 22:54:14 +00:00
Matt Arsenault 6796d7ea8b AMDGPU: Remove unnecessary intermediate vector
llvm-svn: 291025
2017-01-04 22:54:10 +00:00
Matt Arsenault 3bdd75d01e InstCombine: Fold cos(-x) -> cos(x)
Also cos(fabs(x)) -> cos(x)
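Both folds rely on cosine being an even function. Here is a small C++ spot check of that identity. It is not LLVM code, and `cosFoldsHold` is a hypothetical helper. It assumes a libm whose `cos` is exactly symmetric in its argument, as mainstream implementations are:

```cpp
#include <cassert>
#include <cmath>

// cos(-x) == cos(x) and cos(|x|) == cos(x) for every finite x, which is the
// identity behind  cos(-x) -> cos(x)  and  cos(fabs(x)) -> cos(x).
// This checks one input at a time; it is a spot check, not a proof.
bool cosFoldsHold(double X) {
  double Ref = std::cos(X);
  return std::cos(-X) == Ref && std::cos(std::fabs(X)) == Ref;
}
```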

llvm-svn: 291022
2017-01-04 22:49:03 +00:00
David Blaikie 7ad9dc11db Reapply "Make BitCodeAbbrev ownership explicit using shared_ptr rather than IntrusiveRefCntPtr"
If this is a problem for anyone (shared_ptr is two pointers in size,
whereas IntrusiveRefCntPtr is 1 - and the ref count control block that
make_shared adds is probably larger than the one int in RefCountedBase)
I'd prefer to address this by adding a lower-overhead version of
shared_ptr (possibly refactoring IntrusiveRefCntPtr into such a thing)
to avoid the intrusiveness - this allows memory ownership to remain
orthogonal to types and at least to me, seems to make code easier to
understand (since no implicit ownership acquisition can happen).
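The following sketch makes the size comparison concrete by placing a hypothetical minimal intrusive pointer next to `std::shared_ptr`. `IntrusivePtrSketch`, `RefCountedSketch` and `AbbrevSketch` are illustrative names, not `llvm::IntrusiveRefCntPtr` or `BitCodeAbbrev`. Also, the two-pointer size of `std::shared_ptr` is what common standard libraries do, not something the C++ standard mandates:

```cpp
#include <cassert>
#include <memory>

// The reference count lives inside the object ("intrusive"), so the smart
// pointer itself only needs to store one raw pointer.
struct RefCountedSketch {
  unsigned RefCount = 0;
};

template <typename T> class IntrusivePtrSketch {
  T *Obj = nullptr;

public:
  explicit IntrusivePtrSketch(T *P) : Obj(P) {
    if (Obj)
      ++Obj->RefCount;
  }
  IntrusivePtrSketch(const IntrusivePtrSketch &O) : Obj(O.Obj) {
    if (Obj)
      ++Obj->RefCount;
  }
  IntrusivePtrSketch &operator=(const IntrusivePtrSketch &) = delete; // kept minimal
  ~IntrusivePtrSketch() {
    if (Obj && --Obj->RefCount == 0)
      delete Obj;
  }
  T *get() const { return Obj; }
};

// The owned type must inherit the count: this is the "intrusiveness" that
// ties ownership to the type, which std::shared_ptr avoids.
struct AbbrevSketch : RefCountedSketch {
  unsigned NumOps = 0;
};
```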

This recommits r291006, reverted in r291007.

llvm-svn: 291016
2017-01-04 22:36:33 +00:00
Tim Shen 5480eb8445 [Legalizer] Fix fp-to-uint to fp-to-sint promotion assertion.
Summary:
When promoting fp-to-uint16 to fp-to-sint32, the result is actually zero
extended. For example, given double 65534.0, without legalization:

  fp-to-uint16: 65534.0 -> 0xfffe

With the legalization:

  fp-to-sint32: 65534.0 -> 0x0000fffe

Without this patch, legalization wrongly emits a sign-extend assertion,
which is consumed by a later icmp instruction and causes a miscompile.

Note that the floating point value must be in [0, 65535), otherwise the
behavior is undefined.

This patch reverts the r279223 behavior and adds more tests and
documentation.

In PR29041's context, James Molloy mentioned that:

  We don't need to mask because conversion from float->uint8_t is
  undefined if the integer part of the float value is not representable in
  uint8_t. Therefore we can assume this doesn't happen!

which is totally true and good, because fptoui is documented clearly to
have undefined behavior when overflow/underflow happens. We should take
advantage of this behavior so that we can save unnecessary mask
instructions.
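The zero-extension argument can be checked with ordinary C++ conversions. This sketch is not LLVM code: `viaUint16`, `viaSint32` and `signExtendedLow16` are hypothetical helpers, and inputs are assumed to satisfy the range precondition stated above:

```cpp
#include <cassert>
#include <cstdint>

// Precondition (from the commit message): 0 <= D < 65535. Outside that range
// the fp-to-uint16 conversion is undefined behavior.

// fp-to-uint16, then widen: the upper 16 bits are zero.
std::uint32_t viaUint16(double D) { return static_cast<std::uint16_t>(D); }

// fp-to-sint32, the promoted form: for in-range inputs it yields the same bits.
std::uint32_t viaSint32(double D) {
  return static_cast<std::uint32_t>(static_cast<std::int32_t>(D));
}

// What a (wrong) sign-extension assumption would claim the value is:
// 0xfffe reinterpreted as int16_t and sign-extended becomes -2.
std::int32_t signExtendedLow16(std::uint32_t V) {
  return static_cast<std::int16_t>(static_cast<std::uint16_t>(V));
}
```

With 65534.0 both conversions give `0x0000fffe`, while the sign-extended view gives -2, so a later `icmp` that trusts a sign-extend assertion can fold to the wrong answer.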

Reviewers: jmolloy, nadav, echristo, kbarton

Subscribers: mehdi_amini, nemanjai, llvm-commits

Differential Revision: https://reviews.llvm.org/D28284

llvm-svn: 291015
2017-01-04 22:11:42 +00:00
Evgeny Stupachenko c88697dc16 The patch fixes (base, index, offset) match.
Summary:
Instead of matching:
  (a + i) + 1 -> (a + i, undef, 1)
Now it matches:
  (a + i) + 1 -> (a, i, 1)
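Here is a toy C++ sketch of the improved decomposition. It assumes a hypothetical `Expr` tree with only symbols, constants and additions, and `sym`, `cst`, `add`, `BaseIndexOffset` and `match` are illustrative names. The real matcher works on DAG nodes and handles many more cases:

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical expression tree: a symbol, a constant, or an addition.
struct Expr {
  enum Kind { Sym, Const, Add } K = Sym;
  std::string Name;           // used when K == Sym
  long Value = 0;             // used when K == Const
  std::shared_ptr<Expr> L, R; // used when K == Add
};

std::shared_ptr<Expr> sym(std::string N) {
  auto E = std::make_shared<Expr>();
  E->K = Expr::Sym;
  E->Name = std::move(N);
  return E;
}
std::shared_ptr<Expr> cst(long V) {
  auto E = std::make_shared<Expr>();
  E->K = Expr::Const;
  E->Value = V;
  return E;
}
std::shared_ptr<Expr> add(std::shared_ptr<Expr> A, std::shared_ptr<Expr> B) {
  auto E = std::make_shared<Expr>();
  E->K = Expr::Add;
  E->L = std::move(A);
  E->R = std::move(B);
  return E;
}

struct BaseIndexOffset {
  std::string Base, Index; // an empty Index plays the role of "undef"
  long Offset = 0;
};

// Peel a constant offset off the outer add, then split an inner
// (Sym + Sym) into base and index instead of keeping it as an opaque base.
BaseIndexOffset match(const Expr &E) {
  BaseIndexOffset M;
  const Expr *Cur = &E;
  if (Cur->K == Expr::Add && Cur->R->K == Expr::Const) {
    M.Offset = Cur->R->Value;
    Cur = Cur->L.get();
  }
  if (Cur->K == Expr::Add && Cur->L->K == Expr::Sym && Cur->R->K == Expr::Sym) {
    M.Base = Cur->L->Name;
    M.Index = Cur->R->Name;
  } else if (Cur->K == Expr::Sym) {
    M.Base = Cur->Name;
  }
  return M;
}
```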

Reviewers: rengolin

Differential Revision: http://reviews.llvm.org/D26367

From: Evgeny Stupachenko <evstupac@gmail.com>
llvm-svn: 291012
2017-01-04 21:43:39 +00:00
Chad Rosier 63687e40bc [AArch64] Update the feature set for Qualcomm's Falkor CPU.
llvm-svn: 291010
2017-01-04 21:26:23 +00:00
Nirav Dave 0f9d111f97 [AArch64] Fix over-eager early-exit in load-store combiner
Fix early-exit analysis for memory operation pairing when operations are
not emitted in ascending order.

Reviewers: mcrosier, t.p.northover

Subscribers: aemerson, rengolin, llvm-commits

Differential Revision: https://reviews.llvm.org/D28251

llvm-svn: 291008
2017-01-04 21:21:46 +00:00
David Blaikie 6e2207a134 Revert "Make BitCodeAbbrev ownership explicit using shared_ptr rather than IntrusiveRefCntPtr"
Breaks Clang's use of bitcode. Reverting until I have a fix to go with
it there.

This reverts commit r291006.

llvm-svn: 291007
2017-01-04 21:19:28 +00:00
David Blaikie daff78cd87 Make BitCodeAbbrev ownership explicit using shared_ptr rather than IntrusiveRefCntPtr
If this is a problem for anyone (shared_ptr is two pointers in size,
whereas IntrusiveRefCntPtr is 1 - and the ref count control block that
make_shared adds is probably larger than the one int in RefCountedBase)
I'd prefer to address this by adding a lower-overhead version of
shared_ptr (possibly refactoring IntrusiveRefCntPtr into such a thing)
to avoid the intrusiveness - this allows memory ownership to remain
orthogonal to types and at least to me, seems to make code easier to
understand (since no implicit ownership acquisition can happen).

llvm-svn: 291006
2017-01-04 21:13:35 +00:00
Hal Finkel b2f951d87a [PowerPC] Fix logic dealing with nop after calls (and tail-call eligibility)
This change aims to unify and correct our logic for when we need to allow for
the possibility of the linker adding a TOC restoration instruction after a
call. This comes up in two contexts:

 1. When determining tail-call eligibility. If we make a tail call (i.e.
    directly branch to a function) then there is no place for the linker to add
    a TOC restoration.
 2. When determining when we need to add a nop instruction after a call.
    Likewise, if there is a possibility that the linker might need to add a
    TOC restoration after a call, then we need to put a nop after the call
    (the bl instruction).

First problem: We were using similar, but different, logic to decide (1) and
(2). This is just wrong. Both the resideInSameModule function (used when
determining tail-call eligibility) and the isLocalCall function (used when
deciding if the post-call nop is needed) were supposed to be determining the
same underlying fact (i.e. might a TOC restoration be needed after the call).
The same logic should be used in both places.

Second problem: The logic in both places was wrong. We only know that two
functions will share the same TOC when both functions come from the same
section of the same object. Otherwise the linker might cause the functions to
use different TOC base addresses (unless the multi-TOC linker option is
disabled, in which case only shared-library boundaries are relevant). There are
a number of factors that can cause functions to be placed in different sections
or come from different objects (-ffunction-sections, explicitly-specified
section names, COMDAT, weak linkage, etc.). All of these need to be checked.
The existing logic only checked properties of the callee, but the properties of
the caller must also be checked (for example, calling from a function in a
COMDAT section means calling between sections).

There was a conceptual error in the resideInSameModule function in that it
allowed tail calls to functions with weak linkage and protected/hidden
visibility. While protected/hidden visibility does prevent the function
implementation from being replaced at runtime (via interposition), it does not
prevent the linker from using an alternate implementation at link time (i.e.
using some strong definition to replace the provided weak one during linking).
If this happens, then we're still potentially looking at a required TOC
restoration upon return.

Otherwise, in general, the post-call nop is needed wherever ELF interposition
needs to be supported. We don't currently support ELF interposition at the IR
level (see http://lists.llvm.org/pipermail/llvm-dev/2016-November/107625.html
for more information), and I don't think we should try to make it appear to
work in the backend in spite of that fact. Unfortunately, because of the way
that the ABI works, we need to generate code as if we supported interposition
whenever the linker might insert stubs for the purpose of supporting it.
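The rule described above can be modelled as one predicate that feeds both decisions. This is a hypothetical, simplified model: `FuncInfo`, `mustShareTOCBase`, `needsNopAfterCall` and `mayTailCall` are illustrative names, not LLVM APIs. The multi-TOC-disabled case and the real linkage/visibility queries are not modelled:

```cpp
#include <cassert>
#include <initializer_list>
#include <string>

// Facts the commit says must be checked for BOTH caller and callee.
struct FuncInfo {
  std::string Object;  // object file the definition ends up in
  std::string Section; // e.g. ".text", or ".text.foo" with -ffunction-sections
  bool IsDeclaration;  // defined in another module
  bool IsComdat;       // lives in its own COMDAT section
  bool IsWeak;         // a strong definition may replace it at link time
  bool IsInterposable; // may be interposed (ELF default visibility)
};

// Only when both functions come from the same section of the same object,
// and neither can be replaced, do we know they share a TOC base.
bool mustShareTOCBase(const FuncInfo &Caller, const FuncInfo &Callee) {
  for (const FuncInfo *F : {&Caller, &Callee})
    if (F->IsDeclaration || F->IsComdat || F->IsWeak || F->IsInterposable)
      return false;
  return Caller.Object == Callee.Object && Caller.Section == Callee.Section;
}

// Both decisions are derived from the same answer, as the commit requires.
bool needsNopAfterCall(const FuncInfo &Caller, const FuncInfo &Callee) {
  return !mustShareTOCBase(Caller, Callee);
}
bool mayTailCall(const FuncInfo &Caller, const FuncInfo &Callee) {
  return mustShareTOCBase(Caller, Callee);
}
```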

Differential Revision: https://reviews.llvm.org/D27231

llvm-svn: 291003
2017-01-04 21:05:13 +00:00
Daniel Berlin 6cc5e44068 NewGVN: Track the maximum number of iterations GVN takes on any function, so we can pinpoint performance issues.
llvm-svn: 291002
2017-01-04 21:01:02 +00:00
Davide Italiano 6309895770 [lib/LTO] Simplify logic removing set but unused variable. NFCI.
Reported by David Binderman and ack'ed by Teresa on IRC.
PR: 31527

llvm-svn: 291000
2017-01-04 20:37:57 +00:00
Peter Collingbourne efdff71b05 YAML: Remove Input::MapHNode::isValidKey(), use llvm::is_contained() instead. NFC.
llvm-svn: 290999
2017-01-04 20:10:43 +00:00
Eric Christopher 568c113ac0 Remove dead and unused variable NumSentinelElements.
Fixes PR31529.

llvm-svn: 290998
2017-01-04 20:05:18 +00:00
Eric Christopher 0192e97911 Remove dead variable Len.
Fixes PR31528

llvm-svn: 290995
2017-01-04 19:47:10 +00:00
Jan Vesely d48445d513 AMDGPU/SI: Implement sendmsghalt intrinsic
v2: expose using amdgcn prefix

Differential Revision: https://reviews.llvm.org/D23511

llvm-svn: 290977
2017-01-04 18:06:55 +00:00
Robert Lougher 5bf0416f45 Reapply "[SimplifyCFG] In sinkLastInstruction correctly set debugloc of common inst"
This reapplies r289828 (reverted in r289833 as it broke the address sanitizer). The
debugloc is now only set when the instruction is not a call, as doing so for a call causes the
verifier to assert (the inliner requires an inlinable callsite to have a debug loc
if the caller and callee have debug info).

Original commit message:

Simplify CFG will try to sink the last instruction in a series of basic blocks,
creating a "common" instruction in the successor block (sinkLastInstruction).
When it does this, the debug location of the single instruction should be the
merged debug locations of the commoned instructions.

Original review: https://reviews.llvm.org/D27590

llvm-svn: 290973
2017-01-04 17:40:32 +00:00
Simon Pilgrim bb895f3e9c [CostModel][X86] Updated vXi8 and vXi16 Reverse/Alternate shuffle costs
Actual codegen is much better than the extract+insert patterns that were assumed.

llvm-svn: 290962
2017-01-04 14:01:33 +00:00
Nemanja Ivanovic c08b90d08f [PowerPC] Add identification for POWER8NVL
This CPU type was not previously recognized by LLVM which led to emitting
poor (and sometimes incorrect) code in some JIT workloads on such a machine.

llvm-svn: 290961
2017-01-04 13:58:09 +00:00
Simon Pilgrim 939b8cd708 [X86] Merged Reverse/Alternate shuffle cost tables. NFCI.
As discussed on D27811, merged the shuffle cost LUTs and use the shuffle kind to perform the lookup instead of the ISD opcode.

llvm-svn: 290956
2017-01-04 12:08:41 +00:00
Florian Hahn 5815f6c53c [framelowering] Skip dbg values when getting next/previous instruction.
Summary:
In mergeSPUpdates, debug values need to be ignored when getting the
previous element, otherwise debug data could have an impact on codegen.

In eliminateCallFramePseudoInstr, debug values after the erased element
could have an impact on codegen and should be skipped.
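Here is a minimal sketch of the skipping logic over a hypothetical instruction list. `InstrSketch`, `prevNonDebug` and `nextNonDebug` are illustrative names, not the MachineBasicBlock API:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical instruction record: only whether it is a debug value matters.
struct InstrSketch {
  int Opcode;
  bool IsDebugValue;
};

// Index of the nearest non-debug instruction before Pos, or -1 if none.
// Ignoring debug values keeps the answer identical with and without -g.
long prevNonDebug(const std::vector<InstrSketch> &Block, std::size_t Pos) {
  for (std::size_t I = Pos; I-- > 0;)
    if (!Block[I].IsDebugValue)
      return static_cast<long>(I);
  return -1;
}

// The same idea going forward, e.g. after erasing a pseudo instruction.
long nextNonDebug(const std::vector<InstrSketch> &Block, std::size_t Pos) {
  for (std::size_t I = Pos + 1; I < Block.size(); ++I)
    if (!Block[I].IsDebugValue)
      return static_cast<long>(I);
  return -1;
}
```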

Closes PR31319 (https://llvm.org/bugs/show_bug.cgi?id=31319)

Reviewers: aprantl, MatzeB, mkuper

Subscribers: gbedwell, llvm-commits

Differential Revision: https://reviews.llvm.org/D27688

llvm-svn: 290955
2017-01-04 12:08:35 +00:00
Bjorn Pettersson 3c6ce733f5 Fix for InlineSpiller accessing not updated dom tree base information.
Summary:
The InlineSpiller was accessing the DominatorTreeBase directly
through the public data member DT in the MachineDominatorTree.
This is not a good idea as the "cached" information in
SplitCriticalEdges is not applied before the access.
The DominatorTreeBase must be accessed through the member
function getBase() in MachineDominatorTree.

The fault was introduced in r266162.

I think the public data member DT in the MachineDominatorTree
should have been made private in the original code (r215576)
that introduced the concept of lazily updating the
MachineDominatorTree information from
MachineBasicBlock::SplitCriticalEdge().
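The following hypothetical class illustrates the bug class. When a wrapper defers updates, reading its underlying data directly (like the public `DT` member) can observe stale state, while an accessor that applies pending updates first (like `getBase()`) cannot. `LazyTreeSketch`, `recordSplit` and `rawData` are illustrative names only:

```cpp
#include <cassert>
#include <utility>
#include <vector>

class LazyTreeSketch {
public:
  // Record an update (think: a critical edge split) without applying it yet.
  void recordSplit(int From, int To) { Pending.emplace_back(From, To); }

  // Accessor that applies pending updates before returning the data.
  const std::vector<std::pair<int, int>> &getBase() {
    for (const auto &E : Pending)
      Applied.push_back(E);
    Pending.clear();
    return Applied;
  }

  // Direct access that skips pending updates and may therefore be stale.
  const std::vector<std::pair<int, int>> &rawData() const { return Applied; }

private:
  std::vector<std::pair<int, int>> Applied;
  std::vector<std::pair<int, int>> Pending;
};
```

Keeping the data private, as the commit suggests for `DT`, would make the stale path impossible to call.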

Patch by Karl-Johan Karlsson <karl-johan.karlsson@ericsson.com>

Reviewers: wmi, qcolombet

Subscribers: llvm-commits, bjope, uabelho

Differential Revision: https://reviews.llvm.org/D27983

llvm-svn: 290950
2017-01-04 09:41:56 +00:00
Nitesh Jain b0bc573ca8 [LLC][MIPS] Fix crash after enabling LLVM_ENABLE_EXPENSIVE_CHECKS
Reviewers: sdardis, vkalintiris

Subscribers: jaydeep, slthakur, RKSimon, llvm-commits
Differential Revision: https://reviews.llvm.org/D27841

llvm-svn: 290949
2017-01-04 09:34:37 +00:00
Ayman Musa 02f9533823 [X86][AVX512] Passing the appropriate memory operand class to INT_{U}COMIS{S|D} instructions
Replacing the memory operand in the intrinsic versions of the comis/ucomis instructions from f128mem to ssmem/sdmem accordingly.

Differential Revision: https://reviews.llvm.org/D28138

llvm-svn: 290948
2017-01-04 08:21:54 +00:00
Simon Pilgrim c76ea4b638 [X86] Attempt to pre-truncate arithmetic operations if useful
In some cases it's more efficient to combine TRUNC( BINOP( X, Y ) ) --> BINOP( TRUNC( X ), TRUNC( Y ) ) if the binop is legal for the truncated types.

This is true for vector integer multiplication (especially vXi64), as well as ADD/AND/XOR/OR in cases where we only need to truncate one of the inputs at runtime (e.g. a duplicated input or a one-use constant we can fold).

Further work could be done here - scalar cases (especially i64) could often benefit (if we avoid partial registers etc.), other opcodes, and better analysis of when truncating the inputs reduces costs.

I have considered implementing this for all targets within the DAGCombiner but wasn't sure we could devise a suitable cost model system that would give us the range we need.
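The combine is legal because, in two's-complement arithmetic, the low bits of ADD/MUL/AND/OR/XOR results depend only on the low bits of the inputs. The following scalar C++ check of that property uses hypothetical helper names and a 64-to-32-bit truncation; whether the vector rewrite is *profitable* is the separate cost question discussed above:

```cpp
#include <cassert>
#include <cstdint>

// TRUNC(MUL(X, Y)) vs MUL(TRUNC(X), TRUNC(Y)), using 64 -> 32 bits.
std::uint32_t mulThenTrunc(std::uint64_t X, std::uint64_t Y) {
  return static_cast<std::uint32_t>(X * Y);
}
std::uint32_t truncThenMul(std::uint64_t X, std::uint64_t Y) {
  return static_cast<std::uint32_t>(X) * static_cast<std::uint32_t>(Y);
}

// The same holds for ADD.
std::uint32_t addThenTrunc(std::uint64_t X, std::uint64_t Y) {
  return static_cast<std::uint32_t>(X + Y);
}
std::uint32_t truncThenAdd(std::uint64_t X, std::uint64_t Y) {
  return static_cast<std::uint32_t>(X) + static_cast<std::uint32_t>(Y);
}
```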

Differential Revision: https://reviews.llvm.org/D28219

llvm-svn: 290947
2017-01-04 08:05:42 +00:00
Craig Topper d0aa53b9ae [AVX-512] Add support for detecting 512-bit shuffles that contain a 128-bit subvector insertion from the lowest subvector of one of the sources.
These are best handled with a vinsert32x4 or vinsert64x2 instruction.

llvm-svn: 290946
2017-01-04 07:32:03 +00:00
Craig Topper 83115a809f [AVX-512] Simplify code for creating 512-bit SHUF128 operations.
We don't need two loops and we can safely assume and hardcode the size of the widened mask.

llvm-svn: 290942
2017-01-04 07:31:51 +00:00
Peter Collingbourne 87dd2ab000 Support: Add YAML I/O support for custom mappings.
This will be used to YAMLify parts of the module summary.

Differential Revision: https://reviews.llvm.org/D28014

llvm-svn: 290935
2017-01-04 03:51:36 +00:00
David Majnemer cb892e9066 [InstCombine] Move casts around shift operations
It is possible to perform a left shift before zero extending if the
shift would only shift out zeros.
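The condition can be checked exhaustively for a small case. This C++ sketch assumes an 8-bit source zero-extended to 32 bits, and the helper names are hypothetical. The narrow shift is safe exactly when `X` has at least `C` leading zero bits:

```cpp
#include <cassert>
#include <cstdint>

// Number of leading zero bits in an 8-bit value (8 for zero).
unsigned countLeadingZeros8(std::uint8_t X) {
  unsigned N = 0;
  for (int Bit = 7; Bit >= 0 && !((X >> Bit) & 1); --Bit)
    ++N;
  return N;
}

// shl(zext(X), C) == zext(shl(X, C)) when the narrow shift only drops zeros.
bool shiftBeforeZextIsSafe(std::uint8_t X, unsigned C) {
  return C < 8 && countLeadingZeros8(X) >= C;
}

std::uint32_t zextThenShl(std::uint8_t X, unsigned C) {
  return static_cast<std::uint32_t>(X) << C;
}
std::uint32_t shlThenZext(std::uint8_t X, unsigned C) {
  return static_cast<std::uint8_t>(X << C); // the shift is done in 8 bits
}
```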

llvm-svn: 290928
2017-01-04 02:21:34 +00:00
David Majnemer 022d2a563b [InstCombine] Combine adds across a zext
We can perform the following:
(add (zext (add nuw X, C1)), C2) -> (zext (add nuw X, C1+C2))

This is only possible if C2 is negative and C2 is greater than or equal to negative C1.
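A brute-force C++ check of this rewrite, assuming an 8-bit `X` and `C1` with a 32-bit outer add; `foldPreconditions`, `original` and `folded` are hypothetical helper names. Since `-C1 <= C2 < 0`, `C1 + C2` lies in `[0, C1)`, so the narrow add still cannot wrap:

```cpp
#include <cassert>
#include <cstdint>

// 'nuw' on the inner add: X + C1 must not wrap in 8 bits.
bool foldPreconditions(std::uint8_t X, std::uint8_t C1, std::int32_t C2) {
  bool NoUnsignedWrap = unsigned(X) + unsigned(C1) <= 0xFFu;
  return NoUnsignedWrap && C2 < 0 && C2 >= -std::int32_t(C1);
}

// (add (zext (add nuw X, C1)), C2)
std::int32_t original(std::uint8_t X, std::uint8_t C1, std::int32_t C2) {
  std::uint8_t Narrow = static_cast<std::uint8_t>(X + C1);
  return std::int32_t(Narrow) + C2;
}

// (zext (add nuw X, C1 + C2))
std::int32_t folded(std::uint8_t X, std::uint8_t C1, std::int32_t C2) {
  std::uint8_t NewC = static_cast<std::uint8_t>(C1 + C2); // in [0, C1)
  return std::int32_t(static_cast<std::uint8_t>(X + NewC));
}
```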

llvm-svn: 290927
2017-01-04 02:21:31 +00:00
Eugene Zelenko b2ca1b3f37 [Hexagon, TableGen] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 290925
2017-01-04 02:02:05 +00:00
Teresa Johnson 5a8dba5bda [ThinLTO] Import type as decl only when non-null Identifier
As per post-commit review for r289993 (D27775), we can only safely
import a type as a decl if it has an Identifier, as the Name alone
is not enough to be unique across modules.

llvm-svn: 290915
2017-01-03 23:19:29 +00:00
Matt Arsenault 56ff4839ae InstCombine: Fold fabs on select of constants
llvm-svn: 290913
2017-01-03 22:40:34 +00:00
Sanjay Patel f0d1e77373 [InstCombine] use 'match' to reduce code bloat; NFCI
I wrote this patch before seeing the comment in:
https://reviews.llvm.org/D27114
...that suggests we should actually be canonicalizing the other way.

So just in case we decide this is the right way, we might as well
have a cleaner implementation.

llvm-svn: 290912
2017-01-03 22:25:31 +00:00
Ahmed Bougacha 8a41319d8d [CodeGen] Further simplify returned call operand logic. NFC.
As Pete points out in r290905, CallSite lets us avoid duplicating this!

llvm-svn: 290909
2017-01-03 21:42:43 +00:00
Lang Hames b198e5585e [ExecutionEngine] Fix compile errors in OProfileJITEventListener.
Allows LLVM to build with LLVM_USE_OPROFILE=True.

Patch by Mark Dewing. Thanks Mark!

llvm-svn: 290908
2017-01-03 21:39:43 +00:00
Ahmed Bougacha 6aff744e7c [CodeGen] Simplify logic that looks for returned call operands. NFC-ish.
Use getReturnedArgOperand() instead of rolling our own.  Note that it's
equivalent because there can only be one 'returned' operand.

The existing code was also incorrect: there already was awkward logic to
ignore callee/EH blocks, but operands can now also be operand bundles,
in which case we'll look for non-existent parameter attributes.

Unfortunately, this isn't observable in-tree, as it only crashes when
exercising the regular call lowering logic with operand bundles.
Still, this is a nice small cleanup anyway.

llvm-svn: 290905
2017-01-03 20:33:22 +00:00
Kostya Serebryany 4986e819dc [libFuzzer] disable -print_pcs by default (was enabled by mistake)
llvm-svn: 290899
2017-01-03 18:51:28 +00:00
Michal Gorny 21c12044d2 [ADT] APFloatBase: Prevent collapsing semPPCDoubleDouble and semBogus
Provide distinct contents for semBogus and semPPCDoubleDouble in order
to prevent compilers from collapsing them to a single memory address,
since we heavily rely on every semantic having a distinct address.

This happens if an unsafe optimization that collapses identical values is
enabled. As a result, APFloats of semBogus are indistinguishable from
semPPCDoubleDouble, and whenever the move constructor is used, the old
value begins being incorrectly recognized as a semPPCDoubleDouble.

Since the values in semPPCDoubleDouble are not used anywhere,
we can easily solve this issue by altering the value of one of the
fields and therefore ensuring that the collapse cannot occur.

Differential Revision: https://reviews.llvm.org/D28112

llvm-svn: 290896
2017-01-03 16:33:50 +00:00
Craig Topper 48d232d3e7 [X86] Move 128-bit shuffle mask widening check into lowerV2X128VectorShuffle to reduce code duplication. Use the now available widened mask to simplify some code inside lowerV2X128VectorShuffle.
llvm-svn: 290872
2017-01-03 07:36:41 +00:00
Craig Topper 785e58fdc9 [AVX-512] Simplify the code added in r290870 to recognize 256-bit subvector inserts and avoid calling isShuffleEquivalent on a widened mask.
llvm-svn: 290871
2017-01-03 07:36:39 +00:00
Craig Topper 9496e3f916 [AVX-512] Teach shuffle lowering to use vinsert instructions for shuffles corresponding to 256-bit subvector inserts.
llvm-svn: 290870
2017-01-03 07:00:40 +00:00
Craig Topper fa875a1d3d [AVX-512] Teach EVEX to VEX conversion pass to handle VINSERT and VEXTRACT instructions.
llvm-svn: 290869
2017-01-03 05:46:18 +00:00
Craig Topper be9ef55152 [X86] Remove trailing whitespace and an unnecessary line wrap. NFC
llvm-svn: 290867
2017-01-03 05:46:06 +00:00
Craig Topper 06bae884bd [X86] Fix header comment. NFC
llvm-svn: 290866
2017-01-03 05:46:05 +00:00
Craig Topper c849172105 [AVX-512] Add support for pushing bitcasts through INSERT_SUBVEC in order to select a masked operation.
llvm-svn: 290865
2017-01-03 05:46:02 +00:00
Craig Topper 0cda8bbf74 [AVX-512] Remove vinsert intrinsics and autoupgrade to native shufflevectors. There are some codegen problems here that I'll try to fix in future commits.
llvm-svn: 290864
2017-01-03 05:45:57 +00:00
Craig Topper 4d47c6ae57 [AVX-512] Remove vextract intrinsics and autoupgrade to native shufflevectors. This unfortunately generates some really terrible code without VLX support due to v2i1 and v4i1 not being legal.
Hopefully we can improve that in future patches.

llvm-svn: 290863
2017-01-03 05:45:46 +00:00
Matt Arsenault b264c94963 InstCombine: Add fma with constant transforms
DAGCombine already does these.

llvm-svn: 290860
2017-01-03 04:32:35 +00:00
Matt Arsenault 1cc294c85d InstCombine: Add fma + fabs/fneg transforms
fma (fneg x), (fneg y), z -> fma x, y, z
fma (fabs x), (fabs x), z -> fma x, x, z
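`fma` computes `x*y + z` with a single rounding. Negating both factors, or replacing `x` by `|x|` in `x*x`, leaves the exact product unchanged, so these folds preserve the result bit for bit (NaN payloads aside). The following C++ spot check uses `std::fma`; the wrapper names are hypothetical:

```cpp
#include <cassert>
#include <cmath>

double fmaNegNeg(double X, double Y, double Z) { return std::fma(-X, -Y, Z); }
double fmaPlain(double X, double Y, double Z) { return std::fma(X, Y, Z); }

double fmaAbsAbs(double X, double Z) {
  return std::fma(std::fabs(X), std::fabs(X), Z);
}
double fmaSquare(double X, double Z) { return std::fma(X, X, Z); }
```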

llvm-svn: 290859
2017-01-03 04:32:31 +00:00
Dean Michael Berris f7e7b938ea [XRay] Merge instrumentation point table emission code into AsmPrinter.
Summary:
No need to have this per-architecture.  While there, unify 32-bit ARM's
behaviour with what changed elsewhere and start function names lowercase
as per the coding standards.  Individual entry emission code goes to the
entry's own class.

Fully tested on amd64, cross-builds on both ARMs and PowerPC.

Reviewers: dberris

Subscribers: aemerson, llvm-commits

Differential Revision: https://reviews.llvm.org/D28209

llvm-svn: 290858
2017-01-03 04:30:21 +00:00
Sanjay Patel 1c9867d009 [EarlyCSE] less else, more auto; NFC
llvm-svn: 290848
2017-01-03 00:16:24 +00:00
Sanjay Patel b38ad88e9f [InstCombine] use combineMetadataForCSE instead of copying it; NFCI
llvm-svn: 290844
2017-01-02 23:25:28 +00:00
Xin Tong 2940231ff0 Make sure total loop body weight is preserved in loop peeling
Summary:
Regardless of how the loop body weight is distributed, we should preserve
total loop body weight, i.e. we should have the same weight reaching the body of the loop
or its duplicates in the peeled and unpeeled cases.

Reviewers: mkuper, davidxl, anemet

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D28179

llvm-svn: 290833
2017-01-02 20:27:23 +00:00
Daniel Berlin de43ef9601 NewGVN: Clean up after removing possibility of null expressions.
llvm-svn: 290828
2017-01-02 19:49:17 +00:00
Sanjay Patel 65d533ca42 fix typo; NFC
llvm-svn: 290827
2017-01-02 19:05:11 +00:00
Sanjay Patel 4382997a13 [ValueTracking] remove stale comments; NFC
The checks were improved with:
https://reviews.llvm.org/rL290194

llvm-svn: 290826
2017-01-02 19:04:07 +00:00
Davide Italiano 67ada75d84 [NewGVN] Fold single-use variable inside the assertion.
It placates some bots which complain because they compile the
assertion out and think the variable is unused.

llvm-svn: 290825
2017-01-02 19:03:16 +00:00
Davide Italiano 841261624d [NewGVN] Restore old code to placate buildbots.
Apparently my suggestion of using ternary doesn't really work
as clang complains about incompatible types on LHS and RHS. Some
GCC versions happen to accept the code but clang behaviour is
correct here.

llvm-svn: 290822
2017-01-02 18:41:34 +00:00
Daniel Berlin 25f05b0ab7 NewGVN: Fix some formatting and comment issues
llvm-svn: 290820
2017-01-02 18:22:38 +00:00
Michal Gorny 89b6f16b3e [cmake] Add LLVM_ENABLE_DIA_SDK option, and expose it in LLVMConfig
Add an explicit LLVM_ENABLE_DIA_SDK option to control building support
for DIA SDK-based debugging. Control its value to match whether DIA SDK
support was found and expose it in LLVMConfig (similar to LLVM_ENABLE_ZLIB).

Its value is needed for LLDB to determine whether to run tests requiring
DIA support. Currently it is obtained from llvm/Config/config.h;
however, this file is not available for standalone builds. Following
this change, LLDB will be modified to use the value from LLVMConfig.

Differential Revision: https://reviews.llvm.org/D26255

llvm-svn: 290818
2017-01-02 18:19:35 +00:00
Joerg Sonnenberger 7b83732a40 Emit .cfi_sections before the first .cfi_startproc
GNU as rejects input where .cfi_sections is used after .cfi_startproc,
if the new section differs from the old. Adjust our output to always
emit .cfi_sections before the first .cfi_startproc to minimize necessary
code.

Differential Revision: https://reviews.llvm.org/D28011

llvm-svn: 290817
2017-01-02 18:05:27 +00:00
Daniel Berlin 02c6b176e7 NewGVN: Add UnknownExpression and create them for things we can't symbolize. Kill fragile machinery for handling null expressions.
Summary:
This avoids the very fragile code for null expressions. We could also use a denseset that tracks which things have null expressions instead, but that seems pretty fragile and premature optimization.

This resolves a number of infinite loop cases, test reductions coming.

Reviewers: davide

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D28193

llvm-svn: 290816
2017-01-02 18:00:53 +00:00
Daniel Berlin 589cecc6e9 NewGVN: Fix PR31480, PR31483, PR31499, by rewriting how memory congruence handling works.
Summary: Previously, we tried to fix up the equivalences during symbolic evaluation.  This does not work. Now, we change the equivalences during congruence finding, where it belongs.  We also initialize the equivalence table to give a maximal answer.

Reviewers: davide

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D28192

llvm-svn: 290815
2017-01-02 18:00:46 +00:00
Davide Italiano b672537cbf [PMBuilder] Remove RunFloat2Int cl::opt.
The pass has been on by default for a long time without problems.

llvm-svn: 290814
2017-01-02 17:49:18 +00:00
Elena Demikhovsky d96200d60a Fixed shuffle-reverse cost on AVX-512.
(This change was approved in https://reviews.llvm.org/D28118, but Simon asked to submit it separately).

llvm-svn: 290812
2017-01-02 11:44:10 +00:00
Elena Demikhovsky 21706cbd24 AVX-512 Loop Vectorizer: Cost calculation for interleave load/store patterns.
X86 target does not provide any target-specific cost calculation for interleave patterns. It uses the common target-independent calculation, which gives very high numbers. As a result, the scalar version is chosen in many cases. The situation on AVX-512 is even worse, since we have 3-src shuffles that significantly reduce the cost.

In this patch I calculate the cost on AVX-512. This allows comparing interleave patterns with gather/scatter and choosing the better solution (PR31426).

* Shuffle-broadcast cost will be changed in Simon's upcoming patch.

Differential Revision: https://reviews.llvm.org/D28118

llvm-svn: 290810
2017-01-02 10:37:52 +00:00
Keno Fischer f7d84ee6ff Reapply "[CodeGen] Fix invalid DWARF info on Win64"
This reapplies rL289013 (reverted in rL289014) with the fixes identified
in D21731. Should hopefully pass the buildbots this time.

llvm-svn: 290809
2017-01-02 03:00:19 +00:00
Florian Hahn f872d230ad [selectiondag] Check PromotedFloats map during expensive checks.
Summary:
`PromotedFloats` needs to be checked in
`DAGTypeLegalizer::PerformExpensiveChecks`. This patch fixes a few type
legalization failures with expensive checks for ARM fp16 tests.

Reviewers: baldrick, bogner, arsenm

Subscribers: arsenm, aemerson, llvm-commits

Differential Revision: https://reviews.llvm.org/D28187

llvm-svn: 290796
2017-01-01 13:58:27 +00:00
Sanjoy Das 3bb2dbd665 Fix an issue with isGuaranteedToTransferExecutionToSuccessor
I'm not sure if this was intentional, but today
isGuaranteedToTransferExecutionToSuccessor returns true for readonly and
argmemonly calls that may throw.  This commit changes the function to
not implicitly infer nounwind this way.

Even if we eventually specify readonly calls as not throwing,
isGuaranteedToTransferExecutionToSuccessor is not the best place to
infer that.  We should instead teach FunctionAttrs or some other such
pass to tag readonly functions / calls as nounwind instead.

llvm-svn: 290794
2016-12-31 22:12:34 +00:00
Sanjoy Das 0945530d4d Avoid const_cast; NFC
llvm-svn: 290793
2016-12-31 22:12:31 +00:00
Sanjay Patel aea60846c4 [Inliner] remove unnecessary null checks from AddAlignmentAssumptions(); NFCI
We bail out on the 1st line if the assumption cache is not set, so there's
no need to check it after that.

llvm-svn: 290787
2016-12-31 17:54:05 +00:00
Sanjay Patel 7fd779f09f [ValueTracking] make dominator tree requirement explicit for isKnownNonNullFromDominatingCondition(); NFCI
I don't think this hole is currently exposed, but I crashed regression tests for
jump-threading and loop-vectorize after I added calls to isKnownNonNullAt() in
InstSimplify as part of trying to solve PR28430:
https://llvm.org/bugs/show_bug.cgi?id=28430

That's because they call into value tracking with a context instruction, but no
other parts of the query structure filled in.

For more background, see the discussion in:
https://reviews.llvm.org/D27855

llvm-svn: 290786
2016-12-31 17:37:01 +00:00
Philip Reames 0ef5d288b4 [SmallPtrSet] Introduce a find primitive and rewrite count/erase in terms of it
This was originally motivated by a compile time problem I've since figured out how to solve differently, but the cleanup seemed useful. We had the same logic - which essentially implemented find - in several places. By commoning them out, I can implement find and allow erase to be inlined at the call sites if profitable.
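The following hypothetical fixed-capacity pointer set sketches the refactoring pattern: `count()` and `erase()` are written in terms of a single `find()` primitive. `TinyPtrSetSketch` is an illustrative name. Its layout (a plain array, no growth) is not `SmallPtrSet`'s real implementation:

```cpp
#include <cassert>
#include <cstddef>

class TinyPtrSetSketch {
public:
  static constexpr std::size_t Capacity = 8;

  // The single place that knows how to search; returns end() if absent.
  const void *const *find(const void *P) const {
    for (std::size_t I = 0; I < Size; ++I)
      if (Elems[I] == P)
        return &Elems[I];
    return end();
  }
  const void *const *end() const { return Elems + Size; }

  std::size_t count(const void *P) const { return find(P) != end() ? 1 : 0; }

  bool insert(const void *P) {
    if (find(P) != end() || Size == Capacity)
      return false;
    Elems[Size++] = P;
    return true;
  }

  bool erase(const void *P) {
    const void *const *It = find(P);
    if (It == end())
      return false;
    // Move the last element into the hole; element order is not preserved.
    Elems[It - Elems] = Elems[Size - 1];
    --Size;
    return true;
  }

private:
  const void *Elems[Capacity] = {};
  std::size_t Size = 0;
};
```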

Differential Revision: https://reviews.llvm.org/D28183

llvm-svn: 290779
2016-12-31 02:33:22 +00:00
Dylan McKay 97cf837b46 [AVR] Optimize 16-bit ANDs with '1'
Summary: Fixes PR 31345

Reviewers: dylanmckay

Subscribers: fhahn, llvm-commits

Differential Revision: https://reviews.llvm.org/D28186

llvm-svn: 290778
2016-12-31 01:07:14 +00:00
Craig Topper d00db69227 [InstCombine][AVX-512] Teach InstCombine that llvm.x86.avx512.vcomi.sd and llvm.x86.avx512.vcomi.ss don't use the upper elements of their input.
This was already done for the SSE/SSE2 version of the intrinsics.

llvm-svn: 290776
2016-12-31 00:45:06 +00:00
Craig Topper 991636312b [InstCombine][AVX-512] When turning intrinsics with masking into native IR, don't emit a select if the mask is known to be all ones.
This saves InstCombine the burden of having to optimize the select later.

llvm-svn: 290774
2016-12-30 23:06:28 +00:00
Philip Reames fac031a178 Add a comment for a todo in LoopUnroll post cleanup
llvm-svn: 290769
2016-12-30 22:10:19 +00:00
Philip Reames fdbb05b469 [LVI] Remove count/erase idiom in favor of checking result value of erase
Minor compile time win.  Avoids an additional O(N) scan in the case where we are removing an element and costs nothing when we aren't.
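The idiom change can be shown with `std::set`, whose `erase(key)` returns the number of removed elements. The container used in LVI may differ, so this is only an illustration of the pattern, with hypothetical function names:

```cpp
#include <cassert>
#include <set>

// Before: two lookups when the element is present (count, then erase).
bool removeWithCount(std::set<int> &S, int V) {
  if (!S.count(V))
    return false;
  S.erase(V);
  return true;
}

// After: erase(key) already reports how many elements it removed
// (0 or 1 for a set), so a single lookup suffices.
bool removeWithErase(std::set<int> &S, int V) { return S.erase(V) != 0; }
```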

llvm-svn: 290768
2016-12-30 22:09:10 +00:00
Piotr Padlewski da36215017 [MemDep] Handle gep with zeros for invariant.group
Summary:
gep 0, 0 is equivalent to bitcast. LLVM canonicalizes it
to getelementptr because SROA can then handle it.

Simple case like

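    // Declarations assumed for illustration.
    struct A { virtual void foo(); };
    void z(A &a);
    extern bool glob;

    void g(A &a) {
        z(a);
        if (glob)
            a.foo();
    }
    void testG() {
        A a;
        g(a);
    }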

was not devirtualized with -fstrict-vtable-pointers because of a lack of
handling for gep 0 in Memory Dependence Analysis.

Reviewers: dberlin, nlewycky, chandlerc

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D28126

llvm-svn: 290763
2016-12-30 18:45:07 +00:00
Philip Reames a570a2303c [CVP] Adjust iteration order to reduce the amount of work required
CVP doesn't care about the order of blocks visited, but by using a pre-order traversal over the graph we can a) not visit unreachable blocks and b) optimize as we go so that analysis of later blocks produce slightly more precise results.

I noticed this via inspection and don't have a concrete example which points to the issue.  
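Here is a minimal sketch of a depth-first pre-order walk from the entry block over a hypothetical adjacency-list CFG. `CFG` and `preorder` are illustrative names, not LLVM's `depth_first` iterator. Unreachable blocks never appear, and each block is visited before its successors:

```cpp
#include <cassert>
#include <vector>

// Hypothetical CFG: Succs[B] lists the successors of block B; block 0 is entry.
using CFG = std::vector<std::vector<int>>;

std::vector<int> preorder(const CFG &G) {
  std::vector<int> Order;
  std::vector<bool> Seen(G.size(), false);
  std::vector<int> Stack = {0};
  while (!Stack.empty()) {
    int B = Stack.back();
    Stack.pop_back();
    if (Seen[B])
      continue;
    Seen[B] = true;
    Order.push_back(B);
    // Push successors in reverse so the first successor is visited first.
    for (auto It = G[B].rbegin(); It != G[B].rend(); ++It)
      if (!Seen[*It])
        Stack.push_back(*It);
  }
  return Order;
}
```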

llvm-svn: 290760
2016-12-30 18:00:55 +00:00
Philip Reames 1e48efcfc5 [LVI] Manually hoist computation from loop
Minor compile time win.  Not known to be a hot spot, just something I noticed while reading.

llvm-svn: 290759
2016-12-30 17:56:47 +00:00
Aaron Ballman 58a61e723e Caught a simple typo. I do not know of a way to test this, but it seems like an unlikely thing to regress in the future.
llvm-svn: 290757
2016-12-30 15:57:56 +00:00
Davide Italiano 75e39f9790 [NewGVN] Remove unneeded newline from assertion message.
llvm-svn: 290755
2016-12-30 15:01:17 +00:00
David Majnemer 5ec5f278c9 [InstCombine] Address post-commit feedback
llvm-svn: 290741
2016-12-30 03:36:17 +00:00
Kostya Serebryany 11a22bc39d [libFuzzer] cleaner implementation of -print_pcs=1
llvm-svn: 290739
2016-12-30 01:13:07 +00:00
Michael Kuperstein 76e06c8858 [LICM] When promoting scalars, allow inserting stores to thread-local allocas.
This is similar to the allocfn case - if an alloca is not captured, then it's
necessarily thread-local.

Differential Revision: https://reviews.llvm.org/D28170

llvm-svn: 290738
2016-12-30 01:03:17 +00:00
Dehao Chen cc76344ef5 Use continuous boosting factor for complete unroll.
Summary:
The current complete loop unrolling algorithm checks whether complete unrolling will reduce the runtime by a certain percentage. If so, it applies a fixed boosting factor to the threshold (by discounting the cost). The problem with this approach is that the threshold changes abruptly. This patch makes the boosting factor a function of the runtime reduction percentage, capped by a fixed threshold. In this way, the threshold changes continuously.
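
A sketch of the difference between a step function and a continuous boost, under stated assumptions: the boost (in percent) is taken to be the rolled-to-unrolled cost ratio, capped at a maximum, and all names here are hypothetical. The formula in the actual patch may differ in detail.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical: boost grows smoothly with RolledCost / UnrolledCost (in
// percent) instead of jumping to a fixed value past a reduction cutoff.
unsigned boostPercent(unsigned RolledCost, unsigned UnrolledCost,
                      unsigned MaxBoostPercent) {
  if (UnrolledCost == 0)
    return MaxBoostPercent;
  return std::min(100 * RolledCost / UnrolledCost, MaxBoostPercent);
}

// Unroll if the unrolled cost fits under the boosted threshold.
bool shouldFullyUnroll(unsigned RolledCost, unsigned UnrolledCost,
                       unsigned Threshold, unsigned MaxBoostPercent) {
  unsigned Boost = boostPercent(RolledCost, UnrolledCost, MaxBoostPercent);
  return UnrolledCost < Threshold * Boost / 100;
}
```

A 1% larger saving now yields a 1% larger boost (150 vs 151), rather than either nothing or the full fixed factor.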

The patch also simplifies the code by removing one parameter from UP.

The patch only affects code generation for two SPEC CPU2006 benchmarks:

445.gobmk binary size decreases 0.08%, no performance change.
464.h264ref binary size increases 0.24%, no performance change.

Reviewers: mzolotukhin, chandlerc

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D26989

llvm-svn: 290737
2016-12-30 00:50:28 +00:00
Michael Kuperstein 4a86a1921a [LICM] Remove unneeded tracking of whether changes were made. NFC.
"Changed" doesn't actually change within the loop, so there's
no reason to keep track of it - we always return false during
analysis and true after the transformation is made.

llvm-svn: 290735
2016-12-30 00:43:22 +00:00
Michael Kuperstein 62b98c3977 [LICM] Make logic in promoteLoopAccessesToScalars easier to follow. NFC.
llvm-svn: 290734
2016-12-30 00:39:00 +00:00
David Majnemer a1cfd7c5f8 [InstCombine] More thoroughly canonicalize the position of zexts
We correctly canonicalized (add (sext x), (sext y)) to (sext (add x, y))
where possible.  However, we didn't perform the same canonicalization
for zexts or for muls.
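
A small arithmetic illustration of why the zext form needs a "where possible" condition. It models zext with C++ integer widening from uint8_t to uint32_t (an assumption for the sketch). (add (zext x), (zext y)) and (zext (add x, y)) agree only when the narrow add does not wrap, and likewise for mul. How InstCombine proves that is not shown here.

```cpp
#include <cassert>
#include <cstdint>

// (add (zext x), (zext y)): widen both operands, then add in 32 bits.
uint32_t addOfZexts(uint8_t X, uint8_t Y) { return uint32_t(X) + uint32_t(Y); }

// (zext (add x, y)): add in 8 bits, wrapping modulo 256, then widen.
uint32_t zextOfAdd(uint8_t X, uint8_t Y) { return uint32_t(uint8_t(X + Y)); }

// The same pair of shapes for multiplication.
uint32_t mulOfZexts(uint8_t X, uint8_t Y) { return uint32_t(X) * uint32_t(Y); }
uint32_t zextOfMul(uint8_t X, uint8_t Y) { return uint32_t(uint8_t(X * Y)); }
```

For example, 100 + 100 gives 200 either way, but 200 + 100 gives 300 in the wide form and 44 in the narrow form.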

llvm-svn: 290733
2016-12-30 00:28:58 +00:00
Dylan McKay 453d042969 [AVR] Optimize 16-bit ORs with '0'
Summary: Fixes PR 31344

Authored by Anmol P. Paralkar

Reviewers: dylanmckay

Subscribers: fhahn, llvm-commits

Differential Revision: https://reviews.llvm.org/D28121

llvm-svn: 290732
2016-12-30 00:21:56 +00:00
Reid Kleckner 0e7c84c682 Simplify FunctionLoweringInfo.cpp with range for loops
I'm preparing to add some pattern matching code here, so simplify the
code before I do. NFC

llvm-svn: 290731
2016-12-30 00:21:38 +00:00
Reid Kleckner e8ee89f8b0 Include <algorithm> for std::max etc
llvm-svn: 290730
2016-12-30 00:15:40 +00:00
Michael Kuperstein ff36baefe7 [LICM] Compute exit blocks for promotion eagerly. NFC.
This moves the exit block and insertion point computation to be eager,
instead of after seeing the first scalar we can promote.

The cost is relatively small (the computation happens anyway, see discussion
on D28147), and the code is easier to follow, and can bail out earlier
if there's a catchswitch present.

llvm-svn: 290729
2016-12-29 23:11:19 +00:00
Michael Kuperstein 5566092963 [LICM] Don't try to promote in loops where we have no chance to promote. NFC.
We would check whether we have a preheader *or* dedicated exit blocks,
and go into the promotion loop. Then, for each alias set we'd check
if we have a preheader *and* dedicated exit blocks, and bail if not.

Instead, bail immediately if we don't have both.

llvm-svn: 290728
2016-12-29 22:51:22 +00:00
Michael Kuperstein b6da9cf3b7 [LICM] Only recompute LCSSA when we actually promoted something.
We want to recompute LCSSA only when we actually promoted a value.
This means we only need to look at changes made by promotion when
deciding whether to recompute it or not, not at regular sinking/hoisting.

(This was what the code was documented as doing, just not what it did)

Hopefully NFC.

llvm-svn: 290726
2016-12-29 22:37:13 +00:00
Daniel Berlin e0bd37e78f NewGVN: Fix PR 31491 by ensuring that we touch the right instructions. Change to one based numbering so we can assert we don't cause the same bug again.
llvm-svn: 290724
2016-12-29 22:15:12 +00:00
Justin Lebar 175ab74dc5 [ADT] Delete RefCountedBaseVPTR.
Summary:
This class is unnecessary.

Its comment indicated that it was a compile error to allocate an
instance of a class that inherits from RefCountedBaseVPTR on the stack.
This may have been true at one point, but it's not today.

Moreover you really do not want to allocate *any* refcounted object on
the stack, vptrs or not, so if we did have a way to prevent these
objects from being stack-allocated, we'd want to apply it to regular
RefCountedBase too, obviating the need for a separate RefCountedBaseVPTR
class.

It seems that the main way RefCountedBaseVPTR provides safety is by
making its subclass's destructor virtual.  This may have been helpful at
one point, but these days clang will emit an error if you define a class
with virtual functions that inherits from RefCountedBase but doesn't
have a virtual destructor.

Reviewers: compnerd, dblaikie

Subscribers: cfe-commits, klimek, llvm-commits, mgorny

Differential Revision: https://reviews.llvm.org/D28162

llvm-svn: 290717
2016-12-29 19:59:26 +00:00
Reid Kleckner cd46c1df80 Revert "[COFF] Use 32-bit jump table entries in .rdata for Win64"
This reverts commit r290694. It broke sanitizer tests on Win64. I'll
probably bring this back, but the jump tables will just live in .text
like they do for MSVC.

llvm-svn: 290714
2016-12-29 17:07:10 +00:00
Sanjoy Das 00d76a5754 [TBAAVerifier] Be stricter around verifying scalar nodes
This fixes the issue exposed in PR31393, where we weren't trying
sufficiently hard to diagnose bad TBAA metadata.

This does reduce the variety in the error messages we print out, but I
think the tradeoff of verifying more, simply and quickly overrules the
need for more helpful error messages here.

llvm-svn: 290713
2016-12-29 15:47:05 +00:00
Sanjoy Das 600d2a5a6b [TBAAVerifier] Make things const-consistent; NFC
llvm-svn: 290712
2016-12-29 15:47:01 +00:00
Sanjoy Das 55f12d9de9 [TBAAVerifier] Memoize validity of scalar tbaa nodes; NFCI
llvm-svn: 290711
2016-12-29 15:46:57 +00:00
Artem Tamazov 25478d821b [AMDGPU][mc] Enable absolute expressions in .hsa_code_object_isa directive
Among other things, this allows using the predefined .option.machine_version_major
/minor/stepping symbols in the directive.

The relevant test was expanded accordingly (and the file was renamed for clarity).

Differential Revision: https://reviews.llvm.org/D28140

llvm-svn: 290710
2016-12-29 15:41:52 +00:00
Igor Laevsky 4f31e52f94 Introduce element-wise atomic memcpy intrinsic
This change adds a new intrinsic which is intended to provide memcpy functionality
with additional atomicity guarantees. Please refer to the review thread
or language reference for further details.

Differential Revision: https://reviews.llvm.org/D27133

llvm-svn: 290708
2016-12-29 14:31:07 +00:00
Craig Topper 17b5568bc7 [InstCombine] Use getVectorNumElements instead of explicitly casting to VectorType and calling getNumElements. NFC
llvm-svn: 290707
2016-12-29 07:03:18 +00:00
Craig Topper 62f06e241b [InstCombine] Fix typo in comment. NFC
llvm-svn: 290706
2016-12-29 05:38:31 +00:00
Craig Topper 2e18bcfc60 [InstCombine] Use 32 bits instead of 64 bits for storing the number of elements in VectorType for a ShuffleVector. While there, use getVectorNumElements to avoid an explicit cast. NFC
llvm-svn: 290705
2016-12-29 04:24:32 +00:00
Craig Topper 1a8a3377cc [InstCombine][X86] If the lowest element of a scalar intrinsic isn't used make sure we add it to the worklist so we can DCE it sooner.
We bypassed the intrinsic and returned the passthru operand, but we should also add the intrinsic to the worklist since it's now dead. This can allow DCE to find it sooner and remove it. The same was done for InsertElement when the inserted element isn't demanded.

llvm-svn: 290704
2016-12-29 03:30:17 +00:00
Kostya Serebryany d723804fa2 [libFuzzer] make __sanitizer_cov_trace_switch more predictable
llvm-svn: 290703
2016-12-29 02:50:35 +00:00
Daniel Berlin 6658cc9ead NewGVN: Sort Dominator Tree in RPO order, and use that for generating order.
Summary:
The optimal iteration order for this problem is RPO order. We want to
process as many preds of a backedge as we can before we process the
backedge.

At the same time, as we add predicate handling, we want to be able to
touch instructions that are dominated by a given block by
ranges (because a change in value numbering a predicate possibly
affects all users we dominate that are using that predicate).
If we don't do it this way, we can't do value inference over
backedges (the paper covers this in depth).

The newgvn branch currently overshoots the last part, and guarantees
that it will touch *at least* the right set of instructions, but it
does touch more.  This is because the bitvector instruction ranges are
currently generated in RPO order (so we take the max and the min of
the ranges of dominated blocks, which means there are some in the
middle we didn't have to touch that we did).

We can do better by sorting the dominator tree, and then just using
dominator tree order.

As a preliminary, the dominator tree has some RPO guarantees, but not
enough. It guarantees that for a given node, your idom must come
before you in the RPO ordering. It guarantees no relative RPO ordering
for siblings.  We add siblings in whatever order they appear in the module.

So that is what we fix.

We sort the children array of the domtree into RPO order, and then use
the dominator tree for ordering, instead of RPO, since the dominator
tree is now a valid RPO ordering.
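
A standalone sketch of the idea, using plain vectors instead of LLVM's DominatorTree API (an assumption for illustration). It computes RPO numbers for a CFG, sorts each dominator tree node's children by those numbers, and walks the tree in pre-order. On a diamond CFG whose domtree children were inserted in an unhelpful order, the sorted walk matches the RPO.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

using Graph = std::vector<std::vector<int>>;

// Reverse post-order number of each block; block 0 is the entry.
std::vector<int> rpoNumbers(const Graph &Succs) {
  std::vector<int> PostOrder;
  std::vector<bool> Seen(Succs.size(), false);
  std::function<void(int)> DFS = [&](int B) {
    Seen[B] = true;
    for (int S : Succs[B])
      if (!Seen[S])
        DFS(S);
    PostOrder.push_back(B);
  };
  DFS(0);
  std::vector<int> Num(Succs.size(), -1);
  int N = 0;
  for (auto It = PostOrder.rbegin(); It != PostOrder.rend(); ++It)
    Num[*It] = N++;
  return Num;
}

// DomChildren[b] lists b's dominator-tree children in arbitrary order.
// Sort siblings by RPO number, then emit the tree in pre-order.
std::vector<int> sortedDomTreeOrder(Graph DomChildren,
                                    const std::vector<int> &RPONum) {
  for (auto &Kids : DomChildren)
    std::sort(Kids.begin(), Kids.end(),
              [&](int A, int B) { return RPONum[A] < RPONum[B]; });
  std::vector<int> Order;
  std::function<void(int)> Walk = [&](int B) {
    Order.push_back(B);
    for (int C : DomChildren[B])
      Walk(C);
  };
  Walk(0);
  return Order;
}
```

Without the sort, children stored as {3, 1, 2} would give 0, 3, 1, 2, visiting the join block before its predecessors.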

Note: This would help any other pass that iterates a forward problem
in dominator tree order.  Most of them are single pass.  It will still
maximize whatever result they compute.  We could also build the
dominator tree in this order, but our incremental updates would still
put it out of sort order, and recomputing the sort order is almost as
hard as general incremental updates of the domtree.

Also note that the sorting does not affect any tests, etc. Nothing
depends on domtree order, including the verifier, the equals
functions for domtree nodes, etc.

How much could this matter, you ask?
Here are the current numbers.
This is generated by running NewGVN over all files in LLVM.

Note that once we propagate equalities, the differences go up by an
order of magnitude or two (i.e., instead of 29, the max ends up in the
thousands, since in the worst case we add a factor of N, where N is the
number of branch predicates).  So while it doesn't look that stark for
the default ordering, it gets *much much* worse.  There are also
programs in the wild where the difference is already pretty stark
(2 iterations vs hundreds).

RPO ordering:
759040 Number of iterations is 1
112908 Number of iterations is 2

Default dominator tree ordering:
755081 Number of iterations is 1
116234 Number of iterations is 2
   603 Number of iterations is 3
    27 Number of iterations is 4
     2 Number of iterations is 5
     1 Number of iterations is 7

Dominator tree sorted:
759040 Number of iterations is 1
112908 Number of iterations is 2
<yay!>

Really bad ordering (sort domtree siblings in postorder. Not quite the
worst possible, but yeah):
754008 Number of iterations is 1
 96642 Number of iterations is 2
 17266 Number of iterations is 3
  2598 Number of iterations is 4
   798 Number of iterations is 5
   273 Number of iterations is 6
   186 Number of iterations is 7
    80 Number of iterations is 8
    42 Number of iterations is 9
    21 Number of iterations is 10
     8 Number of iterations is 11
     6 Number of iterations is 12
     5 Number of iterations is 13
     2 Number of iterations is 14
     2 Number of iterations is 15
     3 Number of iterations is 16
     1 Number of iterations is 17
     2 Number of iterations is 18
     1 Number of iterations is 20
     2 Number of iterations is 21
     1 Number of iterations is 22
     1 Number of iterations is 29

Reviewers: chandlerc, davide

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D28129

llvm-svn: 290699
2016-12-29 01:12:36 +00:00
Reid Kleckner e9c8d7f87b Add a static_assert about the sizeof(GlobalValue)
I added one for Value back in r262045, and I'm starting to think we
should have these for any class with bitfields whose memory efficiency
really matters.
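
A minimal sketch of the technique on a hypothetical bitfield-packed struct, not GlobalValue itself. The size bound chosen here (two pointers) is an assumption for the example. A static_assert turns an accidental size regression into a compile error.

```cpp
#include <cassert>

// Hypothetical class that packs small fields into bitfields next to a pointer.
struct PackedNode {
  void *Parent;
  unsigned Kind : 8;
  unsigned IsDefinition : 1;
  unsigned HasName : 1;
  unsigned SubclassData : 22; // 8 + 1 + 1 + 22 = 32 bits, one unsigned word
};

// Compilation fails if a new field, or a bitfield that no longer fits in the
// shared word, makes the object larger than intended.
static_assert(sizeof(PackedNode) <= 2 * sizeof(void *),
              "PackedNode grew; check the bitfield layout");
```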

llvm-svn: 290698
2016-12-29 00:55:51 +00:00
Daniel Berlin 7ad1ea0984 Update equalsStoreHelper for the fact that only one branch can be true
llvm-svn: 290697
2016-12-29 00:49:32 +00:00
Reid Kleckner c9e0a153cf [COFF] Use 32-bit jump table entries in .rdata for Win64
Summary:
We were already using 32-bit jump table entries, but this was a
consequence of the default PIC model on Win64, and not an intentional
design decision. This patch ensures that we always use 32-bit label
difference jump table entries on Win64 regardless of the PIC model. This
is a good idea because it saves executable size and object file size.

Moving the jump tables to .rdata cleans up the disassembled object code
and reduces the available ROP targets, but it requires adding one more
RIP-relative lea to the code.  COFF doesn't have relocations to express
the difference between two arbitrary symbols, so we can't use the jump
table label in the label difference like we do elsewhere.

Fixes PR31488

Reviewers: majnemer, compnerd

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D28141

llvm-svn: 290694
2016-12-29 00:12:39 +00:00
Mehdi Amini 5022bb7238 Change Metadata Index emission in the bitcode to use 2x32 bits for the placeholder
The Bitstream reader and writer can handle at most a "size_t", which
means that we can't backpatch and read back a 64-bit value on 32-bit
platforms.
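
A minimal sketch of the 2x32 scheme: a 64-bit offset is stored as two 32-bit words, each of which fits in a 32-bit size_t. The low-word-first order is an assumption for illustration, not necessarily the order the bitcode writer uses.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Split a 64-bit value into (low, high) 32-bit words.
std::pair<uint32_t, uint32_t> splitU64(uint64_t V) {
  return {static_cast<uint32_t>(V), static_cast<uint32_t>(V >> 32)};
}

// Reassemble the value from its two halves.
uint64_t joinU64(uint32_t Lo, uint32_t Hi) {
  return (static_cast<uint64_t>(Hi) << 32) | Lo;
}
```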

llvm-svn: 290693
2016-12-28 23:45:54 +00:00
Piotr Padlewski 6c37d298d9 Revert "[NewGVN] replace emplace_back with push_back"
llvm-svn: 290692
2016-12-28 23:24:02 +00:00
Justin Lebar 291abd3ebb Speed up Function::isIntrinsic() by adding a bit to GlobalValue. NFC
Summary:
Previously isIntrinsic() called getName().  This involves a hashtable
lookup, so is nontrivially expensive.  And isIntrinsic() is called
frequently, particularly by dyn_cast<IntrinsicInstr>.

This patch steals a bit of IntID and uses that to store whether or not
getName() starts with "llvm."
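
A simplified sketch of caching the prefix check. The class and member names are hypothetical, and it stores a plain bool where the patch steals a bit from IntID. The "llvm." test runs once in setName(), so isIntrinsic() never touches the name.

```cpp
#include <cassert>
#include <string>

// Hypothetical value class with a cached "name starts with llvm." flag.
class NamedValue {
  std::string Name;
  bool HasLLVMReservedName = false;

public:
  void setName(const std::string &NewName) {
    Name = NewName;
    // Compare the first five characters against "llvm.".
    HasLLVMReservedName = Name.compare(0, 5, "llvm.") == 0;
  }
  const std::string &getName() const { return Name; }
  // O(1): reads the cached flag, no name lookup.
  bool isIntrinsic() const { return HasLLVMReservedName; }
};
```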

Reviewers: bogner, arsenm, joker-eph

Subscribers: sanjoy, llvm-commits

Differential Revision: https://reviews.llvm.org/D22949

llvm-svn: 290691
2016-12-28 22:59:45 +00:00
Mehdi Amini e98f925834 Add an index for Module Metadata record in the bitcode
This index records the position of each metadata record in
the bitcode, so that the reader will be able to lazy-load
each individual record on demand.

We also make sure that every abbrev is emitted upfront so
that the block can be skipped while reading.
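
A byte-level sketch of the idea behind such an index. It is not the LLVM bitstream format: the length-prefixed records and byte offsets are assumptions made for the example. Recording where each record starts lets a reader decode record K directly, without decoding records 0..K-1.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical length-prefixed record stream plus an index of record offsets.
struct IndexedStream {
  std::vector<uint8_t> Bytes;
  std::vector<uint64_t> Offsets; // Offsets[K] = start of record K in Bytes.

  void addRecord(const std::string &Payload) {
    Offsets.push_back(Bytes.size());
    Bytes.push_back(static_cast<uint8_t>(Payload.size())); // assumes < 256 bytes
    Bytes.insert(Bytes.end(), Payload.begin(), Payload.end());
  }

  // Lazily materialize one record by jumping straight to its offset.
  std::string loadRecord(size_t K) const {
    size_t Pos = static_cast<size_t>(Offsets[K]);
    size_t Len = Bytes[Pos];
    return std::string(reinterpret_cast<const char *>(Bytes.data() + Pos + 1),
                       Len);
  }
};
```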

I don't plan to commit this before having the reader
counterpart, but I figured this can be reviewed mostly
independently.

Recommit r290684 (was reverted in r290686 because a test
was broken) after adding a threshold to avoid emitting
the index when unnecessary (little amount of metadata).
This optimization "hides" a limitation of the ability
to backpatch in the bitstream: we can only backpatch
safely when the position has been flushed. So if we emit
an index for one metadata, it is possible that (part of)
the offset placeholder hasn't been flushed and the backpatch
will fail.

Differential Revision: https://reviews.llvm.org/D28083

llvm-svn: 290690
2016-12-28 22:30:28 +00:00
Saleem Abdulrasool 2b59eca1f7 Revert "Add an index for Module Metadata record in the bitcode"
This reverts commit a0ca6ae2d38339e4ede0dfa588086fc23d87e836.  Revert at
Mehdi's request as it is breaking bots.

llvm-svn: 290686
2016-12-28 20:37:22 +00:00
Piotr Padlewski 629a7f2cc0 [NewGVN] replace emplace_back with push_back
emplace_back is not faster if it is equivalent to push_back. In these cases the emplaced value had the
same type as the one stored in the container. It is ugly and it might be even slower (see
Scott Meyers' presentation about emplacement).
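
A small C++ illustration using a hypothetical element type that counts copies and moves. When the argument already has the element type, push_back and emplace_back both perform exactly one copy.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical element type that counts copy and move constructions.
struct Tracked {
  static int Copies;
  static int Moves;
  int V;
  explicit Tracked(int V) : V(V) {}
  Tracked(const Tracked &O) : V(O.V) { ++Copies; }
  Tracked(Tracked &&O) noexcept : V(O.V) { ++Moves; }
};
int Tracked::Copies = 0;
int Tracked::Moves = 0;
```

emplace_back only helps when it receives constructor arguments, e.g. emplace_back(2) builds the Tracked in place instead of creating a temporary first.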

llvm-svn: 290685
2016-12-28 20:36:08 +00:00
Mehdi Amini 32ca148198 Add an index for Module Metadata record in the bitcode
Summary:
This index records the position of each metadata record in
the bitcode, so that the reader will be able to lazy-load
each individual record on demand.

We also make sure that every abbrev is emitted upfront so
that the block can be skipped while reading.

I don't plan to commit this before having the reader
counterpart, but I figured this can be reviewed mostly
independently.

Reviewers: pcc, tejohnson

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D28083

llvm-svn: 290684
2016-12-28 19:44:19 +00:00
Piotr Padlewski 26dada79ff [NewGVN] Simplify loop NFC
llvm-svn: 290683
2016-12-28 19:42:49 +00:00
Mehdi Amini cc7fbf718d [ThinLTO] Honor -O{0,1,2,4} passed through the libLTO interface for ThinLTO
This was hardcoded to be O3 till now, without any way to change it
without changing the code.

llvm-svn: 290682
2016-12-28 19:37:16 +00:00
Piotr Padlewski e4047b89ad [NewGVN] replace typedefs with usings
llvm-svn: 290680
2016-12-28 19:29:26 +00:00
Piotr Padlewski fc5727b2a2 [NewGVN] NFC fixes
llvm-svn: 290679
2016-12-28 19:17:17 +00:00
Reid Kleckner 92647369fc [WinEH] Don't assume endFunction is called while in .text
Jump table emission can switch to .rdata before
WinException::endFunction gets called. Just remember the appropriate
text section we started in and reset back to it when we end the
function. We were already switching sections back from .xdata anyway.

Fixes the first problem in PR31488, so that now COFF switch tables can
live in .rdata if we want them to.

llvm-svn: 290678
2016-12-28 19:05:12 +00:00
Davide Italiano 0e71480523 [NewGVN] Global sweep replacing NULL with nullptr. NFCI.
llvm-svn: 290670
2016-12-28 14:00:11 +00:00
Davide Italiano 0fb3c7cde5 [NewGVN] Remove redundant code. NFCI.
llvm-svn: 290669
2016-12-28 13:54:16 +00:00
Davide Italiano b111409015 [NewGVN] equals() for loads/stores is the same. Unify.
Differential Revision:  https://reviews.llvm.org/D28116

llvm-svn: 290667
2016-12-28 13:37:17 +00:00
Chandler Carruth 05ca5acc9e [PM] Introduce a devirtualization iteration layer for the new PM.
This is an orthogonal and separated layer instead of being embedded
inside the pass manager. While it adds a small amount of complexity, it
is fairly minimal and the composability and control seems worth the
cost.

The logic for this ends up being nicely isolated and targeted. It should
be easy to experiment with different iteration strategies wrapped around
the CGSCC bottom-up walk using this kind of facility.

The mechanism used to track devirtualization is the simplest one I came
up with. I think it handles most of the cases the existing iteration
machinery handles, but I haven't done a *very* in depth analysis. It
does however match the basic intended semantics, and we can tweak or
tune its exact behavior incrementally as necessary. One thing that we
may want to revisit is freshly building the value handle set on each
iteration. While I don't think this will be a significant cost (it is
strictly fewer value handles but more churn of value handles than the old
call graph), it is conceivable that we'll want a somewhat more clever
tracking mechanism. My hope is to layer that on as a follow up patch
with data supporting any implementation complexity it adds.

This code also provides for a basic count heuristic: if the number of
indirect calls decreases and the number of direct calls increases for
a given function in the SCC, we assume devirtualization is responsible.
This matches the heuristics currently used in the legacy pass manager.
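
A minimal sketch of the count heuristic as described. The struct and function names are hypothetical, and how the counts are collected for each function in the SCC is not shown.

```cpp
#include <cassert>

// Hypothetical per-function call counts, captured before and after running
// the CGSCC pass pipeline once.
struct CallCounts {
  int Direct;
  int Indirect;
};

// Fewer indirect calls and more direct calls suggests an indirect call was
// devirtualized, so iterating on the SCC again may be profitable.
bool likelyDevirtualized(const CallCounts &Before, const CallCounts &After) {
  return After.Indirect < Before.Indirect && After.Direct > Before.Direct;
}
```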

Differential Revision: https://reviews.llvm.org/D23114

llvm-svn: 290665
2016-12-28 11:07:33 +00:00
Chandler Carruth 443e57e01d [PM] Teach the CGSCC's CG update utility to more carefully invalidate
analyses when we're about to break apart an SCC.

We can't wait until after breaking apart the SCC to invalidate things:
1) Which SCC do we then invalidate? All of them?
2) Even if we invalidate all of them, a newly created SCC may not have
   a proxy that will convey the invalidation to functions!

Previously we only invalidated one of the SCCs and too late. This led to
stale analyses remaining in the cache. And because the caching strategy
actually works, they would get used and chaos would ensue.

Doing invalidation early is somewhat pessimizing though if we *know*
that the SCC structure won't change. So it turns out that the design to
make the mutation API force the caller to know the *kind* of mutation in
advance was indeed 100% correct and we didn't do enough of it. So this
change also splits two cases of switching a call edge to a ref edge into
two separate APIs so that callers can clearly test for this and take the
easy path without invalidating when appropriate. This is particularly
important in this case as we expect most inlines to be between functions
in separate SCCs and so the common case is that we don't have to so
aggressively invalidate analyses.

The LCG API change in turn needed some basic cleanups and better testing
in its unittest. No interesting functionality changed there other than
more coverage of the returned sequence of SCCs.

While this seems like an obvious improvement over the current state, I'd
like to revisit the core concept of invalidating within the CG-update
layer at all. I'm wondering if we would be better served forcing the
callers to handle the invalidation beforehand in the cases that they
can handle it. An interesting example is when we want to teach the
inliner to *update and preserve* analyses. But we can cross that bridge
when we get there.

With this patch, the new pass manager can build all of the LLVM test
suite at -O3 and everything passes. =D I haven't bootstrapped yet and
I'm sure there are still plenty of bugs, but this gives a nice baseline
so I'm going to increasingly focus on fleshing out the missing
functionality, especially the bits that are just turned off right now in
order to let us establish this baseline.

llvm-svn: 290664
2016-12-28 10:34:50 +00:00
Gadi Haber 19c4fc5e62 This is a large patch for X86 AVX-512 of an optimization for reducing code size by encoding EVEX AVX-512 instructions using the shorter VEX encoding when possible.
Some AVX-512 instructions have two possible encodings. This is the case for instructions that use only the low vector registers (indexes 0-15) and use neither zmm registers nor mask (k) registers.
The EVEX prefix requires 4 bytes, whereas the VEX prefix takes only 2 or 3 bytes. Consequently, using the VEX encoding for these instructions shortens the prefix by 1-2 bytes, even when compiling with AVX-512 features enabled.
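
A simplified eligibility check built only from the conditions listed above. The operand model is hypothetical. Real eligibility also depends on whether the opcode has a VEX form at all, which is not modeled here.

```cpp
#include <cassert>
#include <vector>

// Hypothetical description of one vector register operand.
struct VecOperand {
  unsigned RegIndex;  // register number, 0-31 with AVX-512
  unsigned WidthBits; // 128 (xmm), 256 (ymm) or 512 (zmm)
};

// VEX can only be used if no mask register is involved, every register
// index is in 0-15, and no operand is a 512-bit zmm register.
bool canUseVexEncoding(const std::vector<VecOperand> &Ops, bool UsesMask) {
  if (UsesMask)
    return false;
  for (const VecOperand &Op : Ops)
    if (Op.RegIndex > 15 || Op.WidthBits == 512)
      return false;
  return true;
}
```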

Reviewers: Craig Topper, Zvi Rackoover, Elena Demikhovsky 
Differential Revision: https://reviews.llvm.org/D27901

llvm-svn: 290663
2016-12-28 10:12:48 +00:00
Chandler Carruth 9900d18bab [PM] Teach the inliner's call graph update to handle inserting new edges
when they are call edges at the leaf but may (transitively) be reached
via ref edges.

It turns out there is a simple rule: insert everything as a ref edge
which is a safe conservative default. Then we let the existing update
logic handle promoting some of those to call edges.

Note that it would be fairly cheap to make these call edges right away
if that is desirable by testing whether there is some existing call path
from the source to the target. It just seemed like slightly more
complexity in this code path that isn't strictly necessary. If anyone
feels strongly about handling this differently I'm happy to change it.

llvm-svn: 290649
2016-12-28 03:13:12 +00:00
Craig Topper 28ec3460e4 [InstCombine] Remove a piece of a comment that said that InstCombiner contains pass infrastructure. That hasn't been true since r226618. NFC
llvm-svn: 290648
2016-12-28 03:12:42 +00:00
Chandler Carruth c6334579e9 [LCG] Teach the ref edge removal to handle a ref edge that is trivial
due to a call cycle.

This actually crashed the ref removal before.

I've added a unittest that covers this kind of interesting graph
structure and mutation.

llvm-svn: 290645
2016-12-28 02:24:58 +00:00
Chandler Carruth e635289ee2 [PM] Disable the loop vectorizer from the new PM's pipeline as it
currently relies on the old PM's dependency system forming LCSSA.

The new PM will require a different design for this, and for now this is
causing most of the issues I'm currently seeing in testing. I'd like to
get to a testable baseline and then work on re-enabling things one at
a time.

llvm-svn: 290644
2016-12-28 02:24:55 +00:00
Michael Kuperstein cd7ad7130f [InstCombine] Canonicalize insert splat sequences into an insert + shuffle
This adds a combine that canonicalizes a chain of inserts which broadcasts
a value into a single insert + a splat shufflevector.

This fixes PR31286.

Differential Revision: https://reviews.llvm.org/D27992

llvm-svn: 290641
2016-12-28 00:18:08 +00:00
Kostya Serebryany 2a8440df70 [libFuzzer] add an experimental flag -experimental_len_control=1 that sets max_len to 1M and tries to increase the actual max sizes of mutations very gradually (second attempt)
llvm-svn: 290637
2016-12-27 23:24:55 +00:00
Kostya Serebryany 8d75c78d4c [libFuzzer] don't create large random mutations when given an empty seed
llvm-svn: 290634
2016-12-27 22:15:04 +00:00
Kostya Serebryany f24e52c0c2 [sanitizer-coverage] sort the switch cases
llvm-svn: 290628
2016-12-27 21:20:06 +00:00
Kostya Serebryany 823c18147d [libFuzzer] fix UB and simplify the computation of the RNG seed (https://llvm.org/bugs/show_bug.cgi?id=31456)
llvm-svn: 290622
2016-12-27 19:51:34 +00:00
Chandler Carruth e14524ca30 [PM] Teach MemDep to invalidate its result object when its cached
analysis handles become invalid.

Add a test case for its invalidation logic.

llvm-svn: 290620
2016-12-27 19:33:04 +00:00
Saleem Abdulrasool 1799567f12 ASMParser: use range-based for loops (NFC)
Convert the verify method to use a few more range based for loops,
converting to const iterators in the process.

llvm-svn: 290617
2016-12-27 18:35:22 +00:00
Davide Italiano b222549dc5 [NewGVN] Simplify a bit removing else after return. NFCI.
llvm-svn: 290615
2016-12-27 18:15:39 +00:00
Chandler Carruth 56fe48b7e4 [PM] Remove a pointless optimization.
There is no need to do this within an analysis. That method shouldn't
even be reached if this predicate holds as the actual useful
optimization is in the analysis manager itself.

llvm-svn: 290614
2016-12-27 18:04:11 +00:00
Bryant Wong 7cb744621b [MemCpyOpt] Don't sink LoadInst below possible clobber.
Differential Revision: https://reviews.llvm.org/D26811

llvm-svn: 290611
2016-12-27 17:58:12 +00:00
Teresa Johnson e0ee5cf7c8 [ThinLTO] Fix "||" vs "|" mixup.
The effect of the bug was that we would incorrectly create summaries
for global and weak values defined in module asm (since we were
essentially testing for bit 1 which is SF_Undefined, and the
RecordStreamer ignores local undefined references). This would have
resulted in conservatively disabling importing of anything referencing
globals and weaks defined in module asm. Added these cases to the test
which now fails without this bug fix.
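
A reduced C++ reproduction of this class of bug. The flag values mirror the description above, where bit value 1 is SF_Undefined. Treat the exact layout as an assumption. `SF_Global || SF_Weak` evaluates to the bool true, i.e. 1, so the buggy mask tests SF_Undefined.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical symbol flags, one bit each.
enum SymbolFlags : uint32_t {
  SF_Undefined = 1U << 0,
  SF_Global = 1U << 1,
  SF_Weak = 1U << 2,
};

// Buggy: (SF_Global || SF_Weak) is `true`, which promotes to 1, so this
// really tests SF_Undefined. Compilers typically warn about it.
bool isGlobalOrWeakBuggy(uint32_t Flags) {
  return Flags & (SF_Global || SF_Weak);
}

// Fixed: combine the masks with bitwise OR before testing.
bool isGlobalOrWeak(uint32_t Flags) { return Flags & (SF_Global | SF_Weak); }
```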

Fixes PR31459.

llvm-svn: 290610
2016-12-27 17:45:09 +00:00
Chad Rosier 2ff37b8615 [AArch64][AsmParser] Add support for parsing shift/extend operands with symbols.
Differential Revision: https://reviews.llvm.org/D27953

llvm-svn: 290609
2016-12-27 16:58:09 +00:00
Artem Tamazov a01cce8887 [AMDGPU][llvm-mc] Predefined symbols to access register counts (.kernel.{v|s}gpr_count)
The feature allows for conditional assembly, filling the entries
of .amd_kernel_code_t etc.

Symbols are defined with value 0 at the beginning of each kernel scope.
After each register usage, the respective symbol is set to:
	value = max( value, ( register index + 1 ) )
Thus, at the end of scope the value represents a count of used registers.

Kernel scopes begin at an .amdgpu_hsa_kernel directive and end at the
next .amdgpu_hsa_kernel (or EOF, whichever comes first). There is also
a dummy scope that extends from the beginning of the source file until
the first .amdgpu_hsa_kernel.
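
A minimal model of the update rule and scope reset described above. The struct and method names are hypothetical. The assembler itself updates MC symbols rather than a C++ struct.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical model of one .kernel.{v|s}gpr_count symbol.
struct RegCountSymbol {
  unsigned Value = 0;
  // Called at each .amdgpu_hsa_kernel: a new scope starts at 0.
  void beginKernelScope() { Value = 0; }
  // After each register use: value = max(value, register index + 1).
  void noteRegisterUse(unsigned Index) { Value = std::max(Value, Index + 1); }
};
```

Using v3 and then v0 leaves the count at 4, since register indexes 0-3 are all accounted for.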

Test added.

Differential Revision: https://reviews.llvm.org/D27859

llvm-svn: 290608
2016-12-27 16:00:11 +00:00
Piotr Padlewski 2202aa9765 [MemDep] Operand visited twice bugfix
Because the operand was not marked as seen, it was visited twice.
This doesn't change the behavior of the optimization; it just saves a
redundant visit, so there are no test changes.
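
A generic sketch of the fix pattern, not the MemDep code itself. The use graph is hypothetical. Each operand is marked as seen when it is first pushed, so an operand shared by two visited nodes is processed once.

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

// Operands[n] lists the operands of node n. Returns nodes in visit order.
std::vector<int> visitOperands(const std::vector<std::vector<int>> &Operands,
                               int Root) {
  std::vector<int> Visited;
  std::unordered_set<int> Seen{Root};
  std::vector<int> Worklist{Root};
  while (!Worklist.empty()) {
    int N = Worklist.back();
    Worklist.pop_back();
    Visited.push_back(N);
    for (int Op : Operands[N])
      if (Seen.insert(Op).second) // false if Op was already seen
        Worklist.push_back(Op);
  }
  return Visited;
}
```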

llvm-svn: 290607
2016-12-27 15:06:07 +00:00
Eugene Leviant 5240a305a4 RuntimeDyldELF: refactor AArch64 relocations. NFC.
llvm-svn: 290606
2016-12-27 13:33:32 +00:00
Chandler Carruth aa35167578 [PM] Teach BasicAA how to invalidate its result object.
This requires custom handling because BasicAA caches handles to other
analyses and so it needs to trigger indirect invalidation.

This fixes one of the common crashes when using the new PM in real
pipelines. I've also tweaked a regression test to check that we are at
least handling the most immediate case.

I'm going to work at re-structuring this test some to both scale better
(rather than all being in one file) and check more invalidation paths in
a follow-up commit, but I wanted to get the basic bug fix in place.

llvm-svn: 290603
2016-12-27 10:30:45 +00:00
Eugene Leviant 687d4024b5 Attempt to fix build bot after r290597
llvm-svn: 290602
2016-12-27 10:24:58 +00:00
Chandler Carruth 81c8edaf5c [PM] Disable more of the loop passes -- LCSSA and LoopSimplify are also
not really wired into the loop pass manager in a way that will let us
productively use these passes yet.

This lets the new PM get farther in basic testing which is useful for
establishing a good baseline of "doesn't explode". There are still
plenty of crashers in basic testing, though; this just gets rid of some
noise that is well understood and not representing a specific or narrow
bug.

llvm-svn: 290601
2016-12-27 10:16:46 +00:00
Sam Kolton e66365e07d [AMDGPU] Assembler: support SDWA and DPP for VOP2b instructions
Reviewers: nhaustov, artem.tamazov, vpykhtin, tstellarAMD

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye

Differential Revision: https://reviews.llvm.org/D28051

llvm-svn: 290599
2016-12-27 10:06:42 +00:00
Eugene Leviant 920908352a RuntimeDyldELF: add R_AARCH64_ADD_ABS_LO12_NC reloc
Differential revision: https://reviews.llvm.org/D28115

llvm-svn: 290598
2016-12-27 09:51:38 +00:00
Eugene Leviant c089e406b9 Allow setting multiple debug types
Differential revision: https://reviews.llvm.org/D28109

llvm-svn: 290597
2016-12-27 09:31:20 +00:00
Daniel Berlin 1f31fe529e Change a std::vector to SmallVector in NewGVN
llvm-svn: 290596
2016-12-27 09:20:36 +00:00
Chandler Carruth 17c630a09c [PM] Teach the AAManager and AAResults layer (the worst offender for
inter-analysis dependencies) to use the new invalidation infrastructure.

This teaches it to invalidate itself when any of the peer function
AA results that it uses become invalid. We do this by just tracking the
originating IDs. I've kept it in a somewhat clunky API since some users
of AAResults are outside the new PM right now. We can clean this API up
if/when those users go away.

Secondly, it uses the registration on the outer analysis manager proxy
to trigger deferred invalidation when a module analysis result becomes
invalid.

I've included test cases that specifically try to trigger use-after-free
in both of these cases and they would crash or hang pretty horribly for
me even without ASan. Now they work nicely.

The `InvalidateAnalysis` utility pass required some tweaking to be
useful in this context and it still is pretty garbage. I'd like to
switch it back to the previous implementation and teach the explicit
invalidate method on the AnalysisManager to take care of correctly
triggering indirect invalidation, but I wanted to go ahead and send this
out so folks could see how all of this stuff works together in practice.
And, you know, that it does actually work. =]

Differential Revision: https://reviews.llvm.org/D27205

llvm-svn: 290595
2016-12-27 08:44:39 +00:00
Chandler Carruth ba90ae969c [PM] Introduce the facilities for registering cross-IR-unit dependencies
that require deferred invalidation.

This handles the other real-world invalidation scenario that we have
cases of: a function analysis which caches references to a module
analysis. We currently do this in the AA aggregation layer and might
well do this in other places as well.

Since this is relatively rare, the technique is somewhat more cumbersome.
Analyses need to register themselves when accessing the outer analysis
manager's proxy. This proxy is already necessarily present to allow
access to the outer IR unit's analyses. By registering here we can track
and trigger invalidation when that outer analysis goes away.

To make this work we need to enhance the PreservedAnalyses
infrastructure to support a (slightly) more explicit model for "sets" of
analyses, and allow abandoning a single specific analysis even when
a set covering that analysis is preserved. That allows us to describe
the scenario of preserving all Function analyses *except* for the one
where deferred invalidation has triggered.
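A toy Python model of the two pieces described above (registering a dependency on an outer analysis, and abandoning one analysis even though a set covering it is preserved) may make the interaction easier to follow. This is a sketch, not the PassManager API: the class names, methods, and string IDs are all hypothetical.

```python
class PreservedAnalyses:
    """Toy model: whole sets of analyses can be preserved while
    individual analyses are still abandoned explicitly."""

    def __init__(self):
        self.preserved_sets = set()  # e.g. {"AllFunctionAnalyses"}
        self.abandoned = set()       # analysis IDs explicitly invalidated

    def preserve_set(self, set_id):
        self.preserved_sets.add(set_id)

    def abandon(self, analysis_id):
        self.abandoned.add(analysis_id)

    def is_preserved(self, analysis_id, set_id):
        # An explicit abandon wins over a covering preserved set.
        return analysis_id not in self.abandoned and set_id in self.preserved_sets


class OuterProxy:
    """Toy outer-analysis proxy: inner analyses register which outer
    analysis results they cache references to."""

    def __init__(self):
        self.dependents = {}  # outer analysis ID -> set of inner analysis IDs

    def register(self, outer_id, inner_id):
        self.dependents.setdefault(outer_id, set()).add(inner_id)

    def invalidate_outer(self, outer_id, inner_pa):
        # Deferred invalidation: abandon every inner analysis that
        # registered a dependency on the now-invalid outer result.
        for inner_id in self.dependents.pop(outer_id, set()):
            inner_pa.abandon(inner_id)
```

For example, with all function analyses preserved, invalidating a module analysis that an AA-style function analysis registered against leaves every other function analysis preserved but abandons that one.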

We also need to teach the invalidator API to support direct ID calls
instead of always going through a template to dispatch so that we can
just record the ID mapping.

I've introduced testing of all of this both for simple module<->function
cases as well as for more complex cases involving a CGSCC layer.

Much like the previous patch I've not tried to fully update the loop
pass management layer because that layer is due to be heavily reworked
to use similar techniques to the CGSCC to handle updates. As that
happens, we'll have a better testing basis for adding support like this.

Many thanks to both Justin and Sean for the extensive reviews on this to
help bring the API design and documentation into a better state.

Differential Revision: https://reviews.llvm.org/D27198

llvm-svn: 290594
2016-12-27 08:40:39 +00:00
Craig Topper e77e901130 [AVX-512] Add all forms of VPALIGNR, VALIGND, and VALIGNQ to the load folding tables.
llvm-svn: 290591
2016-12-27 06:51:09 +00:00
Chandler Carruth 141bf5d14d [PM] Add one of the features left out of the initial inliner patch:
skipping indirectly recursive inline chains.

To do this, we implicitly build an inline stack for each callsite and
check prior to inlining that doing so would not form a cycle. This uses
the exact same technique and even shares some code with the legacy PM
inliner.
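The cycle check can be sketched as a walk up an inline history. This Python model is loosely patterned on the legacy inliner's history of (function, parent index) pairs; the exact layout there is an assumption, and the function name is hypothetical. Each entry records a callee that was inlined plus the index of the entry it was inlined through, with -1 for call sites in the original body.

```python
def inline_history_includes(callee, history_id, history):
    """Return True if `callee` already appears on the inline stack that
    produced a call site, i.e. inlining it again would form a cycle.

    history: list of (function, parent_id) pairs; parent_id is -1 for a
    call site that was present in the original function body.
    """
    while history_id != -1:
        function, parent_id = history[history_id]
        if function == callee:
            return True
        history_id = parent_id
    return False
```

With history `[("b", -1), ("c", 0)]`, a call to `b` exposed after inlining `b` and then `c` is skipped, while a call to an unrelated `d` is still a candidate.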

This solution remains deeply unsatisfying to me because it means we
cannot actually iterate the inliner externally. Doing so would not be
able to easily detect and avoid such cycles. Some day I would very much
like to have a solution that works without this internal state to detect
cycles, but this is not that day.

llvm-svn: 290590
2016-12-27 06:46:20 +00:00
George Burgess IV ed16024a9b [Analysis] Ignore `nobuiltin` on `allocsize` function calls.
We currently ignore the `allocsize` attribute on function calls with
the `nobuiltin` attribute when trying to lower `@llvm.objectsize`. We
shouldn't care about `nobuiltin` here: `allocsize` is explicitly added
by the user, not inferred based on a function's symbol.

llvm-svn: 290588
2016-12-27 06:32:14 +00:00
George Burgess IV ce04489515 [Analysis] Refactor as promised in r290397.
This also makes us no longer check for `allocsize` on intrinsic calls.
This shouldn't matter, since intrinsics should provide the information
we get from `allocsize` on their own.

llvm-svn: 290585
2016-12-27 06:10:50 +00:00
Craig Topper 2da265b7bf [AVX-512] Remove masked pmuldq and pmuludq intrinsics and autoupgrade them to unmasked intrinsics plus a select.
llvm-svn: 290583
2016-12-27 05:30:14 +00:00
Craig Topper 72f2d4e8d6 [InstCombine][X86] Add DemandedElts support for 512-bit PMULDQ/PMULUDQ instructions
PMULDQ/PMULUDQ vXi64 instructions only use the even numbered v2Xi32 input elements which SimplifyDemandedVectorElts should try and use.
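A small Python model of the element semantics shows why only the even-numbered 32-bit elements are demanded: each 64-bit result lane multiplies the low (even) 32-bit element of each input lane. The function names are hypothetical and the inputs are modeled as plain lists of 32-bit values.

```python
MASK32 = 0xFFFFFFFF

def _sext32(x):
    """Sign-extend the low 32 bits of x."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def pmuludq(a, b):
    """Unsigned: 64-bit lane i multiplies elements 2*i of a and b."""
    return [(a[2 * i] & MASK32) * (b[2 * i] & MASK32) for i in range(len(a) // 2)]

def pmuldq(a, b):
    """Signed: the even elements are sign-extended before multiplying."""
    return [_sext32(a[2 * i]) * _sext32(b[2 * i]) for i in range(len(a) // 2)]

def demanded_input_elts(demanded_result):
    """Map demanded 64-bit result lanes to demanded 32-bit input elements."""
    demanded = [False] * (2 * len(demanded_result))
    for i, used in enumerate(demanded_result):
        if used:
            demanded[2 * i] = True  # odd elements are never read
    return demanded
```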

This builds on r290554, which added support for the 128- and 256-bit versions.

llvm-svn: 290582
2016-12-27 05:30:09 +00:00
Craig Topper 89b3e0223f [AVX-512] Add 512-bit unmasked intrinsics for pmuldq and pmuludq so we can add them to InstCombine with the 128 and 256 bit versions.
The 128 and 256 bit masked intrinsics are currently unused by clang. The sse and avx2 unmasked intrinsics are used instead. The new 512-bit intrinsic will be used to do the same. Then all masked versions will be removed and autoupgraded.

llvm-svn: 290573
2016-12-27 03:46:05 +00:00
Chandler Carruth 03130d981c [PM] Teach the inliner in the new PM to merge attributes after inlining.
Also enable the new PM in the attributes test case which caught this
issue.

llvm-svn: 290572
2016-12-27 03:39:54 +00:00
Craig Topper 7f8540b5e7 [AVX-512][InstCombine] Teach InstCombine to turn masked scalar add/sub/mul/div with rounding intrinsics into normal IR operations if the rounding mode is CUR_DIRECTION.
An earlier commit added support for unmasked scalar operations. At that time isel wouldn't generate an optimal sequence for masked operations, but that has now been fixed.
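A Python model of what the folded IR has to compute for a masked scalar add: a plain add on element 0, a select on mask bit 0 against the pass-through value, and the upper elements taken from the first operand. This is a sketch of the semantics as understood for the `*_mask_add_round_ss` family, not InstCombine code; the function name is hypothetical and `CUR_DIRECTION` is assumed to be the value of `_MM_FROUND_CUR_DIRECTION` (4).

```python
CUR_DIRECTION = 4  # assumed value of _MM_FROUND_CUR_DIRECTION

def masked_scalar_add(a, b, passthru, mask, rounding):
    """Model of a masked scalar add-with-rounding intrinsic.

    Returns None when the rounding mode is not CUR_DIRECTION (the call is
    left alone); otherwise the value a plain fadd plus a select on mask
    bit 0 would produce."""
    if rounding != CUR_DIRECTION:
        return None
    lane0 = a[0] + b[0] if mask & 1 else passthru[0]
    return [lane0] + list(a[1:])  # upper lanes come from the first operand
```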

llvm-svn: 290566
2016-12-27 01:56:30 +00:00
Craig Topper 83f2145c18 [AVX-512] Add isel patterns to turn native masked scalar add/sub/mul/div into masked instructions.
llvm-svn: 290564
2016-12-27 01:56:24 +00:00
Chandler Carruth 0ee8bb11c3 [PM] Move the collection of call sites to a more appropriate place
inside of `InlineFunction`. Prior to this, call instructions were being
rewritten and replaced within the inlined region after the call sites
had been collected, invalidating some of those call sites.

Several of these regions are using the same technique to walk the
inlined region so this seems clearly safe up to this point.

I've also added a short circuit to the scan for call sites based on what
other code is doing.

With this, the most common crash I've found in the new inliner code is
fixed. I've turned it on for another test case that covers this
scenario.

I'll make my way through most of the other inliner test cases
just to get some easy coverage next.

llvm-svn: 290562
2016-12-27 01:24:50 +00:00
Craig Topper 020b228155 [AVX-512][InstCombine] Teach InstCombine to turn packed add/sub/mul/div with rounding intrinsics into normal IR operations if the rounding mode is CUR_DIRECTION.
llvm-svn: 290559
2016-12-27 00:23:16 +00:00
Chandler Carruth 6e9bb7e064 [PM] Teach the always inliner in the new pass manager to support
removing fully-dead comdats without removing dead entries in comdats
with live members.

This factors the core logic out of the current inliner's internals to
a reusable utility and leverages that in both places. The factored out
code should also be (minorly) more efficient in cases where we have very
few dead functions or dead comdats to consider.
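The comdat rule can be sketched in Python: a dead function inside a comdat may only be deleted when every member of that comdat is dead, because a comdat is kept or discarded as a unit. The function name and the name-based data model are hypothetical; comdat members that are not functions are simply never in the dead set, which keeps the check conservative.

```python
def filter_dead_comdat_functions(dead_functions, comdat_of, comdat_members):
    """Return the dead functions that can actually be deleted.

    dead_functions: names of functions found to be dead.
    comdat_of: function name -> comdat name, or None if not in a comdat.
    comdat_members: comdat name -> set of all global names in that comdat.
    """
    dead = set(dead_functions)
    removable = []
    for fn in dead_functions:
        comdat = comdat_of.get(fn)
        # Not in a comdat, or the whole comdat is dead: safe to remove.
        if comdat is None or comdat_members[comdat] <= dead:
            removable.append(fn)
    return removable
```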

I've added a test case to cover this behavior of the always inliner.
This is the last significant bug in the new PM's always inliner I've
found (so far).

llvm-svn: 290557
2016-12-26 23:43:27 +00:00
Simon Pilgrim c9cf7fc7a4 [InstCombine][X86] Add DemandedElts support for PMULDQ/PMULUDQ instructions
PMULDQ/PMULUDQ vXi64 instructions only use the even numbered v2Xi32 input elements which SimplifyDemandedVectorElts should try and use.

Differential Revision: https://reviews.llvm.org/D28119

llvm-svn: 290554
2016-12-26 23:28:17 +00:00
Daniel Berlin 85f91b0ec3 clang-format NewGVN files
llvm-svn: 290551
2016-12-26 20:06:58 +00:00
Daniel Berlin 85cbc8c097 Misc cleanups and simplifications for NewGVN.
Mostly use a bit more idiomatic C++ where we can,
so we can combine some things later.

Reviewers: davide

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D28111

llvm-svn: 290550
2016-12-26 19:57:25 +00:00
Daniel Berlin d59e8010c5 Don't use our own incorrect version of isTriviallyDeadInstruction in NewGVN. Fixes PR/31472
llvm-svn: 290549
2016-12-26 18:44:36 +00:00
Davide Italiano fe7a3ee51e [NewGVN] Add a flag to enable the pass via `-mllvm`.
NewGVN can be tested passing `-mllvm -enable-newgvn` to clang.

Differential Revision:  https://reviews.llvm.org/D28059

llvm-svn: 290548
2016-12-26 18:26:19 +00:00
Davide Italiano a312ca845c [NewGVN] Fold lookupOperandLeader() when there's only one use. NFCI.
llvm-svn: 290543
2016-12-26 16:19:34 +00:00
Bryant Wong b5e03b61e2 [InstCombiner] Simplify lib calls to `round{,f}`
Differential Revision: https://reviews.llvm.org/D28110

llvm-svn: 290542
2016-12-26 14:29:29 +00:00
Craig Topper 5ef13ba18b [AVX-512] Fix some patterns to use extended register classes.
llvm-svn: 290536
2016-12-26 07:26:07 +00:00
Craig Topper 7b788ada2d [AVX-512][InstCombine] Teach InstCombine to turn scalar add/sub/mul/div with rounding intrinsics into normal IR operations if the rounding mode is CUR_DIRECTION.
Summary:
I only do this for unmasked cases for now because isel is failing to fold the mask. I'll try to fix that soon.

I'll do the same thing for packed add/sub/mul/div in a future patch.

Reviewers: delena, RKSimon, zvi, craig.topper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D27879

llvm-svn: 290535
2016-12-26 06:33:19 +00:00
Craig Topper f56d985f77 [AVX-512] Don't assume that the rounding mode argument to intrinsics is a constant. While clang will guarantee this, nothing in the backend will.
A non-constant value will now result in an isel error instead of just asserting or crashing due to a bad cast during lowering.

llvm-svn: 290532
2016-12-26 01:40:17 +00:00
Craig Topper e328045711 [AVX-512][InstCombine] Teach InstCombine to convert masked vpermv intrinsics into shufflevector instructions
Summary:
This patch adds support for converting the masked vpermv intrinsics into shufflevector instructions if the indices are constants.

We also need to wrap a select instruction around the shuffle to take care of the masking part. InstCombine will take care of optimizing the select if the mask is constant so I didn't bother checking for that.
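The shuffle-plus-select decomposition can be modeled in Python: with constant indices the permute is a fixed shuffle, and the mask becomes a per-element select against the pass-through value. The function names and operand order are hypothetical, and the index is assumed to use only its low log2(N) bits.

```python
def permvar(data, indices):
    """Unmasked variable permute: each result element selects
    data[index], using only the low bits of the index (len(data) is
    assumed to be a power of two)."""
    n = len(data)
    return [data[idx & (n - 1)] for idx in indices]

def masked_permvar(data, indices, passthru, mask):
    """Masked form expressed as 'shuffle, then select on the mask'."""
    shuffled = permvar(data, indices)  # the shufflevector part
    return [shuffled[i] if (mask >> i) & 1 else passthru[i]  # the select part
            for i in range(len(shuffled))]
```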

Reviewers: zvi, delena, spatel, RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D27825

llvm-svn: 290530
2016-12-25 23:58:57 +00:00
Bryant Wong 4213d94142 [MemorySSA] Define a restricted upward AccessList splice.
Differential Revision: https://reviews.llvm.org/D26661

llvm-svn: 290527
2016-12-25 23:34:07 +00:00
Bryant Wong a07d9b1460 [AliasAnalysis] Teach BasicAA about memcpy.
Differential Revision: https://reviews.llvm.org/D27034

llvm-svn: 290526
2016-12-25 22:42:27 +00:00
Daniel Berlin d7c12ee54c Value number stores and memory states so we can detect when memory states are equivalent (i.e., a store of the same value to memory).
Reviewers: davide

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D28084

llvm-svn: 290525
2016-12-25 22:23:49 +00:00
Daniel Berlin 65f5f0d728 Rename GVNExpression *ops_ members to *op_* to match conventions in the rest of LLVM
llvm-svn: 290524
2016-12-25 22:10:37 +00:00
Lang Hames c9d0ff1302 [Orc][RPC] Add a ParallelCallGroup utility for dispatching and waiting on
multiple asynchronous RPC calls.

ParallelCallGroup allows multiple asynchronous calls to be dispatched,
and provides a wait method that blocks until all asynchronous calls have
been executed on the remote and all return value handlers run on the
local machine.

This will allow, for example, the JIT client to issue memory allocation calls
for all sections in parallel, then block until all memory has been allocated
on the remote and the allocated addresses registered with the client, at which
point the JIT client can proceed to applying relocations.
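The dispatch-then-wait pattern can be sketched in Python, with a thread pool standing in for the remote end. The class mirrors the idea described above, but its methods, the `allocate_sections` helper, and the allocator callback are hypothetical, not the Orc RPC API. `wait` returns only once every call has completed and its handler has run.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class ParallelCallGroup:
    def __init__(self, executor):
        self._executor = executor
        self._lock = threading.Lock()
        self._cv = threading.Condition(self._lock)
        self._outstanding = 0

    def call(self, fn, handler, *args):
        """Dispatch fn(*args) asynchronously; run handler on its result."""
        with self._lock:
            self._outstanding += 1

        def run():
            try:
                handler(fn(*args))
            finally:
                with self._cv:
                    self._outstanding -= 1
                    if self._outstanding == 0:
                        self._cv.notify_all()

        self._executor.submit(run)

    def wait(self):
        """Block until every dispatched call and its handler have finished."""
        with self._cv:
            self._cv.wait_for(lambda: self._outstanding == 0)

def allocate_sections(sizes, remote_alloc):
    """Usage example: allocate all sections in parallel, then block until
    every returned address has been recorded locally."""
    addresses = {}
    lock = threading.Lock()
    with ThreadPoolExecutor(max_workers=4) as pool:
        group = ParallelCallGroup(pool)
        for name, size in sizes.items():
            def record(addr, name=name):
                with lock:
                    addresses[name] = addr
            group.call(remote_alloc, record, size)
        group.wait()
    return addresses
```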

llvm-svn: 290523
2016-12-25 21:55:05 +00:00
Michael Zuckerman 86602e85dd revert commit 290516
llvm-svn: 290517
2016-12-25 12:45:18 +00:00
Michael Zuckerman 45aa420640 Commit try added new empty line
llvm-svn: 290516
2016-12-25 12:01:34 +00:00
Amjad Aboud 7faeecc8f7 [DebugInfo] Added support for Checksum debug info feature.
Differential Revision: https://reviews.llvm.org/D27642

llvm-svn: 290514
2016-12-25 10:12:09 +00:00
Mehdi Amini 690952d15e MetadataLoader: replace the tracking of ForwardReferences and UnresolvedNodes with a set-based solution (NFC)
This makes explicit the exact list of nodes to handle, and it is
much easier to manipulate and understand than the previous custom
tracking of a min/max range that bounded which IDs
to look for.
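The difference between the two approaches can be sketched in Python: a set records exactly which IDs are still forward-referenced, whereas a [min, max] range also covers every ID in between that was never referenced. The class and method names are hypothetical, not MetadataLoader's.

```python
class MetadataForwardRefs:
    """Track forward references by exact ID instead of a [min, max] range."""

    def __init__(self):
        self.forward_refs = set()

    def add_forward_ref(self, md_id):
        self.forward_refs.add(md_id)

    def resolve(self, md_id):
        self.forward_refs.discard(md_id)

    def pending(self):
        # Exactly the IDs still needing work; a min/max range would also
        # cover every never-referenced ID in between.
        return sorted(self.forward_refs)
```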

Differential Revision: https://reviews.llvm.org/D28089

llvm-svn: 290507
2016-12-25 04:22:54 +00:00
Mehdi Amini 4f90ee0010 MetadataLoader: add an extra assertion in Placeholders flush (NFC)
We don't expect any forward reference at this point.

llvm-svn: 290506
2016-12-25 03:55:53 +00:00
Davide Italiano 463c32eaf6 [NewGVN] Prefer `auto` to explicit type when the latter is obvious.
llvm-svn: 290499
2016-12-24 17:17:21 +00:00
Simon Pilgrim 0d66d29678 [SelectionDAG] Early out from computeKnownBits when we know we will have no common bits.
Avoid extra (recursive) calls to computeKnownBits if we already know that there are no common known bits.
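For a select-like node, the result's known bits are the intersection of its operands' known bits, so if the first operand has no known bits there is no point computing the second. A Python sketch of that early out; `KnownBits`, the function name, and the `compute` callback are hypothetical stand-ins for the SelectionDAG code.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class KnownBits:
    zero: int  # bits known to be 0
    one: int   # bits known to be 1

    def is_unknown(self):
        return self.zero == 0 and self.one == 0

def known_bits_of_select(compute, true_op, false_op):
    """Only bits known identically in both operands survive, so stop
    early if the first operand knows nothing."""
    lhs = compute(true_op)
    if lhs.is_unknown():
        return lhs  # early out: skip the (recursive) call on false_op
    rhs = compute(false_op)
    return KnownBits(lhs.zero & rhs.zero, lhs.one & rhs.one)
```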

llvm-svn: 290490
2016-12-24 12:59:35 +00:00
Chandler Carruth 534d644b86 [PM] Try to improve the comments here to make what's going on more
clear.

Based on post-commit review suggestion from Sean. (Thanks!)

llvm-svn: 290488
2016-12-24 05:11:17 +00:00
Daniel Berlin 8a6a86146c Mark isOnlyReachableViaThisEdge as const
llvm-svn: 290468
2016-12-24 00:04:07 +00:00
Mehdi Amini 4fe6a8c826 Add an assertion for cl::opt names: they can't start with '-'
llvm-svn: 290467
2016-12-23 23:55:26 +00:00
Chandler Carruth 4eaff12ba2 [PM] Teach the always inlining test case to be much more strict about
whether functions are removed, and fix the new PM's always inliner to
actually pass this test.

Without this, the new PM's always inliner leaves all the functions
kicking around which won't work out very well given the semantics of
always inline.

Doing this really highlights how frustrating the current alwaysinline
semantic contract is though -- why can we put it on *external*
functions, etc?

Also I've added a number of tricky and interesting test cases for
removing functions with the always inliner. There is one remaining case
not handled -- fully removing comdats -- and I've left a FIXME about
this.

llvm-svn: 290457
2016-12-23 23:33:35 +00:00
Chandler Carruth 060ad61fbe [PM] Add support for building a default AA pipeline to the PassBuilder.
Pretty boring and lame as-is but necessary. This is definitely a place
we'll end up with extension hooks longer term. =]

Differential Revision: https://reviews.llvm.org/D28076

llvm-svn: 290449
2016-12-23 20:38:19 +00:00
Mehdi Amini 94f86ad4e0 Function-import: Disable IRVerifier on lazy-loaded modules: the ODR TypeUniquing generates invalid debug info.
llvm-svn: 290442
2016-12-23 19:19:44 +00:00
Mehdi Amini fc06b83ee7 Fix build after r290437 (missing include)
llvm-svn: 290438
2016-12-23 18:04:51 +00:00
Mehdi Amini 9a9077fdad FunctionImport: fix typo '#ifndef NDEBUG' instead of '#ifndef DEBUG'
llvm-svn: 290437
2016-12-23 17:59:24 +00:00
Jan Vesely 206a510e54 AMDGPU: split ret/noret patterns for global atomics
Differential Revision: https://reviews.llvm.org/D27989

llvm-svn: 290435
2016-12-23 15:34:51 +00:00
Davide Italiano b9ff23a402 [LICM] Plug a leak freeing the ASTs before clearing the map.
llvm-svn: 290433
2016-12-23 15:02:35 +00:00
Piotr Padlewski 383edba1fd [MemDep] NFC changes
llvm-svn: 290428
2016-12-23 13:13:32 +00:00
Davide Italiano 34f94384a5 [LICM] Work around LICM's need to maintain state across loops.
The pass creates some state which is expected to be cleaned up by
a later instance of the same pass. opt-bisect happens to expose
this less-than-ideal design because calling skipLoop() will result in
this state not being cleaned up at times and an assertion firing
in `doFinalization()`. Chandler tells me the new pass manager will
give us options to avoid these design traps, but until it's ready,
we need a workaround for the current pass infrastructure. Fix provided
by Andy Kaylor; see the review for a complete discussion.

Differential Revision:  https://reviews.llvm.org/D25848

llvm-svn: 290427
2016-12-23 13:12:50 +00:00
Renato Golin 21da340f7a [AArch64] Cortex-A57 FDIV/FSQRT scheduling fix (W-unit)
According to the Cortex-A57 doc, FDIV/FSQRT instructions should use F0 unit
(W-unit in AArch64SchedA57.td, the same as cryptography instructions),
not F1 unit (X-unit in td, like ASIMD absolute diff accum SABA/UABA).

This patch changes FDIV/FSQRT scheduling declarations to use A57UnitW
instead of A57UnitX. Also, latencies for those instructions are
corrected.

Patch by Andrew Zhogin.

llvm-svn: 290426
2016-12-23 12:51:41 +00:00
Florian Hahn 898127fe36 Revert r290423 because it broke the sanitizer-x86_64-linux-autoconf buildbot.
llvm-svn: 290425
2016-12-23 12:26:11 +00:00
Florian Hahn 1d6b1a7b79 [framelowering] Skip dbg values when getting next/previous instruction.
Summary:
In mergeSPUpdates, debug values need to be ignored when getting the
previous element, otherwise debug data could have an impact on codegen.

In eliminateCallFramePseudoInstr, debug values after the erased element
could have an impact on codegen and should be skipped.
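The fix boils down to "step over debug values when looking for a neighboring instruction". A Python sketch over a hypothetical list-of-strings block shows why it matters: without the skip, inserting a `DBG_VALUE` would change which instruction the stack-pointer merging logic sees.

```python
def prev_non_debug(instrs, index):
    """Index of the closest instruction before `index` that is not a
    debug value, or None if there is none."""
    i = index - 1
    while i >= 0 and instrs[i].startswith("DBG_VALUE"):
        i -= 1
    return i if i >= 0 else None
```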

Closes PR31319 (https://llvm.org/bugs/show_bug.cgi?id=31319)

Reviewers: mkuper, MatzeB, aprantl

Subscribers: gbedwell, llvm-commits

Differential Revision: https://reviews.llvm.org/D27688

llvm-svn: 290423
2016-12-23 11:35:00 +00:00
Davide Italiano 0ff941620c [NewGVN] Remove (for now) unused code. NFCI.
llvm-svn: 290420
2016-12-23 10:28:30 +00:00
Mehdi Amini 96cdc49305 [ThinLTO] Verify lazy-loaded source module for function importing when assertions are enabled (NFC)
llvm-svn: 290416
2016-12-23 05:16:19 +00:00
Mehdi Amini 9f926f70c1 MetadataLoader: split the creation of a single metadata out of a Record into its own function (NFC)
This is pure code motion; it will just make the code more reusable when I
attempt to lazy-load Metadata on demand.

llvm-svn: 290414
2016-12-23 03:59:18 +00:00
Dan Gohman 00d734d89b [WebAssembly] Annotate call and load/store immediates.
These will be used to guide the binary encoding of these immediates.

llvm-svn: 290412
2016-12-23 03:23:52 +00:00
Zijiao Ma bf6007bd1b Make the canonicalisation on shifts benefit more cases.
1. Fix the pessimized case noted in the FIXME.
2. Add tests for it.
3. The canonicalisation on shifts results in a different sequence for
   the machine-licm tests. Correct some check lines.

Differential Revision: https://reviews.llvm.org/D27916

llvm-svn: 290410
2016-12-23 02:56:07 +00:00
Mehdi Amini 37c178b6f5 MetadataLoader: Reinitialize MinFwdRef/MaxFwdRef after resolving cycles (NFC)
This puts the Loader back in a consistent state.

llvm-svn: 290409
2016-12-23 02:20:12 +00:00
Mehdi Amini 5ae6170fc2 MetadataLoader: Add an assertion for the implicit invariant of PlaceHolder while loading Metadata (NFC)
llvm-svn: 290408
2016-12-23 02:20:09 +00:00
Mehdi Amini 70a9cd4cbe MetadataLoader: Make sure every member of MetadataLoader are initialized (NFC)
llvm-svn: 290407
2016-12-23 02:20:07 +00:00
Mehdi Amini ec68dd49bf MetadataLoader: Refactor "IsImporting" into the Pimpl for the MetadataLoader (NFC)
Keeping all the state together will make it easier to handle.

llvm-svn: 290406
2016-12-23 02:20:02 +00:00
Chandler Carruth ee08676102 Enable '-Wstring-conversion' and fix some bad asserts that it helped
find.

Notable is the assert in NewGVN which had no effect because of the bug.

llvm-svn: 290400
2016-12-23 01:38:06 +00:00
George Burgess IV ccae43a247 Don't consider allocsize functions to be allocation functions.
This patch fixes some ASAN unittest failures on FreeBSD. See the
cfe-commits email thread for r290169 for more on those.

According to the LangRef, the allocsize attribute only tells us about
the number of bytes that exist at the memory location pointed to by the
return value of a function. It does not necessarily mean that the
function will only ever allocate. So, we need to be very careful about
treating functions with allocsize as general allocation functions. This
patch makes us fully conservative in this regard, though I suspect that
we have room to be a bit more aggressive if we want.

This has a FIXME that can be fixed by a relatively straightforward
refactor; I just wanted to keep this patch minimal. If this sticks, I'll
come back and fix it in a few days.

llvm-svn: 290397
2016-12-23 01:18:09 +00:00
Sanjoy Das 50fef4321b NFC code motion in ImplicitNullChecks
Extract out two large lambdas into top level member functions.

llvm-svn: 290395
2016-12-23 00:41:24 +00:00
Sanjoy Das 9a129807f3 Reimplement dependency tracking in the ImplicitNullChecks pass
Summary:
This change rewrites a core component in the ImplicitNullChecks pass for
greater simplicity since the original design was over-complicated for no
good reason.  Please review this as essentially a new pass.  The change
is almost NFC and I've added a test case for a scenario that this new
code handles that wasn't handled earlier.

The implicit null check pass, at its core, is a code hoisting transform.
It differs from "normal" code transforms in that it speculates
potentially faulting instructions (by design), but a lot of the usual
hazard detection logic (register read-after-write etc.) still applies.
We previously detected hazards by keeping track of registers defined and
used by machine instructions over an instruction range, but that was
unwieldy and did not actually confer any performance benefits.  The
intent was to have linear time complexity over the number of machine
instructions considered, but it ended up being N^2 in practice.

This new version is more obviously O(N^2) (with N capped to 8 by
default) in hazard detection.  It does not attempt to be clever in
tracking register uses or defs (the previous cleverness here was a
source of bugs).
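The pairwise hazard check can be sketched in Python: to hoist a candidate above a run of earlier instructions, compare its register uses and defs against each of them. Running this for every candidate gives the O(N^2) behavior described, with N capped. The `Instr` model and function name are hypothetical; only the cap of 8 comes from the text above.

```python
from collections import namedtuple

# Hypothetical model of a machine instruction: registers read and written.
Instr = namedtuple("Instr", ["name", "uses", "defs"])

MAX_INSTS_TO_CONSIDER = 8  # the default cap on N mentioned above

def can_hoist_above(candidate, prior):
    """Can `candidate` be moved above every instruction in `prior`?"""
    if len(prior) > MAX_INSTS_TO_CONSIDER:
        return False
    for mi in prior:
        if candidate.uses & mi.defs:              # read-after-write
            return False
        if candidate.defs & (mi.uses | mi.defs):  # write-after-read / write-after-write
            return False
    return True
```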

Once this is checked in, I'll extract out the `IsSuitableMemoryOp` and
`CanHoistLoadInst` lambdas into member functions (they're too complicated
to be inline lambdas) and do some other related NFC cleanups.

Reviewers: reames, anna, atrick

Subscribers: mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D27592

llvm-svn: 290394
2016-12-23 00:41:21 +00:00
Quentin Colombet 3749f33888 [GlobalISel] More fix for the size vs. type typo. NFC.
I missed those in my previous commit (r290378).

llvm-svn: 290387
2016-12-22 22:50:34 +00:00
Chris Bieneman e0e451d927 [ObjectYAML] Support for DWARF debug_info section
This patch adds support for YAML<->DWARF for debug_info sections.

This re-lands r290147, reverted in r290148, re-landed in r290204 after fixing the issue that caused bots to fail (thank you UBSan!), and reverted again in r290209 due to failures on big endian systems.

After adding support for preserving endianness, this should be good now.

llvm-svn: 290386
2016-12-22 22:44:27 +00:00
Evgeniy Stepanov 27d4c9b71b [cfi] Emit jump tables as a function-level inline asm.
Use a dummy private function with inline asm calls instead of module
level asm blocks for CFI jumptables.

The main advantage is that now jumptable codegen can be affected by
the function attributes (like target_cpu on ARM). Module level asm
gets the default subtarget based on the target triple, which is often
not good enough.

This change also uses asm constraints/arguments to reference
jumptable targets and aliases directly. We no longer do asm name
mangling in an IR pass.

Differential Revision: https://reviews.llvm.org/D28012

llvm-svn: 290384
2016-12-22 22:22:35 +00:00
Chris Bieneman 55de3a2449 [ObjectYAML] MachO support for endianness
This patch adds support to the macho<->yaml tools for preserving endianness in MachO structures and DWARF data.

llvm-svn: 290381
2016-12-22 21:58:03 +00:00
Quentin Colombet fa5960a28b [MachineVerifier] Check that even generic vregs comply with regclass constraints.
We used to not check generic vregs, but that is actually a mistake given
nothing in the GlobalISel pipeline is going to fix the constraints on
target specific instructions. Therefore, the target has to have them
right from the start.

llvm-svn: 290380
2016-12-22 21:56:39 +00:00
Quentin Colombet e08cc599b8 [MIRParser] Fix a typo in comment and error message.
We have long switched from size to type.

llvm-svn: 290378
2016-12-22 21:56:35 +00:00
Quentin Colombet f38015e5fe [AArch64][CallLowering] Constrain registers on target specific instructions
The InstructionSelect pass will not look at target specific instructions
since they are already selected. As a result, the operands of target
specific instructions must be properly constrained, because it is not
going to fix them.

This fixes invalid register classes on call instruction.

llvm-svn: 290377
2016-12-22 21:56:31 +00:00
Quentin Colombet 9751e61fe1 [MIRParser] Non-generic virtual register may have a type.
When generic virtual registers get constrained, because of a use on a
target specific operation for instance, we end up with regular virtual
registers with a type and that's perfectly fine.

llvm-svn: 290376
2016-12-22 21:56:29 +00:00
Quentin Colombet 7e1f66d6f5 [RegisterBankInfo] Allow to set a register class when nothing else is set
This will be needed to constrain register classes on target specific
instructions before the RegBankSelect pass has run.

llvm-svn: 290375
2016-12-22 21:56:26 +00:00
Quentin Colombet b4e71185b2 [GlobalISel] Refactor the logic to constrain registers.
Move the logic to constrain registers from InstructionSelector to a
utility function. It will be required by other passes in the GlobalISel
pipeline.

llvm-svn: 290374
2016-12-22 21:56:19 +00:00
Matt Arsenault 0b26e47345 AMDGPU: Invert cmp + select with constant
Canonicalize a select with a constant to the false side. This
enables more instruction shrinking opportunities since an
inline immediate can be used for the false side of v_cndmask_b32_e32.
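The canonicalization can be sketched in Python as a rewrite on a hypothetical (predicate, lhs, rhs, true value, false value) tuple: when only the true value is a constant, invert the compare and swap the arms so the constant lands on the false side. Only integer predicates are modeled; floating-point compares would need the unordered inverse, which this sketch does not cover.

```python
# Inverse integer predicates (hypothetical string encoding).
INVERSE_PRED = {"eq": "ne", "ne": "eq", "slt": "sge", "sge": "slt",
                "sgt": "sle", "sle": "sgt", "ult": "uge", "uge": "ult",
                "ugt": "ule", "ule": "ugt"}

def canonicalize_select(pred, lhs, rhs, true_val, false_val):
    """select (icmp pred lhs, rhs), K, x -> select (icmp !pred lhs, rhs), x, K
    when only the true value is a constant (modeled here as an int)."""
    if isinstance(true_val, int) and not isinstance(false_val, int):
        return INVERSE_PRED[pred], lhs, rhs, false_val, true_val
    return pred, lhs, rhs, true_val, false_val
```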

This seems to usually be better but causes some code size regressions
in some tests.

llvm-svn: 290372
2016-12-22 21:40:08 +00:00
Krzysztof Parzyszek 3885d87c60 [Hexagon] Add DAG mutations for machine pipeliner
llvm-svn: 290366
2016-12-22 19:44:55 +00:00
Wei Mi a2f0b594c2 Redo store splitting in CodeGenPrepare.
This is a follow-up to https://reviews.llvm.org/D22840 that addresses the
case where a value to be merged into an int64 pair is in a different BB. Redo
the store splitting in CodeGenPrepare so we can match the pattern across multiple
BBs and move some instructions into the same BB. We still keep the code in DAG
combine so that we can catch cases that show up after DAG combining runs.
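Store splitting relies on a simple equivalence, shown here in Python for a little-endian target: storing the merged value `(hi << 32) | lo` as one 64-bit store writes the same bytes as storing `lo` at offset 0 and `hi` at offset 4. Splitting avoids materializing the shift/or merge when the target reports that multiple stores are cheaper. The function names are hypothetical.

```python
import struct

def merged_store(lo, hi):
    """One 64-bit little-endian store of (hi << 32) | lo."""
    return struct.pack("<Q", ((hi & 0xFFFFFFFF) << 32) | (lo & 0xFFFFFFFF))

def split_stores(lo, hi):
    """Two 32-bit little-endian stores: lo at offset 0, hi at offset 4."""
    mem = bytearray(8)
    struct.pack_into("<I", mem, 0, lo & 0xFFFFFFFF)
    struct.pack_into("<I", mem, 4, hi & 0xFFFFFFFF)
    return bytes(mem)
```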

Differential Revision: https://reviews.llvm.org/D25914

llvm-svn: 290365
2016-12-22 19:44:45 +00:00
Wei Mi f3f01aba48 Change the interface of TLI.isMultiStoresCheaperThanBitsMerge.
This is for splitMergedValStore in DAG Combine to share the target query interface
with similar logic in CodeGenPrepare.

Differential Revision: https://reviews.llvm.org/D24707

llvm-svn: 290363
2016-12-22 19:38:22 +00:00
Petar Jovanovic 8a4e63994e [mips] Fix compact branch hazard detection, part 2
As a follow-up to the D27209 fix, this patch now properly handles a single
transient instruction in a basic block.

Patch by Aleksandar Beserminji.

Differential Revision: https://reviews.llvm.org/D27856

llvm-svn: 290361
2016-12-22 19:29:50 +00:00
Krzysztof Parzyszek 8839124848 Add the DAG mutation interface to the software pipeliner
llvm-svn: 290360
2016-12-22 19:21:20 +00:00
Krzysztof Parzyszek df24da221e Fix two bugs in the pipeliner in renaming phis in the prolog and epilog
When the pipeliner is renaming phi values, it may need to iterate through
the phi operands to check for other phis. However, the pipeliner should
stop once it reaches a phi that is outside the pipelined loop.

Also, when the generateExistingPhis code is unable to reuse an existing
phi, the default code that computes the PhiOp2 is only to be used when
the pipeliner is generating the kernel. Otherwise, the phi may be a value
computed earlier in the same epilog.

Patch by Brendon Cahoon.

llvm-svn: 290355
2016-12-22 18:49:55 +00:00
Matt Arsenault 941632839f AMDGPU: Use i16 for i16 shift amount
llvm-svn: 290351
2016-12-22 16:36:25 +00:00
Davide Italiano e05e3306a3 [NewGVN] Add the pass to PassRegistry.def.
We need to hook up here to get it working with the new PM.
Add a test while here (and remove a typo).

llvm-svn: 290350
2016-12-22 16:35:02 +00:00
Matt Arsenault 3c97e2030a AMDGPU: Fix missing 16-bit cmpx instructions
llvm-svn: 290349
2016-12-22 16:27:14 +00:00
Matt Arsenault 18f56be3d2 AMDGPU: Use i16 comparison instructions
llvm-svn: 290348
2016-12-22 16:27:11 +00:00
Matt Arsenault fef7beb6a6 AMDGPU: Fixed '!NodePtr->isKnownSentinel()' assert
Caused by dereferencing end iterator when trying to const cast the iterator.

Patch by Martin Sherburn

llvm-svn: 290347
2016-12-22 16:06:32 +00:00
Davide Italiano 7e274e02ae [GVN] Initial check-in of a new global value numbering algorithm.
The code has been developed by Daniel Berlin over the years, and
the goal of the new implementation is to address shortcomings of
the current GVN infrastructure, e.g. long compile times for large
testcases, lack of phi predication, no load/store value numbering,
etc.

The current code just implements the "core" GVN algorithm, although
other pieces (load coercion, phi handling, predicate system) are
already implemented in a branch out of tree. Once the core is stable,
we'll start adding pieces on top of the base framework.
The tests currently living in test/Transform/NewGVN are copies
of the ones in GVN, with the appropriate `XFAIL`s (for features missing in NewGVN).
A flag will be added in a future commit to enable NewGVN, so that
interested parties can exercise this code easily.

Differential Revision:  https://reviews.llvm.org/D26224

llvm-svn: 290346
2016-12-22 16:03:48 +00:00
Dan Gohman 8b4340a5dd [WebAssembly] Add an "explicit" keyword to a constructor.
llvm-svn: 290345
2016-12-22 16:03:02 +00:00
Dan Gohman 207ed22660 [WebAssembly] Don't use variadic operand indices in the MCOperandInfo array.
llvm-svn: 290344
2016-12-22 16:00:55 +00:00
Dan Gohman 728926ac59 [WebAssembly] Don't fold negative load/store offsets in fast-isel.
WebAssembly's load/store offsets are unsigned and don't wrap, so it's not
valid to fold in a negative offset.
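A Python sketch of the folding rule: a constant addend can be folded into the offset immediate only if it is non-negative and the result still fits. The 32-bit offset width is an assumption for wasm32, and the function name is hypothetical.

```python
U32_MAX = 0xFFFFFFFF  # assumed offset width for wasm32

def fold_offset(offset, addend):
    """Try to fold a constant addend into a load/store's unsigned offset
    immediate. Returns the new offset, or None if folding is invalid."""
    if addend < 0:
        return None  # a negative offset cannot be expressed: offsets don't wrap
    new_offset = offset + addend
    if new_offset > U32_MAX:
        return None  # does not fit in the offset immediate
    return new_offset
```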

llvm-svn: 290342
2016-12-22 15:15:10 +00:00
Sam Kolton a568e3dde7 [AMDGPU] Add pseudo SDWA instructions
Summary: This is needed for later SDWA support in CodeGen.

Reviewers: vpykhtin, tstellarAMD

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye

Differential Revision: https://reviews.llvm.org/D27412

llvm-svn: 290338
2016-12-22 12:57:41 +00:00
Sam Kolton a6792a39c4 [AMDGPU] Disassembler: fix for disassembling v_mac_f32/16_dpp/sdwa
Summary: Real instruction should copy constraints from real instruction. This allows auto-generated disassembler to correctly process tied operands.

Reviewers: nhaustov, vpykhtin, tstellarAMD

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye

Differential Revision: https://reviews.llvm.org/D27847

llvm-svn: 290336
2016-12-22 11:30:48 +00:00
Ayman Musa 9ff608cdc6 [X86][AVX2] Passing the appropriate memory operand class to VPMADDWD instruction.
Replacing the memory operand in the ymm version of VPMADDWD from i128mem to i256mem.

Differential Revision: https://reviews.llvm.org/D28024

llvm-svn: 290333
2016-12-22 08:42:46 +00:00
Chandler Carruth 9c36c922d9 [PM] Remove now-dead extern template and explicit instantiation
declarations.

We're using a custom class here instead of the helper template; these
bits just didn't get deleted when the other bits did get deleted. This
was found by a really nice MSVC warning about explicitly instantiating
a template where some member functions aren't defined and thus can't be
instantiated.

llvm-svn: 290327
2016-12-22 07:14:33 +00:00
Chandler Carruth e3f5064b72 [PM] Introduce a reasonable port of the main per-module pass pipeline
from the old pass manager to the new one.

I'm not trying to support (initially) the numerous options that are
currently available to customize the pass pipeline. If we end up really
wanting them, we can add them later, but I suspect many are no longer
interesting. The simplicity of omitting them will help a lot as we sort
out what the pipeline should look like in the new PM.

I've also documented to the best of my ability *why* each pass or group
of passes is used so that reading the pipeline is more helpful. In many
cases I think we have some questionable choices of ordering and I've
left FIXME comments in place so we know what to come back to and revisit
going forward. But for now, I've left it as similar to the current
pipeline as I could.

Lastly, I've had to comment out several places where passes are not
ported to the new pass manager or where the loop pass infrastructure is
not yet ready. I did at least fix a few bugs in the loop pass
infrastructure uncovered by running the full pipeline, but I didn't want
to go too far in this patch -- I'll come back and re-enable these as the
infrastructure comes online. But I'd like to keep the comments in place
because I don't want to lose track of which passes need to be enabled
and where they go.

One thing that seemed like a significant API improvement was to require
that we don't build pipelines for O0; building one seems to have no real benefit.

I've also switched back to returning pass managers by value as at this
API layer it feels much more natural to me for composition. But if
others disagree, I'm happy to go back to an output parameter.

I'm not 100% happy with the testing strategy currently, but it seems at
least OK. I may come back and try to refactor or otherwise improve this
in subsequent patches but I wanted to at least get a good starting point
in place.

Differential Revision: https://reviews.llvm.org/D28042

llvm-svn: 290325
2016-12-22 06:59:15 +00:00
Adrian Prantl 5542da4bbc Fix an assertion in DwarfExpression when emitting fragments in vector registers
When DwarfExpression is emitting a fragment that is located in a
register and that fragment is smaller than the register, and the
register must be composed from sub-registers (are you still with me?)
the last DW_OP_piece operation must not be larger than the size of the
fragment itself, since the last piece of the fragment could be smaller
than the last subregister that is being emitted.
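
A hypothetical sketch of the clamping rule described above (names invented for illustration, sizes in bits for simplicity; this is not the actual DwarfExpression code): each emitted piece is limited to what remains of the fragment, so the last piece can be smaller than the last sub-register.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Given the sizes (in bits) of the sub-registers composing a register, in
// emission order, return the piece sizes for a fragment of FragmentBits bits.
// Each piece is clamped to the remaining fragment size, so the pieces never
// add up to more than the fragment itself.
std::vector<uint64_t> pieceSizes(const std::vector<uint64_t> &SubRegBits,
                                 uint64_t FragmentBits) {
  std::vector<uint64_t> Pieces;
  uint64_t Remaining = FragmentBits;
  for (uint64_t Bits : SubRegBits) {
    if (Remaining == 0)
      break; // fragment fully covered; remaining sub-registers are not needed
    uint64_t Piece = std::min(Bits, Remaining);
    Pieces.push_back(Piece);
    Remaining -= Piece;
  }
  return Pieces;
}
```

For a 96-bit fragment in a register built from two 64-bit sub-registers, this yields pieces of 64 and 32 bits rather than 64 and 64.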

rdar://problem/29779065

llvm-svn: 290324
2016-12-22 06:10:41 +00:00
Adrian Prantl 49797ca6be Refactor the DIExpression fragment query interface (NFC)
... so it becomes available to DIExpressionCursor.

llvm-svn: 290322
2016-12-22 05:27:12 +00:00
Matt Arsenault 485dacd90c DAG: Add helper for testing constant values
There are helpers for testing for constant or constant build_vector,
and for splat ConstantFP vectors, but not for a ConstantFP or
non-splat ConstantFP vector.

llvm-svn: 290317
2016-12-22 04:39:45 +00:00
Matt Arsenault 3de76b9dc8 AMDGPU: Fix missing commute table entries for cmpx
No tests because these aren't currently used anywhere.

llvm-svn: 290316
2016-12-22 04:39:41 +00:00
Matt Arsenault e7d8ed32f9 AMDGPU: Swap order of operands in fadd/fsub combine
FMA is canonicalized to have the constant in the middle operand. Do
the same so fmad matches and we avoid an extra combine step.

llvm-svn: 290313
2016-12-22 04:03:40 +00:00
Matt Arsenault 46e6b7adef AMDGPU: Check fast math flags in fadd/fsub combines
llvm-svn: 290312
2016-12-22 04:03:35 +00:00
Matt Arsenault 770ec8680a AMDGPU: Form more FMAs if fusion is allowed
Extend the existing fadd/fsub->fmad combines to produce
FMA if allowed.

llvm-svn: 290311
2016-12-22 03:55:35 +00:00
Matt Arsenault d8b73d5304 AMDGPU: Move combines into separate functions
llvm-svn: 290309
2016-12-22 03:44:42 +00:00
Matt Arsenault ef82ad94ea AMDGPU: Enable some f32 fadd/fsub combines for f16
llvm-svn: 290308
2016-12-22 03:40:39 +00:00
Matt Arsenault 9e22bc2cd3 AMDGPU: Implement isFMAFasterThanFMulAndFAdd for f16
llvm-svn: 290307
2016-12-22 03:21:48 +00:00
Matt Arsenault cdff21b14e AMDGPU: Allow rcp and rsq usage with f16
llvm-svn: 290302
2016-12-22 03:05:44 +00:00
Matt Arsenault 4052a576c0 AMDGPU: Custom lower f16 fdiv
llvm-svn: 290301
2016-12-22 03:05:41 +00:00
Matt Arsenault ce84130f85 AMDGPU: Implement f16 fcanonicalize
llvm-svn: 290300
2016-12-22 03:05:37 +00:00
Matt Arsenault 4e55c1ec11 AMDGPU: Update isFPImmLegal for f16
I don't think this matters because ConstantFP is legal.

llvm-svn: 290299
2016-12-22 03:05:30 +00:00
Peter Collingbourne 704f814a5e Clear the PendingTypeTests vector after moving from it.
This is to put the vector into a well defined state. Apparently the state of a
vector after being moved from is valid but unspecified. Found with clang-tidy.
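
A small illustration of the pattern, using a hypothetical `drainPending` helper and `std::vector<int>` in place of the real element type: the standard does not in general pin down the contents of a moved-from vector, so it is cleared explicitly before reuse.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical helper: move the pending entries into Out, then reset Pending.
void drainPending(std::vector<int> &Pending, std::vector<int> &Out) {
  Out = std::move(Pending); // Pending is now valid, but its contents are unspecified
  Pending.clear();          // put Pending back into a well-defined (empty) state
}
```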

llvm-svn: 290298
2016-12-22 02:52:23 +00:00
Haicheng Wu 9ac20a1e10 [AArch64] Correct the check of signed 9-bit imm in getIndexedAddressParts().
-256 is a legal indexed address part.
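
For reference, a signed 9-bit immediate covers [-2^8, 2^8 - 1] = [-256, 255], so the lower bound must be inclusive. A minimal sketch of the range check (the helper name is hypothetical; LLVM's `isInt<9>()` from `llvm/Support/MathExtras.h` expresses the same test):

```cpp
#include <cassert>
#include <cstdint>

// Signed 9-bit range: [-256, 255]. A strict lower bound such as
// `Offset > -256` would wrongly reject -256.
bool isSImm9(int64_t Offset) { return Offset >= -256 && Offset <= 255; }
```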

Differential Revision: https://reviews.llvm.org/D27537

llvm-svn: 290296
2016-12-22 01:39:24 +00:00
Easwaran Raman 180bd9f6b3 Pass GetAssumptionCache to InlineFunctionInfo constructor
Differential revision: https://reviews.llvm.org/D28038

llvm-svn: 290295
2016-12-22 01:07:01 +00:00
David Majnemer 5fa7d48bb8 [NVVMIntrRange] Only set range metadata if none is already present
The range metadata inserted by NVVMIntrRange is pessimistic; range
metadata already present could be more precise.

llvm-svn: 290294
2016-12-22 00:51:59 +00:00
Adrian Prantl 58c1910642 [LLParser] Make the line field of DIMacro(File) optional.
Otherwise these records do not survive roundtrips.

llvm-svn: 290291
2016-12-22 00:29:00 +00:00
Ahmed Bougacha 36f7035bd7 [GlobalISel] Add basic Selector-emitter tblgen backend.
This adds a basic tablegen backend that analyzes the SelectionDAG
patterns to find simple ones that are eligible for GlobalISel-emission.

That's similar to FastISel, with one notable difference: we're not fed
ISD opcodes, so we need to map the SDNode operators to generic opcodes.
That's done using GINodeEquiv in TargetGlobalISel.td.

Otherwise, this is mostly boilerplate, and lots of filtering of any kind
of "complicated" pattern. On AArch64, this is sufficient to match G_ADD
up to s64 (to ADDWrr/ADDXrr) and G_BR (to B).

Differential Revision: https://reviews.llvm.org/D26878

llvm-svn: 290284
2016-12-21 23:26:20 +00:00
Ahmed Bougacha aa9fe53278 [AsmWriter] Remove redundant cast<>s. NFC.
llvm-svn: 290283
2016-12-21 23:26:13 +00:00
Dan Gohman a2b9b349e7 [WebAssembly] Fix the opcode value for i64.rotr.
llvm-svn: 290281
2016-12-21 23:09:42 +00:00
Peter Collingbourne 1b4137a7f9 IR: Function summary representation for type tests.
Each function summary has an attached list of type identifier GUIDs. The
idea is that during the regular LTO phase we would match these GUIDs to type
identifiers defined by the regular LTO module and store the resolutions in
a top-level "type identifier summary" (which will be implemented separately).

Differential Revision: https://reviews.llvm.org/D27967

llvm-svn: 290280
2016-12-21 23:03:45 +00:00
Haicheng Wu 6bb0e39321 [AArch64] Remove a redundant check. NFC.
The case AM.Scale == 0 is already handled by the code right above.

Differential Revision: https://reviews.llvm.org/D28003

llvm-svn: 290275
2016-12-21 21:40:47 +00:00
Greg Clayton 78a07bfa66 Add the ability for DWARFDie objects to get the parent DWARFDie.
In order for the llvm DWARF parser to be used in LLDB we will need to be able to get the parent of a DIE. This patch adds that functionality by changing the DWARFDebugInfoEntry class to store a depth field instead of a sibling index. Using a depth field allows us to easily calculate the sibling and the parent without increasing the size of DWARFDebugInfoEntry.
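
A simplified sketch of why a depth field is enough, assuming a flat, pre-order list of entries (as they appear in .debug_info) and ignoring the null entries that terminate child lists. `Entry`, `parentIndex` and `siblingIndex` are hypothetical names; this is not the actual DWARFDebugInfoEntry layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical flattened DIE: only the nesting depth is stored.
struct Entry {
  uint32_t Depth;
};

// The parent of entry I is the nearest preceding entry one level shallower.
std::optional<size_t> parentIndex(const std::vector<Entry> &Dies, size_t I) {
  if (Dies[I].Depth == 0)
    return std::nullopt; // the unit DIE has no parent
  for (size_t J = I; J-- > 0;)
    if (Dies[J].Depth == Dies[I].Depth - 1)
      return J;
  return std::nullopt;
}

// The next sibling of entry I is the first following entry at the same depth,
// unless a shallower entry (the end of the parent's children) comes first.
std::optional<size_t> siblingIndex(const std::vector<Entry> &Dies, size_t I) {
  for (size_t J = I + 1; J < Dies.size(); ++J) {
    if (Dies[J].Depth == Dies[I].Depth)
      return J;
    if (Dies[J].Depth < Dies[I].Depth)
      break;
  }
  return std::nullopt;
}
```

With depths {0, 1, 2, 2, 1}, entry 3's parent is entry 1 and entry 1's next sibling is entry 4, with no extra per-entry storage beyond the depth.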

I tested llvm-dsymutil on a debug version of clang where this fully parses DWARF in over 1200 .o files to verify there was no serious regression in performance.

Added a full suite of unit tests to test this functionality.

Differential Revision: https://reviews.llvm.org/D27995

llvm-svn: 290274
2016-12-21 21:37:06 +00:00
Ed Maste 084062803e Update mailing list post URL and add libunwind reference
RTDyldMemoryManager.cpp describes the differing __register_frame
API between libunwind and libgcc, with a mailing list posting URL.

The original link was 404; replace it with what I believe is the
intended post, as well as a reference to the "OS X" implementation in
libunwind.

Differential Revision:	https://reviews.llvm.org/D27965

llvm-svn: 290269
2016-12-21 20:51:42 +00:00
Simon Pilgrim 081abbb164 [X86][SSE] Improve lowering of vXi64 multiplies
As mentioned on PR30845, we were performing our vXi64 multiplication as:

AloBlo = pmuludq(a, b);
AloBhi = pmuludq(a, psrlqi(b, 32));
AhiBlo = pmuludq(psrlqi(a, 32), b);
return AloBlo + psllqi(AloBhi, 32) + psllqi(AhiBlo, 32);

when we could avoid one of the upper shifts with:

AloBlo = pmuludq(a, b);
AloBhi = pmuludq(a, psrlqi(b, 32));
AhiBlo = pmuludq(psrlqi(a, 32), b);
return AloBlo + psllqi(AloBhi + AhiBlo, 32);

This matches the lowering on gcc/icc.
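
A scalar model of a single 64-bit lane shows why the rewrite is valid: shifting left by 32 is multiplication by 2^32 modulo 2^64, so it distributes over the wrapping add. The helper names mirror the pseudocode above but are plain C++ stand-ins, not the X86 lowering code.

```cpp
#include <cassert>
#include <cstdint>

// pmuludq: multiply the low 32 bits of each operand into a full 64-bit product.
static uint64_t pmuludq(uint64_t A, uint64_t B) {
  return (A & 0xffffffffu) * (B & 0xffffffffu);
}

// Original sequence: two 64-bit left shifts.
uint64_t mulOld(uint64_t A, uint64_t B) {
  uint64_t AloBlo = pmuludq(A, B);
  uint64_t AloBhi = pmuludq(A, B >> 32);
  uint64_t AhiBlo = pmuludq(A >> 32, B);
  return AloBlo + (AloBhi << 32) + (AhiBlo << 32);
}

// Improved sequence: add the cross products first, then shift once.
// The AhiBhi term is dropped in both versions since it only affects bits >= 64.
uint64_t mulNew(uint64_t A, uint64_t B) {
  uint64_t AloBlo = pmuludq(A, B);
  uint64_t AloBhi = pmuludq(A, B >> 32);
  uint64_t AhiBlo = pmuludq(A >> 32, B);
  return AloBlo + ((AloBhi + AhiBlo) << 32);
}
```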

Differential Revision: https://reviews.llvm.org/D27756

llvm-svn: 290267
2016-12-21 20:00:10 +00:00
David Majnemer b0761a0c1b Revert "[InstCombine] New opportunities for FoldAndOfICmp and FoldXorOfICmp"
This reverts commit r289813, it caused PR31449.

llvm-svn: 290266
2016-12-21 19:21:59 +00:00
Tom Stellard d8ea85aced AMDGPU/SI: Fix file header
llvm-svn: 290265
2016-12-21 19:06:24 +00:00
Peter Collingbourne 35f3f7cdc7 TypeMetadataUtils: Simplify; spotted by Mehdi.
llvm-svn: 290264
2016-12-21 19:00:47 +00:00
Zachary Turner ab266cf95b Add missing includes on Windows.
Patch by Andrey Khalyavin
Differential Revision: https://reviews.llvm.org/D27915

llvm-svn: 290263
2016-12-21 18:50:52 +00:00
Michael Kuperstein 88f15eedbb [LLParser] Parse vector GEP constant expression correctly
The constantexpr parsing was too constrained and rejected legal vector GEPs.
This relaxes it to be similar to the ones for instruction parsing.

This fixes PR30816.

Differential Revision: https://reviews.llvm.org/D28013

llvm-svn: 290261
2016-12-21 18:29:47 +00:00
Michael Kuperstein dd92c78669 [ConstantFolding] Fix vector GEPs harder
For vector GEPs, CastGEPIndices can end up in an infinite recursion, because
we compare the vector type to the scalar pointer type, find them different,
and then try to cast a type to itself.

Differential Revision: https://reviews.llvm.org/D28009

llvm-svn: 290260
2016-12-21 17:34:21 +00:00
Simon Pilgrim c93cd30fac [CostModel] Pass shuffle mask args with ArrayRef. NFCI.
llvm-svn: 290257
2016-12-21 15:49:01 +00:00
Michael Zuckerman 85e12d2851 Revert first commit: remove empty line in X86.h
llvm-svn: 290255
2016-12-21 12:48:01 +00:00
Michael Zuckerman 58838cf29d First commit adding new line to X86.h
llvm-svn: 290254
2016-12-21 12:44:47 +00:00
Elena Demikhovsky 7c7bf1b432 Added a template for building target specific memory node in DAG.
I added an API for creating a target-specific memory node in the DAG. Today, all memory nodes are common for all targets and their constructors are located in SelectionDAG.cpp.
There are some cases in X86 where we need to create a special node: a truncation-with-saturation store or a float-to-half store.
In the current patch I added truncation-with-saturation nodes and I'm using them for intrinsics. In the future I plan to implement DAG lowering for the truncation-with-saturation pattern.

Differential Revision: https://reviews.llvm.org/D27899

llvm-svn: 290250
2016-12-21 10:43:36 +00:00
Davide Italiano c96272c47c [AMDGPU] Garbage collect dead code. NFCI.
llvm-svn: 290249
2016-12-21 10:19:00 +00:00
Oren Ben Simhon cb692157b7 [X86] Vectorcall Calling Convention - Adding CodeGen Complete Support
Fixing a warning.

llvm-svn: 290248
2016-12-21 09:47:31 +00:00
Oren Ben Simhon de2eea7298 [X86] Vectorcall Calling Convention - Adding CodeGen Complete Support
Fixing build issues.

llvm-svn: 290244
2016-12-21 08:59:42 +00:00
Oren Ben Simhon 3b95157090 [X86] Vectorcall Calling Convention - Adding CodeGen Complete Support
The vectorcall calling convention specifies that arguments to functions are to be passed in registers, when possible.
vectorcall uses more registers for arguments than fastcall or the default x64 calling convention do.
The vectorcall calling convention is only supported in native code on x86 and x64 processors that include Streaming SIMD Extensions 2 (SSE2) and above.

The current implementation does not handle Homogeneous Vector Aggregates (HVAs) correctly, and this review attempts to fix it.
This submission also includes additional lit tests to better cover HVA corner cases.

Differential Revision: https://reviews.llvm.org/D27392

llvm-svn: 290240
2016-12-21 08:31:45 +00:00
Adam Nemet 32e6a34c02 [LDist] Match behavior between invoking via optimization pipeline or opt -loop-distribute
In r267672, where the loop distribution pragma was introduced, I tried
hard to keep the old behavior for opt: when opt is invoked
with -loop-distribute, it should distribute the loop (it's off by
default when run via the optimization pipeline).

As MichaelZ discovered, this has the unintended consequence of
breaking a very common developer workflow for reproducing compilations
using opt: first you print the pass pipeline of clang
with -debug-pass=Arguments and then invoke opt with the returned
arguments.

clang -debug-pass will include -loop-distribute, but the pass is invoked
with default=off, so nothing happens unless the loop carries the pragma.
Through opt (default=on), however, we will try to distribute all loops.

This changes opt's default to off as well to match clang.  The tests are
modified to explicitly enable the transformation.

llvm-svn: 290235
2016-12-21 04:07:40 +00:00
Tim Shen 7b57ac44f9 [APFloat] Remove 'else' after return. NFC
Reviewers: kbarton, iteratee, hfinkel, echristo

Subscribers: mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D27934

llvm-svn: 290232
2016-12-21 02:39:21 +00:00