Commit Graph

38161 Commits

Author SHA1 Message Date
David Majnemer 7f781aba97 [ConstantFolding] Fold masked loads
We can constant fold a masked load if the operands are appropriately
constant.

Differential Revision: http://reviews.llvm.org/D22324

llvm-svn: 275352
2016-07-14 00:29:50 +00:00
David Majnemer f89660aba7 [ConstantFolding] Extend FoldReinterpretLoadFromConstPtr to handle negative offsets
Treat loads which clip before the start of a global initializer the same
way we treat clipping beyond the end of the initializer: use zeros.

llvm-svn: 275345
2016-07-13 23:33:07 +00:00
Michael Kuperstein be837fa40f [DAG] Correctly chain masked loads
If a masked loads is not added to the chain, it should not reset the chain's
root.

This fixes the remaining part of PR28515.

llvm-svn: 275340
2016-07-13 23:23:40 +00:00
Quentin Colombet 68a84587c5 [MIR] Fix one GlobalISel test case that I missed in r275314.
llvm-svn: 275333
2016-07-13 22:35:33 +00:00
Nico Weber b888555bcc Add a triple to fix test on bots after 275320.
llvm-svn: 275327
2016-07-13 22:19:40 +00:00
Nico Weber eb9488b151 Fix a TODO in X86CallFrameOptimization to not rely on a codegen artifact.
This happens to make X86CallFrameOptimization in -O0 / FastISel builds as well,
but it's not clear if the pass should run in that setup.

http://reviews.llvm.org/D22314

llvm-svn: 275320
2016-07-13 21:38:27 +00:00
Alina Sbirlea 640a61cd8b Extended LoadStoreVectorizer to vectorize subchains.
Summary:
LSV used to abort vectorizing a chain for interleaved load/store accesses that alias.
Allow a valid prefix of the chain to be vectorized, mark just the prefix and retry vectorizing the remaining chain.

Reviewers: llvm-commits, jlebar, arsenm

Subscribers: mzolotukhin

Differential Revision: http://reviews.llvm.org/D22119

llvm-svn: 275317
2016-07-13 21:20:01 +00:00
Quentin Colombet 545e558b82 [MIR] Print on the given output instead of stderr.
Currently the MIR framework prints all its outputs (errors and actual
representation) on stderr.

This patch fixes that by printing the regular output in the output
specified with -o.

Differential Revision: http://reviews.llvm.org/D22251

llvm-svn: 275314
2016-07-13 20:36:03 +00:00
Matt Arsenault f071102647 AMDGPU: Remove last AMDIL intrinsics
llvm-svn: 275309
2016-07-13 19:42:06 +00:00
Andrew Kaylor 346dd7f1bd Reverting r275284 due to platform-specific test failures
llvm-svn: 275304
2016-07-13 19:09:16 +00:00
Sanjay Patel eff2aa70fc add more tests for zexty xor sandwiches
...mmm sandwiches

llvm-svn: 275302
2016-07-13 18:58:55 +00:00
Simon Pilgrim 5d664af3c3 [X86][SSE] Regenerate truncated shift test
Check SSE2 and AVX2 implementations

llvm-svn: 275300
2016-07-13 18:50:10 +00:00
Simon Pilgrim 631643e7d9 Regenerate test
llvm-svn: 275299
2016-07-13 18:46:37 +00:00
Sanjay Patel 904a88025a add test for zexty xor sandwich
llvm-svn: 275297
2016-07-13 18:40:38 +00:00
Krzysztof Parzyszek cb4dd7656b Move mempcpy_call.ll to X86 subdirectory
llvm-svn: 275294
2016-07-13 18:28:45 +00:00
Sanjay Patel c00e48a3db [InstCombine] extend vector select matching for non-splat constants
In D21740, we discussed trying to make this a more general matcher. However, I didn't see a clean
way to handle the regular m_Not cases and these non-splat vector patterns, so I've opted for the
direct approach here. If there are other potential uses of areInverseVectorBitmasks(), we could
move that helper function to a higher level.

There is an open question as to which is of these forms should be considered the canonical IR:
  %sel = select <4 x i1> <i1 true, i1 false, i1 false, i1 true>, <4 x i32> %a, <4 x i32> %b
  %shuf = shufflevector <4 x i32> %a, <4 x i32> %b, <4 x i32> <i32 0, i32 5, i32 6, i32 3>

Differential Revision: http://reviews.llvm.org/D22114

llvm-svn: 275289
2016-07-13 18:07:02 +00:00
Andrew Kaylor 12cccdd731 Fix for Bug 26903, adds support to inline __builtin_mempcpy
Patch by Sunita Marathe

Differential Revision: http://reviews.llvm.org/D21920

llvm-svn: 275284
2016-07-13 17:25:11 +00:00
Matthias Braun 512424f28a PatchableFunction: Skip pseudos that do not create code
This fixes http://llvm.org/PR28524

llvm-svn: 275278
2016-07-13 16:37:29 +00:00
Teresa Johnson b907d06151 [ThinLTO/gold] Enable symbol resolution in distributed backend case
While testing a follow-on change to enable index-based symbol resolution
and internalization in the distributed backends, I realized that a test
case change I made in r275247 was only required because we were not
analyzing symbols in the claimed files in thinlto-index-only mode.

In the fixed test case there should be no internalization because we are
linking in -shared mode, so f() is in fact exported, which is detected
properly when we analyze symbols in thinlto-index-only mode. Note that
this is not (yet) a correctness issue (because we are not yet performing
the index-based linkage optimizations in the distributed backends -
that's coming in a follow-on patch).

llvm-svn: 275277
2016-07-13 16:35:56 +00:00
Sanjay Patel 610a2f6525 [x86][SSE/AVX] optimize pcmp results better (PR28484)
We know that pcmp produces all-ones/all-zeros bitmasks, so we can use that behavior to avoid unnecessary constant loading.

One could argue that load+and is actually a better solution for some CPUs (Intel big cores) because shifts don't have the
same throughput potential as load+and on those cores, but that should be handled as a CPU-specific later transformation if
it ever comes up. Removing the load is the more general x86 optimization. Note that the uneven usage of vpbroadcast in the
test cases is filed as PR28505:
https://llvm.org/bugs/show_bug.cgi?id=28505

Differential Revision: http://reviews.llvm.org/D22225

llvm-svn: 275276
2016-07-13 16:04:07 +00:00
Simon Pilgrim a99368fa35 [X86][AVX512] Add support for VPERMILPD/VPERMILPS variable shuffle mask comments
llvm-svn: 275272
2016-07-13 15:45:36 +00:00
Simon Pilgrim 48d8340760 [X86][AVX] Add support for target shuffle combining to VPERMILPS variable shuffle mask
Added AVX512F VPERMILPS shuffle decoding support

llvm-svn: 275270
2016-07-13 15:10:43 +00:00
Tom Stellard 418beb7671 AMDGPU/SI: Add support for R_AMDGPU_GOTPCREL
Reviewers: rafael, ruiu, tony-tye, arsenm, kzhuravl

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: http://reviews.llvm.org/D21484

llvm-svn: 275268
2016-07-13 14:23:33 +00:00
Nirav Dave 8ea792db60 [MC] Fix lexing ordering in assembly label parsing to preserve same line
comment placement.

llvm-svn: 275265
2016-07-13 14:03:12 +00:00
Matt Arsenault 0056868c4a AMDGPU: Fold out no-op kill intrinsics
llvm-svn: 275253
2016-07-13 06:04:22 +00:00
David Majnemer 1b3db33e3d [ConstantFolding] Don't treat negative GEP offsets as positive
GEP offsets are signed, don't treat them as huge positive numbers.

llvm-svn: 275251
2016-07-13 05:16:16 +00:00
Adam Nemet c2f791d8a7 [BFI] Add new LazyBFI analysis pass
Summary:
This is necessary for D21771.  In order to add the hotness attribute to
optimization remarks we need BFI to be available in all passes that emit
optimization remarks.

However we don't want to pay for computing BFI unless the hotness
attribute is requested.

This is achieved by making BFI lazy at the very high-level through a new
analysis pass -- BFI is not calculated unless requested.

I am adding a test to check the laziness under D21771 where the first
user of the analysis is added.

Reviewers: hfinkel, dexonsmith, davidxl

Subscribers: davidxl, dexonsmith, llvm-commits

Differential Revision: http://reviews.llvm.org/D22141

llvm-svn: 275250
2016-07-13 05:01:48 +00:00
Teresa Johnson 27694571b1 [ThinLTO/gold] ThinLTO internalization fixes
Internalization was missing cases where we originally had a local symbol
that was promoted eagerly but not actually exported. This is because we
were only internalizing the set of global (non-local) symbols that were
PREVAILAING_DEF_IRONLY. Instead, collect the set of global symbols that
are referenced outside of a single IR file, and skip internalization for
those.

llvm-svn: 275247
2016-07-13 03:42:41 +00:00
David Majnemer a7b6c973e5 [ConstantFold] Don't incorrectly infer inbounds on array GEP
The many levels of nesting inside the responsible code made it easy for
bugs to sneak in.  Flattening the logic makes it easier to see what's
going on.

llvm-svn: 275244
2016-07-13 03:24:41 +00:00
Keno Fischer 1efc3b70c5 Fix ScalarEvolutionExpander step scaling bug
The expandAddRecExprLiterally function incorrectly transforms
`[Start + Step * X]` into `Step * [Start + X]` instead of the correct
transform of `[Step * X] + Start`.

This caused https://github.com/JuliaLang/julia/issues/14704#issuecomment-174126219
due to what appeared to be sufficiently complicated loop interactions.

Patch by Jameson Nash (jameson@juliacomputing.com).

Reviewers: sanjoy
Differential Revision: http://reviews.llvm.org/D16505

llvm-svn: 275239
2016-07-13 01:28:12 +00:00
Dehao Chen 9cba1f4e7e New pass manager for LICM.
Summary: Port LICM to the new pass manager.

Reviewers: davidxl, silvas

Subscribers: krasin, vitalybuka, silvas, davide, sanjoy, llvm-commits, mehdi_amini

Differential Revision: http://reviews.llvm.org/D21772

llvm-svn: 275222
2016-07-12 22:37:48 +00:00
Tim Northover 72eebfa4b0 GlobalISel: freeze reserved regs after IRTranslator.
We can freeze the registers after the MachineFrameInfo has been configured (by
telling it about calls, inline asm, ...). This doesn't happen at all yet, but
will be part of IR translation.

Fixes -verify-machineinstrs assertion.

llvm-svn: 275221
2016-07-12 22:23:42 +00:00
Matt Arsenault 786724a22e AMDGPU: Follow up to r275203
I meant to squash this into it.

llvm-svn: 275220
2016-07-12 21:41:32 +00:00
Nemanja Ivanovic f0407e3902 The test case I added is PowerPC specific but I accidentally
had it in the wrong directory. Moved it to CodeGen/PowerPC.

Sorry about the noise.

llvm-svn: 275218
2016-07-12 21:24:08 +00:00
Michael Kuperstein a99c46cc73 [LV] Remove wrong assumption about LCSSA
The LCSSA pass itself will not generate several redundant PHI nodes in a single
exit block. However, such redundant PHI nodes don't violate LCSSA form, and may
be introduced by passes that preserve LCSSA, and/or preserved by the LCSSA pass
itself. So, assuming a single PHI node per exit block is not safe.

llvm-svn: 275217
2016-07-12 21:24:06 +00:00
Nemanja Ivanovic b43bb6141e [Power9] Add codegen for VSX word insert/extract instructions
This patch corresponds to review:
http://reviews.llvm.org/D20239

It adds exploitation of XXINSERTW and XXEXTRACTUW instructions that
are useful in some cases for inserting and extracting vector elements of
v4[if]32 vectors.

llvm-svn: 275215
2016-07-12 21:00:10 +00:00
Piotr Padlewski fa0cdb371b Review fixes to lit documentation
Reviewers: mehdi_amini

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D22245

llvm-svn: 275214
2016-07-12 20:59:17 +00:00
Simon Pilgrim 6fa71da4a4 [X86][AVX] Add support for target shuffle combining to VPERM2F128/VPERM2I128
llvm-svn: 275212
2016-07-12 20:27:32 +00:00
Davide Italiano 0080269342 [SCCP] Constant fold structs if all the lattice value are constant.
Differential Revision:   http://reviews.llvm.org/D22269

llvm-svn: 275208
2016-07-12 19:54:19 +00:00
Matthias Braun 96ec47db74 X86FixupBWInsts: No need for forward liveness analysis.
With r274952 and r275201 in place there are no cases left where a
forward liveness analysis yields different results than a backward one.
So we can remove the forward stepping logic.

Differential Revision: http://reviews.llvm.org/D22083

llvm-svn: 275204
2016-07-12 19:04:30 +00:00
Matt Arsenault 657f871a4e AMDGPU: Fix verifier error with kill intrinsic
Don't create a terminator in the middle of the block.
We should probably get rid of this intrinsic.

llvm-svn: 275203
2016-07-12 19:01:23 +00:00
Dehao Chen b9f8e29290 [PM] Port LoopIdiomRecognize Pass to new PM
Summary: Port LoopIdiomRecognize Pass to new PM

Reviewers: davidxl

Subscribers: davide, sanjoy, mzolotukhin, llvm-commits

Differential Revision: http://reviews.llvm.org/D22250

llvm-svn: 275202
2016-07-12 18:45:51 +00:00
Wei Ding 5b2636a152 AMDGPU: Add LLVM IR Intrinsic for v_lerp_u8
Differential Revision: http://reviews.llvm.org/D22239

llvm-svn: 275197
2016-07-12 18:02:14 +00:00
Xinliang David Li 9eb472ba4b [PGO] Don't include full file path in static function profile counter names
Patch by Jake VanAdrighem
Differential Revision: http://reviews.llvm.org/D22028

llvm-svn: 275193
2016-07-12 17:14:51 +00:00
Sanjay Patel 4a6a751dce add tests for missing DeMorgan's Law folds
llvm-svn: 275192
2016-07-12 17:05:04 +00:00
Sanjay Patel 3900191ecc auto-generate checks
llvm-svn: 275188
2016-07-12 16:21:55 +00:00
Sanjay Patel 93dffe629a auto-generate checks
llvm-svn: 275187
2016-07-12 16:17:30 +00:00
Sanjay Patel 6d1f227e6b auto-generate checks
llvm-svn: 275186
2016-07-12 16:13:04 +00:00
Haicheng Wu 711ca868fc [AArch64] Set FMOVS0 and FMOVD0 as isAsCheapAsAMove when needed.
If a subtarget has both ZCZeroing and CustomCheapAsMoveHandling features (now
only Kryo has both), set FMOVS0 and FMOVD0 isAsCheapAsAMove.

Differential Revision: http://reviews.llvm.org/D22256

llvm-svn: 275178
2016-07-12 15:31:41 +00:00
Nemanja Ivanovic eebbcb6d57 [PowerPC] Cannonicalize applicable vector shift immediates as swaps
This patch corresponds to review:
http://reviews.llvm.org/D21358

Vector shifts that have the same semantics as a vector swap are cannonicalized
as such to provide additional opportunities for swap removal optimization to
remove unnecessary swaps.

llvm-svn: 275168
2016-07-12 12:16:27 +00:00
Amjad Aboud acee568545 [codeview] Improved array type support.
Added support for:
1. Multi dimension array.
2. Array of structure type, which previously was declared incompletely.
3. Dynamic size array.
4. Array where element type is a typedef, volatile or constant (this should resolve PR28311).

Differential Revision: http://reviews.llvm.org/D21526

llvm-svn: 275167
2016-07-12 12:06:34 +00:00
Nicolai Haehnle 7968c34586 AMDGPU: Unify MOVRELSOffset and MOVRELDOffset
Summary:
Previously, constant index insertelements would be turned into SI_INDIRECT_DST,
which is bound to prevent some optimization opportunities. Worse, it mislead
the heuristic that decides whether immediates should be lowered to S_MOV_B32
or V_MOV_B32 in a way that resulted in unnecessary v_readfirstlanes.

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, kzhuravl, llvm-commits

Differential Revision: http://reviews.llvm.org/D22217

llvm-svn: 275160
2016-07-12 08:12:16 +00:00
Vitaly Buka 204dc533c5 Revert "New pass manager for LICM."
Summary: This reverts commit r275118.

Subscribers: sanjoy, mehdi_amini

Differential Revision: http://reviews.llvm.org/D22259

llvm-svn: 275156
2016-07-12 06:25:32 +00:00
Craig Topper a6e6febe2c [AVX512] Remove masked logic op intrinsics and autoupgrade them to native IR.
llvm-svn: 275155
2016-07-12 05:27:53 +00:00
Ivan Krasin 5474645dc8 Print remarks from WholeProgramDevirt pass for each call site.
Summary:
It's useful to have some visibility about which call sites are devirtualized,
especially for debug purposes. Another use case is a regression test on the
application side (like, Chromium).

Reviewers: pcc

Differential Revision: http://reviews.llvm.org/D22252

llvm-svn: 275145
2016-07-12 02:38:37 +00:00
NAKAMURA Takumi e92e2124f6 llvm/test/CodeGen/AMDGPU/selected-stack-object.ll REQUIRES +Asserts, since it expects assertion failure.
llvm-svn: 275144
2016-07-12 02:18:09 +00:00
Haicheng Wu 1e39574e9f [Kryo] Enable ZCZeroing feature
This feature uses immediate #0 to zero a register.

Differential Revision: http://reviews.llvm.org/D19985

llvm-svn: 275143
2016-07-12 02:04:01 +00:00
Nico Weber c7bf646a99 Teach FastISel about thiscall (and, hence, about callee-pop).
http://reviews.llvm.org/D22115

llvm-svn: 275135
2016-07-12 01:30:35 +00:00
Matt Arsenault 45f8216cee AMDGPU: Remove superfluous string attributes from tests
Also fix v_mac.ll not testing right thing for fneg

llvm-svn: 275129
2016-07-11 23:35:48 +00:00
Mehdi Amini e75aa6f674 Add a libLTO API to query a memory buffer and check if it contains ObjC categories
The linker supports a feature to force load an object from a static
archive if it defines an Objective-C category.
This API supports this feature by looking at every section in the
module to find if a category is defined in the module.

llvm-svn: 275125
2016-07-11 23:10:18 +00:00
Dehao Chen 7ef5820fa3 New pass manager for LICM.
Summary: Port LICM to the new pass manager.

Reviewers: davidxl, silvas

Subscribers: silvas, davide, sanjoy, llvm-commits, mehdi_amini

Differential Revision: http://reviews.llvm.org/D21772

llvm-svn: 275118
2016-07-11 22:45:24 +00:00
Alina Sbirlea cbc6ac2afd Correct ordering of loads/stores.
Summary:
Aiming to correct the ordering of loads/stores. This patch changes the
insert point for loads to the position of the first load.
It updates the ordering method for loads to insert before, rather than after.

Before this patch the following sequence:
"load a[1], store a[1], store a[0], load a[2]"
Would incorrectly vectorize to "store a[0,1], load a[1,2]".
The correctness check was assuming the insertion point for loads is at
the position of the first load, when in practice it was at the last
load. An alternative fix would have been to invert the correctness check.
The current fix changes insert position but also requires reordering of
instructions before the vectorized load.

Updated testcases to reflect the changes.

Reviewers: tstellarAMD, llvm-commits, jlebar, arsenm

Subscribers: mzolotukhin

Differential Revision: http://reviews.llvm.org/D22071

llvm-svn: 275117
2016-07-11 22:34:29 +00:00
Tim Northover 3e0361710a ARM: validate immediate branch targets in AsmParser.
Immediate branch targets aren't commonly used, but if they are we should make
sure they can actually be encoded. This means they must be divisible by 2 when
targeting Thumb mode, and by 4 when targeting ARM mode.

Also do a little naming cleanup while I was changing everything around anyway.

llvm-svn: 275116
2016-07-11 22:29:37 +00:00
Nicolai Haehnle c06bfa1daa AMDGPU: Treat texture gather instructions more like other MIMG instructions
Summary:
Setting MIMG to 0 has a bunch of unexpected side effects, including that
isVMEM returns false which leads to incorrect treatment in the hazard
recognizer. The reason I noticed it is that it also leads to incorrect
treatment in VGPR-to-SGPR copies, which is one cause of the referenced bug.

The only reason why MIMG was set to 0 is to signal the special handling of
dmasks, but that can be checked differently.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96877

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, kzhuravl, llvm-commits

Differential Revision: http://reviews.llvm.org/D22210

llvm-svn: 275113
2016-07-11 21:59:43 +00:00
Zachary Turner dbeaea7b35 Refactor the PDB writing to use a builder approach
llvm-svn: 275110
2016-07-11 21:45:26 +00:00
Zachary Turner f6b9382467 [pdb] Add a pdb2yaml option to not dump file headers.
This will be useful once we start adding the ability to dump type
records and symbol records, since it will allow us to generate
mergeable information instead of information that specifies an
entire file.

llvm-svn: 275109
2016-07-11 21:45:09 +00:00
Nicolai Haehnle f52c3cf272 AMDGPU: fix local stack slot allocation bugs
Summary:
The main bug fix here is using the 32-bit encoding of V_ADD_I32 in
materializeFrameBaseRegister and resolveFrameIndex, so that arbitrary
immediates work.

The second part is that we may now require the SegmentWaveByteOffset
even when there are initially no stack objects and VGPR spilling isn't
enabled, for stack slots that are allocated later. This means that some
bits become effectively dead and can be cleaned up.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96602
Tested-by: Kai Wasserbäch <kai@dev.carbon-project.org>

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: http://reviews.llvm.org/D21551

llvm-svn: 275108
2016-07-11 21:44:40 +00:00
Michael Kuperstein f0c59330e9 [X86] Make some cast costs more precise
Make some AVX and AVX512 cast costs more precise.
Based on part of a patch by Elena Demikhovsky (D15604).

Differential Revision: http://reviews.llvm.org/D22064

llvm-svn: 275106
2016-07-11 21:39:44 +00:00
Quentin Colombet fb82c7bc94 [X86] Fix tailcall return address clobber bug.
This bug (llvm.org/PR28124) was introduced by r237977, which refactored
the tail call  sequence to be generated in two passes instead of one.

Unfortunately, the stack adjustment produced by the first pass was not
recognized by X86FrameLowering::mergeSPUpdates() in all cases, causing
code such as the following, which clobbers the return address, to be
generated:

popl    %edi
popl    %edi
pushl   %eax
jmp     tailcallee              # TAILCALL

To fix the problem, the entire stack adjustment is performed in
X86ExpandPseudo::ExpandMI() for tail calls.

Patch by Magnus Lång <margnus1@gmail.com>

Differential Revision: http://reviews.llvm.org/D21325

llvm-svn: 275103
2016-07-11 21:03:03 +00:00
Alina Sbirlea 327955e057 Add TLI.allowsMisalignedMemoryAccesses to LoadStoreVectorizer
Summary: Extend TTI to access TLI.allowsMisalignedMemoryAccesses(). Check condition when vectorizing load and store chains.
Add additional parameters: AddressSpace, Alignment, Fast.

Reviewers: llvm-commits, jlebar

Subscribers: arsenm, mzolotukhin

Differential Revision: http://reviews.llvm.org/D21935

llvm-svn: 275100
2016-07-11 20:46:17 +00:00
Michael Kuperstein cfbac5f361 [X86] Disable FixupSetCC for CodeGenOpt::None
It is an optimization pass, and should not run at -O0. Especially since Fast RA
will not do the required register coalescing anyway, so it's a loss even from
the optimization standpoint.

This also works around (but doesn't quite fix) PR28489.

llvm-svn: 275099
2016-07-11 20:40:44 +00:00
Chad Rosier 4f0dad1674 [IPRA] Properly compute register usage at call sites.
Differential Revision: http://reviews.llvm.org/D21395
Patch by Vivek Pandya.
PR28144

llvm-svn: 275087
2016-07-11 18:45:49 +00:00
Zhan Jun Liau def708a0f9 [SystemZ] Recognize Load On Condition Immediate (LOCHI/LOGHI) opportunities
Summary: Add support for the z13 instructions LOCHI and LOCGHI which
conditionally load immediate values.  Add target instruction info hooks so
that if conversion will allow predication of LHI/LGHI.

Author: RolandF

Reviewers: uweigand

Subscribers: zhanjunl

Commiting on behalf of Roland.

Differential Revision: http://reviews.llvm.org/D22117

llvm-svn: 275086
2016-07-11 18:45:03 +00:00
Jingyue Wu 641cfee976 [SLSR] Call getPointerSizeInBits with the correct address space.
llvm-svn: 275083
2016-07-11 18:13:28 +00:00
Davide Italiano e8ae0b5eb4 [PM/IPO] Port LowerTypeTests to the new PassManager.
There's a little bit of churn in this patch because the initialization
mechanism is now shared between the old and the new PM. Other than
that, it's just a pretty mechanical translation.

llvm-svn: 275082
2016-07-11 18:10:06 +00:00
Jacques Pienaar c3a162c451 [lanai] Add more tests for assembly of conditional ALU ops
llvm-svn: 275081
2016-07-11 17:58:16 +00:00
Dehao Chen 9232f98279 Implement callsite-hotness based inline cost for Sample-based PGO
Summary:
For sample-based PGO, using BFI to calculate callsite count is sometime not accurate. This is because with sampling based approach, if a callsite resides in a hot loop deeply nested in a bunch of cold branches, the callsite's BFI frequency would be inaccurately calculated due to lack of samples in the cold branch.

E.g.

if (A1 && A2 && A3 && ..... && A10) {
  for (i=0; i < 100000000; i++) {
    callsite();
  }
}

Assume that A1 to A100 are all 100% taken, and callsite has 1000 samples and thus is considerred hot. Because the loop's trip count is huge, it's normal that all branches outside the loop has no sample at all. As a result, we can only use static branch probability to derive the the frequency of the loop header. Assuming that static heuristic thinks each branch is 50% taken, then the count calculated from BFI will be 1/(2^10) of the actual value.

In order to get more accurate callsite count, we directly annotate the weight on the call instruction, and directly use it when checking callsite hotness.

Note that this mechanism can also be shared by instrumentation based callsite hotness analysis. The side benefit is that it breaks the dependency from Inliner to BFI as call count is embedded in the IR.

Reviewers: davidxl, eraman, dnovillo

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D22118

llvm-svn: 275073
2016-07-11 16:48:54 +00:00
Dehao Chen 29d2641f52 Tune the weight propagation algorithm for sample profile.
Summary: Handle the case when there is only one incoming/outgoing edge for a visited basic block: use the block weight to adjust edge weight even when the edge has been visited before. This can help reduce inaccuracies introduced by incorrect basic block profile, as shown in the updated unittest.

Reviewers: davidxl, dnovillo

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D22180

llvm-svn: 275072
2016-07-11 16:40:17 +00:00
Sanjay Patel 8f1d408c74 [x86] make some of the tests 256-bit for testing diversity
llvm-svn: 275070
2016-07-11 15:08:37 +00:00
Nirav Dave 8603062ee4 Fix branch relaxation in 16-bit mode.
Thread through MCSubtargetInfo to relaxInstruction function allowing relaxation
to generate jumps with 16-bit sized immediates in 16-bit mode.

This fixes PR22097.

Reviewers: dwmw2, tstellarAMD, craig.topper, jyknight

Subscribers: jfb, arsenm, jyknight, llvm-commits, dsanders

Differential Revision: http://reviews.llvm.org/D20830

llvm-svn: 275068
2016-07-11 14:23:53 +00:00
Sanjay Patel b428951990 [x86] specify triple to avoid bot failures
llvm-svn: 275067
2016-07-11 14:17:54 +00:00
Nicolai Haehnle 889a20cf40 [Sink] Don't move calls to readonly functions across stores
Summary:

Reviewers: hfinkel, majnemer, tstellarAMD, sunfish

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D17279

llvm-svn: 275066
2016-07-11 14:11:51 +00:00
Sanjay Patel 0d38830aca [x86] update checks
llvm-svn: 275064
2016-07-11 14:07:31 +00:00
Nirav Dave 53a72f4d3c Provide support for preserving assembly comments
Preserve assembly comments from input in output assembly and flags to
toggle property. This is on by default for inline assembly and off in
llvm-mc.

Parsed comments are emitted immediately before an EOL which generally
places them on the expected line.

Reviewers: rtrieu, dwmw2, rnk, majnemer

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D20020

llvm-svn: 275058
2016-07-11 12:42:14 +00:00
Artem Tamazov 53c9de08d2 [AMDGPU][llvm-mc] Quickfix for r272748 to enable labels in branch instructions.
Fixes issue mentioned at:
  https://github.com/RadeonOpenCompute/LLVM-AMDGPU-Assembler-Extra/issues/13.
Lit tests added.

Differential Revision: http://reviews.llvm.org/D22133

llvm-svn: 275054
2016-07-11 12:07:18 +00:00
Zlatko Buljan cba9f80ba8 [mips][microMIPS] Implement LDC1, SDC1, LDC2, SDC2, LWC1, SWC1, LWC2 and SWC2 instructions and add CodeGen support
Differential Revision: http://reviews.llvm.org/D18824

llvm-svn: 275050
2016-07-11 07:41:56 +00:00
Elena Demikhovsky d84f337953 AVX-512: DAG lowering for scalar MIN/MAX commutable ops
DAG lowering was missing for the scalar FMINC, FMAXC nodes.
The nodes are generated only in the "unsafe-fp-math" mode.
Added tests.

llvm-svn: 275048
2016-07-11 06:08:06 +00:00
Craig Topper 7ee070e7bc [AVX512] Add support for 512-bit ANDN now that all ones build vectors survive long enough to allow the matching.
llvm-svn: 275046
2016-07-11 05:36:53 +00:00
Craig Topper 516e14cd8e [AVX512] Use vpternlog with an immediate of 0xff to create 512-bit all one vectors.
llvm-svn: 275045
2016-07-11 05:36:48 +00:00
Hal Finkel 02012bcfee Revert r275027 - Let FuncAttrs infer the 'returned' argument attribute
Reverting r275027 and r275033. These seem to cause miscompiles on the AArch64 buildbot.

llvm-svn: 275042
2016-07-11 04:51:23 +00:00
Hal Finkel 2cac58f604 Pointer-comparison folding should look through returned-argument functions
For functions which are known to return a specific argument, pointer-comparison
folding can look through the function calls as part of its analysis.

Differential Revision: http://reviews.llvm.org/D9387

llvm-svn: 275039
2016-07-11 03:37:59 +00:00
Hal Finkel bf3957a553 Teach isDereferenceablePointer to look through returned-argument functions
For functions which are known to return their argument,
isDereferenceableAndAlignedPointer can examine the argument value.

Differential Revision: http://reviews.llvm.org/D9384

llvm-svn: 275038
2016-07-11 03:08:49 +00:00
Hal Finkel e186debb8b Teach SCEV to look through returned-argument functions
When building SCEVs, if a function is known to return its argument, then we can
build the SCEV using the corresponding argument value.

Differential Revision: http://reviews.llvm.org/D9381

llvm-svn: 275037
2016-07-11 02:48:23 +00:00
Hal Finkel 6fd5e1f02b Teach computeKnownBits to look through returned-argument functions
If a function is known to return one of its arguments, we can use that in order
to compute known bits of the return value.

Differential Revision: http://reviews.llvm.org/D9397

llvm-svn: 275036
2016-07-11 02:25:14 +00:00
Hal Finkel 5c12d8fe8f BasicAA should look through functions with returned arguments
Motivated by the work on the llvm.noalias intrinsic, teach BasicAA to look
through returned-argument functions when answering queries. This is essential
so that we don't loose all other AA information when supplementing with
llvm.noalias.

Differential Revision: http://reviews.llvm.org/D9383

llvm-svn: 275035
2016-07-11 01:32:20 +00:00
Hal Finkel d66a7b05db Let FuncAttrs infer the 'returned' argument attribute
A function can have one argument with the 'returned' attribute, indicating that
the associated argument is always the return value of the function. Add
FuncAttrs inference logic.

Differential Revision: http://reviews.llvm.org/D22202

llvm-svn: 275027
2016-07-10 22:02:55 +00:00
Jan Vesely 2fa28c330c AMDGPU/R600: Add implicitarg.ptr intrinsic
Differential Revision: http://reviews.llvm.org/D21622

llvm-svn: 275024
2016-07-10 21:20:29 +00:00
Simon Pilgrim 2191faa433 [X86][SSE] Add support for target shuffle combining to PSHUFLW/PSHUFHW
llvm-svn: 275022
2016-07-10 21:02:47 +00:00
Sanjay Patel ccd08fc8c4 [x86, SSE, AVX] add tests for icmp+zext (PR28484)
Note the inconsistent vpbroadcast generation for AVX2; another bug.

llvm-svn: 275020
2016-07-10 20:45:14 +00:00
Simon Pilgrim 51c786bd91 [X86][SSE] Added tests for combining shuffles to PSHUFLW/PSHUFHW
llvm-svn: 275019
2016-07-10 20:19:56 +00:00
Marcin Koscielnicki cf7cc724a7 [SystemZ] Utilize Test Data Class instructions.
This adds a new SystemZ-specific intrinsic, llvm.s390.tdc.f(32|64|128),
which maps straight to the test data class instructions.  A new IR pass
is added to recognize instructions that can be converted to TDC and
perform the necessary replacements.

Differential Revision: http://reviews.llvm.org/D21949

llvm-svn: 275016
2016-07-10 14:41:22 +00:00
Craig Topper 0b0954570a [AVX512] Add support for lowering to 512-bit SHUFPS.
llvm-svn: 275011
2016-07-10 05:55:53 +00:00
Sean Silva db90d4d9c1 [PM] Port LoopVectorize to the new PM.
llvm-svn: 275000
2016-07-09 22:56:50 +00:00
Simon Pilgrim 606126e848 [X86][SSE] Add support for target shuffle combining to INSERTPS
llvm-svn: 274990
2016-07-09 21:47:55 +00:00
Simon Pilgrim 890b415902 [X86][SSE] Regenerate vector shift tests
llvm-svn: 274987
2016-07-09 20:55:20 +00:00
David Majnemer 28c3646f82 [COFF, Dwarf] Don't emit DW_AT_location for dllimported entities
There exists no relocation which can describe the address of a
dllimported variable: do not try to describe their location.

llvm-svn: 274986
2016-07-09 20:47:48 +00:00
Jingyue Wu debce55ac3 [SLSR] Fix crash on handling 128-bit integers.
ConstantInt::getSExtValue may fail on >64-bit integers. Add checks to call
getSExtValue only on narrow integers.

As a minor aside, simplify slsr-gep.ll to remove unnecessary load instructions.

llvm-svn: 274982
2016-07-09 19:13:18 +00:00
Jacques Pienaar b32a912f72 [lanai] Treat .t as optional in assembly parser for RR operands and add predicate operand to ShiftRR
llvm-svn: 274980
2016-07-09 18:26:04 +00:00
Matt Arsenault c1e6a45f2e AMDGPU: Merge / reorganize tests
llvm-svn: 274972
2016-07-09 08:02:28 +00:00
Matt Arsenault b2cb5f8105 AMDGPU: Simplify tests with per function subtargets
llvm-svn: 274971
2016-07-09 07:55:03 +00:00
Matt Arsenault dfec5ce032 AMDGPU: Fix fdiv lowering when f32 denormals supported
Also fix test not actually using function labels.

llvm-svn: 274969
2016-07-09 07:48:11 +00:00
Craig Topper 70610cf7b6 [X86] Remove and autoupgrade 512-bit non-temporal store intrinsics.
llvm-svn: 274966
2016-07-09 04:38:27 +00:00
Davide Italiano 92b933a55c [PM] Port CrossDSOCFI to the new pass manager.
llvm-svn: 274962
2016-07-09 03:25:35 +00:00
Davide Italiano cd96cfd8df [PM] Port LoopSimplify to the new pass manager.
While here move simplifyLoop() function to the new header, as
suggested by Chandler in the review.

Differential Revision:  http://reviews.llvm.org/D21404

llvm-svn: 274959
2016-07-09 03:03:01 +00:00
Matt Arsenault 1322b6f8bb AMDGPU: Improve offset folding for register indexing
llvm-svn: 274954
2016-07-09 01:13:56 +00:00
Matthias Braun 152e7c8b12 VirtRegMap: Replace some identity copies with KILL instructions.
An identity COPY like this:
   %AL = COPY %AL, %EAX<imp-def>
has no semantic effect, but encodes liveness information: Further users
of %EAX only depend on this instruction even though it does not define
the full register.

Replace the COPY with a KILL instruction in those cases to maintain this
liveness information. (This reverts a small part of r238588 but this
time adds a comment explaining why a KILL instruction is useful).

llvm-svn: 274952
2016-07-09 00:19:07 +00:00
Piotr Padlewski 7a298c1df0 Added REQUIRES to TestingGuide documentation
Reviewers: alexfh, wolfgangp, rengolin

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D22172

llvm-svn: 274949
2016-07-08 23:47:29 +00:00
Piotr Padlewski 3b77612839 Add 'thinlto_src_module' md with asserts or -enable-import-metadata
Summary:
This way the metadata will be only generated when asserts enabled,
or when -enable-import-metadata specified

FIXED missing colon on requires.

Reviewers: tejohnson, eraman, mehdi_amini

Subscribers: mehdi_amini, llvm-commits

Differential Revision: http://reviews.llvm.org/D22167

llvm-svn: 274947
2016-07-08 23:01:49 +00:00
Piotr Padlewski d4b792346c Revert "Add 'thinlto_src_module' md with asserts or -enable-import-metadata"
Reverting because of 17463
http://lab.llvm.org:8011/builders/clang-x86_64-linux-selfhost-modules/builds/17463

This reverts commit d20cb431bba2ba43b4c65a8556cff445bfefbb7c.

llvm-svn: 274946
2016-07-08 22:55:48 +00:00
Jacques Pienaar 9e70127b0a [lanai] Update test to use peephole-opt and not peephole-opts
llvm-svn: 274945
2016-07-08 22:28:29 +00:00
Anna Thomas 9ad45adfd7 Revert "InstCombine rule to fold truncs whose value is available"
This reverts commit r274853.
Caused failure in ppcBE build

llvm-svn: 274943
2016-07-08 22:15:08 +00:00
David Majnemer 230bbfbeec [MC, COFF] Permit a variable to be redefined
Our assertions in WinCOFFStreamer had unexpected side effects resulting
in symbols getting unexpectedly marked as used.

This fixes PR28462.

llvm-svn: 274941
2016-07-08 21:54:16 +00:00
Piotr Padlewski d6efefa2b8 Add 'thinlto_src_module' md with asserts or -enable-import-metadata
Summary:
This way the metadata will be only generated when asserts enabled,
or when -enable-import-metadata specified

Reviewers: tejohnson, eraman, mehdi_amini

Subscribers: mehdi_amini, llvm-commits

Differential Revision: http://reviews.llvm.org/D22167

llvm-svn: 274938
2016-07-08 21:25:39 +00:00
Matt Arsenault 3fb8f9eabf Reapply r274829 with fix for FP vectors
llvm-svn: 274937
2016-07-08 21:25:33 +00:00
Adam Nemet f836067cc0 [LAA] Port test to the new PM
This is a follow-on to r274452.

The LAA with the new PM is a loop pass so we go from inner to outer loops.

Also using a CHECK-NOT didn't make much sense because we print something
in either case; whether an invariant is 'found' or 'not found'.

llvm-svn: 274935
2016-07-08 21:24:06 +00:00
Sanjay Patel 664514f7fe [InstCombine] don't form select from bitcasted logic ops if bitcasts have >1 use
This isn't a sure thing (are 2 extra bitcasts less expensive than a logic op?), 
but we'll try to err on the conservative side by going with the case that has
less IR instructions.

Note: This question came up in http://reviews.llvm.org/D22114 , but this part is
independent of that patch proposal, so I'm making this small change ahead of that
one. 

See also:
http://reviews.llvm.org/rL274926

llvm-svn: 274932
2016-07-08 21:17:51 +00:00
Sanjay Patel 5246482c7a add another multi-use test for logic->select transform
llvm-svn: 274929
2016-07-08 21:08:16 +00:00
Sanjay Patel f4a08ede03 [InstCombine] don't form select from logic ops if it's unlikely that we'll eliminate any ops
llvm-svn: 274926
2016-07-08 20:53:29 +00:00
Sanjay Patel 297a0e67b6 adjust test so it won't completely optimize away
llvm-svn: 274925
2016-07-08 20:35:53 +00:00
Sanjay Patel 0733e6b61c add tests for multi-use folding to select
llvm-svn: 274922
2016-07-08 20:22:27 +00:00
Dehao Chen 429f5c735f Remove inline hints computation from SampleProfile.cpp
Summary: As we will move to use uniformed hotness check in inliner, we do not need inline hints in SampleProfile pass any more.

Reviewers: dnovillo, davidxl

Subscribers: eraman, llvm-commits

Differential Revision: http://reviews.llvm.org/D19287

llvm-svn: 274918
2016-07-08 20:12:44 +00:00
Nico Weber 28410c6846 Revert r274829, it caused PR28472.
llvm-svn: 274916
2016-07-08 19:52:19 +00:00
Simon Pilgrim 0a0e0d4e8e [X86] Regenerated bitreverse tests to demonstrate what is going on.
llvm-svn: 274915
2016-07-08 19:51:08 +00:00
Simon Pilgrim aaaeedb8cb [X86] Added bitreverse tests for non-legal types
Requested on D21578

llvm-svn: 274914
2016-07-08 19:48:33 +00:00
Simon Pilgrim 950419f948 [X86][AVX2] Add support for target shuffle combining to VPERMPD/VPERMQ
llvm-svn: 274908
2016-07-08 19:23:29 +00:00
Davide Italiano d555bde59f [SCCP] Fold constants as we build them whne visiting cast instructions.
This should be slightly more efficient and could avoid spurious overdefined
markings, as Eli pointed out.

Differential Revision:  http://reviews.llvm.org/D22122

llvm-svn: 274905
2016-07-08 19:13:40 +00:00
Sanjay Patel 1b6b824548 [InstCombine] check for one-use before turning simple logic op into a select
llvm-svn: 274891
2016-07-08 17:26:47 +00:00
Simon Pilgrim 4ca42e232d [SLPVectorizer][X86] Added fma vectorization tests
llvm-svn: 274889
2016-07-08 17:19:13 +00:00
Sanjay Patel 910ce0d511 add test to show multi-use output
llvm-svn: 274887
2016-07-08 17:12:27 +00:00
Simon Pilgrim b600ba3b79 [X86][AVX] Added combine test that should simplify to insertps
llvm-svn: 274884
2016-07-08 17:01:42 +00:00
Sanjay Patel cbfca9e8ef [InstCombine] allow or(sext(A), B) --> A ? -1 : B transform for vectors
llvm-svn: 274883
2016-07-08 17:01:15 +00:00
Zhan Jun Liau 7d4d436c74 [SystemZ] Add support for the .word directive.
Summary: Branch off the work to add support for the .word directive,
using addAliasForDirective.

Reviewers: koriakin

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D22142

llvm-svn: 274878
2016-07-08 16:50:02 +00:00
Sanjay Patel 647174c8a4 add vector tests to show missing transform
llvm-svn: 274876
2016-07-08 16:39:53 +00:00
Matt Arsenault 44540a3db2 PeepholeOptimizer: Make pass name match DEBUG_TYPE
llvm-svn: 274874
2016-07-08 16:29:11 +00:00
Zhan Jun Liau 3b4c3f4d51 [SystemZ] Add support for missing instructions
Summary:
Add support to allow clang integrated assembler to recognize some
missing instructions, for openssl.

Instructions are:
LM, LMH, LMY, STM, STMH, STMY, ICM, ICMH, ICMY, SLA, SLAK, TML, TMH, EX, EXRL.

Reviewers: uweigand

Subscribers: koriakin, llvm-commits

Differential Revision: http://reviews.llvm.org/D22050

llvm-svn: 274869
2016-07-08 16:18:40 +00:00
Sanjay Patel 46df968326 minimize tests
The cmp and load aren't required.

llvm-svn: 274864
2016-07-08 16:11:48 +00:00
Sanjay Patel e1acad9b61 regenerate checks
llvm-svn: 274860
2016-07-08 16:06:38 +00:00
Chris Dewhurst 3202f065b8 [Sparc] Leon errata fix passes.
Errata fixes for various errata in different versions of the Leon variants of the Sparc 32 bit processor.

The nature of the errata are listed in the comments preceding the errata fix passes. Relevant unit tests are implemented for each of these.

Note: Running clang-format has changed a few other lines too, unrelated to the implemented errata fixes. These have been left in as this keeps the code formatting consistent.

Differential Revision: http://reviews.llvm.org/D21960

llvm-svn: 274856
2016-07-08 15:33:56 +00:00
Sjoerd Meijer 1ee119f897 Do not expand SDIV when compiling for minimum code size
Differential Revision: http://reviews.llvm.org/D22139

llvm-svn: 274855
2016-07-08 15:32:01 +00:00
Anna Thomas 3124f6273a InstCombine rule to fold truncs whose value is available
We can fold truncs whose operand feeds from a load, if the trunc value
is available through a prior load/store.

This change is from: http://reviews.llvm.org/D21246, which folded the
trunc but missed the bitcast or ptrtoint/inttoptr required in the RAUW
call, when the load type didnt match the prior load/store type.

Differential Revision: http://reviews.llvm.org/D21791

llvm-svn: 274853
2016-07-08 15:18:56 +00:00
Valery Pykhtin 68853ab2c5 [AMDGPU] fix ds_swizzle_b32 opcode for VI (bz 28371)
Differential Revision: http://reviews.llvm.org/D22049

llvm-svn: 274852
2016-07-08 15:12:46 +00:00
Sjoerd Meijer 46c4c3d31c Addressing post-commit comments regarding not expanding UDIV;
we don't expand only when compiling for minimum code size.

llvm-svn: 274847
2016-07-08 14:17:09 +00:00
Simon Pilgrim 4f1877fb57 [X86][SSE] Improve constant folding tests for CVTSD/CVTSS/CVTTSD/CVTTSS
As discussed on D22106, improve the testing for constant folding sse scalar conversion intrinsics to ensure we are correctly handling special/out of range cases

llvm-svn: 274846
2016-07-08 13:28:34 +00:00
Sjoerd Meijer a625af3feb Code size optimisation: don't expand a div to a mul and and a shift sequence.
As a result, the urem instruction will not be expanded to a sequence of umull,
lsrs, muls and sub instructions, but just a call to __aeabi_uidivmod.

Differential Revision: http://reviews.llvm.org/D22131

llvm-svn: 274843
2016-07-08 12:54:43 +00:00
Simon Pilgrim 828c731880 [X86][SSE] Accept any shuffle mask that is all zeroes
Until we have a better way to extract constants through bitcasted build vectors (and how to handle undefs of partial lanes etc.) at least accept build vectors that are all zeroes.

llvm-svn: 274833
2016-07-08 10:39:12 +00:00
Matt Arsenault c3a6fe6ecd Bug 28444: Fix assertion when extract_vector_elt has mismatched type
For some reason extract_vector_elt is sometimes allowed to have
a different result type than the vector element type.

llvm-svn: 274829
2016-07-08 07:05:00 +00:00
Craig Topper f7bf6de0af [AVX512] Remove and autoupgrade a duplicate set of 512-bit masked shift intrinsics.
I'm not sure if clang ever used these builtin names or not.

llvm-svn: 274827
2016-07-08 06:14:47 +00:00
Wei Mi 90d195a5fd [PM] Port UnreachableBlockElim to the new Pass Manager
Differential Revision: http://reviews.llvm.org/D22124

llvm-svn: 274824
2016-07-08 03:32:49 +00:00
Saleem Abdulrasool eb059b0e0a ARM: support high registers in __builtin_longjmp on WoA
Windows on ARM uses a pure thumb-2 environment.  This means that it can select a
high register when doing a __builtin_longjmp.  We would use a tLDRi which would
truncate the register to a low register.  Use a t2LDRi12 to get the full
register file access.  Tweak the code to just load into PC, as that is an
interworking branch on all supported cores anyways.

llvm-svn: 274815
2016-07-08 00:48:22 +00:00
Andrew Kaylor 3387074ae9 Temporarily remove a test case to unblock PPC bots.
llvm-svn: 274813
2016-07-08 00:35:39 +00:00
Andrew Kaylor 8b8805c94c Temporarily remove one test run line to unblock PPC bots.
llvm-svn: 274812
2016-07-08 00:32:58 +00:00
Jacques Pienaar 6d3eecc843 [lanai] Use peephole optimizer to generate more conditional ALU operations.
Summary:
* Similiar to the ARM backend yse the peephole optimizer to generate more conditional ALU operations;
* Add predicated type with default always true to RR instructions in LanaiInstrInfo.td;
* Move LanaiSetflagAluCombiner into optimizeCompare;
* The ASM parser can currently only handle explicitly specified CC, so specify ".t" (true) where needed in the ASM test;
* Remove unused MachineOperand flags;

Reviewers: eliben

Subscribers: aemerson

Differential Revision: http://reviews.llvm.org/D22072

llvm-svn: 274807
2016-07-07 23:36:04 +00:00
Michael Kuperstein 3e3652aef2 Recommit r274692 - [X86] Transform setcc + movzbl into xorl + setcc
xorl + setcc is generally the preferred sequence due to the partial register
stall setcc + movzbl suffers from. As a bonus, it also encodes one byte smaller.
This fixes PR28146.

The original commit tried inserting an 8bit-subreg into a GR32 (not GR32_ABCD)
which was not appreciated by fast regalloc on 32-bit.

llvm-svn: 274802
2016-07-07 22:50:23 +00:00
Vedant Kumar 0fdffd3709 [tsan] Try harder to not instrument gcov counters
GCOVProfiler::emitProfileArcs() can create many variables with names
starting with "__llvm_gcov_ctr", so llvm appends a numeric suffix to
most of them. Teach tsan about this.

llvm-svn: 274801
2016-07-07 22:45:28 +00:00
Kevin Enderby 1851a827a0 Add checks to the MachOObjectFile() constructor to make sure load commands sizes
are the correct multiple.

llvm-svn: 274798
2016-07-07 22:11:42 +00:00
Davide Italiano 16284df8ec [PM] Port InstSimplify to the new pass manager.
llvm-svn: 274796
2016-07-07 21:14:36 +00:00
Anna Thomas 6a78c78a03 [DSE] Remove dead stores in end blocks containing fence
We can remove dead stores in the presence of fence instructions. Fence
does not change an otherwise thread local store to visible.

reviewers: reames, dexonsmith, jfb
Differential Revision: http://reviews.llvm.org/D22001

llvm-svn: 274795
2016-07-07 20:51:42 +00:00
Chad Rosier 112d0e996b [AArch64] Change the preferred alignment for char and short to word alignment.
The commit reinstates r273279, which was informally approved.

Original Review: http://reviews.llvm.org/D21414

This reverts commit ca632c91aaa7cafc50942f890c49f727a046ace1.

llvm-svn: 274790
2016-07-07 20:02:18 +00:00
Andrew Kaylor 65fa0704aa Include SelectionDAGISel in the opt-bisect process
Differential Revision: http://reviews.llvm.org/D21143

llvm-svn: 274786
2016-07-07 18:55:02 +00:00
Peter Collingbourne 73589f321b ThinLTO: Do not take into account whether a definition has multiple copies when promoting.
We currently do not touch a symbol's linkage in the case where a definition
has a single copy. However, this code is effectively unnecessary: either
the definition is not exported, in which case the internalize phase sets
its linkage to internal, or it is exported, in which case we need to promote
linkage to weak. Those two cases are already handled by existing code.

I believe that the only real functional change here is in the case where we
have a single definition which does not prevail (e.g. because the definition
in a native object file prevails). In that case we now lower linkage to
available_externally following the existing code path for that case.

As a result we can remove the isExported function parameter from the
thinLTOResolveWeakForLinkerInIndex function.

Differential Revision: http://reviews.llvm.org/D21883

llvm-svn: 274784
2016-07-07 18:31:51 +00:00
Tim Northover 1d106c5fc2 tests: accept different TargetOpcode values.
These tests don't actually care about the internal opcode number, but have to
be updated whenever we add a new one for GlobalISel. That's bad.

llvm-svn: 274774
2016-07-07 17:51:42 +00:00
Michael Kuperstein edb38a94f8 Revert r274692 to check whether this is what breaks windows selfhost.
llvm-svn: 274771
2016-07-07 16:55:35 +00:00
Justin Bogner a466cc33fa NVPTX: Remove the legacy ptx intrinsics
- Rename the ptx.read.* intrinsics to nvvm.read.ptx.sreg.* - some but
  not all of these registers were already accessible via the nvvm
  name.
- Rename ptx.bar.sync nvvm.bar.sync, to match nvvm.bar0.

There's a fair amount of code motion here, but it's all very
mechanical.

llvm-svn: 274769
2016-07-07 16:40:17 +00:00
Chad Rosier 3972953efd Revert "[AArch64] Change the preferred alignment for char and short to word alignment"
This reverts commit r273279 as the change was not properly approved.

llvm-svn: 274768
2016-07-07 16:37:29 +00:00
Valery Pykhtin af8b1bddbd [AMDGPU] fix ds_write_src2 encoding (bz26027)
Differential revision: http://reviews.llvm.org/D22041

llvm-svn: 274756
2016-07-07 14:23:38 +00:00
Rafael Espindola b34cba97b7 Don't crash trying to relax 32 loads on COFF.
Fixes pr28452.

llvm-svn: 274754
2016-07-07 14:00:07 +00:00
Sjoerd Meijer 17c08dc701 Code size optimisation: don't rewrite fputs to fwrite when optimising for size
because fwrite requires more arguments and thus extra MOVs are required.

llvm-svn: 274753
2016-07-07 13:56:23 +00:00
David Majnemer 7afb46d3c8 [LoopAccessAnalysis] Fix an integer overflow
We were inappropriately using 32-bit types to account for quantities
that can be far larger.

Fixed in PR28443.

llvm-svn: 274737
2016-07-07 06:24:36 +00:00
Craig Topper d5d2a35013 [AVX512] Zero extend the result of vpcmpeq/vpcmpgt and similar intrinsics in the autoupgrade code. This currently results in worse codegen but is needed for correctness.
llvm-svn: 274736
2016-07-07 06:11:07 +00:00
Elena Demikhovsky fc1e969dfc Fixed a bug in vectorizing GEP before gather/scatter intrinsic.
Vectorizing GEP was incorrect and broke SSA in some cases.
 
The patch fixes PR27997 https://llvm.org/bugs/show_bug.cgi?id=27997.

Differential revision: http://reviews.llvm.org/D22035

llvm-svn: 274735
2016-07-07 06:06:46 +00:00
David Majnemer a54fe1acdc [CodeView] Implement support for thread-local variables
llvm-svn: 274734
2016-07-07 05:14:21 +00:00
Qin Zhao c35b2cba6f [esan:cfrag] Add option -esan-aux-field-info
Summary:
Adds option -esan-aux-field-info to control generating binary with
auxiliary struct field information.

Extracts code for creating auxiliary information from
createCacheFragInfoGV into createCacheFragAuxGV.

Adds test struct_field_small.ll for -esan-aux-field-info test.

Reviewers: aizatsky

Subscribers: llvm-commits, bruening, eugenis, kcc, zhaoqin, vitalybuka

Differential Revision: http://reviews.llvm.org/D22019

llvm-svn: 274726
2016-07-07 03:20:16 +00:00
Peter Collingbourne 730c82e6b8 ThinLTO: Remove check for multiple modules before applying weak resolutions.
This check is not only unnecessary, it can produce the wrong result. If we
are linking a single module and it has an exported linkonce symbol, we need
to promote to weak in order to avoid PR19901-style problems.

Differential Revision: http://reviews.llvm.org/D21917

llvm-svn: 274722
2016-07-07 01:51:11 +00:00
Sean Silva 284b0324e2 [PM] Avoid getResult on a higher level in LoopAccessAnalysis
Note that require<domtree> and require<loops> aren't needed because they
come in implicitly via the loop pass manager.

llvm-svn: 274712
2016-07-07 01:01:53 +00:00
Sean Silva 59fe82f4ce [PM] Port TailCallElim
llvm-svn: 274708
2016-07-06 23:48:41 +00:00
Sean Silva b025d375a1 [PM] Port CorrelatedValuePropagation
llvm-svn: 274705
2016-07-06 23:26:29 +00:00
Peter Collingbourne d1d2614ee1 ThinLTO: Add test cases for promote+internalize.
This tests the effect of both promotion and internalization on a module,
and helps show that D21883 is NFC wrt promotion+internalization.

Differential Revision: http://reviews.llvm.org/D21915

llvm-svn: 274699
2016-07-06 22:53:02 +00:00
Sanjay Patel 65a51c25c1 [InstCombine] enhance (select X, C1, C2 --> ext X) to handle vectors
By replacing dyn_cast of ConstantInt with m_Zero/m_One/m_AllOnes, we
allow these transforms for splat vectors.

Differential Revision: http://reviews.llvm.org/D21899

llvm-svn: 274696
2016-07-06 22:23:01 +00:00
Manman Ren 524ca27b90 Add testing coverage for r274582.
llvm-svn: 274693
2016-07-06 22:01:28 +00:00
Michael Kuperstein 1ef6c59b1d [X86] Transform setcc + movzbl into xorl + setcc
xorl + setcc is generally the preferred sequence due to the partial register
stall setcc + movzbl suffers from. As a bonus, it also encodes one byte smaller.

This fixes PR28146.

Differential Revision: http://reviews.llvm.org/D21774

llvm-svn: 274692
2016-07-06 21:56:18 +00:00
Vedant Kumar 4c01092a25 [llvm-cov] Add support for creating html reports
Based on a patch by Harlan Haskins!

Differential Revision: http://reviews.llvm.org/D18278

llvm-svn: 274688
2016-07-06 21:44:05 +00:00
Matthias Braun ad0032a649 AArch64: Change modeling of zero cycle zeroing.
On CPUs with the zero cycle zeroing feature enabled "movi v.2d" should
be used to zero a vector register. This was previously done at
instruction selection time, however the register coalescer sometimes
widened multiple vregs to the Q width because of that leading to extra
spills. This patch leaves the decision on how to zero a register to the
AsmPrinter phase where it doesn't affect register allocation anymore.

This patch also sets isAsCheapAsAMove=1 on FMOVS0, FMOVD0.

This fixes http://llvm.org/PR27454, rdar://25866262

Differential Revision: http://reviews.llvm.org/D21826

llvm-svn: 274686
2016-07-06 21:39:33 +00:00
Chad Rosier 232e29ebea [MemorySSA] Reinstate the legacy printer and verifier.
Differential Revision: http://reviews.llvm.org/D22058

llvm-svn: 274679
2016-07-06 21:20:47 +00:00
Rafael Espindola a29971faeb Add initial support for R_386_GOT32X.
This adds it only for movl mov@GOT(%reg), %reg.

llvm-svn: 274678
2016-07-06 21:19:11 +00:00
David Majnemer 7abd269aa9 [CodeView] Emit an appropriate symbol kind for globals
We emitted debug info for globals/functions as if they all had external
linkage.  Instead, emit local symbol records when appropriate.

llvm-svn: 274676
2016-07-06 21:07:47 +00:00
David Majnemer e1e7372e93 [CodeView] Unions are always sealed
It is impossible to inherit from a union.  We are missing a way to
represent this in IR for classes/structs...

llvm-svn: 274675
2016-07-06 21:07:42 +00:00
Justin Lebar 6f9d01bbd5 [NVPTX] Add sm_60, sm_61, sm_62 targets to LLVM.
Reviewers: tra

Subscribers: jholewinski, llvm-commits

Differential Revision: http://reviews.llvm.org/D22068

llvm-svn: 274674
2016-07-06 21:06:10 +00:00
Haicheng Wu a95cd1267f [LIR] Fix mis-compilation with unwinding.
To fix PR27859, bail out if there is an instruction may throw.

Differential Revision: http://reviews.llvm.org/D20638

llvm-svn: 274673
2016-07-06 21:05:40 +00:00
Piotr Padlewski 6deaa6afae Add 'thinlto_src_module' metadata to imported function
Added metadata to be able to make statistics on how many functions
that have been imported have been removed. Also module name might
be helpfull when debugging.

Reviewers: tejohnson, eraman

Subscribers: mehdi_amini, llvm-commits

Differential Revision: http://reviews.llvm.org/D21943

llvm-svn: 274668
2016-07-06 20:26:25 +00:00
Derek Bruening d712a3c10e [esan|wset] Fix incorrect memory size assert
Summary:
Fixes an incorrect assert that fails on 128-bit-sized loads or stores.
Augments the wset tests to include this case.

Reviewers: aizatsky

Subscribers: vitalybuka, zhaoqin, kcc, eugenis, llvm-commits

Differential Revision: http://reviews.llvm.org/D22062

llvm-svn: 274666
2016-07-06 20:13:53 +00:00
Justin Bogner a463537a36 NVPTX: Replace uses of cuda.syncthreads with nvvm.barrier0
Everywhere where cuda.syncthreads or __syncthreads is used, use the
properly namespaced nvvm.barrier0 instead.

llvm-svn: 274664
2016-07-06 20:02:45 +00:00
Adrian McCarthy 820ca5404c Retry: "Emit CodeView type records for nested classes."
Now with a corrected test to account for a recently supported properties bit in the debug info of a struct.

Original review: http://reviews.llvm.org/D21939

This reverts commit 970c3fd497a28d25dd69526eb52594a696c37968.

llvm-svn: 274661
2016-07-06 19:49:51 +00:00
Chad Rosier dcfce2d0ec [DSE] Avoid iterator invalidation bugs.
The dse_with_dbg_value.ll test committed with r273141 is removed because this
we no longer performs any type of back tracking, which is what was causing the
codegen differences with and without debug information.

Differential Revision: http://reviews.llvm.org/D21613

llvm-svn: 274660
2016-07-06 19:48:52 +00:00
Sanjay Patel 04b3496d9b [x86] fix cost of SINT_TO_FP for i32 --> float (PR21356, PR28434)
This is "cvtdq2ps" which does not appear to be particularly slow on any CPU
according to Agner's tables. Choosing "5" as a cost here as suggested in:
https://llvm.org/bugs/show_bug.cgi?id=21356
...but it seems very conservative given that the instruction is fully pipelined,
and I think these costs are supposed to model throughput.

Note that related costs are also most likely too high, but this fixes PR21356
and partly fixes PR28434.

llvm-svn: 274658
2016-07-06 19:15:54 +00:00
Sean Silva f50d4b6cdc Work around PR28400 a bit harder.
We were still crashing in the "no change" case because LVI was not
getting invalidated.

See the thread "Should analyses be able to hold AssertingVH to IR?
(related to PR28400)" for more discussion.

llvm-svn: 274656
2016-07-06 19:05:41 +00:00
Elliot Colp bc2cfc2291 [SystemZ] Remove AND mask of bottom 6 bits when result is used for shift/rotate
On SystemZ, shift and rotate instructions only use the bottom 6 bits of the shift/rotate amount.
Therefore, if the amount is ANDed with an immediate mask that has all of the bottom 6 bits set, we
can remove the AND operation entirely.

Differential Revision: http://reviews.llvm.org/D21854

llvm-svn: 274650
2016-07-06 18:13:11 +00:00
Zachary Turner 8848a7a6b2 [pdb] Round trip the PDB stream between YAML and binary PDB.
This gets writing of the PDB stream working.

llvm-svn: 274647
2016-07-06 18:05:57 +00:00
Kit Barton f9d0a40573 Ensure all uses of permute instructions feed vector stores
There is a problem in VSXSwapRemoval where it is incorrectly removing permute instructions.
In this case, the permute is feeding both a vector store and also a non-store instruction. In this case, the permute cannot be removed.

The fix is to simply look at all the uses of the vector register defined by the permute and ensure that all the uses are vector store instructions.

This problem was reported in PR 27735 (https://llvm.org/bugs/show_bug.cgi?id=27735).

Test case based on the original problem reported.

Phabricator Review: http://reviews.llvm.org/D21802

llvm-svn: 274645
2016-07-06 18:03:52 +00:00
Tim Shen 1c3c0afc53 [DAGCombiner] Fix visitSTORE to continue processing current SDNode, if findBetterNeighborChains doesn't actually CombineTo it.
Summary:
findBetterNeighborChains may or may not find a better chain for each node it finds, which include the node ("St") that visitSTORE is currently processing. If no better chain is found for St, visitSTORE should continue instead of return SDValue(St, 0), as if it's CombinedTo'ed.

This fixes bug 28130. There might be other ways to make the test pass (see D21409). I think both of the patches are fixing actual bugs revealed by the same testcase.

Reviewers: echristo, wschmidt, hfinkel, kbarton, amehsan, arsenm, nemanjai, bogner

Subscribers: mehdi_amini, nemanjai, llvm-commits

Differential Revision: http://reviews.llvm.org/D21692

llvm-svn: 274644
2016-07-06 17:44:03 +00:00
Michael Kuperstein aa71bdd3af [TTI] The cost model should not assume vector casts get completely scalarized
The cost model should not assume vector casts get completely scalarized, since
on targets that have vector support, the common case is a partial split up to
the legal vector size. So, when a vector cast  gets split, the resulting casts
end up legal and cheap.

Instead of pessimistically assuming scalarization, base TTI can use the costs
the concrete TTI provides for the split vector, plus a fudge factor to account
for the cost of the split itself. This fudge factor is currently 1 by default,
except on AMDGPU where inserts and extracts are considered free.

Differential Revision: http://reviews.llvm.org/D21251

llvm-svn: 274642
2016-07-06 17:30:56 +00:00
Adrian McCarthy 7649d8388a Revert "Emit CodeView type records for nested classes."
This reverts commit 256b29322c827a2d94da56468c936596f5509032.

llvm-svn: 274632
2016-07-06 15:14:10 +00:00
Simon Pilgrim 118da63a9d [X86][SSE] Added test cases for missed opportunities to combine pshufb to pslldq/psrldq
llvm-svn: 274631
2016-07-06 15:09:48 +00:00
Adrian McCarthy 024a7b6358 Emit CodeView type records for nested classes.
Differential Revision: http://reviews.llvm.org/D21939

llvm-svn: 274629
2016-07-06 14:47:32 +00:00
Matthew Simpson 433cb1dfe3 [LV] Don't widen trivial induction variables
We currently always vectorize induction variables. However, if an induction
variable is only used for counting loop iterations or computing addresses with
getelementptr instructions, we don't need to do this. Vectorizing these trivial
induction variables can create vector code that is difficult to simplify later
on. This is especially true when the unroll factor is greater than one, and we
create vector arithmetic when computing step vectors. With this patch, we check
if an induction variable is only used for counting iterations or computing
addresses, and if so, scalarize the arithmetic when computing step vectors
instead. This allows for greater simplification.

This patch addresses the suboptimal pointer arithmetic sequence seen in
PR27881.

Reference: https://llvm.org/bugs/show_bug.cgi?id=27881
Differential Revision: http://reviews.llvm.org/D21620

llvm-svn: 274627
2016-07-06 14:26:59 +00:00
Elena Demikhovsky ad0a56f3da Re-commit of 274613.
The prev commit failed on compilation.
A minor change in one pattern in lib/Target/X86/X86InstrAVX512.td fixes the failure.

llvm-svn: 274626
2016-07-06 14:15:43 +00:00
Sam Kolton 3c21a69077 [AMDGPU] Assembler: regression tests for bug 28413. NFC
llvm-svn: 274623
2016-07-06 12:52:20 +00:00
Diana Picus b772e409ba [ARM] Do not test for CPUs, use SubtargetFeatures. Also remove 2 flags.
This is a follow-up for r273544.

The end goal is to get rid of the isSwift / isCortexXY / isWhatever methods.

This commit also removes two command-line flags that weren't used in any of the
tests: widen-vmovs and swift-partial-update-clearance. The former may be easily
replaced with the mattr mechanism, but the latter may not (as it is a subtarget
property, and not a proper feature).

Differential Revision: http://reviews.llvm.org/D21797

llvm-svn: 274620
2016-07-06 11:22:11 +00:00
Elena Demikhovsky 02ced295aa Reverted 274613 due to compilation failue.
llvm-svn: 274615
2016-07-06 09:11:49 +00:00
Elena Demikhovsky 5a4f2476fd AVX-512: Optimization for patterns with i1 scalar type
The patch removes redundant kmov instructions (not all, we still have a lot of work here) and redundant "and" instructions after "setcc".
I use "AssertZero" marker between X86ISD::SETCC node and "truncate" to eliminate extra "and $1" instruction.
I also changed zext, aext and trunc patterns in the .td file. It allows to remove extra "kmov" instruictions.

This patch fixes https://llvm.org/bugs/show_bug.cgi?id=28173.

Fast ISEL mode is not supported correctly for AVX-512. ICMP/FCMP scalar instruction should return result in k-reg. It will be fixed in one of the next patches. I redirected handling of "cmp" to the DAG builder mode. (The code looks worse in one specific test case, but without this fix the new patch fails).

Differential revision: http://reviews.llvm.org/D21956

llvm-svn: 274613
2016-07-06 09:01:20 +00:00
Nicolai Haehnle e40530ea7b AMDGPU: Fix return of non-void-returning shaders
Summary:
Since "AMDGPU: Fix verifier errors in SILowerControlFlow", the logic that
ensures that a non-void-returning shader falls off the end of the last
basic block was effectively disabled, since SI_RETURN is now used.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96731

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, kzhuravl, llvm-commits

Differential Revision: http://reviews.llvm.org/D21975

llvm-svn: 274612
2016-07-06 08:35:17 +00:00
Elena Demikhovsky 971fbfda1e Vector GEP test: renamed + some comments
Differential revision: http://reviews.llvm.org/D21957

llvm-svn: 274611
2016-07-06 08:11:23 +00:00
Daniel Berlin fc7e651bfd Fix handling of forward unreachable but reverse-reachable blocks in MemorySSA construction
llvm-svn: 274606
2016-07-06 05:32:05 +00:00
George Burgess IV bfa401e5ad [CFLAA] Split into Anders+Steens analysis.
StratifiedSets (as implemented) is very fast, but its accuracy is also
limited. If we take a more aggressive andersens-like approach, we can be
way more accurate, but we'll also end up being slower.

So, we've decided to split CFLAA into CFLSteensAA and CFLAndersAA.

Long-term, we want to end up in a place where CFLSteens is queried
first; if it can provide an answer, great (since queries are basically
map lookups). Otherwise, we'll fall back to CFLAnders, BasicAA, etc.

This patch splits everything out so we can try to do something like
that when we get a reasonable CFLAnders implementation.

Patch by Jia Chen.

Differential Revision: http://reviews.llvm.org/D21910

llvm-svn: 274589
2016-07-06 00:26:41 +00:00
Ryan Govostes e51401bdab [asan] Add a hidden option for Mach-O global metadata liveness tracking
llvm-svn: 274578
2016-07-05 21:53:08 +00:00
Tim Northover e6ae6767d9 AArch64: TableGenerate system instruction operands.
The way the named arguments for various system instructions are handled at the
moment has a few problems:

  - Large-scale duplication between AArch64BaseInfo.h and AArch64BaseInfo.cpp
  - That weird Mapping class that I have no idea what I was on when I thought
    it was a good idea.
  - Searches are performed linearly through the entire list.
  - We print absolutely all registers in upper-case, even though some are
    canonically mixed case (SPSel for example).
  - The ARM ARM specifies sysregs in terms of 5 fields, but those are relegated
    to comments in our implementation, with a slightly opaque hex value
    indicating the canonical encoding LLVM will use.

This adds a new TableGen backend to produce efficiently searchable tables, and
switches AArch64 over to using that infrastructure.

llvm-svn: 274576
2016-07-05 21:23:04 +00:00
Balaram Makam d4acd7ed10 Revert r259387: "AArch64: Implement missed conditional compare sequences."
This reverts commit r259387 because it inserts illegal code after legalization
    in some backends where i64 OR type is illegal for example.

llvm-svn: 274573
2016-07-05 20:24:05 +00:00
Simon Pilgrim bec6543d17 [X86][AVX2] Add support for target shuffle combining to BROADCAST
Only support broadcast from vector register so far - memory folding support will have to wait.

llvm-svn: 274572
2016-07-05 20:11:29 +00:00
Simon Pilgrim 48adedffb7 [X86][AVX512] Fixed decoding of permd/permpd variable mask shuffles + enabled them for target shuffle combining
Corrected element mask masking to extract the bottom index bits (now matches the perm2 implementation but for unary inputs).

llvm-svn: 274571
2016-07-05 18:31:17 +00:00
Saleem Abdulrasool 4d950ef892 ARM: fix `-mlong-calls` for WoA
Not all code-paths set the relocation model to static for Windows.  This
currently breaks on Windows ARM with `-mlong-calls` when built with clang.
Loosen the assertion to what it was previously.  We would ideally ensure that
all the configuration sets Windows to static relocation model.

llvm-svn: 274570
2016-07-05 18:30:52 +00:00
Matt Arsenault 2d79389508 DAGCombiner: Fold away vector extract of insert with the same index
This only really matters when the index is non-constant since the
constant case already gets taken care of by other combines.

llvm-svn: 274569
2016-07-05 18:25:02 +00:00
Tim Northover 01dff9d18a AArch64: use correct SDValue # when looking for bitfield placement.
The other use really does only care about the SDNode (it checks the
opcode against a whitelist), but bitFieldPlacement can be misled if
the node produces multiple results.

Patch by Ismail Badawi.

llvm-svn: 274567
2016-07-05 18:02:57 +00:00
Matt Arsenault ffc8275f2b AMDGPU: Fix folding SGPRs into madak/madmk src0
Because of the special immediate operand, the constant
bus is already used so SGPRs are never useful.

r263212 changed the name of the immediate operand, which
broke the verifier check for the restriction.

llvm-svn: 274564
2016-07-05 17:09:01 +00:00
Sam Kolton a9cd6aa895 [AMDGPU] Assembler: Fix parsing error with floating-point literals passed to integer instructions
Differential Revision: http://reviews.llvm.org/D21972

llvm-svn: 274551
2016-07-05 14:01:11 +00:00
Simon Pilgrim 4e96fbf3c1 [X86][AVX512] Autoupgrade the BROADCAST intrinsics
llvm-svn: 274550
2016-07-05 13:58:47 +00:00
Simon Pilgrim 1e91654b38 [X86][AVX512BW] Added BROADCAST intrinsics fast-isel generic IR tests
llvm-svn: 274545
2016-07-05 13:16:05 +00:00
James Molloy ae5ff990ae [Thumb] Reapply r272251 with a fix for PR28348 (mk 2)
The important thing I was missing was ensuring newly added constants were kept in topological order. Repositioning the node is correct if the constant is newly added (so it has no topological ordering) but wrong if it already existed - positioning it next in the worklist would break the topological ordering.

Original commit message:
  [Thumb] Select a BIC instead of AND if the immediate can be encoded more optimally negated

  If an immediate is only used in an AND node, it is possible that the immediate can be more optimally materialized when negated. If this is the case, we can negate the immediate and use a BIC instead;

    int i(int a) {
      return a & 0xfffffeec;
    }

  Used to produce:
      ldr r1, [CONSTPOOL]
      ands r0, r1
    CONSTPOOL: 0xfffffeec

  And now produces:
      movs    r1, #255
      adds    r1, #20  ; Less costly immediate generation
      bics    r0, r1

llvm-svn: 274543
2016-07-05 12:37:13 +00:00
Simon Pilgrim 20ede63a33 [X86][AVX512] Added BROADCAST intrinsics fast-isel generic IR tests
llvm-svn: 274537
2016-07-05 10:15:14 +00:00
Nemanja Ivanovic 44513e545f [PowerPC] - Legalize vector types by widening instead of integer promotion
This patch corresponds to review:
http://reviews.llvm.org/D20443

It changes the legalization strategy for illegal vector types from integer
promotion to widening. This only applies for vectors with elements of width
that is a multiple of a byte since we have hardware support for vectors with
1, 2, 3, 8 and 16 byte elements.
Integer promotion for vectors is quite expensive on PPC due to the sequence
of breaking apart the vector, extending the elements and reconstituting the
vector. Two of these operations are expensive.
This patch causes between minor and major improvements in performance on most
benchmarks. There are very few benchmarks whose performance regresses. These
regressions can be handled in a subsequent patch with a DAG combine (similar
to how this patch handles int -> fp conversions of illegal vector types).

llvm-svn: 274535
2016-07-05 09:22:29 +00:00
Simon Pilgrim dea33cc2f3 [X86][AVX512] Added VSHUFPD intrinsics fast-isel generic IR tests
llvm-svn: 274534
2016-07-05 09:10:07 +00:00
Simon Pilgrim 8a01915bd2 [X86][AVX512VL] Added VSHUFPD/VSHUFPS intrinsics fast-isel generic IR tests
llvm-svn: 274533
2016-07-05 09:09:41 +00:00
Saleem Abdulrasool 4d3626ed31 test: relax the match on the timestamp
llvm-svn: 274529
2016-07-05 01:14:53 +00:00
Saleem Abdulrasool aecbdf70bf Object: support empty UID/GID fields
Normal archives do not have empty UID/GID fields.  However, the Microsoft
Import library format is a customized archive (it just uses an alternate symbol
index format).  When the import library is constructed by lib.exe, the UID and
GID fields are left empty.  Do not abort on such an input.

llvm-svn: 274528
2016-07-05 00:23:05 +00:00
Simon Pilgrim 3ad040909a [X86][AVX512] Add support for lowering shuffles to VSHUFPD
llvm-svn: 274520
2016-07-04 20:41:24 +00:00
James Molloy c3b4ed4a70 Revert "[Thumb] Reapply r272251 with a fix for PR28348"
This reverts commit r274510 - it made green dragon unhappy.

llvm-svn: 274512
2016-07-04 17:14:24 +00:00
James Molloy 9f019835ef [Thumb] Reapply r272251 with a fix for PR28348
We were using DAG->getConstant instead of DAG->getTargetConstant. This meant that we could inadvertently increase the use count of a constant if stars aligned, which it did in this testcase. Increasing the use count of the constant could cause ISel to fall over (because DAGToDAG lowering assumed the constant had only one use!)

Original commit message:
  [Thumb] Select a BIC instead of AND if the immediate can be encoded more optimally negated

  If an immediate is only used in an AND node, it is possible that the immediate can be more optimally materialized when negated. If this is the case, we can negate the immediate and use a BIC instead;

    int i(int a) {
      return a & 0xfffffeec;
    }

  Used to produce:
      ldr r1, [CONSTPOOL]
      ands r0, r1
    CONSTPOOL: 0xfffffeec

  And now produces:
      movs    r1, #255
      adds    r1, #20  ; Less costly immediate generation
      bics    r0, r1

llvm-svn: 274510
2016-07-04 16:35:41 +00:00
Simon Pilgrim 02d435d2f4 [X86][AVX512] Autoupgrade the VPERMPD/VPERMQ intrinsics
llvm-svn: 274506
2016-07-04 14:19:05 +00:00
Simon Pilgrim 8b82fce537 [X86][AVX512] Added VPERMPD/VPERMQ intrinsics fast-isel generic IR tests
llvm-svn: 274503
2016-07-04 13:43:10 +00:00
Simon Pilgrim 9fca300cbe [X86][AVX512] Autoupgrade the VPERMILPD/VPERMILPS intrinsics
llvm-svn: 274498
2016-07-04 12:40:54 +00:00
Simon Pilgrim c8cf2ddb6d [X86][AVX512] Added VPERMILPD/VPERMILPS intrinsics fast-isel generic IR tests
Added PSHUFD tests as well

llvm-svn: 274493
2016-07-04 11:07:50 +00:00
Nicolai Haehnle 84c9f9919a Add writeonly IR attribute
Summary:
This complements the earlier addition of IntrWriteMem and IntrWriteArgMem
LLVM intrinsic properties, see D18291.

Also start using the attribute for memset, memcpy, and memmove intrinsics,
and remove their special-casing in BasicAliasAnalysis.

Reviewers: reames, joker.eph

Subscribers: joker.eph, llvm-commits

Differential Revision: http://reviews.llvm.org/D18714

llvm-svn: 274485
2016-07-04 08:01:29 +00:00
Craig Topper d83f818a3e [CodeGen] Make the code that detects a if a shuffle is really a concatenation of the inputs more general purpose.
We can now handle concatenation of each source multiple times. The previous code just checked for each source to appear once in either order.

This also now handles an entire source vector sized piece having undef indices correctly. We now concat with UNDEF instead of using one of the sources. This is responsible for the test case change.

llvm-svn: 274483
2016-07-04 06:19:35 +00:00
Simon Pilgrim 7f096de0b8 [X86][AVX512] Add support for 512-bit shuffle lowering to VPERMPD/VPERMQ
llvm-svn: 274473
2016-07-03 19:50:06 +00:00
Craig Topper d1eca0f32c [CodeGen] Teach OR combine of shuffles involving zero vectors to better handle undef indices.
Undef indices can now be treated as zeros. Or if its undef ORed with zero, we will keep the undef.

llvm-svn: 274472
2016-07-03 19:37:12 +00:00
Craig Topper 8e826d5abe [X86] Add tests to show that the DAG combine for OR of shuffles with zero vectors doesn't handle undefs as well as it could. Fix coming in another commit.
llvm-svn: 274471
2016-07-03 19:37:10 +00:00
Haicheng Wu b71b2f622a [MBB] add a missing corner case in UpdateTerminator()
After the block placement, if a block ends with a conditional branch, but the
next block is not its successor. The conditional branch should be changed to
unconditional branch.  This patch fixes PR28307, PR28297, PR28402.

Differential Revision: http://reviews.llvm.org/D21811

llvm-svn: 274470
2016-07-03 19:14:17 +00:00
Simon Pilgrim 68ea80649b [X86][AVX512] Add support for VPERMPD/VPERMQ masked shuffle comments
llvm-svn: 274469
2016-07-03 18:40:24 +00:00
Simon Pilgrim a0d73835b2 [X86][AVX512] Add support for 512-bit shuffle decoding of VPERMPD/VPERMQ
llvm-svn: 274468
2016-07-03 18:27:37 +00:00
Simon Pilgrim dbd6db0dc7 [X86][AVX512] Add support for VPALIGNR/PSHUFD/PSHUFHW/PSHUFLW masked shuffle comments
llvm-svn: 274466
2016-07-03 15:00:51 +00:00
Sanjay Patel cbaac41856 [InstCombine] enable vector select of bools -> logic folds
llvm-svn: 274465
2016-07-03 14:34:39 +00:00
Simon Pilgrim 598bdb6bfe [X86][AVX512] Add support for UNPCK masked shuffle comments
llvm-svn: 274464
2016-07-03 14:26:21 +00:00
Simon Pilgrim 1f59076196 [X86][AVX512] Add support for VPERM/VSHUF masked shuffle comments
llvm-svn: 274462
2016-07-03 13:55:41 +00:00
Simon Pilgrim 68f438a036 [X86][AVX512] Add support for PMOVZX masked shuffle comments
llvm-svn: 274461
2016-07-03 13:33:28 +00:00
Sanjay Patel 42396ae0ea add vector bool select tests and regenerate checks for scalar bool select tests
llvm-svn: 274460
2016-07-03 13:26:02 +00:00
Simon Pilgrim 7c2fbdc101 [X86][AVX512] Add support for masked shuffle comments
This patch adds support for including the avx512 mask register information in the mask/maskz versions of shuffle instruction comments.

This initial version just adds support for MOVDDUP/MOVSHDUP/MOVSLDUP to reduce the mass of test regenerations, other shuffle instructions can be added in due course.

Differential Revision: http://reviews.llvm.org/D21953

llvm-svn: 274459
2016-07-03 13:08:29 +00:00
Simon Pilgrim 129b720c18 [X86][AVX512] Add support for lowering shuffles to VPERMILPS
llvm-svn: 274458
2016-07-03 12:47:21 +00:00
Sean Silva 45835e731d Remove dead TLI arg of isKnownNonNull and propagate deadness. NFC.
This actually uncovered a surprisingly large chain of ultimately unused
TLI args.
From what I can gather, this argument is a remnant of when
isKnownNonNull would look at the TLI directly.
The current approach seems to be that InferFunctionAttrs runs early in
the pipeline and uses TLI to annotate the TLI-dependent non-null
information as return attributes.

This also removes the dependence of functionattrs on TLI altogether.

llvm-svn: 274455
2016-07-02 23:47:27 +00:00
Xinliang David Li 8a021317a2 [PM] Port LoopAccessInfo analysis to new PM
It is implemented as a LoopAnalysis pass as 
discussed and agreed upon.

llvm-svn: 274452
2016-07-02 21:18:40 +00:00
Simon Pilgrim 99e8a1aa0b [X86][AVX512] Add support for lowering shuffles to VPERMILPD
llvm-svn: 274450
2016-07-02 20:20:12 +00:00
Simon Pilgrim 72052f6de9 [X86][AVX512VL] Add fast-isel MOVDDUP/MOVSLDUP/MOVSHDUP shuffle tests
llvm-svn: 274448
2016-07-02 19:22:46 +00:00
Simon Pilgrim cde7c54baa [X86][AVX512] Add support for 512-bit PSHUFB lowering
llvm-svn: 274444
2016-07-02 18:14:31 +00:00
Simon Pilgrim 77dda7c2e0 [X86][AVX512] Converted the MOVDDUP/MOVSLDUP/MOVSHDUP masked intrinsics to generic IR
llvm-svn: 274443
2016-07-02 17:16:41 +00:00
Simon Pilgrim 19adee9d84 [X86][AVX512] Autoupgrade the MOVDDUP/MOVSLDUP/MOVSHDUP intrinsics
llvm-svn: 274439
2016-07-02 14:42:35 +00:00
Simon Pilgrim f040d8c061 [X86][AVX512] Add support for lowering shuffles to MOVDDUP/MOVSLDUP/MOVSHDUP
llvm-svn: 274436
2016-07-02 12:45:03 +00:00
Simon Pilgrim 5e95390957 [X86][AVX512] Add test cases that should lower to MOVSLDUP/MOVSHDUP
llvm-svn: 274435
2016-07-02 12:20:35 +00:00
Simon Pilgrim a6f262a1f9 [X86][AVX512] Add fast-isel shuffle tests
Its not worth trying to write out tests for all the avx512f builtins yet, just adding tests for lowering of generic IR as we transition to it (shuffles mainly right now).

llvm-svn: 274434
2016-07-02 12:13:29 +00:00
Qin Zhao b463c23c10 [esan|cfrag] Add counters for struct array accesses
Summary:
Adds one counter to the struct counter array for counting struct
array accesses.

Adds instrumentation to insert counter update for struct array
accesses.

Reviewers: aizatsky

Subscribers: llvm-commits, bruening, eugenis, kcc, zhaoqin, vitalybuka

Differential Revision: http://reviews.llvm.org/D21594

llvm-svn: 274420
2016-07-02 03:25:37 +00:00
Michael Kuperstein 071d8306b0 [PM] Port ConstantHoisting to the new Pass Manager
Differential Revision: http://reviews.llvm.org/D21945

llvm-svn: 274411
2016-07-02 00:16:47 +00:00
Reid Kleckner e092dad72c [codeview] Set the Nested and Scoped ClassOptions based on the scope chain
These are set on both the declaration record and the definition record.

llvm-svn: 274410
2016-07-02 00:11:07 +00:00
Matt Arsenault accddacb70 TII: Fix inlineasm size counting comments as insts
The main problem was counting comments on their own
line as instructions.

llvm-svn: 274405
2016-07-01 23:26:50 +00:00
David Majnemer 08bd744c2c [CodeView] Include the offset of nested members
Given something like:
  struct S {
    int a;
    struct { int b; };
  };

We would fail to give 'b' offset 4.  Instead, we would give it the
offset it has inside of it's struct.

llvm-svn: 274400
2016-07-01 23:12:48 +00:00
David Majnemer 6bdc24e7b6 [CodeView] Pretty print anonymous scopes
A namespace without a name should be written out as `anonymous
namespace' while a tag type without a name should be written out as
<unnamed-tag>.

llvm-svn: 274399
2016-07-01 23:12:45 +00:00
Matt Arsenault 7f681ac7a9 AMDGPU: Add feature for unaligned access
llvm-svn: 274398
2016-07-01 23:03:44 +00:00
Matt Arsenault 8af47a09e5 AMDGPU: Expand unaligned accesses early
Due to visit order problems, in the case of an unaligned copy
the legalized DAG fails to eliminate extra instructions introduced
by the expansion of both unaligned parts.

llvm-svn: 274397
2016-07-01 22:55:55 +00:00
Evgeniy Stepanov b736335dc3 [msan] Fix __msan_maybe_ for non-standard type sizes.
Fix incorrect calculation of the type size for __msan_maybe_warning_N
call that resulted in an invalid (narrowing) zext instruction and
"Assertion `castIsValid(op, S, Ty) && "Invalid cast!"' failed."

Only happens in very large functions (with more than 3500 MSan
checks) operating on integer types that are not power-of-two.

llvm-svn: 274395
2016-07-01 22:49:59 +00:00
Matt Arsenault 327bb5ad82 AMDGPU: Improve load/store of illegal types.
There was a combine before to handle the simple copy case.
Split this into handling loads and stores separately.

We might want to change how this handles some of the vector
extloads, since this can result in large code size increases.

llvm-svn: 274394
2016-07-01 22:47:50 +00:00
Reid Kleckner ad56ea3129 [codeview] Don't record UDTs for anonymous structs
MSVC makes up names for these anonymous structs, but we don't (yet).
Eventually Clang should use getTypedefNameForAnonDecl() to put some name
in the debug info, and we can update the test case when that happens.

llvm-svn: 274391
2016-07-01 22:24:51 +00:00
Alina Sbirlea 8d8aa5dd6c Address two correctness issues in LoadStoreVectorizer
Summary:
GetBoundryInstruction returns the last instruction as the instruction which follows or end(). Otherwise the last instruction in the boundry set is not being tested by isVectorizable().
Partially solve reordering of instructions. More extensive solution to follow.

Reviewers: tstellarAMD, llvm-commits, jlebar

Subscribers: escha, arsenm, mzolotukhin

Differential Revision: http://reviews.llvm.org/D21934

llvm-svn: 274389
2016-07-01 21:44:12 +00:00
Dehao Chen 7b2c997736 Specify mtriple for the frame-order.ll test.
Summary: original test may have different bahavior on different bot, specifically it broke llvm-clang-lld-x86_64-scei-ps4-ubuntu-fast

Reviewers: majnemer

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D21931

llvm-svn: 274368
2016-07-01 17:35:13 +00:00
Dehao Chen ad2b4e1334 Do not count debug instructions when counting number of uses to reorder frame objects.
Summary: The code generation should be independent of the debug info.

Reviewers: zansari, davidxl, mkuper, majnemer

Subscribers: majnemer, llvm-commits

Differential Revision: http://reviews.llvm.org/D21911

llvm-svn: 274357
2016-07-01 15:40:25 +00:00
Nikolay Haustov beb24f5b20 Resubmit r268719 - AMDGPU/SI: Add amdgpu_kernel calling convention. Part 2.
This was reverted in r268740 because of problems with corresponding Clang change.
Clang change was updated and resubmitted in r274220.

Check calling convention in AMDGPUMachineFunction::isKernel

This will be used for AMDGPU_HSA_KERNEL symbol type in output ELF.

Also, in the future unused non-kernels may be optimized.

Reviewers: tstellarAMD, arsenm

Subscribers: arsenm, joker.eph, llvm-commits

Differential Revision: http://reviews.llvm.org/D19917

llvm-svn: 274341
2016-07-01 10:00:58 +00:00
Sam Kolton 5196b88f07 [AMDGPU] Assembler: support SDWA for VOPC instructions
Summary: dst_sel and dst_unused disabled for VOPC as they have no effect on result

Reviewers: artem.tamazov, tstellarAMD, vpykhtin

Subscribers: arsenm, kzhuravl

Differential Revision: http://reviews.llvm.org/D21376

llvm-svn: 274340
2016-07-01 09:59:21 +00:00
Duncan P. N. Exon Smith e60719b3fa Revert "add tests for bugs fixed by the GVN hoist pass"
This reverts commit r274327 since the tests fail.  E.g.:
  http://lab.llvm.org:8011/builders/clang-x86_64-linux-selfhost-modules/builds/17240

It looks like this commit is building on r274305, but that commit caused
a miscompile and was reverted in r274320.

llvm-svn: 274332
2016-07-01 04:55:13 +00:00
Sebastian Pop 196ba4f844 add tests for bugs fixed by the GVN hoist pass
https://llvm.org/bugs/show_bug.cgi?id=20242
https://llvm.org/bugs/show_bug.cgi?id=22005

llvm-svn: 274327
2016-07-01 03:03:19 +00:00
Reid Kleckner b5af11dfa3 [codeview] Add DISubprogram::ThisAdjustment
Summary:
This represents the adjustment applied to the implicit 'this' parameter
in the prologue of a virtual method in the MS C++ ABI. The adjustment is
always zero unless multiple inheritance is involved.

This increases the size of DISubprogram by 8 bytes, unfortunately. The
adjustment really is a signed 32-bit integer. If this size increase is
too much, we could probably win it back by splitting out a subclass with
info specific to virtual methods (virtuality, vindex, thisadjustment,
containingType).

Reviewers: aprantl, dexonsmith

Subscribers: aaboud, amccarth, llvm-commits

Differential Revision: http://reviews.llvm.org/D21614

llvm-svn: 274325
2016-07-01 02:41:21 +00:00
Matt Arsenault 0101ecade0 LoadStoreVectorizer: Don't increase alignment with no align set
If no alignment was set on the load/stores, it would vectorize
to the new type even though this increases the default alignment.

llvm-svn: 274323
2016-07-01 02:09:38 +00:00
Matt Arsenault 370e8226c7 LoadStoreVectorizer: Check TTI for vec reg bit width
llvm-svn: 274322
2016-07-01 02:07:22 +00:00
Matt Arsenault 42ad17059a LoadStoreVectorizer: Fix assert when merging pointer ops
This needs to use inttoptr/ptrtoint if combining an int and pointer
load. If a pointer is used always do an integer load.

llvm-svn: 274321
2016-07-01 01:55:52 +00:00
Duncan P. N. Exon Smith 9d1f156418 Revert "code hoisting pass based on GVN"
This reverts commit r274305, since it breaks self-hosting:
  http://lab.llvm.org:8080/green/job/clang-stage1-configure-RA_build/22349/
  http://lab.llvm.org:8011/builders/clang-x86_64-linux-selfhost-modules/builds/17232

Note that the blamelist on lab.llvm.org:8011 is incorrect.  The previous
build was r274299, but somehow r274305 wasn't included in the blamelist:
  http://lab.llvm.org:8011/builders/clang-x86_64-linux-selfhost-modules

llvm-svn: 274320
2016-07-01 01:51:40 +00:00
Matt Arsenault 241f34cde8 LoadStoreVectorizer: Use AA metadata
This was not passing the full instruction with metadata
to the alias query.

llvm-svn: 274318
2016-07-01 01:47:46 +00:00
Matt Arsenault d7e8898bdd LoadStoreVectorizer: if one element of a vector is integer, default to
integer.

Fixes issues on some architectures where we use arithmetic ops to build
vectors, which can cause bad things to happen for loads/stores of mixed
types.

Patch by Fiona Glaser

llvm-svn: 274307
2016-07-01 00:37:01 +00:00