Commit Graph

98364 Commits

Author SHA1 Message Date
Craig Topper da84ff3ed4 [AVX-512] Add masked forms of the alternate MOVDDUP patterns.
I'm not too sure how to get isel to select even all of the unmasked forms, but at least we have a consistent set now.

llvm-svn: 291368
2017-01-07 22:20:23 +00:00
Simon Pilgrim a470296367 [CostModel][X86] Fix AVX2 v16i16 shift 'splat' costs.
llvm-svn: 291366
2017-01-07 22:08:09 +00:00
Simon Pilgrim 82e3e05fe2 [CostModel][X86] Match 256-bit vector shift 'splat' costs for AVX2 and above
We were matching against general vector shift costs before the uniform splat costs

llvm-svn: 291365
2017-01-07 21:47:10 +00:00
Simon Pilgrim e70644dab7 [CostModel][X86] Generalized cost calculation of SHL by constant -> MUL conversion.
llvm-svn: 291364
2017-01-07 21:33:00 +00:00
Mehdi Amini d5549f3dac [ThinLTO] Fix assertions on lazy-loading of Metadata TBAA attachments
Summary:
The issue happens with:

 %0 = ....., !tbaa !0
 %1 = ....., !tbaa !1

With !0 that references !1.

In this case when loading !0 we generates a temporary for the
operand !1. We now flush it immediately and trigger the load of
!1 before moving on. If we don't we get the temporary when
attaching to %1. This is usually not an issue except that we
eagerly try to update TBAA MDNodes, which is obviously not possible
if we only have a temporary.

Differential Revision: https://reviews.llvm.org/D28423

llvm-svn: 291362
2017-01-07 20:24:23 +00:00
Matt Arsenault a7d2194168 SimplifyLibCalls: Remove incorrect optimization of fabs
fabs(x * x) is not generally safe to assume x is positive if x is a NaN.
This is also less general than it could be, so this will be replaced
with a transformation on the intrinsic.

llvm-svn: 291359
2017-01-07 19:55:12 +00:00
Mehdi Amini 42ef199058 [Bitcode] Remove unused PlaceHolder parameter to lazyLoadModuleMetadataBlock()
llvm-svn: 291356
2017-01-07 18:31:38 +00:00
Simon Pilgrim 725997154d [CostModel][X86] Merge separate AVX1 cost LUTs. NFCI.
llvm-svn: 291355
2017-01-07 18:19:25 +00:00
Simon Pilgrim a4109d6433 [CostModel][AVX512BW] Add v32i16 vector shift costs for avx512bw targets.
llvm-svn: 291354
2017-01-07 17:54:10 +00:00
Simon Pilgrim df7de7a87e [CostModel][X86] Added missing AVX2 arithmetic costs.
Allows us to correctly fall through to the lower AVX1 costs if look up failed.

llvm-svn: 291353
2017-01-07 17:27:39 +00:00
Simon Pilgrim 100eae1ee0 [CostModel][X86] Reordered AVX1 arithmetic cost LUT into descending target order. NFCI.
llvm-svn: 291352
2017-01-07 17:03:51 +00:00
Daniel Berlin 32f8d560dd NewGVN: Make sure we properly lookup operand leaders while creating
congruence classes for stores, and then keep them up to date.  Add
testcases.

llvm-svn: 291351
2017-01-07 16:55:14 +00:00
Simon Pilgrim a1b8e2c725 [X86][AVX512] Use lowerShuffleAsRepeatedMaskAndLanePermute for non-VBMI v64i8 shuffles (PR31470)
llvm-svn: 291347
2017-01-07 15:37:50 +00:00
Rui Ueyama d52f4b86f3 TarWriter: Use fitsInUstar function.
This change should have been commit as part of r291340.

llvm-svn: 291341
2017-01-07 08:32:07 +00:00
Rui Ueyama 999f094aa3 TarWriter: Use Ustar header's "prefix" field to store long filenames.
Tar's Ustar header has the "prefix" field to store a directory
part of a filename. It is not as flexible as the PAX-extended
filename because there's still a limitation on the maximum filename
size, but it mitigates the situation.

This patch should unbreak some Windows buildbots that uses very
old tar command.

llvm-svn: 291340
2017-01-07 08:28:56 +00:00
Craig Topper 42b848a683 [X86] Disable load unfolding for 128-bit MOVDDUP instructions since the load size is smaller than the register size so unfolding would increase the load size.
llvm-svn: 291338
2017-01-07 06:56:54 +00:00
Xin Tong ee5cb65ada Fix a typo. NFC
llvm-svn: 291335
2017-01-07 04:30:58 +00:00
Daniel Berlin 0444343326 NewGVN: Reformat and fix a few newlines
llvm-svn: 291334
2017-01-07 03:23:47 +00:00
Davide Italiano 1b97fc34a4 [NewGVN] Prefer auto over explicit type. NFCI.
llvm-svn: 291328
2017-01-07 02:05:50 +00:00
Dan Gohman 0e2ceb8121 [WebAssembly] Don't abort on code with UB.
Gracefully leave code that performs function-pointer bitcasts implying
non-trivial pointer conversions alone, rather than aborting, since it's
just undefined behavior.

llvm-svn: 291326
2017-01-07 01:50:01 +00:00
Dan Gohman d5eda35557 [WebAssembly] Move a SmallVector to a more specific scope. NFC.
llvm-svn: 291324
2017-01-07 01:31:18 +00:00
Peter Collingbourne d79e49d807 LowerTypeTests: Thread summary and action from the API and command line into the pass.
Also move command line handling out of the pass constructor and into
a separate function.

Differential Revision: https://reviews.llvm.org/D28422

llvm-svn: 291323
2017-01-07 01:17:24 +00:00
Dylan McKay c5e209a0b2 [AVR] Parenthesize a boolean expression
Without the parentheses, clang would emit warnings while compiling the
code.

llvm-svn: 291320
2017-01-07 00:55:28 +00:00
Dan Gohman 1b637458f6 [WebAssembly] Add a pass to create wrappers for function bitcasts.
WebAssembly requires caller and callee signatures to match exactly. In LLVM,
there are a variety of circumstances where signatures may be mismatched in
practice, and one can bitcast a function address to another type to call it
as that type. This patch adds a pass which replaces bitcasted function
addresses with wrappers to replace the bitcasts.

This doesn't catch everything, but it does match many common cases.

llvm-svn: 291315
2017-01-07 00:34:54 +00:00
Daniel Berlin d92e7f9f74 NewGVN: Fix PR 31501.
Summary: LLVM's non-standard notion of phi nodes means we can't both try to substitute for undef in phi nodes *and* use phi nodes as leaders all the time. This changes NewGVN to use the same semantics as SimplifyPHINode to decide which phi nodes are equivalent.

Reviewers: davide

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D28312

llvm-svn: 291308
2017-01-07 00:01:42 +00:00
Teresa Johnson 9006d52651 [ThinLTO] Handle conflicting local names gracefully
Summary:
r285871 introduced an assert that was overly aggressive in the case
of a same-named local in different same-named files (in different
directories), where the source name and therefore the GUID ended up
the same because the files were compiled in their own directory without
any leading path. Change the handling in the promotion logic to get
the summary for the version in that module.

This also exposed an issue where we are not always importing the
right copy, which is a performance not correctness issue (because
the renaming is based on the module hash which must be different,
see the bug report for details). I will fix that as a follow-on.

Fixes PR31561.

Reviewers: mehdi_amini

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D28411

llvm-svn: 291304
2017-01-06 23:38:41 +00:00
Teresa Johnson b0d70f817e [ThinLTO] Optionally ignore empty index file
Summary:
In order to simplify distributed build system integration, where actions
may be scheduled before the Thin Link which determines the list of
objects selected by the linker. The gold plugin currently will emit
0-sized index files for objects not selected by the link, to enable
checking for expected output files by the build system. If the build
system then schedules a backend action for these bitcode files, we want
to be able to fall back to normal compilation instead of failing.

This is the LLVM side support for optionally enabling fallback
instead of issuing an error. Return a null CombinedIndex from
llvm::getModuleSummaryIndexForFile under the option when the file
is empty. Clang can then ignore the index when it is null.

Clang patch is D28362.

Reviewers: mehdi_amini

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D28410

llvm-svn: 291302
2017-01-06 23:37:17 +00:00
Eugene Zelenko 4282c404f9 [BPF] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 291297
2017-01-06 23:06:25 +00:00
David Majnemer 63da0c238b [InstSimplify] Optimize away udivs in the presence of range metadata
We know that udiv %V, C can be optimized away to 0 if %V is ult C.

llvm-svn: 291296
2017-01-06 22:58:02 +00:00
Kuba Mracek 316dc70f82 [asan] Change the visibility of ___asan_globals_registered to hidden
This flag is used to track global registration in Mach-O and it doesn't need to be exported and visible.

Differential Revision: https://reviews.llvm.org/D28250

llvm-svn: 291289
2017-01-06 22:02:58 +00:00
Xin Tong 3caaa36ac5 Fix use after free
Summary: Fix use after free in LoopUnswitch

Reviewers: chenli, atrick, hfinkel, mzolotukhin

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D28412

llvm-svn: 291288
2017-01-06 21:49:08 +00:00
David Majnemer 8c0e62f507 [InstSimplify] Optimize away urems in the presence of range metadata
We know that urem %V, C can be optimized away to %V if %V is ult C.

llvm-svn: 291282
2017-01-06 21:23:51 +00:00
Mehdi Amini 27d224fbbb Fix LoopLoadElimination to keep original alignment on the inital hoisted store
This is fixing a bug where Loop Vectorization is widening a load but
with a lower alignment. Hoisting the load without propagating the alignment
will allow inst-combine to later deduce a higher alignment that what the pointer
actually is.

Differential Revision: https://reviews.llvm.org/D28408

llvm-svn: 291281
2017-01-06 21:06:51 +00:00
Jan Vesely 06200bd7bc AMDGPU/R600: Don't use REGISTER_{LOAD,STORE} ISD nodes
This will make transition to SCRATCH_MEMORY easier

Differential Revision: https://reviews.llvm.org/D24746

llvm-svn: 291279
2017-01-06 21:00:46 +00:00
Matthias Braun 258b847c4f AArch64CollectLOH: Rewrite as block-local analysis.
Re-apply r288561: This time with a fix where the ADDs that are part of a
3 instruction LOH would not invalidate the "LastAdrp" state. This fixes
http://llvm.org/PR31361

Previously this pass was using up to 5% compile time in some cases which
is a bit much for what it is doing. The pass featured a full blown
data-flow analysis which in the default configuration was restricted to a
single block.

This rewrites the pass under the assumption that we only ever work on a
single block. This is done in a single pass maintaining a state machine
per general purpose register to catch LOH patterns.

Differential Revision: https://reviews.llvm.org/D27329

This reverts commit 9e6cedb0a4f14364d6511597a9160305e7d34493.

llvm-svn: 291266
2017-01-06 19:22:01 +00:00
Wolfgang Pieb c17a279eda [DWARF] Null out the debug locs of (loop invariant) instructions hoisted by LICM in
order to avoid jumpy line tables. Calls are left alone because they may be inlined.

Differential Revision: https://reviews.llvm.org/D28390

llvm-svn: 291258
2017-01-06 18:38:57 +00:00
Reid Kleckner 5984d01826 Use %z for size_t and avoid deprecated string functions
This usage of strcpy and snprintf was certainly safe, but using them
sets off various deprecation and lint warnings. Easier to just write the
belt and suspenders version.

llvm-svn: 291256
2017-01-06 18:22:18 +00:00
Chad Rosier e177185e79 [AArch64] Reduce vector insert/extract cost for Falkor.
Differential Revision: https://reviews.llvm.org/D28403

llvm-svn: 291254
2017-01-06 18:03:26 +00:00
Simon Pilgrim 3128d6b520 [X86][SSE] Pass float domain flag to shuffle combine match functions. NFCI.
Early step towards ignoring domain above a certain shuffle depth.

llvm-svn: 291248
2017-01-06 17:34:30 +00:00
Konstantin Zhuravlyov 31dbb0391d [AMDGPU] Remove extra semicolon. NFC
llvm-svn: 291246
2017-01-06 17:23:21 +00:00
Konstantin Zhuravlyov 67a6d5401a [AMDGPU] Do not emit .AMDGPU.config section for amdhsa
Differential Revision: https://reviews.llvm.org/D27732

llvm-svn: 291245
2017-01-06 17:02:10 +00:00
Simon Pilgrim bd3c6824d4 [X86][SSE] Simplify float domain requirement in unary shuffle matching.
The AVX1-only limit is never actually required in matchUnaryVectorShuffle

llvm-svn: 291244
2017-01-06 17:00:59 +00:00
Simon Pilgrim a08d7b9913 Remove trailing whitespace. NFCI.
llvm-svn: 291240
2017-01-06 15:31:52 +00:00
Simon Pilgrim 9b8c7caf4e [X86] Add X86Subtarget argument. NFCI.
All callers of getTargetVShiftNode have access to X86Subtarget already so pass it along instead of re-extracting it.

llvm-svn: 291239
2017-01-06 15:29:17 +00:00
Filipe Cabecinhas 4647b74b51 [ASan] Make ASan instrument variable-masked loads and stores
Summary: Previously we only supported constant-masked loads and stores.

Reviewers: kcc, RKSimon, pgousseau, gbedwell, vitalybuka

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D28370

llvm-svn: 291238
2017-01-06 15:24:51 +00:00
Daniel Sanders 12360efa76 [globalisel] Stop requiring -debug/-debug-only=registerbankinfo for assertions.
Summary:
I've noticed that these assertions don't trigger when the condition is false.
The problem is that the DEBUG(x) macro only executes x when the pass is
emitting debug output via the -debug and -debug-only=registerbankinfo command
line arguments.

Debug builds should always execute the assertions so use '#ifndef NDEBUG' instead.

Also removed an assertion that is only true the first time it's tested. <Target>RegisterBankInfo's constructor will re-use register banks causing them to be valid on subsequent tests. That
assertion will fail on the first test too in the near future.

Reviewers: t.p.northover, ab, rovka, qcolombet

Subscribers: dberris, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D28358

llvm-svn: 291235
2017-01-06 14:29:34 +00:00
Simon Pilgrim d8333372bc [CostModel][X86] Fix 512-bit SDIV/UDIV 'big' costs.
Set the costs on the lowest target that supports the type.

llvm-svn: 291229
2017-01-06 11:12:53 +00:00
Rui Ueyama f2a6275116 TarWriter: Emit PAX headers only when needed.
We use PAX headers to store long filenames (>= 100 bytes).
It is not needed to emit PAX headers if filenames fit in the
Ustar header. This patch implements that optimization.

llvm-svn: 291215
2017-01-06 05:33:45 +00:00
Craig Topper e86fb932ea [AVX-512] Add EXTRACT_SUBVECTOR support to combineBitcastForMaskedOp.
llvm-svn: 291214
2017-01-06 05:18:48 +00:00
David Majnemer 9e04befb09 [SelectionDAG] Rework lowerRangeToAssertZExt
Utilize ConstantRange to make it easier to interpret range metadata.

llvm-svn: 291211
2017-01-06 02:43:28 +00:00
Rui Ueyama 4bb7883f0c Add a class to create a tar archive file.
In LLD, we create cpio archive files for --reproduce command.
cpio was not a bad choice because it is very easy to create, but
it was sometimes hard to use because people are not familiar with
cpio command.

I noticed that creating a tar archive isn't as hard as I thought.
So I implemented it in this patch.

Differential Revision: https://reviews.llvm.org/D28091

llvm-svn: 291209
2017-01-06 02:29:48 +00:00
Bob Wilson 37df90a474 Revert "Use _Unwind_Backtrace on Apple platforms."
This reverts commit 63165f6ae3bac1623be36d4b3ce63afa1d51a30a.

After making this change, I discovered that _Unwind_Backtrace is
unable to unwind past a signal handler after an assertion failure.
I filed a bug report about that issue in rdar://29866587 but even if
we get a fix soon, it will be awhile before it get released.

llvm-svn: 291207
2017-01-06 02:26:33 +00:00
Peter Collingbourne 81271b7bd2 LowerTypeTests: Split the pass in two: a resolution phase and a lowering phase.
This change separates how type identifiers are resolved from how intrinsic
calls are lowered. All information required to lower an intrinsic call
is stored in a new TypeIdLowering data structure. The idea is that this
data structure can either be initialized using the module itself during
regular LTO, or using the module summary in ThinLTO backends.

Differential Revision: https://reviews.llvm.org/D28341

llvm-svn: 291205
2017-01-06 02:22:47 +00:00
David Blaikie 1e58b463d9 Remove unused private fields to fix the clang -Werror build.
llvm-svn: 291201
2017-01-06 00:48:24 +00:00
Eugene Zelenko 049b017538 [AArch64, Lanai] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 291197
2017-01-06 00:30:53 +00:00
David Majnemer eaba06cffa [SelectionDAG] Correctly transform range metadata to AssertZExt
We used the logBase2 of the high instead of the ceilLogBase2 resulting
in the wrong result for certain values.  For example, it resulted in an
i1 AssertZExt when the exclusive portion of the range was 3.

llvm-svn: 291196
2017-01-06 00:11:46 +00:00
Kostya Serebryany 61f5473bad [libFuzzer] remove dead code, NFC
llvm-svn: 291195
2017-01-06 00:09:40 +00:00
Greg Clayton 93e4fe8aad Add iterator support to DWARFDie to allow child DIE iteration.
Differential Revision: https://reviews.llvm.org/D28303

llvm-svn: 291194
2017-01-05 23:47:37 +00:00
Logan Chien ce542eefe3 Code cleanup: Remove tab indents.
llvm-svn: 291193
2017-01-05 23:41:33 +00:00
Simon Pilgrim aa186c632d [CostModel][X86] Tidyup arithmetic costs code. NFCI.
Remove unnecessary braces, remove one use variables and keep LUTs to similar naming convention.

llvm-svn: 291187
2017-01-05 22:48:02 +00:00
Kostya Serebryany 4aa0590e33 [libFuzzer] improve error handling during the merge (handle various IO failures)
llvm-svn: 291182
2017-01-05 22:05:47 +00:00
Geoff Berry d46b6e8096 [AArch64] Fold some filled/spilled subreg COPYs
Summary:
Extend AArch64 foldMemoryOperandImpl() to handle folding spills of
subreg COPYs with read-undef defs like:

  %vreg0:sub_32<def,read-undef> = COPY %WZR; GPR64:%vreg0

by widening the spilled physical source reg and generating:

  STRXui %XZR <fi#0>

as well as folding fills of similar COPYs like:

  %vreg0:sub_32<def,read-undef> = COPY %vreg1; GPR64:%vreg0, GPR32:%vreg1

by generating:

  %vreg0:sub_32<def,read-undef> = LDRWui <fi#0>

Reviewers: MatzeB, qcolombet

Subscribers: aemerson, rengolin, mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D27425

llvm-svn: 291180
2017-01-05 21:51:42 +00:00
Xin Tong 8b8a600d92 Fix typo. NFC
llvm-svn: 291178
2017-01-05 21:40:08 +00:00
Teresa Johnson 6c475a7595 ThinLTO: add early "dead-stripping" on the Index
Summary:
Using the linker-supplied list of "preserved" symbols, we can compute
the list of "dead" symbols, i.e. the one that are not reachable from
a "preserved" symbol transitively on the reference graph.
Right now we are using this information to mark these functions as
non-eligible for import.

The impact is two folds:
- Reduction of compile time: we don't import these functions anywhere
  or import the function these symbols are calling.
- The limited number of import/export leads to better internalization.

Patch originally by Mehdi Amini.

Reviewers: mehdi_amini, pcc

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D23488

llvm-svn: 291177
2017-01-05 21:34:18 +00:00
Joerg Sonnenberger 83963995c6 PR 31534: When emitting both DWARF unwind tables and debug information,
do not use .cfi_sections. This requires checking if any non-declaration
function in the module needs an unwind table.

llvm-svn: 291172
2017-01-05 20:55:28 +00:00
Michael Kuperstein c9acad12e9 [LICM] Allow promotion of some stores that are not guaranteed to execute.
Promotion is always legal when a store within the loop is guaranteed to execute.

However, this is not a necessary condition - for promotion to be memory model
semantics-preserving, it is enough to have a store that dominates every exit
block. This is because if the store dominates every exit block, the fact the
exit block was executed implies the original store was executed as well.

Differential Revision: https://reviews.llvm.org/D28147

llvm-svn: 291171
2017-01-05 20:42:06 +00:00
Matthias Braun 1172332203 CodeGen: Assert that liveness is up to date when reading block live-ins.
Add an assert that checks whether liveins are up to date before they are
used.

- Do not print liveins into .mir files anymore in situations where they
  are out of date anyway.
- The assert in the RegisterScavenger is superseded by the new one in
  livein_begin().
- Skip parts of the liveness updating logic in IfConversion.cpp when
  liveness isn't tracked anymore (just enough to avoid hitting the new
  assert()).

Differential Revision: https://reviews.llvm.org/D27562

llvm-svn: 291169
2017-01-05 20:01:19 +00:00
Evgeniy Stepanov e8e11eb726 Revert "Reapply r291025 ("AMDGPU: Remove unneccessary intermediate vector")"
Summary: This reverts commit r291144. It breaks build bots.

http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-autoconf/builds/3270, http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fuzzer/builds/2058

lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp:1638:12: error: could not convert ‘(const unsigned int*)(& Variants)’ from ‘const unsigned int*’ to ‘llvm::ArrayRef<unsigned int>’
     return Variants;

Reviewers: eugenis, tstellarAMD

Patch by Alex Shlyapnikov.

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D28372

llvm-svn: 291168
2017-01-05 19:51:13 +00:00
Simon Pilgrim 4c050c2190 [CostModel][X86] Move vXi32 MUL costs into existing tables. NFCI.
llvm-svn: 291165
2017-01-05 19:42:43 +00:00
Simon Pilgrim 6f72eba606 Remove trailing whitespace. NFCI.
llvm-svn: 291163
2017-01-05 19:24:25 +00:00
Simon Pilgrim 5b06e4d319 [CostModel][X86] Reordered SSE42 arithmetic cost LUT into descending order. NFCI.
llvm-svn: 291162
2017-01-05 19:19:39 +00:00
Simon Pilgrim a8bf97569a [CostModel][X86] Move vXi64 MUL costs into existing tables. NFCI.
Removes need for yet another LUT.

llvm-svn: 291158
2017-01-05 19:01:50 +00:00
Andrew Kaylor 7353cf4623 [LICM] Small update to note changes made in hoistRegion
Differential Revision: https://reviews.llvm.org/D28363

llvm-svn: 291157
2017-01-05 18:53:24 +00:00
Simon Pilgrim 430d34fc14 [CostModel][X86] Strip unused 256-bit vector shift costs. NFCI.
Remove SSE2 256-bit entries - AVX targets will have used the SSE42 costs instead.

llvm-svn: 291152
2017-01-05 18:36:48 +00:00
Simon Pilgrim b01e844241 [CostModel][X86] Include the cost of 256-bit upper subvector extract/insertion in AVX1 v4i64 MUL
Matches other MUL/ADD/SUB 256-bit case on AVX1

llvm-svn: 291149
2017-01-05 18:20:25 +00:00
Joerg Sonnenberger d7baada5dd Typo
llvm-svn: 291147
2017-01-05 17:59:22 +00:00
Simon Pilgrim f74700aa8c [CostModel][X86] Merged SK_PermuteSingleSrc/SK_PermuteTwoSrc into common shuffle cost LUTs. NFCI.
llvm-svn: 291146
2017-01-05 17:56:19 +00:00
Matt Arsenault ec63f62c58 Reapply r291025 ("AMDGPU: Remove unneccessary intermediate vector")
Arrays are supposed to be static const

llvm-svn: 291144
2017-01-05 17:36:11 +00:00
Xin Tong 9efb049fb3 Remove a unnecessary hasLoopInvariantOperands check in loop sink.
Summary:
Preheader instruction's operands will always be invariant w.r.t. the loop which its the preheader
for.

Memory aliases are handled in canSinkOrHoistInst.

Reviewers: danielcdh, davidxl

Subscribers: mzolotukhin, llvm-commits

Differential Revision: https://reviews.llvm.org/D28270

llvm-svn: 291132
2017-01-05 16:52:37 +00:00
Sanjay Patel dea5a7bd53 less braces; NFC
llvm-svn: 291126
2017-01-05 16:47:32 +00:00
Simon Pilgrim bca02f9e20 [CostModel][X86] Add support for broadcast shuffle costs
Currently only for broadcasts with input and output of the same width.

Differential Revision: https://reviews.llvm.org/D27811

llvm-svn: 291122
2017-01-05 15:56:08 +00:00
Zvi Rackover 4b7d724d62 [X86] Optimize vector shifts with variable but uniform shift amounts
Summary:
For instructions such as PSLLW/PSLLD/PSLLQ a variable shift amount may be passed in an XMM register.
The lower 64-bits of the register are evaluated to determine the shift amount.
This patch improves the construction of the vector containing the shift amount.

Reviewers: craig.topper, delena, RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D28353

llvm-svn: 291120
2017-01-05 15:11:43 +00:00
Teresa Johnson 2b60384581 [ThinLTO] Add parenthesis as per build warning
Fixes a warning about "||" and "&&" due to r291108.

llvm-svn: 291119
2017-01-05 15:10:10 +00:00
Tony Jiang 3a2f00b024 [PowerPC] Implement missing ISA 2.06 instructions.
Instructions: fctidu[.], fctiwu[.], ftdiv, ftsqrt are not implemented. Implement
them and add corresponding test cases in this patch.

llvm-svn: 291116
2017-01-05 15:00:45 +00:00
Teresa Johnson e27b058de3 [ThinLTO] Use DenseSet instead of SmallPtrSet for holding GUIDs
Should fix some more bot failures from r291108.
This should have been a DenseSet, since GUID is not a pointer type.
It caused some bots to fail, but for some reason I wasnt't getting a
build failure.

llvm-svn: 291115
2017-01-05 14:59:56 +00:00
Simon Pilgrim a62395a4bd [CostModel][X86] Pulled out common type legalization code
llvm-svn: 291109
2017-01-05 14:33:32 +00:00
Teresa Johnson 519465b993 [ThinLTO] Subsume all importing checks into a single flag
Summary:
This adds a new summary flag NotEligibleToImport that subsumes
several existing flags (NoRename, HasInlineAsmMaybeReferencingInternal
and IsNotViableToInline). It also subsumes the checking of references
on the summary that was being done during the thin link by
eligibleForImport() for each candidate. It is much more efficient to
do that checking once during the per-module summary build and record
it in the summary.

Reviewers: mehdi_amini

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D28169

llvm-svn: 291108
2017-01-05 14:32:16 +00:00
Mohammed Agabaria 23599ba794 Currently isLikelyComplexAddressComputation tries to figure out if the given stride seems to be 'complex' and need some extra cost for address computation handling.
This code seems to be target dependent which may not be the same for all targets.
Passed the decision whether the given stride is complex or not to the target by sending stride information via SCEV to getAddressComputationCost instead of 'IsComplex'.

Specifically at X86 targets we dont see any significant address computation cost in case of the strided access in general.

Differential Revision: https://reviews.llvm.org/D27518

llvm-svn: 291106
2017-01-05 14:03:41 +00:00
Kristof Beyls a983e7c4a4 [GlobalISel] Add support for address-taken basic blocks
To make this work, pointers from the MachineBasicBlock to the LLVM-IR-level
basic blocks need to be initialized, as the AsmPrinter uses this link to be
able to print out labels for the basic blocks that are address-taken.

Most of the changes in this commit are about adapting existing tests to include
the basic block name that is now printed out in the MIR format, now that the
name becomes available as the link to the LLVM-IR basic block is initialized.
The relevant test change for the functionality added in this patch are the
added "(address-taken)" strings in
test/CodeGen/AArch64/GlobalISel/arm64-irtranslator.ll.

Differential Revision: https://reviews.llvm.org/D28123

llvm-svn: 291105
2017-01-05 13:27:52 +00:00
Kristof Beyls eced071e88 [GlobalISel] Add support for switch statements
This commit does this using a trivial chain of conditional branches.  In the
future, we probably want to reuse the optimized switch lowering used in
SelectionDAG.

Differential Revision: https://reviews.llvm.org/D28176

llvm-svn: 291099
2017-01-05 11:28:51 +00:00
Kristof Beyls 2252440b81 [GlobalISel] Fix AArch64 ICMP instruction selection
Differential Revision: https://reviews.llvm.org/D28175

llvm-svn: 291097
2017-01-05 10:16:08 +00:00
Mohammed Agabaria 189e2d29ba [Test Commit] fixing some format issue in X86TTI to match clang-format output.
llvm-svn: 291095
2017-01-05 09:51:02 +00:00
Elena Demikhovsky 143cbc425b AVX-512: Optimized pattern for truncate with unsigned saturation.
DAG patterns optimization: truncate + unsigned saturation supported by VPMOVUS* instructions in AVX-512.
Differential revision: https://reviews.llvm.org/D28216

llvm-svn: 291092
2017-01-05 08:21:09 +00:00
Craig Topper 33c544bdb0 [X86] Add Intel Kaby Lake model numbers to getHostCPUName aliased to "skylake" since there are no feature differences.
Model numbers found here http://www.sandpile.org/x86/cpuid.htm

llvm-svn: 291086
2017-01-05 05:57:27 +00:00
Saleem Abdulrasool 6252bd8eac MC: support passing search paths to the IAS
This is needed to support inclusion in inline assembly via the
`.include` directive.

llvm-svn: 291085
2017-01-05 05:56:39 +00:00
Craig Topper 1ab35fa7a8 [X86] Change getHostCPUName to report Intel model 0x4e as "skylake" instead of "skylake-avx512". Add the proper 0x55 model for "skylake-avx512".
Summary:
Intel's i5-6300U CPU is reporting to have a model id of 78 (4e).
The Host detection assumes that to be Skylake Xeon (with AVX512 support),
instead of a normal Skylake machine.

Patch by: Valentin Churavy

Reviewers: nalimilan, craig.topper

Subscribers: hfinkel, tkelman, craig.topper, nalimilan, llvm-commits

Differential Revision: https://reviews.llvm.org/D28221

llvm-svn: 291084
2017-01-05 05:47:29 +00:00
Kostya Serebryany 2648243ebd [libFuzzer] use /tmp (or $TMPDIR, if present) to store temp files during merge
llvm-svn: 291078
2017-01-05 04:32:19 +00:00
Peter Collingbourne b2ce2b6805 IR: Module summary representation for type identifiers; summary test scaffolding for lowertypetests.
Set up basic YAML I/O support for module summaries, plumb the summary into
the pass and add a few command line flags to test YAML I/O support. Bitcode
support to come separately, as will the code in LowerTypeTests that actually
uses the summary. Also add a couple of tests that pass by virtue of the pass
doing nothing with the summary (which happens to be the correct thing to do
for those tests).

Differential Revision: https://reviews.llvm.org/D28041

llvm-svn: 291069
2017-01-05 03:39:00 +00:00
Richard Smith d4d575b955 Revert r291025 ("AMDGPU: Remove unneccessary intermediate vector")
This caused buildbot failures due to returning ArrayRefs referencing local
(temporary) objects.

llvm-svn: 291067
2017-01-05 03:13:10 +00:00
Wolfgang Pieb ce13e716c5 [DWARF] Null out the debug locs of load instructions that have been moved by GVN
performing partial redundancy elimination (PRE). Not doing so can cause jumpy line
tables and confusing (though correct) source attributions.

Differential Revision: https://reviews.llvm.org/D27857

llvm-svn: 291037
2017-01-04 23:58:26 +00:00
Mehdi Amini 19ef4fad91 Use lazy-loading of Metadata in MetadataLoader when importing is enabled (NFC)
Summary:
This is a relatively simple scheme: we use the index emitted in the
bitcode to avoid loading all the global metadata. Instead we load
the index with their position in the bitcode so that we can load each
of them individually. Materializing the global metadata block in this
condition only triggers loading the named metadata, and the ones
referenced from there (transitively). When materializing a function,
metadata from the global block are loaded lazily as they are
referenced.

Two main current limitations are:

1) Global values other than functions are not materialized on demand,
so we need to eagerly load METADATA_GLOBAL_DECL_ATTACHMENT records
(and their transitive dependencies).
2) When we load a single metadata, we don't recurse on the operands,
instead we use a placeholder or a temporary metadata. Unfortunately
tepmorary nodes are very expensive. This is why we don't have it
always enabled and only for importing.

These two limitations can be lifted in a subsequent improvement if
needed.

With this change, the total link time of opt with ThinLTO and Debug
Info enabled is going down from 282s to 224s (~20%).

Reviewers: pcc, tejohnson, dexonsmith

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D28113

llvm-svn: 291027
2017-01-04 22:54:33 +00:00
Mehdi Amini 867aad1359 Change BitstreamCursor::skipRecord to return the record code (NFC)
llvm-svn: 291026
2017-01-04 22:54:14 +00:00
Matt Arsenault 6796d7ea8b AMDGPU: Remove unneccessary intermediate vector
llvm-svn: 291025
2017-01-04 22:54:10 +00:00
Matt Arsenault 3bdd75d01e InstCombine: Fold cos(-x) -> cos(x)
Also cos(fabs(x)) -> cos(x)

llvm-svn: 291022
2017-01-04 22:49:03 +00:00
David Blaikie 7ad9dc11db Reapply "Make BitCodeAbbrev ownership explicit using shared_ptr rather than IntrusiveRefCntPtr""
If this is a problem for anyone (shared_ptr is two pointers in size,
whereas IntrusiveRefCntPtr is 1 - and the ref count control block that
make_shared adds is probably larger than the one int in RefCountedBase)
I'd prefer to address this by adding a lower-overhead version of
shared_ptr (possibly refactoring IntrusiveRefCntPtr into such a thing)
to avoid the intrusiveness - this allows memory ownership to remain
orthogonal to types and at least to me, seems to make code easier to
understand (since no implicit ownership acquisition can happen).

This recommits 291006, reverted in r291007.

llvm-svn: 291016
2017-01-04 22:36:33 +00:00
Tim Shen 5480eb8445 [Legalizer] Fix fp-to-uint to fp-tosint promotion assertion.
Summary:
When promoting fp-to-uint16 to fp-to-sint32, the result is actually zero
extended. For example, given double 65534.0, without legalization:

  fp-to-uint16: 65534.0 -> 0xfffe

With the legalization:

  fp-to-sint32: 65534.0 -> 0x0000fffe

Without this patch, legalization wrongly emits a signed extend assertion,
which is consumed by later icmp instruction, and cause miscompile.

Note that the floating point value must be in [0, 65535), otherwise the
behavior is undefined.

This patch reverts r279223 behavior and adds more tests and
documentations.

In PR29041's context, James Molloy mentioned that:

  We don't need to mask because conversion from float->uint8_t is
  undefined if the integer part of the float value is not representable in
  uint8_t. Therefore we can assume this doesn't happen!

which is totally true and good, because fptoui is documented clearly to
have undefined behavior when overflow/underflow happens. We should take
the advantage of this behavior so that we can save unnecessary mask
instructions.

Reviewers: jmolloy, nadav, echristo, kbarton

Subscribers: mehdi_amini, nemanjai, llvm-commits

Differential Revision: https://reviews.llvm.org/D28284

llvm-svn: 291015
2017-01-04 22:11:42 +00:00
Evgeny Stupachenko c88697dc16 The patch fixes (base, index, offset) match.
Summary:
Instead of matching:
  (a + i) + 1 -> (a + i, undef, 1)
Now it matches:
  (a + i) + 1 -> (a, i, 1)

Reviewers: rengolin

Differential Revision: http://reviews.llvm.org/D26367

From: Evgeny Stupachenko <evstupac@gmail.com>
llvm-svn: 291012
2017-01-04 21:43:39 +00:00
Chad Rosier 63687e40bc [AArch64] Update the feature set for Qualcomm's Falkor CPU.
llvm-svn: 291010
2017-01-04 21:26:23 +00:00
Nirav Dave 0f9d111f97 [AArch64] Fix over-eager early-exit in load-store combiner
Fix early-exit analysis for memory operation pairing when operations are
not emitted in ascending order.

Reviewers: mcrosier, t.p.northover

Subscribers: aemerson, rengolin, llvm-commits

Differential Revision: https://reviews.llvm.org/D28251

llvm-svn: 291008
2017-01-04 21:21:46 +00:00
David Blaikie 6e2207a134 Revert "Make BitCodeAbbrev ownership explicit using shared_ptr rather than IntrusiveRefCntPtr"
Breaks Clang's use of bitcode. Reverting until I have a fix to go with
it there.

This reverts commit r291006.

llvm-svn: 291007
2017-01-04 21:19:28 +00:00
David Blaikie daff78cd87 Make BitCodeAbbrev ownership explicit using shared_ptr rather than IntrusiveRefCntPtr
If this is a problem for anyone (shared_ptr is two pointers in size,
whereas IntrusiveRefCntPtr is 1 - and the ref count control block that
make_shared adds is probably larger than the one int in RefCountedBase)
I'd prefer to address this by adding a lower-overhead version of
shared_ptr (possibly refactoring IntrusiveRefCntPtr into such a thing)
to avoid the intrusiveness - this allows memory ownership to remain
orthogonal to types and at least to me, seems to make code easier to
understand (since no implicit ownership acquisition can happen).

llvm-svn: 291006
2017-01-04 21:13:35 +00:00
Hal Finkel b2f951d87a [PowerPC] Fix logic dealing with nop after calls (and tail-call eligibility)
This change aims to unify and correct our logic for when we need to allow for
the possibility of the linker adding a TOC restoration instruction after a
call. This comes up in two contexts:

 1. When determining tail-call eligibility. If we make a tail call (i.e.
    directly branch to a function) then there is no place for the linker to add
    a TOC restoration.
 2. When determining when we need to add a nop instruction after a call.
    Likewise, if there is a possibility that the linker might need to add a
    TOC restoration after a call, then we need to put a nop after the call
    (the bl instruction).

First problem: We were using similar, but different, logic to decide (1) and
(2). This is just wrong. Both the resideInSameModule function (used when
determining tail-call eligibility) and the isLocalCall function (used when
deciding if the post-call nop is needed) were supposed to be determining the
same underlying fact (i.e. might a TOC restoration be needed after the call).
The same logic should be used in both places.

Second problem: The logic in both places was wrong. We only know that two
functions will share the same TOC when both functions come from the same
section of the same object. Otherwise the linker might cause the functions to
use different TOC base addresses (unless the multi-TOC linker option is
disabled, in which case only shared-library boundaries are relevant). There are
a number of factors that can cause functions to be placed in different sections
or come from different objects (-ffunction-sections, explicitly-specified
section names, COMDAT, weak linkage, etc.). All of these need to be checked.
The existing logic only checked properties of the callee, but the properties of
the caller must also be checked (for example, calling from a function in a
COMDAT section means calling between sections).

There was a conceptual error in the resideInSameModule function in that it
allowed tail calls to functions with weak linkage and protected/hidden
visibility. While protected/hidden visibility does prevent the function
implementation from being replaced at runtime (via interposition), it does not
prevent the linker from using an alternate implementation at link time (i.e.
using some strong definition to replace the provided weak one during linking).
If this happens, then we're still potentially looking at a required TOC
restoration upon return.

Otherwise, in general, the post-call nop is needed wherever ELF interposition
needs to be supported. We don't currently support ELF interposition at the IR
level (see http://lists.llvm.org/pipermail/llvm-dev/2016-November/107625.html
for more information), and I don't think we should try to make it appear to
work in the backend in spite of that fact. Unfortunately, because of the way
that the ABI works, we need to generate code as if we supported interposition
whenever the linker might insert stubs for the purpose of supporting it.

Differential Revision: https://reviews.llvm.org/D27231

llvm-svn: 291003
2017-01-04 21:05:13 +00:00
Daniel Berlin 6cc5e44068 NewGVN: Track the maximum number of iterations GVN takes on any function, so we can pinpoint performance issues.
llvm-svn: 291002
2017-01-04 21:01:02 +00:00
Davide Italiano 6309895770 [lib/LTO] Simplify logic removing set but unused variable. NFCI.
Reported by David Binderman and ack'ed by Teresa on IRC.
PR: 31527

llvm-svn: 291000
2017-01-04 20:37:57 +00:00
Peter Collingbourne efdff71b05 YAML: Remove Input::MapHNode::isValidKey(), use llvm::is_contained() instead. NFC.
llvm-svn: 290999
2017-01-04 20:10:43 +00:00
Eric Christopher 568c113ac0 Remove dead and unused variable NumSentinelElements.
Fixes PR31529.

llvm-svn: 290998
2017-01-04 20:05:18 +00:00
Eric Christopher 0192e97911 Remove dead variable Len.
Fixes PR31528

llvm-svn: 290995
2017-01-04 19:47:10 +00:00
Jan Vesely d48445d513 AMDGPU/SI: Implement sendmsghalt intrinsic
v2: expose using amdgcn prefix

Differential Revision: https://reviews.llvm.org/D23511

llvm-svn: 290977
2017-01-04 18:06:55 +00:00
Robert Lougher 5bf0416f45 Reapply "[SimplifyCFG] In sinkLastInstruction correctly set debugloc of common inst"
This reapplies r289828 (reverted in r289833 as it broke the address sanitizer).  The
debugloc is now only set when the instruction is not a call, as this causes the
verifier to assert (the inliner requires an inlinable callsite to have a debug loc
if the caller and callee have debug info).

Original commit message:

Simplify CFG will try to sink the last instruction in a series of basic blocks,
creating a "common" instruction in the successor block (sinkLastInstruction).
When it does this, the debug location of the single instruction should be the
merged debug locations of the commoned instructions.

Original review: https://reviews.llvm.org/D27590

llvm-svn: 290973
2017-01-04 17:40:32 +00:00
Simon Pilgrim bb895f3e9c [CostModel][X86] Updated vXi8 and vXi16 Reverse/Alternate shuffle costs
Actual codegen is much better than the extract+insert patterns that was assumed.

llvm-svn: 290962
2017-01-04 14:01:33 +00:00
Nemanja Ivanovic c08b90d08f [PowerPC] Add identification for POWER8NVL
This CPU type was not previously recognized by LLVM which led to emitting
poor (and sometimes incorrect) code in some JIT workloads on such a machine.

llvm-svn: 290961
2017-01-04 13:58:09 +00:00
Simon Pilgrim 939b8cd708 [X86] Merged Reverse/Alternate shuffle cost tables. NFCI.
As discussed on D27811, merged the shuffle cost LUTs and use the shuffle kind to perform the lookup instead of the ISD opcode.

llvm-svn: 290956
2017-01-04 12:08:41 +00:00
Florian Hahn 5815f6c53c [framelowering] Skip dbg values when getting next/previous instruction.
Summary:
In mergeSPUpdates, debug values need to be ignored when getting the
previous element, otherwise debug data could have an impact on codegen.

In eliminateCallFramePseudoInstr, debug values after the erased element
could have an impact on codegen and should be skipped.

Closes PR31319 (https://llvm.org/bugs/show_bug.cgi?id=31319)

Reviewers: aprantl, MatzeB, mkuper

Subscribers: gbedwell, llvm-commits

Differential Revision: https://reviews.llvm.org/D27688

llvm-svn: 290955
2017-01-04 12:08:35 +00:00
Bjorn Pettersson 3c6ce733f5 Fix for InlineSpiller accessing not updated dom tree base information.
Summary:
The InlineSpiller was accessing the DominatorTreeBase directly
through the public data member DT in the MachineDominatorTree.
This is not a good idea as the "cached" information in
SplitCriticalEdges is not applied before the access.
The DominatorTreeBase must be accessed through the member
function getBase() in MachineDominatorTree.

The fault was introduced in r266162.

I think the public data member DT in the MachineDominatorTree
should have been made private in the original code (r215576)
that introduced the concept of lazily updating the
MachineDominatorTree information from
MachineBasicBlock::SplitCriticalEdge().

Patch by Karl-Johan Karlsson <karl-johan.karlsson@ericsson.com>

Reviewers: wmi, qcolombet

Subscribers: llvm-commits, bjope, uabelho

Differential Revision: https://reviews.llvm.org/D27983

llvm-svn: 290950
2017-01-04 09:41:56 +00:00
Nitesh Jain b0bc573ca8 [LLC][MIPS] Fix crash after enabling LLVM_ENABLE_EXPENSIVE_CHECKS
Reviewers: sdardis, vkalintiris

Subscribers: jaydeep, slthakur, RKSimon, llvm-commits
Differential Revision: https://reviews.llvm.org/D27841

llvm-svn: 290949
2017-01-04 09:34:37 +00:00
Ayman Musa 02f9533823 [X86][AVX512] Passing the appropriate memory operand class to INT_{U}COMIS{S|D} instructions
Replacing the memory operand in the intrinsic versions of the comis/ucomis instrucions from f128mem to ssmem/sdmem accordingly.

Differential Revision: https://reviews.llvm.org/D28138

llvm-svn: 290948
2017-01-04 08:21:54 +00:00
Simon Pilgrim c76ea4b638 [X86] Attempt to pre-truncate arithmetic operations if useful
In some cases its more efficient to combine TRUNC( BINOP( X, Y ) ) --> BINOP( TRUNC( X ), TRUNC( Y ) ) if the binop is legal for the truncated types.

This is true for vector integer multiplication (especially vXi64), as well as ADD/AND/XOR/OR in cases where we only need to truncate one of the inputs at runtime (e.g. a duplicated input or an one use constant we can fold).

Further work could be done here - scalar cases (especially i64) could often benefit (if we avoid partial registers etc.), other opcodes, and better analysis of when truncating the inputs reduces costs.

I have considered implementing this for all targets within the DAGCombiner but wasn't sure we could devise a suitable cost model system that would give us the range we need.

Differential Revision: https://reviews.llvm.org/D28219

llvm-svn: 290947
2017-01-04 08:05:42 +00:00
Craig Topper d0aa53b9ae [AVX-512] Add support for detecting 512-bit shuffles that contain a 128-bit subvector insertion from the lowest subvector of one of the sources.
These are best handled with a vinsert32x4 or vinsert64x2 instruction.

llvm-svn: 290946
2017-01-04 07:32:03 +00:00
Craig Topper 83115a809f [AVX-512] Simplify code for creating 512-bit SHUF128 operations.
We don't need two loops and we can safely assume assume and hardcode the size of the widened mask.

llvm-svn: 290942
2017-01-04 07:31:51 +00:00
Peter Collingbourne 87dd2ab000 Support: Add YAML I/O support for custom mappings.
This will be used to YAMLify parts of the module summary.

Differential Revision: https://reviews.llvm.org/D28014

llvm-svn: 290935
2017-01-04 03:51:36 +00:00
David Majnemer cb892e9066 [InstCombine] Move casts around shift operations
It is possible to perform a left shift before zero extending if the
shift would only shift out zeros.

llvm-svn: 290928
2017-01-04 02:21:34 +00:00
David Majnemer 022d2a563b [InstCombine] Combine adds across a zext
We can perform the following:
(add (zext (add nuw X, C1)), C2) -> (zext (add nuw X, C1+C2))

This is only possible if C2 is negative and C2 is greater than or equal to negative C1.

llvm-svn: 290927
2017-01-04 02:21:31 +00:00
Eugene Zelenko b2ca1b3f37 [Hexagon, TableGen] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 290925
2017-01-04 02:02:05 +00:00
Teresa Johnson 5a8dba5bda [ThinLTO] Import type as decl only when non-null Identifier
As per post-commit review for r289993 (D27775), we can only safely
import a type as a decl if it has an Identifier, as the Name alone
is not enough to be unique across modules.

llvm-svn: 290915
2017-01-03 23:19:29 +00:00
Matt Arsenault 56ff4839ae InstCombine: Fold fabs on select of constants
llvm-svn: 290913
2017-01-03 22:40:34 +00:00
Sanjay Patel f0d1e77373 [InstCombine] use 'match' to reduce code bloat; NFCI
I wrote this patch before seeing the comment in:
https://reviews.llvm.org/D27114
...that suggests we should actually be canonicalizing the other way.

So just in case we decide this is the right way, we might as well
have a cleaner implementation.

llvm-svn: 290912
2017-01-03 22:25:31 +00:00
Ahmed Bougacha 8a41319d8d [CodeGen] Further simplify returned call operand logic. NFC.
As Pete points out in r290905, CallSite lets us avoid duplicating this!

llvm-svn: 290909
2017-01-03 21:42:43 +00:00
Lang Hames b198e5585e [ExecutionEngine] Fix compile errors in OProfileJITEventListener.
Allows LLVM to build with LLVM_USE_OPROFILE=True.

Patch by Mark Dewing. Thanks Mark!

llvm-svn: 290908
2017-01-03 21:39:43 +00:00
Ahmed Bougacha 6aff744e7c [CodeGen] Simplify logic that looks for returned call operands. NFC-ish.
Use getReturnedArgOperand() instead of rolling our own.  Note that it's
equivalent because there can only be one 'returned' operand.

The existing code was also incorrect: there already was awkward logic to
ignore callee/EH blocks, but operands can now also be operand bundles,
in which case we'll look for non-existent parameter attributes.

Unfortunately, this isn't observable in-tree, as it only crashes when
exercising the regular call lowering logic with operand bundles.
Still, this is a nice small cleanup anyway.

llvm-svn: 290905
2017-01-03 20:33:22 +00:00
Kostya Serebryany 4986e819dc [libFuzzer] disable -print_pcs by default (was enabled by mistake)
llvm-svn: 290899
2017-01-03 18:51:28 +00:00
Michal Gorny 21c12044d2 [ADT] APFloatBase: Prevent collapsing semPPCDoubleDouble and semBogus
Provide a distinct contents for semBogus and semPPCDoubleDouble in order
to prevent compilers from collapsing them to a single memory address,
while we heavily rely on every semantic having distinct address.

This happens if insecure optimization collapsing identical values is
enabled. As a result, APFloats of semBogus are indistinguishable from
semPPCDoubleDouble -- and whenever the move constructor is used, the old
value beings being incorrectly recognized as a semPPCDoubleDouble.

Since the values in semPPCDoubleDouble are not used anywhere,
we can easily solve this issue via altering the value of one of the
fields and therefore ensuring that the collapse can not occur.

Differential Revision: https://reviews.llvm.org/D28112

llvm-svn: 290896
2017-01-03 16:33:50 +00:00
Craig Topper 48d232d3e7 [X86] Move 128-bit shuffle mask widening check into lowerV2X128VectorShuffle to reduce code duplication. Use the now available widened mask to simplify some code inside lowerV2X128VectorShuffle.
llvm-svn: 290872
2017-01-03 07:36:41 +00:00
Craig Topper 785e58fdc9 [AVX-512] Simplify the code added in r290870 to recognized 256-bit subvector inserts and avoid calling isShuffleEquivalent on a widened mask.
llvm-svn: 290871
2017-01-03 07:36:39 +00:00
Craig Topper 9496e3f916 [AVX-512] Teach shuffle lowering to use vinsert instructions for shuffles corresponding to 256-bit subvector inserts.
llvm-svn: 290870
2017-01-03 07:00:40 +00:00
Craig Topper fa875a1d3d [AVX-512] Teach EVEX to VEX conversion pass to handle VINSERT and VEXTRACT instructions.
llvm-svn: 290869
2017-01-03 05:46:18 +00:00
Craig Topper be9ef55152 [X86] Remove trailing whitespace and an unnecessary line wrap. NFC
llvm-svn: 290867
2017-01-03 05:46:06 +00:00
Craig Topper 06bae884bd [X86] Fix header comment. NFC
llvm-svn: 290866
2017-01-03 05:46:05 +00:00
Craig Topper c849172105 [AVX-512] Add support for pushing bitcasts through INSERT_SUBVEC in order to select a masked operation.
llvm-svn: 290865
2017-01-03 05:46:02 +00:00
Craig Topper 0cda8bbf74 [AVX-512] Remove vinsert intrinsics and autoupgrade to native shufflevectors. There are some codegen problems here that I'll try to fix in future commits.
llvm-svn: 290864
2017-01-03 05:45:57 +00:00
Craig Topper 4d47c6ae57 [AVX-512] Remove vextract intrinsics and autoupgrade to native shufflevectors. This unfortunately generates some really terrible code without VLX support due to v2i1 and v4i1 not being legal.
Hopefully we can improve that in future patches.

llvm-svn: 290863
2017-01-03 05:45:46 +00:00
Matt Arsenault b264c94963 InstCombine: Add fma with constant transforms
DAGCombine already does these.

llvm-svn: 290860
2017-01-03 04:32:35 +00:00
Matt Arsenault 1cc294c85d InstCombine: Add fma + fabs/fneg transforms
fma (fneg x), (fneg y), z -> fma x, y, z
fma (fabs x), (fabs x), z -> fma x, x, z

llvm-svn: 290859
2017-01-03 04:32:31 +00:00
Dean Michael Berris f7e7b938ea [XRay] Merge instrumentation point table emission code into AsmPrinter.
Summary:
No need to have this per-architecture.  While there, unify 32-bit ARM's
behaviour with what changed elsewhere and start function names lowercase
as per the coding standards.  Individual entry emission code goes to the
entry's own class.

Fully tested on amd64, cross-builds on both ARMs and PowerPC.

Reviewers: dberris

Subscribers: aemerson, llvm-commits

Differential Revision: https://reviews.llvm.org/D28209

llvm-svn: 290858
2017-01-03 04:30:21 +00:00
Sanjay Patel 1c9867d009 [EarlyCSE] less else, more auto; NFC
llvm-svn: 290848
2017-01-03 00:16:24 +00:00
Sanjay Patel b38ad88e9f [InstCombine] use combineMetadataForCSE instead of copying it; NFCI
llvm-svn: 290844
2017-01-02 23:25:28 +00:00
Xin Tong 2940231ff0 Make sure total loop body weight is preserved in loop peeling
Summary:
Regardless how the loop body weight is distributed, we should preserve
total loop body weight. i.e. we should have same weight reaching the body of the loop
or its duplicates in peeled and unpeeled case.

Reviewers: mkuper, davidxl, anemet

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D28179

llvm-svn: 290833
2017-01-02 20:27:23 +00:00
Daniel Berlin de43ef9601 NewGVN: Clean up after removing possibility of null expressions.
llvm-svn: 290828
2017-01-02 19:49:17 +00:00
Sanjay Patel 65d533ca42 fix typo; NFC
llvm-svn: 290827
2017-01-02 19:05:11 +00:00
Sanjay Patel 4382997a13 [ValueTracking] remove stale comments; NFC
The checks were improved with:
https://reviews.llvm.org/rL290194

llvm-svn: 290826
2017-01-02 19:04:07 +00:00
Davide Italiano 67ada75d84 [NewGVN] Fold single-use variable inside the assertion.
It placates some bots which complain because they compile the
assertion out and think the variable is unused.

llvm-svn: 290825
2017-01-02 19:03:16 +00:00
Davide Italiano 841261624d [NewGVN] Restore old code to placate buildbots.
Apparently my suggestion of using ternary doesn't really work
as clang complains about incompatible types on LHS and RHS. Some
GCC versions happen to accept the code but clang behaviour is
correct here.

llvm-svn: 290822
2017-01-02 18:41:34 +00:00
Daniel Berlin 25f05b0ab7 NewGVN: Fix some formatting and comment issues
llvm-svn: 290820
2017-01-02 18:22:38 +00:00
Michal Gorny 89b6f16b3e [cmake] Add LLVM_ENABLE_DIA_SDK option, and expose it in LLVMConfig
Add an explicit LLVM_ENABLE_DIA_SDK option to control building support
for DIA SDK-based debugging. Control its value to match whether DIA SDK
support was found and expose it in LLVMConfig (alike LLVM_ENABLE_ZLIB).

Its value is needed for LLDB to determine whether to run tests requiring
DIA support. Currently it is obtained from llvm/Config/config.h;
however, this file is not available for standalone builds. Following
this change, LLDB will be modified to use the value from LLVMConfig.

Differential Revision: https://reviews.llvm.org/D26255

llvm-svn: 290818
2017-01-02 18:19:35 +00:00
Joerg Sonnenberger 7b83732a40 Emit .cfi_sections before the first .cfi_startproc
GNU as rejects input where .cfi_sections is used after .cfi_startproc,
if the new section differs from the old. Adjust our output to always
emit .cfi_sections before the first .cfi_startproc to minimize necessary
code.

Differential Revision: https://reviews.llvm.org/D28011

llvm-svn: 290817
2017-01-02 18:05:27 +00:00
Daniel Berlin 02c6b176e7 NewGVN: Add UnknownExpression and create them for things we can't symbolize. Kill fragile machinery for handling null expressions.
Summary:
This avoids the very fragile code for null expressions. We could also use a denseset that tracks which things have null expressions instead, but that seems pretty fragile and premature optimization.

This resolves a number of infinite loop cases, test reductions coming.

Reviewers: davide

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D28193

llvm-svn: 290816
2017-01-02 18:00:53 +00:00
Daniel Berlin 589cecc6e9 NewGVN: Fix PR31480, PR31483, PR31499, by rewriting how memory congruence handling works.
Summary: Previously, we tried to fix up the equivalences during symbolic evaluation.  This does not work. Now, we change the equivalences during congruence finding, where it belongs.  We also initialize the equivalence table to give a maximal answer.

Reviewers: davide

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D28192

llvm-svn: 290815
2017-01-02 18:00:46 +00:00
Davide Italiano b672537cbf [PMBuilder] Remove RunFloat2Int cl::opt.
The pass has been on by default for a long time without problems.

llvm-svn: 290814
2017-01-02 17:49:18 +00:00
Elena Demikhovsky d96200d60a Fixed shuffle-reverse cost on AVX-512.
(This changed was approved in https://reviews.llvm.org/D28118, but Simon asked to submit it separately).

llvm-svn: 290812
2017-01-02 11:44:10 +00:00
Elena Demikhovsky 21706cbd24 AVX-512 Loop Vectorizer: Cost calculation for interleave load/store patterns.
X86 target does not provide any target specific cost calculation for interleave patterns.It uses the common target-independent calculation, which gives very high numbers. As a result, the scalar version is chosen in many cases. The situation on AVX-512 is even worse, since we have 3-src shuffles that significantly reduce the cost.

In this patch I calculate the cost on AVX-512. It will allow to compare interleave pattern with gather/scatter and choose a better solution (PR31426).

* Shiffle-broadcast cost will be changed in Simon's upcoming patch.

Differential Revision: https://reviews.llvm.org/D28118

llvm-svn: 290810
2017-01-02 10:37:52 +00:00
Keno Fischer f7d84ee6ff Reapply "[CodeGen] Fix invalid DWARF info on Win64"
This reapplies rL289013 (reverted in rL289014) with the fixes identified
in D21731. Should hopefully pass the buildbots this time.

llvm-svn: 290809
2017-01-02 03:00:19 +00:00
Florian Hahn f872d230ad [selectiondag] Check PromotedFloats map during expansive checks.
Summary:
`PromotedFloats` needs to be checked in 
`DAGTypeLegalizer::PerformExpensiveChecks`. This patch fixes a few type
legalization failures with expansive checks for ARM fp16 tests.

Reviewers: baldrick, bogner, arsenm

Subscribers: arsenm, aemerson, llvm-commits

Differential Revision: https://reviews.llvm.org/D28187

llvm-svn: 290796
2017-01-01 13:58:27 +00:00
Sanjoy Das 3bb2dbd665 Fix an issue with isGuaranteedToTransferExecutionToSuccessor
I'm not sure if this was intentional, but today
isGuaranteedToTransferExecutionToSuccessor returns true for readonly and
argmemonly calls that may throw.  This commit changes the function to
not implicitly infer nounwind this way.

Even if we eventually specify readonly calls as not throwing,
isGuaranteedToTransferExecutionToSuccessor is not the best place to
infer that.  We should instead teach FunctionAttrs or some other such
pass to tag readonly functions / calls as nounwind instead.

llvm-svn: 290794
2016-12-31 22:12:34 +00:00
Sanjoy Das 0945530d4d Avoid const_cast; NFC
llvm-svn: 290793
2016-12-31 22:12:31 +00:00
Sanjay Patel aea60846c4 [Inliner] remove unnecessary null checks from AddAlignmentAssumptions(); NFCI
We bail out on the 1st line if the assumption cache is not set, so there's
no need to check it after that.

llvm-svn: 290787
2016-12-31 17:54:05 +00:00
Sanjay Patel 7fd779f09f [ValueTracking] make dominator tree requirement explicit for isKnownNonNullFromDominatingCondition(); NFCI
I don't think this hole is currently exposed, but I crashed regression tests for
jump-threading and loop-vectorize after I added calls to isKnownNonNullAt() in
InstSimplify as part of trying to solve PR28430:
https://llvm.org/bugs/show_bug.cgi?id=28430

That's because they call into value tracking with a context instruction, but no
other parts of the query structure filled in.

For more background, see the discussion in:
https://reviews.llvm.org/D27855

llvm-svn: 290786
2016-12-31 17:37:01 +00:00
Philip Reames 0ef5d288b4 [SmallPtrSet] Introduce a find primitive and rewrite count/erase in terms of it
This was originally motivated by a compile time problem I've since figured out how to solve differently, but the cleanup seemed useful. We had the same logic - which essentially implemented find - in several places. By commoning them out, I can implement find and allow erase to be inlined at the call sites if profitable.

Differential Revision: https://reviews.llvm.org/D28183

llvm-svn: 290779
2016-12-31 02:33:22 +00:00
Dylan McKay 97cf837b46 [AVR] Optimize 16-bit ANDs with '1'
Summary: Fixes PR 31345

Reviewers: dylanmckay

Subscribers: fhahn, llvm-commits

Differential Revision: https://reviews.llvm.org/D28186

llvm-svn: 290778
2016-12-31 01:07:14 +00:00
Craig Topper d00db69227 [InstCombine][AVX-512] Teach InstCombine that llvm.x86.avx512.vcomi.sd and llvm.x86.avx512.vcomi.ss don't use the upper elements of their input.
This was already done for the SSE/SSE2 version of the intrinsics.

llvm-svn: 290776
2016-12-31 00:45:06 +00:00
Craig Topper 991636312b [InstCombine][AVX-512] When turning intrinsics with masking into native IR, don't emit a select if the mask is known to be all ones.
This saves InstCombine the burden of having to optimize the select later.

llvm-svn: 290774
2016-12-30 23:06:28 +00:00
Philip Reames fac031a178 Add a comment for a todo in LoopUnroll post cleanup
llvm-svn: 290769
2016-12-30 22:10:19 +00:00
Philip Reames fdbb05b469 [LVI] Remove count/erase idiom in favor of checking result value of erase
Minor compile time win.  Avoids an additional O(N) scan in the case where we are removing an element and costs nothing when we aren't.

llvm-svn: 290768
2016-12-30 22:09:10 +00:00
Piotr Padlewski da36215017 [MemDep] Handle gep with zeros for invariant.group
Summary:
gep 0, 0 is equivalent to bitcast. LLVM canonicalizes it
to getelementptr because it make SROA can then handle it.

Simple case like

    void g(A &a) {
        z(a);
        if (glob)
            a.foo();
    }
    void testG() {
        A a;
        g(a);
    }

was not devirtualized with -fstrict-vtable-pointers because luck of
handling for gep 0 in Memory Dependence Analysis

Reviewers: dberlin, nlewycky, chandlerc

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D28126

llvm-svn: 290763
2016-12-30 18:45:07 +00:00
Philip Reames a570a2303c [CVP] Adjust iteration order to reduce the amount of work required
CVP doesn't care about the order of blocks visited, but by using a pre-order traversal over the graph we can a) not visit unreachable blocks and b) optimize as we go so that analysis of later blocks produce slightly more precise results.

I noticed this via inspection and don't have a concrete example which points to the issue.  

llvm-svn: 290760
2016-12-30 18:00:55 +00:00
Philip Reames 1e48efcfc5 [LVI] Manually hoist computation from loop
Minor compile time win.  Not known to be a hot spot, just something I noticed while reading.

llvm-svn: 290759
2016-12-30 17:56:47 +00:00
Aaron Ballman 58a61e723e Caught a simple typo. I do not know of a way to test this, but it seems like an unlikely thing to regress in the future.
llvm-svn: 290757
2016-12-30 15:57:56 +00:00
Davide Italiano 75e39f9790 [NewGVN] Remove unneeded newline from assertion message.
llvm-svn: 290755
2016-12-30 15:01:17 +00:00
David Majnemer 5ec5f278c9 [InstCombine] Address post-commit feedback
llvm-svn: 290741
2016-12-30 03:36:17 +00:00
Kostya Serebryany 11a22bc39d [libFuzzer] cleaner implementation of -print_pcs=1
llvm-svn: 290739
2016-12-30 01:13:07 +00:00
Michael Kuperstein 76e06c8858 [LICM] When promoting scalars, allow inserting stores to thread-local allocas.
This is similar to the allocfn case - if an alloca is not captured, then it's
necessarily thread-local.

Differential Revision: https://reviews.llvm.org/D28170

llvm-svn: 290738
2016-12-30 01:03:17 +00:00
Dehao Chen cc76344ef5 Use continuous boosting factor for complete unroll.
Summary:
The current loop complete unroll algorithm checks if unrolling complete will reduce the runtime by a certain percentage. If yes, it will apply a fixed boosting factor to the threshold (by discounting cost). The problem for this approach is that the threshold abruptly. This patch makes the boosting factor a function of runtime reduction percentage, capped by a fixed threshold. In this way, the threshold changes continuously.

The patch also simplified the code by reducing one parameter in UP.

The patch only affects code-gen of two speccpu2006 benchmark:

445.gobmk binary size decreases 0.08%, no performance change.
464.h264ref binary size increases 0.24%, no performance change.

Reviewers: mzolotukhin, chandlerc

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D26989

llvm-svn: 290737
2016-12-30 00:50:28 +00:00
Michael Kuperstein 4a86a1921a [LICM] Remove unneeded tracking of whether changes were made. NFC.
"Changed" doesn't actually change within the loop, so there's
no reason to keep track of it - we always return false during
analysis and true after the transformation is made.

llvm-svn: 290735
2016-12-30 00:43:22 +00:00
Michael Kuperstein 62b98c3977 [LICM] Make logic in promoteLoopAccessesToScalars easier to follow. NFC.
llvm-svn: 290734
2016-12-30 00:39:00 +00:00
David Majnemer a1cfd7c5f8 [InstCombine] More thoroughly canonicalize the position of zexts
We correctly canonicalized (add (sext x), (sext y)) to (sext (add x, y))
where possible.  However, we didn't perform the same canonicalization
for zexts or for muls.

llvm-svn: 290733
2016-12-30 00:28:58 +00:00
Dylan McKay 453d042969 [AVR] Optimize 16-bit ORs with '0'
Summary: Fixes PR 31344

Authored by Anmol P. Paralkar

Reviewers: dylanmckay

Subscribers: fhahn, llvm-commits

Differential Revision: https://reviews.llvm.org/D28121

llvm-svn: 290732
2016-12-30 00:21:56 +00:00
Reid Kleckner 0e7c84c682 Simplify FunctionLoweringInfo.cpp with range for loops
I'm preparing to add some pattern matching code here, so simplify the
code before I do. NFC

llvm-svn: 290731
2016-12-30 00:21:38 +00:00
Reid Kleckner e8ee89f8b0 Include <algorithm> for std::max etc
llvm-svn: 290730
2016-12-30 00:15:40 +00:00
Michael Kuperstein ff36baefe7 [LICM] Compute exit blocks for promotion eagerly. NFC.
This moves the exit block and insertion point computation to be eager,
instead of after seeing the first scalar we can promote.

The cost is relatively small (the computation happens anyway, see discussion
on D28147), and the code is easier to follow, and can bail out earlier
if there's a catchswitch present.

llvm-svn: 290729
2016-12-29 23:11:19 +00:00
Michael Kuperstein 5566092963 [LICM] Don't try to promote in loops where we have no chance to promote. NFC.
We would check whether we have a prehader *or* dedicated exit blocks,
and go into the promotion loop. Then, for each alias set we'd check
if we have a preheader *and* dedicated exit blocks, and bail if not.

Instead, bail immediately if we don't have both.

llvm-svn: 290728
2016-12-29 22:51:22 +00:00
Michael Kuperstein b6da9cf3b7 [LICM] Only recompute LCSSA when we actually promoted something.
We want to recompute LCSSA only when we actually promoted a value.
This means we only need to look at changes made by promotion when
deciding whether to recompute it or not, not at regular sinking/hoisting.

(This was what the code was documented as doing, just not what it did)

Hopefully NFC.

llvm-svn: 290726
2016-12-29 22:37:13 +00:00
Daniel Berlin e0bd37e78f NewGVN: Fix PR 31491 by ensuring that we touch the right instructions. Change to one based numbering so we can assert we don't cause the same bug again.
llvm-svn: 290724
2016-12-29 22:15:12 +00:00