Commit Graph

55610 Commits

Andrew V. Tischenko 1fe3375620 [X86] MCA tests for XCHG*, XADD* and CMPXCHG* instructions
Differential Revision: https://reviews.llvm.org/D49912

llvm-svn: 339145
2018-08-07 14:36:43 +00:00
Sanjay Patel 948ff87d7d [InstSimplify] move minnum/maxnum with common op fold from instcombine
llvm-svn: 339144
2018-08-07 14:36:27 +00:00
Sanjay Patel b06d283909 [InstSimplify] add tests for minnum/maxnum with shared op; NFC
llvm-svn: 339142
2018-08-07 14:13:40 +00:00
Sanjay Patel b802d18df7 [InstSimplify] move misplaced minnum/maxnum tests; NFC
llvm-svn: 339141
2018-08-07 14:12:08 +00:00
Jonas Devlieghere 42243df3b9 Fix inconsistency with/without debug information (-g)
This fixes an inconsistency in code generation when compiling with or
without debug information (-g). When debug information is available in
an empty block, the original test would fail, resulting in possibly
different code.

Patch by: Jeroen Dobbelaere

Differential revision: https://reviews.llvm.org/D49467

llvm-svn: 339129
2018-08-07 12:14:01 +00:00
Aleksandar Beserminji 949a17c016 [mips] Handle branch expansion corner cases
When a potential jump instruction and its target are in the same segment, use
a jump instruction with an immediate field.

In cases where the offset does not fit in the immediate field of a bc/j
instruction, the offset is stored into a register, and then a jump register
instruction is used.

Differential Revision: https://reviews.llvm.org/D48019

llvm-svn: 339126
2018-08-07 10:45:45 +00:00
Pavel Labath 2f0881160c [DebugInfo] Reduce debug_str_offsets section size
Summary:
The accelerator tables use the debug_str section to store their strings.
However, they do not support the indirect method of access that is
available for the debug_info section (DW_FORM_strx et al.).

Currently our code is assuming that all strings can/will be referenced
indirectly, and puts all of them into the debug_str_offsets section.
This is generally true for regular (unsplit) dwarf, but in the DWO case,
most of the strings in the debug_str section will only be used from the
accelerator tables. Therefore the contents of the debug_str_offsets
section will be largely unused and bloating the main executable.

This patch rectifies this by teaching the DwarfStringPool to
differentiate between strings accessed directly and indirectly. When a
user inserts a string into the pool, it has to declare whether that
string will be referenced directly or not. If at least one user requests
indirect access, that string will be assigned an index ID and put into
the debug_str_offsets table. Otherwise, the offset table is skipped.
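
A minimal sketch of that idea, using hypothetical names rather than the actual DwarfStringPool interface:
```
#include <map>
#include <string>

struct PoolEntry {
  unsigned Offset = 0; // position of the string in .debug_str
  int Index = -1;      // slot in .debug_str_offsets, -1 = no slot assigned
};

struct StringPoolSketch {
  std::map<std::string, PoolEntry> Entries;
  unsigned NextOffset = 0;
  int NextIndex = 0;

  // Direct users only need the raw .debug_str offset; indexed users force
  // allocation of a .debug_str_offsets slot for the string.
  PoolEntry &get(const std::string &S, bool NeedsIndex) {
    auto [It, Inserted] = Entries.try_emplace(S);
    if (Inserted) {
      It->second.Offset = NextOffset;
      NextOffset += S.size() + 1; // strings are NUL-terminated in .debug_str
    }
    if (NeedsIndex && It->second.Index < 0)
      It->second.Index = NextIndex++; // only indexed strings grow the table
    return It->second;
  }
};
```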

This approach reduces the overall binary size (when compiled with
-gdwarf-5 -gsplit-dwarf) in my tests by about 2% (debug_str_offsets is
shrunk by 99%).

Reviewers: probinson, dblaikie, JDevlieghere

Subscribers: aprantl, mgrang, llvm-commits

Differential Revision: https://reviews.llvm.org/D49493

llvm-svn: 339122
2018-08-07 09:54:52 +00:00
Simon Pilgrim 7e18938793 [TargetLowering] Add support for non-uniform vectors to BuildUDIV
This patch refactors the existing TargetLowering::BuildUDIV base implementation to support non-uniform constant vector denominators.

It also includes a fold for MULHU by pow2 constants to SRL which can now more readily occur from BuildUDIV.
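
A quick check of why that fold is valid (my own arithmetic, not code from the patch): for an N-bit unsigned value, the high N bits of x * 2^k equal x >> (N - k).
```
#include <cstdint>

// For k in [1, 31]: mulhu(x, 1u << k) == x >> (32 - k).
uint32_t mulhu_by_pow2(uint32_t x, unsigned k) {
  uint32_t via_mulhu = (uint32_t)(((uint64_t)x * ((uint64_t)1 << k)) >> 32);
  uint32_t via_srl = x >> (32 - k);
  return via_mulhu == via_srl ? via_srl : 0; // the two are always equal
}
```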

Differential Revision: https://reviews.llvm.org/D49248

llvm-svn: 339121
2018-08-07 09:51:34 +00:00
Simon Pilgrim 974a5a7d94 [X86][SSE] Add more non-uniform exact sdiv vector tests covering all/none ashr paths
llvm-svn: 339120
2018-08-07 09:31:22 +00:00
George Rimar 65a6828b17 [yaml2obj] - Add support for changing EntSize.
I was trying to add a test case for LLD and found that it
is impossible to set sh_entsize via yaml.
The patch implements the missing part.

Differential revision: https://reviews.llvm.org/D50235

llvm-svn: 339113
2018-08-07 08:11:38 +00:00
Sjoerd Meijer a2ddddfd3e [ARM][NFC] Replaced tab characters in test file vfcmp.ll.
llvm-svn: 339111
2018-08-07 08:05:15 +00:00
Heejin Ahn e8653bb89a [WebAssembly] Enable atomic expansion for unsupported atomicrmws
Summary:
Wasm does not have direct counterparts to some of LLVM IR's atomicrmw
instructions (min, max, umin, umax, and nand). This enables atomic
expansion using a cmpxchg instruction within a loop for those atomicrmw
instructions.
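
A rough C++ analogue of what such a cmpxchg-loop expansion computes (illustrative only, not the generated wasm):
```
#include <atomic>

// atomicrmw max expressed as a compare-exchange retry loop.
int atomic_max(std::atomic<int> &addr, int val) {
  int old = addr.load();
  // Retry until no other thread modified the value between the load and
  // the compare-exchange; the desired value is recomputed each iteration.
  while (!addr.compare_exchange_weak(old, old > val ? old : val)) {
  }
  return old; // atomicrmw yields the previous value
}
```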

Reviewers: dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D49440

llvm-svn: 339084
2018-08-07 00:22:22 +00:00
Matt Arsenault 08f3fe4fae AMDGPU: cvt_pk_rtz_f16 canonicalizes
llvm-svn: 339078
2018-08-06 23:01:31 +00:00
Matt Arsenault e94ee833f9 AMDGPU: Handle some vector operations in isCanonicalized
llvm-svn: 339077
2018-08-06 22:45:51 +00:00
Stella Stamenova cc2404c01d [lit, python] Always add quotes around the python path in lit
Summary:
The issue with the python path is that the path to python on Windows can contain spaces. To make the tests always work, the path to python needs to be surrounded by quotes.

This change updates several configuration files which specify the path to python as a substitution and also remove quotes from existing tests.

Reviewers: asmith, zturner, alexshap, jakehehrlich

Reviewed By: zturner, alexshap, jakehehrlich

Subscribers: mehdi_amini, nemanjai, eraman, kbarton, jakehehrlich, steven_wu, dexonsmith, stella.stamenova, delcypher, llvm-commits

Differential Revision: https://reviews.llvm.org/D50206

llvm-svn: 339073
2018-08-06 22:37:44 +00:00
Matt Arsenault a29e76244a AMDGPU: Push fcanonicalize through partially constant build_vector
This usually avoids some re-packing code, and may
help find canonical sources.

llvm-svn: 339072
2018-08-06 22:30:44 +00:00
Peter Collingbourne 69dd7cd45e MC: Redirect .addrsig directives referring to private (.L) symbols to the section symbol.
This matches our behaviour for regular (i.e. relocated) references to
private symbols and therefore avoids needing to unnecessarily write
address-significant .L symbols to the object file's symbol table,
which can interfere with stack traces.

Fixes check-cfi after r339050.

llvm-svn: 339066
2018-08-06 21:59:58 +00:00
Matt Arsenault d49ab0b214 AMDGPU: Treat more custom operations as canonicalizing
Everything should quiet, and I think everything should
flush.

I assume the min3/med3/max3 follow the same rules
as regular min/max for flushing, which should at
least be conservatively correct.

There are still more operations that need to
be handled.

llvm-svn: 339065
2018-08-06 21:58:11 +00:00
Matt Arsenault ce6d61fba8 AMDGPU: Conversions always produce canonical results
Not sure why this was checking for denormals for f16.
My interpretation of the IEEE standard is conversions
should produce a canonical result, and the ISA manual
says denormals are created when appropriate.

llvm-svn: 339064
2018-08-06 21:51:52 +00:00
Philip Reames 94b29601ef [LICM] Further strengthen tests for hoisting guards and invariant.starts [NFC]
llvm-svn: 339062
2018-08-06 21:39:43 +00:00
Matt Arsenault f8768bfc84 AMDGPU: Fix implementation of isCanonicalized
If denormals are enabled, denormals are canonical.
Also fix a few other issues. minnum/maxnum are supposed
to canonicalize. Temporarily improve workaround for the
instruction behavior change in gfx9.

Handle selects and fcopysign.

The tests were also largely broken, since they were
checking for a flush used on some targets after the
store of the result.

llvm-svn: 339061
2018-08-06 21:38:27 +00:00
Philip Reames 9d7bb2f700 [LICM] Strengthen invariant.start hoisting tests [NFC]
llvm-svn: 339057
2018-08-06 21:18:34 +00:00
Reid Kleckner 15e91c3235 [X86] Fix assertion in subreg extraction
This assert fires when attempting to extract a subregister from the
global PIC base register. This virtual register SD node is not in the
VRBaseMap, so we shouldn't call getVR to look it up there. If this is a
RegisterSDNode, we should be able to use the virtual register directly.

Fixes PR38385

llvm-svn: 339056
2018-08-06 21:16:16 +00:00
Philip Reames 81c7dc93d2 [LICM] Add tests highlighting missing hoists for intrinsics [NFC]
llvm-svn: 339054
2018-08-06 21:06:15 +00:00
Evandro Menezes 6e137cb9f0 [SLC] Fix shrinking of pow()
Properly shrink `pow()` to `powf()` as a binary function and, when no other
simplification applies, do not discard it.
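
My own minimal illustration of the shrinking in question (not from the patch's tests, and only performed under the conditions the simplifier requires):
```
#include <math.h>

float before(float x, float y) {
  // pow() on float inputs promoted to double...
  return (float)pow((double)x, (double)y);
}

float after(float x, float y) {
  // ...shrunk to the single-precision binary libcall.
  return powf(x, y);
}
```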

Differential revision: https://reviews.llvm.org/D50113

llvm-svn: 339046
2018-08-06 19:40:17 +00:00
Alexandre Ganea 741cc3531a [llvm-pdbutil] Support PDBs without a DBI stream
Differential Revision: https://reviews.llvm.org/D50258

llvm-svn: 339045
2018-08-06 19:35:00 +00:00
Easwaran Raman 10fd92dd94 [X86] Recognize a splat of negate in isFNEG
Summary:
Expand isFNEG so that we generate the appropriate F(N)M(ADD|SUB)
instructions in more cases. For example, the following sequence
a = _mm256_broadcast_ss(f)
d = _mm256_fnmadd_ps(a, b, c)

generates an fsub and fma without this patch and an fnma with this
change.

Reviewers: craig.topper

Subscribers: llvm-commits, davidxl, wmi

Differential Revision: https://reviews.llvm.org/D48467

llvm-svn: 339043
2018-08-06 19:23:38 +00:00
Craig Topper 0076477a4c [X86] When using "and $0" and "orl $-1" to store 0 and -1 for minsize, make sure the store isn't volatile
If the store is volatile, this might be a memory-mapped IO access. In that case we shouldn't generate a load that didn't exist in the source.
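
A small illustration of why this matters, using a hypothetical MMIO address:
```
volatile int *const mmio = (volatile int *)0x40000000; // hypothetical device register

void clear_reg() {
  // Under minsize this store could be shrunk to a memory-destination
  // "andl $0, (mem)", but that read-modify-write introduces a load the
  // source never performed, which is not acceptable for a volatile access.
  *mmio = 0;
}
```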

Differential Revision: https://reviews.llvm.org/D50270

llvm-svn: 339041
2018-08-06 18:44:26 +00:00
Craig Topper f8a8c746e3 [X86] Add test cases to show bad use of "and $0" and "orl $-1" for minsize when the store is volatile
If the store is volatile we shouldn't be adding a load that didn't exist in the source.

llvm-svn: 339040
2018-08-06 18:44:21 +00:00
Wei Mi 3c1c088500 [RegisterCoalescer] Delay live interval update work until the rematerialization
for all the uses from the same def is done.

We run into a compile time problem with flex generated code combined with
`-fno-jump-tables`. The cause is that machineLICM hoists a lot of invariants
outside of a big loop, and drastically increases the compile time in global
register splitting and copy coalescing.  https://reviews.llvm.org/D49353
relieves the problem in global splitting. This patch is to handle the problem
in copy coalescing.

About the situation where the problem in copy coalescing happens: after
machineLICM, we have several defs outside of a big loop with hundreds or
thousands of uses inside the loop. Rematerialization in copy coalescing
happens for each use, and every time rematerialization is done, shrinkToUses
will be called to update the huge live interval. Because we have 'n' uses
for a def, and each live interval update has at least 'n' complexity,
the total update work is n^2.

To fix the problem, we try to do the live interval update work in a collective
way. If a def has a number of copy-like uses larger than a threshold, then each
time rematerialization is done for one of those uses, we don't do the live
interval update immediately but delay that work until rematerialization for all
those uses is completed, so we only have to do the live interval update once.
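
A generic sketch of that batching pattern (assumed, simplified interface; not LLVM's actual LiveIntervals API):
```
#include <set>
#include <vector>

struct LiveIntervalsStub {
  void shrinkToUses(unsigned Reg) { /* costs O(number of uses of Reg) */ }
};

void rematerializeUses(const std::vector<unsigned> &Uses, unsigned DefReg,
                       LiveIntervalsStub &LIS, unsigned Threshold) {
  std::set<unsigned> Deferred;
  for (unsigned U : Uses) {
    (void)U; // ... rematerialize the def at this use ...
    if (Uses.size() > Threshold)
      Deferred.insert(DefReg); // many uses: delay the O(n) update
    else
      LIS.shrinkToUses(DefReg); // few uses: update immediately as before
  }
  for (unsigned Reg : Deferred)
    LIS.shrinkToUses(Reg); // one collective update instead of n of them
}
```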

Delaying the live interval update could potentially change the copy coalescing
result, so we hope to limit that change to defs with many
(i.e., above a hundred) copy-like uses; the cutoff can be adjusted with the
option -mllvm -late-remat-update-threshold=xxx.

Differential Revision: https://reviews.llvm.org/D49519

llvm-svn: 339035
2018-08-06 17:30:45 +00:00
Matt Arsenault 0d1b3934e2 AMDGPU: Fold v_lshl_or_b32 with 0 src0
Appears from expansion of some packed cases.

llvm-svn: 339025
2018-08-06 15:40:20 +00:00
Matt Arsenault 56b31d8d75 ValueTracking: Handle canonicalize in CannotBeNegativeZero
Also fix apparently missing test coverage for any of the
handling here.

llvm-svn: 339023
2018-08-06 15:16:26 +00:00
Matt Arsenault dbf77c5b41 AMDGPU: Rename check prefixes in test
Will avoid noisy diff in future change.

llvm-svn: 339022
2018-08-06 15:16:12 +00:00
Bryan Chan e023706471 [AArch64] Fix assertion failure on widened f16 BUILD_VECTOR
Summary:
Ensure that NormalizedBuildVector returns a BUILD_VECTOR with operands of the
same type. This fixes an assertion failure in VerifySDNode.

Reviewers: SjoerdMeijer, t.p.northover, javed.absar

Reviewed By: SjoerdMeijer

Subscribers: kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D50202

llvm-svn: 339013
2018-08-06 14:14:41 +00:00
Tim Northover 9956e4a24b ARM-MachO: don't add Thumb bit for addend to non-external relocation.
ld64 supplies its own Thumb bit for Thumb functions, and intentionally zeroes
out that part of any addend in an object file. But it only does that for
symbols marked N_EXT -- i.e. external symbols. So LLVM should avoid setting
that extra bit in other cases.

llvm-svn: 339007
2018-08-06 11:32:44 +00:00
Max Kazantsev 2dbbd64cb7 Re-enable "[ValueTracking] Teach isKnownNonNullFromDominatingCondition about AND"
The patch was reverted because of a bug detected by the sanitizer. The bug is
fixed and the respective tests added.

Differential Revision: https://reviews.llvm.org/D50172

llvm-svn: 339005
2018-08-06 11:14:18 +00:00
Max Kazantsev 3271f379a9 Revert rL338990 to see if it causes sanitizer failures
Multiple failures reported by sanitizer-x86_64-linux seem to be caused by this
patch. Reverting to see if they persist without it.

Differential Revision: https://reviews.llvm.org/D50172

llvm-svn: 338994
2018-08-06 08:10:28 +00:00
Max Kazantsev 34b0666be9 [ValueTracking] Teach isKnownNonNullFromDominatingCondition about AND
`isKnownNonNullFromDominatingCondition` is able to prove non-nullness based on a `br` or `guard`
on a `%p != null` condition, but is unable to do so based on `(%p != null) && %other_cond`.
This patch allows it to do so.
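
In C++ source terms, the newly handled shape looks roughly like this (my example, not from the patch):
```
int load_if_valid(int *p, bool other_cond) {
  // The branch condition is an AND of the null check with another condition;
  // inside the taken branch, p is still known to be non-null.
  if (p != nullptr && other_cond)
    return *p;
  return 0;
}
```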

Differential Revision: https://reviews.llvm.org/D50172
Reviewed By: reames

llvm-svn: 338990
2018-08-06 06:11:36 +00:00
Max Kazantsev eded4abef8 [GuardWidening] Widen guards with conditions of frequently taken dominated branches
If there is a frequently taken branch dominated by a guard, and its condition is available
at the point of the guard, we can widen the guard with the condition of this branch and convert
the branch into an unconditional one:

  guard(cond1)
  if (cond2) {
    // taken in 99.9% cases
    // do something
  } else {
    // do something else    
  }

Converts to

  guard(cond1 && cond2)
  // do something

Differential Revision: https://reviews.llvm.org/D49974
Reviewed By: reames

llvm-svn: 338988
2018-08-06 05:49:19 +00:00
David Bolvansky b7fcd10700 [NFC] Fixed inliner tests - 2
llvm-svn: 338973
2018-08-05 16:53:36 +00:00
David Bolvansky 2f1f3b10ad [NFC] Fixed inliner tests
llvm-svn: 338972
2018-08-05 16:30:46 +00:00
David Bolvansky c0aa4b75a4 Enrich inline messages
Summary:
This patch improves the Inliner to provide causes/reasons for negative inline decisions.
1. It adds one new message field to InlineCost to report causes for Always and Never instances. All Never and Always instantiations must provide a simple message.
2. Several functions that used to return the inlining results as booleans are changed to return InlineResult, which carries the cause for a negative decision.
3. Changed remark printing and debug output messages to provide the additional messages and the related inline cost.
4. Adjusted tests for the changed printing.

Patch by: yrouban (Yevgeny Rouban)


Reviewers: craig.topper, sammccall, sgraenitz, NutshellySima, shchenz, chandlerc, apilipenko, javed.absar, tejohnson, dblaikie, sanjoy, eraman, xbolva00

Reviewed By: tejohnson, xbolva00

Subscribers: xbolva00, llvm-commits, arsenm, mehdi_amini, eraman, haicheng, steven_wu, dexonsmith

Differential Revision: https://reviews.llvm.org/D49412

llvm-svn: 338969
2018-08-05 14:53:08 +00:00
Eric Christopher 9855a5a0a1 Revert "Add a warning if someone attempts to add extra section flags to sections"
There are a bunch of edge cases and inconsistencies in how we're emitting sections
that cause this warning to fire, and it needs more work.

This reverts commit r335558.

llvm-svn: 338968
2018-08-05 14:23:37 +00:00
Roman Lebedev 365fa96055 [NFC][InstCombine] Add tests for sinking 'not' into 'xor' (PR38446)
https://rise4fun.com/Alive/IT3

Comes up in the ugliest signed int -> signed char case of
-fsanitize=implicit-conversion (https://reviews.llvm.org/D50250).

Not sure if we want to do it always, or only when it is free to invert.

llvm-svn: 338967
2018-08-05 10:15:04 +00:00
Roman Lebedev 656a478e98 [NFC][InstCombine] Regenerate set.ll test
llvm-svn: 338965
2018-08-05 08:53:40 +00:00
Craig Topper fb33181038 [X86] Remove stale comments from a test. NFC
The 16-bit case was recently fixed so this comment no longer applies.

llvm-svn: 338964
2018-08-05 06:25:01 +00:00
David Bolvansky b82a5ec1b6 [InstCombine] [NFC] Tests for strcmp to memcmp transformation
llvm-svn: 338963
2018-08-05 05:46:56 +00:00
Chijun Sima 8b5de48d62 [TailCallElim] Preserve DT and PDT
Summary:
Previously, in the NewPM pipeline, TailCallElim recalculated the DomTree whenever it modified any instruction in the Function.
For example,
```
CallInst *CI = dyn_cast<CallInst>(&I);
...
CI->setTailCall();
Modified = true;
...
if (!Modified || ...)
  return PreservedAnalyses::all();
```
After applying this patch, the DomTree is only recalculated when needed (plus an extra insertEdge() + an extra deleteEdge() call).

When optimizing SQLite with the `-passes="default<O3>"` pipeline of the newPM, the number of DomTree recalculations decreases by 6.2% and the number of nodes visited by DFS decreases by 2.9%. The time used by DomTree decreases by approximately 1%-2.5% after applying the patch.
 
Statistics:
```
Before the patch:
 23010 dom-tree-stats               - Number of DomTree recalculations
489264 dom-tree-stats               - Number of nodes visited by DFS -- DomTree
After the patch:
 21581 dom-tree-stats               - Number of DomTree recalculations
475088 dom-tree-stats               - Number of nodes visited by DFS -- DomTree
```

Reviewers: kuhar, dmgreen, brzycki, grosser, davide

Reviewed By: kuhar, brzycki

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D49982

llvm-svn: 338954
2018-08-04 08:13:47 +00:00
Chijun Sima eacad79777 [ADCE] Remove the need of DomTree
Summary: ADCE doesn't need to query domtree.

Reviewers: kuhar, brzycki, dmgreen, davide, grosser

Reviewed By: kuhar

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D49988

llvm-svn: 338950
2018-08-04 02:50:12 +00:00
Aditya Nandakumar e07b3b737b [GISel]: Add Opcodes for CTLZ/CTTZ/CTPOP
https://reviews.llvm.org/D48600

Added IRTranslator support to translate these known intrinsics into GISel opcodes.

llvm-svn: 338944
2018-08-04 01:22:12 +00:00
Craig Topper 3c869cb5e5 [X86] Add isel patterns for atomic_load+sub+atomic_sub.
Despite the comment removed in this patch, this is beneficial when the RHS of the sub is a register.

llvm-svn: 338930
2018-08-03 22:08:30 +00:00
Craig Topper 84319d1b42 [X86] Add test cases to show missed opportunity to use RMW for atomic_load+sub+atomic_store.
llvm-svn: 338929
2018-08-03 22:08:28 +00:00
Reid Kleckner 8e40702c1c [X86] Re-generate abi-isel.ll checks with update_llc_test_checks.py
These tests were clearly auto-generated when they were converted to
FileCheck back in r80019 (2009), but we didn't have a fancy script to
keep them up to date then. I've reviewed the diff, and we should be
generating the exact same code sequences we used to.

After this, I plan to commit a change that changes our output slightly,
but in a way that is still correct. It will generate a large diff, and I
want it to be clearly correct, so I am regenerating these checks in
preparation for that.

llvm-svn: 338928
2018-08-03 21:58:25 +00:00
Reid Kleckner 5578b53c92 [X86] Make abi-isel.ll like update_llc_test_checks.py output
- Remove -asm-verbose=0 from every llc command. The tests still pass.
- Reorder the RUN lines to match CHECKs.
- Use -LABEL like update_llc_test_checks.py does.

llvm-svn: 338927
2018-08-03 21:58:12 +00:00
Reid Kleckner 13a9035190 [X86] Layout tests exactly as update_llc_test_checks.py would
Put the LLVM IR at the bottom of the function instead of the top.  In my
next patch, I will run update_llc_test_checks.py on this file, and I
want to only highlight the diffs in the CHECK lines. Hopefully by doing
this change first, the patch will be more understandable.

llvm-svn: 338926
2018-08-03 21:57:59 +00:00
Craig Topper d7391eefdf [X86] Remove RELEASE_ and ACQUIRE_ pseudo instructions. Use isel patterns and the normal instructions instead
At one point in time acquire implied mayLoad and mayStore, as did release. Thus we needed separate pseudos that also carried that property. This appears to no longer be the case. I believe it was changed in 2012 with a comment saying that atomic memory accesses are marked volatile, which preserves the ordering.

So from what I can tell we shouldn't need additional pseudos, since they don't carry any flags that are different from the normal instructions. The only thing I can think of is that we may consider them for load folding candidates in the peephole pass now where we didn't before. If that's important, hopefully there's something in the memory operand we can check to prevent the folding without relying on pseudo instructions.

Differential Revision: https://reviews.llvm.org/D50212

llvm-svn: 338925
2018-08-03 21:40:44 +00:00
Craig Topper 8c41136ca3 [X86] Autogenerate complete checks. NFC
llvm-svn: 338921
2018-08-03 20:58:14 +00:00
Anastasis Grammenos 4dfe279e00 [TRE][DebugInfo] Preserve Debug Location in new branch instruction
There are two branch instructions created
so the new test covers them both.

Differential Revision: https://reviews.llvm.org/D50263

llvm-svn: 338917
2018-08-03 20:27:13 +00:00
Craig Topper c4960582ec [SelectionDAG] Teach LegalizeVectorTypes to widen the mask input to a masked store.
The mask operand is visited before the data operand so we need to be able to widen it.

Fixes PR38436.

llvm-svn: 338915
2018-08-03 20:14:18 +00:00
Matt Arsenault c3dc8e65e2 DAG: Enhance isKnownNeverNaN
Add a parameter for testing specifically for
sNaNs - at least one instruction pattern on AMDGPU
needs to check specifically for this.

Also handle more cases, and add a target hook
for custom nodes, similar to the hooks for known
bits.

llvm-svn: 338910
2018-08-03 18:27:52 +00:00
Artem Belevich 0a11b6366a [NVPTX] Handle __nvvm_reflect("__CUDA_ARCH").
Summary:
libdevice in recent CUDA versions relies on __nvvm_reflect() to select
GPU-specific bitcode. This patch addresses the requirement.

Reviewers: jlebar

Subscribers: jholewinski, sanjoy, hiraditya, bixia, llvm-commits

Differential Revision: https://reviews.llvm.org/D50207

llvm-svn: 338908
2018-08-03 18:05:24 +00:00
Craig Topper feb2a58860 [X86] Add a DAG combine for the __builtin_parity idiom used by clang to enable better codegen
Clang uses "ctpop & 1" to implement __builtin_parity. If the popcnt instruction isn't supported this generates a large amount of code to calculate the population count. Instead we can bisect the data down to a single byte using xor and then check the parity flag.

Even when popcnt is supported, its still a good idea to split 64-bit data on 32-bit targets using an xor in front of a single popcnt. Otherwise we get two popcnts and an add before the and.

I've specifically targeted this at the sizes supported by clang builtins, but we could generalize this if we think that's useful.
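
A scalar sketch of the xor-bisection idea (my own illustration, not the emitted DAG):
```
#include <cstdint>

unsigned parity32(uint32_t x) {
  x ^= x >> 16;  // fold the high half into the low half; parity is preserved
  x ^= x >> 8;   // now only the low byte determines the overall parity
  // The DAG combine reads the x86 parity flag at this point; portably,
  // finish with a popcount of the remaining byte.
  return __builtin_popcount(x & 0xff) & 1;
}
```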

Differential Revision: https://reviews.llvm.org/D50165

llvm-svn: 338907
2018-08-03 18:00:29 +00:00
Craig Topper b0ad9b9fd7 [X86] Add test cases for the current codegen of __builtin_parity.
Will be improved in a follow commit

llvm-svn: 338906
2018-08-03 18:00:23 +00:00
Joel Galenson cfe5bc158d Fix crash in bounds checking.
In r337830 I added SCEV checks to enable us to insert fewer bounds checks.  Unfortunately, this sometimes crashes when multiple bounds checks are added due to SCEV caching issues.  This patch splits the bounds checking pass into two phases, one that computes all the conditions (using SCEV checks) and the other that adds the new instructions.

Differential Revision: https://reviews.llvm.org/D49946

llvm-svn: 338902
2018-08-03 17:12:23 +00:00
Nicholas Wilson e408a89a3a [WebAssembly] Cleanup of the way globals and global flags are handled
Differential Revision: https://reviews.llvm.org/D44030

llvm-svn: 338894
2018-08-03 14:33:37 +00:00
Jonas Devlieghere 3a92c5c1d3 [DebugInfo/Verifier] Don't emit error for missing module in index
We don't expect module names to be present in the index. This patch adds
DW_TAG_module to the blacklist.

Differential revision: https://reviews.llvm.org/D50237

llvm-svn: 338878
2018-08-03 12:01:43 +00:00
Jonas Paulsson f107b7275c [SystemZ] Improve handling of instructions which expand to several groups
Some instructions expand to more than one decoder group.

This has been hitherto ignored, but is handled with this patch.

Review: Ulrich Weigand
https://reviews.llvm.org/D50187

llvm-svn: 338849
2018-08-03 10:43:05 +00:00
Sjoerd Meijer d62c5ec2fe [ARM] FP16: support vector zip and unzip
This is addressing PR38404.

Differential Revision: https://reviews.llvm.org/D50186

llvm-svn: 338835
2018-08-03 09:24:29 +00:00
Simon Pilgrim 4014fb1049 [X86] Add example of 'zero shift' guards on rotation patterns (PR34924)
Basic pattern that leaves an unnecessary select on a rotation-by-zero result. This variant is trivial; the more general case, with a compare+branch to prevent execution of undefined shifts, is trickier.

llvm-svn: 338833
2018-08-03 09:20:02 +00:00
Sjoerd Meijer 9b30213828 [ARM] FP16: support VFMA
This is addressing PR38404.

llvm-svn: 338830
2018-08-03 09:12:56 +00:00
Craig Topper a7a12399a1 [X86] Remove all the vector NOP bitcast patterns. Use a few lines of code in the Select method in X86ISelDAGToDAG.cpp instead.
There are a lot of permutations of types here generating a lot of patterns in the isel table. It's more efficient to just ReplaceUses and RemoveDeadNode from the Select function.

The test changes are because we have some shuffle patterns that have a bitcast as their root node. But the behavior is identical to another instruction whose pattern doesn't start with a bitcast. So this isn't a functional change.

llvm-svn: 338824
2018-08-03 07:01:10 +00:00
Craig Topper e902b7d0b0 [X86] Support fp128 and/or/xor/load/store with VEX and EVEX encoded instructions.
Move all the patterns to X86InstrVecCompiler.td so we can keep SSE/AVX/AVX512 all in one place.

To save some patterns we'll use an existing DAG combine to convert f128 fand/for/fxor to integer when sse2 is enabled. This allows us to reuse all the existing patterns for v2i64.

I believe this now makes SHA instructions the only case where VEX/EVEX and legacy encoded instructions could be generated simultaneously.

llvm-svn: 338821
2018-08-03 06:12:56 +00:00
Hiroshi Inoue 73f8b255b6 [InstSimplify] fold extracting from std::pair (2/2)
This is the second patch of the series which intends to enable jump threading for an inlined method whose return type is std::pair<int, bool> or std::pair<bool, int>. 
The first patch is https://reviews.llvm.org/rL338485.

This patch handles code sequences that merge two values using `shl` and `or`, then extract one value using `and`.
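
An illustrative example of that shape, written in C++ terms rather than IR (my own, not from the patch):
```
#include <cstdint>

uint64_t pack(uint32_t lo, uint32_t hi) {
  return ((uint64_t)hi << 32) | lo;          // shl + or: the "merge"
}

uint32_t extract_lo(uint32_t lo, uint32_t hi) {
  // The and-mask extraction folds back to just `lo` once pack() is inlined.
  return (uint32_t)(pack(lo, hi) & 0xffffffffu);
}
```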

Differential Revision: https://reviews.llvm.org/D49981

llvm-svn: 338817
2018-08-03 05:39:48 +00:00
Craig Topper a80352c04e [X86] When post-processing the DAG to remove zero extending moves for YMM/ZMM, make sure the producing instruction is VEX/XOP/EVEX encoded.
If the producing instruction is legacy encoded it doesn't implicitly zero the upper bits. This is important for the SHA instructions which don't have a VEX encoded version. We might also be able to hit this with the incomplete f128 support that hasn't been ported to VEX.

llvm-svn: 338812
2018-08-03 04:49:42 +00:00
Craig Topper ded14af7aa [X86] Autogenerate complete checks. NFC
llvm-svn: 338811
2018-08-03 04:49:41 +00:00
Craig Topper 55697276dc [X86] Autogenerate complete checks. NFC
llvm-svn: 338802
2018-08-03 01:28:12 +00:00
Craig Topper b99281c9b8 [X86] Autogenerate complete checks. NFC
llvm-svn: 338799
2018-08-03 01:20:32 +00:00
Craig Topper 2c095444a4 [X86] Prevent promotion of i16 add/sub/and/or/xor to i32 if we can fold an atomic load and atomic store.
This makes them consistent with i8/i32/i64. Which still seems to be more aggressive on folding than icc, gcc, or MSVC.

llvm-svn: 338795
2018-08-03 00:37:34 +00:00
Philip Reames 5937368d4f [LICM] Remove unnecessary safety check to increase sinking effectiveness
This one requires a bit of explanation.  It's not every day you simply delete code to implement an optimization.  :)

The transform in question is sinking an instruction from a loop to the uses in loop exiting blocks.  We know (from LCSSA) that all of the uses outside the loop must be phi nodes, and after predecessor splitting, we know all phi users must have a single operand.  Since the use must be strictly dominated by the def, we know from the definition of dominance/ssa that the exit block must execute along a (non-strict) subset of paths which reach the def.  As a result, duplicating a potentially faulting instruction can not *introduce* a fault that didn't previously exist in the program.  

The full story is that this patch builds on "rL338671: [LICM] Factor out fault legality from canHoistOrSinkInst [NFC]", which pulled this logic out of a common helper routine.  As best I can tell, this check was originally added to the helper function for hoisting legality; later an incorrect fast path for loads/calls was added, and then the bug was fixed by duplicating the fault safety check in the hoist path.  This left the redundant check in the common code to pessimize sinking for no reason.  I split it out in an NFC, and am now removing the unnecessary check.  I wanted there to be something easy to revert in case I missed something.

Reviewed by: Anna Thomas (in person)

llvm-svn: 338794
2018-08-03 00:21:56 +00:00
Dave Lee 3fb120f12e objdump: Better handling of Mach-O universal binaries
Summary:
With Mach-O, there is a flag requirement discrepancy between working with
universal binaries and thin binaries. Many flags that don't require the `-macho`
flag (for example `-private-headers` and `-disassemble`) fail to work on
universal binaries unless `-macho` is given. When this happens, the error
message is unhelpful, stating:

    The file was not recognized as a valid object file.

Which can lead to confusion.

This change allows generic flags to be used on universal binaries with and
without the `-macho` flag. This means flags that can be used for thin files can
be used consistently with fat files too.

To do this, the universal binary support within `ParseInputMachO()` is extracted
into a new function. This new function is called directly from `DumpInput()`
when the input binary is universal. Additionally the `-arch` flag validation in
`ParseInputMachO()` was extracted to be reused.

Reviewers: compnerd

Reviewed By: compnerd

Subscribers: keith, llvm-commits

Differential Revision: https://reviews.llvm.org/D48702

llvm-svn: 338792
2018-08-03 00:06:38 +00:00
Eli Friedman 1ba5e9ac24 [GlobalMerge] Allow merging globals with explicit section markings.
At least on ELF, it's impossible to tell from the object file whether
two globals with the same section marking were merged: the merged global
uses "private" linkage to hide its symbol, and the aliases look like
regular symbols. I can't think of any other reason to disallow it.
(Of course, we can only merge globals in the same section.)

The weird alignment handling matches AsmPrinter; our alignment handling
for global variables should probably be refactored.

Differential Revision: https://reviews.llvm.org/D49822

llvm-svn: 338791
2018-08-02 23:54:16 +00:00
Tim Renouf abd85fb1f5 [AMDGPU] Reworked SIFixWWMLiveness
Summary:
I encountered some problems with SIFixWWMLiveness when WWM is in a loop:

1. It sometimes gave invalid MIR where there is some control flow path
   to the new implicit use of a register on EXIT_WWM that does not pass
   through any def.

2. There were lots of false positives of registers that needed to have
   an implicit use added to EXIT_WWM.

3. Adding an implicit use to EXIT_WWM (and adding an implicit def just
   before the WWM code, which I tried in order to fix (1)) caused lots
   of the values to be spilled and reloaded unnecessarily.

This commit is a rework of SIFixWWMLiveness, with the following changes:

1. Instead of considering any register with a def that can reach the WWM
   code and a def that can be reached from the WWM code, it now
   considers three specific cases that need to be handled.

2. A register that needs liveness over WWM to be synthesized now has it
   done by adding itself as an implicit use to defs other than the
   dominant one.

Also added the following fixmes:

FIXME: We should detect whether a register in one of the above
categories is already live at the WWM code before deciding to add the
implicit uses to synthesize its liveness.

FIXME: I believe this whole scheme may be flawed due to the possibility
of the register allocator doing live interval splitting.

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D46756

Change-Id: Ie7fba0ede0378849181df3f1a9a7a39ed1a94a94
llvm-svn: 338783
2018-08-02 23:31:32 +00:00
Craig Topper 63873db5c4 [X86] Allow 'atomic_store (neg/not atomic_load)' to isel to a RMW instruction.
There was a FIXMe in the td file about a type inference issue that was easy to fix.

llvm-svn: 338782
2018-08-02 23:30:38 +00:00
Craig Topper 2deeeae2a5 [X86] Add NEG and NOT test cases to atomic_mi.ll in preparation for fixing the FIXME in X86InstrCompiler.td to make these work for atomic load/store.
llvm-svn: 338781
2018-08-02 23:30:31 +00:00
Tim Renouf f1c7b92a6a [AMDGPU] Avoid using divergent value in mubuf addr64 descriptor
Summary:
This fixes a problem where a load from global+idx generated incorrect
code on <=gfx7 when the index is divergent.

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D47383

Change-Id: Ib4d177d6254b1dd3f8ec0203fdddec94bd8bc5ed
llvm-svn: 338779
2018-08-02 22:53:57 +00:00
Zachary Turner 666de23fbf [MS Demangler] Fix some tests that are no longer broken.
These were fixed with earlier patches, but had not yet been
re-enabled.

llvm-svn: 338778
2018-08-02 22:37:40 +00:00
Krzysztof Parzyszek d91a9e27a9 [Hexagon] Simplify CFG after atomic expansion
This will remove suboptimal branching from the generated ll/sc loops.
The extra simplification pass affects a lot of testcases, which have
been modified to accommodate this change: either by modifying the
test to become immune to the CFG simplification, or (less preferably)
by adding the option -hexagon-initial-cfg-cleanup=0.

llvm-svn: 338774
2018-08-02 22:17:53 +00:00
Heejin Ahn 4128cb0b6b [WebAssembly] Support for atomic.wait / atomic.wake instructions
Summary:
This adds support for atomic.wait / atomic.wake instructions in the wasm
thread proposal.

Reviewers: dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D49395

llvm-svn: 338770
2018-08-02 21:44:24 +00:00
Craig Topper db89ec1185 [X86] Autogenerate complete checks. NFC
llvm-svn: 338765
2018-08-02 20:28:45 +00:00
Krzysztof Parzyszek 90f3249ce2 [SCEV] Properly solve quadratic equations
Differential Revision: https://reviews.llvm.org/D48283

llvm-svn: 338758
2018-08-02 19:13:35 +00:00
Sam Clegg 41d7047de5 [WebAssembly] Ensure bitcasts that would result in invalid wasm are removed by FixFunctionBitcasts
Rather than allowing invalid bitcasts to be lowered to wasm
call instructions that won't validate, generate wrappers that
contain unreachable, thereby delaying the error until runtime.

Differential Revision: https://reviews.llvm.org/D49517

llvm-svn: 338744
2018-08-02 17:38:06 +00:00
Craig Topper 0423881820 [X86] Allow fake unary unpckhpd and movhlps to be commuted for execution domain fixing purposes
These instructions perform the same operation, but the semantic of which operand is destroyed is reversed. If the same register is used as both operands we can change the execution domain without worrying about this difference.

Unfortunately, this really only works in cases where the input register is killed by the instruction. If it's not killed, the two-address instruction pass inserts a copy that will become a move instruction. This makes the instruction use different physical registers that contain the same data at the time the unpck/movhlps executes. I've considered using a unary pseudo instruction with a tied operand to trick the two-address instruction pass. We could then expand the pseudo post-regalloc to get the same physical register on both inputs.

Differential Revision: https://reviews.llvm.org/D50157

llvm-svn: 338735
2018-08-02 16:48:01 +00:00
Simon Pilgrim ef494e1722 [X86][SSE] Add uniform/non-uniform exact sdiv vector tests covering all paths
Regenerated tests and tested on 64-bit (AVX2) as well.

llvm-svn: 338729
2018-08-02 15:34:51 +00:00
David Bolvansky 67647bcfbe [InstCombine] [NFC] Tests for select with binop fold
llvm-svn: 338727
2018-08-02 14:59:23 +00:00
Sanjay Patel 3f6e9a71f7 [InstSimplify] move minnum/maxnum with undef fold from instcombine
llvm-svn: 338719
2018-08-02 14:33:40 +00:00
Sjoerd Meijer 8e7fab0443 [ARM][NFC] Follow up of r338568
I disabled more tests than necessary; this enables them.

llvm-svn: 338717
2018-08-02 14:04:48 +00:00
Sanjay Patel f9a0d593e9 [ValueTracking] fix maxnum miscompile for cannotBeOrderedLessThanZero (PR37776)
This adds the NAN checks suggested in PR37776:
https://bugs.llvm.org/show_bug.cgi?id=37776

If both operands to maxnum are NAN, that should get constant folded, so we don't 
have to handle that case. This is the same assumption as other FP ops in this
function. Returning 'false' is always conservatively correct.

Copying from the bug report:

Currently, we have this for "when is cannotBeOrderedLessThanZero 
(mustBePositiveOrNaN) true for maxnum":
               L
        -------------------
        | Pos | Neg | NaN |
   ------------------------
   |Pos |  x  |  x  |  x  |
   ------------------------
 R |Neg |  x  |     |  x  |
   ------------------------
   |NaN |  x  |  x  |  x  |
   ------------------------


The cases with (Neg & NaN) are wrong. We should have:

                L
        -------------------
        | Pos | Neg | NaN |
   ------------------------
   |Pos |  x  |  x  |  x  |
   ------------------------
 R |Neg |  x  |     |     |
   ------------------------
   |NaN |  x  |     |  x  |
   ------------------------

Differential Revision: https://reviews.llvm.org/D50081

llvm-svn: 338716
2018-08-02 13:46:20 +00:00
Matt Arsenault 1f3977a856 DAG: Fix vector widening fcanonicalize
llvm-svn: 338715
2018-08-02 13:43:53 +00:00
Matt Arsenault 36cdcfadcf AMDGPU: Fix scalarizing v4f16 fcanonicalize
llvm-svn: 338714
2018-08-02 13:43:42 +00:00
Ben Dunbobbin d498dcdbbf [llvm-ar] Fix help text test. NFC.
Missed from @338703

llvm-svn: 338709
2018-08-02 12:27:01 +00:00
Simon Pilgrim 090d58b2b5 [X86][SSE] Add more UDIV nonuniform-constant vector tests
Ensure we cover all paths for vector data as requested on D49248

llvm-svn: 338698
2018-08-02 10:53:53 +00:00
Alexander Ivchenko 49168f6778 [GlobalISel] Rewrite CallLowering::lowerReturn to accept multiple VRegs per Value
This is logical continuation of https://reviews.llvm.org/D46018 (r332449)

Differential Revision: https://reviews.llvm.org/D49660

llvm-svn: 338685
2018-08-02 08:33:31 +00:00
David Green ea60446c6d [AArch64] Add support for got relocated LDR's
As a part of adding the tiny codemodel, we need to support ldr's with :got:
relocations on them. This seems to be mostly already done, just needs the
relocation type support.

Differential Revision: https://reviews.llvm.org/D50137

llvm-svn: 338673
2018-08-02 06:24:40 +00:00
Philip Reames 24b13cb06d [LICM] Expand tests to highlight an oddity in sinking implementation
llvm-svn: 338670
2018-08-02 03:54:29 +00:00
Lei Liu b9a7b7a84d Fix FCOPYSIGN expansion
In the expansion of FCOPYSIGN, the shift node is missing when the two
operands of FCOPYSIGN are of the same size. We should always generate a
shift node (if the required shift amount is not zero) to put the sign
bit into the right position, regardless of the size of the underlying
types.

Differential Revision: https://reviews.llvm.org/D49973

llvm-svn: 338665
2018-08-02 01:54:12 +00:00
Nemanja Ivanovic e1a525ed06 [PowerPC] Do not round values prior to converting to integer
Adding the FP_ROUND nodes when combining FP_TO_[SU]INT of elements
feeding a BUILD_VECTOR into an FP_TO_[SU]INT of the built vector
loses precision. This patch removes the code that adds these nodes
to true f64 operands. It also adds patterns required to ensure
the code is still vectorized rather than converting individual
elements and inserting into a vector.

Fixes https://bugs.llvm.org/show_bug.cgi?id=38342

Differential Revision: https://reviews.llvm.org/D50121

llvm-svn: 338658
2018-08-02 00:03:22 +00:00
Lei Liu 8e422b8403 [AArch64] DWARF: do not generate AT_location for thread local
The AArch64 ELF ABI does not define a static relocation type for a TLS offset within
a module, which makes it impossible for the compiler to generate valid
DW_AT_location content for thread-local variables. Currently LLVM generates an
invalid R_AARCH64_ABS64 relocation at the DW_AT_location field for a TLS
variable. That causes trouble for the linker because a thread-local variable does
not have an absolute address at link time. AArch64 GCC solves the problem by
not generating DW_AT_location for thread-local variables. We should do the
same in LLVM.

Differential Revision: https://reviews.llvm.org/D43860

llvm-svn: 338655
2018-08-01 23:46:49 +00:00
George Burgess IV 213d1d23ef Reland r338431: "Add DebugCounters to DivRemPairs"
(Previously reverted in r338442)

I'm told that the breakage came from us using an x86 triple on configs
that didn't have x86 enabled. This is remedied by moving the
debugcounter test to an x86 directory (where there's also an
opt-bisect-isel.ll test for similar reasons).

I can't repro the reverse-iteration failure mentioned in the revert with
this patch, so I assume that a misconfiguration on my end is what caused
that.

Original commit message:

    Add DebugCounters to DivRemPairs

    For people who don't use DebugCounters, NFCI.

    Patch by Zhizhou Yang!

    Differential Revision: https://reviews.llvm.org/D50033

llvm-svn: 338653
2018-08-01 23:14:14 +00:00
Sanjay Patel 28c7e41c09 [InstSimplify] move minnum/maxnum with same arg fold from instcombine
llvm-svn: 338652
2018-08-01 23:05:55 +00:00
Reid Kleckner a30a6d2c29 Load from the GOT for external symbols in the large, PIC code model
Do the same handling for external symbols that we do for jump table
symbols and global values.

Fixes one of the cases in PR38385

llvm-svn: 338651
2018-08-01 22:56:05 +00:00
John Baldwin c5d7e04052 [ASAN] Use the correct shadow offset for ASAN on FreeBSD/mips64.
Reviewed By: atanasyan

Differential Revision: https://reviews.llvm.org/D49939

llvm-svn: 338650
2018-08-01 22:51:13 +00:00
Matt Arsenault 709374d186 AMDGPU: Improve hack for packing conversion ops
Mutate the node type during selection when it
doesn't matter. This avoids an intermediate bitcast
node on targets with legal i16/f16.

Also fixes missing output modifiers on v_cvt_pkrtz_f32_f16,
which I assume are OK.

llvm-svn: 338619
2018-08-01 20:13:58 +00:00
Matt Arsenault 55ab9213d3 AMDGPU: Partially fix handling of packed amdgpu_ps arguments
Fixes annoying limitations when writing tests.
Also remove more leftover code for manually scalarizing arguments
and return values.

llvm-svn: 338618
2018-08-01 19:57:34 +00:00
Heejin Ahn b3724b7169 [WebAssembly] Support for a ternary atomic RMW instruction
Summary: This adds support for a ternary atomic RMW instruction: cmpxchg.

Reviewers: dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D49195

llvm-svn: 338617
2018-08-01 19:40:28 +00:00
Alexey Bataev d4dd7215f6 [DEBUGINFO] Disable emission of the dwarf sections, but allow directives.
Summary:
Added an option that allows emitting only '.loc' and '.file' debug
directives, but disables emission of the DWARF sections. Required for the
NVPTX target to support profiling. It requires '.loc' and '.file'
directives, but does not require any DWARF sections for the profiler.

Reviewers: probinson, echristo, dblaikie

Subscribers: aprantl, JDevlieghere, llvm-commits

Differential Revision: https://reviews.llvm.org/D46021

llvm-svn: 338616
2018-08-01 19:38:20 +00:00
Craig Topper c985d42903 [X86] Canonicalize the pattern for __builtin_ffs in a similar way to '__builtin_ffs + 5'
We now emit a move of -1 before the cmov and do the addition after the cmov just like the case with an extra addition.

This may be slightly worse for code size, but is more consistent with other compilers. And we might be able to hoist the mov -1 outside of loops.
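
In source terms, the pattern being canonicalized is roughly the following (my paraphrase, not code from the patch):
```
#include <cstdint>

// __builtin_ffs(x): 1-based index of the lowest set bit, or 0 if x == 0.
int ffs_like(uint32_t x) {
  int r = (x == 0) ? -1 : __builtin_ctz(x); // select between -1 and cttz(x)
  return r + 1;                             // the +1 now happens after the select
}
```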

llvm-svn: 338613
2018-08-01 18:38:46 +00:00
Craig Topper ffb8eb30ff [X86] Add test cases for the patterns used by __builtin_ffs.
We previously had tests for "__builtin_ffs + 5", but the SelectionDAG without an extra addition came out slightly different.

llvm-svn: 338612
2018-08-01 18:38:43 +00:00
Jan Vesely 93b252799b AMDGPU/R600: Convert kernel param loads to use PARAM_I_ADDRESS
Non ext aligned i32 loads are still optimized to use CONSTANT_BUFFER (AS 8)

llvm-svn: 338610
2018-08-01 18:36:07 +00:00
Zachary Turner 44ebbc216a [MS Demangler] Properly demangle templated operators.
After we detected the presence of a template via ?$ we would proceed by
only demangling a simple unqualified name. This means we would fail on
templated operators (and perhaps other yet-to-be-determined things).

This was discovered while doing some refactoring to store richer
semantic information about the demangled types to pave the way for
overhauling the way we handle backreferences. (Specifically, we need to
defer recording or resolving back-references until a symbol has been
completely demangled, because we need to use information that only
occurs later in the mangled string to decide whether a back-reference
should be recorded.)

Differential Revision: https://reviews.llvm.org/D50145

llvm-svn: 338608
2018-08-01 18:32:47 +00:00
Vlad Tsyrklevich ab016e00ec [X86] FastISel fall back on !absolute_symbol GVs
Summary:
D25878, which added support for !absolute_symbol for normal X86 ISel,
did not add support for materializing references to absolute symbols for
X86 FastISel. This causes build failures because FastISel generates
PC-relative relocations for absolute symbols. Fall back to normal ISel
for references to !absolute_symbol GVs. Fix for PR38200.

Reviewers: pcc, craig.topper

Reviewed By: pcc

Subscribers: hiraditya, llvm-commits, kcc

Differential Revision: https://reviews.llvm.org/D50116

llvm-svn: 338599
2018-08-01 17:44:37 +00:00
Simon Pilgrim b911d6721d [llvm-mca][x86] Add CMPXCHG instruction resource tests
I've put CMPXCHG8B/CMPXCHG16B in the same file; even though technically they are under separate CPUID bits, all targets seem to support both (or neither).

llvm-svn: 338595
2018-08-01 17:25:11 +00:00
Sanjay Patel d5ae183034 [x86] remove stale FIXME note from test; NFC
This was fixed with rL338592.

llvm-svn: 338593
2018-08-01 17:18:50 +00:00
Sanjay Patel 8aac22e06a [SelectionDAG] fix bug in translating funnel shift with non-power-of-2 type
The bug is visible in the constant-folded x86 tests. We can't use the
negated shift amount when the type is not power-of-2:
https://rise4fun.com/Alive/US1r

...so in that case, use the regular lowering that includes a select
to guard against a shift-by-bitwidth. This path is improved by only
calculating the modulo shift amount once now.
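
For reference, a scalar model of the funnel-shift semantics involved (an assumed simplification; the guard below corresponds to the select mentioned above):
```
#include <cstdint>

// fshl on a 32-bit type: (x << (z % 32)) | (y >> (32 - (z % 32))),
// with a guard so y is never shifted by the full bit width.
uint32_t fshl32(uint32_t x, uint32_t y, uint32_t z) {
  unsigned s = z % 32;
  return s == 0 ? x : (x << s) | (y >> (32 - s));
}
// Replacing (32 - s) with the negated amount (-z % 32) is only valid because
// 32 is a power of two; for a non-power-of-2 bit width that rewrite is wrong,
// which is the miscompile fixed here.
```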

Also, improve the rotate (with power-of-2 size) lowering to use
a negate rather than subtract from bitwidth. This improves the
codegen whether we have a rotate instruction or not (although
we can still see that we're not matching to a legal rotate in
all cases).

llvm-svn: 338592
2018-08-01 17:17:08 +00:00
Sanjay Patel 6d302c93cc [x86] add tests to show miscompile for funnel shift with weird size; NFC
llvm-svn: 338587
2018-08-01 16:59:54 +00:00
Simon Pilgrim 5c4fb14e07 [llvm-mca][x86] Add PREFETCHW instruction resource tests
These aren't just available via 3DNow! so test for them separately as well.

llvm-svn: 338584
2018-08-01 16:34:39 +00:00
Simon Pilgrim dcfa732b2f [llvm-mca][x86] Add PCLMUL instruction resource tests
Renamed the btver2 file that already contained them - the other targets were only testing the AVX versions

llvm-svn: 338583
2018-08-01 16:25:50 +00:00
Jordan Rupprecht d67c1e129b [llvm-objcopy] Add support for --rename-section flags from gnu objcopy
Summary:
Add support for --rename-section flags from gnu objcopy.

Not all flags appear to have an effect for ELF objects, but accepting them allows easier drop-in replacement. Other unrecognized flags are rejected.

This was only tested by comparing the flags printed by "readelf -e <.o>" against the output of gnu vs llvm objcopy; it hasn't been validated beyond that.

Reviewers: jakehehrlich, alexshap

Subscribers: llvm-commits, paulsemel, alexshap

Differential Revision: https://reviews.llvm.org/D49870

llvm-svn: 338582
2018-08-01 16:23:22 +00:00
Andrea Di Biagio 7f3bf5c1f9 [llvm-mca] Correctly update the rank in `Scheduler::select()`.
Found by inspection.

llvm-svn: 338579
2018-08-01 16:06:33 +00:00
Simon Pilgrim 34ac6533f4 [llvm-mca][x86] Add SET/TEST instruction resource tests
llvm-svn: 338576
2018-08-01 15:29:47 +00:00
Sjoerd Meijer 590e4e8dde [ARM] Armv8.2-A FP16 vector intrinsics tests
Clang support for the Armv8.2-A FP16 vector intrinsic was committed in
rC328277, but this was never followed up, i.e. the LLVM part is missing.

I've raised PR38404, and this is the first step to address this. I.e.,
this adds tests for the Armv8.2-A FP16 vector intrinsic, and thus shows
which intrinsics already work, and which need further work.

Differential Revision: https://reviews.llvm.org/D50142

llvm-svn: 338568
2018-08-01 14:43:59 +00:00
Simon Pilgrim e364e57ac9 [llvm-mca][x86] Add LEA instruction resource tests
We already added these to btver2, now add them to other targets, even though none of their models treat them specially (yet).

llvm-svn: 338565
2018-08-01 14:25:33 +00:00
Simon Pilgrim 6754913e95 [llvm-mca][x86] Add more x86-64 system instruction resource tests
CPUID, IN/OUT, INS/OUTS, INT, PAUSE, SCAS, UD2, XLAT

llvm-svn: 338563
2018-08-01 14:18:09 +00:00
Cameron McInally 04ae85859d [FPEnv] Widen illegal width StrictFP vector operations as needed
Differential Revision: https://reviews.llvm.org/D49806

llvm-svn: 338562
2018-08-01 14:17:19 +00:00
Bryan Chan 67106b5e08 [AArch64] Fix FCCMP with FP16 operands
Summary: This patch adds support for FCCMP instruction with FP16 operands, avoiding an assertion during instruction selection.

Reviewers: olista01, SjoerdMeijer, t.p.northover, javed.absar

Reviewed By: SjoerdMeijer

Subscribers: kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D50115

llvm-svn: 338554
2018-08-01 13:50:29 +00:00
Simon Pilgrim 5f41ab79c0 [llvm-mca][x86] Add CLFLUSHOPT instruction resource tests
llvm-svn: 338550
2018-08-01 13:34:17 +00:00
Simon Pilgrim bd014f4d91 [llvm-mca][x86] Add CMPS/LODS/MOVS/STOS string instruction resource tests
llvm-svn: 338532
2018-08-01 13:14:45 +00:00
Jonas Devlieghere 8acb74e01f [MC] Report fatal error for DWARF types for non-ELF object files
Getting the DWARF types section is only implemented for ELF object
files. We already disabled emitting debug types in clang (r337717), but
now we also report an fatal error (rather than crashing) when trying to
obtain this section in MC. Additionally we ignore the generate debug
types flag for unsupported target triples.

See PR38190 for more information.

Differential revision: https://reviews.llvm.org/D50057

llvm-svn: 338527
2018-08-01 12:53:06 +00:00
Ryan Taylor 894c8fd0e2 [AMDGPU] Optimize _L image intrinsic to _LZ when lod is zero
Summary:
Add _L to _LZ image intrinsic table mapping to table gen.
In ISelLowering, check if the image intrinsic has an lod argument equal
to zero; if so, remove the lod and change the opcode to the equivalent mapped _LZ.

Change-Id: Ie24cd7e788e2195d846c7bd256151178cbb9ec71

Subscribers: arsenm, mehdi_amini, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, steven_wu, dexonsmith, llvm-commits

Differential Revision: https://reviews.llvm.org/D49483

llvm-svn: 338523
2018-08-01 12:12:01 +00:00
Ulrich Weigand 58a9786e81 [SystemZ, TableGen] Fix shift count handling
The DAG combiner logic to simplify AND masks in shift counts is invalid.
While it is true that the SystemZ shift instructions ignore all but the
low 6 bits of the shift count, it is still invalid to simplify the AND
masks while the DAG still uses the standard shift operators (which are
*not* defined to match the SystemZ instruction behavior).

Instead, this patch performs equivalent operations during instruction
selection. For completely removing the AND, this now happens via
additional DAG match patterns implemented by a multi-alternative
PatFrags. For simplifying a 32-bit AND to a 16-bit AND, the existing DAG
patterns were already mostly OK, they just needed an output XForm to
actually truncate the immediate value.

Unfortunately, the latter change also exposed a bug in TableGen: it
seems XForms are currently only handled correctly for direct operands of
the outermost operation node. This patch also fixes that bug by simply
recurring through the whole pattern. This should be NFC for all other
targets.

Differential Revision: https://reviews.llvm.org/D50096

llvm-svn: 338521
2018-08-01 11:57:58 +00:00
Simon Pilgrim 18d025a732 [llvm-mca][x86] Add STC + STD instruction resource tests
llvm-svn: 338514
2018-08-01 11:00:11 +00:00
Petar Jovanovic 64c10ba8e2 [MIPS GlobalISel] Select global address
Select G_GLOBAL_VALUE for position dependent code.

Patch by Petar Avramovic.

Differential Revision: https://reviews.llvm.org/D49803

llvm-svn: 338499
2018-08-01 09:03:23 +00:00
David Bolvansky fbbb83c782 Revert "Enrich inline messages", tests fail
llvm-svn: 338496
2018-08-01 08:02:40 +00:00
David Bolvansky 7f36cd9d96 Enrich inline messages
Summary:
This patch improves the Inliner to provide causes/reasons for negative inline decisions.
1. It adds one new message field to InlineCost to report causes for Always and Never instances. All Never and Always instantiations must provide a simple message.
2. Several functions that used to return the inlining results as booleans are changed to return InlineResult, which carries the cause for a negative decision.
3. Changed remark printing and debug output messages to provide the additional messages and the related inline cost.
4. Adjusted tests for the changed printing.

Patch by: yrouban (Yevgeny Rouban)


Reviewers: craig.topper, sammccall, sgraenitz, NutshellySima, shchenz, chandlerc, apilipenko, javed.absar, tejohnson, dblaikie, sanjoy, eraman, xbolva00

Reviewed By: tejohnson, xbolva00

Subscribers: xbolva00, llvm-commits, arsenm, mehdi_amini, eraman, haicheng, steven_wu, dexonsmith

Differential Revision: https://reviews.llvm.org/D49412

llvm-svn: 338494
2018-08-01 07:37:16 +00:00
Martin Storsjo d4590c38ab [AArch64] Disallow the MachO specific .loh directive for windows
Also add a test for it being unsupported for linux.

Differential Revision: https://reviews.llvm.org/D49929

llvm-svn: 338493
2018-08-01 06:50:18 +00:00
Victor Leschuk 64e0c56717 [DWARF] Basic support for producing DWARFv5 .debug_addr section
This revision implements support for generating the DWARFv5 .debug_addr section.
The implementation is pretty straightforward: we just check the DWARF version
and emit the section header if needed.

Reviewers: aprantl, dblaikie, probinson

Reviewed by: dblaikie

Differential Revision: https://reviews.llvm.org/D50005

llvm-svn: 338487
2018-08-01 05:48:06 +00:00
Hiroshi Inoue 02f79eae06 [InstSimplify] fold extracting from std::pair (1/2)
This patch intends to enable jump threading when a method whose return type is std::pair<int, bool> or std::pair<bool, int> is inlined.
For example, jump threading does not happen for the if statement in func.

std::pair<int, bool> callee(int v) {
  int a = dummy(v);
  if (a) return std::make_pair(dummy(v), true);
  else return std::make_pair(v, v < 0);
}

int func(int v) {
  std::pair<int, bool> rc = callee(v);
  if (rc.second) {
    // do something
  }

SROA executed before the method inlining replaces std::pair by i64 without splitting in both callee and func since at this point no access to the individual fields is seen to SROA.
After inlining, jump threading fails to identify that the incoming value is a constant due to additional instructions (like or, and, trunc).

This series of patches adds patterns to InstructionSimplify to fold extraction of members of std::pair. To help jump threading, we actually need to optimize a code sequence spanning multiple BBs.
These patches do not handle phis by themselves, but the additional patterns help the NewGVN pass, which calls InstSimplify to check for opportunities to simplify instructions over phis and applies the phi-of-ops optimization, resulting in successful jump threading.
SimplifyDemandedBits in InstCombine can do more general optimizations, but this patch aims to provide opportunities for other optimizers by supporting a simple but common case in InstSimplify.

This first patch in the series handles code sequences that merge two values using shl and or and then extract one value using lshr.
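
For illustration, a minimal standalone C++ sketch (not part of this patch; names are made up) of the packing and extraction that the shl/or/lshr sequence corresponds to:

```
#include <cassert>
#include <cstdint>

// Pack the two pair members into one 64-bit value (what SROA's i64
// representation amounts to), then extract the second member again.
static uint64_t pack(uint32_t first, uint32_t second) {
  return (static_cast<uint64_t>(second) << 32) | first;  // shl + or
}

static uint32_t extractSecond(uint64_t packed) {
  return static_cast<uint32_t>(packed >> 32);             // lshr + trunc
}

int main() {
  assert(extractSecond(pack(42, 7)) == 7);
}
```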

Differential Revision: https://reviews.llvm.org/D48828

llvm-svn: 338485
2018-08-01 04:40:32 +00:00
Jatin Bhateja 36432a70c1 [X86] Adding more test patterns for lea-opt (PR37939)
Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D50128

llvm-svn: 338483
2018-08-01 03:53:27 +00:00
Chandler Carruth 2ce191e220 [x86] Fix a really subtle miscompile due to a somewhat glaring bug in
EFLAGS copy lowering.

If you have a branch of LLVM, you may want to cherrypick this. It is
extremely unlikely to hit this case empirically, but it will likely
manifest as an "impossible" branch being taken somewhere, and will be
... very hard to debug.

Hitting this requires complex conditions living across complex control
flow combined with some interesting memory (non-stack) initialized with
the results of a comparison. Also, because you have to arrange for an
EFLAGS copy to be in *just* the right place, almost anything you do to
the code will hide the bug. I was unable to reduce anything remotely
resembling a "good" test case from the place where I hit it, and so
instead I have constructed synthetic MIR testing that directly exercises
the bug in question (as well as the good behavior for completeness).

The issue is that we would mistakenly assume any SETcc with a valid
condition and an initial operand that was a register (a virtual
register, at that) to be a register-*defining* SETcc...

It isn't though....

This would in turn cause us to test some other bizarre register,
typically the base pointer of some memory. Now, testing this register
and using that to branch on doesn't make any sense. It even fails the
machine verifier (if you are running it) due to the wrong register
class. But it will make it through LLVM, assemble, and it *looks*
fine... But wow do you get a very unusual and surprising branch taken in
your actual code.

The fix is to actually check what kind of SETcc instruction we're
dealing with. Because there are a bunch of them, I just test the
may-store bit in the instruction. I've also added an assert for sanity
that ensures we are, in fact, *defining* the register operand. =D

llvm-svn: 338481
2018-08-01 03:01:58 +00:00
Chandler Carruth 014047a99a [x86/slh] Add unwind info to several tests to make it more obvious that
we aren't incorrectly generating any of it when doing SLH.

There was a bug that only occured with SLH that very much looked like it
could be caused by bad unwind info, and so this was a prime suspect.
Turns out that everything is fine, but this way we'll *see* if we end
up, for example, putting things we shouldn't inside the prolog.

llvm-svn: 338480
2018-08-01 03:01:10 +00:00
Hsiangkai Wang 5c63af0d04 [DebugInfo] Generate fixups as emitting DWARF .debug_line.
It is necessary to generate fixups in .debug_line when relaxation is
enabled, because the address deltas may change after relaxation.

DWARF will record the mappings of lines and addresses in
.debug_line section. It will encode the information using special
opcodes, standard opcodes and extended opcodes in Line Number
Program. I use DW_LNS_fixed_advance_pc to encode fixed length
address delta and DW_LNE_set_address to encode absolute address
to make it possible to generate fixups in .debug_line section.

Differential Revision: https://reviews.llvm.org/D46850

llvm-svn: 338477
2018-08-01 02:18:06 +00:00
Amara Emerson 6cdfe29d8e [GlobalISel][IRTranslator] Use RPO traversal when visiting blocks to translate.
Previously we were just visiting the blocks in the function in IR order, which
is rather arbitrary. Therefore we wouldn't always visit defs before uses, but
the translation code relies on this assumption in some places.

Only codegen change seen in tests is an elision of a redundant copy.

Fixes PR38396

llvm-svn: 338476
2018-08-01 02:17:42 +00:00
Konstantin Zhuravlyov bb30ef7af4 AMDGPU: Add clamp bit to dot intrinsics
Differential Revision: https://reviews.llvm.org/D49874

llvm-svn: 338470
2018-08-01 01:31:30 +00:00
Evandro Menezes eb159e56a6 [PATCH] [SLC] Test simplification of pow() for vector types (NFC)
Add test case for the simplification of `pow()` for vector types that D50035
enables.

llvm-svn: 338463
2018-08-01 00:30:43 +00:00
Reid Kleckner b32ff46ff7 Revert r338354 "[ARM] Revert r337821"
Disable ARMCodeGenPrepare by default again. It is causing verifier
failures in V8 that look like:

  Duplicate integer as switch case
  switch i32 %trunc, label %if.end13 [
    i32 0, label %cleanup36
    i32 0, label %if.then8
  ], !dbg !4981
  i32 0
  fatal error: error in backend: Broken function found, compilation aborted!

I will continue reducing the test case and send it along.

llvm-svn: 338452
2018-07-31 23:09:42 +00:00
David L. Jones 9fb2e3ceaa [WebAssembly] Fix debug info tests after r338437.
After r338437, debug_ranges are no longer emitted. Previously, this was only
done for DWARF version 5 and above.

llvm-svn: 338448
2018-07-31 22:24:14 +00:00
Victor Leschuk 58d3399d8a [DWARF] Support for .debug_addr (consumer)
This patch implements basic support for parsing
  and dumping DWARFv5 .debug_addr section.

llvm-svn: 338447
2018-07-31 22:19:19 +00:00
Fangrui Song 87b4b8f7b4 [llvm-objcopy] Make --strip-debug strip .gdb_index
Summary:
See binutils-gdb/bfd/elf.c, GNU objcopy also strips .stab* (STABS)
.line* (DWARF 1) .gnu.linkonce.wi.* (linkonce section for .debug_info) but
I'm not sure we need to be compatible with it.

Reviewers: dblaikie, alexshap, jakehehrlich, jhenderson

Reviewed By: alexshap, jakehehrlich

Subscribers: aprantl, JDevlieghere, jakehehrlich, llvm-commits

Differential Revision: https://reviews.llvm.org/D50100

llvm-svn: 338443
2018-07-31 21:26:35 +00:00
George Burgess IV 497e8fad51 Revert r338431: "Add DebugCounters to DivRemPairs"
This reverts r338431; the test it added is making buildbots unhappy.
Locally, I can repro the failure on reverse-iteration builds.

llvm-svn: 338442
2018-07-31 21:18:44 +00:00
Matt Arsenault 118c47b6d1 AMDGPU: Split amdgcn/r600 fminnum/fmaxnum tests
R600 breaks on too many things to usefully test changes
with ieee_mode on vs. off.

llvm-svn: 338435
2018-07-31 20:38:42 +00:00
George Burgess IV 907f4f6a74 Add DebugCounters to DivRemPairs
For people who don't use DebugCounters, NFCI.

Patch by Zhizhou Yang!

Differential Revision: https://reviews.llvm.org/D50033

llvm-svn: 338431
2018-07-31 20:07:46 +00:00
Alexandre Ganea b92d3ad762 [CodeView] Add coverage test for r338308 (Fixed crash in type merging)
llvm-svn: 338423
2018-07-31 19:30:03 +00:00
Matt Arsenault feedabfde7 AMDGPU: Break 64-bit arguments into 32-bit pieces
llvm-svn: 338421
2018-07-31 19:29:04 +00:00
Matt Arsenault 0395da7842 AMDGPU: Split wide vectors of i16/f16 into 32-bit regs on calls
This improves code for the same reasons as scalarizing 32-bit
element vectors.

llvm-svn: 338418
2018-07-31 19:17:47 +00:00
Alexandre Ganea ee8a720051 [CodeView] Minimal support for S_UNAMESPACE records
Differential Revision: https://reviews.llvm.org/D50007

llvm-svn: 338417
2018-07-31 19:15:50 +00:00
Matt Arsenault 9ced1e0d80 AMDGPU: Scalarize vector argument types to calls
When lowering calling conventions, prefer to decompose vectors
into the constituent register types. This avoids artificial constraints
to satisfy a wide super-register.

This improves code quality because now optimizations don't need to
deal with the super-register constraint. For example the immediate
folding code doesn't deal with 4 component reg_sequences, so by
breaking the register down earlier the existing immediate folding
code is able to work.

This also avoids the need for the shader input processing code
to manually split vector types.

llvm-svn: 338416
2018-07-31 19:05:14 +00:00
Vlad Tsyrklevich 48ed9acede Revert "[DebugInfo] Generate DWARF debug information for labels."
This reverts commits r338390 and r338398, they were causing LSan
failures on the ASan bot.

llvm-svn: 338408
2018-07-31 18:10:37 +00:00
Simon Pilgrim 5d9b00d15b [X86][SSE] Use ISD::MULHU for constant/non-zero ISD::SRL lowering (PR38151)
As was done for vector rotations, we can efficiently use ISD::MULHU for vXi8/vXi16 ISD::SRL lowering.

Shift-by-zero cases are still problematic (mainly on v32i8 due to extra AND/ANDN/OR or VPBLENDVB blend masks but v8i16/v16i16 aren't great either if PBLENDW fails) so I've limited this first patch to known non-zero cases if we can't easily use PBLENDW.
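
For reference, a standalone C++ check of the scalar identity behind this lowering for 16-bit lanes (illustrative only; it assumes a non-zero shift amount, matching the restriction above):

```
#include <cassert>
#include <cstdint>

// x >> c  ==  mulhu(x, 1 << (16 - c))  for 16-bit lanes and 1 <= c <= 15:
// the high 16 bits of x * 2^(16-c) are exactly x shifted right by c.
static uint16_t srlViaMulhu(uint16_t x, unsigned c) {
  uint32_t product = static_cast<uint32_t>(x) * (1u << (16 - c));
  return static_cast<uint16_t>(product >> 16);  // "high half" of the multiply
}

int main() {
  for (unsigned c = 1; c <= 15; ++c)
    for (uint32_t x = 0; x <= 0xFFFF; x += 257)
      assert(srlViaMulhu(static_cast<uint16_t>(x), c) ==
             static_cast<uint16_t>(x >> c));
}
```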

Differential Revision: https://reviews.llvm.org/D49562

llvm-svn: 338407
2018-07-31 18:05:56 +00:00
Simon Pilgrim 1f4b9cb6fe [llvm-mca][x86] Add 32-bit instruction resource tests
These aren't exhaustive, but cover some instructions that are only available in 32-bit mode (where would we be without good BCD math performance?).

llvm-svn: 338404
2018-07-31 17:33:08 +00:00
Zachary Turner d30700f82d Resubmit r338340 "[MS Demangler] Better demangling of template arguments."
This broke the build with GCC, but has since been fixed.

llvm-svn: 338403
2018-07-31 17:16:44 +00:00
Craig Topper bef126fb71 [X86] Add pattern matching for PMADDUBSW
Summary:
Similar to D49636, but for PMADDUBSW. This instruction has the additional complexity that the addition of the two products saturates to 16-bits rather than wrapping around. And one operand is treated as signed and the other as unsigned.

A C example that triggers this pattern

```
static const int N = 128;

int8_t A[2*N];
uint8_t B[2*N];
int16_t C[N];

void foo() {
  for (int i = 0; i != N; ++i)
    C[i] = MIN(MAX((int16_t)A[2*i]*(int16_t)B[2*i] + (int16_t)A[2*i+1]*(int16_t)B[2*i+1], -32768), 32767);
}
```

Reviewers: RKSimon, spatel, zvi

Reviewed By: RKSimon, zvi

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D49829

llvm-svn: 338402
2018-07-31 17:12:08 +00:00
Craig Topper d03d44e0b9 [X86] Add test cases that could use PMADDUBSW.
llvm-svn: 338401
2018-07-31 17:12:06 +00:00
Francis Visoiu Mistrih ae8002c1cf [X86] Preserve more liveness information in emitStackProbeInline
This commit fixes two issues with the liveness information after the
call:

1) The code always spills RCX and RDX if InProlog == true, which results
in an use of undefined phys reg.
2) FinalReg, JoinReg, RoundedReg, SizeReg are not added as live-ins to
the basic blocks that use them; therefore they are seen as undefined.

https://llvm.org/PR38376

Differential Revision: https://reviews.llvm.org/D50020

llvm-svn: 338400
2018-07-31 16:41:12 +00:00
Hsiangkai Wang 68c6860434 [DebugInfo] Fix build failed in 'clang-cmake-armv8-full'.
Builder clang-cmake-armv8-full failed because the assembly 'comment'
notation is not '#' on that target. So, I use CHECK-SAME to avoid
checking the comment notation on the same line in the test case.

llvm-svn: 338398
2018-07-31 16:22:09 +00:00
Ewan Crawford d83beb804c Fix InstCombine address space assert
Workaround bug where the InstCombine pass was asserting on the IR added in lit
test, where we have a bitcast instruction after a GEP from an addrspace cast.

The second bitcast in the test was getting combined into
`bitcast <16 x i32>* %0 to <16 x i32> addrspace(3)*`, which looks like it should
be an addrspace cast instruction instead. Otherwise if control flow is allowed
to continue as it is now we create a GEP instruction
`<badref> = getelementptr inbounds <16 x i32>, <16 x i32>* %0, i32 0`. However
because the type of this instruction doesn't match the address space we hit an
assert when replacing the bitcast with that GEP.

```
void llvm::Value::doRAUW(llvm::Value*, bool): Assertion `New->getType() == getType() && "replaceAllUses of value with new value of different type!"' failed.
```

Differential Revision: https://reviews.llvm.org/D50058

llvm-svn: 338395
2018-07-31 15:53:03 +00:00
Sanjay Patel a35781fdf9 [InstCombine] regenerate checks and add tests for D50035; NFC
llvm-svn: 338392
2018-07-31 15:07:32 +00:00
Anastasis Grammenos ac3f8028da [DebugInfo][LCSSA] Preserve debug location in lcssa phis
Summary:
When inserting lcssa Phi Nodes in the exit block
make sure to preserve the original instruction's DL.

Reviewers: vsk

Subscribers: JDevlieghere, llvm-commits

Differential Revision: https://reviews.llvm.org/D50009

llvm-svn: 338391
2018-07-31 14:54:52 +00:00
Hsiangkai Wang cbc58ada99 [DebugInfo] Generate DWARF debug information for labels.
There are two forms for label debug information in DWARF format.

1. Labels in a non-inlined function:

DW_TAG_label
  DW_AT_name
  DW_AT_decl_file
  DW_AT_decl_line
  DW_AT_low_pc

2. Labels in an inlined function:

DW_TAG_label
  DW_AT_abstract_origin
  DW_AT_low_pc

We will collect label information from DBG_LABEL. Before every DBG_LABEL,
we will generate a temporary symbol to denote the location of the label.
The symbol could be used to get DW_AT_low_pc afterwards. So, we create a
mapping between 'inlined label' and DBG_LABEL MachineInstr in DebugHandlerBase.
The DBG_LABEL in the mapping is used to query the symbol before it.

The AbstractLabels in DwarfCompileUnit is used to process labels in inlined
functions.

We also keep a mapping between scope and labels in DwarfFile to help to
generate correct tree structure of DIEs.

It also generates label debug information under global isel.
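
For illustration, a hypothetical C++ fragment (not taken from the patch) whose source-level label is the kind of construct the DW_TAG_label layout above describes; the comments map it to the attributes listed:

```
// The label "done" would be described by a DW_TAG_label whose DW_AT_name is
// "done", whose DW_AT_decl_file/DW_AT_decl_line point at this source line,
// and whose DW_AT_low_pc comes from the temporary symbol emitted before the
// corresponding DBG_LABEL.
int clampToOne(int x) {
  if (x <= 1)
    goto done;
  x = 1;
done:
  return x;
}

int main() { return clampToOne(5); }
```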

Differential Revision: https://reviews.llvm.org/D45556

llvm-svn: 338390
2018-07-31 14:48:32 +00:00
David Bolvansky ab79414f7b Revert Enrich inline messages
llvm-svn: 338389
2018-07-31 14:47:22 +00:00
Sanjay Patel 57d617d676 [InstCombine] auto-generate checks; NFC
llvm-svn: 338388
2018-07-31 14:27:30 +00:00
David Bolvansky b562dbabda Enrich inline messages
Summary:
This patch improves Inliner to provide causes/reasons for negative inline decisions.
1. It adds one new message field to InlineCost to report causes for Always and Never instances. All Never and Always instantiations must provide a simple message.
2. Several functions that used to return the inlining results as boolean are changed to return InlineResult which carries the cause for negative decision.
3. Changed remark printing and debug output messages to provide the additional messages and related inline cost.
4. Adjusted tests for changed printing.

Patch by: yrouban (Yevgeny Rouban)


Reviewers: craig.topper, sammccall, sgraenitz, NutshellySima, shchenz, chandlerc, apilipenko, javed.absar, tejohnson, dblaikie, sanjoy, eraman, xbolva00

Reviewed By: tejohnson, xbolva00

Subscribers: xbolva00, llvm-commits, arsenm, mehdi_amini, eraman, haicheng, steven_wu, dexonsmith

Differential Revision: https://reviews.llvm.org/D49412

llvm-svn: 338387
2018-07-31 14:25:24 +00:00
John Brawn cd5f37f3f1 [MemDep] Use PhiValuesAnalysis to improve alias analysis results
This is being done in order to make GVN able to better optimize certain inputs.
MemDep doesn't use PhiValues directly, but does need to notify it when things
get invalidated.

Differential Revision: https://reviews.llvm.org/D48489

llvm-svn: 338384
2018-07-31 14:19:29 +00:00
David Bolvansky 16d8a69b90 [InstSimplify] Fold another Select with And/Or pattern
Summary: Proof: https://rise4fun.com/Alive/L5J

Reviewers: lebedev.ri, spatel

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D49975

llvm-svn: 338383
2018-07-31 14:17:15 +00:00
Matt Arsenault a5ed032118 DAG: Fix PromoteFloatResult for fcanonicalize
llvm-svn: 338382
2018-07-31 14:15:22 +00:00
Alexey Bataev c0c3a6ed5e [SLP] Fix PR38339: Instruction does not dominate all uses!
Summary:
If the ExtractElement instructions can be optimized out during the
vectorization and we need to reshuffle the parent vector, this
ShuffleInstruction may be inserted in the wrong place causing compiler
to produce incorrect code.

Reviewers: spatel, RKSimon, mkuper, hfinkel, javed.absar

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D49928

llvm-svn: 338380
2018-07-31 14:02:43 +00:00
Matt Arsenault 4aec86d37a AMDGPU: Fold undef fcanonicalize to qNaN
We could choose a free 0 for this, but this
matches the behavior for fmul undef, 1.0. Also,
the NaN is more useful for folding user operations,
although if it's not eliminated it is more expensive
in terms of code size.

llvm-svn: 338376
2018-07-31 13:34:31 +00:00
Matt Arsenault c1335eaf7e AMDGPU: Fix test check line bugs
llvm-svn: 338374
2018-07-31 13:25:23 +00:00
Andrea Di Biagio a1852b6194 [llvm-mca][BtVer2] Teach how to identify dependency-breaking idioms.
This patch teaches llvm-mca how to identify dependency breaking instructions on
btver2.

An example of dependency breaking instructions is the zero-idiom XOR (example:
`XOR %eax, %eax`), which always generates zero regardless of the actual value of
the input register operands.
Dependency breaking instructions don't have to wait on their input register
operands before executing. This is because the computation is not dependent on
the inputs.

Not all dependency breaking idioms are also zero-latency instructions. For
example, `CMPEQ %xmm1, %xmm1` is independent of
the value of XMM1, and it generates a vector of all-ones.
That instruction is not eliminated at register renaming stage, and its opcode is
issued to a pipeline for execution. So, the latency is not zero. 

This patch adds a new method named isDependencyBreaking() to the MCInstrAnalysis
interface. That method takes as input an instruction (i.e. MCInst) and a
MCSubtargetInfo.
The default implementation of isDependencyBreaking() conservatively returns
false for all instructions. Targets may override the default behavior for
specific CPUs, and return a value which better matches the subtarget behavior.
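
As a rough illustration only, here is a self-contained toy model of the zero-idiom check (this is not the real MCInstrAnalysis API; the struct and register ids are made up):

```
#include <cassert>
#include <string>

// A zero idiom such as "xor %eax, %eax" or "sub %eax, %eax" breaks the
// dependency because its result never depends on the input register values.
struct ToyInst {
  std::string Opcode;
  int Src0, Src1;  // register ids of the two source operands
};

static bool isDependencyBreaking(const ToyInst &I) {
  return (I.Opcode == "xor" || I.Opcode == "sub") && I.Src0 == I.Src1;
}

int main() {
  assert(isDependencyBreaking({"xor", 0, 0}));   // xor %eax, %eax
  assert(!isDependencyBreaking({"xor", 0, 1}));  // xor %eax, %ecx
}
```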

In the future, we should teach TableGen how to automatically generate the body of
isDependencyBreaking from scheduling predicate definitions. This would allow us
to expose the knowledge about dependency breaking instructions to the machine
schedulers (and, potentially, other codegen passes).

Differential Revision: https://reviews.llvm.org/D49310

llvm-svn: 338372
2018-07-31 13:21:43 +00:00
Jonas Paulsson 2f12e45d5a [SystemZ] Improve decoding in case of instructions with four register operands.
Since z13, the max group size will be 2 if any μop has more than 3 register
sources.

This has been ignored so far in the SystemZHazardRecognizer, but is now
handled by recognizing those instructions and adjusting the tracking of
decoding and the cost heuristic for grouping.

Review: Ulrich Weigand
https://reviews.llvm.org/D49847

llvm-svn: 338368
2018-07-31 13:00:42 +00:00
Sanjay Patel 9a801cb598 [InstCombine] simplify code for A & (A ^ B) --> A & ~B
This fold was written in an odd way and tried to avoid
an endless loop by bailing out on all constants instead
of the supposedly problematic case of -1. But (X & -1) 
should always be simplified before we reach here, so I'm
not sure how that is a problem.
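
For reference, a standalone C++ check of the identity (illustrative, not part of the patch):

```
#include <cassert>
#include <cstdint>

int main() {
  // A & (A ^ B) keeps exactly the bits of A that are not set in B,
  // which is A & ~B.
  for (uint8_t a = 0; a < 16; ++a)
    for (uint8_t b = 0; b < 16; ++b)
      assert((a & (a ^ b)) == (a & ~b));
  return 0;
}
```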

There were no tests for the commuted patterns, so I added
those at rL338364.

llvm-svn: 338367
2018-07-31 13:00:03 +00:00
Sanjay Patel 995138ce60 [InstCombine] move/add tests for xor+add fold; NFC
llvm-svn: 338364
2018-07-31 12:31:00 +00:00
Martin Storsjo 293079f2de [ARM] Allow automatically deducing the thumb instruction size for .inst
This matches GAS, that allows unsuffixed .inst for thumb.

Differential Revision: https://reviews.llvm.org/D49937

llvm-svn: 338357
2018-07-31 09:27:07 +00:00
Martin Storsjo af18947f0a [ARM] Support the .inst directive for MachO and COFF targets
Contrary to ELF, we don't add any markers that distinguish data generated
with .short/.long from normal instructions, so the .inst directive only
adds compatibility with assembly that uses it.

Differential Revision: https://reviews.llvm.org/D49936

llvm-svn: 338356
2018-07-31 09:27:01 +00:00
Martin Storsjo 3e3d39d07e [AArch64] Support the .inst directive for MachO and COFF targets
Contrary to ELF, we don't add any markers that distinguish data generated
with .long from normal instructions, so the .inst directive only adds
compatibility with assembly that uses it.

Differential Revision: https://reviews.llvm.org/D49935

llvm-svn: 338355
2018-07-31 09:26:52 +00:00
Sam Parker 2a6c842fda [ARM] Revert r337821
Re-enabling ARMCodeGenPrepare by default after failing to reproduce
the bootstrap issues that I was concerned it was causing.

llvm-svn: 338354
2018-07-31 09:04:14 +00:00
Hiroshi Inoue 2f6769be60 [InstSimplify] tests for D48828, D49981: fold extraction from std::pair
Minor touch up in the previous comment.

llvm-svn: 338351
2018-07-31 05:29:20 +00:00
Hiroshi Inoue 5427d3efc2 [InstSimplify] tests for D48828, D49981: fold extraction from std::pair
Updated unit tests for D48828 and D49981.

llvm-svn: 338350
2018-07-31 05:10:36 +00:00
Reid Kleckner d2bad6c639 Revert r338340 "[MS Demangler] Better demangling of template arguments."
Breaks the build with GCC, apparently.

llvm-svn: 338344
2018-07-31 01:08:42 +00:00
Craig Topper 9164b9b16e [X86] Stop accidentally running the Bonnell LEA fixup path on Goldmont.
In one place we checked X86Subtarget.slowLEA() to decide if the pass should run. But to decide what the pass should do, we only checked isSLM. This resulted in Goldmont going down the Bonnell path.

llvm-svn: 338342
2018-07-31 00:43:54 +00:00
Ana Pazos 2baa767455 [RISCV] Fixed test case failure due to r338047
llvm-svn: 338341
2018-07-31 00:36:28 +00:00
Zachary Turner 4f85809a84 [MS Demangler] Better demangling of template arguments.
This patch fixes demangling of template aliases as template-template
arguments, and also fixes function pointers and references as
non-type template parameters.  All of these can be properly
demangled now, so I've ported over the test
clang/test/CodeGenCXX/ms-template-callbacks.cpp.  All of these
tests pass.

llvm-svn: 338340
2018-07-31 00:26:52 +00:00
Amara Emerson 1e8c164c63 [AArch64][GlobalISel] Add isel support for G_BLOCK_ADDR.
Also refactors some existing code to materialize addresses for the large code
model so it can be shared between G_GLOBAL_VALUE and G_BLOCK_ADDR.

This implements PR36390.

Differential Revision: https://reviews.llvm.org/D49903

llvm-svn: 338337
2018-07-31 00:09:02 +00:00
Amara Emerson 0e86c07077 [AArch64][GlobalISel] Make G_BLOCK_ADDR legal.
Differential Revision: https://reviews.llvm.org/D49902

llvm-svn: 338336
2018-07-31 00:08:56 +00:00
Amara Emerson 6aff5a7810 [GlobalISel] Add a G_BLOCK_ADDR opcode to handle IR blockaddress constants.
Differential Revision: https://reviews.llvm.org/D49900

llvm-svn: 338335
2018-07-31 00:08:50 +00:00
Zachary Turner a51403f5cc [MS Demangler] Add ms-return-qualifiers.test.
This is a copy of the tests from
clang/CodeGenCXX/ms-return-qualifiers.cpp converted to demangling
tests.

llvm-svn: 338330
2018-07-30 23:22:39 +00:00
Zachary Turner 931e879cef [MS Demangler] Add rudimentary C++11 Support
This patch adds support for demangling r-value references, new
operators such as the ""_foo operator, lambdas, alias types,
nullptr_t, and various other C++11'isms.

There is 1 failing test remaining in this file, which appears to
be related to back-referencing. This type of problem has the
potential to get ugly so I'd rather fix it in a separate patch.

Differential Revision: https://reviews.llvm.org/D50013

llvm-svn: 338324
2018-07-30 23:02:10 +00:00
Sanjay Patel 9f807f44b1 [DAGCombiner] transform sub-of-shifted-signbit to add
This is exchanging a sub-of-1 with add-of-minus-1:
https://rise4fun.com/Alive/plKAH

This is another step towards improving select-of-constants codegen (see D48970).

x86 is the motivating target, and those diffs all appear to be wins. PPC and AArch64 look neutral.
I've limited this to early combining (!LegalOperations) in case a target wants to reverse it, but
I think canonicalizing to 'add' is more likely to produce further transforms because we have more
folds for 'add'.

Differential Revision: https://reviews.llvm.org/D49924

llvm-svn: 338317
2018-07-30 22:21:37 +00:00
David Bolvansky 6737b3a6a1 [InstCombine] Fold Select with binary op
Summary:
Fold
  %A = icmp eq i8 %x, 0
  %B = xor i8 %x, %z
  %C = select i1 %A, i8 %B, i8 %y
To
  %C = select i1 %A, i8 %z, i8 %y

Fixes https://bugs.llvm.org/show_bug.cgi?id=38345
Proof: https://rise4fun.com/Alive/43J

Reviewers: lebedev.ri, spatel

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D49954

llvm-svn: 338300
2018-07-30 20:38:53 +00:00
Vlad Tsyrklevich 1c7160e85f Revert "[GVNHoist] Re-enable GVNHoist by default"
This reverts commit r338240 because it was causing OOMs on the UBSan
buildbot when building clang/lib/Sema/SemaChecking.cpp

llvm-svn: 338297
2018-07-30 20:07:33 +00:00
Manoj Gupta 9d83ce9043 [Inline] Copy "null-pointer-is-valid" attribute in caller.
Summary:
Normally, inlining does not happen if the caller does not have the
"null-pointer-is-valid"="true" attribute but the callee has it.

However, alwaysinline may force callee to be inlined.
In this case, if the caller has the "null-pointer-is-valid"="true"
attribute, copy the attribute to caller.

Reviewers: efriedma, a.elovikov, lebedev.ri, jyknight

Reviewed By: efriedma

Subscribers: eraman, llvm-commits

Differential Revision: https://reviews.llvm.org/D50000

llvm-svn: 338292
2018-07-30 19:33:53 +00:00
David Bolvansky 76e95865f3 [InstSimplify] [NFC] Tests for Select with AND/OR fold
llvm-svn: 338285
2018-07-30 18:22:18 +00:00
Jessica Paquette fa3bee4756 [MachineOutliner][AArch64] Add support for saving LR to a register
This teaches the outliner to save LR to a register rather than the stack when
possible. This allows us to avoid bumping the stack in outlined functions in
some cases. By doing this, in a later patch, we can teach the outliner to do
something like this:

f1:
  ...
  bl OUTLINED_FUNCTION
  ...

f2:
  ...
  move LR's contents to a register
  bl OUTLINED_FUNCTION
  move the register's contents back

instead of falling back to saving LR in both cases.

llvm-svn: 338278
2018-07-30 17:45:28 +00:00
Jessica Paquette bbcc8895bb Add machine verifier to arm64-opt-remarks-lazy-bfi
Previously, I thought this was a Windows failure. Then I realized it failed on
every bot that used the verifier. This makes it use the verifier always, and
adds that pass to the pipeline checks so that it's consistent across all bots.

llvm-svn: 338272
2018-07-30 17:13:25 +00:00
David Bolvansky 2fa7fb14ea [DAGCombiner] Bug 31275- Extract a shift from a constant mul or udiv if a rotate can be formed
Summary:
Attempt to extract a shrl from a udiv or a shl from a mul if this allows a rotate to be formed.  This targets cases where the input to a rotate pattern was a mul or udiv by a constant and InstCombine merged one of the shifts with the op.
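
As a hedged illustration of the kind of source pattern this targets (the helper, constants, and shift bounds are made up; the exact DAG the combine matches may differ): InstCombine can merge the constant right shift of the rotate with the division, hiding the rotate, and the combine pulls the shift back out so a single rotate can be formed.

```
#include <cassert>
#include <cstdint>

// A rotate of a value that was first divided by a constant.
static uint32_t rotlOfDiv(uint32_t x, unsigned r) {
  uint32_t v = x / 8;                 // udiv by a constant (x >> 3)
  return (v << r) | (v >> (32 - r));  // classic rotate-left pattern, 0 < r < 32
}

int main() {
  assert(rotlOfDiv(0x80000010u, 4) == 0x21u);
}
```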

Patch by: sameconrad (Sam Conrad)

Reviewers: RKSimon, craig.topper, spatel, lebedev.ri, javed.absar

Reviewed By: lebedev.ri

Subscribers: efriedma, kparzysz, llvm-commits

Differential Revision: https://reviews.llvm.org/D47681

llvm-svn: 338270
2018-07-30 16:50:00 +00:00
Thomas Preud'homme 196149c943 Reapply "Fix crash on inline asm with 64bit matching input in 32bit GPR"
This reapplies commit r338206 reverted by r338214 since the bug that
r338206 uncovered has been fixed in r338268.

Add support for inline assembly with matching input operand that do not
naturally go in the register class it is constrained to (eg. double in a
32-bit GPR). Note that regular input is already handled by existing
code.

llvm-svn: 338269
2018-07-30 16:48:39 +00:00
Thomas Preud'homme 6c1b075299 Fix uninitialized read in ARM's PrintAsmOperand
Summary:
Fix read of uninitialized RC variable in ARM's PrintAsmOperand when
hasRegClassConstraint returns false. This was causing
inline-asm-operand-implicit-cast test to fail in r338206.

Reviewers: t.p.northover, weimingz, javed.absar, chill

Reviewed By: chill

Subscribers: chill, eraman, kristof.beyls, chrib, llvm-commits

Differential Revision: https://reviews.llvm.org/D49984

llvm-svn: 338268
2018-07-30 16:45:40 +00:00
Jessica Paquette 7816531f3c Attempt to fix Windows test failure caused by r338133
It seems like the pass pipeline on Windows is slightly different than on Linux
and macOS. As a result, the arm64-opt-remarks-lazy-bfi test has been failing.

This switches a CHECK-NEXT to a CHECK-DAG to try and get this running properly
again.

It'd be nice to switch it back to a CHECK-NEXT if possible, but the CHECK-NEXT
lines following the line we care about (the optimization remark emitter)
do a pretty good job of enforcing the ordering we want.

Hopefully this works, since I don't have a Windows machine. ;)

Example failure: http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win/builds/11295

llvm-svn: 338267
2018-07-30 16:36:22 +00:00
Evandro Menezes a7d48286fb [SLC] Refactor the simplication of pow() (NFC)
Use more meaningful variable names.  Mostly NFC.

llvm-svn: 338266
2018-07-30 16:20:04 +00:00
Simon Pilgrim 186b62c9e4 [X86] Regenerate NOBMI/BMI combine-select tests.
Test cleanup for D38128

llvm-svn: 338265
2018-07-30 16:18:38 +00:00
Simon Pilgrim 2d5118432b [X86] Regenerate PKU test to merge 32/64-bit rdpkru checks
Test cleanup for D38128

llvm-svn: 338264
2018-07-30 16:15:18 +00:00
Simon Pilgrim 22ff9f94bb [X86] Regenerate fast-isel tests.
Test cleanup for D38128

llvm-svn: 338262
2018-07-30 16:13:40 +00:00
Sander de Smalen e64206a02c [AArch64][SVE] Asm: Enable instructions to be prefixed.
This patch enables instructions that are destructive on their
destination- and first source operand, to be prefixed with a
MOVPRFX instruction.

This patch also adds a variety of tests:

- positive tests for all instructions and forms that accept a
  movprfx for either or both predicated and unpredicated forms.

- negative tests for all instructions and forms that do not accept
  an unpredicated or predicated movprfx.

- negative tests for the diagnostics that get emitted when a MOVPRFX
  instruction is used incorrectly.

This is patch [2/2] in a series to add MOVPRFX instructions:
- Patch [1/2]: https://reviews.llvm.org/D49592
- Patch [2/2]: https://reviews.llvm.org/D49593

Reviewers: rengolin, SjoerdMeijer, samparker, fhahn, javed.absar

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D49593

llvm-svn: 338261
2018-07-30 16:05:45 +00:00
David Bolvansky 403826ee0f [InstCombine] [NFC] Added tests for Select with binop fold
llvm-svn: 338257
2018-07-30 15:38:42 +00:00
Krzysztof Parzyszek 24fae50905 [Hexagon] Simplify A4_rcmp[n]eqi R, 0
Consider cases when register R is known to be zero/non-zero, or when it
is defined by a C2_muxii instruction.

llvm-svn: 338251
2018-07-30 14:28:02 +00:00
John Brawn 898cd398d3 Adjust opt pass pipeline tests to cope with combination of r338240 and r338242
The combination of r338240 and r338242 causes the opt pass pipeline tests to
fail because of how r338242 makes BasicAA be invalidated more often. Adjust the
tests to reflect this.

llvm-svn: 338250
2018-07-30 14:26:24 +00:00
Matt Arsenault de496c32a4 AMDGPU: Reduce code size with fcanonicalize (fneg x)
When fcanonicalize is lowered to a mul, we can
use -1.0 for free and avoid the cost of the bigger
encoding for source modifers.

llvm-svn: 338244
2018-07-30 12:16:58 +00:00
Matt Arsenault f3c9a34def AMDGPU: Make fneg combine handle fcanonicalize
llvm-svn: 338243
2018-07-30 12:16:47 +00:00
John Brawn cd73fe8989 [BasicAA] Use PhiValuesAnalysis if available when handling phi alias
By using PhiValuesAnalysis we can get all the values reachable from a phi, so
we can be more precise instead of giving up when a phi has phi operands. We
can't make BasicAA directly use PhiValuesAnalysis though, as the user of
BasicAA may modify the function in ways that PhiValuesAnalysis can't cope with.

For this optional usage to work correctly BasicAAWrapperPass now needs to be not
marked as CFG-only (i.e. it is now invalidated even when CFG is preserved) due
to how the legacy pass manager handles dependent passes being invalidated,
namely the depending pass still has a pointer to the now-dead dependent pass.

Differential Revision: https://reviews.llvm.org/D44564

llvm-svn: 338242
2018-07-30 11:52:08 +00:00
Alexandros Lamprineas de3ca964c1 [GVNHoist] Re-enable GVNHoist by default
My initial motivation for this came from https://reviews.llvm.org/D48122,
where it was pointed out that my change didn't fit well in SimplifyCFG and
therefore using GVNHoist was a better way to go. GVNHoist has been disabled
for a while as there was a list of bugs related to it.

I have fixed the following bugs:

https://bugs.llvm.org/show_bug.cgi?id=37808 -> https://reviews.llvm.org/D48372 (rL337149)
https://bugs.llvm.org/show_bug.cgi?id=36787 -> https://reviews.llvm.org/D49555 (rL337674)
https://bugs.llvm.org/show_bug.cgi?id=37445 -> https://reviews.llvm.org/D49425 (rL337680)

The next two bugs no longer occur, and it's unclear which commit fixed them:

https://bugs.llvm.org/show_bug.cgi?id=36635
https://bugs.llvm.org/show_bug.cgi?id=37791

I investigated this one and it proved to be unrelated to GVNHoist; it is a genuine bug in NewGVN:

https://bugs.llvm.org/show_bug.cgi?id=37660

To convince myself GVNHoist is in a good state I made a successful bootstrap build of LLVM.
Merging this change now in order to make it to the LLVM 7.0.0 branch.

Differential Revision: https://reviews.llvm.org/D49858

llvm-svn: 338240
2018-07-30 10:50:18 +00:00
Francis Visoiu Mistrih 7d003657de [MachineOutliner][X86] Use TAILJMPd64 instead of JMP_1 for TailCall construction
The machine verifier asserts with:

Assertion failed: (isMBB() && "Wrong MachineOperand accessor"), function getMBB, file ../include/llvm/CodeGen/MachineOperand.h, line 542.

It calls analyzeBranch which tries to call getMBB if the opcode is
JMP_1, but in this case we do:

JMP_1 @OUTLINED_FUNCTION

I believe we have to use TAILJMPd64 instead of JMP_1 since JMP_1 is used
with brtarget8.

Differential Revision: https://reviews.llvm.org/D49299

llvm-svn: 338237
2018-07-30 09:59:33 +00:00
Nicolai Haehnle 7f0d05d532 AMDGPU: Force skip over s_sendmsg and exp instructions
Summary:
These instructions interact with hardware blocks outside the shader core,
and they can have "scalar" side effects even when EXEC = 0. We don't
want these scalar side effects to occur when all lanes want to skip
these instructions, so always add the execz skip branch instruction
for basic blocks that contain them.

Also ensure that we skip scalar stores / atomics, though we don't
code-gen those yet.

Reviewers: arsenm, rampitec

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D48431

Change-Id: Ieaeb58352e2789ffd64745603c14970c60819d44
llvm-svn: 338235
2018-07-30 09:23:59 +00:00
Petr Pavlu 8b6eff4e77 [ARM] Fix over-alignment in arguments that are HA of 128-bit vectors
Code in `CC_ARM_AAPCS_Custom_Aggregate()` is responsible for handling
homogeneous aggregates for `CC_ARM_AAPCS_VFP`. When an aggregate ends up
fully on stack, the function tries to pack all resulting items of the
aggregate as tightly as possible according to AAPCS.

Once the first item was laid out, the alignment used for consecutive
items was the size of one item. This logic went wrong for 128-bit
vectors because their alignment is normally only 64 bits, and so could
result in inserting unexpected padding between the first and second
element.

The patch fixes the problem by updating the alignment with the item size
only if this results in reducing it.

Differential Revision: https://reviews.llvm.org/D49720

llvm-svn: 338233
2018-07-30 08:49:30 +00:00
Zachary Turner 71c91f948a [MS Demangler] Demangle symbols in function scopes.
There are a couple of issues you run into when you start getting into
more complex names, especially with regards to function local statics.
When you've got something like:

    int x() {
      static int n = 0;
      return n;
    }

Then this needs to demangle to something like

    int `int __cdecl x()'::`1'::n

The nested mangled symbols (e.g. `int __cdecl x()` in the above
example) also share state with regards to back-referencing, so
we need to be able to re-use the demangler in the middle of
demangling a symbol while sharing back-ref state.

To make matters more complicated, there are a lot of ambiguities
when demangling a symbol's qualified name, because a function local
scope pattern (usually something like `?1??name?`) looks suspiciously
like many other possible things that can occur, such as `?1` meaning
the second back-ref and disambiguating these cases is rather
interesting.  The `?1?` in a local scope pattern is actually a special
case of the more general pattern of `? + <encoded number> + ?`, where
"encoded number" can itself have embedded `@` symbols, which is a
common delimeter in mangled names.  So we have to take care during the
disambiguation, which is the reason for the overly complicated
`isLocalScopePattern` function in this patch.

I've added some pretty obnoxious tests to exercise all of this, which
exposed several other problems related to back-referencing, so those
are fixed here as well. Finally, I've uncommented some tests that were
previously marked as `FIXME`, since now these work.

Differential Revision: https://reviews.llvm.org/D49965

llvm-svn: 338226
2018-07-30 03:12:34 +00:00
Sanjay Patel 577c705752 [InstCombine] try to fold 'add+sub' to 'not+add'
These are reassociated versions of the same pattern and
similar transforms as in rL338200 and rL338118.

The motivation is identical to those commits:
Patterns with add/sub combos can be improved using
'not' ops. This is better for analysis and may lead
to follow-on transforms because 'xor' and 'add' are
commutative/associative. It can also help codegen.

llvm-svn: 338221
2018-07-29 18:13:16 +00:00
Sanjay Patel 2daf28f9ce [InstCombine] add tests for another sub-not variant; NFC
llvm-svn: 338220
2018-07-29 18:07:28 +00:00
Sanjay Patel 54421ce918 [InstSimplify] fold funnel shifts with 0-shift amount
llvm-svn: 338218
2018-07-29 16:36:38 +00:00
Sanjay Patel 46af5835af [InstSimplify] add tests for funnel shift intrinsics; NFC
llvm-svn: 338217
2018-07-29 16:27:17 +00:00
Jonas Devlieghere ae1727e3dd [dsymutil] Simplify temporary file handling.
Dsymutil's update functionality was broken on Windows because we tried
to rename a file while we're holding open handles to that file. TempFile
provides a solution for this through its keep(Twine) method. This patch
changes dsymutil to make use of that functionality.

Differential revision: https://reviews.llvm.org/D49860

llvm-svn: 338216
2018-07-29 14:56:15 +00:00
Sanjay Patel 7312206f2f revert r338206 because the test does not pass
Example of bot failure:
http://lab.llvm.org:8011/builders/clang-cmake-armv8-quick/builds/5107/steps/ninja%20check%201/logs/FAIL%3A%20LLVM%3A%3Ainline-asm-operand-implicit-cast.ll

llvm-svn: 338214
2018-07-29 14:30:49 +00:00
Sander de Smalen ad88a99956 [AArch64][SVE] Asm: Support for WHILE(LE|LO|LS|LT) instructions.
The WHILE instructions generate a predicate that is true while the 
comparison of the first scalar operand (incremented for each predicate
element) with the second scalar operand is true and false thereafter.

  WHILELE  While incrementing signed scalar less than or equal to scalar
  WHILELO  While incrementing unsigned scalar lower than scalar
  WHILELS  While incrementing unsigned scalar lower than or same as scalar
  WHILELT  While incrementing signed scalar less than scalar

e.g.

  whilele  p0.s, x0, x1

  generates predicate p0 (for 32bit elements) by incrementing
  (signed) x0 and comparing that vector to splat(x1).

llvm-svn: 338211
2018-07-29 08:51:08 +00:00
Sander de Smalen e70ed3187c [AArch64][SVE] Asm: Instructions to perform serialized operations.
The instructions added in this patch permit active elements within
a vector to be processed sequentially without unpacking the vector.

  PFIRST      Set the first active element to true.
  PNEXT       Find next active element in predicate.
  CTERMEQ     Compare and terminate loop when equal.
  CTERMNE     Compare and terminate loop when not equal.

llvm-svn: 338210
2018-07-29 08:00:16 +00:00
Thomas Preud'homme 74ffd14e15 Fix crash on inline asm with 64bit matching input in 32bit GPR
Add support for inline assembly with matching input operand that do not
naturally go in the register class it is constrained to (eg. double in a
32-bit GPR). Note that regular input is already handled by existing
code.

llvm-svn: 338206
2018-07-28 21:33:39 +00:00
David Bolvansky f801a58c0d [InstCombine] Tests for fold Select with binary op
Differential Revision: https://reviews.llvm.org/D49961

llvm-svn: 338201
2018-07-28 17:13:33 +00:00
Sanjay Patel 818b253d3a [InstCombine] try to fold 'sub' to 'not'
https://rise4fun.com/Alive/jDd

Patterns with add/sub combos can be improved using
'not' ops. This is better for analysis and may lead
to follow-on transforms because 'xor' and 'add' are 
commutative/associative. It can also help codegen.  

llvm-svn: 338200
2018-07-28 16:48:44 +00:00
Sander de Smalen 5b3a289424 [AArch64][SVE] Asm: Support for PFALSE and PTEST instructions.
This patch adds PFALSE (unconditionally sets all elements of
the predicate to false) and PTEST (set the status flags for the
predicate).

llvm-svn: 338198
2018-07-28 14:18:11 +00:00
Matt Arsenault 8f9dde94b7 AMDGPU: Stop wasting argument registers with v3i32/v3f32
SelectionDAGBuilder widens v3i32/v3f32 arguments
to v4i32/v4f32 which consume an additional register.
In addition to wasting argument space, this produces extra
instructions since now it appears the 4th vector component has
a meaningful value to most combines.

llvm-svn: 338197
2018-07-28 14:11:34 +00:00
Sander de Smalen 3878bf83dd [AArch64][SVE] Asm: Data-dependent loop predicate partitioning instructions.
This patch adds support for instructions that partition a predicate
based on data-dependent termination conditions in a loop.

  BRKA      Break after the first true condition
  BRKAS     Break after the first true condition, setting condition flags
  BRKB      Break before the first true condition
  BRKBS     Break before the first true condition, setting condition flags

  BRKPA     Break after the first true condition, propagating from the 
            previous partition
  BRKPAS    Break after the first true condition, propagating from the 
            previous partition, setting condition flags
  BRKPB     Break before the first true condition, propagating from the 
            previous partition
  BRKPBS    Break before the first true condition, propagating from the 
            previous partition, setting condition flags

  BRKN      Propagate break to next partition
  BRKNS     Propagate break to next partition, setting condition flags

llvm-svn: 338196
2018-07-28 14:04:52 +00:00
David Bolvansky 9800b710c2 [InstSimplify] Moved Select + AND/OR tests from InstCombine
Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D49957

llvm-svn: 338195
2018-07-28 13:52:45 +00:00
Matt Arsenault 72b0e38b26 AMDGPU: Stop trying to extend arguments for clover
This was trying to replace i8/i16 arguments with i32, which
was broken and no longer necessary.

llvm-svn: 338193
2018-07-28 12:34:25 +00:00
David Green fc4b0fe0a2 [GlobalOpt] Test array indices inside structs for out-of-bounds accesses
We now, from clang, can turn arrays of
  static short g_data[] = {16, 16, 16, 16, 16, 16, 16, 16, 0, 0, 0, 0, 0, 0, 0, 0};
into structs of the form
  @g_data = internal global <{ [8 x i16], [8 x i16] }> ...

GlobalOpt will incorrectly SROA it, not realising that the access to the first
element may overflow into the second. This fixes it by checking geps more
thoroughly.

I believe this makes the globalsra-partial.ll test case invalid as the %i value
could be out of bounds. I've re-purposed it as a negative test for this case.

Differential Revision: https://reviews.llvm.org/D49816

llvm-svn: 338192
2018-07-28 08:20:10 +00:00
David Bolvansky f947608ddf [InstCombine] Fold Select with AND/OR condition
Summary:
Fold
```
%A = icmp ne i8 %X, %V1
%B = icmp ne i8 %X, %V2
%C = or i1 %A, %B
%D = select i1 %C, i8 %X, i8 %V1
ret i8 %D
  =>
ret i8 %X

Fixes https://bugs.llvm.org/show_bug.cgi?id=38334
Proof: https://rise4fun.com/Alive/plI8

Reviewers: spatel, lebedev.ri

Reviewed By: lebedev.ri

Subscribers: craig.topper, llvm-commits

Differential Revision: https://reviews.llvm.org/D49919

llvm-svn: 338191
2018-07-28 06:55:51 +00:00
Craig Topper 50b1d4303d [DAGCombiner] Teach DAG combiner that A-(B-C) can be folded to A+(C-B)
This can be useful since addition is commutative, and subtraction is not.

This matches a transform that is also done by InstCombine.
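
For reference, a standalone C++ check of the identity (illustrative only; unsigned arithmetic is used so the wrapping is well defined):

```
#include <cassert>
#include <cstdint>

int main() {
  // A - (B - C) == A + (C - B); the rewritten form exposes a commutative add.
  uint32_t A = 100, B = 7, C = 30;
  assert(A - (B - C) == A + (C - B));
  return 0;
}
```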

llvm-svn: 338181
2018-07-28 00:27:25 +00:00
Wouter van Oortmerssen a90d24da1c Revert "[WebAssembly] Added default stack-only instruction mode for MC."
This reverts commit d3c9af4179eae7793d1487d652e2d4e23844555f.
(SVN revision 338164)

llvm-svn: 338176
2018-07-27 23:19:51 +00:00
Craig Topper c3e11bf3f7 [X86] Add support expanding multiplies by constant where the constant is -3/-5/-9 multplied by a power of 2.
These can be replaced with an LEA, a shift, and a negate. This seems to match what gcc and icc would do.
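
For illustration, a standalone C++ sketch of the scalar identity for one such constant, -40 = -5 * 2^3 (the function name is made up; unsigned arithmetic keeps the wrap well defined):

```
#include <cassert>
#include <cstdint>

// x * -40 == -((x * 5) << 3): the x*5 part maps to an LEA on x86,
// the <<3 to a shift, and the outer negation to a NEG.
static uint32_t mulByMinus40(uint32_t x) {
  return -((x * 5u) << 3);
}

int main() {
  for (uint32_t x = 0; x < 1000; x += 37)
    assert(mulByMinus40(x) == x * static_cast<uint32_t>(-40));
}
```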

llvm-svn: 338174
2018-07-27 23:04:59 +00:00
Reid Kleckner ba82788ff6 [InstrProf] Don't register __llvm_profile_runtime_user
Refactor some FileCheck prefixes while I'm at it.

Fixes PR38340

llvm-svn: 338172
2018-07-27 22:21:35 +00:00
Wouter van Oortmerssen a67c4137c3 [WebAssembly] Added default stack-only instruction mode for MC.
Summary:
Moved Explicit Locals pass to last.
Made that pass obligatory.
Made it convert from register to stack based instructions, and removed the registers.
Fixes to related code that was expecting register based instructions.
Added the correct testing flag to all tests, depending on what
format they were expecting so far.
Translated one test to stack format as example: reg-stackify-stack.ll

tested:
llvm-lit -v `find test -name WebAssembly`
unittests/MC/*

Reviewers: dschuff, sunfish

Subscribers: sbc100, jgravelle-google, eraman, aheejin, llvm-commits

Differential Revision: https://reviews.llvm.org/D49160

llvm-svn: 338164
2018-07-27 20:56:43 +00:00
David Bolvansky 173484d78c [InstCombine] [NFC] [Tests] Fold Select with AND/OR condition - fixed
Differential Revision: https://reviews.llvm.org/D49933

llvm-svn: 338161
2018-07-27 20:29:32 +00:00
Jessica Paquette f90edbe3d6 Recommit "Enable MachineOutliner by default under -Oz for AArch64"
Fixed the ASAN failure from before in r338148, so recommiting.

This patch enables the MachineOutliner by default in AArch64 under -Oz.

The MachineOutliner offers around a 4.5% improvement on the current -Oz code
size improvements.

We have done work into improving the debuggability of outlined code, so that
users of -Oz won't be surprised by the optimization. We have also been executing
the LLVM test suite and common external tests such as the SPEC suites
continuously with no issue. The outliner has a low compile-time overhead of
roughly 1%. At this point, the outliner would be a really good addition to the
-Oz pass pipeline!

llvm-svn: 338160
2018-07-27 20:18:27 +00:00
David Bolvansky 1b82617473 [InstCombine] [NFC] [Tests] Fold Select with AND/OR condition
Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D49932

llvm-svn: 338159
2018-07-27 20:18:12 +00:00
Evandro Menezes f611ce82c0 [SLC] Test simplification of pow(x, 0.333...) to cbrt(x) (NFC)
Add test case for simplifying `pow(x, 0.333...)` into `cbrt(x)`, which
D49040 enables.

llvm-svn: 338152
2018-07-27 18:56:47 +00:00
Sanjay Patel 06c7d5aef6 [AArch64, PowerPC, x86] add more signbit math tests; NFC
The tests with a constant sub operand were added with rL338143,
but the potential transform doesn't have that requirement, so
adding more tests with variable operands.

llvm-svn: 338150
2018-07-27 18:31:21 +00:00
Evandro Menezes fcca45f0dd [ARM] Add new target feature to fuse literal generation
This feature enables the fusion of such operations on Cortex A57 and Cortex
A72, as recommended in their Software Optimisation Guides, sections 4.14 and
4.11, respectively.

Differential revision: https://reviews.llvm.org/D49563

llvm-svn: 338147
2018-07-27 18:16:47 +00:00
Sanjay Patel efac39eef6 [AArch64, PowerPC, x86] add more signbit math tests; NFC
llvm-svn: 338143
2018-07-27 18:12:29 +00:00
Jessica Paquette faea2d3130 Revert "Enable MachineOutliner by default under -Oz for AArch64"
It failed an Asan test on a bot:

http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/21543/steps/check-llvm%20asan/logs/stdio

Fixing that before recommitting.

llvm-svn: 338136
2018-07-27 17:25:38 +00:00
Yonghong Song 04ccfda075 bpf: add missing RegState to notify MachineInstr verifier necessary register usage
Errors like the following are reported by:

  http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win/builds/11261

  *** Bad machine code: Explicit definition marked as use ***
  - function:    cal_align1
  - basic block: %bb.0 entry (0x47edd98)
  - instruction: LDB $r3, $r2, 0
  - operand 0:   $r3

This is because RegState info was missing for ScratchReg inside
expandMEMCPY. This passed incomplete register usage information to the
MachineInstr verifier, which then complained: there could be a potential
code-gen issue if the flagged MachineInstr were used in a place where
register usage information matters, even though the memcpy expansion is
not such a case, as it happens at the last stage of the IR optimization
pipeline.

We should always specify the register usage information that the compiler
couldn't deduce automatically whenever we add a hardware register manually.

Reported-by: Builder llvm-clang-x86_64-expensive-checks-win Build #11261
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Yonghong Song <yhs@fb.com>
llvm-svn: 338134
2018-07-27 16:58:52 +00:00
Jessica Paquette d4229b985c Enable MachineOutliner by default under -Oz for AArch64
This patch enables the MachineOutliner by default in AArch64 under -Oz.

The MachineOutliner offers around a 4.5% improvement on the current -Oz code
size improvements.

We have done work into improving the debuggability of outlined code, so that
users of -Oz won't be surprised by the optimization. We have also been executing
the LLVM test suite and common external tests such as the SPEC suites
continuously with no issue. The outliner has a low compile-time overhead of
roughly 1%. At this point, the outliner would be a really good addition to the
-Oz pass pipeline!

llvm-svn: 338133
2018-07-27 16:44:42 +00:00
Sanjay Patel c7abb416dc [DAGCombiner] fold 'not' with signbit math
This is a follow-up suggested in D48970. 

Alive proofs:
https://rise4fun.com/Alive/sII

We can eliminate an instruction in the usual select-of-constants 
to bit hack transform by adjusting the add/sub with constant.
This is always a win. 
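
For reference, a standalone C++ sketch of the underlying bit hack (illustrative only; it assumes an arithmetic right shift and a non-overflowing A - B, and the constants are made up):

```
#include <cassert>
#include <cstdint>

// The "bit hack" form of a select of constants: for 32-bit x,
//   x < 0 ? A : B   ==   ((x >> 31) & (A - B)) + B
// because x >> 31 (arithmetic) is -1 for negative x and 0 otherwise.
static int32_t selectOnSign(int32_t x, int32_t A, int32_t B) {
  return ((x >> 31) & (A - B)) + B;
}

int main() {
  assert(selectOnSign(-5, 10, 20) == 10);
  assert(selectOnSign(7, 10, 20) == 20);
}
```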

There are more transforms that are likely wins, but they may need 
target hooks in case some targets do not benefit. 

This is another step towards making up for canonicalizing to 
select-of-constants in rL331486.

llvm-svn: 338132
2018-07-27 16:42:55 +00:00
Sanjay Patel 1812d33e22 [x86] add more tests for signbit math; NFC
llvm-svn: 338131
2018-07-27 16:22:40 +00:00
Sanjay Patel 60c04b961e [PowerPC] add more tests for signbit math; NFC
llvm-svn: 338130
2018-07-27 16:22:18 +00:00
Sanjay Patel f815bc658b [AArch64] add more tests for signbit math; NFC
llvm-svn: 338129
2018-07-27 16:21:56 +00:00
Jan Vesely 6ff58ed5ca AMDGPU/R600: Add MOV instructions to BFE patterns
R600 can't handle immediates for BFE, these will be eliminated later.
Fixes powr/pow regressions in r600 since r334817

Differential Revision: https://reviews.llvm.org/D49641

llvm-svn: 338127
2018-07-27 15:00:13 +00:00
Sander de Smalen a703b8dc71 [AArch64][SVE] Asm: Predicated integer reductions.
This patch adds support for various integer reduction operations:

  SADDV    signed add reduction to scalar
  UADDV    unsigned add reduction to scalar

  SMAXV    signed maximum reduction to scalar
  SMINV    signed minimum reduction to scalar
  UMAXV    unsigned maximum reduction to scalar
  UMINV    unsigned minimum reduction to scalar

  ANDV     logical AND reduction to scalar
  ORV      logical OR reduction to scalar
  EORV     logical EOR reduction to scalar

The reduction is predicated, e.g.
  smaxv s0, p0, z1.s

performs a signed maximum reduction on active elements in z1,
and stores the (signed max value) result in s0.

llvm-svn: 338126
2018-07-27 14:24:55 +00:00
Sander de Smalen fcb636d222 [AArch64][SVE] Asm: Predicated floating point reductions.
This patch adds support for various floating-point
reduction operations:

  FADDA    strictly-ordered add reduction, accumulating in scalar
  FADDV    recursive add reduction to scalar
  FMAXV    recursive max reduction to scalar
  FMINV    recursive min reduction to scalar
  FMAXNMV  recursive max number reduction to scalar
  FMINNMV  recursive min number reduction to scalar

The reduction is predicated, e.g.

  fadda d0, p0, d0, z1.d

performs the add-reduction in strict order on active elements
in z1, accumulating into d0.

  faddv d0, p0, z1.d

performs the add-reduction (not in strict order)
on active elements in z1, storing the result in d0.

llvm-svn: 338123
2018-07-27 13:58:48 +00:00
Sander de Smalen 88e154ff90 [AArch64][SVE] Asm: Support for FEXPA and FTSSEL.
This patch adds support for transcendental acceleration
instructions 'FEXPA' (exponential accelerator) and 'FTSSEL'
(trigonometric select coefficient).

llvm-svn: 338121
2018-07-27 12:40:09 +00:00
Sander de Smalen 71929e7cad [AArch64][SVE] Asm: Support for FRECPE and FRSQRTE.
Support for floating-point instructions for reciprocal
estimate (FRECPE) and reciprocal square root estimate (FRSQRTE).

llvm-svn: 338120
2018-07-27 12:26:24 +00:00
Sanjay Patel 78e4b4d3c4 [InstCombine] not(sub X, Y) --> add (not X), Y
The tests with constants show a missing optimization.
Analysis for adds is better than subs, so this can also
help with other transforms. And codegen is better with 
adds for targets like x86 (destructive ops, no sub-from).
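
For reference, a standalone C++ check of the identity (illustrative, not part of the patch):

```
#include <cassert>
#include <cstdint>

int main() {
  // not(sub X, Y) --> add (not X), Y:  ~(x - y) == ~x + y
  // (both sides equal y - x - 1 in two's complement).
  for (uint8_t x = 0; x < 32; ++x)
    for (uint8_t y = 0; y < 32; ++y)
      assert(static_cast<uint8_t>(~(x - y)) == static_cast<uint8_t>(~x + y));
  return 0;
}
```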

https://rise4fun.com/Alive/llK

llvm-svn: 338118
2018-07-27 10:54:48 +00:00
Sanjay Patel eee52b5090 [InstCombine] add tests for not+sub; NFC
llvm-svn: 338117
2018-07-27 10:45:04 +00:00
Max Kazantsev 4d980515d2 [SimplifyIndVar] Canonicalize comparisons to unsigned while eliminating truncs
This is a follow-up for the patch rL335020. When we replace compares against
trunc with compares against wide IV, we can also replace signed predicates with
unsigned where it is legal.
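
For illustration, a small standalone C++ check of why the replacement is legal when both values are known non-negative (a made-up example, not the patch's test):

```
#include <cassert>
#include <cstdint>

int main() {
  // When both operands are known non-negative (as a widened IV and its bound
  // typically are), a signed compare can be replaced by an unsigned one.
  for (int32_t i = 0; i < 100; ++i) {
    int32_t n = 50;
    assert((i < n) == (static_cast<uint32_t>(i) < static_cast<uint32_t>(n)));
  }
  return 0;
}
```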

Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D48763

llvm-svn: 338115
2018-07-27 09:43:39 +00:00
Matt Arsenault 0183c56c11 AMDGPU: Fix code size for return_to_epilog pseudo
llvm-svn: 338113
2018-07-27 09:15:03 +00:00
Anastasis Grammenos f6e143e67f Revert "[LV][DebugInfo] Set DL to the middle block Icmp instruction"
This reverts commit r338106.

llvm-svn: 338109
2018-07-27 08:22:54 +00:00
Hiroshi Inoue eeab694cea [InstSimplify] tests for D48828: fold extraction from std::pair
This commit includes unit tests for D48828, which enhances InstSimplify to enable jump threading with a method whose return type is std::pair<int, bool> or std::pair<bool, int>.
I am going to commit the actual transformation later.

llvm-svn: 338107
2018-07-27 07:21:02 +00:00
Anastasis Grammenos 03948d0e0f [LV][DebugInfo] Set DL to the middle block Icmp instruction
Reviewers: hsaito

Differential Revision: https://reviews.llvm.org/D49746

llvm-svn: 338106
2018-07-27 07:12:44 +00:00
Tom Stellard e9bdc5f1d8 AMDGPU/GlobalISel: Fix crash in regbankselect on non-power-of-2 types
Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D49624

llvm-svn: 338102
2018-07-27 06:04:40 +00:00
Craig Topper 561e298e29 [X86] Remove an unnecessary 'if' that prevented treating INT64_MAX and -INT64_MAX as power of 2 minus 1 in the multiply expansion code.
Not sure why they were being explicitly excluded, but I believe all the math inside the if works. I changed the absolute value to be uint64_t instead of int64_t so INT64_MIN+1 wouldn't cause signed wrap.

llvm-svn: 338101
2018-07-27 05:56:27 +00:00
Bob Haarman eae4742d81 [LTO] Don't internalize declarations
Summary:
Some links were failing with "Global is external, but doesn't have
external or weak linkage!" in ThinLTO builds with debug
information. This happened when we elided the body of a global that is
referenced by debug info. This resulted in a declaration, which we
would then internalize - but declarations cannot be internal. This
change avoids the problem by not internalizing these declarations.

Fixes PR38046.

Reviewers: pcc, tejohnson

Subscribers: mehdi_amini, aprantl, hiraditya, JDevlieghere, steven_wu, dexonsmith, llvm-commits

Differential Revision: https://reviews.llvm.org/D49777

llvm-svn: 338100
2018-07-27 05:40:29 +00:00
Craig Topper e364baa88b [X86] Add matching for another pattern of PMADDWD.
Summary:
This is the pattern you get from the loop vectorizer for something like this

int16_t A[1024];
int16_t B[1024];
int32_t C[512];

void pmaddwd() {
  for (int i = 0; i != 512; ++i)
    C[i] = (A[2*i]*B[2*i]) + (A[2*i+1]*B[2*i+1]);
}

In this case we will have (add (mul (build_vector), (build_vector)), (mul (build_vector), (build_vector))). This is different than the pattern we currently match which has the build_vectors between an add and a single multiply. I'm not sure what C code would get you that pattern.

Reviewers: RKSimon, spatel, zvi

Reviewed By: zvi

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D49636

llvm-svn: 338097
2018-07-27 04:29:10 +00:00
Chen Zheng 567485a72f [InstCombine] canonicalize abs pattern
Differential Revision: https://reviews.llvm.org/D48754

llvm-svn: 338092
2018-07-27 01:49:51 +00:00
Craig Topper f7bc550223 [X86] When removing sign extends from gather/scatter indices, make sure we handle UpdateNodeOperands finding an existing node to CSE with.
If this happens the operands aren't updated and the existing node is returned. Make sure we pass this existing node up to the DAG combiner so that a proper replacement happens. Otherwise we get stuck in an infinite loop with an unoptimized node.

llvm-svn: 338090
2018-07-27 00:00:30 +00:00
Craig Topper 1a40a06549 [SelectionDAGBuilder] Add masked loads to PendingLoads rather than calling DAG.setRoot.
Masked loads were calling DAG.getRoot rather than SelectionDAGBuilder::getRoot, which means the PendingLoads weren't emptied to update the root and create any needed TokenFactor, so it would be incorrect to call setRoot for the masked load.

This patch instead adds the masked load to PendingLoads so that the root doesn't get updated until a store or scatter or something similar happens. Alternatively, we could call SelectionDAGBuilder::getRoot before it, but that would create unnecessary serialization.

llvm-svn: 338085
2018-07-26 23:22:11 +00:00
Reid Kleckner c6cf918f44 [InstrProf] Use comdats on COFF for available_externally functions
Summary:
r262157 added ELF-specific logic to put a comdat on the __profc_*
globals created for available_externally functions. We should be able to
generalize that logic to all object file formats that support comdats,
i.e. everything other than MachO. This fixes duplicate symbol errors,
since on COFF, linkonce_odr doesn't make the symbol weak.

Fixes PR38251.

Reviewers: davidxl, xur

Subscribers: hiraditya, dmajor, llvm-commits, aheejin

Differential Revision: https://reviews.llvm.org/D49882

llvm-svn: 338082
2018-07-26 22:59:17 +00:00
Wolfgang Pieb 9ea65082ff [DWARF v5] Reposting r337981, which was reverted in r337997 due to a test failure in debuginfo_tests.
The test failure was caused by the compiler not emitting a __debug_ranges section with DWARF 4 and
earlier when no ranges are needed. The test checks for its existence regardless.

llvm-svn: 338081
2018-07-26 22:48:52 +00:00
Zachary Turner 23df1319ca [MS Demangler] Properly handle function parameter back-refs.
Properly demangle function parameter back-references.

Previously we treated lists of function parameters and template
parameters the same. There are some important differences with regards
to back-references, and some less important differences regarding which
characters can appear before or after the name.

The important differences are that with a given type T, all instances of
a function parameter list share the same global back-ref table.
Specifically, if X and Y are function pointers, then there are 3
entities in the declaration X func(Y) which all affect and are affected
by the master parameter back-ref table:
  1) The parameter list of X's function type
  2) the parameter list of func itself
  3) The parameter list of Y's function type.

The previous code would create a back-reference table that was local to
a single parameter list, so it would not be shared across parameter
lists.
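
For illustration, a hypothetical declaration of that shape (the names are
mine, not taken from the patch):

  // X and Y are function pointer types; mangling "X func(Y)" involves three
  // parameter lists that all share one function-parameter back-ref table.
  typedef int (*X)(float, double);   // 1) the parameter list of X's function type
  typedef int (*Y)(char);            // 3) the parameter list of Y's function type
  X func(Y);                         // 2) the parameter list of func itself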

This was discovered when porting ms-back-references.test from clang's
mangling tests. All of these tests should now pass with the new changes.

In doing so, I split the function for parsing template and function
parameters into two separate functions. This makes the template
parameter list parsing code in particular very small and easy to
understand now.

Differential Revision: https://reviews.llvm.org/D49875

llvm-svn: 338075
2018-07-26 22:13:39 +00:00
Keno Fischer 864fbd8e9a [SCEV] Don't expand Wrap predicate using inttoptr in ni addrspaces
Summary:
In non-integral address spaces, we're not allowed to introduce inttoptr/ptrtoint
intrinsics. Instead, we need to expand any pointer arithmetic as geps on the
base pointer. Luckily this is a common task for SCEV, so all we have to do here
is hook up the corresponding helper function and add a test case.
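
A rough sketch (mine, not the actual SCEVExpander change) of the two expansion
styles described above:

  #include "llvm/IR/IRBuilder.h"

  // In a non-integral address space we must stay in pointer arithmetic and
  // use a byte-wise GEP on the base pointer; otherwise ptrtoint/inttoptr is fine.
  llvm::Value *addOffset(llvm::IRBuilder<> &B, llvm::Value *Base,
                         llvm::Value *Offset, bool NonIntegralAS) {
    if (NonIntegralAS)
      return B.CreateGEP(B.getInt8Ty(), Base, Offset);
    llvm::Value *I = B.CreatePtrToInt(Base, Offset->getType());
    return B.CreateIntToPtr(B.CreateAdd(I, Offset), Base->getType());
  }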

Fixes PR38290

Reviewers: sanjoy
Differential Revision: https://reviews.llvm.org/D49832

llvm-svn: 338073
2018-07-26 21:55:06 +00:00
Vedant Kumar b572f64212 [DebugInfo] LowerDbgDeclare: Add derefs when handling CallInst users
LowerDbgDeclare inserts a dbg.value before each use of an address
described by a dbg.declare. When inserting a dbg.value before a CallInst
use, however, it fails to append DW_OP_deref to the DIExpression.

The DW_OP_deref is needed to reflect the fact that a dbg.value describes
a source variable directly (as opposed to a dbg.declare, which relies on
pointer indirection).

This patch adds in the DW_OP_deref where needed. This results in the
correct values being shown during a debug session for a program compiled
with ASan and optimizations (see https://reviews.llvm.org/D49520). Note
that ConvertDebugDeclareToDebugValue is already correct -- no changes
there were needed.

One complication is that SelectionDAG is unable to distinguish between
direct and indirect frame-index (FRAMEIX) SDDbgValues. This patch also
fixes this long-standing issue in order to not regress integration tests
relying on the incorrect assumption that all frame-index SDDbgValues are
indirect. This is a necessary fix: the newly-added DW_OP_derefs cannot
be lowered properly otherwise. Basically the fix prevents a direct
SDDbgValue with DIExpression(DW_OP_deref) from being dereferenced twice
by a debugger. There were a handful of tests relying on this incorrect
"FRAMEIX => indirect" assumption which actually had incorrect
DW_AT_locations: these are all fixed up in this patch.

Testing:

- check-llvm, and an end-to-end test using lldb to debug an optimized
  program.
- Existing unit tests for DIExpression::appendToStack fully cover the
  new DIExpression::append utility.
- check-debuginfo (the debug info integration tests)

Differential Revision: https://reviews.llvm.org/D49454

llvm-svn: 338069
2018-07-26 20:56:53 +00:00
Zachary Turner 024e1762aa [MS Demangler] Print calling convention inside parentheses.
For function pointers, we would print something like

int __cdecl (*)(int)

We need to move the calling convention inside, and print

int (__cdecl *)(int)

This patch implements this change for regular function pointers as
well as member function pointers.

llvm-svn: 338068
2018-07-26 20:33:48 +00:00
Zachary Turner ca7aef10c4 [MS Demangler] Add ms-arg-qualifiers.test
This converts the arg qualifier mangling tests from
clang/CodeGenCXX/mangle-ms-arg-qualifiers.cpp to demangling tests.
Most tests already pass, so this patch doesn't come with any
functional change, just the addition of new tests.  The few tests
that don't pass are left in with a FIXME label so that they don't
run but serve as documentation about what still doesn't work.

llvm-svn: 338067
2018-07-26 20:25:35 +00:00
Zachary Turner f4c4519532 Add missing tests from ms-mangle.cpp.
None of these tests pass yet, so they are all commented out, but I'm
adding them with a FIXME label so that they don't get lost when
copying tests over from clang's mangling tests.

llvm-svn: 338066
2018-07-26 20:20:29 +00:00
Zachary Turner 38b78a7f0e [MS Demangler] Demangle pointers to member functions.
After this patch, we can now properly demangle pointers to member
functions.  The calling convention is located in the wrong place,
but this will be fixed in a follow-up since it also affects non-member
function pointers.

Differential Revision: https://reviews.llvm.org/D49639

llvm-svn: 338065
2018-07-26 20:20:10 +00:00
Martin Storsjo 390bce4322 [MC] Add support for the .rva assembler directive for COFF targets
Even though gas doesn't document it, it has been supported there for
a very long time.

This produces the 32 bit relative virtual address (aka image relative
address) for a given symbol. ".rva foo" is essentially equal to
".long foo@imgrel".

Differential Revision: https://reviews.llvm.org/D49821

llvm-svn: 338063
2018-07-26 20:11:26 +00:00
Stephen Hines e6e75bf84c Handle the lack of a symbol table correctly.
Summary:
These two cases will trigger a dereference on a nullptr, since the
SymbolTable can be nonexistent for a given library, in addition to just
being empty.

Reviewers: alexshap

Reviewed By: alexshap

Subscribers: meikeb, kongyi, chh, jakehehrlich, llvm-commits, pirama

Differential Revision: https://reviews.llvm.org/D49534

llvm-svn: 338062
2018-07-26 20:05:31 +00:00
Zachary Turner d742d645a1 [MS Demangler] Demangle data member pointers.
Differential Revision: https://reviews.llvm.org/D49630

llvm-svn: 338061
2018-07-26 19:56:09 +00:00
Scott Linder eb1f75d561 [AMDGPU] Fix VGPR spills where offset doesn't fit in 12 bits
Scale the offset of VGPR spills by the wave size when it cannot fit in the
12-bit offset immediate field and so is added to the soffset SGPR. This
accounts for hardware swizzling of scratch memory.

Differential Revision: https://reviews.llvm.org/D49448

llvm-svn: 338060
2018-07-26 19:47:51 +00:00
Sanjay Patel 6d6eab66e0 [InstCombine] fold udiv with common factor from muls with nuw
Unfortunately, sdiv isn't as simple because of UB due to overflow.

This fold is mentioned in PR38239:
https://bugs.llvm.org/show_bug.cgi?id=38239
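
The arithmetic behind the fold, as a hedged C++ sanity check (my example; the
exact IR pattern handled is what the tests show):

  #include <cassert>

  int main() {
    // With x != 0 and neither product wrapping (the "nuw" condition),
    // (x*y) / (x*z) == y / z for unsigned values.
    unsigned x = 3, y = 7, z = 2;
    assert((x * y) / (x * z) == y / z);   // 21 / 6 == 3 == 7 / 2
    return 0;
  }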

llvm-svn: 338059
2018-07-26 19:22:41 +00:00
Ana Pazos 2e4106b73d [RISCV] Add support for _interrupt attribute
- Save/restore only registers that are used.
This includes Callee saved registers and Caller saved registers
(arguments and temporaries) for integer and FP registers.
- If there is a call in the interrupt handler, save/restore all
Caller saved registers (arguments and temporaries) and all FP registers.
- Emit special return instructions depending on "interrupt"
attribute type.
Based on initial patch by Zhaoshi Zheng.

Reviewers: asb

Reviewed By: asb

Subscribers: rkruppe, the_o, MartinMosbeck, brucehoult, rbar, johnrusso, simoncook, sabuasal, niosHD, kito-cheng, shiva0217, zzheng, edward-jones, mgrang, rogfer01, llvm-commits

Differential Revision: https://reviews.llvm.org/D48411

llvm-svn: 338047
2018-07-26 17:49:43 +00:00
Matthias Braun 09810c9269 MacroFusion: Fix macro fusion with ExitSU failing in top-down scheduling
When fusing instructions A and B, we must add all predecessors of B as
predecessors of A to avoid instructions getting scheduling in between.

There is a special case involving ExitSU: Every other node must be
scheduled before it by design and we don't need to make this explicit in
the graph, however when fusing with a different node we need to schedule
every other node before the fused node too, and we need to make this
explicit now: This patch adds a dependency from the fused node to all
roots in the graph.

Differential Revision: https://reviews.llvm.org/D49830

llvm-svn: 338046
2018-07-26 17:43:56 +00:00
Roman Lebedev 41ba5c1455 [DAGCombine] optimizeSetCCOfSignedTruncationCheck(): handle ule,ugt CondCodes.
Summary:
A follow-up for D49266 / rL337166.

At least one of these cases is more canonical,
so we really do have to handle it.
https://godbolt.org/g/pkzP3X
https://rise4fun.com/Alive/pQyhZZ

We won't get to these cases with I1 being -1,
as that will be constant-folded to true or false.

I'm also not sure we actually hit the 'ule' case,
but I think the worst that could happen is that it becomes dead code.

Reviewers: spatel, craig.topper, RKSimon, javed.absar, efriedma

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D49497

llvm-svn: 338044
2018-07-26 17:34:28 +00:00
Alexey Bataev 4dd7558fab [DEBUGINFO, NVPTX] Emit correct debug information for local variables.
Summary:
The NVPTX target does not use register-based frame information. Instead it
relies on the artificial local_depot that is used instead of the frame,
and the data for variables must be emitted relative to this
local_depot.

Reviewers: tra, jlebar, echristo

Subscribers: jholewinski, aprantl, JDevlieghere, llvm-commits

Differential Revision: https://reviews.llvm.org/D45963

llvm-svn: 338039
2018-07-26 16:29:52 +00:00
Sanjay Patel b381e7e59a [InstCombine] add tests for udiv with common factor; NFC
This fold is mentioned in PR38239:
https://bugs.llvm.org/show_bug.cgi?id=38239

The general case probably belongs in -reassociate, but given that we do 
basic reassociation optimizations similar to this in instcombine already, 
we might as well be consistent within instcombine and handle this pattern?

llvm-svn: 338038
2018-07-26 16:14:53 +00:00
Alexey Bataev 7ae86fe71c [DEBUGINFO, NVPTX] Set `DW_AT_frame_base` to `DW_OP_call_frame_cfa`.
Summary:
For NVPTX target the value of `DW_AT_frame_base` attribute must be set
to `DW_OP_call_frame_cfa`.

Reviewers: tra, jlebar, echristo

Subscribers: jholewinski, JDevlieghere, llvm-commits

Differential Revision: https://reviews.llvm.org/D45785

llvm-svn: 338036
2018-07-26 16:10:05 +00:00
Jonas Devlieghere 640e790af2 [test] Disable dsymutil update test on windows
Apparently, the issue with dsymutil update functionality on Windows was
that Windows doesn't like dsymutil renaming files that have open handles
to them. This disables the new accelerator test and updates the comment
in the other two test.

We should be able to enable the tests again once we updated the
implementation to use TempFile::keep() to keep the temporary files in
MachOUtils.

A big thank you to Jeremy Morse from Sony for figuring this out and
bringing it to my attention.

llvm-svn: 338030
2018-07-26 14:16:19 +00:00
Luke Cheeseman 66b5e7da4c Enable some pointer authentication instructions for aarch64 v8a targets
- Some of the v8.3 pointer authentication instructions inhabit the Hint space
- These instructions can be assembled to hint instructions which act as NOP instructions prior to v8.3
- This patch permits using the hint instructions for all v8a targets
- Also, correct the RETA{A,B} instructions to match the instruction attributes of RET (set isTerminator and isBarrier)

Differential Revision: https://reviews.llvm.org/D49786

llvm-svn: 338029
2018-07-26 14:00:50 +00:00
Stefan Maksimovic 4a612d4bf2 [mips] Sign extend i32 return values on MIPS64
Override getTypeForExtReturn so that functions returning
an i32 typed value have it sign extended on MIPS64.

Also provide patterns to get rid of unneeded sign extensions
for arithmetic instructions which implicitly sign extend
their results.

Differential Revision: https://reviews.llvm.org/D48374

llvm-svn: 338019
2018-07-26 10:59:35 +00:00
Martin Storsjo 9dafd6f6d9 Revert "[COFF] Use comdat shared constants for MinGW as well"
This reverts commit r337951.

While that kind of shared constant generally works fine in a MinGW
setting, it broke some cases of inline assembly that worked before:

$ cat const-asm.c
int MULH(int a, int b) {
    int rt, dummy;
    __asm__ (
        "imull %3"
        :"=d"(rt), "=a"(dummy)
        :"a"(a), "rm"(b)
    );
    return rt;
}
int func(int a) {
    return MULH(a, 1);
}
$ clang -target x86_64-win32-gnu -c const-asm.c -O2
const-asm.c:4:9: error: invalid variant '00000001'
        "imull %3"
        ^
<inline asm>:1:15: note: instantiated into assembly here
        imull __real@00000001(%rip)
                     ^

A similar error is produced for i686 as well. The same test with a
target of x86_64-win32-msvc or i686-win32-msvc works fine.

llvm-svn: 338018
2018-07-26 10:48:20 +00:00
Jonas Devlieghere f290256dfb [test] Do dsymutil update in place
Update the dSYM bundle in place when swapping out the accelerator
tables. This should unbreak the windows bot that have been failing with
an access denied.

llvm-svn: 338014
2018-07-26 09:23:10 +00:00
Sjoerd Meijer 31d38586e7 [AArch64][NFC] Removed tab characters from test files.
llvm-svn: 338011
2018-07-26 07:59:39 +00:00
Sjoerd Meijer dc198344ce [AArch64] Armv8.2-A: add the crypto extensions
This adds MC support for the crypto instructions that were made optional
extensions in Armv8.2-A (AArch64 only).

Differential Revision: https://reviews.llvm.org/D49370

llvm-svn: 338010
2018-07-26 07:13:59 +00:00
Fangrui Song f2822e2d9d [ConstProp] Fix calls-math-finite.ll on FreeBSD
FreeBSD's log(3.0) is less precise than glibc and musl.
Let's forgive its rounding error of more than half an ulp.

llvm-svn: 338009
2018-07-26 06:24:11 +00:00
Fangrui Song c32561ea8b [AsmParser] Fix preserve-comments-crlf.s on FreeBSD
--strip-trailing-cr is a diffutils option which is also available on
BSD-licensed diff introduced in FreeBSD 11.2, however, it has a bug
comparing files mixing \r and \r\n. Use -b (POSIX) instead.

llvm-svn: 338008
2018-07-26 06:07:03 +00:00
Craig Topper 4e687d5bb2 [X86] Don't use CombineTo to skip adding new nodes to the DAGCombiner worklist in combineMul.
I'm not sure if this was trying to avoid optimizing the new nodes further, or maybe to prevent a cycle if something tried to reform the multiply. But I don't think it's a reliable way to do that. If the user of the expanded multiply is visited by the DAGCombiner after this conversion happens, the DAGCombiner will check its operands, see that they haven't been visited by the DAGCombiner before, and then add the first node to the worklist. This process will repeat until all the new nodes are visited.

So this seems like an unreliable prevention at best. So this patch just returns the new nodes like any other combine. If this starts causing problems we can try to add target specific nodes or something to more directly prevent optimizations.

Now that we handle the combine normally, we can combine any negates the mul expansion creates into their users since those will be visited now.

llvm-svn: 338007
2018-07-26 05:40:10 +00:00
Alex Lorenz 7d808c19ff Revert r337981: it breaks the debuginfo-tests
This commit caused a regression in the debuginfo-tests:

FAIL: debuginfo-tests :: apple-accel.cpp (40748 of 46595)
llvm-svn: 337997
2018-07-26 03:21:40 +00:00
Amara Emerson fdd089aa14 [GlobalISel] Fall back to SDISel for swifterror/swiftself attributes.
We don't currently support these, fall back until we do.

llvm-svn: 337994
2018-07-26 01:25:58 +00:00
Wolfgang Pieb 1d56b4ae40 [DWARF v5] Don't report an error when the .debug_rnglists section is empty or non-existent. Fixes PR38297.
Reviewer: JDevlieghere

Differential Revision: https://reviews.llvm.org/D49815
 

llvm-svn: 337993
2018-07-26 01:12:41 +00:00
Wolfgang Pieb c42087df7c [DWARF v5] Don't emit multiple DW_AT_rnglists_base attributes. Some refactoring of
range list emission and added test cases.

Reviewer: dblaikie

Differential Revision: https://reviews.llvm.org/D49522

llvm-svn: 337981
2018-07-25 23:03:22 +00:00
Jonas Devlieghere 743d351120 [dsymutil] Add support for generating DWARF5 accelerator tables.
This patch add support for emitting DWARF5 accelerator tables
(.debug_names) from dsymutil. Just as with the Apple style accelerator
tables, it's possible to update existing dSYMs. This patch includes a
test that show how you can convert back and forth between the two types.

If no kind of table is specified, dsymutil will default to generating
Apple-style accelerator tables whenever it finds those in its input. The
same is true when there are no accelerator tables at all. Finally, in
the remaining case, where there's at least one DWARF v5 table and no
Apple ones, the output will contain a DWARF v5 accelerator table
(.debug_names).

Differential revision: https://reviews.llvm.org/D49137

llvm-svn: 337980
2018-07-25 23:01:38 +00:00
Yonghong Song 71d81e5c8f bpf: new option -bpf-expand-memcpy-in-order to expand memcpy in order
Some BPF JIT backends would want to optimize memcpy in their own
architecture specific way.

However, at the moment, there is no way for JIT backends to see memcpy
semantics in a reliable way. This is because the LLVM BPF backend expands
memcpy into load/store sequences and could possibly schedule them further
apart from each other. So BPF JIT backends inside the kernel can't reliably
recognize memcpy semantics by peepholing the BPF sequence.

This patch introduces a new intrinsic expansion infrastructure for memcpy.

To get a stable in-order load/store sequence from memcpy, we first lower
memcpy into a BPF::MEMCPY node, which is then expanded into in-order
load/store sequences in the expandPostRAPseudo pass, which runs after
instruction scheduling. This way, kernel JIT backends can reliably recognize
memcpy by scanning the BPF sequence.

This new memcpy expand infrastructure is gated by a new option:

  -bpf-expand-memcpy-in-order

Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 337977
2018-07-25 22:40:02 +00:00
Eli Friedman d6baff65f7 [GlobalMerge] Handle llvm.compiler.used correctly.
Reuse the handling for llvm.used, and don't transform such globals.

Fixes a failure on the asan buildbot caused by my previous commit.

llvm-svn: 337973
2018-07-25 22:03:35 +00:00
Sanjay Patel 215dcbf4db [SelectionDAG] try to convert funnel shift directly to rotate if legal
If the DAGCombiner's rotate matching was working as expected, 
I don't think we'd see any test diffs here. 

This sidesteps the issue of custom lowering for rotates raised in PR38243:
https://bugs.llvm.org/show_bug.cgi?id=38243
...by only dealing with legal operations.
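
For reference, a funnel shift whose two inputs are the same value is exactly a
rotate; a small C++ sketch of that equivalence (my illustration, not code from
the patch):

  #include <cassert>
  #include <cstdint>

  // Generic 32-bit funnel shift left: take the high 32 bits of (a:b) << n.
  uint32_t fshl32(uint32_t a, uint32_t b, uint32_t n) {
    n &= 31;
    return n ? (a << n) | (b >> (32 - n)) : a;   // guard the undefined shift by 32
  }

  uint32_t rotl32(uint32_t x, uint32_t n) {
    n &= 31;
    return n ? (x << n) | (x >> (32 - n)) : x;
  }

  int main() {
    uint32_t x = 0x12345678;
    for (uint32_t n = 0; n < 64; ++n)
      assert(fshl32(x, x, n) == rotl32(x, n));   // same-operand fshl is a rotate
    return 0;
  }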

llvm-svn: 337966
2018-07-25 21:38:30 +00:00
Roman Tereshin 4f10a9d3a3 [LSV] Look through selects for consecutive addresses
In some cases LSV sees (load/store _ (select _ <pointer expression>
<pointer expression>)) patterns in input IR, often due to sinking and
other forms of CFG simplification, sometimes interspersed with
bitcasts and all-constant-indices GEPs. With this patch the
`areConsecutivePointers` method attempts to handle select instructions.
This leads to an increased number of successful vectorizations.

Technically, select instructions could appear in index arithmetic as
well, however, we don't see those in our test suites / benchmarks.
Also, there is a lot more freedom in IR shapes computing integral
indices in general than in what's common in pointer computations, and
it appears that it's quite unreliable to do anything short of making
select instructions first class citizens of Scalar Evolution, which
for the purposes of this patch is most definitely an overkill.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D49428

llvm-svn: 337965
2018-07-25 21:33:00 +00:00
Sanjay Patel f94c4c84e6 [AArch, PowerPC] add more tests for legal rotate ops; NFC
llvm-svn: 337964
2018-07-25 21:25:50 +00:00
Eli Friedman 0887cf9cab [GlobalMerge] Allow merging globals with arbitrary alignment.
Instead of depending on implicit padding from the structure layout code,
use a packed struct and emit the padding explicitly.

Differential Revision: https://reviews.llvm.org/D49710

llvm-svn: 337961
2018-07-25 20:58:01 +00:00
Florian Hahn b6613ac665 Revert r337904: [IPSCCP] Use PredicateInfo to propagate facts from cmp instructions.
I suspect it is causing the clang-stage2-Rthinlto failures.

llvm-svn: 337956
2018-07-25 19:44:19 +00:00
Martin Storsjo ff33a95ed4 [COFF] Use comdat shared constants for MinGW as well
GNU binutils tools have no problems with this kind of shared constants,
provided that we actually hook it up completely in AsmPrinter and
produce a global symbol.

This effectively reverts SVN r335918 by hooking the rest of it up
properly.

This feature was implemented originally in SVN r213006, with no reason
for why it can't be used for MinGW other than the fact that GCC doesn't
do it while MSVC does.

Differential Revision: https://reviews.llvm.org/D49646

llvm-svn: 337951
2018-07-25 18:35:42 +00:00
Martin Storsjo d2662c32fb [COFF] Hoist constant pool handling from X86AsmPrinter into AsmPrinter
In SVN r334523, the first half of comdat constant pool handling was
hoisted from X86WindowsTargetObjectFile (which despite the name only
was used for msvc targets) into the arch independent
TargetLoweringObjectFileCOFF, but the other half of the handling was
left behind in X86AsmPrinter::GetCPISymbol.

With only half of the handling in place, inconsistent comdat
sections/symbols are created, causing issues with both GNU binutils
(avoided for X86 in SVN r335918) and with the MS linker, which
would complain like this:

fatal error LNK1143: invalid or corrupt file: no symbol for COMDAT section 0x4

Differential Revision: https://reviews.llvm.org/D49644

llvm-svn: 337950
2018-07-25 18:35:31 +00:00
Eli Friedman 733f4ed1bb [ARM] Prefer lsls+lsrs over lsls+ands or lsrs+ands in Thumb1.
Saves materializing the immediate for the "ands".

Corresponding patterns exist for lsrs+lsls, but that seems less common
in practice.

Now implemented as a DAGCombine.

Differential Revision: https://reviews.llvm.org/D49585

llvm-svn: 337945
2018-07-25 18:22:22 +00:00
Roman Tereshin ed047b0184 [SCEV] Add [zs]ext{C,+,x} -> (D + [zs]ext{C-D,+,x})<nuw><nsw> transform
as well as sext(C + x + ...) -> (D + sext(C-D + x + ...))<nuw><nsw>
similar to the equivalent transformation for zext's

if the top level addition in (D + (C-D + x * n)) could be proven to
not wrap, where the choice of D also maximizes the number of trailing
zeroes of (C-D + x * n), ensuring homogeneous behaviour of the
transformation and better canonicalization of such AddRec's

(indeed, there are 2^(2w) different expressions in `B1 + ext(B2 + Y)` form for
the same Y, but only 2^(2w - k) different expressions in the resulting `B3 +
ext((B4 * 2^k) + Y)` form, where w is the bit width of the integral type)

This patch generalizes sext(C1 + C2*X) --> sext(C1) + sext(C2*X) and
sext{C1,+,C2} --> sext(C1) + sext{0,+,C2} transformations added in
r209568 relaxing the requirements the following way:

1. C2 doesn't have to be a power of 2, it's enough if it's divisible by 2
 a sufficient number of times;
2. C1 doesn't have to be less than C2, instead of extracting the entire
  C1 we can split it into 2 terms: (00...0XXX + YY...Y000), keep the
  second one that may cause wrapping within the extension operator, and
  move the first one that doesn't affect wrapping out of the extension
  operator, enabling further simplifications;
3. C1 and C2 don't have to be positive, splitting C1 like shown above
 produces a sum that is guaranteed to not wrap, signed or unsigned;
4. in AddExpr case there could be more than 2 terms, and in case of
  AddExpr the 2nd and following terms and in case of AddRecExpr the
  Step component don't have to be in the C2*X form or constant
  (respectively), they just need to have enough trailing zeros,
  which in turn could be guaranteed by means other than arithmetics,
  e.g. by a pointer alignment;
5. the extension operator doesn't have to be a sext, the same
  transformation works and is profitable for zext's as well.

Apparently, optimizations like SLPVectorizer currently fail to
vectorize even rather trivial cases like the following:

 double bar(double *a, unsigned n) {
   double x = 0.0;
   double y = 0.0;
   for (unsigned i = 0; i < n; i += 2) {
     x += a[i];
     y += a[i + 1];
   }
   return x * y;
 }

If compiled with `clang -std=c11 -Wpedantic -Wall -O3 main.c -S -o - -emit-llvm`
(!{!"clang version 7.0.0 (trunk 337339) (llvm/trunk 337344)"})

it produces scalar code with the loop not unrolled with the unsigned `n` and
`i` (like shown above), but vectorized and unrolled loop with signed `n` and
`i`. With the changes made in this commit the unsigned version will be
vectorized (though not unrolled for unclear reasons).

How it all works:

Let say we have an AddExpr that looks like (C + x + y + ...), where C
is a constant and x, y, ... are arbitrary SCEVs. Let's compute the
minimum number of trailing zeroes guaranteed of that sum w/o the
constant term: (x + y + ...). If, for example, those terms look like
follows:

        i
XXXX...X000
YYYY...YY00
   ...
ZZZZ...0000

then the rightmost non-guaranteed-zero bit (a potential one at i-th
position above) can change the bits of the sum to the left (and at
i-th position itself), but it can not possibly change the bits to the
right. So we can compute the number of trailing zeroes by taking a
minimum between the numbers of trailing zeroes of the terms.

Now let's say that our original sum with the constant is effectively
just C + X, where X = x + y + .... Let's also say that we've got 2
guaranteed trailing zeros for X:

         j
CCCC...CCCC
XXXX...XX00  // this is X = (x + y + ...)

Any bit of C to the left of j may in the end cause the C + X sum to
wrap, but the rightmost 2 bits of C (at positions j and j - 1) do not
affect wrapping in any way. If the upper bits cause a wrap, it will be
a wrap regardless of the values of the 2 least significant bits of C.
If the upper bits do not cause a wrap, it won't be a wrap regardless
of the values of the 2 bits on the right (again).

So let's split C to 2 constants like follows:

0000...00CC  = D
CCCC...CC00  = (C - D)

and represent the whole sum as D + (C - D + X). The second term of
this new sum looks like this:

CCCC...CC00
XXXX...XX00
-----------  // let's add them up
YYYY...YY00

The sum above (let's call it Y) may or may not wrap, we don't know,
so we need to keep it under a sext/zext. Adding D to that sum though
will never wrap, signed or unsigned, if performed on the original bit
width or the extended one, because all that final add does is set the
2 least significant bits of Y to the bits of D:

YYYY...YY00 = Y
0000...00CC = D
-----------  <nuw><nsw>
YYYY...YYCC

Which means we can safely move that D out of the sext or zext and
claim that the top-level sum neither sign wraps nor unsigned wraps.

Let's run an example, let's say we're working in i8's and the original
expression (zext's or sext's operand) is 21 + 12x + 8y. So it goes
like this:

0001 0101  // 21
XXXX XX00  // 12x
YYYY Y000  // 8y

0001 0101  // 21
ZZZZ ZZ00  // 12x + 8y

0000 0001  // D
0001 0100  // 21 - D = 20
ZZZZ ZZ00  // 12x + 8y

0000 0001  // D
WWWW WW00  // 21 - D + 12x + 8y = 20 + 12x + 8y

therefore zext(21 + 12x + 8y) = (1 + zext(20 + 12x + 8y))<nuw><nsw>
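
A tiny C++ sketch of the constant split used in the example above (mine, not
the actual SCEV code): keep the low TZ bits of C in D so that C - D has at
least TZ trailing zeros, matching the rest of the sum.

  #include <cassert>
  #include <cstdint>

  uint8_t lowBits(uint8_t C, unsigned TZ) {
    return static_cast<uint8_t>(C & ((1u << TZ) - 1));  // D = low TZ bits of C
  }

  int main() {
    // C = 21, and 12x + 8y guarantees at least 2 trailing zeros, so TZ = 2.
    uint8_t D = lowBits(21, 2);
    assert(D == 1 && 21 - D == 20);  // zext(21 + ...) == 1 + zext(20 + ...)
    return 0;
  }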

This approach could be improved if we move away from using trailing
zeroes and use KnownBits instead. For instance, with KnownBits we could
have the following picture:

    i
10 1110...0011  // this is C
XX X1XX...XX00  // this is X = (x + y + ...)

Notice that some of the bits of X are known ones, also notice that
known bits of X are interspersed with unknown bits and not grouped on
the right or left.

We can see at the position i that C(i) and X(i) are both known ones,
therefore the (i + 1)th carry bit is guaranteed to be 1 regardless of
the bits of C to the right of i. For instance, the C(i - 1) bit only
affects the bits of the sum at positions i - 1 and i, and does not
influence if the sum is going to wrap or not. Therefore we could split
the constant C the following way:

    i
00 0010...0011  = D
10 1100...0000  = (C - D)

Let's compute the KnownBits of (C - D) + X:

XX1 1            = carry bit, blanks stand for known zeroes
 10 1100...0000  = (C - D)
 XX X1XX...XX00  = X
--- -----------
 XX X0XX...XX00

Whether this add wraps or not essentially depends on the bits of X. Adding D
to this sum, however, is guaranteed not to wrap:

0    X
 00 0010...0011  = D
 sX X0XX...XX00  = (C - D) + X
--- -----------
 sX XXXX   XX11

As could be seen above, adding D preserves the sign bit of (C - D) +
X, if any, and has a guaranteed 0 carry out, as expected.

The more bits of (C - D) we constrain, the better the transformations
introduced here canonicalize expressions as it leaves less freedom to
what values the constant part of ((C - D) + x + y + ...) can take.

Reviewed By: mzolotukhin, efriedma

Differential Revision: https://reviews.llvm.org/D48853

llvm-svn: 337943
2018-07-25 18:01:41 +00:00
Ulrich Weigand 5f75371c5d Fix corruption of result number in LegalizeVectorOps.cpp
When VectorLegalizer::LegalizeOp creates a new SDValue after iterating
over its arguments, we need to refer to the same result number of the
new node that the original value used.

Reviewed by: cameron.mcinally

Differential Revision: https://reviews.llvm.org/D49805

llvm-svn: 337939
2018-07-25 17:08:13 +00:00
Stanislav Mekhanoshin 7e7268ac1c [AMDGPU] Use AssumptionCacheTracker in the divrem32 expansion
Differential Revision: https://reviews.llvm.org/D49761

llvm-svn: 337938
2018-07-25 17:02:11 +00:00
Stanislav Mekhanoshin b8269a9589 Fix llvm::ComputeNumSignBits with some operations and llvm.assume
Currently ComputeNumSignBits does early exit while processing some
of the operations (add, sub, mul, and select). This prevents the
function from using AssumptionCacheTracker if passed.

Differential Revision: https://reviews.llvm.org/D49759

llvm-svn: 337936
2018-07-25 16:39:24 +00:00
Krzysztof Parzyszek 4e07509d18 [Hexagon] Properly scale bit index when extracting elements from vNi1
For example v = <2 x i1> is represented as bbbbaaaa in a predicate register,
where b = v[1], a = v[0]. Extracting v[1] is equivalent to extracting bit 4
from the predicate register.
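
Spelled out as a sketch (assuming the 8-bit predicate layout shown above;
vector widths other than the <2 x i1> case are my extrapolation):

  // Each vNi1 element occupies 8/N bits of the predicate register, so the
  // bit index of element i must be scaled accordingly.
  unsigned predicateBitIndex(unsigned ElemIdx, unsigned NumElems) {
    unsigned BitsPerElem = 8 / NumElems;   // 4 for <2 x i1> (bbbbaaaa)
    return ElemIdx * BitsPerElem;          // v[1] -> bit 4
  }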

llvm-svn: 337934
2018-07-25 16:20:59 +00:00
Petar Jovanovic 58c0210023 [MIPS GlobalISel] Lower pointer arguments
Add support for lowering pointer arguments.
Changing type from pointer to integer is already done in
MipsTargetLowering::getRegisterTypeForCallingConv.

Patch by Petar Avramovic.

Differential Revision: https://reviews.llvm.org/D49419

llvm-svn: 337912
2018-07-25 12:35:01 +00:00
Florian Hahn 6f5c6adbcd Recommit r333268: [IPSCCP] Use PredicateInfo to propagate facts from cmp instructions.
r337828 resolves a PredicateInfo issue with unnamed types.

Original message:
This patch updates IPSCCP to use PredicateInfo to propagate
facts to true branches predicated by EQ and to false branches
predicated by NE.

As a follow up, we should be able to extend it to also propagate additional
facts about nonnull.

Reviewers: davide, mssimpso, dberlin, efriedma

Reviewed By: davide, dberlin

llvm-svn: 337904
2018-07-25 11:13:40 +00:00
Thomas Preud'homme 768d6ce4a3 Fix PR34170: Crash on inline asm with 64bit output in 32bit GPR
Add support for inline assembly with an output operand that does not
naturally go in the register class it is constrained to (e.g. a double in a
32-bit GPR as in the PR).

llvm-svn: 337903
2018-07-25 11:11:12 +00:00
Paul Semel 0913dcd747 [llvm-objdump] Add dynamic section printing to private-headers option
Differential Revision: https://reviews.llvm.org/D49016

llvm-svn: 337902
2018-07-25 11:09:20 +00:00
Paul Semel 5ce8f1598c [llvm-readobj] Generic hex-dump option
Helpers are available to make this option file format independent. This
patch adds the feature for Wasm file format. It doesn't change the
behavior of the other file format handling.

Differential Revision: https://reviews.llvm.org/D49545

llvm-svn: 337896
2018-07-25 10:04:37 +00:00
Simon Atanasyan b524459288 [mips] Replace custom parsing logic for data directives by the `addAliasForDirective`
The target independent AsmParser doesn't recognise .hword, .word, .dword
which are required for Mips. Currently MipsAsmParser recognises these
through dispatch to MipsAsmParser::parseDataDirective. This contains
equivalent logic to AsmParser::parseDirectiveValue. This patch allows
reuse of AsmParser::parseDirectiveValue by making use of
addAliasForDirective to support .hword, .word and .dword.
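
A plausible sketch of the directive aliasing (the helper wrapper and the exact
alias targets are my assumptions, not copied from the patch):

  #include "llvm/MC/MCParser/MCAsmParser.h"

  // Hypothetical helper: route the Mips data directives to the generic
  // AsmParser value handlers instead of parsing them by hand.
  static void registerMipsDataDirectives(llvm::MCAsmParser &Parser) {
    Parser.addAliasForDirective(".hword", ".2byte");
    Parser.addAliasForDirective(".word", ".4byte");
    Parser.addAliasForDirective(".dword", ".8byte");
  }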

Original patch provided by Alex Bradbury at D47001 was modified to fix
handling of microMIPS symbols. The `AsmParser::parseDirectiveValue`
calls either `EmitIntValue` or `EmitValue`. In this patch we override
`EmitIntValue` in the `MipsELFStreamer` to clear a pending set of
microMIPS symbols.

Differential revision: https://reviews.llvm.org/D49539

llvm-svn: 337893
2018-07-25 07:07:43 +00:00
Craig Topper d9fa8147c4 [X86] Autogenerate complete checks and fix a failure introduced in r337875.
llvm-svn: 337889
2018-07-25 05:22:13 +00:00
Chandler Carruth 7024921c0a [x86/SLH] Teach the x86 speculative load hardening pass to harden
against v1.2 BCBS attacks directly.

Attacks using spectre v1.2 (a subset of BCBS) are described in the paper
here:
https://people.csail.mit.edu/vlk/spectre11.pdf

The core idea is to speculatively store over the address in a vtable,
jumptable, or other target of indirect control flow that will be
subsequently loaded. Speculative execution after such a store can
forward the stored value to subsequent loads, and if called or jumped
to, the speculative execution will be steered to this potentially
attacker controlled address.

Up until now, this could be mitigated by enabling retpolines. However,
that is a relatively expensive technique to mitigate this particular
flavor. Especially because in most cases SLH will have already mitigated
this. To fully mitigate this with SLH, we need to do two core things:
1) Unfold loads from calls and jumps, allowing the loads to be post-load
   hardened.
2) Force hardening of incoming registers even if we didn't end up
   needing to harden the load itself.

The reason we need to do these two things is because hardening calls and
jumps from this particular variant is importantly different from
hardening against leak of secret data. Because the "bad" data here isn't
a secret, but in fact speculatively stored by the attacker, it may be
loaded from any address, regardless of whether it is read-only memory,
mapped memory, or a "hardened" address. The only 100% effective way to
harden these instructions is to harden their operand itself. But to
the extent possible, we'd like to take advantage of all the other
hardening going on, we just need a fallback in case none of that
happened to cover the particular input to the control transfer
instruction.

For users of SLH, currently they are paying 2% to 6% performance overhead
for retpolines, but this mechanism is expected to be substantially
cheaper. However, it is worth reminding folks that this does not
mitigate all of the things retpolines do -- most notably, variant #2 is
not in *any way* mitigated by this technique. So users of SLH may still
want to enable retpolines, and the implementation is carefully designed to
gracefully leverage retpolines to avoid the need for further hardening
here when they are enabled.

Differential Revision: https://reviews.llvm.org/D49663

llvm-svn: 337878
2018-07-25 01:51:29 +00:00
Craig Topper fc501a9223 [X86] Use a shift plus an lea for multiplying by a constant that is a power of 2 plus 2/4/8.
The LEA allows us to combine an add and the multiply by 2/4/8 together so we just need a shift for the larger power of 2.
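
For example (constant chosen by me, not from the commit), 36 = 32 + 4
decomposes into one shift plus one scaled add that maps onto an LEA:

  unsigned mul36(unsigned x) {
    unsigned t = x << 5;       // x * 32: the shift for the larger power of 2
    return t + (x << 2);       // t + x*4: folds into lea r, [t + x*4]
  }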

llvm-svn: 337875
2018-07-25 01:15:38 +00:00
Craig Topper 5be253d988 [X86] Expand mul by pow2 + 2 using a shift and two adds similar to what we do for pow2 - 2.
llvm-svn: 337874
2018-07-25 01:15:35 +00:00
Craig Topper 56c104f104 [X86] Use a two lea sequence for multiply by 37, 41, and 73.
These fit a pattern used by 11, 21, and 19.

llvm-svn: 337871
2018-07-24 23:44:17 +00:00
Craig Topper b5342b592e [X86] Add test cases for multiply by 37, 41, and 73.
These can all be handled with 2 LEAs similar to what we do for 11, 19, 21.

llvm-svn: 337870
2018-07-24 23:44:15 +00:00
Craig Topper f8fcee70a3 [X86] Change multiply by 26 to use two multiplies by 5 and an add instead of multiply by 3 and 9 and a subtract.
Same number of operations, but ending in an add is friendlier due to it being commutable.

llvm-svn: 337869
2018-07-24 23:44:12 +00:00
Hideki Saito ef380b0fc5 [LV] Fix for PR38110, LV encountered llvm_unreachable()
Summary: truncateToMinimalBitWidths() doesn't handle all Instructions and the worst case is a compiler crash via llvm_unreachable(). The fix is to add a case to handle PHINode and change the worst case to a no-op (from a compiler crash).

Reviewers: sbaranga, mssimpso, hsaito

Reviewed By: hsaito

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D49461

llvm-svn: 337861
2018-07-24 22:30:31 +00:00
Roman Tereshin 1ba1f9310c [SCEV] Add zext(C + x + ...) -> D + zext(C-D + x + ...)<nuw><nsw> transform
if the top level addition in (D + (C-D + x + ...)) could be proven to
not wrap, where the choice of D also maximizes the number of trailing
zeroes of (C-D + x + ...), ensuring homogeneous behaviour of the
transformation and better canonicalization of such expressions.

This enables better canonicalization of expressions like

  1 + zext(5 + 20 * %x + 24 * %y)  and
      zext(6 + 20 * %x + 24 * %y)

which get both transformed to

  2 + zext(4 + 20 * %x + 24 * %y)

This pattern is common in address arithmetics and the transformation
makes it easier for passes like LoadStoreVectorizer to prove that 2 or
more memory accesses are consecutive and optimize (vectorize) them.

Reviewed By: mzolotukhin

Differential Revision: https://reviews.llvm.org/D48853

llvm-svn: 337859
2018-07-24 21:48:56 +00:00
Craig Topper 5ddc0a2b14 [X86] When expanding a multiply by a negative of one less than a power of 2, like 31, don't generate a negate of a subtract that we'll never optimize.
We generated a subtract for the power of 2 minus one then negated the result. The negate can be optimized away by swapping the subtract operands, but DAG combine doesn't know how to do that and we don't add any of the new nodes to the worklist anyway.

This patch makes us explicitly emit the swapped subtract.

llvm-svn: 337858
2018-07-24 21:31:21 +00:00
Craig Topper 6d29891bef [X86] Generalize the multiply by 30 lowering to generic multiply by power of 2 minus 2.
Use a left shift and 2 subtracts like we do for 30. Move this out from behind the slow lea check since it doesn't even use an LEA.

Use this for multiply by 14 as well.

llvm-svn: 337856
2018-07-24 21:15:41 +00:00
Heejin Ahn 8daef0751d [WebAssembly] Add tests for weaker memory consistency orderings
Summary:
Currently all wasm atomic memory access instructions are sequentially
consistent, so even if LLVM IR specifies weaker orderings than that, we
should upgrade them to sequential ordering and treat them in the same
way.

Reviewers: dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D49194

llvm-svn: 337854
2018-07-24 21:06:44 +00:00
Craig Topper 86d6320b94 [X86] Change multiply by 19 to use (9 * X) * 2 + X instead of (5 * X) * 4 - X.
The new lowering can be done in 2 LEAs. The old code took 1 LEA, 1 shift, and 1 sub.
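
As a sketch (mine, not from the commit), the two-LEA decomposition is
19*X = (X + 8*X)*2 + X:

  unsigned mul19(unsigned x) {
    unsigned t = x + x * 8;    // lea t, [x + x*8]  ->  9*x
    return x + t * 2;          // lea r, [x + t*2]  -> 19*x
  }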

llvm-svn: 337851
2018-07-24 20:31:48 +00:00
Craig Topper 1296c622df [X86] Add test case to show failure to combine away negates that may be created by mul by constant expansion.
Mul by constant can expand to a sequence that ends with a negate. If the next instruction is an add or sub we might be able to fold the negate away.

We currently fail to do this because we explicitly don't add anything to the DAG combine worklist when we expand multiplies. This is primarily to keep the multiply from being reformed, but we should consider adding the users to the worklist.

llvm-svn: 337843
2018-07-24 18:36:46 +00:00
Joel Galenson 8dbcc58917 Use SCEV to avoid inserting some bounds checks.
This patch uses SCEV to avoid inserting some bounds checks when they are not needed.  This slightly improves the performance of code compiled with the bounds check sanitizer.

Differential Revision: https://reviews.llvm.org/D49602

llvm-svn: 337830
2018-07-24 15:21:54 +00:00
Florian Hahn 36d2e25d5a [PredicateInfo] Use custom mangling to support ssa_copy with unnamed types.
This is a workaround and it would be better to fix this generally, but
doing it generally is quite tricky. See D48541 and PR38117.

Doing it in PredicateInfo directly allows us to use the type address to
differentiate different unnamed types, because neither the created
declarations nor the ssa_copy calls should be visible after
PredicateInfo got destroyed.

Reviewers: efriedma, davide

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D49126

llvm-svn: 337828
2018-07-24 14:49:52 +00:00
Simon Atanasyan 28ded4ee19 [mips] Fix local dynamic TLS with Sym64
For the final DTPREL addition, rather than a lui/daddiu/daddu triple,
LLVM was erroneously emitting a daddiu/daddiu pair, treating the %dtprel_hi
as if it were a %dtprel_lo, since Mips::Hi expands unshifted for Sym64.
Instead, use a new TlsHi node and, although unnecessary due to the exact
structure of the nodes emitted, use TlsHi for local exec too to prevent
future bugs. Also garbage-collect the unused TprelLo and TlsGd nodes,
and TprelHi since its functionality is provided by the new common TlsHi node.

Patch by James Clarke.

Differential revision: https://reviews.llvm.org/D49259

llvm-svn: 337827
2018-07-24 13:47:52 +00:00
Sam Parker 8b93e82c3d [ARM] Disable ARMCodeGenPrepare by default
ARM Stage 2 builders have been suspiciously broken since the pass was
committed. Disabling to hopefully fix the bots and give me time to
debug.

llvm-svn: 337821
2018-07-24 12:04:23 +00:00
Shiva Chen f5938bfbf9 Revert "[DebugInfo] Generate DWARF debug information for labels."
This reverts commit b454fa1b4079b6c0a5b1565982d16516385838d7.

llvm-svn: 337812
2018-07-24 06:17:45 +00:00
Chandler Carruth a25aca21af [x86] Clean up and convert test to use generated CHECK lines.
This test was already checking microscopic behavior of tail call under
specific conditions. This just makes the CHECK lines much more
consistent, clear, and easily updated when intentional changes are made.

I've also switched the test to consistently name the entry block and to
order the helper declarations and comments for specific tests in the
more usual locations.

llvm-svn: 337806
2018-07-24 03:18:08 +00:00
Chandler Carruth d41dca2ddc [x86] Update the CHECK lines of this test to use the latest patterns
from the script. This minimizes the diff in subsequent changes.

llvm-svn: 337805
2018-07-24 03:07:07 +00:00
Shiva Chen d6b2cdf9d4 [DebugInfo] Generate DWARF debug information for labels.
There are two forms for label debug information in DWARF format.

1. Labels in a non-inlined function:

DW_TAG_label
  DW_AT_name
  DW_AT_decl_file
  DW_AT_decl_line
  DW_AT_low_pc

2. Labels in an inlined function:

DW_TAG_label
  DW_AT_abstract_origin
  DW_AT_low_pc

We will collect label information from DBG_LABEL. Before every DBG_LABEL,
we will generate a temporary symbol to denote the location of the label.
The symbol could be used to get DW_AT_low_pc afterwards. So, we create a
mapping between 'inlined label' and DBG_LABEL MachineInstr in DebugHandlerBase.
The DBG_LABEL in the mapping is used to query the symbol before it.

The AbstractLabels in DwarfCompileUnit is used to process labels in inlined
functions.

We also keep a mapping between scope and labels in DwarfFile to help to
generate correct tree structure of DIEs.

Differential Revision: https://reviews.llvm.org/D45556

Patch by Hsiangkai Wang.

llvm-svn: 337799
2018-07-24 02:22:55 +00:00
Tom Stellard b7f19e6d1e AMDGPU/GlobalISel: Legalize G_INSERT
Reviewers: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D49601

llvm-svn: 337798
2018-07-24 02:19:20 +00:00
Vedant Kumar d6ff43cc71 [Debugify] Export per-pass debug info loss statistics
Add a -debugify-export option to opt. This exports per-pass `debugify`
loss statistics to a file in CSV format.

For some interesting numbers on debug value loss during an -O2 build
of the sqlite3 amalgamation, see the review thread.

Differential Revision: https://reviews.llvm.org/D49003

llvm-svn: 337787
2018-07-24 00:41:29 +00:00
Thomas Anderson 8e8a652c2f Fix typo in test/CodeGen/Mips/dins.ll
Differential Revision: https://reviews.llvm.org/D49704

llvm-svn: 337771
2018-07-23 23:19:53 +00:00
Wolfgang Pieb 439801ba1d [DWARF v5] Refactor range lists dumping by using a more generic way of handling tables of lists.
The intent is to use it for location list tables as well. Change is almost NFC with the exception
of the spelling of some strings used during dumping (all lowercase now).

Reviewer: JDevlieghere

Differential Revision: https://reviews.llvm.org/D49500

llvm-svn: 337763
2018-07-23 22:37:17 +00:00
Teresa Johnson b963c0b658 [LTO] Handle __imp_ (dllimport) symbols consistently with lld
Summary:
Similar to what lld already does for dllimport symbols which are
prefaced with __imp_ (see lld patch r240620), strip off the __imp_
prefix in LTO. Otherwise we can get 2 separate GlobalResolution for
a single symbol, the dllimport declaration, and the definition, which
leads to incorrect LTO handling.

Fixes PR38105.

Reviewers: pcc

Subscribers: mehdi_amini, inglorion, steven_wu, dexonsmith, llvm-commits

Differential Revision: https://reviews.llvm.org/D49138

llvm-svn: 337762
2018-07-23 22:33:57 +00:00
Martin Storsjo 100fc97051 [COFF] Fix assembly output of comdat sections without an attached symbol
Since SVN r335286, the .xdata sections are produced without an attached
symbol, which requires using a different syntax when printing assembly
output.

Instead of the usual syntax of '.section <name>,"dr",discard,<symbol>',
use '.section <name>,"dr"' + '.linkonce discard' (which is what GCC
uses for all assembly output).

This fixes PR38254.

Differential Revision: https://reviews.llvm.org/D49651

llvm-svn: 337756
2018-07-23 22:15:19 +00:00
Martin Storsjo c2b701408e [AArch64] Use MCAsmInfoMicrosoft and MCAsmInfoGNUCOFF as base classes
This matches the structure used on X86 and ARM. This requires
a little bit of duplication of the parts that are equal in both
AArch64 COFF variants though.

Before SVN r335286, these classes didn't add anything that MCAsmInfoCOFF
didn't, but now they do.

This makes AArch64 match X86 in how comdat is used for float constants
for MinGW.

Differential Revision: https://reviews.llvm.org/D49637

llvm-svn: 337755
2018-07-23 22:15:14 +00:00
Teresa Johnson e214fdeb69 [ThinLTO] Ensure the TargetLibraryInfo is constructed early enough
Summary:
Without this change, the WholeProgramDevirt pass, which requires the
TargetLibraryInfo, will construct one from the default triple.

Fixes PR38139.

Reviewers: pcc

Subscribers: mehdi_amini, inglorion, steven_wu, dexonsmith, llvm-commits

Differential Revision: https://reviews.llvm.org/D49278

llvm-svn: 337750
2018-07-23 21:58:19 +00:00
Manoj Gupta f9f50f634d ConstantFolding: Avoid a crash.
Summary:
Check if the parent basic block and caller exist
before calling CS.getCaller when constant folding
the strip.invariant.group intrinsic.

This avoids a crash when the function containing the intrinsic
is being inlined. The instruction is checked for any simplification
but has not yet been added to a basic block.

Reviewers: Prazek, rsmith, efriedma

Reviewed By: efriedma

Subscribers: eraman, llvm-commits

Differential Revision: https://reviews.llvm.org/D49690

llvm-svn: 337742
2018-07-23 21:20:00 +00:00
Reid Kleckner 980c4df037 Re-land r335297 "[X86] Implement more of x86-64 large and medium PIC code models"
Don't try to generate large PIC code for non-ELF targets. Neither COFF
nor MachO have relocations for large position independent code, and
users have been using "large PIC" code models to JIT 64-bit code for a
while now. With this change, if they are generating ELF code, their
JITed code will truly be PIC, but if they target MachO or COFF, it will
contain 64-bit immediates that directly reference external symbols. For
a JIT, that's perfectly fine.

llvm-svn: 337740
2018-07-23 21:14:35 +00:00
Nirav Dave 5af81d5bfa Add inline asm aliasing test.
llvm-svn: 337734
2018-07-23 20:19:10 +00:00
Paul Semel 1dbbfba888 [yaml2obj] Add default sh_entsize for dynamic sections
A dynamic section holds a table, so its sh_entsize may be set. As the
dynamic section entry size never changes, we can default it to the size
of a dynamic entry.

Differential Revision: https://reviews.llvm.org/D49619

llvm-svn: 337725
2018-07-23 18:49:04 +00:00
Krzysztof Parzyszek 9500a24fce [Hexagon] Handle unnamed globals in HexagonConstExpr
Instead of comparing names, compare positions in the parent module.

llvm-svn: 337723
2018-07-23 18:30:17 +00:00
Simon Atanasyan 307e5b31ce [mips] Add more checks to the tls.ll test case. NFC
llvm-svn: 337705
2018-07-23 16:05:44 +00:00
Cameron McInally 2c9bcffc92 [FPEnv] Legalize double width StrictFP vector operations
Differential Revision: https://reviews.llvm.org/D48809

llvm-svn: 337698
2018-07-23 14:40:17 +00:00
Sam Parker 3828c6ff94 [ARM] ARMCodeGenPrepare backend pass
Arm specific codegen prepare is implemented to perform type promotion
on icmp operands, which can enable the removal of uxtb and uxth
(unsigned extend) instructions. This is possible because performing
type promotion before ISel removes this burden from the DAG builder,
which has to perform legalisation but has only a limited view of data
ranges.
    
The pass visits any instruction operand of an icmp and creates a
worklist to traverse the use-def tree to determine whether the values
can simply be promoted. Our concern is values in the registers
overflowing the narrow (i8, i16) data range, so instructions marked
with nuw can be promoted easily. For add and sub instructions, we are
able to use the parallel dsp instructions to operate on scalar data
types and avoid overflowing bits. Underflowing adds and subs are also
permitted when the result is only used by an unsigned icmp.
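
A hypothetical source shape of the kind the pass targets (my example, not
from the patch): narrow unsigned values feeding an icmp, where the extra
zero-extends only re-normalise the i8/i16 values.

  int count_below(const unsigned short *p, unsigned n, unsigned short limit) {
    int hits = 0;
    for (unsigned i = 0; i < n; ++i)
      if (p[i] < limit)        // i16 compare; after promotion no uxth is needed
        ++hits;
    return hits;
  }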

Differential Revision: https://reviews.llvm.org/D48832

llvm-svn: 337687
2018-07-23 12:27:47 +00:00
John Brawn fc18a6ad7d [GVN] Don't use the eliminated load as an available value in phi construction
In ConstructSSAForLoadSet if an available value is actually the load that we're
doing SSA construction to eliminate, then we can omit it as SSAUpdate will add
in the value for the phi that will be replacing it anyway. This can result in
simpler IR which can allow further optimisation.

Differential Revision: https://reviews.llvm.org/D44160

llvm-svn: 337686
2018-07-23 12:14:45 +00:00
Alexandros Lamprineas bf6009c234 [MemorySSAUpdater] Update Phi operands after trivial Phi elimination
Bug fix for PR37445. The underlying problem and its fix are similar to PR37808.
The bug lies in MemorySSAUpdater::getPreviousDefRecursive(), where PhiOps is
computed before the call to tryRemoveTrivialPhi() and it ends up being out of
date, pointing to stale data. We have now turned each of the PhiOps into a
TrackingVH<MemoryAccess>.

Differential Revision: https://reviews.llvm.org/D49425

llvm-svn: 337680
2018-07-23 10:56:30 +00:00
Roman Lebedev 52b85377eb [NFC][MCA] ZnVer1: Update RegisterFile to identify false dependencies on partially written registers.
Summary:
Pretty mechanical follow-up for D49196.

As microarchitecture.pdf notes, "20 AMD Ryzen pipeline",
"20.8 Register renaming and out-of-order schedulers":
  The integer register file has 168 physical registers of 64 bits each.
  The floating point register file has 160 registers of 128 bits each.
"20.14 Partial register access":
  The processor always keeps the different parts of an integer register together.
  ...
  An instruction that writes to part of a register will therefore have a false dependence
  on any previous write to the same register or any part of it.

Reviewers: andreadb, courbet, RKSimon, craig.topper, GGanesh

Reviewed By: GGanesh

Subscribers: gbedwell, llvm-commits

Differential Revision: https://reviews.llvm.org/D49393

llvm-svn: 337676
2018-07-23 10:10:13 +00:00
Roman Lebedev d57bd45acc [NFC][MCA] ZnVer1: add partial-reg-update tests
Reviewers: andreadb, courbet, RKSimon, craig.topper, GGanesh

Reviewed By: GGanesh

Subscribers: gbedwell, llvm-commits

Differential Revision: https://reviews.llvm.org/D49392

llvm-svn: 337675
2018-07-23 10:10:04 +00:00
Alexandros Lamprineas 592cc78dd8 [GVNHoist] safeToHoistLdSt allows illegal hoisting
Bug fix for PR36787. When reasoning if it's safe to hoist a load we
want to make sure that the defining memory access dominates the new
insertion point of the hoisted instruction. safeToHoistLdSt calls
firstInBB(InsertionPoint,DefiningAccess) which returns false if
InsertionPoint == DefiningAccess, and therefore it falsely thinks
it's safe to hoist.

Differential Revision: https://reviews.llvm.org/D49555

llvm-svn: 337674
2018-07-23 09:42:35 +00:00
Chandler Carruth 1d926fb9f4 [x86/SLH] Fix a bug where we would harden tail calls twice -- once as
a call, and then again as a return.

Also added a comment to try and explain better why we would be doing
what we're doing when hardening the (non-call) returns.

llvm-svn: 337673
2018-07-23 07:56:15 +00:00
Chandler Carruth b66f2d8df8 [x86/SLH] Add a test covering indirect forms of control flow. NFC.
This specifically covers different ways of making indirect calls and
jumps. There are some bugs in SLH that I will be fixing in subsequent
patches where the diff in the generated instructions makes the bug fix
much more clear, so just checking in a baseline of this test to start.

I'm also going to be adding direct mitigation for variant 1.2 which this
file very specifically tests in the various forms it can arise on x86.
Again, the diff to the generated instructions should make the change for
that much more clear, so having the test as a baseline seems useful.

llvm-svn: 337672
2018-07-23 07:51:51 +00:00
Craig Topper b2a626b52e [X86] Remove the max vector width restriction from combineLoopMAddPattern and rely on splitOpsAndApply to handle splitting.
This seems to be a net improvement. There's still an issue under avx512f where we have a 512-bit vpaddd, but not vpmaddwd so we end up doing two 256-bit vpmaddwds and inserting the results before a 512-bit vpaddd. It might be better to do two 512-bit vpaddds with zeros in the upper half. Same number of instructions, but breaks a dependency.

llvm-svn: 337656
2018-07-22 19:44:35 +00:00
Craig Topper d8f80e90ce [X86] Add more MADD recurrence test cases with larger and narrower vector widths.
llvm-svn: 337650
2018-07-22 05:16:47 +00:00
Simon Atanasyan ecd1e0afdd [mips] Move out the WrapperPat declaration from the NotInMicroMips predicate
This is a follow-up to rL335185. That commit adds some WrapperPat
patterns for the microMIPS target. But the declaration of the WrapperPat
class is under the NotInMicroMips predicate, and the microMIPS patterns
cannot be selected because the predicate (Subtarget->inMicroMipsMode())
&& (!Subtarget->inMicroMipsMode()) is always false.

This change moves the WrapperPat class declaration out of the
NotInMicroMips predicate and enables the microMIPS WrapperPat patterns.

Differential revision: https://reviews.llvm.org/D49533

llvm-svn: 337646
2018-07-21 16:16:03 +00:00
Chen Zheng 69bb064539 [InstSimplify] fold sdiv if two operands are negated and non-overflow
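As a back-of-the-envelope illustration (plain C++, not the InstSimplify code), the fold corresponds to the identity (-X) / (-Y) == X / Y, which holds as long as neither negation overflows; that is presumably what "non-overflow" in the title refers to:

```
#include <cassert>

// Illustration only: (-X) / (-Y) == X / Y when neither negation overflows
// (i.e. neither X nor Y is INT_MIN).
static int sdivOfNegated(int X, int Y) {
  return (-X) / (-Y);   // simplifiable to X / Y
}

int main() {
  assert(sdivOfNegated(20, 5) == 20 / 5);
  assert(sdivOfNegated(-20, 5) == -20 / 5);
  assert(sdivOfNegated(20, -5) == 20 / -5);
}
```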
Differential Revision: https://reviews.llvm.org/D49382

llvm-svn: 337642
2018-07-21 12:27:54 +00:00
Krzysztof Parzyszek 05337bdb50 [Hexagon] Disable packets in test to avoid ordering issues in checks
llvm-svn: 337624
2018-07-20 21:55:55 +00:00
Martin Storsjo a6ffc9c8df [COFF] Adjust how we flag weak externals
This fixes PR36096.

Originally based on a patch by Martell Malone.

Differential Revision: https://reviews.llvm.org/D44357

llvm-svn: 337613
2018-07-20 20:48:29 +00:00
George Karpenkov 346dfbe2bc [FileCheck] Provide an option for FileCheck to dump original input to stderr on failure
The option can be set either using an environment variable (e.g. env
FILECHECK_DUMP_INPUT_ON_FAILURE=1 ninja check-fuzzer) or with a
FileCheck flag.

This can be extremely useful for debugging, cf.
https://groups.google.com/forum/#!topic/llvm-dev/kLrzg8OM_h8 for
discussion.

Differential Revision: https://reviews.llvm.org/D49328

llvm-svn: 337609
2018-07-20 20:21:57 +00:00
Roman Tereshin 31d52847ef Reapply "[LSV] Refactoring + supporting bitcasts to a type of different size"
This reapplies commit r337489 reverted by r337541
Additionally, this commit contains a speculative fix to the issue reported in r337541
(the report does not contain an actionable reproducer, just a stack trace)

llvm-svn: 337606
2018-07-20 20:10:04 +00:00
Joel E. Denny 6fc21c2522 [FileCheck] Fix search ranges for DAG-NOT-DAG
A DAG-NOT-DAG is a CHECK-DAG group, X, followed by a CHECK-NOT group,
N, followed by a CHECK-DAG group, Y.  Let y be the initial directive
of Y.  This patch makes the following changes to the behavior:

    1. Directives in N can no longer match within part of Y's match
       range just because y happens not to be the earliest match from
       Y.  Specifically, this patch withdraws N's search range end
       from y's match range start to Y's match range start.

    2. y can no longer match within X's match range, where a y match
       produced a reordering complaint, which is thus no longer
       possible.  Specifically, this patch withdraws y's search range
       start from X's permitted range start to X's match range end,
       which was already the search range start for other members of
       Y.

Both of these changes can only increase the number of test passes: #1
constrains the ability of CHECK-NOTs to match, and #2 expands the
ability of CHECK-DAGs to match without complaints.

These changes are based on discussions at:

   <http://lists.llvm.org/pipermail/llvm-dev/2018-May/123550.html>
   <https://reviews.llvm.org/D47106>

which conclude that:

    1. These changes simplify the FileCheck conceptual model.  First,
       it makes search ranges for DAG-NOT-DAG more consistent with
       other cases.  Second, it was confusing that y was treated
       differently from the rest of Y.

    2. These changes add theoretical use cases for DAG-NOT-DAG that
       had no obvious means to be expressed otherwise.  We can justify
       the first half of this assertion with the observation that
       these changes can only increase the number of test passes.

    3. Reordering detection for DAG-NOT-DAG had no obvious real
       benefit.

We don't have evidence from real use cases to help us debate
conclusions #2 and #3, but #1 at least seems intuitive.

Reviewed By: probinson

Differential Revision: https://reviews.llvm.org/D48986

llvm-svn: 337605
2018-07-20 20:09:56 +00:00
Jordan Rupprecht db2036e1f5 [llvm-objcopy] Add basic support for --rename-section
Summary:
Add basic support for --rename-section=old=new to llvm-objcopy.

A full replacement for GNU objcopy requires also modifying flags (i.e. --rename-section=old=new,flag1,flag2); I'd like to keep that in a separate change to keep this simple.

Reviewers: jakehehrlich, alexshap

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D49576

llvm-svn: 337604
2018-07-20 19:54:24 +00:00
Reid Kleckner 415b0bf370 And add a lit substitution for llvm-undname, as the comment says to
llvm-svn: 337600
2018-07-20 18:45:01 +00:00
Reid Kleckner d463453d81 Make check-llvm depend on llvm-undname
llvm-svn: 337597
2018-07-20 18:42:19 +00:00
Zachary Turner 91ecedd29e Fix a few warnings and style issues in MS demangler.
Also remove a broken test case.

llvm-svn: 337591
2018-07-20 18:07:33 +00:00
Craig Topper 28ac623f6f [X86] Remove isel patterns for MOVSS/MOVSD ISD opcodes with integer types.
Ideally our ISD node types going into the isel table would have types consistent with their instruction domain. This prevents us having to duplicate patterns with different types for the same instruction.

Unfortunately, it seems our shuffle combining is currently relying on this a little to remove some bitcasts. This seems to enable some switching between shufps and shufd. Hopefully there's some way we can address this in the combining.

Differential Revision: https://reviews.llvm.org/D49280

llvm-svn: 337590
2018-07-20 17:57:53 +00:00
Simon Pilgrim 5e729dcc03 [llvm-mca][x86] Add movsx/movzx instructions to general x86_64 resource tests
llvm-svn: 337586
2018-07-20 17:43:42 +00:00
Zachary Turner f435a7eada Add a Microsoft Demangler.
This adds initial support for a demangling library (LLVMDemangle)
and tool (llvm-undname) for demangling Microsoft names.  This
doesn't cover 100% of cases and there are some known limitations
which I intend to address in followup patches, at least until such
time that we have (near) 100% test coverage matching up with all
of the test cases in clang/test/CodeGenCXX/mangle-ms-*.

Differential Revision: https://reviews.llvm.org/D49552

llvm-svn: 337584
2018-07-20 17:27:48 +00:00
Simon Pilgrim 70fcd0f481 [X86][XOP] Fix SUB constant folding for VPSHA/VPSHL shift lowering
We can safely use getConstant here as we're still lowering, which allows constant folding to kick in and simplify the vector shift codegen.

Noticed while working on D49562.

llvm-svn: 337578
2018-07-20 16:55:18 +00:00
Evandro Menezes fffa9b5897 [ARM] Add new feature to enable optimizing the VFP registers
Enable the optimization of operations on DPR and SPR via a feature instead
of checking the target.

Differential revision: https://reviews.llvm.org/D49463

llvm-svn: 337575
2018-07-20 16:49:28 +00:00
Alexander Potapenko 5ff3abbc31 [MSan] run materializeChecks() before materializeStores()
When pointer checking is enabled, it's important that every pointer is
checked before its value is used.
For stores MSan used to generate code that calculates shadow/origin
addresses from a pointer before checking it.
For userspace this isn't a problem, because the shadow calculation code
is quite simple and the compiler is able to move it after the check at -O2.
But for KMSAN getShadowOriginPtr() creates a runtime call, so we want the
check to be performed strictly before that call.

Swapping materializeChecks() and materializeStores() resolves the issue:
both functions insert code before the given IR location, so the new
insertion order guarantees that the code calculating shadow address is
between the address check and the memory access.
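A toy sketch of the resulting instrumentation order (placeholder names, not MSan's real API; only the ordering matters):

```
#include <cstdio>
#include <utility>

static void checkPointer(void *P) { std::printf("check %p\n", P); }
static std::pair<void *, void *> getShadowOriginPtrStub(void *P) {
  std::printf("shadow/origin runtime call for %p\n", P); // a real call under KMSAN
  return {nullptr, nullptr};
}

static void instrumentedStore(int *P, int V) {
  checkPointer(P);                     // emitted by materializeChecks()
  auto SO = getShadowOriginPtrStub(P); // now guaranteed to follow the check
  (void)SO;                            // (the shadow store itself is elided here)
  *P = V;                              // the original store
}

int main() {
  int X = 0;
  instrumentedStore(&X, 42);
  return X == 42 ? 0 : 1;
}
```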

llvm-svn: 337571
2018-07-20 16:28:49 +00:00
Simon Pilgrim c7132031a2 [X86][SSE] Use SplitOpsAndApply to improve HADD/HSUB lowering
Improve AVX1 256-bit vector HADD/HSUB matching by using SplitOpsAndApply to split into 128-bit instructions.

llvm-svn: 337568
2018-07-20 16:20:45 +00:00
Stella Stamenova ca0547c83c [llvm-objcopy, tests] Fix several llvm-objcopy tests
Summary: In Python 3, sys.stdout.write expects a string rather than bytes. In order to be able to write the bytes to stdout, we need to use the buffer directly instead. This change is borrowing the implementation for writing to stdout that cat.py uses. Note that we cannot use cat.py directly because the file we are trying to open is a gzip file.

Reviewers: asmith, bkramer, alexshap, jakehehrlich

Reviewed By: alexshap, jakehehrlich

Subscribers: jakehehrlich, llvm-commits

Differential Revision: https://reviews.llvm.org/D49515

llvm-svn: 337567
2018-07-20 16:19:36 +00:00
Simon Pilgrim a85b86a982 [X86][AVX] Add support for i16 256-bit vector horizontal op redundant shuffle removal
llvm-svn: 337566
2018-07-20 15:51:01 +00:00
Simon Pilgrim a2bc2d488c [X86][AVX] Add v16i16 horizontal op redundant shuffle tests
llvm-svn: 337565
2018-07-20 15:41:15 +00:00
Nirav Dave 25802ac9fd [DAG] Avoid Node Update assertion due to AND simplification
Check for construction-time folding for incomplete AND nodes in
BackwardsPropagateMask.

Fixes PR38185.

Reviewers: RKSimon, samparker

Reviewed By: samparker

Subscribers: llvm-commits, hiraditya

Differential Revision: https://reviews.llvm.org/D49444

llvm-svn: 337563
2018-07-20 15:27:24 +00:00
Simon Pilgrim 7c56bce996 [X86][AVX] Add support for 32/64 bits 256-bit vector horizontal op redundant shuffle removal
llvm-svn: 337561
2018-07-20 15:24:12 +00:00
Nirav Dave 5a4e11ad9c [DAG] Fix Memory ordering check in ReduceLoadOpStore.
When merging through a TokenFactor we need to check that the
load may be ordered such that no other aliasing memory operations may
happen. It is not sufficient to just check that the load is a member
of the chain token factor, as there may be an indirect chain. Require
that the load's chain has only one use.

This fixes PR37826.

Reviewers: spatel, davide, efriedma, craig.topper, RKSimon

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D49388

llvm-svn: 337560
2018-07-20 15:20:50 +00:00
Simon Pilgrim 8342126c4e [X86][AVX] Add 256-bit vector horizontal op redundant shuffle tests
llvm-svn: 337558
2018-07-20 15:07:53 +00:00
Simon Pilgrim f907e19b5e Regenerate partial vector fold test. NFCI.
llvm-svn: 337551
2018-07-20 13:58:57 +00:00
Chen Zheng 0f609db61a [NFC][testcases] fold sdiv if two operands are negated and non-overflow
llvm-svn: 337549
2018-07-20 13:38:59 +00:00
Florian Hahn 0a560d5d9c Recommit r328307: [IPSCCP] Use constant range information for comparisons of parameters.
This version contains a fix: values whose state in ParamState changed are
added to the worklist even if their state in ValueState did not change. To
avoid adding the same value multiple times, mergeInValue returns true if it
added the value to the worklist. The value is added to the worklist depending
on its state in ValueState.

Original message:
For comparisons with parameters, we can use the ParamState lattice
elements which also provide constant range information. This improves
the code for PR33253 further and gets us closer to use
ValueLatticeElement for all values.

Also, as we are using the range information in the solver directly, we
do not need tryToReplaceWithConstantRange afterwards anymore.

Reviewers: dberlin, mssimpso, davide, efriedma

Reviewed By: mssimpso

Differential Revision: https://reviews.llvm.org/D43762

llvm-svn: 337548
2018-07-20 13:29:12 +00:00
Simon Pilgrim cbf5af12b0 Regenerate remainder test.
llvm-svn: 337546
2018-07-20 13:14:29 +00:00
Chen Zheng f801d0fea9 [InstSimplify] fold srem instruction if its two operands are negated.
Differential Revision: https://reviews.llvm.org/D49423

llvm-svn: 337545
2018-07-20 13:00:47 +00:00
Pavel Labath f9adc20aef [DebugInfo] Generate .debug_names section when it makes sense
Summary:
This patch makes us generate the debug_names section in response to some
user-facing commands (previously it was only generated if explicitly
selected via the -accel-tables option).

My goal was to make this work for DWARF>=5 (as it's an official part of
that standard), and also, as an extension, for DWARF<5 if one is
explicitly tuning for lldb as a debugger (because it brings a large
performance improvement there).

This is slightly complicated by the fact that the debug_names tables are
incompatible with the DWARF v4 type units (they assume that the type
units are in the debug_info section), and unfortunately, right now we
generate DWARF v4-style type units even for -gdwarf-5. For this reason,
I disable all accelerator tables if the user requested type unit
generation. I do this even for apple tables, as they have the same
problem (in fact generating type units for apple targets makes us crash
even before we get around to emitting the accelerator tables).

Reviewers: JDevlieghere, aprantl, dblaikie, echristo, probinson

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D49420

llvm-svn: 337544
2018-07-20 12:59:05 +00:00
Chen Zheng 35395a6773 [NFC][testcases] more testcases for folding srem if its two operands are negated.
llvm-svn: 337543
2018-07-20 12:53:45 +00:00
Ulrich Weigand 9dd23b8433 [SystemZ] Test case formatting fixes
Fix systematically wrong whitespace from a prior automated change.

NFC.

llvm-svn: 337542
2018-07-20 12:12:10 +00:00
Sam McCall 57743883f1 Revert "[LSV] Refactoring + supporting bitcasts to a type of different size"
This reverts commit r337489.
It causes asserts to fire in some TensorFlow tests, e.g.
tensorflow/compiler/tests/gather_test.py on GPU.

Example stack trace:
Start test case: GatherTest.testHigherRank
assertion failed at third_party/llvm/llvm/lib/Support/APInt.cpp:819 in llvm::APInt llvm::APInt::trunc(unsigned int) const: width && "Can't truncate to 0 bits"
    @     0x5559446ebe10  __assert_fail
    @     0x55593ef32f5e  llvm::APInt::trunc()
    @     0x55593d78f86e  (anonymous namespace)::Vectorizer::lookThroughComplexAddresses()
    @     0x55593d78f2bc  (anonymous namespace)::Vectorizer::areConsecutivePointers()
    @     0x55593d78d128  (anonymous namespace)::Vectorizer::isConsecutiveAccess()
    @     0x55593d78c926  (anonymous namespace)::Vectorizer::vectorizeInstructions()
    @     0x55593d78c221  (anonymous namespace)::Vectorizer::vectorizeChains()
    @     0x55593d78b948  (anonymous namespace)::Vectorizer::run()
    @     0x55593d78b725  (anonymous namespace)::LoadStoreVectorizer::runOnFunction()
    @     0x55593edf4b17  llvm::FPPassManager::runOnFunction()
    @     0x55593edf4e55  llvm::FPPassManager::runOnModule()
    @     0x55593edf563c  (anonymous namespace)::MPPassManager::runOnModule()
    @     0x55593edf5137  llvm::legacy::PassManagerImpl::run()
    @     0x55593edf5b71  llvm::legacy::PassManager::run()
    @     0x55593ced250d  xla::gpu::IrDumpingPassManager::run()
    @     0x55593ced5033  xla::gpu::(anonymous namespace)::EmitModuleToPTX()
    @     0x55593ced40ba  xla::gpu::(anonymous namespace)::CompileModuleToPtx()
    @     0x55593ced33d0  xla::gpu::CompileToPtx()
    @     0x55593b26b2a2  xla::gpu::NVPTXCompiler::RunBackend()
    @     0x55593b21f973  xla::Service::BuildExecutable()
    @     0x555938f44e64  xla::LocalService::CompileExecutable()
    @     0x555938f30a85  xla::LocalClient::Compile()
    @     0x555938de3c29  tensorflow::XlaCompilationCache::BuildExecutable()
    @     0x555938de4e9e  tensorflow::XlaCompilationCache::CompileImpl()
    @     0x555938de3da5  tensorflow::XlaCompilationCache::Compile()
    @     0x555938c5d962  tensorflow::XlaLocalLaunchBase::Compute()
    @     0x555938c68151  tensorflow::XlaDevice::Compute()
    @     0x55593f389e1f  tensorflow::(anonymous namespace)::ExecutorState::Process()
    @     0x55593f38a625  tensorflow::(anonymous namespace)::ExecutorState::ScheduleReady()::$_1::operator()()
*** SIGABRT received by PID 7798 (TID 7837) from PID 7798; ***

llvm-svn: 337541
2018-07-20 12:03:00 +00:00
Jonas Paulsson c88d3f6a99 [SystemZ] Reimplement SchedModel IssueWidth and WriteRes/ReadAdvance mappings.
As a consequence of recent discussions
(http://lists.llvm.org/pipermail/llvm-dev/2018-May/123164.html), this patch
changes the SystemZ SchedModels so that the IssueWidth is 6, which is the
decoder capacity, and NumMicroOps become the number of decoder slots needed
per instruction.

In addition, the SchedWrite latencies now match the MachineInstructions
def-operand indexes, and ReadAdvances have been added on instructions with
one register operand and one memory operand.

Review: Ulrich Weigand
https://reviews.llvm.org/D47008

llvm-svn: 337538
2018-07-20 09:40:43 +00:00
Matt Arsenault 4bec7d4261 Reapply "AMDGPU: Fix handling of alignment padding in DAG argument lowering"
Reverts r337079 with fix for msan error.

llvm-svn: 337535
2018-07-20 09:05:08 +00:00
Sander de Smalen 33f588acb9 [AArch64][SVE] Asm: Support for bit/byte reverse operations.
This patch adds the following instructions:

  RBIT      reverse bits within each active element (predicated), e.g.
                rbit z0.d, p0/m, z1.d

            for 8, 16, 32 and 64 bit elements.

  REV       reverse order of elements in data/predicate vector
            (unpredicated), e.g.
                rev z0.d, z1.d
                rev p0.d, p1.d

            for 8, 16, 32 and 64 bit elements.

  REVB      reverse order of bytes within each active element, e.g.
                revb z0.d, p0/m, z1.d

            for 16, 32 and 64 bit elements.

  REVH      reverse order of 16-bit half-words within each active
            element, e.g.
                revh z0.d, p0/m, z1.d

            for 32 and 64 bit elements.

  REVW      reverse order of 32-bit words within each active element,
            e.g.
                revw z0.d, p0/m, z1.d

            for 64 bit elements.

llvm-svn: 337534
2018-07-20 09:00:44 +00:00
Sander de Smalen 3ed7f81ce1 [AArch64][SVE] Asm: Support for FTMAD instruction.
Floating-point trigonometric multiply-add coefficient,
e.g.

  ftmad z0.h, z0.h, z1.h, #7

with variants for 16, 32 and 64-bit elements.

llvm-svn: 337533
2018-07-20 08:47:26 +00:00
Stephen Canon 34f5867310 Add x86_64-unkown triple to llc for x86 test.
llvm-svn: 337523
2018-07-20 03:50:55 +00:00
Craig Topper d8734450a2 [DAGCombiner] Fold X - (-Y * Z) -> X + (Y * Z)
llvm-svn: 337518
2018-07-20 01:40:03 +00:00
Eli Friedman a3c78f5981 [SCCP] Don't use markForcedConstant on branch conditions.
It's more aggressive than we need to be, and leads to strange
workarounds in other places like call return value inference. Instead,
just directly mark an edge viable.

Tests by Florian Hahn.

Differential Revision: https://reviews.llvm.org/D49408

llvm-svn: 337507
2018-07-19 23:02:07 +00:00
Stephen Canon 8995c5f0f6 Skip out of SimplifyDemandedBits for BITCAST of f16 to i16
Mirrors the existing exit path for f128, avoiding a crash later on.

Differential Revision: https://reviews.llvm.org/D49524

llvm-svn: 337506
2018-07-19 22:46:42 +00:00
Teresa Johnson 0c432b1a70 [ThinLTO] Only emit referenced type id records in index files
Summary:
Currently all type ids are emitted into the index file when it is
written. For distributed ThinLTO, that meant that all type ids were
being duplicated into every single distributed index file, regardless of
whether they were referenced, leading to huge amounts of unnecessary
duplication and size bloat.

Keep track of the type id GUIDs actually referenced by the GV summary
records being emitted, and only emit those type IDs.

Add a new test, and fix test/Assembler/thinlto-summary.ll so that all
type ids are referenced to prevent deletion in that test.

Reviewers: pcc

Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, vitalybuka, llvm-commits

Differential Revision: https://reviews.llvm.org/D49565

llvm-svn: 337503
2018-07-19 22:25:56 +00:00
Craig Topper c12c5d421f [DAGCombiner] Teach DAGCombiner that A-(-B) is A+B.
We already knew A+(-B) is A-B in visitAdd. This does the opposite for visitSub.
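A minimal scalar sketch of the identity (plain C++, hypothetical helper name; it assumes the inner negation does not overflow):

```
#include <cassert>

// Scalar picture of the fold, illustration only: A - (-B) == A + B.
static int subOfNeg(int A, int B) { return A - (-B); } // foldable to A + B

int main() {
  assert(subOfNeg(7, 5) == 7 + 5);
  assert(subOfNeg(7, -5) == 7 + -5);
}
```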

llvm-svn: 337502
2018-07-19 22:24:43 +00:00
Simon Pilgrim 1d181bc992 [X86][AVX] Use extract_subvector to reduce vector op widths (PR36761)
We have a number of cases where we fail to reduce vector op widths, performing the op in a larger vector and then extracting a subvector. This is often because by default it would create illegal types.

This peephole patch attempts to handle a few common cases detailed in PR36761, which typically involved extension+conversion to vX2f64 types.

Differential Revision: https://reviews.llvm.org/D49556

llvm-svn: 337500
2018-07-19 21:52:06 +00:00
Roman Tereshin b49b2a601f [LSV] Refactoring + supporting bitcasts to a type of different size
This is mostly preparation work for adding limited support for select
instructions. That proved difficult due to the size and irregularity of
Vectorizer::isConsecutiveAccess; this is fixed here, I believe.

It also turned out that these changes make it simpler to finish one of
the TODOs and fix a number of other small issues, namely:

1. Looking through bitcasts to a type of a different size (requires
careful tracking of the original load/store size and some math
converting sizes in bytes to expected differences in indices of GEPs).

2. Reusing partial analysis of pointers done by first attempt in proving
them consecutive instead of starting from scratch. This added limited
support for nested GEPs co-existing with difficult sext/zext
instructions. This also required a careful handling of negative
differences between constant parts of offsets.

3. Handling a case where the first pointer index is not an add, but
something else (a function parameter for instance).

I observe an increased number of successful vectorizations on a large
set of shader programs. Only few shaders are affected, but those that
are affected sport >5% less loads and stores than before the patch.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D49342
llvm-svn: 337489
2018-07-19 19:42:43 +00:00
Stefan Pintilie 0c122d5a41 [Power9] Code Cleanup - Remove needsAggressiveScheduling()
As we already return true from needsAggressiveScheduling() for the most recent
hardware it would be cleaner to just return true for all PowerPC hardware.

Differential Revision: https://reviews.llvm.org/D48663

llvm-svn: 337488
2018-07-19 19:34:18 +00:00
Farhana Aleen 8c7a30baea [LoadStoreVectorizer] Use getMinusScev() to compute the distance between two pointers.
Summary: Currently, isConsecutiveAccess() detects two pointers (PtrA and PtrB) as consecutive by
         comparing PtrB with BaseDelta+PtrA. This works when both pointers are factorized or
         both of them are not factorized. But isConsecutiveAccess() fails if one of the
         pointers is factorized but the other one is not.

         Here is an example:
         PtrA = 4 * (A + B)
         PtrB = 4 + 4A + 4B

         This patch uses getMinusSCEV() to compute the distance between two pointers.
         getMinusSCEV() allows combining the expressions and computing the simplified distance.
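A tiny sanity check of the example above in plain C++ (not SCEV itself): the two pointer expressions always differ by exactly 4, whether written factored or expanded:

```
#include <cassert>

int main() {
  // PtrA = 4 * (A + B), PtrB = 4 + 4A + 4B; their distance is always 4.
  for (int A = -100; A <= 100; ++A)
    for (int B = -100; B <= 100; ++B)
      assert((4 + 4 * A + 4 * B) - (4 * (A + B)) == 4);
}
```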

Author: FarhanaAleen

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D49516

llvm-svn: 337471
2018-07-19 16:50:27 +00:00
Andrea Di Biagio b6022aa8d9 [X86][BtVer2] correctly model the latency/throughput of LEA instructions.
This patch fixes the latency/throughput of LEA instructions in the BtVer2
scheduling model.

On Jaguar, a 3-operand LEA has a latency of 2cy and a reciprocal throughput of
1. That is because it uses one cycle of SAGU followed by 1cy of ALU1.  An LEA
with a "Scale" operand is also slow, and it has the same latency profile as the
3-operand LEA. An LEA16r has a latency of 3cy, and a throughput of 0.5 (i.e.
RThroughput of 2.0).

This patch adds a new TIIPredicate named IsThreeOperandsLEAFn to X86Schedule.td.
The tablegen backend (for instruction-info) expands that definition into this
(file X86GenInstrInfo.inc):
```
static bool isThreeOperandsLEA(const MachineInstr &MI) {
  return (
    (
      MI.getOpcode() == X86::LEA32r
      || MI.getOpcode() == X86::LEA64r
      || MI.getOpcode() == X86::LEA64_32r
      || MI.getOpcode() == X86::LEA16r
    )
    && MI.getOperand(1).isReg()
    && MI.getOperand(1).getReg() != 0
    && MI.getOperand(3).isReg()
    && MI.getOperand(3).getReg() != 0
    && (
      (
        MI.getOperand(4).isImm()
        && MI.getOperand(4).getImm() != 0
      )
      || (MI.getOperand(4).isGlobal())
    )
  );
}
```

A similar method is generated in the X86_MC namespace, and included into
X86MCTargetDesc.cpp (the declaration lives in X86MCTargetDesc.h).

Back to the BtVer2 scheduling model:
A new scheduling predicate named JSlowLEAPredicate now checks if either the
instruction is a three-operand LEA, or it is an LEA with a Scale value
different from 1.
A variant scheduling class uses that new predicate to correctly select the
appropriate latency profile.

Differential Revision: https://reviews.llvm.org/D49436

llvm-svn: 337469
2018-07-19 16:42:15 +00:00
Simon Pilgrim d8291e34e1 [X86][SSE] Add FPEXT vXf32 - vXf64 tests
Some basic subvector special cases based on PR36761

llvm-svn: 337464
2018-07-19 15:32:45 +00:00
George Rimar a2b553b4c9 [llvm-readobj] - Do not report invalid amount of sections.
When the output style is GNU and the number of sections is >= SHN_LORESERVE,
llvm-readobj reports a section count of zero instead of the actual value.

The patch fixes that.

Differential revision: https://reviews.llvm.org/D49544

llvm-svn: 337462
2018-07-19 14:52:57 +00:00
Teresa Johnson 28023dbed7 [ThinLTO] Enable ThinLTO WholeProgramDevirt and LowerTypeTests in new PM
Summary:
Enable these passes for CFI and WPD in ThinLTO and LTO with the new pass
manager. Add a couple of tests for both PMs based on the clang tests
tools/clang/test/CodeGen/thinlto-distributed-cfi*.ll, but just test
through llvm-lto2 and not with distributed ThinLTO.

Reviewers: pcc

Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, llvm-commits

Differential Revision: https://reviews.llvm.org/D49429

llvm-svn: 337461
2018-07-19 14:51:32 +00:00
Tim Northover a7c18ee8c4 ARM: switch armv7em MachO triple to hard-float defaults and libcalls.
We were emitting incorrect calls to libm functions that LLVM had decided it
knew about because the default is soft-float.

Recommitted without breaking ELF this time.

llvm-svn: 337450
2018-07-19 12:44:51 +00:00
Simon Pilgrim bfb900d363 [DAGCombiner] Add rotate-extract tests
Add new tests from D47681 to current codegen. Also added i686 codegen tests.

llvm-svn: 337445
2018-07-19 09:27:34 +00:00
Max Kazantsev d41faecc49 [SCEV] Fix buggy behavior in getAddExpr with truncs
SCEV tries to constant-fold arguments of trunc operands in SCEVAddExpr, and when it does
that, it passes wrong flags into the recursion. It is only valid to pass flags that are proved for
the narrow type into a computation in the wider type if we can prove that the trunc instruction
doesn't actually change the value. If it did lose some meaningful bits, we may end up proving
wrong no-wrap flags for the sum of the trunc's arguments.

In the provided test we end up with `nuw` where it shouldn't be because of this bug.

The solution is to conservatively pass `SCEV::FlagAnyWrap` which is always a valid thing to do.
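A hedged, scalar illustration of the point (plain C++, not the SCEV code): a no-wrap fact established on truncated values says nothing about the wide computation once the trunc has dropped bits.

```
#include <cstdint>
#include <cstdio>

int main() {
  uint32_t X = 0x100, Y = 0x100;              // wide values
  uint8_t TX = (uint8_t)X, TY = (uint8_t)Y;   // both truncate to 0
  uint8_t NarrowSum = TX + TY;                // 0: provably no unsigned wrap
  uint32_t WideSum = X + Y;                   // 0x200: unrelated to NarrowSum
  // Transferring the narrow "no wrap" proof to WideSum would be unsound,
  // because the truncs changed the values being added.
  std::printf("narrow=%u wide=%u\n", NarrowSum, WideSum);
}
```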

Reviewed By: sanjoy
Differential Revision: https://reviews.llvm.org/D49471

llvm-svn: 337435
2018-07-19 01:46:21 +00:00
Peter Collingbourne 4a653fa7f1 Rename __asan_gen_* symbols to ___asan_gen_*.
This prevents gold from printing a warning when trying to export
these symbols via the asan dynamic list after ThinLTO promotes them
from private symbols to external symbols with hidden visibility.

Differential Revision: https://reviews.llvm.org/D49498

llvm-svn: 337428
2018-07-18 22:23:14 +00:00
Heejin Ahn 47068a42d2 [WebAssembly] Add missing -mattr=+exception-handling guards
Summary:
The use of exception handling instructions should only be enabled with
the `-mattr=+exception-handling` option.

Reviewers: jgravelle-google

Subscribers: dschuff, sbc100, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D49391

llvm-svn: 337425
2018-07-18 21:42:22 +00:00
Tim Northover da142d10d9 Revert "ARM: switch armv7em triple to hard-float defaults and libcalls."
This reverts commit r337385 until it can be targeted at MachO only.

llvm-svn: 337424
2018-07-18 21:32:49 +00:00
David Blaikie 71812e3001 Fix some tests that had (implied) duplicate mtriple
I 'fixed' one of these to use %llc_dwarf unnecessarily, so switch them
both back to using llc directly.

llvm-svn: 337423
2018-07-18 20:37:01 +00:00
Simon Pilgrim d4b82da113 [X86][SSE] Canonicalize scalar fp arithmetic shuffle patterns
As discussed on PR38197, this canonicalizes MOVS*(N0, OP(N0, N1)) --> MOVS*(N0, SCALAR_TO_VECTOR(OP(N0[0], N1[0])))

This returns the scalar-fp codegen lost by rL336971.

Additionally it handles the OP(N1, N0) case for commutable (FADD/FMUL) ops.
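In intrinsics terms, the pattern being canonicalized looks roughly like this (an illustration assuming SSE intrinsics, not the DAG combine code itself):

```
#include <immintrin.h>
#include <cassert>

// Blending lane 0 of a full-width add back into N0 is equivalent to a
// scalar addss on lane 0, which is the form the combine recovers.
static __m128 viaBlend(__m128 N0, __m128 N1) {
  return _mm_move_ss(N0, _mm_add_ps(N0, N1)); // MOVSS(N0, FADD(N0, N1))
}
static __m128 viaScalarOp(__m128 N0, __m128 N1) {
  return _mm_add_ss(N0, N1);                  // addss: only lane 0 changes
}

int main() {
  __m128 A = _mm_setr_ps(1.0f, 2.0f, 3.0f, 4.0f);
  __m128 B = _mm_setr_ps(10.0f, 20.0f, 30.0f, 40.0f);
  float X[4], Y[4];
  _mm_storeu_ps(X, viaBlend(A, B));
  _mm_storeu_ps(Y, viaScalarOp(A, B));
  for (int I = 0; I < 4; ++I)
    assert(X[I] == Y[I]);
}
```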

Differential Revision: https://reviews.llvm.org/D49474

llvm-svn: 337419
2018-07-18 19:55:19 +00:00
Nirav Dave 5217bbce04 [DAG] Add testcase.
llvm-svn: 337414
2018-07-18 18:34:52 +00:00
David Blaikie d66140514d [DebugInfo] Dwarfv5: Avoid unnecessary base_address specifiers in rnglists
Since DWARFv5 rnglists are self descriptive and have distinct encodings
for base-relative (offset_pair) and absolute (start_length) entries,
there's no need to use a base address specifier when describing a lone
address range in a section.

Use that, and improve the test coverage a bit here to include cases like
this and others.

llvm-svn: 337411
2018-07-18 18:04:42 +00:00
Nirav Dave a747d3ca60 [ScheduleDAG] Fix unfolding of SUnits to already existent nodes.
Summary:
If unfolding an SUnit results in either the load or the operation using it
already existing in the DAG, abort the unfold if they are already scheduled.
If not, make sure we don't add duplicate dependencies.

This fixes PR37916.

Reviewers: davide, eli.friedman, fhahn, bogner

Subscribers: MatzeB, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D48666

llvm-svn: 337409
2018-07-18 18:01:03 +00:00
Paul Semel 6e13790801 [llvm-readobj] Generic -string-dump option
Differential Revision: https://reviews.llvm.org/D49470

llvm-svn: 337408
2018-07-18 18:00:41 +00:00
Paul Semel 007dedbf77 [llvm-objdump] Add -demangle (-C) option
Differential Revision: https://reviews.llvm.org/D49043

llvm-svn: 337401
2018-07-18 16:39:21 +00:00
Roman Lebedev 5317e88300 [NFC][X86][AArch64][DAGCombine] More tests for optimizeSetCCOfSignedTruncationCheck()
At least one of these cases is more canonical,
so we really do have to handle it.
https://godbolt.org/g/pkzP3X
https://rise4fun.com/Alive/pQyh

llvm-svn: 337400
2018-07-18 16:19:06 +00:00
Benjamin Kramer 99ea38f42d [llvm-objcopy] %python wants to be in quotes, because it might contain spaces
llvm-svn: 337399
2018-07-18 16:17:53 +00:00
Nirav Dave e24fcd5382 [MC] Fix nested macro body parsing
Add missing .rep case in nestlevel checking for macro body parsing.

llvm-svn: 337398
2018-07-18 16:17:03 +00:00
Simon Atanasyan e9d7b3198a [mips] Fix predicate for the MipsTruncIntFP pattern
This is a follow-up to rL337171. This patch fixes a regression
introduced by r337171 and enables the MipsTruncIntFP pattern.

Differential revision: https://reviews.llvm.org/D49469

llvm-svn: 337392
2018-07-18 14:11:22 +00:00
Tim Northover d4abd14c1b ARM: switch armv7em triple to hard-float defaults and libcalls.
We were emitting incorrect calls to libm functions that LLVM had decided it
knew about because the default is soft-float.

llvm-svn: 337385
2018-07-18 12:37:04 +00:00
Sander de Smalen 330d887d72 [AArch64][SVE] Asm: Support for unpredicated FP operations.
This patch adds support for the following unpredicated
floating-point instructions:

  FADD      Floating point add
  FSUB      Floating point subtract
  FMUL      Floating point multiplication
  FTSMUL    Floating point trigonometric starting value
  FRECPS    Floating point reciprocal step
  FRSQRTS   Floating point reciprocal square root step

The instructions have the following assembly format:
  fadd z0.h, z1.h, z2.h
and have variants for 16, 32 and 64-bit FP elements.

llvm-svn: 337383
2018-07-18 11:59:12 +00:00
Max Kazantsev 6b12506200 [NFC] Make a test more neat
llvm-svn: 337379
2018-07-18 11:03:40 +00:00
Roman Lebedev 3cb87e905c [InstCombine] Re-commit: Fold 'check for [no] signed truncation' pattern
Summary:
[[ https://bugs.llvm.org/show_bug.cgi?id=38149 | PR38149 ]]

As discussed in https://reviews.llvm.org/D49179#1158957 and later,
the IR for 'check for [no] signed truncation' pattern can be improved:
https://rise4fun.com/Alive/gBf
^ that pattern will be produced by Implicit Integer Truncation sanitizer,
https://reviews.llvm.org/D48958 https://bugs.llvm.org/show_bug.cgi?id=21530
in signed case, therefore it is probably a good idea to improve it.

The DAGCombine will reverse this transform, see
https://reviews.llvm.org/D49266

This transform is surprisingly frustrating.
This does not deal with non-splat shift amounts, or with undef shift amounts.
I've outlined what i think the solution should be:
```
  // Potential handling of non-splats: for each element:
  //  * if both are undef, replace with constant 0.
  //    Because (1<<0) is OK and is 1, and ((1<<0)>>1) is also OK and is 0.
  //  * if both are not undef, and are different, bailout.
  //  * else, only one is undef, then pick the non-undef one.
```

This is a re-commit, as the original patch, committed in rL337190
was reverted in rL337344 as it broke chromium build:
https://bugs.llvm.org/show_bug.cgi?id=38204 and
https://crbug.com/864832
Proofs that the fixed folds are ok: https://rise4fun.com/Alive/VYM
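For orientation, here is a standalone C++ sketch of the two shapes of the check; this is my paraphrase under assumptions (arithmetic right shift and wrap-around conversions, which mainstream compilers provide), and the authoritative IR forms are in the Alive links above:

```
#include <cassert>
#include <cstdint>

// Two ways to ask "does the 32-bit value X fit into a signed 8-bit value?".
static bool fitsViaShifts(int32_t X) {
  return ((int32_t)((uint32_t)X << 24) >> 24) == X;  // sext(trunc X) == X
}
static bool fitsViaAddCompare(int32_t X) {
  return (uint32_t)X + 128u < 256u;                  // one add + one compare
}

int main() {
  for (int64_t X = -70000; X <= 70000; ++X)
    assert(fitsViaShifts((int32_t)X) == fitsViaAddCompare((int32_t)X));
}
```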

Differential Revision: https://reviews.llvm.org/D49320

llvm-svn: 337376
2018-07-18 10:55:17 +00:00
Simon Pilgrim 21813140f6 [X86][SSE] Add extra scalar fop + blend tests for commuted inputs
While working on PR38197, I noticed that we don't make use of FADD/FMUL being able to commute the inputs to support the addps+movss -> addss style combine

llvm-svn: 337375
2018-07-18 10:54:13 +00:00
Daniel Cederman 959c8bf51c Revert "[Sparc] Use the IntPair reg class for r constraints with value type f64"
This reverts commit 55222c9183c6e07f53a54c4061677734f54feac1.

I missed that this patch has a dependency on https://reviews.llvm.org/D49219
that has not been approved yet.

llvm-svn: 337373
2018-07-18 10:05:30 +00:00
Sander de Smalen ccdc7ebc1d [AArch64][SVE] Asm: Support for UDOT/SDOT instructions.
The signed/unsigned DOT instructions perform a dot-product on
quadtuplets from two source vectors and accumulate the result in
the destination register. The instructions come in two forms:

Vector form, e.g.
  sdot  z0.s, z1.b, z2.b     - signed dot product on four 8-bit quad-tuplets,
                               accumulating results in 32-bit elements.

  udot  z0.d, z1.h, z2.h     - unsigned dot product on four 16-bit quad-tuplets,
                               accumulating results in 64-bit elements.

Indexed form, e.g.
  sdot  z0.s, z1.b, z2.b[3]  - signed dot product on four 8-bit quad-tuplets
                               with specified quadtuplet from second
                               source vector, accumulating results in 32-bit
                               elements.
  udot  z0.d, z1.h, z2.h[1]  - dot product on four 16-bit quad-tuplets
                               with specified quadtuplet from second
                               source vector, accumulating results in 64-bit
                               elements.

llvm-svn: 337372
2018-07-18 09:37:51 +00:00
Daniel Cederman 4e38df18ea [Sparc] Use the IntPair reg class for r constraints with value type f64
Summary: This is how it appears to be handled in GCC and it prevents an
"Unknown mismatch" error in the SelectionDAGBuilder.

Reviewers: venkatra, jyknight, jrtc27

Reviewed By: jyknight, jrtc27

Subscribers: eraman, fedor.sergeev, jrtc27, llvm-commits

Differential Revision: https://reviews.llvm.org/D49218

llvm-svn: 337370
2018-07-18 09:25:33 +00:00
Sander de Smalen 889fe81ce5 [AArch64][SVE] Asm: Integer divide instructions.
This patch adds the following predicated instructions:

  UDIV    Unsigned divide active elements
  UDIVR   Unsigned divide active elements, reverse form.
  SDIV    Signed divide active elements
  SDIVR   Signed divide active elements, reverse form.

e.g.
  udiv  z0.s, p0/m, z0.s, z1.s
    (unsigned divide active elements in z0 by z1, store result in z0)

  sdivr z0.s, p0/m, z0.s, z1.s
    (signed divide active elements in z1 by z0, store result in z0)

llvm-svn: 337369
2018-07-18 09:17:29 +00:00
Roman Lebedev 3404d4dd41 [NFC][InstCombine] i65 tests for 'check for [no] signed truncation' pattern
Those initially broke chromium build:
https://bugs.llvm.org/show_bug.cgi?id=38204 and
https://crbug.com/864832

llvm-svn: 337364
2018-07-18 08:49:51 +00:00
George Rimar e35e6448f9 [llvm-objdump] - Stop reporting bogus section IDs.
Imagine we have a file with a few sections, one of which is .foo
with index N != 0.

The problem is that when llvm-objdump is given a -section=.foo parameter
it lists .foo as a section at index 0. That makes it impossible to write
test cases that need to find the index of a particular section
while ignoring the dumping of others.

The patch fixes that.

Differential revision: https://reviews.llvm.org/D49372

llvm-svn: 337361
2018-07-18 08:34:35 +00:00
George Rimar 6fdac3b23a [llvm-readobj] - Teach tool to dump objects with >= SHN_LORESERVE of sections.
http://www.sco.com/developers/gabi/2003-12-17/ch4.eheader.html

says that e_shnum and/or e_shstrndx may have special values if
"the number of sections is greater than or equal to SHN_LORESERVE" or
"the section name string table section index is greater than or equal to SHN_LORESERVE (0xff00)"

Previously llvm-readobj was unable to dump such files; this patch changes that.

I had to add a precompiled test case because it does not seem possible to
prepare a test using yaml2obj or llvm-mc (it is not clear how to make .shstrtab
have an index >= SHN_LORESERVE).
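A simplified sketch of the gABI rule involved (C++ using the Linux <elf.h> header, no error handling; the rule itself is the one cited above, the helper names are mine):

```
#include <elf.h>
#include <cstdint>
#include <cstdio>

// With >= SHN_LORESERVE sections, e_shnum is 0 and the real count lives in
// sh_size of section header 0; likewise e_shstrndx is SHN_XINDEX and the
// real string table index lives in sh_link of section header 0.
static uint64_t sectionCount(const Elf64_Ehdr &EH, const Elf64_Shdr &Shdr0) {
  return EH.e_shnum ? EH.e_shnum : Shdr0.sh_size;
}
static uint32_t shStrTabIndex(const Elf64_Ehdr &EH, const Elf64_Shdr &Shdr0) {
  return EH.e_shstrndx != SHN_XINDEX ? EH.e_shstrndx : Shdr0.sh_link;
}

int main() {
  Elf64_Ehdr EH = {};
  Elf64_Shdr Shdr0 = {};
  EH.e_shnum = 0;            // "too many sections" marker
  EH.e_shstrndx = SHN_XINDEX;
  Shdr0.sh_size = 70000;     // actual number of sections
  Shdr0.sh_link = 69999;     // actual .shstrtab index
  std::printf("%llu %u\n", (unsigned long long)sectionCount(EH, Shdr0),
              shStrTabIndex(EH, Shdr0));
}
```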

Differential revision: https://reviews.llvm.org/D49369

llvm-svn: 337360
2018-07-18 08:19:58 +00:00
Roman Lebedev ad50ae82ad Revert test changes part of "Revert "[InstCombine] Fold 'check for [no] signed truncation' pattern""
We want the test to remain good anyway.
I think the fix is incoming.

This reverts part of commit rL337344.

llvm-svn: 337359
2018-07-18 08:15:13 +00:00
Sander de Smalen ac0cb5bf75 [AArch64][SVE] Asm: Support for integer MUL instructions.
This patch adds the following instructions:
  MUL   - multiply vectors, e.g.
    mul z0.h, p0/m, z0.h, z1.h
        - multiply with immediate, e.g.
    mul z0.h, z0.h, #127

  SMULH - signed multiply returning high half, e.g.
    smulh z0.h, p0/m, z0.h, z1.h

  UMULH - unsigned multiply returning high half, e.g.
    umulh z0.h, p0/m, z0.h, z1.h

llvm-svn: 337358
2018-07-18 08:10:03 +00:00
Craig Topper 92ea7a7b48 [X86] Enable commuting of VUNPCKHPD to VMOVLHPS to enable load folding by using VMOVLPS with a modified address.
This required an annoying number of tablegen multiclass changes to make only VUNPCKHPDZ128rr commutable.

llvm-svn: 337357
2018-07-18 07:31:32 +00:00
Craig Topper a2bbfa21ce [X86] Add test case for missed opportunity to commute vunpckhpd to enable use of vmovlps to fold a load.
We do this transform for SSE, but not AVX or AVX512VL.

llvm-svn: 337356
2018-07-18 07:31:30 +00:00
Craig Topper a9ec6a11d8 [X86] Regenerate fma.ll checks using current version of the script which produces different regular expressions on spills and reloads. NFC
llvm-svn: 337354
2018-07-18 07:08:28 +00:00
Craig Topper 1425e10cc6 [X86] Generate v2f64 X86ISD::UNPCKL/UNPCKH instead of X86ISD::MOVLHPS/MOVHLPS for unary v2f64 {0,0} and {1,1} shuffles with SSE2.
I'm trying to restrict the MOVLHPS/MOVHLPS ISD nodes to SSE1 only. With SSE2 we can use unpcks. I believe this will allow some patterns to be cleaned up to require fewer bitcasts.

I've put in an odd isel hack to still select MOVHLPS instruction from the unpckh node to avoid changing tests and because movhlps is a shorter encoding. Ideally we'd do execution domain switching on this, but the operands are in the wrong order and are tied. We might be able to try a commute in the domain switching using custom code.

We already support domain switching for UNPCKLPD and MOVLHPS.

llvm-svn: 337348
2018-07-18 05:10:51 +00:00
Justin Hibbits d52990c71b Introduce codegen for the Signal Processing Engine
Summary:
The Signal Processing Engine (SPE) is found on NXP/Freescale e500v1,
e500v2, and several e200 cores.  This adds support targeting the e500v2,
as this is more common than the e500v1, and is in SoCs still on the
market.

This patch is very intrusive because the SPE is binary incompatible with
the traditional FPU.  After discussing with others, the cleanest
solution was to make both SPE and FPU features on top of a base PowerPC
subset, so all FPU instructions are now wrapped with HasFPU predicates.

Supported by this are:
* Code generation following the SPE ABI at the LLVM IR level (calling
conventions)
* Single- and Double-precision math at the level supported by the APU.

Still to do:
* Vector operations
* SPE intrinsics

As this changes the Callee-saved register list order, one test, which
tests the precise generated code, was updated to account for the new
register order.

Reviewed by: nemanjai
Differential Revision: https://reviews.llvm.org/D44830

llvm-svn: 337347
2018-07-18 04:25:10 +00:00
Justin Hibbits 4fa4fa6a73 Complete the SPE instruction set patterns
This is the lead-up to having SPE codegen.  Add the rest of the
instructions, along with MC tests.

Differential Revision:  https://reviews.llvm.org/D44829

llvm-svn: 337346
2018-07-18 04:24:57 +00:00
Bob Haarman 4ebe5d59b6 Revert "[InstCombine] Fold 'check for [no] signed truncation' pattern"
This reverts r337190 (and a few follow-up commits), which caused the
Chromium build to fail. See
https://bugs.llvm.org/show_bug.cgi?id=38204 and
https://crbug.com/864832

llvm-svn: 337344
2018-07-18 02:18:28 +00:00
Peter Collingbourne fc50498ced CodeGen: Don't create address significance table entries for thread-local variables.
The presence of these symbols in the symbol table can cause symbol type
mismatch errors (or undefined symbol errors on emulated TLS targets)
and they can't be ICF'd anyway.

llvm-svn: 337338
2018-07-18 00:21:40 +00:00
Craig Topper a29f58dc31 [X86] Remove the vector alignment requirement from the patterns added in r337320.
The resulting instruction will only load 64 bits so alignment isn't required.

llvm-svn: 337334
2018-07-17 23:26:20 +00:00
Peter Collingbourne bd9d313d5c CodeGen: Add a target option for emitting .addrsig directives for all address-significant symbols.
Differential Revision: https://reviews.llvm.org/D48143

llvm-svn: 337331
2018-07-17 22:40:08 +00:00
Peter Collingbourne 3e22733698 MC: Implement support for new .addrsig and .addrsig_sym directives.
Part of the address-significance tables proposal:
http://lists.llvm.org/pipermail/llvm-dev/2018-May/123514.html

Differential Revision: https://reviews.llvm.org/D47744

llvm-svn: 337328
2018-07-17 22:17:18 +00:00
Craig Topper 9ef92865ec [X86] Add patterns for folding full vector load into MOVHPS and MOVLPS with SSE1 only.
llvm-svn: 337320
2018-07-17 20:16:18 +00:00
Craig Topper c0f2e306f2 [X86] Add test case for missed opportunity to use MOVLPS on the SSE1 only targets.
llvm-svn: 337319
2018-07-17 20:16:15 +00:00
Vedant Kumar 9ece818291 [InstCombine] Preserve debug value when simplifying cast-of-select
InstCombine has a cast transform that matches a cast-of-select:

  Orig = cast (Src = select Cond TV FV)

And tries to replace it with a select which has the cast folded in:

  NewSel = select Cond (cast TV) (cast FV)

The combiner does RAUW(Orig, NewSel), so any debug values for Orig would
survive the transform. But debug values for Src would be lost.

This patch teaches InstCombine to replace all debug uses of Src with
NewSel (taking care of doing any necessary DIExpression rewriting).
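The C-level shape of the transform, for orientation (a sketch, not the InstCombine code; the patch itself is about keeping the debug uses of the inner select alive):

```
#include <cassert>

static long castOfSelect(bool Cond, int TV, int FV) {
  return (long)(Cond ? TV : FV);      // Orig = cast (Src = select Cond TV FV)
}
static long selectOfCasts(bool Cond, int TV, int FV) {
  return Cond ? (long)TV : (long)FV;  // NewSel = select Cond (cast TV) (cast FV)
}

int main() {
  assert(castOfSelect(true, 1, 2) == selectOfCasts(true, 1, 2));
  assert(castOfSelect(false, 1, 2) == selectOfCasts(false, 1, 2));
}
```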

Differential Revision: https://reviews.llvm.org/D49270

llvm-svn: 337310
2018-07-17 18:08:36 +00:00
Vedant Kumar ced3fd4736 Remove an errant piece of !dbg metadata from a test, NFC
llvm-svn: 337309
2018-07-17 18:08:34 +00:00
Chandler Carruth c0cb5731fc [x86/SLH] Flesh out the data-invariant instruction table a bit based on feedback from Craig.
Summary:
The only thing he suggested that I've skipped here is the double-wide
multiply instructions. Multiply is an area I'm nervous about there being
some hidden data-dependent behavior, and it doesn't seem important for
any benchmarks I have, so skipping it and sticking with the minimal
multiply support that matches what I know is widely used in existing
crypto libraries. We can always add double-wide multiply when we have
clarity from vendors about its behavior and guarantees.

I've tried to at least cover the fundamentals here with tests, although
I've not tried to cover every width or permutation. I can add more tests
where folks think it would be helpful.

Reviewers: craig.topper

Subscribers: sanjoy, mcrosier, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D49413

llvm-svn: 337308
2018-07-17 18:07:59 +00:00
Simon Pilgrim 03164dfa5e [llvm-mca][x86] Add extend, carry-flag and CMP instructions to general x86_64 resource tests
llvm-svn: 337306
2018-07-17 17:47:35 +00:00
Simon Pilgrim 92da01fed9 [llvm-mca][x86] Add MOVBE resource tests to all supporting targets
SNB doesn't support MOVBE but the numbers in Generic (which use the SNB model) look sane.

llvm-svn: 337305
2018-07-17 17:41:45 +00:00
Simon Pilgrim 94049e8b15 [llvm-mca][x86] Add BSWAP resource tests
llvm-svn: 337302
2018-07-17 17:10:47 +00:00
Sam Clegg 28b3e99482 [WebAssembly] Update WebAssemblyLowerEmscriptenEHSjLj to handle separate compilation
Previously we were assuming whole program compilation. Now that
separate compilation is a thing we need to update this pass.
Firstly, it can no longer assert on the existence of malloc and free.
These functions might not be in the current translation unit.  If we
need them, then we will generate imports for them.

Secondly the global helper function we create should be marked as
weak since we will be generating a separate copy in each translation
unit.

Finally the names of the symbols used must be unique and fixed since
they need to agree across translation units.

Differential Revision: https://reviews.llvm.org/D49263

llvm-svn: 337301
2018-07-17 16:40:03 +00:00
Simon Pilgrim 99a4f3195b [llvm-mca][x86] Add displacement-only and additional scale=1 LEA tests
llvm-svn: 337298
2018-07-17 16:17:33 +00:00
Simon Pilgrim 17d89ca70e [llvm-mca][x86] Add LEA resource tests (PR32326)
Add llvm-mca tests demonstrating how LEA instructions are currently modelled. Once this is working on btver2 I'll copy the test file to the other target directories.

llvm-svn: 337297
2018-07-17 16:13:29 +00:00
Sander de Smalen 7b33faaf38 [AArch64][SVE]: Integer multiply-add/subtract instructions.
This patch adds support for the following instructions:
  MLA  mul-add, writing addend       (Zda = Zda +   Zn * Zm)
  MLS  mul-sub, writing addend       (Zda = Zda +  -Zn * Zm)
  MAD  mul-add, writing multiplicant (Zdn =  Za +  Zdn * Zm)
  MSB  mul-sub, writing multiplicant (Zdn =  Za + -Zdn * Zm)

llvm-svn: 337293
2018-07-17 15:41:58 +00:00
Petar Jovanovic f10e4798b4 [Mips][FastISel] Fix handling of icmp with i1 type
The Mips FastISel back-end does not extend i1 values while lowering icmp.
Ensure that we bail into DAG ISel when handling this case.

Patch by Dragan Mladjenovic.

Differential Revision: https://reviews.llvm.org/D49290

llvm-svn: 337288
2018-07-17 14:57:46 +00:00
Florian Hahn d95761d9d0 [IPSCCP] Run Solve each time we resolved an undef in a function.
Once we resolved an undef in a function we can run Solve, which could
lead to finding a constant return value for the function, which in turn
could turn undefs into constants in other functions that call it, before
resolving undefs there.

Computationally the amount of work we are doing stays the same, just the
order we process things is slightly different and potentially there are
a few less undefs to resolve.

We are still relying on the order of functions in the IR, which means
depending on the order, we are able to resolve the optimal undef first
or not. For example, if @test1 comes before @testf, we find the constant
return value of @testf too late and we cannot use it while solving
@test1.

This on its own does not lead to more constants removed in the
test-suite, probably because currently we have to be very lucky to visit
applicable functions in the right order.

Maybe we manage to come up with a better way of resolving undefs in more
'profitable' functions first.

Reviewers: efriedma, mssimpso, davide

Reviewed By: efriedma, davide

Differential Revision: https://reviews.llvm.org/D49385

llvm-svn: 337283
2018-07-17 14:04:59 +00:00
Sander de Smalen f41dd122d6 [AArch64][SVE] Asm: FP fused multiply-add/subtract instructions.
This patch adds support for the following instructions:

  FMLA    mul-add, writing addend                (Zda =  Zda +   Zn * Zm)
  FNMLA   negated mul-add, writing addend        (Zda = -Zda +  -Zn * Zm)

  FMLS    mul-sub, writing addend                (Zda =  Zda +  -Zn * Zm)
  FNMLS   negated mul-sub, writing addend        (Zda = -Zda +   Zn * Zm)

  FMAD    mul-add, writing multiplicant          (Zdn =  Za  +  Zdn * Zm)
  FNMAD   negated mul-add, writing multiplicant  (Zdn = -Za  + -Zdn * Zm)

  FMSB    mul-sub, writing multiplicant          (Zdn =  Za  + -Zdn * Zm)
  FNMSB   negated mul-sub, writing multiplicant  (Zdn = -Za  +  Zdn * Zm)

llvm-svn: 337282
2018-07-17 13:58:46 +00:00
Simon Pilgrim 1a4f3c93fb [SLPVectorizer] Don't attempt horizontal reduction on pointer types (PR38191)
TTI::getMinMaxReductionCost typically can't handle pointer types - until this is changed it's better to limit horizontal reduction to integer/float vector types only.

llvm-svn: 337280
2018-07-17 13:43:33 +00:00
Tim Renouf e1016f1bc7 More fixes for subreg join failure in RegCoalescer
Summary:
Part of the adjustCopiesBackFrom method wasn't correctly dealing with SubRange
intervals when updating.

Two changes. The first ensures that bogus SubRange Segments aren't propagated when
encountering Segments of the form [1234r, 1234d:0) when preparing to merge value
numbers. These can be removed in this case.

The second forces a shrinkToUses call if SubRanges end on the copy index
(instead of just the parent register).

V2: Addressed review comments, plus MIR test instead of ll test

Subscribers: MatzeB, qcolombet, nhaehnle

Differential Revision: https://reviews.llvm.org/D40308

Change-Id: I1d2b2b4beea802fce11da01edf71feb2064aab05
llvm-svn: 337273
2018-07-17 12:38:39 +00:00
Sander de Smalen 5dabcf887b [AArch64][SVE] Asm: Support for predicated FP operations (FP immediate)
This patch completes support for the following floating point
instructions that take FP immediates:
  FADD*  (addition)
  FSUB   (subtract)
  FSUBR  (subtract reverse form)
  FMUL*  (multiplication)
  FMAX*  (maximum)
  FMAXNM (maximum number)
  FMIN   (minimum)
  FMINNM (minimum number)

All operations are predicated and take a FP immediate operand,
e.g.

  fadd z0.h, p0/m, z0.h, #0.5
  fmin z0.s, p0/m, z0.s, #1.0
        ^___________^ (tied)

* Instructions added in a previous patch.

llvm-svn: 337272
2018-07-17 12:36:08 +00:00
Chen Zheng fcfcf07104 [NFC][testcases] add testcases for folding srem whose operands are negated.
Finish the same optimization as for the add instruction in D49216 and the sdiv
instruction in D49382. This patch is for the srem instruction.

llvm-svn: 337270
2018-07-17 12:31:54 +00:00
Benjamin Kramer e6dac13bab [llvm-objcopy] Run not with any python, but the python configured in lit.
llvm-svn: 337262
2018-07-17 10:30:56 +00:00
Sander de Smalen 3b9e342ae1 [AArch64][SVE] Asm: Support for predicated FP operations.
This patch adds support for the following floating point
instructions:
  FABD   (absolute difference)
  FADD   (addition)
  FSUB   (subtract)
  FSUBR  (subtract reverse form)
  FDIV   (divide)
  FDIVR  (divide reverse form)
  FMAX   (maximum)
  FMAXNM (maximum number)
  FMIN   (minimum)
  FMINNM (minimum number)
  FSCALE (adjust exponent)
  FMULX  (multiply extended)

All operations are predicated and binary form, e.g.

  fadd z0.h, p0/m, z0.h, z1.h
        ^___________^ (tied)

Supporting 16, 32 and 64-bit FP elements.

llvm-svn: 337259
2018-07-17 09:48:57 +00:00
Simon Pilgrim e4d12bb2d6 [DAGCombiner] Call SimplifyDemandedVectorElts from EXTRACT_VECTOR_ELT
If we are only extracting vector elements via EXTRACT_VECTOR_ELT(s) we may be able to use SimplifyDemandedVectorElts to avoid unnecessary vector ops.

Differential Revision: https://reviews.llvm.org/D49262

llvm-svn: 337258
2018-07-17 09:45:35 +00:00
Sander de Smalen ec229abb9b [AArch64][SVE] Asm: Support for SPLICE instruction.
The SPLICE instruction splices two vectors into one vector using a
predicate. It copies the active elements from the first vector, and
then fills the remaining elements with the low-numbered elements from
the second vector.

The instruction has the following form, e.g.

  splice z0.b, p0, z0.b, z1.b

for 8-bit elements. It also supports 16, 32 and
64-bit elements.

llvm-svn: 337253
2018-07-17 08:52:45 +00:00
Sander de Smalen 78fe30a662 [AArch64][SVE] Asm: Support for EXT instruction.
This patch adds an instruction that allows extracting
a vector from a pair of vectors, given an immediate index
that describes the element position to extract from.

The instruction has the following assembly:
  ext z0.b, z0.b, z1.b, #imm

where #imm is an immediate between 0 and 255.

llvm-svn: 337251
2018-07-17 08:39:48 +00:00
Daniel Cederman c812dcc3db [Sparc] Do not depend on icc for ta 1
The ta instruction will always trap, regardless of the value
of the integer condition codes. TRAPri is marked as using icc,
so we cannot use a pattern for TRAPri to implement ta 1, as
verify-machineinstrs can complain that icc is not defined.
Instead we implement ta 1 the same way as ta 5.

llvm-svn: 337236
2018-07-17 05:49:33 +00:00
Craig Topper c376a1916b [X86] Add full set of patterns for turning ceil/floor/trunc/rint/nearbyint into rndscale with loads, broadcast, and masking.
This amounts to a pretty ridiculous number of patterns. Ideally we'd canonicalize the X86ISD::VRNDSCALE earlier to reuse those patterns. I briefly looked into doing that, but some strict FP operations could still get converted to rint and nearbyint during isel. It's probably still worthwhile to look into. This patch is meant as a starting point to work from.

llvm-svn: 337234
2018-07-17 05:48:48 +00:00
Craig Topper c89ef8b13e [X86] Add test cases for selecting floor/ceil/trunc/rint/nearbyint to rndscale with masking, loading, and broadcasting.
llvm-svn: 337233
2018-07-17 05:48:46 +00:00
Chen Zheng c992cc4e97 [testcases] move testcases to right place - NFC
Differential Revision: https://reviews.llvm.org/D49409

llvm-svn: 337230
2018-07-17 01:04:41 +00:00
Craig Topper 6751727d76 [X86] Add a missing FMA3 scalar intrinsic pattern.
This allows us to use the 231 form to fold an insertelement on the add input to the fma. There is technically no software intrinsic that can use this until AVX512F, but it can be manually built up from other intrinsics.

llvm-svn: 337223
2018-07-16 23:10:58 +00:00
Sam Clegg cf2a9e28b1 [WebAssembly] Remove ELF file support.
This support was partial and temporary. Now that we have
wasm object file support, it's no longer needed.

Differential Revision: https://reviews.llvm.org/D48744

llvm-svn: 337222
2018-07-16 23:09:29 +00:00
Sanjay Patel c71adc8040 [Intrinsics] define funnel shift IR intrinsics + DAG builder support
As discussed here:
http://lists.llvm.org/pipermail/llvm-dev/2018-May/123292.html
http://lists.llvm.org/pipermail/llvm-dev/2018-July/124400.html

We want to add rotate intrinsics because the IR expansion of that pattern is 4+ instructions, 
and we can lose pieces of the pattern before it gets to the backend. Generalizing the operation 
by allowing 2 different input values (plus the 3rd shift/rotate amount) gives us a "funnel shift" 
operation which may also be a single hardware instruction.

Initially, I thought we needed to define new DAG nodes for these ops, and I spent time working 
on that (much larger patch), but then I concluded that we don't need it. At least as a first 
step, we have all of the backend support necessary to match these ops...because it was required. 
And shepherding these through the IR optimizer is the primary concern, so the IR intrinsics are 
likely all that we'll ever need.

There was also a question about converting the intrinsics to the existing ROTL/ROTR DAG nodes
(along with improving the oversized shift documentation). Again, I don't think that's strictly 
necessary (as the test results here prove). That can be an efficiency improvement as a small 
follow-up patch.

So all we're left with is documentation, definition of the IR intrinsics, and DAG builder support. 
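
For reference, a minimal sketch of how the new intrinsics are declared and
how a rotate maps onto them (rotate is the special case where both funnel
inputs are the same value); this is an illustrative example, not taken from
the tests.

```
declare i32 @llvm.fshl.i32(i32, i32, i32)
declare i32 @llvm.fshr.i32(i32, i32, i32)

; rotate-left is fshl with %x as both the high and low input
define i32 @rotl32(i32 %x, i32 %amt) {
  %r = call i32 @llvm.fshl.i32(i32 %x, i32 %x, i32 %amt)
  ret i32 %r
}
```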

Differential Revision: https://reviews.llvm.org/D49242

llvm-svn: 337221
2018-07-16 22:59:31 +00:00
Roman Lebedev 5da636fb90 [NFC][InstCombine] Fine-tune 'check for [no] signed truncation' tests
We are using i8 for these tests, and shifting by 4,
which is exactly the half of i8.

But as it is seen from the proofs https://rise4fun.com/Alive/mgu
KeptBits = bitwidth(%x) - MaskedBits,
so with using shifts by 4, we are not really testing that
we actually properly handle the other cases with shifts not by half...

llvm-svn: 337208
2018-07-16 20:10:46 +00:00
Jake Ehrlich c7f8ac7896 [llvm-objcopy] Add support for large indexes
This patch is an update of an older patch that never landed
(see here: https://reviews.llvm.org/D42516)

Recently various users have run into this issue, and at this point it
simply has to be solved. The main difference in this patch is that I
use gunzip instead of unzip, which should hopefully allow the tests to
pass. Please review this as if it were a new patch, however; I found
some issues along the way and made some minor modifications.

The binary used in this patch for testing (a zip file to make it small)
can be found here:
https://drive.google.com/file/d/1UjsnTO9edLttZibbr-2T1bJl92KEQFAO/view?usp=sharing

Differential Revision: https://reviews.llvm.org/D49206

llvm-svn: 337204
2018-07-16 19:48:52 +00:00
Farhana Aleen c370d7b33d [AMDGPU] Support a fdot2 pattern.
Summary: Optimize fma((float)S0.x, (float)S1.x, fma((float)S0.y, (float)S1.y, z))
                   -> fdot2((v2f16)S0, (v2f16)S1, (float)z)

Author: FarhanaAleen

Reviewed By: rampitec, b-sumner

Subscribers: AMDGPU

Differential Revision: https://reviews.llvm.org/D49146

llvm-svn: 337198
2018-07-16 18:19:59 +00:00
Roman Lebedev b79b4f539b [InstCombine] Fold 'check for [no] signed truncation' pattern
Summary:
[[ https://bugs.llvm.org/show_bug.cgi?id=38149 | PR38149 ]]

As discussed in https://reviews.llvm.org/D49179#1158957 and later,
the IR for 'check for [no] signed truncation' pattern can be improved:
https://rise4fun.com/Alive/gBf
^ that pattern will be produced by Implicit Integer Truncation sanitizer,
https://reviews.llvm.org/D48958 https://bugs.llvm.org/show_bug.cgi?id=21530
in signed case, therefore it is probably a good idea to improve it.

Proofs for this transform: https://rise4fun.com/Alive/mgu
This transform is surprisingly frustrating.
This does not deal with non-splat shift amounts, or with undef shift amounts.
I've outlined what i think the solution should be:
```
  // Potential handling of non-splats: for each element:
  //  * if both are undef, replace with constant 0.
  //    Because (1<<0) is OK and is 1, and ((1<<0)>>1) is also OK and is 0.
  //  * if both are not undef, and are different, bailout.
  //  * else, only one is undef, then pick the non-undef one.
```

The DAGCombine will reverse this transform, see
https://reviews.llvm.org/D49266
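
For concreteness, a minimal sketch of the shift-based form of the check
(i8, keeping the low 4 signed bits, matching the shift-by-4 proofs linked
above; names invented here): the icmp is true iff %x survives truncation to
4 bits followed by sign extension.

```
define i1 @no_signed_truncation(i8 %x) {
  %t0 = shl i8 %x, 4
  %t1 = ashr i8 %t0, 4
  %r = icmp eq i8 %t1, %x
  ret i1 %r
}
```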

Reviewers: spatel, craig.topper

Reviewed By: spatel

Subscribers: JDevlieghere, rkruppe, llvm-commits

Differential Revision: https://reviews.llvm.org/D49320

llvm-svn: 337190
2018-07-16 16:45:42 +00:00
Wei Mi 40c4aa7637 [RegAlloc] Skip global splitting if the live range is huge and its spill is
trivially rematerializable.

We run into a case where machineLICM hoists a large number of live ranges
outside of a big loop because it thinks those live ranges are trivially
rematerializable. In regalloc, global splitting is tried out first for those
live ranges before they are spilled and rematerialized. Because the global
splitting algorithm is quadratic, a large increase in the number of global
splitting candidates causes a huge compile time increase (50s to 1400s on my
local machine when compiling a module).

However, we think that for live ranges which are very large and trivially
rematerializable, it is better to just skip global splitting so as to save
compile time with little chance of sacrificing performance. We use the
segment size of the live range to indirectly evaluate whether global
splitting of the live range could introduce high cost, and use an option
as a knob to adjust the size limit threshold.

Differential Revision: https://reviews.llvm.org/D49353

llvm-svn: 337186
2018-07-16 15:42:20 +00:00
Teresa Johnson d68935c5ac Restore "[ThinLTO] Ensure we always select the same function copy to import"
This reverts commit r337081, therefore restoring r337050 (and fix in
r337059), with test fix for bot failure described after the original
description below.

In order to always import the same copy of a linkonce function,
even when encountering it with different thresholds (a higher one then a
lower one), keep track of the summary we decided to import.
This ensures that the backend only gets a single definition to import
for each GUID, so that it doesn't need to choose one.

Move the largest threshold the GUID was considered for import into the
current module out of the ImportMap (which is part of a larger map
maintained across the whole index), and into a new map just maintained
for the current module we are computing imports for. This saves some
memory since we no longer have the thresholds maintained across the
whole index (and throughout the in-process backends when doing a normal
non-distributed ThinLTO build), at the cost of some additional
information being maintained for each invocation of ComputeImportForModule
(the selected summary pointer for each import).

There is an additional map lookup for each callee being considered for
importing, however, this was able to subsume a map lookup in the
Worklist iteration that invokes computeImportForFunction. We also are
able to avoid calling selectCallee if we already failed to import at the
same or higher threshold.

I compared the run time and peak memory for the SPEC2006 471.omnetpp
benchmark (running in-process ThinLTO backends), as well as for a large
internal benchmark with a distributed ThinLTO build (so just looking at
the thin link time/memory). Across a number of runs with and without
this change there was no significant change in the time and memory.

(I tried a few other variations of the change but they also didn't
improve time or peak memory).

The new commit removes a test that no longer makes sense
(Transforms/FunctionImport/hotness_based_import2.ll), as exposed by the
reverse-iteration bot. The test depends on the order of processing the
summary call edges, and actually depended on the old problematic
behavior of selecting more than one summary for a given GUID when
encountered with different thresholds. There was no guarantee even
before that we would eventually pick the linkonce copy with the hottest
call edges, it just happened to work with the test and the old code, and
there was no guarantee that we would end up importing the selected
version of the copy that had the hottest call edges (since the backend
would effectively import only one of the selected copies).

Reviewers: davidxl

Subscribers: mehdi_amini, inglorion, llvm-commits

Differential Revision: https://reviews.llvm.org/D48670

llvm-svn: 337184
2018-07-16 15:30:27 +00:00
Joel Galenson 4099b249fb [cfi-verify] Abort on unsupported targets
As suggested in the review for r337007, this makes cfi-verify abort on unsupported targets instead of producing incorrect results.  It also updates the design document to reflect this.

Differential Revision: https://reviews.llvm.org/D49304

llvm-svn: 337181
2018-07-16 15:26:44 +00:00
Chen Zheng 1ff49315ab [InstrSimplify] add testcases for folding sdiv when both operands are negated and cannot overflow
Differential Revision: https://reviews.llvm.org/D49365

llvm-svn: 337179
2018-07-16 15:06:42 +00:00
Chandler Carruth fa065aa75c [x86/SLH] Completely rework how we sink post-load hardening past data
invariant instructions to be both more correct and much more powerful.

While testing, I continued to find issues with sinking post-load
hardening. Unfortunately, it was amazingly hard to create any useful
tests of this because we were mostly sinking across copies and other
loading instructions. The fact that we couldn't sink past normal
arithmetic was really a big oversight.

So first, I've ported roughly the same set of instructions from the data
invariant loads to also have their non-loading varieties understood to
be data invariant. I've also added a few instructions that came up so
often it again made testing complicated: inc, dec, and lea.

With this, I was able to shake out a few nasty bugs in the validity
checking. We need to restrict to hardening single-def instructions with
defined registers that match a particular form: GPRs that don't have
a NOREX constraint directly attached to their register class.

The (tiny!) test case included catches all of the issues I was seeing
(once we can sink the hardening at all) except for the NOREX issue. The
only test I have there is horrible. It is large, inexplicable, and
doesn't even produce an error unless you try to emit encodings. I can
keep looking for a way to test it, but I'm out of ideas really.

Thanks to Ben for giving me at least a sanity-check review. I'll follow
up with Craig to go over this more thoroughly post-commit, but without
it SLH crashes everywhere so landing it for now.

Differential Revision: https://reviews.llvm.org/D49378

llvm-svn: 337177
2018-07-16 14:58:32 +00:00
Petar Jovanovic 021e4c82eb [MIPS GlobalISel] Select instructions to load and store i32 on stack
Add code for selection of G_LOAD, G_STORE, G_GEP, G_FRAMEINDEX and
G_CONSTANT. Support loads and stores of i32 values.

Patch by Petar Avramovic.

Differential Revision: https://reviews.llvm.org/D48957

llvm-svn: 337168
2018-07-16 13:29:32 +00:00
Roman Lebedev de506632aa [X86][AArch64][DAGCombine] Unfold 'check for [no] signed truncation' pattern
Summary:

[[ https://bugs.llvm.org/show_bug.cgi?id=38149 | PR38149 ]]

As discussed in https://reviews.llvm.org/D49179#1158957 and later,
the IR for 'check for [no] signed truncation' pattern can be improved:
https://rise4fun.com/Alive/gBf
^ that pattern will be produced by Implicit Integer Truncation sanitizer,
https://reviews.llvm.org/D48958 https://bugs.llvm.org/show_bug.cgi?id=21530
in signed case, therefore it is probably a good idea to improve it.

But the IR-optimal pattern does not lower efficiently, so we want to undo it.

This handles the simple pattern.
There is a second pattern with predicate and constants inverted.

NOTE: we do not check uses here; we always do the transform.

Reviewers: spatel, craig.topper, RKSimon, javed.absar

Reviewed By: spatel

Subscribers: kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D49266

llvm-svn: 337166
2018-07-16 12:44:10 +00:00
Daniel Cederman 92a700a215 [Sparc] Use the correct encoding for ta 3
Summary: The old encoding generated a "tn %g1 + 3" instruction instead
of the expected "ta 3".

Reviewers: venkatra, jyknight

Reviewed By: jyknight

Subscribers: fedor.sergeev, jrtc27, llvm-commits

Differential Revision: https://reviews.llvm.org/D49171

llvm-svn: 337165
2018-07-16 12:28:26 +00:00
Daniel Cederman ab09da7b57 [Sparc] Use the names .rem and .urem instead of __modsi3 and __umodsi3
Summary: These are the names used in libgcc.

Reviewers: venkatra, jyknight, ekedaigle

Reviewed By: jyknight

Subscribers: joerg, fedor.sergeev, jrtc27, llvm-commits

Differential Revision: https://reviews.llvm.org/D48915

llvm-svn: 337164
2018-07-16 12:22:08 +00:00
Daniel Cederman 68765757d4 [Sparc] Generate ta 1 for the @llvm.debugtrap intrinsic
Summary: Software trap number one is the trap used for breakpoints
in the Sparc ABI.
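
For illustration, a minimal sketch of IR that now lowers to "ta 1" on Sparc
(the function name is invented here):

```
declare void @llvm.debugtrap()

define void @break_here() {
  call void @llvm.debugtrap()
  ret void
}
```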

Reviewers: jyknight, venkatra

Reviewed By: jyknight

Subscribers: fedor.sergeev, jrtc27, llvm-commits

Differential Revision: https://reviews.llvm.org/D48637

llvm-svn: 337163
2018-07-16 12:16:53 +00:00
Daniel Cederman c3d8002c2e Avoid losing Hi part when expanding VAARG nodes on big endian machines
Summary:
If the high part of the load is not used, the offset to the next element
will not be set correctly.

For example, on Sparc V8, the following code will read val2 from offset 4
instead of 8.

```
int val = __builtin_va_arg(va, long long);
int val2 = __builtin_va_arg(va, int);
```

Reviewers: jyknight

Reviewed By: jyknight

Subscribers: fedor.sergeev, jrtc27, llvm-commits

Differential Revision: https://reviews.llvm.org/D48595

llvm-svn: 337161
2018-07-16 12:14:17 +00:00
Chandler Carruth e66a6f48e3 [x86/SLH] Fix a bug where we would try to post-load harden non-GPRs.
Found cases that hit the assert I added. This patch factors the validity
checking into a nice helper routine and calls it when deciding to harden
post-load, and asserts it when doing so later.

I've added tests for the various ways of loading a floating point type,
as well as loading all vector permutations. Even though many of these go
to identical instructions, it seems good to somewhat comprehensively
test them.

I'm confident there will be more fixes needed here, I'll try to add
tests each time as I get this predicate adjusted.

llvm-svn: 337160
2018-07-16 11:38:48 +00:00
Mark Searles 72da47df25 run post-RA hazard recognizer pass late
The memory legalizer, waitcnt, and shrink passes can perturb the instructions,
which means that the post-RA hazard recognizer pass should run after them.
Otherwise, one of those passes may invalidate the work done by the hazard
recognizer. Note that this has the adverse side-effect that any consecutive
S_NOP 0's, emitted by the hazard recognizer, will not be shrunk into a
single S_NOP <N>. This should be addressed in a follow-on patch.

Differential Revision: https://reviews.llvm.org/D49288

llvm-svn: 337154
2018-07-16 10:02:41 +00:00
Alexandros Lamprineas f854ce84c4 [MemorySSAUpdater] Remove deleted trivial Phis from active workset
Bug fix for PR37808. The regression test is a reduced version of the
original reproducer attached to the bug report. As stated in the report,
the problem was that InsertedPHIs was keeping dangling pointers to
deleted Memory-Phis. MemoryPhis are created eagerly and sometimes get
zapped shortly afterwards. I've used WeakVH instead of an expensive
removal operation from the active workset.

Differential Revision: https://reviews.llvm.org/D48372

llvm-svn: 337149
2018-07-16 07:51:27 +00:00
Craig Topper 07a1787501 [X86] Merge the FR128 and VR128 regclass since they have identical spill and alignment characteristics.
This unfortunately requires a bunch of bitcasts to be added to SUBREG_TO_REG, COPY_TO_REGCLASS, and instructions in output patterns. Otherwise tablegen seems to default to picking f128, and then we fail when something tries to get the register class for f128, which isn't always valid.

The test changes are because we were previously mixing FR128 and VR128 due to constrainRegClass finding FR128 first, and passes like live range shrinking weren't handling that well.

llvm-svn: 337147
2018-07-16 06:56:09 +00:00
Chandler Carruth cdf0addc65 [x86/SLH] Teach speculative load hardening to correctly harden the
indices used by AVX2 and AVX-512 gather instructions.

The index vector is hardened by broadcasting the predicate state
into a vector register and then or-ing. We don't even have to worry
about EFLAGS here.

I've added a test for all of the gather intrinsics to make sure that we
don't miss one. A particularly interesting creation is the gather
prefetch, which needs to be marked as potentially "loading" to get the
correct behavior. It's a memory access in many ways, and is actually
relevant for SLH. Based on discussion with Craig in review, I've moved
it to be `mayLoad` and `mayStore` rather than generic side effects. This
matches how we model other prefetch instructions.

Many thanks to Craig for the review here.

Differential Revision: https://reviews.llvm.org/D49336

llvm-svn: 337144
2018-07-16 04:17:51 +00:00
Chen Zheng ccc8422464 [InstCombine] add more SPFofSPF folding
Differential Revision: https://reviews.llvm.org/D49238

llvm-svn: 337143
2018-07-16 02:23:00 +00:00
Chen Zheng b972273f98 [InstCombine] fold icmp pred (sub 0, X) C for vector type
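For illustration, a hypothetical sketch (names invented) of the vector form
of the pattern named in the title, i.e. a compare of a negation against a
constant splat; no particular folded result is implied here:

```
define <2 x i1> @neg_cmp_const(<2 x i32> %x) {
  %neg = sub <2 x i32> zeroinitializer, %x
  %r = icmp sgt <2 x i32> %neg, <i32 1, i32 1>
  ret <2 x i1> %r
}
```
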
Differential Revision: https://reviews.llvm.org/D49283

llvm-svn: 337141
2018-07-16 00:51:40 +00:00
Michael J. Spencer 7bb2767fba Recommit r335794 "Add support for generating a call graph profile from Branch Frequency Info." with fix for removed functions.
llvm-svn: 337140
2018-07-16 00:28:24 +00:00
Craig Topper cdb0ed2910 [X86] Add custom execution domain fixing for 128/256-bit integer logic operations with AVX512F, but not AVX512DQ.
AVX512F only has integer domain logic instructions. AVX512DQ added FP domain logic instructions.

Execution domain fixing runs before EVEX->VEX. So if we have AVX512F and not AVX512DQ we fail to do execution domain switching of the logic operations. This leads to mismatches in execution domain and more test differences.

This patch adds custom domain fixing that switches EVEX integer logic operations to VEX fp logic operations if XMM16-31 are not used.

llvm-svn: 337137
2018-07-15 23:32:36 +00:00
Craig Topper ec0038398a [X86] Use 128-bit blends instead vmovss/vmovsd for 512-bit vzmovl patterns to match AVX.
llvm-svn: 337135
2018-07-15 18:51:08 +00:00
Craig Topper 8f34858779 [X86] Use 128-bit ops for 256-bit vzmovl patterns.
128-bit ops implicitly zero the upper bits. This should address the comment about domain crossing for the integer version without AVX2 since we can use a 128-bit VBLENDW without AVX2.

The only bad thing I see here is that we failed to reuse a vxorps in some of the tests, but I think that's already a known issue.

llvm-svn: 337134
2018-07-15 18:51:07 +00:00
Sanjay Patel a41c886c55 [DAGCombiner] extend(ifpositive(X)) -> shift-right (not X)
This is almost the same as an existing IR canonicalization in instcombine, 
so I'm assuming this is a good early generic DAG combine too.

The motivation comes from reduced bit-hacking for select-of-constants in IR 
after rL331486. We want to restore that functionality in the DAG as noted in
the commit comments for that change and the llvm-dev discussion here:
http://lists.llvm.org/pipermail/llvm-dev/2018-July/124433.html

The PPC and AArch tests show that those targets are already doing something 
similar. x86 will be neutral in the minimal case and generally better when 
this pattern is extended with other ops as shown in the signbit-shift.ll tests.

Note the asymmetry: we don't include the (extend (ifneg X)) transform because 
it already exists in SimplifySelectCC(), and that is verified in the later 
unchanged tests in the signbit-shift.ll files. Without the 'not' op, the 
general transform to use a shift is always a win because that's a single 
instruction.

Alive proofs:
https://rise4fun.com/Alive/ysli

Name: if pos, get -1
  %c = icmp sgt i16 %x, -1
  %r = sext i1 %c to i16
  =>
  %n = xor i16 %x, -1
  %r = ashr i16 %n, 15

Name: if pos, get 1
  %c = icmp sgt i16 %x, -1
  %r = zext i1 %c to i16
  =>
  %n = xor i16 %x, -1
  %r = lshr i16 %n, 15

Differential Revision: https://reviews.llvm.org/D48970

llvm-svn: 337130
2018-07-15 16:27:07 +00:00
Sanjay Patel fae8ed0104 [InstSimplify] add fixme comment for PR37776; NFC
llvm-svn: 337129
2018-07-15 16:13:58 +00:00
Sanjay Patel 810f51ec1b [AMDGPU] adjusted test checks because minnum with NaN gets simplified
This was improved with rL337127, but I missed the failure in this test.
I'm not sure what the expected result will be, so I've generalized it
and added a FIXME comment.

llvm-svn: 337128
2018-07-15 15:14:40 +00:00
Sanjay Patel 92d0c1c129 [InstSimplify] fold minnum/maxnum with NaN arg
This fold is repeated/misplaced in instcombine, but I'm
not sure if it's safe to remove that yet because some
other folds appear to be asserting that the transform
has occurred within instcombine itself.

This isn't the best fix for PR37776, but it probably
hides the bug with the given code example:
https://bugs.llvm.org/show_bug.cgi?id=37776

We have another test to demonstrate the more general bug.
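
For illustration, a minimal sketch (invented names) of the kind of call this
fold simplifies: minnum with a quiet NaN operand returns the other operand,
so the call below simplifies to %x.

```
declare float @llvm.minnum.f32(float, float)

define float @min_x_nan(float %x) {
  %r = call float @llvm.minnum.f32(float %x, float 0x7FF8000000000000)
  ret float %r
}
```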

llvm-svn: 337127
2018-07-15 14:52:16 +00:00
Sanjay Patel ef71b704c2 [InstSimplify] add tests for minnum/maxnum; NFC
This isn't the best fix for PR37776, but it probably
hides the bug with the given code example:
https://bugs.llvm.org/show_bug.cgi?id=37776

We have another test to demonstrate the more general
bug.

llvm-svn: 337126
2018-07-15 14:46:48 +00:00
Andrea Di Biagio f84b0a6914 [llvm-mca] Regenerate X86 specific tests. NFC
Not all tests were correctly updated by the update script after r336797.

llvm-svn: 337124
2018-07-15 11:43:11 +00:00
Andrea Di Biagio ff630c2cdc [llvm-mca][BtVer2] teach how to identify false dependencies on partially written
registers.

The goal of this patch is to improve the throughput analysis in llvm-mca for the
case where instructions perform partial register writes.

On x86, partial register writes are quite difficult to model, mainly because
different processors tend to implement different register merging schemes in
hardware.

When the code contains partial register writes, the IPC (instructions per
cycle) estimated by llvm-mca tends to diverge quite significantly from the
observed IPC (using perf).

Modern AMD processors (at least, from Bulldozer onwards) don't rename partial
registers. Quoting Agner Fog's microarchitecture.pdf:
" The processor always keeps the different parts of an integer register together.
For example, AL and AH are not treated as independent by the out-of-order
execution mechanism. An instruction that writes to part of a register will
therefore have a false dependence on any previous write to the same register or
any part of it."

This patch is a first important step towards improving the analysis of partial
register updates. It changes the semantic of RegisterFile descriptors in
tablegen, and teaches llvm-mca how to identify false dependences in the presence
of partial register writes (for more details: see the new code comments in
include/Target/TargetSchedule.h - class RegisterFile).

This patch doesn't address the case where a write to a part of a register is
followed by a read from the whole register.  On Intel chips, high8 registers
(AH/BH/CH/DH) can be stored in separate physical registers. However, a later
(dirty) read of the full register (example: AX/EAX) triggers a merge uOp, which
adds extra latency (and potentially affects the pipe usage).
This is a very interesting article on the subject with a very informative answer
from Peter Cordes:
https://stackoverflow.com/questions/45660139/how-exactly-do-partial-registers-on-haswell-skylake-perform-writing-al-seems-to

In the future, the definition of RegisterFile can be extended with extra information
that may be used to identify delays caused by merge opcodes triggered by a dirty
read of a partial write.

Differential Revision: https://reviews.llvm.org/D49196

llvm-svn: 337123
2018-07-15 11:01:38 +00:00
Craig Topper 60ce856134 [X86] Add some optsize patterns for 256-bit X86vzmovl.
These patterns use VMOVSS/SD. Without optsize we use BLENDI instead.

llvm-svn: 337119
2018-07-15 06:03:19 +00:00
Roman Lebedev b972fc3e8a [InstCombine] Fold x & (-1 >> y) s< x to x s> (-1 >> y)
https://bugs.llvm.org/show_bug.cgi?id=38123
https://rise4fun.com/Alive/I3O

This pattern is not commutative!
We must make sure not to fold the commuted version!
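
For illustration, a minimal sketch (invented names) of the fold named in the
title; the comment shows the canonicalized form, after which the 'and'
becomes dead. The sibling signed folds in the adjacent commits follow the
same shape with the predicate changed.

```
define i1 @src(i32 %x, i32 %y) {
  %m = lshr i32 -1, %y
  %a = and i32 %x, %m
  %r = icmp slt i32 %a, %x    ; folds to: icmp sgt i32 %x, %m
  ret i1 %r
}
```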

llvm-svn: 337111
2018-07-14 20:08:47 +00:00
Roman Lebedev edba515baa [NFC][InstCombine] Tests for x & (-1 >> y) s< x to x s> (-1 >> y) fold.
https://bugs.llvm.org/show_bug.cgi?id=38123
https://rise4fun.com/Alive/I3O

This pattern is not commutative!
We must make sure not to fold the commuted version!

llvm-svn: 337110
2018-07-14 20:08:42 +00:00
Roman Lebedev f14426101e [InstCombine] Fold x & (-1 >> y) s>= x to x s<= (-1 >> y)
https://bugs.llvm.org/show_bug.cgi?id=38123
https://rise4fun.com/Alive/I3O

This pattern is not commutative!
We must make sure not to fold the commuted version!

llvm-svn: 337109
2018-07-14 20:08:37 +00:00
Roman Lebedev 53a014e61d [NFC][InstCombine] Tests for x & (-1 >> y) s>= x to x s<= (-1 >> y) fold.
https://bugs.llvm.org/show_bug.cgi?id=38123
https://rise4fun.com/Alive/I3O

This pattern is not commutative!
We must make sure not to fold the commuted version!

llvm-svn: 337108
2018-07-14 20:08:31 +00:00
Roman Lebedev 1e61e358a4 [InstCombine] Fold x s<= x & (-1 >> y) to x s<= (-1 >> y)
https://bugs.llvm.org/show_bug.cgi?id=38123
https://rise4fun.com/Alive/I3O

This pattern is not commutative!
We must make sure not to fold the commuted version!

llvm-svn: 337107
2018-07-14 20:08:26 +00:00
Roman Lebedev f1a351cf3a [NFC][InstCombine] Tests for x s<= x & (-1 >> y) to x s<= (-1 >> y) fold.
https://bugs.llvm.org/show_bug.cgi?id=38123
https://rise4fun.com/Alive/I3O

This pattern is not commutative!
We must make sure not to fold the commuted version!

llvm-svn: 337106
2018-07-14 20:08:21 +00:00
Roman Lebedev 859e14aeaa [InstCombine] Fold x s> x & (-1 >> y) to x s> (-1 >> y)
https://bugs.llvm.org/show_bug.cgi?id=38123
https://rise4fun.com/Alive/I3O

This pattern is not commutative!
We must make sure not to fold the commuted version!

llvm-svn: 337105
2018-07-14 20:08:16 +00:00
Roman Lebedev 62d050d2c4 [NFC][InstCombine] Tests for x s> x & (-1 >> y) to x s> (-1 >> y) fold.
https://bugs.llvm.org/show_bug.cgi?id=38123
https://rise4fun.com/Alive/I3O

This pattern is not commutative!
We must make sure not to fold the commuted version!

llvm-svn: 337104
2018-07-14 20:08:09 +00:00
Roman Lebedev 0f5ec8921b [InstCombine] Fold x u<= x & C to x u<= C
https://bugs.llvm.org/show_bug.cgi?id=38123
https://rise4fun.com/Alive/Fqp

This pattern is not commutative. But InstSimplify will
already have taken care of the 'commutative' variant.
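
For illustration, a minimal sketch (invented names, arbitrary constant mask)
of the fold named in the title; the comment shows the folded form.

```
define i1 @src(i8 %x) {
  %a = and i8 %x, 3
  %r = icmp ule i8 %x, %a    ; folds to: icmp ule i8 %x, 3
  ret i1 %r
}
```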

llvm-svn: 337102
2018-07-14 16:44:54 +00:00
Roman Lebedev 79ee3211ba [NFC][InstCombine] Tests for x u<= x & C to x u<= C fold.
https://bugs.llvm.org/show_bug.cgi?id=38123
https://rise4fun.com/Alive/Fqp

This pattern is not commutative. But InstSimplify will
already have taken care of the 'commutative' variant.

llvm-svn: 337101
2018-07-14 16:44:48 +00:00
Roman Lebedev 74f611a1f5 [InstCombine] Fold x u> x & C to x u> C
https://bugs.llvm.org/show_bug.cgi?id=38123
https://rise4fun.com/Alive/JvS

This pattern is not commutative. But InstSimplify will
already have taken care of the 'commutative' variant.

llvm-svn: 337100
2018-07-14 16:44:43 +00:00
Roman Lebedev 60ea5df7c2 [NFC][InstCombine] Tests for x u> x & C to x u> C fold.
https://bugs.llvm.org/show_bug.cgi?id=38123
https://rise4fun.com/Alive/JvS

This pattern is not commutative. But InstSimplify will
already have taken care of the 'commutative' variant.

llvm-svn: 337099
2018-07-14 16:44:37 +00:00
Roman Lebedev e3dc587ae0 [InstCombine] Fold x & (-1 >> y) u< x to x u> (-1 >> y)
https://bugs.llvm.org/show_bug.cgi?id=38123
https://rise4fun.com/Alive/ocb

This pattern is not commutative. But InstSimplify will
already have taken care of the 'commutative' variant.

llvm-svn: 337098
2018-07-14 12:20:16 +00:00
Roman Lebedev cad4e3d741 [NFC][InstCombine] Tests for x & (-1 >> y) u< x to x u> (-1 >> y)
https://bugs.llvm.org/show_bug.cgi?id=38123
https://rise4fun.com/Alive/ocb

This pattern is not commutative. But InstSimplify will
already have taken care of the 'commutative' variant.

llvm-svn: 337097
2018-07-14 12:20:11 +00:00
Roman Lebedev fac48474ce [InstCombine] Fold x & (-1 >> y) u>= x to x u<= (-1 >> y)
https://bugs.llvm.org/show_bug.cgi?id=38123
https://rise4fun.com/Alive/azI

This pattern is not commutative. But InstSimplify will
already have taken care of the 'commutative' variant.

llvm-svn: 337096
2018-07-14 12:20:06 +00:00
Roman Lebedev 3591eb9296 [NFC][InstCombine] Tests for x & (-1 >> y) u>= x to x u<= (-1 >> y)
https://bugs.llvm.org/show_bug.cgi?id=38123
https://rise4fun.com/Alive/azI

This pattern is not commutative. But InstSimplify will
already have taken care of the 'commutative' variant.

llvm-svn: 337095
2018-07-14 12:20:01 +00:00
Roman Lebedev ea2ffc3fcf [NFC][InstCombine] Add forgotten variable tests for foldICmpWithLowBitMaskedVal()
llvm-svn: 337094
2018-07-14 12:19:56 +00:00
Nico Weber 337e241d58 Attempt to get test/tools/llvm-lib/help.test passing on sanitizer-x86_64-linux-fast
The bot has a /b directory, so /? matches against that and gets expanded to it.

(Thanks to Hans's r187366, which solved the same problem for clang-cl a while
ago and which saved me much head scratching.)

llvm-svn: 337092
2018-07-14 11:33:33 +00:00
Francis Visoiu Mistrih f905bf1408 [MachineOutliner] Check the last instruction from the sequence when updating liveness
The MachineOutliner was doing an std::for_each from the call (inserted
before the outlined sequence) to the iterator at the end of the
sequence.

std::for_each needs the iterator past the end, so the last instruction
was not taken into account when propagating the liveness information.

This fixes the machine verifier issue in machine-outliner-disubprogram.ll.

Differential Revision: https://reviews.llvm.org/D49295

llvm-svn: 337090
2018-07-14 09:40:01 +00:00
Chandler Carruth fb503ac027 [x86/SLH] Fix an issue where we wouldn't harden any loads if we found
no conditions.

This is only valid to do if we're hardening calls and rets with LFENCE
which results in an LFENCE guarding the entire entry block for us.

llvm-svn: 337089
2018-07-14 09:32:37 +00:00
Craig Topper 7426cf6717 [X86] Fix a subtle bug in the custom execution domain fixing for blends.
The code tried to find the immediate by using getNumOperands() on the MachineInstr, but there might be implicit-defs after the immediate that get counted.

Instead use getNumOperands() from the instruction description which will only count the operands that are defined in the td file.

llvm-svn: 337088
2018-07-14 06:30:30 +00:00
Nico Weber 17172c6b80 Give llvm-lib rudimentary help output.
https://reviews.llvm.org/D49318

llvm-svn: 337084
2018-07-14 02:29:44 +00:00
Craig Topper f0b164415c [X86] Prefer blendi over movss/sd when avx512 is enabled unless optimizing for size.
AVX512 doesn't have an immediate controlled blend instruction. But blend throughput is still better than movss/sd on SKX.

This commit changes AVX512 to use the AVX blend instructions instead of MOVSS/MOVSD. This constrains the register allocation since it won't be able to use XMM16-31, but hopefully the increased throughput and reduced port 5 pressure make up for that.

llvm-svn: 337083
2018-07-14 02:05:08 +00:00
Teresa Johnson b78c5d0602 Revert "[ThinLTO] Ensure we always select the same function copy to import"
This reverts commits r337050 and r337059. Caused failure in
reverse-iteration bot that needs more investigation.

llvm-svn: 337081
2018-07-14 01:45:49 +00:00
Teresa Johnson 8e739f98fe Revert "[ThinLTO] Add debug output to test"
This reverts commit r337076. Not needed any more.

llvm-svn: 337080
2018-07-14 01:34:06 +00:00
Evgeniy Stepanov 1971ba097d Revert "AMDGPU: Fix handling of alignment padding in DAG argument lowering"
This reverts commit r337021.

WARNING: MemorySanitizer: use-of-uninitialized-value
    #0 0x1415cd65 in void write_signed<long>(llvm::raw_ostream&, long, unsigned long, llvm::IntegerStyle) /code/llvm-project/llvm/lib/Support/NativeFormatting.cpp:95:7
    #1 0x1415c900 in llvm::write_integer(llvm::raw_ostream&, long, unsigned long, llvm::IntegerStyle) /code/llvm-project/llvm/lib/Support/NativeFormatting.cpp:121:3
    #2 0x1472357f in llvm::raw_ostream::operator<<(long) /code/llvm-project/llvm/lib/Support/raw_ostream.cpp:117:3
    #3 0x13bb9d4 in llvm::raw_ostream::operator<<(int) /code/llvm-project/llvm/include/llvm/Support/raw_ostream.h:210:18
    #4 0x3c2bc18 in void printField<unsigned int, &(amd_kernel_code_s::amd_kernel_code_version_major)>(llvm::StringRef, amd_kernel_code_s const&, llvm::raw_ostream&) /code/llvm-project/llvm/lib/Target/AMDGPU/Utils/AMDKernelCodeTUtils.cpp:78:23
    #5 0x3c250ba in llvm::printAmdKernelCodeField(amd_kernel_code_s const&, int, llvm::raw_ostream&) /code/llvm-project/llvm/lib/Target/AMDGPU/Utils/AMDKernelCodeTUtils.cpp:104:5
    #6 0x3c27ca3 in llvm::dumpAmdKernelCode(amd_kernel_code_s const*, llvm::raw_ostream&, char const*) /code/llvm-project/llvm/lib/Target/AMDGPU/Utils/AMDKernelCodeTUtils.cpp:113:5
    #7 0x3a46e6c in llvm::AMDGPUTargetAsmStreamer::EmitAMDKernelCodeT(amd_kernel_code_s const&) /code/llvm-project/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp:161:3
    #8 0xd371e4 in llvm::AMDGPUAsmPrinter::EmitFunctionBodyStart() /code/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp:204:26

[...]

Uninitialized value was created by an allocation of 'KernelCode' in the stack frame of function '_ZN4llvm16AMDGPUAsmPrinter21EmitFunctionBodyStartEv'
    #0 0xd36650 in llvm::AMDGPUAsmPrinter::EmitFunctionBodyStart() /code/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp:192

llvm-svn: 337079
2018-07-14 01:20:53 +00:00
Teresa Johnson 05d28c8002 [ThinLTO] Add debug output to test
Add -debug-only=function-import to get more information for debugging
reverse-iteration bot failure from r337050.

llvm-svn: 337076
2018-07-14 00:08:48 +00:00
Tim Shen a064622bd3 Re-apply "[SCEV] Strengthen StrengthenNoWrapFlags (reapply r334428)."
llvm-svn: 337075
2018-07-13 23:58:46 +00:00
Tim Shen e78bd2815e Add a CHECK line for r337072.
llvm-svn: 337074
2018-07-13 23:48:59 +00:00
Krzysztof Parzyszek 7ced04c0fd [Hexagon] Avoid introducing calls into coalesced range of HVX vector pairs
If an HVX vector register is to be coalesced into a vector pair, make
sure that the vector pair will not have a function call in its live range,
unless it already had one. All HVX vector registers are volatile, so
any vector register live across a function call will have to be spilled.

If a vector needs to be spilled, and it's coalesced into a vector pair
then the whole pair will need to be spilled (even if only a part of it is
live), taking extra stack space.

llvm-svn: 337073
2018-07-13 23:42:29 +00:00
Tim Shen 9e25d5d2ce [LSR] If no Use is interesting, early return.
Summary:
By looking at the callers of getUse(), we can see that even though
IVUsers may offer uses, they may not be interesting to LSR. It's
possible that none of them is interesting.

Reviewers: sanjoy

Subscribers: jlebar, hiraditya, bixia, llvm-commits

Differential Revision: https://reviews.llvm.org/D49049

llvm-svn: 337072
2018-07-13 23:40:00 +00:00
Teresa Johnson 4f8d704cd0 [ThinLTO] Require x86 target for new test
Should fix non-x86 bot failures for new test from r337050.

llvm-svn: 337059
2018-07-13 22:36:22 +00:00
Tom Stellard ac68471326 AMDGPU/GlobalISel: Implement select() for 32-bit @llvm.minnum and @llvm.maxnum
Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D46172

llvm-svn: 337056
2018-07-13 22:16:03 +00:00
Craig Topper da424ba1c5 [X86][FastISel] Support uitofp with avx512.
llvm-svn: 337055
2018-07-13 22:09:30 +00:00
Eli Friedman 835297a951 [LTO] Fix linking with an alias defined using another alias.
When we're linking an alias which will be defined later, we need to
build a GlobalAlias, or else we'll crash later in
IRLinker::linkGlobalValueBody.

clang sometimes constructs aliases like this for C++ destructors.
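
For illustration, a hypothetical sketch (invented names) of the shape
involved: an alias whose aliasee is itself an alias, which is the case that
previously crashed when the underlying definition was linked later.

```
@impl  = global i32 0
@inner = alias i32, i32* @impl
@outer = alias i32, i32* @inner
```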

Differential Revision: https://reviews.llvm.org/D49316

llvm-svn: 337053
2018-07-13 21:58:55 +00:00
Teresa Johnson d94c0594d9 [ThinLTO] Ensure we always select the same function copy to import
In order to always import the same copy of a linkonce function,
even when encountering it with different thresholds (a higher one then a
lower one), keep track of the summary we decided to import.
This ensures that the backend only gets a single definition to import
for each GUID, so that it doesn't need to choose one.

Move the largest threshold the GUID was considered for import into the
current module out of the ImportMap (which is part of a larger map
maintained across the whole index), and into a new map just maintained
for the current module we are computing imports for. This saves some
memory since we no longer have the thresholds maintained across the
whole index (and throughout the in-process backends when doing a normal
non-distributed ThinLTO build), at the cost of some additional
information being maintained for each invocation of ComputeImportForModule
(the selected summary pointer for each import).

There is an additional map lookup for each callee being considered for
importing, however, this was able to subsume a map lookup in the
Worklist iteration that invokes computeImportForFunction. We also are
able to avoid calling selectCallee if we already failed to import at the
same or higher threshold.

I compared the run time and peak memory for the SPEC2006 471.omnetpp
benchmark (running in-process ThinLTO backends), as well as for a large
internal benchmark with a distributed ThinLTO build (so just looking at
the thin link time/memory). Across a number of runs with and without
this change there was no significant change in the time and memory.

(I tried a few other variations of the change but they also didn't
improve time or peak memory).

Reviewers: davidxl

Subscribers: mehdi_amini, inglorion, llvm-commits

Differential Revision: https://reviews.llvm.org/D48670

llvm-svn: 337050
2018-07-13 21:35:51 +00:00
Tom Stellard 390a5f4774 AMDGPU/GlobalISel: Implement select() for @llvm.amdgcn.exp
Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D45882

llvm-svn: 337046
2018-07-13 21:05:14 +00:00
Craig Topper f0831eef0b [X86][FastISel] Add EVEX support to sitofp handling.
llvm-svn: 337045
2018-07-13 21:03:43 +00:00
Fangrui Song 90d9c201dc [X86] Try fixing r336768
llvm-svn: 337043
2018-07-13 20:54:24 +00:00
Roman Lebedev 4d22f15a2f [NFC][InstCombine] Tests for 'check for [no] signed truncation' pattern
[[ https://bugs.llvm.org/show_bug.cgi?id=38149 | PR38149 ]]

As discussed in https://reviews.llvm.org/D49179#1158957 and later,
the IR for 'check for [no] signed truncation' pattern can be improved:
https://rise4fun.com/Alive/gBf
^ that pattern will be produced by Implicit Integer Truncation sanitizer,
https://reviews.llvm.org/D48958 https://bugs.llvm.org/show_bug.cgi?id=21530
in signed case, therefore it is probably a good idea to improve it.

The DAGCombine will reverse this transform, see
https://reviews.llvm.org/D49266

llvm-svn: 337042
2018-07-13 20:33:34 +00:00
Vlad Tsyrklevich cd1559366d [LowerTypeTests] Limit when icall jumptable entries are emitted
Summary:
Currently LowerTypeTests emits jumptable entries for all live external
and address-taken functions; however, we could limit the number of
functions that we emit entries for significantly.

For Cross-DSO CFI, we continue to emit jumptable entries for all
exported definitions.  In the non-Cross-DSO CFI case, we only need to
emit jumptable entries for live functions that are address-taken in live
functions. This ignores exported functions and functions that are only
address taken in dead functions. This change uses ThinLTO summary data
(now emitted for all modules during ThinLTO builds) to determine
address-taken and liveness info.

The logic for emitting jumptable entries is more conservative in the
regular LTO case because we don't have summary data in the case of
monolithic LTO builds; however, once summaries are emitted for all LTO
builds we can unify the Thin/monolithic LTO logic to only use summaries
to determine the liveness of address taking functions.

This change is a partial fix for PR37474. It reduces the build size for
nacl_helper by ~2-3%; the reduction is due to nacl_helper compiling in
lots of unused code, and unused functions that are only address taken in
dead functions no longer being kept live by emitted jumptable references.
The reduction for chromium is ~0.1-0.2%.

Reviewers: pcc, eugenis, javed.absar

Reviewed By: pcc

Subscribers: aheejin, dexonsmith, dschuff, mehdi_amini, eraman, steven_wu, llvm-commits, kcc

Differential Revision: https://reviews.llvm.org/D47652

llvm-svn: 337038
2018-07-13 19:57:39 +00:00
Jonas Devlieghere 327e7a1608 [dwarfdump] Add pretty printer for accelerator table based on Atom.
For instance, when dumping .apple_types, the second atom represents the
DW_TAG. In addition to printing the raw value, we now also pretty print
the value if the ATOM tells us how.

llvm-svn: 337026
2018-07-13 17:21:51 +00:00
Andrea Di Biagio e86e6efea1 [llvm-mca][BtVer2] Add tests for dependency breaking instructions.
llvm-svn: 337024
2018-07-13 16:46:51 +00:00
Matt Arsenault de95077780 AMDGPU: Fix handling of alignment padding in DAG argument lowering
This was completely broken if there was ever a struct argument, as
this information is thrown away during the argument analysis.

The offsets as passed in to LowerFormalArguments are not useful,
as they partially depend on the legalized result register type,
and they don't consider the alignment in the first place.

Ignore the Ins array, and instead figure out from the raw IR type
what we need to do. This seems to fix the padding computation
if the DAG lowering is forced (and stops breaking arguments
following padded arguments if the arguments were only partially
lowered in the IR).

llvm-svn: 337021
2018-07-13 16:40:25 +00:00
Evgeniy Stepanov ca8c2f7638 Revert "CallGraphSCCPass: iterate over all functions."
This reverts commit r336419: use-after-free on CallGraph::FunctionMap elements
due to the use of a stale iterator in CGPassManager::runOnModule.

The iterator may be invalidated if a pass removes a function, ex.:
  llvm::LegacyInlinerBase::inlineCalls
  inlineCallsImpl
  llvm::CallGraph::removeFunctionFromModule

llvm-svn: 337018
2018-07-13 16:32:31 +00:00
Roman Lebedev b64e74feed [NFC][X86][AArch64] Negative tests for 'check for [no] signed truncation' pattern
See D49247, D49266

I'm only adding the sane negative tests, and not
adding the one-use tests yet. Also, not adding
negative tests for the second pattern with inverted operands yet,
since it's handling will be added in later differential.

llvm-svn: 337014
2018-07-13 16:14:37 +00:00
Joel Galenson 667eac80da [cfi-verify] Only run AArch64 tests when it is a supported target
This stops the tests I added in r337007 from running when AArch64 is not a supported target.

llvm-svn: 337012
2018-07-13 16:09:19 +00:00
Jonas Devlieghere 8afd926077 [dwarfdump] Pretty print DW_AT_APPLE_runtime_class
Instead of printing

  DW_AT_APPLE_runtime_class       (0x10)

we now print

  DW_AT_APPLE_runtime_class       (DW_LANG_ObjC)

llvm-svn: 337011
2018-07-13 16:06:17 +00:00
Nemanja Ivanovic 080c35050e [PowerPC] Materialize more constants with CR-field set in late peephole
Revision r322373 fixed a bug in how we materialize constants when the CR-field
needs to be set.

However the fix is overly conservative. It will only do the transform if
AND-ing the input with the new constant produces the same new constant.
This is of course correct, but not necessarily required.

If there are no further uses of the constant, the constant can be changed.
If there are no uses of the GPR result, the final result of the materialization
isn't important other than it needs to compare to zero correctly (lt, gt, eq).

Differential revision: https://reviews.llvm.org/D42109

llvm-svn: 337008
2018-07-13 15:21:03 +00:00
Joel Galenson 06e7e5798f [cfi-verify] Support AArch64.
This patch adds support for AArch64 to cfi-verify.

This required three changes to cfi-verify.  First, it generalizes checking if an instruction is a trap by adding a new isTrap flag to TableGen (and defining it for x86 and AArch64).  Second, the code that ensures that the operand register is not clobbered between the CFI check and the indirect call needs to allow a single dereference (in x86 this happens as part of the jump instruction).  Third, we needed to ensure that return instructions are not counted as indirect branches.  Technically, returns are indirect branches and can be covered by CFI, but LLVM's forward-edge CFI does not protect them, and x86 does not consider them, so we keep that behavior.

In addition, we had to improve AArch64's code to evaluate the branch target of a MCInst to handle calls where the destination is not the first operand (which it often is not).

Differential Revision: https://reviews.llvm.org/D48836

llvm-svn: 337007
2018-07-13 15:19:33 +00:00