Commit Graph

34334 Commits

Author SHA1 Message Date
Matthew Simpson 45dee06177 Add test case missing from r259357 (NFC)
llvm-svn: 259385
2016-02-01 19:09:24 +00:00
Ulrich Weigand 4a4d4ab7a4 [SystemZ] Fix wrong-code generation for certain always-false conditions
We've found another bug in the code generation logic conditions for a
certain class of always-false conditions, those of the form
   if ((a & 1) < 0)

These only reach the back end when compiling without optimization.

The bug was introduced by the choice of using TEST UNDER MASK
to implement a check for
   if ((a & MASK) < VAL)
as
   if ((a & MASK) == 0)

where VAL is less than the the lowest bit of MASK.  This is correct
in all cases except for VAL == 0, in which case the original
condition is always false, but the replacement isn't.

Fixed by excluding that particular case.

llvm-svn: 259381
2016-02-01 18:31:19 +00:00
Sanjay Patel cf57c5a4ee fix broken check lines
Without the colon, it doesn't mean anything!

llvm-svn: 259377
2016-02-01 17:46:18 +00:00
David Majnemer f8853ae7b3 [InstCombine] Don't transform (X+INT_MAX)>=(Y+INT_MAX) -> (X<=Y)
This miscompile came about because we tried to use a transform which was
only appropriate for xor operators when addition was present.

This fixes PR26407.

llvm-svn: 259375
2016-02-01 17:37:56 +00:00
Jun Bum Lim ca832660ae [ValueTracking] Improve isKnownNonZero for PHI of non-zero constants
It is clear that a PHI is a non-zero if all incoming values are non-zero constants.

llvm-svn: 259370
2016-02-01 17:03:07 +00:00
Sanjay Patel b695c5557c [InstCombine] simplify masked load intrinsics with all ones or zeros masks
A masked load with a zero mask means there's no load.
A masked load with an allOnes mask means it's a normal vector load.

Differential Revision: http://reviews.llvm.org/D16691

llvm-svn: 259369
2016-02-01 17:00:10 +00:00
Asaf Badouh 5a3a0231f4 [X86][AVX512VBMI] add encoding and intrinsics for Multishift
Differential Revision: http://reviews.llvm.org/D16399

llvm-svn: 259363
2016-02-01 15:48:21 +00:00
Vasileios Kalintiris a052037034 [mips] Split large test file into 3 smaller ones.
Remove the old select.ll file and use select-int.ll, select-flt.ll,
select-dbl.ll for testing selects on integers, floats & doubles respectivelly.

llvm-svn: 259361
2016-02-01 15:19:35 +00:00
Daniel Sanders f8bb23e509 [mips] Range check uimm16 and fix several bugs this revealed.
Summary:
The bugs were:
* teq and similar take 4-bit unsigned immediates on microMIPS.
* teqi and similar have side-effects like teq do.
* shll_s.w and shra_r.w take 5-bit unsigned immediates.
* The various DSP ext* instructions take a 5-bit immediate.
* repl.qh takes an 8-bit unsigned immediate.
* repl.ph takes a 10-bit unsigned immediate.
* rddsp/wrdsp take a 10-bit unsigned immediate.
* teqi and similar take signed 16-bit immediates (10-bit for microMIPS).
* Out-of-range immediate macros for or/xor take a simm32/simm64 depending
  on architecture. I'll fix the simm64 case properly when I reach simm32.

lui is a bit more lenient than GAS and accepts signed immediates in addition
to unsigned. This is because MipsMCExpr can produce signed values when
constant folding and it currently lacks a way of knowing it should fold to
an unsigned value.

Reviewers: vkalintiris

Subscribers: dsanders, llvm-commits

Differential Revision: http://reviews.llvm.org/D15446

llvm-svn: 259360
2016-02-01 15:13:31 +00:00
Matthew Simpson c578d67407 Reapply commit r258404 with fix.
The previous patch caused PR26364. The fix is to ensure that we don't enter a
cycle when iterating over use-def chains.

llvm-svn: 259357
2016-02-01 13:38:29 +00:00
Igor Breger 56b039ea17 AVX512: fix mask handling for gather/scatter/prefetch intrinsics.
Differential Revision: http://reviews.llvm.org/D16755

llvm-svn: 259346
2016-02-01 09:57:15 +00:00
Simon Pilgrim 1358d86659 [X86][SSE] Find source of the inserted element of INSERTPS
Minor patch to trace back through target shuffles to the source of the inserted element in a (V)INSERTPS shuffle.

Differential Revision: http://reviews.llvm.org/D16652

llvm-svn: 259343
2016-02-01 08:59:30 +00:00
Igor Breger 6cc9115cec AVX512 : Fix SETCCE lowering for KNL 32 bit.
Differential Revision: http://reviews.llvm.org/D16752

llvm-svn: 259342
2016-02-01 07:56:09 +00:00
Frederic Riss 0314e1e1ef [dsymutil] Support scattered relocs.
Although it seems like clang will never emit scattered relocations in
the debug information (at least I couldn't find a way), we have too
support them for the benefit of other compilers.
As clang doesn't generate them, the included testcase was produced
from hacked up assembly.

llvm-svn: 259339
2016-02-01 03:44:22 +00:00
David Majnemer 784d4a455b Revert r258580 and r258581.
Those commits created an artificial edge from a cleanup to a synthesized
catchswitch in order to get the MSVC personality routine to execute
cleanups which don't cleanupret and are not wrapped by a catchswitch.

This worked well enough but is not a complete solution in situations
where there the cleanup infinite loops.

However, the real deal breaker behind this approach comes about from a
degenerate case where the cleanup is post-dominated by unreachable *and*
throws an exception.  This ends poorly because the catchswitch will
inadvertently catch the exception.

Because of this we should go back to our previous behavior of not
executing certain cleanups (identical behavior with the Itanium ABI
implementation in clang, GCC and ICC).

N.B. I think this could be salvaged by making the catchpad rethrow the
exception and properly transforming throwing calls in the cleanup into
invokes.

llvm-svn: 259338
2016-02-01 03:29:38 +00:00
Frederic Riss 96bfaf5fb2 [dsymutil] Fix FileCheck command.
Damn case-insensitive filesystem...

llvm-svn: 259319
2016-01-31 04:39:16 +00:00
Frederic Riss 6c8521ad32 [dsymutil] Fix handling of common symbols.
llvm-dsymutil was misinterpreting the value of common symbols as their
address when it actually contains their size. This didn't impact
llvm-dsymutil's ability to link the debug information for common symbols
because these are always found by name and not by address. Things could
however go wrong when the size of a common object matched the object
file address of another symbol. Depending on the link order of the symbols
the common object might incorrectly evict this other object from the
address to symbol mapping, and then link the evicted symbol with a wrong
binary address.

Use the new ability to have symbols without an object file address to fix
this.

llvm-svn: 259318
2016-01-31 04:29:34 +00:00
Derek Schuff c97ba939d1 [WebAssembly] Fix uses of FrameIndex as store values
Previously the code assumed all uses of FI on loads and stores were as
addresses. This checks whether the use is the address or a value and
handles the latter case as it does for non-memory instructions.

llvm-svn: 259306
2016-01-30 21:43:08 +00:00
JF Bastien fbc89d21dd WebAssembly: don't optimize frameindex store
The previous code was incorrect (can't getReg a frameindex). We could instead optimize it to reduce tree height, but I'm not sure that's worthwhile yet because we then try to eliminate the frameindex.

This patch also fixes frame index elimination for operations which may load or store: it used to assume the base was operand 2 and immediate offset operand 1. That's not true for stores, where they're 4 and 3.

llvm-svn: 259305
2016-01-30 14:11:26 +00:00
Gerolf Hoflehner 87ddb65fa6 [BasicAA] Fix for missing must alias (D16343)
llvm-svn: 259299
2016-01-30 05:52:53 +00:00
Matt Arsenault e013246462 AMDGPU: Fix emitting invalid workitem intrinsics for HSA
The AMDGPUPromoteAlloca pass was emitting the read.local.size
calls, which with HSA was incorrectly selected to reading from
the offset mesa uses off of the kernarg pointer.

Error on intrinsics which aren't supported by HSA, and start
emitting the correct IR to read the workgroup size
out of the dispatch pointer.

Also initialize the pass so it can be tested with opt, and
start moving towards not depending on the subtarget as an
argument.

Start emitting errors for the intrinsics not handled with HSA.

llvm-svn: 259297
2016-01-30 05:19:45 +00:00
Matt Arsenault d0799df707 AMDGPU: Stop checking intrinsics not used by HSA for dispatch-ptr
Only the dispatch.ptr intrinsic is supposed to be used now to get
the workgroup size, and the read.local.size intrinsics do not
work correctly.

llvm-svn: 259296
2016-01-30 05:10:59 +00:00
Matt Arsenault 56c079f393 InstCombine: fabs(x) * fabs(x) -> x * x
llvm-svn: 259295
2016-01-30 05:02:00 +00:00
Dan Gohman ed0f113885 [WebAssembly] Refine block placement to insert blocks between trees.
Refine the test for whether an instruction is in an expression tree so that
it detects when one tree ends and another begins, so we can place a block
at that point, rather than continuing to find the first instruction not in
a tree at all.

llvm-svn: 259294
2016-01-30 05:01:06 +00:00
Matt Arsenault 43976df0da AMDGPU: Add new amdgcn workitem intrinsics
These use the correct prefix and follow the HSA naming convention
rather than the config register option names.

llvm-svn: 259293
2016-01-30 04:25:19 +00:00
Justin Lebar ead59f4765 [CUDA] Die if we ask the NVPTX backend to emit a global ctor/dtor.
Summary: Previously we'd just silently skip these.

Reviewers: tra, jholewinski

Subscribers: llvm-commits, jhen, echristo,

Differential Revision: http://reviews.llvm.org/D16739

llvm-svn: 259279
2016-01-30 01:07:38 +00:00
David Majnemer 8b68a6cabd [CodeView] Properly handle empty line tables
Don't crash when there are no appropriate line table entries for a given
function.

llvm-svn: 259277
2016-01-30 00:36:09 +00:00
Vedant Kumar 00dab22853 [Profiling] Add a -sparse mode to llvm-profdata merge
Add an option to llvm-profdata merge for writing out sparse indexed
profiles. These profiles omit InstrProfRecords for functions which are
never executed.

Differential Revision: http://reviews.llvm.org/D16727

llvm-svn: 259258
2016-01-29 22:54:45 +00:00
Fiona Glaser b417d464e6 Add LoopSimplifyCFG pass
Loop transformations can sometimes fail because the loop, while in
valid rotated LCSSA form, is not in a canonical CFG form. This is
an extremely simple pass that just merges obviously redundant
blocks, which can be used to fix some known failure cases. In the
future, it may be enhanced with more cases (and have code shared with
SimplifyCFG).

This allows us to run LoopSimplifyCFG -> LoopRotate -> LoopUnroll,
so that SimplifyCFG cleans up the loop before Rotate tries to run.

Not currently used in the pass manager, since this pass doesn't do
anything unless you can hook it up in an LPM with other loop passes.
It'll be added once Chandler cleans up things to allow this.

Tested in a custom pipeline out of tree to confirm it works in
practice (in addition to the included trivial test).

llvm-svn: 259256
2016-01-29 22:35:36 +00:00
Sanjay Patel 66fff73c76 [InstCombine] avoid an insertelement transformation that induces the opposite extractelement fold (PR26354)
We would infinite loop because we created a shufflevector that was wider than
needed and then failed to combine that with the insertelement. When subsequently
visiting the extractelement from that shuffle, we see that it's unnecessary,
delete it, and trigger another visit to the insertelement.

llvm-svn: 259236
2016-01-29 20:21:02 +00:00
David Majnemer 6fcbd7e909 [CodeView] Implement .cv_inline_linetable
This support is _very_ rudimentary, just enough to get some basic data
into the CodeView debug section.

Left to do is:
- Use the combined opcodes to save space.
- Do something about code offsets.

llvm-svn: 259230
2016-01-29 19:24:12 +00:00
Tim Northover c4093c3ced ARM: don't mangle DAG constant if it has more than one use
The basic optimisation was to convert (mul $LHS, $complex_constant) into
roughly "(shl (mul $LHS, $simple_constant), $simple_amt)" when it was expected
to be cheaper. The original logic checks that the mul only has one use (since
we're mangling $complex_constant), but when used in even more complex
addressing modes there may be an outer addition that can pick up the wrong
value too.

I *think* the ARM addressing-mode problem is actually unreachable at the
moment, but that depends on complex assessments of the profitability of
pre-increment addressing modes so I've put a real check in there instead of an
assertion.

llvm-svn: 259228
2016-01-29 19:18:46 +00:00
Derek Schuff 6ea637af35 [WebAssembly] Support frame pointer
Add support for frame pointer use in prolog/epilog.
Supports dynamic allocas but not yet over-aligned locals.
Target-independend CG generates SP updates, but we still need to write
back the SP value to memory when necessary.

llvm-svn: 259220
2016-01-29 18:37:49 +00:00
Ahmed Bougacha 0df98ff913 [X86] Add missing "CHECK" colon in r259065 test.
llvm-svn: 259219
2016-01-29 18:25:33 +00:00
Reid Kleckner f3b9ba4941 [codeview] Begin to add support for inlined call sites
Summary:
There are three parts to inlined call frames:
1. The inlinee line subsection
2. The inline site symbol record
3. The function ids referenced by both

This change starts by emitting function ids (3) for all subprograms and
emitting the base inline site symbol record (2). The actual line numbers
in (2) use an encoded format that will come next, along with the inlinee
line subsection.

Reviewers: majnemer

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D16333

llvm-svn: 259217
2016-01-29 18:16:43 +00:00
Reid Kleckner 828883b86c [CodeView] Fix dumping the is_stmt bit from the line table
Bug pointed out by George Rimar.

llvm-svn: 259205
2016-01-29 16:39:04 +00:00
Sanjoy Das 69b4a41fed [RS4GC] Remove unnecessary redirections from tests; NFC
llvm-svn: 259204
2016-01-29 16:32:30 +00:00
Sanjoy Das f3a4ee7542 [RS4GC] Add some missing tests and CHECK: lines
I missed porting these in rL259129.

llvm-svn: 259203
2016-01-29 16:32:25 +00:00
Zoran Jovanovic d474ef3a3b [mips] Absolute value macro expansion
Author: obucina
Reviewers: dsanders
Differential Revision: http://reviews.llvm.org/D16323

llvm-svn: 259202
2016-01-29 16:18:34 +00:00
Alexandros Lamprineas 8c26e7c647 [ARM] Emit trap instruction using .inst directive
The trap instruction is emitted as a data-in-text rather
than an instruction. This patch uses the .inst directive
for emitting trap.

Differential Revision: http://reviews.llvm.org/D16684

llvm-svn: 259182
2016-01-29 10:23:32 +00:00
Matt Arsenault 295875efda AMDGPU: Remove 24-bit intrinsics
The known bit matching code seems to work reasonably well,
so these shouldn't really be needed.

llvm-svn: 259180
2016-01-29 10:05:16 +00:00
Eric Christopher 5a2429e239 Since LI/LIS sign extend the constant passed into the instruction we should
check that the sign extended constant fits into 16-bits if we want a
zero extended value, otherwise go ahead and put it together piecemeal.

Fixes PR26356.

llvm-svn: 259177
2016-01-29 07:20:01 +00:00
Akira Hatanaka 4f472a8867 [llvm-bcanalyzer] Dump bitcode wrapper header
This patch enables llvm-bcanalyzer to print the bitcode wrapper header
if the file has one, which is needed to test the changes made in
r258627 (bitcode-wrapper-header-armv7m.ll is the test case for r258627).

Differential Revision: http://reviews.llvm.org/D16642

llvm-svn: 259162
2016-01-29 05:55:09 +00:00
David Majnemer f2bb710da5 [WinEH] Don't perform state stores in cleanups
Our cleanups do not support true lexical nesting of funclets which
obviates the need to perform state stores.

This fixes PR26361.

llvm-svn: 259161
2016-01-29 05:33:15 +00:00
David Majnemer b2416bd2a7 Revert "Reapply commit r258404 with fix"
This reverts commit r258929, it caused PR26364.

llvm-svn: 259148
2016-01-29 02:43:22 +00:00
Ahmed Bougacha 53010a0d5b [AArch64] Fix i64 nontemporal high-half extraction.
Since we only have pair - not single - nontemporal store instructions,
we have to extract the high part into a separate register to be able
to use them.

When the initial nontemporal codegen support was added, I wrote the
extract using the nonsensical UBFX [0,32[.
Use the correct LSR form instead.

llvm-svn: 259134
2016-01-29 01:08:41 +00:00
Reid Kleckner 2214ed8937 Reland "[CodeView] Use assembler directives for line tables"
This reverts commit r259126 and relands r259117.

This time with updated library dependencies.

llvm-svn: 259130
2016-01-29 00:49:42 +00:00
Sanjoy Das 0407108020 [RS4GC] Clamp UseDeoptBundles to true and update tests
The full diff for the test directory may be hard to read because of the
filename clash; so here's all that happened as far as the tests are
concerned:

```
cd test/Transforms/RewriteStatepointsForGC
git rm *ll
git mv deopt-bundles/* ./
rmdir deopt-bundles
find . -name '*.ll' | xargs gsed -i 's/-rs4gc-use-deopt-bundles //g'
```

llvm-svn: 259129
2016-01-29 00:28:57 +00:00
Reid Kleckner 00d9639c24 Revert "[CodeView] Use assembler directives for line tables"
This reverts commit r259117.

The LineInfo constructor is defined in the codeview library and we have
to link against it now. Doing that isn't trivial, so reverting for now.

llvm-svn: 259126
2016-01-29 00:13:28 +00:00
Sanjoy Das 877a101597 [RS4GC] Port three tests to the deopt bundles directory
two-invokes-one-landingpad.ll was only moved (and not "ported"), but
having everything in the `deopt-bundles` directory will make later
changes more obvious.

llvm-svn: 259125
2016-01-29 00:13:26 +00:00
Easwaran Raman 30a93c1848 Lower inlining threshold when the caller has minsize attribute.
When the caller has optsize attribute, we reduce the inlinining threshold
to OptSizeThreshold (=75) if it is not already lower than that. We don't do
the same for minsize and I suspect it was not intentional. This also addresses
a FIXME regarding checking optsize attribute explicitly instead of using the
right wrapper.

Differential Revision: http://reviews.llvm.org/D16493

llvm-svn: 259120
2016-01-28 23:44:41 +00:00
Reid Kleckner c62e379d22 [CodeView] Use assembler directives for line tables
Adds a new family of .cv_* directives to LLVM's variant of GAS syntax:

- .cv_file: Similar to DWARF .file directives

- .cv_loc: Similar to the DWARF .loc directive, but starts with a
  function id. CodeView line tables are emitted by function instead of
  by compilation unit, so we needed an extra field to communicate this.
  Rather than overloading the .loc direction further, we decided it was
  better to have our own directive.

- .cv_stringtable: Emits the codeview string table at the current
  position. Currently this just contains the filenames as
  null-terminated strings.

- .cv_filechecksums: Emits the file checksum table for all files used
  with .cv_file so far. There is currently no support for emitting
  actual checksums, just filenames.

This moves the line table emission code down into the assembler.  This
is in preparation for implementing the inlined call site line table
format. The inline line table format encoding algorithm requires knowing
the absolute code offsets, so it must run after the assembler has laid
out the code.

David Majnemer collaborated on this patch.

llvm-svn: 259117
2016-01-28 23:31:52 +00:00
Lang Hames 2d8a2aa60a [RuntimeDyld][MachO] Fix handling of empty eh-frame sections.
This patch switches from an unguarded to a guarded loop for eh-frame record
fixups. In the unguarded version we would always make at least one call to
processFDE, which would then crash trying to fix up a frame that didn't exist.

Fixes <rdar://problem/24301582>

llvm-svn: 259103
2016-01-28 22:35:48 +00:00
Simon Pilgrim 04241cef2c [X86][AVX] Added more thorough 256-bit vector consecutive load tests.
llvm-svn: 259100
2016-01-28 22:29:51 +00:00
Sanjoy Das 6f9d6b6ef8 [RS4GC] Change opt %s to opt < %s; NFC
This is as per http://llvm.org/docs/TestingGuide.html#fragile-tests.  I
didn't touch the tests outside deopt-bundles/ since they'll be gone
soon.

llvm-svn: 259097
2016-01-28 21:51:21 +00:00
Sanjoy Das f7302c8baf [PlaceSafepoints] Clamp NoStatepoints to true
This change permanently clamps -spp-no-statepoints to true (the code
deletion will come later).  Tests that specifically tested
PlaceSafepoint's ability to wrap calls in gc.statepoint have been moved
to RS4GC's test suite.

llvm-svn: 259096
2016-01-28 21:51:14 +00:00
Matt Arsenault 5b39b34ca5 AMDGPU: Match fmed3 patterns with legacy fmin/fmax
llvm-svn: 259090
2016-01-28 20:53:48 +00:00
Matt Arsenault f639c32739 AMDGPU: Match some med3 patterns
llvm-svn: 259089
2016-01-28 20:53:42 +00:00
Matt Arsenault 7293f9895e AMDGPU: Set DX10Clamp bit
llvm-svn: 259088
2016-01-28 20:53:35 +00:00
Sanjay Patel 8123f9195c add masked intrinsic tests to show missed opportunities
llvm-svn: 259083
2016-01-28 19:54:20 +00:00
Sergei Larin 427f570ce1 [SplitModule] In split module utility we should never separate alias with its aliasee.
Summary: When splitting module with preserving locals, we currently do not handle case of global alias being separated with its aliasee.

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D16585

llvm-svn: 259075
2016-01-28 18:59:28 +00:00
David Majnemer bff6b581e2 Address buildbot fallout from r259065
llvm-svn: 259074
2016-01-28 18:59:04 +00:00
David Majnemer 4543ff09a2 [X86] Don't transform X << 1 to X + X during type legalization
While legalizing a 64-bit shift left by 1, the following occurs:

We split the shift operand in half: a high half and a low half.
We then create an ADDC with the low half and a ADDE with the high half +
the carry bit from the ADDC.

This is problematic if X is any_ext'd because the high half computation
is now undef + undef + carry bit and there is no way to ensure that the
two undef values had the same bitwise representation.  This results in
the lowest bit in the high half turning into garbage.

Instead, do not try to turn shifts into arithmetic during type
legalization.

This fixes PR26350.

llvm-svn: 259065
2016-01-28 18:20:05 +00:00
Sanjoy Das 2321a4cd71 [PlaceSafepoints] Clean up tests; NFC
Use `opt < %s` instead of `opt %s` as specified in
http://llvm.org/docs/TestingGuide.html#fragile-tests.

llvm-svn: 259062
2016-01-28 18:01:03 +00:00
Tom Stellard 3d2c852958 AMDGPU: waitcnt operand fixes
Summary:
Allow lgkmcnt up to 0xF (hardware allows that).
Fix mask for ExpCnt in AMDGPUInstPrinter.

Reviewers: tstellarAMD, arsenm

Subscribers: arsenm

Differential Revision: http://reviews.llvm.org/D16314

Patch by: Nikolay Haustov

llvm-svn: 259059
2016-01-28 17:13:44 +00:00
Sanjoy Das 52e67e7611 [PlaceSafepoints] Minor test cleanup; NFC
There is no need to place quotes around some_call and
personality_function.

llvm-svn: 259055
2016-01-28 16:11:27 +00:00
Sanjoy Das 7a2e2bed67 [LICM] Keep metadata on control equivalent hoists
Summary:
If the instruction we're hoisting out of a loop into its preheader is
guaranteed to have executed in the loop, then the metadata associated
with the instruction (e.g. !range or !dereferenceable) is valid in the
preheader.  This is because once we're in the preheader, we know we're
eventually going to reach the location the metadata was valid at.

This change makes LICM smarter around this, and helps it recognize cases
like these:

```
  do {
    int a = *ptr; !range !0
    ...
  } while (i++ < N);
```

to

```
  int a = *ptr; !range !0
  do {
    ...
  } while (i++ < N);
```

Earlier we'd drop the `!range` metadata after hoisting the load from
`ptr`.

Reviewers: igor-laevsky

Subscribers: mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D16669

llvm-svn: 259053
2016-01-28 15:51:58 +00:00
Oliver Stannard 02fa1c80c4 Revert r259035, it introduces a cyclic library dependency
llvm-svn: 259045
2016-01-28 13:19:47 +00:00
Igor Breger fca0a34398 AVX512: Fix truncate v32i8 to v32i1 lowering implementation.
Enable truncate 128/256bit packed byte/word with AVX512BW but without AVX512VL, use 512bit instructions.

Differential Revision: http://reviews.llvm.org/D16531

llvm-svn: 259044
2016-01-28 13:19:25 +00:00
Zoran Jovanovic 838eabcd46 [mips][microMIPS] Disable FastISel for microMIPS
Author: milena.vujosevic.janicic
Reviewers: dsanders

FastIsel is not supported for microMIPS, thus it needs to be disabled. 
Test micromips-zero-mat-uses.ll is deleted since the tested sequence of instructions is not generated for microMIPS without FastISel.
Differential Revision: http://reviews.llvm.org/D15892

llvm-svn: 259039
2016-01-28 11:08:03 +00:00
Oliver Stannard b4b092ea1b Add backend dignostic printer for unsupported features
Re-commit of r258951 after fixing layering violation.

The related LLVM patch adds a backend diagnostic type for reporting
unsupported features, this adds a printer for them to clang.

In the case where debug location information is not available, I've
changed the printer to report the location as the first line of the
function, rather than the closing brace, as the latter does not give the
user any information. This also affects optimisation remarks.

Differential Revision: http://reviews.llvm.org/D16590

llvm-svn: 259035
2016-01-28 10:07:27 +00:00
Asaf Badouh 42852d99e7 [X86][AVX512] small fix in ptestm intrinsics
move ptestm{q|d} intrinsics from patterns form (in td file) to the intrinsics table

Differential Revision: http://reviews.llvm.org/D16633

llvm-svn: 259029
2016-01-28 08:33:22 +00:00
Junmo Park b3327b7007 [DAGCombiner] Don't add volatile or indexed stores to ChainedStores
Summary:
findBetterNeighborChains does not handle volatile or indexed stores.
However, it did not check when adding stores to ChainedStores.

Reviewers: arsenm

Differential Revision: http://reviews.llvm.org/D16463

llvm-svn: 259024
2016-01-28 06:23:33 +00:00
NAKAMURA Takumi 628a7a0aef Revert r258951 (and r258950), "Refactor backend diagnostics for unsupported features"
It broke layering violation in LLVMIR.

clang r258950 "Add backend dignostic printer for unsupported features"
llvm  r258951 "Refactor backend diagnostics for unsupported features"

llvm-svn: 259016
2016-01-28 04:41:32 +00:00
Dan Gohman fbfe5ec4a4 [WebAssembly] Don't stackify a register def past a get_local use in the same tree.
llvm-svn: 259013
2016-01-28 03:59:09 +00:00
Dan Gohman adf28177eb [WebAssembly] Enhanced register stackification
This patch revamps the RegStackifier pass with a new tree traversal mechanism,
enabling three major new features:

 - Stackification of values with multiple uses, using the result value of set_local
 - More aggressive stackification of instructions with side effects
 - Reordering operands in commutative instructions to enable more stackification.

llvm-svn: 259009
2016-01-28 01:22:44 +00:00
Evgeniy Stepanov e257f0f671 Tweak unnamed label syntax in textual IR for easier matching in tests.
Change the unnamed label comments like
  ; <label>:8  ; preds = %1
to
  ; <label>:8:  ; preds = %1

This way lit tests can match [[LABEL]]: in both asserts and no-asserts builds.

llvm-svn: 258993
2016-01-27 21:53:08 +00:00
Derek Schuff 4dd6778660 [WebAssembly] Implement byval arguments
Summary:
Just does the simple allocation of a stack object and passes
a pointer to the callee.

Differential Revision: http://reviews.llvm.org/D16610

llvm-svn: 258989
2016-01-27 21:17:39 +00:00
Davide Italiano 905b8627bb [llvm-nm] Remove redundant check for file validity.
We already perform it at the beginning of the function so we can't
arrive here with an invalid object. Also, add a test so that bugs
won't sneak in the future.

llvm-svn: 258982
2016-01-27 20:27:44 +00:00
Tim Northover 042a6c1fe1 ARMv7k: base ABI decision on v7k Arch rather than watchos OS.
Various bits we want to use the new ABI actually compile with "-arch armv7k
-miphoneos-version-min=9.0". Not ideal, but also not ridiculous given how
slices work.

llvm-svn: 258975
2016-01-27 19:32:29 +00:00
Sanjay Patel 5264cc772c [SimplifyCFG] limit recursion depth when speculating instructions (PR26308)
This is a fix for:
https://llvm.org/bugs/show_bug.cgi?id=26308

With the switch to using the TTI cost model in:
http://reviews.llvm.org/rL228826
...it became possible to hit a zero-cost cycle of instructions (gep -> phi -> gep...), 
so we need a cap for the recursion in DominatesMergePoint().

A recursion depth parameter was already added for a different reason in:
http://reviews.llvm.org/rL255660
...so we can just set a limit for it.

I pulled "10" out of the air and made it an independent parameter that we can play with.
It might be higher than it needs to be given the currently low default value of 
PHINodeFoldingThreshold (2). That's the starting cost value that we enter the recursion
with, and most instructions have cost set to TCC_Basic (1), so I don't think we're going
to speculate more than 2 instructions with the current parameters.

As noted in the review and the TODO comment, we can do better than just limiting recursion
depth.

Differential Revision: http://reviews.llvm.org/D16637

llvm-svn: 258971
2016-01-27 19:22:45 +00:00
John McCall 3fe604f89f Add support for objc_unsafeClaimAutoreleasedReturnValue to the
ObjC ARC Optimizer.

The main implication of this is:

1. Ensuring that we treat it conservatively in terms of optimization.
2. We put the ASM marker on it so that the runtime can recognize
objc_unsafeClaimAutoreleasedReturnValue from releaseRV.

<rdar://problem/21567064>

Patch by Michael Gottesman!

llvm-svn: 258970
2016-01-27 19:05:08 +00:00
Oliver Stannard 1e67a9f196 Refactor backend diagnostics for unsupported features
The BPF and WebAssembly backends had identical code for emitting errors
for unsupported features, and AMDGPU had very similar code. This merges
them all into one DiagnosticInfo subclass, that can be used by any
backend.

There should be minimal functional changes here, but some AMDGPU tests
have been updated for the new format of errors (it used a slightly
different format to BPF and WebAssembly). The AMDGPU error messages will
now benefit from having precise source locations when debug info is
available.

The implementation of DiagnosticInfoUnsupported::print must be in
lib/Codegen rather than in the existing file in lib/IR/ to avoid
introducing a dependency from IR to CodeGen.

Differential Revision: http://reviews.llvm.org/D16590

llvm-svn: 258951
2016-01-27 17:30:33 +00:00
Simon Pilgrim a73a0cfaf4 [X86][SSE] Test insertps instrinsic calls with masks that can't combine to something simpler
For these basic tests of the intrinsic, make sure the mask can't simplify to movss, blend-with-zero or something else

llvm-svn: 258941
2016-01-27 16:51:57 +00:00
Igor Laevsky ff291b5833 [DebugInfo] Support zero-length CIE in the _eh_frame parser
MCJIT emits zero-length CIE at the end of the _eh_frame section. This change
ensures that parser inside DebugInfo will not crash and correctly record such cases.
We are now recording DW_EH_PE_omit as a default value for FDE and LSDA encodings.
Also Offset != EndAugmentationOffset assertion check will only happen if augmentation 
string had 'z' letter in it.

Differential Revision: http://reviews.llvm.org/D16588

llvm-svn: 258931
2016-01-27 14:05:35 +00:00
Matthew Simpson b95861d35e Reapply commit r258404 with fix
This patch is the second attempt to reapply commit r258404. There was bug in
the initial patch and subsequent fix (mentioned below).

The initial patch caused an assertion because we were computing smaller type
sizes for instructions that cannot be demoted. The fix first determines the
instructions that will be demoted, and then applies the smaller type size to
only those instructions.

This should fix PR26239 and PR26307.

llvm-svn: 258929
2016-01-27 13:43:27 +00:00
Benjamin Kramer d477e9e378 Revert "Allow X86::COND_NE_OR_P and X86::COND_NP_OR_E to be reversed."
and "Add a missing test case for r258847."

This reverts commit r258847, r258848. Causes miscompilations and backend
errors.

llvm-svn: 258927
2016-01-27 12:44:12 +00:00
Sjoerd Meijer 0140f45964 Add missing build attribute regression tests for Cortex-A8
Differential Revision: http://reviews.llvm.org/D16576

llvm-svn: 258923
2016-01-27 11:34:51 +00:00
Marek Olsak e86f252209 AMDGPU/SI: Stoney has only 16 LDS banks
Summary:
This is a candidate for stable, along with all patches that add the "stoney"
processor.

Reviewers: tstellarAMD

Subscribers: arsenm

Differential Revision: http://reviews.llvm.org/D16485

llvm-svn: 258922
2016-01-27 11:19:45 +00:00
Igor Breger b1bd47ca1a AVX512: Fix vpmovzxbw predicate for AVX1/2 instructions.
Differential Revision: http://reviews.llvm.org/D16595

llvm-svn: 258915
2016-01-27 08:57:46 +00:00
Igor Breger d6c187b038 AVX512: Add store mask patterns.
Differential Revision: http://reviews.llvm.org/D16596

llvm-svn: 258914
2016-01-27 08:43:25 +00:00
Chen Li 5cde8389cf [IndVarSimplify] Rewrite loop exit values with their initial values from loop preheader
Summary:
This is a revised version of D13974, and the following quoted summary are from D13974

"This patch adds support to check if a loop has loop invariant conditions which lead to loop exits. If so, we know that if the exit path is taken, it is at the first loop iteration. If there is an induction variable used in that exit path whose value has not been updated, it will keep its initial value passing from loop preheader. We can therefore rewrite the exit value with
its initial value. This will help remove phis created by LCSSA and enable other optimizations like loop unswitch."

D13974 was committed but failed one lnt test. The bug was that we only checked the condition from loop exit's incoming block was a loop invariant. But there could be another condition from loop header to that incoming block not being a loop invariant. This would produce miscompiled code.

This patch fixes the issue by checking if the incoming block is loop header, and if not, don't perform the rewrite. The could be further improved by recursively checking all conditions leading to loop exit block, but I'd like to check in this simple version first and improve it with future patches.     

Reviewers: sanjoy

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D16570

llvm-svn: 258912
2016-01-27 07:40:41 +00:00
David Majnemer fccf5c6e01 Revert "Revert "[SimplifyCFG] allow speculation of exactly one expensive instruction (PR24818)""
This reverts commit r258903 which reverted r255660.  r258903 was an
accidental commit and should not have been committed.

llvm-svn: 258905
2016-01-27 02:59:41 +00:00
David Majnemer c761afd1d1 [SimplifyCFG] Don't mistake icmp of and for a tree of comparisons
SimplifyCFG tries to turn complex branch conditions into a switch.
Some of it's logic attempts to reason about bitwise arithmetic produced
by InstCombine.  InstCombine can turn things like (X == 2) || (X == 3)
into (X & 1) == 2 and so SimplifyCFG tries to detect when this occurs so
that it can produce a switch instruction.

However, the legality checking was not sufficient to determine whether
or not this had occured.  Correctly check this case by requiring that
the right-hand side of the comparison be a power of two.

This fixes PR26323.

llvm-svn: 258904
2016-01-27 02:43:28 +00:00
David Majnemer 47de2140f7 Revert "[SimplifyCFG] allow speculation of exactly one expensive instruction (PR24818)"
This reverts commit r255660.

llvm-svn: 258903
2016-01-27 02:43:22 +00:00
Matt Arsenault b22828f2fb AMDGPU: Fix default device handling
When no device name is specified, default to kaveri
for HSA since SI is not supported and it woud fail.

Default to "tahiti" instead of "SI" since these are
effectively the same, and tahiti is an actual device.

Move default device handling to the TargetMachine
rather than the AMDGPUSubtarget. The module ISA version
is computed from the device name provided with the target
machine, so the attributes printed by the AsmPrinter were
inconsistent with those computed in the subtarget.

Also remove DevName field from subtarget since it's redundant
with getCPU() in the superclass.

llvm-svn: 258901
2016-01-27 02:17:49 +00:00
Dan Gohman ece881d518 [WebAssembly] Add a test for the mem-intrinsic code in WebAssemblyPeephole.cpp
llvm-svn: 258895
2016-01-27 01:37:52 +00:00
Kevin Enderby 87c85b7e23 Fix identify_magic() to check that a file that starts with MH_MAGIC is
at least as big as the mach header to be identified as a Mach-O file and
make sure smaller files are not identified as a Mach-O files but as
unknown files. Also fix identify_magic() so it looks at all 4 bytes of
the filetype field when determining the type of the Mach-O file.
Then fix the macho-invalid-header test case to check that it is an
unknown file and make sure it does not get the error for
object_error::parse_failed.  And also update the unit tests.

llvm-svn: 258883
2016-01-26 23:43:37 +00:00
Reid Kleckner 1c93b4cd7b [llvm-tblgen] Stop emitting the intrinsic name matching code
The AMDGPU backend was the last user of the old StringMatcher
recognition code. Move it over to the new lookupLLVMIntrinsicName
funciton, which is now improved to handle all of the interesting edge
cases exposed by AMDGPU intrinsic names.

llvm-svn: 258875
2016-01-26 23:01:21 +00:00
Derek Schuff 90d9e8d370 [WebAssembly] Omit no-op adds for non-mem uses of FrameIndex
Differential Revision: http://reviews.llvm.org/D16554

llvm-svn: 258872
2016-01-26 22:47:43 +00:00
Simon Pilgrim 12e71db856 [X86][SSE] Added 8i8 to 8i64 sext/zext tests
llvm-svn: 258868
2016-01-26 22:19:22 +00:00
Simon Pilgrim 00adc1e105 [X86] Add support for zeroed shuffle elements to getShuffleScalarElt
Enable handling of SM_SentinelZero shuffle elements to getShuffleScalarElt. Improves VZEXT_LOAD matches in EltsFromConsecutiveLoads.

llvm-svn: 258865
2016-01-26 21:39:25 +00:00
Chris Bieneman e49730d4ba Remove autoconf support
Summary:
This patch is provided in preparation for removing autoconf on 1/26. The proposal to remove autoconf on 1/26 was discussed on the llvm-dev thread here: http://lists.llvm.org/pipermail/llvm-dev/2016-January/093875.html

"I felt a great disturbance in the [build system], as if millions of [makefiles] suddenly cried out in terror and were suddenly silenced. I fear something [amazing] has happened."
- Obi Wan Kenobi

Reviewers: chandlerc, grosbach, bob.wilson, tstellarAMD, echristo, whitequark

Subscribers: chfast, simoncook, emaste, jholewinski, tberghammer, jfb, danalbert, srhines, arsenm, dschuff, jyknight, dsanders, joker.eph, llvm-commits

Differential Revision: http://reviews.llvm.org/D16471

llvm-svn: 258861
2016-01-26 21:29:08 +00:00
JF Bastien 1a6c7608b1 WebAssembly: don't optimize memcpy/memmove/memcpy to frame index
r258781 optimized memcpy/memmove/memcpy so the intrinsic call can return its first argument, but missed the frame index case. Teach it to ignore that case so C code doesn't assert out in these cases.

llvm-svn: 258851
2016-01-26 20:22:42 +00:00
Cong Hou 26d04ef9d9 Add a missing test case for r258847.
llvm-svn: 258848
2016-01-26 20:09:38 +00:00
Cong Hou 551a57f797 Allow X86::COND_NE_OR_P and X86::COND_NP_OR_E to be reversed.
Currently, AnalyzeBranch() fails non-equality comparison between floating points
on X86 (see https://llvm.org/bugs/show_bug.cgi?id=23875). This is because this
function can modify the branch by reversing the conditional jump and removing
unconditional jump if there is a proper fall-through. However, in the case of
non-equality comparison between floating points, this can turn the branch
"unanalyzable". Consider the following case:

jne.BB1
jp.BB1
jmp.BB2
.BB1:
...
.BB2:
...

AnalyzeBranch() will reverse "jp .BB1" to "jnp .BB2" and then "jmp .BB2" will be
removed:

jne.BB1
jnp.BB2
.BB1:
...
.BB2:
...

However, AnalyzeBranch() cannot analyze this branch anymore as there are two
conditional jumps with different targets. This may disable some optimizations
like block-placement: in this case the fall-through behavior is enforced even if
the fall-through block is very cold, which is suboptimal.

Actually this optimization is also done in block-placement pass, which means we
can remove this optimization from AnalyzeBranch(). However, currently
X86::COND_NE_OR_P and X86::COND_NP_OR_E are not reversible: there is no defined
negation conditions for them.

In order to reverse them, this patch defines two new CondCode X86::COND_E_AND_NP
and X86::COND_P_AND_NE. It also defines how to synthesize instructions for them.
Here only the second conditional jump is reversed. This is valid as we only need
them to do this "unconditional jump removal" optimization.


Differential Revision: http://reviews.llvm.org/D11393

llvm-svn: 258847
2016-01-26 20:08:01 +00:00
Hemant Kulkarni ab4a46fa1c [llvm-readobj] Add -elf-section-groups option
Adds a way to inspect SHT_GROUP sections in ELF objects.
Displays signature, member sections of these sections.

Differential revision: http://reviews.llvm.org/D16555

llvm-svn: 258845
2016-01-26 19:46:39 +00:00
Aditya Nandakumar 3d0c46d489 Reassociate: Reprocess RedoInsts after each inst
Previously the RedoInsts was processed at the end of the block.
However it was possible that it left behind some instructions that
were not canonicalized.
This should guarantee that any previous instruction in the basic
block is canonicalized before we process a new instruction.

llvm-svn: 258830
2016-01-26 18:42:36 +00:00
Sanjay Patel 10e82d1ddb [x86, AVX] tighten checks
llvm-svn: 258828
2016-01-26 18:22:50 +00:00
Kevin Enderby 40fdbf87d2 Update the comments for the macho-invalid-zero-ncmds test and fix
llvm-objdump when printing the Mach Header to print the unknown
cputype and cpusubtype fields as decimal instead of not printing
them at all.  And change the test to check for that.

llvm-svn: 258826
2016-01-26 18:20:49 +00:00
Sanjay Patel 980b280f50 [LibCallSimplifier] fold memset(malloc(x), 0, x) --> calloc(1, x)
This is a step towards solving PR25892:
https://llvm.org/bugs/show_bug.cgi?id=25892

It won't handle the reported case. As noted by the 'TODO' comments in the patch, 
we need to relax the hasOneUse() constraint and also match patterns that include
memset_chk() and the llvm.memset() intrinsic in addition to memset().

Differential Revision: http://reviews.llvm.org/D16337

llvm-svn: 258816
2016-01-26 16:17:24 +00:00
Matthew Simpson 61d5a18469 Revert "Reapply commit r258404 with fix"
This commit exposes a crash in computeKnownBits on the Chromium buildbots.
Reverting to investigate.

Reference: https://llvm.org/bugs/show_bug.cgi?id=26307
llvm-svn: 258812
2016-01-26 15:45:49 +00:00
Igor Laevsky 03a670c0ec Re-submit r256008 "Improve DWARFDebugFrame::parse to also handle __eh_frame."
Originally this change was causing failures on windows buildbots.
But those problems were fixed in r258806.

llvm-svn: 258811
2016-01-26 15:09:42 +00:00
Simon Pilgrim 46696ef93c [X86][SSE] Add zero element and general 64-bit VZEXT_LOAD support to EltsFromConsecutiveLoads
This patch adds support for trailing zero elements to VZEXT_LOAD loads (and checks that no zero elts occur within the consecutive load).

It also generalizes the 64-bit VZEXT_LOAD load matching to work for loads other than 2x32-bit loads.

After this patch it will also be easier to add support for other basic load patterns like 32-bit VZEXT_LOAD loads, PMOVZX and subvector load insertion.

Differential Revision: http://reviews.llvm.org/D16217

llvm-svn: 258798
2016-01-26 09:30:08 +00:00
Matt Arsenault c5f6152911 AMDGPU: Make v32i8/v64i8 illegal types
Old intrinsics were forcing these, but they have now all
been removed. This fixes large i8 vector operations generally
being broken.

llvm-svn: 258788
2016-01-26 04:43:48 +00:00
Matt Arsenault 018179fc46 AMDGPU: Remove old sample intrinsics
I did my best to try to update all the uses in tests that
just happened to use the old ones to the newer intrinsics.

I'm not sure I got all of the immediate operand conversions
correct, since the value seems to have been ignored by the
old pattern but I don't think it really matters.

llvm-svn: 258787
2016-01-26 04:38:08 +00:00
Matt Arsenault 051d6f9fde AMDGPU: Add new amdgcn intrinsics for cube instructions
More cleanup to try to get all intrinsics using the correct
amdgcn prefix that are as close to the instruction as possible.

llvm-svn: 258786
2016-01-26 04:29:56 +00:00
Matt Arsenault 9a10cea7fb AMDGPU: Implement read_register and write_register intrinsics
Some of the special intrinsics now that now correspond to a instruction
also have special setting of some registers, e.g. llvm.SI.sendmsg sets
m0 as well as use s_sendmsg. Using these explicit register intrinsics
may be a better option.

Reading the exec mask and others may be useful for debugging. For this
I'm not sure this is entirely correct because we would want this to
be convergent, although it's possible this is already treated
sufficently conservatively.

llvm-svn: 258785
2016-01-26 04:29:24 +00:00
Matt Arsenault 0c3e2338fe AMDGPU: Restore AMDGPU prefixed rsq intrinsic for now
Also move into backend intrinsics to discourage use of the old name.

llvm-svn: 258783
2016-01-26 04:14:16 +00:00
Dan Gohman bdf08d5da6 [WebAssembly] Optimize memcpy/memmove/memcpy calls.
These calls return their first argument, but because LLVM uses an intrinsic
with a void return type, they can't use the returned attribute. Generalize
the store results pass to optimize these calls too.

llvm-svn: 258781
2016-01-26 04:01:11 +00:00
Dan Gohman bb3722430f [WebAssembly] Implement unaligned loads and stores.
Differential Revision: http://reviews.llvm.org/D16534

llvm-svn: 258779
2016-01-26 03:39:31 +00:00
Haicheng Wu f1c00a22be [LIR] Add support for structs and hand unrolled loops
This is a recommit of r258620 which causes PR26293.

The original message:

Now LIR can turn following codes into memset:

typedef struct foo {
  int a;
  int b;
} foo_t;

void bar(foo_t *f, unsigned n) {
  for (unsigned i = 0; i < n; ++i) {
    f[i].a = 0;
    f[i].b = 0;
  }
}

void test(foo_t *f, unsigned n) {
  for (unsigned i = 0; i < n; i += 2) {
    f[i] = 0;
    f[i+1] = 0;
  }
}

llvm-svn: 258777
2016-01-26 02:27:47 +00:00
Dan Gohman 75452734e4 Followup to 258750; update more tests to use .p2align .
llvm-svn: 258755
2016-01-26 00:35:07 +00:00
Dan Gohman 14d8436c66 Followup to 258750; update all MC tests to use .p2align .
llvm-svn: 258754
2016-01-26 00:27:59 +00:00
Dan Gohman 968b090552 Followup to 258750; update this test to use .p2align .
llvm-svn: 258752
2016-01-26 00:17:24 +00:00
Dan Gohman 61d15ae4f5 [MC] Use .p2align instead of .align
For historic reasons, the behavior of .align differs between targets.
Fortunately, there are alternatives, .p2align and .balign, which make the
interpretation of the parameter explicit, and which behave consistently across
targets.

This patch teaches MC to use .p2align instead of .align, so that people reading
code for multiple architectures don't have to remember which way each platform
does its .align directive.

Differential Revision: http://reviews.llvm.org/D16549

llvm-svn: 258750
2016-01-26 00:03:25 +00:00
Evgeniy Stepanov fbc3da577c [cfi] Cross-DSO CFI diagnostic mode (LLVM part).
* __cfi_check gets a 3rd argument: ubsan handler data
* Instead of trapping on failure, call __cfi_check_fail which must be
  present in the module (generated in the frontend).

llvm-svn: 258746
2016-01-25 23:35:03 +00:00
Matthias Braun 4e67e5c91a X86ISelLowering: Fix cmov(cmov) special lowering bug
There's a special case in EmitLoweredSelect() that produces an improved
lowering for cmov(cmov) patterns. However this special lowering is
currently broken if the inner cmov has multiple users so this patch
stops using it in this case.

If you wonder why this wasn't fixed by continuing to use the special
lowering and inserting a 2nd PHI for the inner cmov: I believe this
would incur additional copies/register pressure so the special lowering
does not improve upon the normal one anymore in this case.

This fixes http://llvm.org/PR26256 (= rdar://24329747)

llvm-svn: 258729
2016-01-25 22:08:25 +00:00
Teresa Johnson 71d12d2a4e [ThinLTO] Find all needed metadata when linking metadata as postpass
For metadata postpass linking, after importing all functions, we need
to recursively walk through any nodes reached via imported functions to
locate needed subprogram metadata. Some might only be reached indirectly
via the variable list for an inlined function.

llvm-svn: 258728
2016-01-25 22:04:56 +00:00
Simon Pilgrim d1d118097d [X86][AVX] Add commutation support for VPERM2X128 instructions
Its main use is to allow memory folding of the 1st operand

Differential Revision: http://reviews.llvm.org/D16521

llvm-svn: 258726
2016-01-25 21:51:34 +00:00
Teresa Johnson f07db00a65 [ThinLTO] Handle DISubprogram reached indirectly from DIImportedEntity
Extend fix for PR26037 to identify DISubprogram reached from a
DIImportedEntity via a DILexicalBlock.

llvm-svn: 258722
2016-01-25 21:29:55 +00:00
Lawrence Hu d3d51061fb Enable loopreroll to rerool loop with pointer induction variable.
Example:

while (buf !=end ) {
   S += buf[0];
   S += buf[1];
   buf +=2;
};

Differential Revision: http://reviews.llvm.org/D13151

llvm-svn: 258709
2016-01-25 19:43:45 +00:00
Lawrence Hu b917cd9fa6 Undo commit 258700 due to missing commit message
llvm-svn: 258708
2016-01-25 19:36:30 +00:00
Matthew Simpson cfe5e2c846 Reapply commit r25804 with fix
We were hitting an assertion because we were computing smaller type sizes for
instructions that cannot be demoted. The fix first determines the instructions
that will be demoted, and then applies the smaller type size to only those
instructions.

This should fix PR26239.

llvm-svn: 258705
2016-01-25 19:24:29 +00:00
Quentin Colombet a392810bea Speculatively revert r258620 as it is the likely culprid of PR26293.
llvm-svn: 258703
2016-01-25 19:12:49 +00:00
Lawrence Hu 84b6195e41 Differential Revision: http://reviews.llvm.org/D13151
llvm-svn: 258700
2016-01-25 18:53:39 +00:00
Dan Gohman 899cb5ab7b [WebAssembly] Fix unbalanced register stack code in the case of late DCE.
Instructions can be DCE'd after the RegStackify pass. If the instruction which
would be the pop for what would be a push is removed, don't use a push.

llvm-svn: 258694
2016-01-25 16:48:44 +00:00
Dan Gohman 619db96d5e [WebAssembly] Add tests for negative offsets with global variable addresses.
llvm-svn: 258693
2016-01-25 15:19:39 +00:00
Dan Gohman 5016c0f99d [SelectionDAG] Use the correct return type for memcpy, memmove, and memset.
When generating calls to memcpy, memmove, and memset, use void* as the return
type rather than void, to match the standard signatures for these functions.

This has no practical effect for most targets, since the return values of
these calls aren't being used anyway, and most calling conventions tolerate
this kind of mismatch. However, this change will help support future
optimizations to utilize the return value to avoid holding the argument
value live across a call.

llvm-svn: 258691
2016-01-25 15:05:56 +00:00
James Molloy 121de0bcfa [DemandedBits] Fix computation of demanded bits for ICmps
The computation of ICmp demanded bits is independent of the individual operand being evaluated. We simply return a mask consisting of the minimum leading zeroes of both operands.

We were incorrectly passing "I" to ComputeKnownBits - this should be "UserI->getOperand(0)". In cases where we were evaluating the 1th operand, we were taking the minimum leading zeroes of it and itself.

This should fix PR26266.

llvm-svn: 258690
2016-01-25 14:49:36 +00:00
Michael Zuckerman 1bd7f993fc [AVX512] Adding PTESTNMB/D/W/Q instruction
Differential Revision: http://reviews.llvm.org/D16520

llvm-svn: 258688
2016-01-25 14:43:23 +00:00
Michael Zuckerman 19670d479a [AVX512] Adding PTESTMB/W/D/Q instruction
Differential Revision: http://reviews.llvm.org/D16519

llvm-svn: 258686
2016-01-25 13:27:32 +00:00
Bradley Smith d27a6a7072 [ARM] Add DSP build attribute and extension targeting
This patch was originally committed as r257885, but was reverted due to windows
failures. The cause of these failures has been fixed under r258677, hence
re-committing the original patch.

llvm-svn: 258683
2016-01-25 11:26:11 +00:00
Bradley Smith f277c8a5ea [ARM] Add new system registers to ARMv8-M Baseline/Mainline
This patch was originally committed as r257884, but was reverted due to windows
failures. The cause of these failures has been fixed under r258677, hence
re-committing the original patch.

llvm-svn: 258682
2016-01-25 11:25:36 +00:00
Bradley Smith fed3e4ac00 [ARM] Add ARMv8-M security extension instructions to ARMv8-M Baseline/Mainline
This patch was originally committed as r257883, but was reverted due to windows
failures. The cause of these failures has been fixed under r258677, hence
re-committing the original patch.

llvm-svn: 258681
2016-01-25 11:24:47 +00:00
Asaf Badouh 655822ab7e [X86][IFMA] adding intrinsics and encoding for multiply and add of unsigned 52bit integer
VPMADD52LUQ - Packed Multiply of Unsigned 52-bit Integers and Add the Low 52-bit Products to Qword Accumulators
 VPMADD52HUQ - Packed Multiply of Unsigned 52-bit Unsigned Integers and Add High 52-bit Products to 64-bit Accumulators

Differential Revision: http://reviews.llvm.org/D16407

llvm-svn: 258680
2016-01-25 11:14:24 +00:00
Oliver Stannard 65b85382f6 [ARM] Add ARMv8.2-A FP16 scalar instructions
This was originally committed as r255762, but reverted as it broke windows
bots. Re-commitiing the exact same patch, as the underlying cause was fixed by
r258677.

ARMv8.2-A adds 16-bit floating point versions of all existing VFP
floating-point instructions. This is an optional extension, so all of
these instructions require the FeatureFullFP16 subtarget feature.

The assembly for these instructions uses S registers (AArch32 does not
have H registers), but the instructions have ".f16" type specifiers
rather than ".f32" or ".f64". The top 16 bits of each source register
are ignored, and the top 16 bits of the destination register are set to
zero.

These instructions are mostly the same as the 32- and 64-bit versions,
but they use coprocessor 9 rather than 10 and 11.

Two new instructions, VMOVX and VINS, have been added to allow packing
and extracting two 16-bit floats stored in the top and bottom halves of
an S register.

New fixup kinds have been added for the PC-relative load and store
instructions, but no ELF relocations have been added as they have a
range of 512 bytes.

Differential Revision: http://reviews.llvm.org/D15038

llvm-svn: 258678
2016-01-25 10:26:26 +00:00
Igor Breger 6d421419db AVX1 : Enable vector masked_load/store to AVX1.
Use AVX1 FP instructions (vmaskmovps/pd) in place of the AVX2 int instructions (vpmaskmovd/q).

Differential Revision: http://reviews.llvm.org/D16528

llvm-svn: 258675
2016-01-25 10:17:11 +00:00
Michael Zuckerman 72b7223ae6 [AVX512] [CMPPS ][ CMPPD ] Adding full Comparison Predicate names
X86AsmParser.cpp is missing full comparison predicate names for CMPPD and CMPPS Instructions.
X86AsmParser.cpp defines only the short names of the Comparison predicate that you can find in the following pdf:
https://software.intel.com/sites/default/files/managed/07/b7/319433-023.pdf
Page 5-61 table 5-3

Differential Revision: http://reviews.llvm.org/D16518

llvm-svn: 258671
2016-01-25 08:43:26 +00:00
Elena Demikhovsky 29cde35b43 Added Skylake client to X86 targets and features
Changes in X86.td:

I set features of Intel processors in incremental form: IVB = SNB + X HSW = IVB + X ..
I added Skylake client processor and defined it's features
FeatureADX was missing on KNL
Added some new features to appropriate processors SMAP, IFMA, PREFETCHWT1, VMFUNC and others

Differential Revision: http://reviews.llvm.org/D16357

llvm-svn: 258659
2016-01-24 10:41:28 +00:00