Commit Graph

62877 Commits

Author SHA1 Message Date
Reid Kleckner 592a193285 Revert [SLP] Look-ahead operand reordering heuristic.
This reverts r364084 (git commit 5698921be2)

It caused crashes while compiling a file in Chrome. Reduction
forthcoming.

llvm-svn: 364111
2019-06-21 23:10:25 +00:00
Shoaib Meenai 6442317219 [llvm-lipo] Implement -thin
Creates thin output file of specified arch_type from the fat input file.

Patch by Anusha Basana <anushabasana@fb.com>

Differential Revision: https://reviews.llvm.org/D63341

llvm-svn: 364107
2019-06-21 21:59:01 +00:00
Julian Lettner 19c4d660f4 [ASan] Use dynamic shadow on 32-bit iOS and simulators
The VM layout on iOS is not stable between releases. On 64-bit iOS and
its derivatives we use a dynamic shadow offset that enables ASan to
search for a valid location for the shadow heap on process launch rather
than hardcode it.

This commit extends that approach for 32-bit iOS plus derivatives and
their simulators.

rdar://50645192
rdar://51200372
rdar://51767702

Reviewed By: delcypher

Differential Revision: https://reviews.llvm.org/D63586

llvm-svn: 364105
2019-06-21 21:01:39 +00:00
Craig Topper f5a5785632 [X86] Add test cases for incorrect shrinking of volatile vector loads from 128-bits to 32 or 64 bits. NFC
This is caused by isel patterns that look for vzmovl+load and
treat it the same as vzload.

llvm-svn: 364101
2019-06-21 20:16:26 +00:00
Matt Arsenault 22e3dc60a0 AMDGPU: Fix not using s33 for scratch wave offset in kernels
Fixes missing piece from r363990.

llvm-svn: 364099
2019-06-21 20:04:02 +00:00
Craig Topper 4649a051bf [X86] Add DAG combine to turn (vzmovl (insert_subvector undef, X, 0)) into (insert_subvector allzeros, (vzmovl X), 0)
128/256 bit scalar_to_vectors are canonicalized to (insert_subvector undef, (scalar_to_vector), 0). We have isel patterns that try to match this pattern being used by a vzmovl to use a 128-bit instruction and a subreg_to_reg.

This patch detects the insert_subvector undef portion of this and pulls it through the vzmovl, creating a narrower vzmovl and an insert_subvector allzeroes. We can then match the insertsubvector into a subreg_to_reg operation by itself. Then we can fall back on existing (vzmovl (scalar_to_vector)) patterns.

Note, while the scalar_to_vector case is the motivating case I didn't restrict to just that case. I'm also wondering about shrinking any 256/512 vzmovl to an extract_subvector+vzmovl+insert_subvector(allzeros) but I fear that would have bad implications to shuffle combining.

I also think there is more canonicalization we can do with vzmovl with loads or scalar_to_vector with loads to create vzload.

Differential Revision: https://reviews.llvm.org/D63512

llvm-svn: 364095
2019-06-21 19:10:21 +00:00
Craig Topper 4569cdbcf5 [X86] Don't mark v64i8/v32i16 ISD::SELECT as custom unless they are legal types.
We don't have any Custom handling during type legalization. Only
operation legalization.

Fixes PR42355

llvm-svn: 364093
2019-06-21 18:50:00 +00:00
Craig Topper 91ea99295c [X86] Add avx512bw command lines to avx512-select.ll
Prep for fixing PR42355 and ensuring we have coverage of
ISD::SELECT for v64i8/v32i16 on KNL and SKX configs.

llvm-svn: 364092
2019-06-21 18:49:42 +00:00
Simon Pilgrim 5dba4ed208 [X86][AVX] Combine INSERT_SUBVECTOR(SRC0, EXTRACT_SUBVECTOR(SRC1)) as shuffle
Subvector shuffling often ends up as insert/extract subvector.

llvm-svn: 364090
2019-06-21 18:35:04 +00:00
Amara Emerson 6e71b34fe6 [AArch64][GlobalISel] Implement selection support for the new G_JUMP_TABLE and G_BRJT ops.
With this we can now fully code generate jump tables, which is important for code size.

Differential Revision: https://reviews.llvm.org/D63223

llvm-svn: 364086
2019-06-21 18:10:41 +00:00
Amara Emerson fe4625fb24 [GlobalISel][IRTranslator] Change switch table translation to generate jump tables and range checks.
This change makes use of the newly refactored SwitchLoweringUtils code from
SelectionDAG to in order to generate jump tables and range checks where appropriate.

Much of this code is ported from SDAG with some modifications. We generate
G_JUMP_TABLE and G_BRJT instructions when JT opportunities are found. This means
that targets which previously relied on the naive one MBB per case stmt
translation will now start falling back until they add support for the new opcodes.

For range checks, we don't generate any previously unused operations. This
just recognizes contiguous ranges of case values and generates a single block per
range. Single case value blocks are just a special case of ranges so we get that
support almost for free.

There are still some optimizations missing that I haven't ported over, and
bit-tests are also unimplemented. This patch series is already complex enough.

Actual arm64 support for selection of jump tables is coming in a later patch.

Differential Revision: https://reviews.llvm.org/D63169

llvm-svn: 364085
2019-06-21 18:10:38 +00:00
Simon Pilgrim 5698921be2 [SLP] Look-ahead operand reordering heuristic.
This patch introduces a new heuristic for guiding operand reordering. The new "look-ahead" heuristic can look beyond the immediate predecessors. This helps break ties when the immediate predecessors have identical opcodes (see lit test for an example).

Committed on behalf of @vporpo (Vasileios Porpodas)

Differential Revision: https://reviews.llvm.org/D60897

llvm-svn: 364084
2019-06-21 17:57:01 +00:00
David Bolvansky 2441a4074c [NFC] Update shl-sub tests
llvm-svn: 364083
2019-06-21 17:51:18 +00:00
Sanjay Patel f483617256 [InstCombine] add tests for ctpop folds; NFC
llvm-svn: 364082
2019-06-21 17:44:09 +00:00
Craig Topper 6af1be9664 [X86] Use vmovq for v4i64/v4f64/v8i64/v8f64 vzmovl.
We already use vmovq for v2i64/v2f64 vzmovl. But we were using a
blendpd+xorpd for v4i64/v4f64/v8i64/v8f64 under opt speed. Or
movsd+xorpd under optsize.

I think the blend with 0 or movss/d is only needed for
vXi32 where we don't have an instruction that can move 32
bits from one xmm to another while zeroing upper bits.

movq is no worse than blendpd on any known CPUs.

llvm-svn: 364079
2019-06-21 17:24:21 +00:00
Amara Emerson 8f25a021dd [AArch64][GlobalISel] Make s8 and s16 G_CONSTANTs legal.
We sometimes get poor code size because constants of types < 32b are legalized
as 32 bit G_CONSTANTs with a truncate to fit. This works but means that the
localizer can no longer sink them (although it's possible to extend it to do so).

On AArch64 however s8 and s16 constants can be selected in the same way as s32
constants, with a mov pseudo into a W register. If we make s8 and s16 constants
legal then we can avoid unnecessary truncates, they can be CSE'd, and the
localizer can sink them as normal.

There is a caveat: if the user of a smaller constant has to widen the sources,
we end up with an anyext of the smaller typed G_CONSTANT. This can cause
regressions because of the additional extend and missed pattern matching. To
remedy this, there's a new artifact combiner to generate the wider G_CONSTANT
if it's legal for the target.

Differential Revision: https://reviews.llvm.org/D63587

llvm-svn: 364075
2019-06-21 16:43:50 +00:00
Stanislav Mekhanoshin bdf7f81b89 [AMDGPU] hazard recognizer for fp atomic to s_denorm_mode
This requires 3 wait states unless there is a wait or VALU in
between.

Differential Revision: https://reviews.llvm.org/D63619

llvm-svn: 364074
2019-06-21 16:30:14 +00:00
David Bolvansky dbcdad51ff [InstCombine] (1 << (C - x)) -> ((1 << C) >> x) if C is bitwidth - 1
Summary:
```
%a = sub i32 31, %x
%r = shl i32 1, %a
  =>
%d = shl i32 1, 31
%r = lshr i32 %d, %x

Done: 1
Optimization is correct!
```

https://rise4fun.com/Alive/btZm

Reviewers: spatel, lebedev.ri, nikic

Reviewed By: lebedev.ri

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63652

llvm-svn: 364073
2019-06-21 16:25:32 +00:00
David Bolvansky 045b0f60b6 [NFC] Added more tests for D63652
llvm-svn: 364069
2019-06-21 16:14:13 +00:00
David Bolvansky 4b28478389 [InstCombine] cttz(abs(x)) -> cttz(x)
Summary: Signedness does not change number of trailing zeros.

Reviewers: spatel, lebedev.ri, nikic

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D63546

llvm-svn: 364064
2019-06-21 15:26:22 +00:00
Sanjay Patel ddb9093684 [GVNSink] prevent crashing on mismatched instructions (PR42346)
Patch based on suggestion by James Molloy (@jmolloy) in:
https://bugs.llvm.org/show_bug.cgi?id=42346

llvm-svn: 364062
2019-06-21 15:17:24 +00:00
David Bolvansky b0ba049f58 [NFC] Added tests for (1 << (C - x)) -> ((1 << C) >> x)
llvm-svn: 364060
2019-06-21 15:00:31 +00:00
George Rimar fa1c7d9bdf [llvm-objcopy] - Get rid of dynrel.elf precompiled binary from inputs.
We do not have to spread using the precompiled binaries in the tests,
when we can use YAML. This patch removes the dynrel.elf binary and adds
a few comments to the test cases.

Differential revision: https://reviews.llvm.org/D63641

llvm-svn: 364052
2019-06-21 14:15:15 +00:00
Jay Foad d9d3c91b48 [Scalarizer] Propagate IR flags
Summary:
The motivation for this was to propagate fast-math flags like nnan and
ninf on vector floating point operations to the corresponding scalar
operations to take advantage of follow-on optimizations. But I think
the same argument applies to all of our IR flags: if they apply to the
vector operation then they also apply to all the individual scalar
operations, and they might enable follow-on optimizations.

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63593

llvm-svn: 364051
2019-06-21 14:10:18 +00:00
George Rimar 0a32c07cd7 [llvm-readobj] - Inline a few yaml inputs into test cases.
There are some test that are splitted into main part + input yaml for no visible reason.
This patch inines the yaml part for the 3 test cases I found.

Differential revision: https://reviews.llvm.org/D63644

llvm-svn: 364049
2019-06-21 14:07:35 +00:00
Andrea Di Biagio dd0dc19b1c Set an explicit x86 triple for test bottleneck-analysis.s added by my r364045. NFC
This should unbreak the ppc64 buildbots.

llvm-svn: 364048
2019-06-21 14:05:58 +00:00
Sam Elliott 96c8bc7956 [RISCV] Add RISCV-specific TargetTransformInfo
Summary:
LLVM Allows Targets to provide information that guides optimisations
made to LLVM IR. This is done with callbacks on a TargetTransformInfo object.

This patch adds a TargetTransformInfo class for RISC-V. This will allow us to
implement RISC-V specific callbacks as they become necessary.

This commit also adds the getIntImmCost callbacks, and tests them with a simple
constant hoisting test. Our immediate costs are on the conservative side, for
the moment, but we prevent hoisting in most circumstances anyway.

Previous review was on D63007

Reviewers: asb, luismarques

Reviewed By: asb

Subscribers: ributzka, MaskRay, llvm-commits, Jim, benna, psnobl, jocewei, PkmX, rkruppe, the_o, brucehoult, MartinMosbeck, rogfer01, edward-jones, zzheng, jrtc27, shiva0217, kito-cheng, niosHD, sabuasal, apazos, simoncook, johnrusso, rbar, hiraditya, mgorny

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63433

llvm-svn: 364046
2019-06-21 13:36:09 +00:00
Andrea Di Biagio aa9b6468bd [MCA][Bottleneck Analysis] Teach how to compute a critical sequence of instructions based on the simulation.
This patch teaches the bottleneck analysis how to identify and print the most
expensive sequence of instructions according to the simulation. Fixes PR37494.

The goal is to help users identify the sequence of instruction which is most
critical for performance.

A dependency graph is internally used by the bottleneck analysis to describe
data dependencies and processor resource interferences between instructions.

There is one node in the graph for every instruction in the input assembly
sequence. The number of nodes in the graph is independent from the number of
iterations simulated by the tool. It means that a single node of the graph
represents all the possible instances of a same instruction contributed by the
simulated iterations.

Edges are dynamically "discovered" by the bottleneck analysis by observing
instruction state transitions and "backend pressure increase" events generated
by the Execute stage. Information from the events is used to identify critical
dependencies, and materialize edges in the graph. A dependency edge is uniquely
identified by a pair of node identifiers plus an instance of struct
DependencyEdge::Dependency (which provides more details about the actual
dependency kind).

The bottleneck analysis internally ranks dependency edges based on their impact
on the runtime (see field DependencyEdge::Dependency::Cost). To this end, each
edge of the graph has an associated cost. By default, the cost of an edge is a
function of its latency (in cycles). In practice, the cost of an edge is also a
function of the number of cycles where the dependency has been seen as
'contributing to backend pressure increases'. The idea is that the higher the
cost of an edge, the higher is the impact of the dependency on performance. To
put it in another way, the cost of an edge is a measure of criticality for
performance.

Note how a same edge may be found in multiple iteration of the simulated loop.
The logic that adds new edges to the graph checks if an equivalent dependency
already exists (duplicate edges are not allowed). If an equivalent dependency
edge is found, field DependencyEdge::Frequency of that edge is incremented by
one, and the new cost is cumulatively added to the existing edge cost.

At the end of simulation, costs are propagated to nodes through the edges of the
graph. The goal is to identify a critical sequence from a node of the root-set
(composed by node of the graph with no predecessors) to a 'sink node' with no
successors.  Note that the graph is intentionally kept acyclic to minimize the
complexity of the critical sequence computation algorithm (complexity is
currently linear in the number of nodes in the graph).

The critical path is finally computed as a sequence of dependency edges. For
edges describing processor resource interferences, the view also prints a
so-called "interference probability" value (by dividing field
DependencyEdge::Frequency by the total number of iterations).

Examples of critical sequence computations can be found in tests added/modified
by this patch.

On output streams that support colored output, instructions from the critical
sequence are rendered with a different color.

Strictly speaking the analysis conducted by the bottleneck analysis view is not
a critical path analysis. The cost of an edge doesn't only depend on the
dependency latency. More importantly, the cost of a same edge may be computed
differently by different iterations.

The number of dependencies is discovered dynamically based on the events
generated by the simulator. However, their number is not fixed. This is
especially true for edges that model processor resource interferences; an
interference may not occur in every iteration. For that reason, it makes sense
to also print out a "probability of interference".

By construction, the accuracy of this analysis (as always) is strongly dependent
on the simulation (and therefore the quality of the information available in the
scheduling model).

That being said, the critical sequence effectively identifies a performance
criticality. Instructions from that sequence are expected to have a very big
impact on performance. So, users can take advantage of this information to focus
their attention on specific interactions between instructions.
In my experience, it works quite well in practice, and produces useful
output (in a reasonable amount time).

Differential Revision: https://reviews.llvm.org/D63543

llvm-svn: 364045
2019-06-21 13:32:54 +00:00
Simon Tatham 0c7af66450 [ARM] Add MVE 64-bit GPR <-> vector move instructions.
These instructions let you load half a vector register at once from
two general-purpose registers, or vice versa.

The assembly syntax for these instructions mentions the vector
register name twice. For the move _into_ a vector register, the MC
operand list also has to mention the register name twice (once as the
output, and once as an input to represent where the unchanged half of
the output register comes from). So we can conveniently assign one of
the two asm operands to be the output $Qd, and the other $QdSrc, which
avoids confusing the auto-generated AsmMatcher too much. For the move
_from_ a vector register, there's no way to get round the fact that
both instances of that register name have to be inputs, so we need a
custom AsmMatchConverter to avoid generating two separate output MC
operands. (And even that wouldn't have worked if it hadn't been for
D60695.)

Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover

Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62679

llvm-svn: 364041
2019-06-21 13:17:23 +00:00
Simon Tatham bafb105e96 [ARM] Add MVE vector instructions that take a scalar input.
This adds the `MVE_qDest_rSrc` superclass and all its instances, plus
a few other instructions that also take a scalar input register or two.

I've also belatedly added custom diagnostic messages to the operand
classes for odd- and even-numbered GPRs, which required matching
changes in two of the existing MVE assembly test files.

Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover

Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62678

llvm-svn: 364040
2019-06-21 13:17:08 +00:00
Paul Robinson 26cc5bcb1a Fix a crash with assembler source and -g.
llvm-mc or clang with -g normally produces debug info describing the
assembler source itself; however, if that source already contains some
.file/.loc directives, we should instead emit the debug info described
by those directives.  For certain assembler sources seen in the wild
(particularly in the Chrome build) this was causing a crash due to
incorrect assumptions about legal sequences of assembler source text.

Fixes PR38994.

Differential Revision: https://reviews.llvm.org/D63573

llvm-svn: 364039
2019-06-21 13:10:19 +00:00
Simon Pilgrim 36a999ffb8 [X86] X86ISD::ANDNP is a (non-commutative) binop
The sat add/sub tests still have unnecessary extract_subvector((vandnps ymm, ymm), 0) uses that should be split to (vandnps (extract_subvector(ymm, 0), extract_subvector(ymm, 0)), but its getting better.

llvm-svn: 364038
2019-06-21 12:42:39 +00:00
Simon Tatham a6b6a15701 [ARM] Add a batch of similarly encoded MVE instructions.
Summary:
This adds the `MVE_qDest_qSrc` superclass and all instructions that
inherit from it. It's not the complete class of _everything_ with a
q-register as both destination and source; it's a subset of them that
all have similar encodings (but it would have been hopelessly unwieldy
to call it anything like MVE_111x11100).

This category includes add/sub with carry; long multiplies; halving
multiplies; multiply and accumulate, and some more complex
instructions.

Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover

Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62677

llvm-svn: 364037
2019-06-21 12:13:59 +00:00
James Henderson 9485b265e8 [binutils] Add response file option to help and docs
Many LLVM-based tools already support response files (i.e. files
containing a list of options, specified with '@'). This change simply
updates the documentation and help text for some of these tools to
include it. I haven't attempted to fix all tools, just a selection that
I am interested in.

I've taken the opportunity to add some tests for --help behaviour, where
they were missing. We could expand these tests, but I don't think that's
within scope of this patch.

This fixes https://bugs.llvm.org/show_bug.cgi?id=42233 and
https://bugs.llvm.org/show_bug.cgi?id=42236.

Reviewed by: grimar, MaskRay, jkorous

Differential Revision: https://reviews.llvm.org/D63597

llvm-svn: 364036
2019-06-21 11:49:20 +00:00
James Henderson beb2493fb7 [llvm-dwarfdump] Remove unnecessary explicit -h behaviour
--help and -h are automatically supported by the command-line parser,
unless overridden by the tool. The behaviour of the PrintHelpMessage
being used for -h prior to this patch is subtly different to that
provided by --help automatically (it omits certain elements of help text
and options, such as --help-list), so overriding the default is not
desirable, without good reason. This patch removes the explicit
specification of -h and its behaviour, so that the default behaviour is
used.

Reviewed by: hintonda

Differential Revision: https://reviews.llvm.org/D63565

llvm-svn: 364029
2019-06-21 11:22:20 +00:00
Simon Tatham 7d76f8acf0 [ARM] Add MVE vector compare instructions.
Summary:
These take a pair of vector register to compare, and a comparison type
(written in the form of an Arm condition suffix); they output a vector
of booleans in the VPR register, where predication can conveniently
use them.

Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover

Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62676

llvm-svn: 364027
2019-06-21 11:14:51 +00:00
Simon Pilgrim 771c33e375 [X86][AVX] isNOT - handle concat_vectors(xor X, -1, xor Y, -1) pattern
llvm-svn: 364022
2019-06-21 10:44:15 +00:00
Simon Tatham c9b2cd4674 [ARM] Add a batch of MVE floating-point instructions.
Summary:
This includes floating-point basic arithmetic (add/sub/multiply),
complex add/multiply, unary negation and absolute value, rounding to
integer value, and conversion to/from integer formats.

Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover

Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62675

llvm-svn: 364013
2019-06-21 09:35:07 +00:00
Yevgeny Rouban d5e1ce3f44 [LICM & MSSA] Fixed test to run only with assertions enabled as it uses -debug-only
llvm-svn: 364005
2019-06-21 04:49:40 +00:00
Amara Emerson bc0d08e0ee [GlobalISel][Localizer] Allow localization of G_INTTOPTR and chains of instructions.
G_INTTOPTR can prevent the localizer from moving G_CONSTANTs, but since it's
essentially a side effect free cast instruction we can remat both instructions.
This patch changes the localizer to enable localization of the chains by
iterating over the entry block instructions in reverse order. That way, uses will
localized first, and then the defs are free to be localized as well.

This also changes the previous SmallPtrSet of localized instructions to use a
SetVector instead. We're dealing with pointers and need deterministic iteration
order.

Overall, this change improves ARM64 -O0 CTMark code size by around 0.7% geomean.

Differential Revision: https://reviews.llvm.org/D63630

llvm-svn: 364001
2019-06-21 00:36:19 +00:00
Cameron McInally 1c0bd6dd2c [Reassociate] Remove bogus assert reported in PR42349.
Also, add a FIXME for the unsafe transform on a unary FNeg. A unary FNeg can only be transformed to a FMul by -1.0 when the nnan flag is present. The unary FNeg project is a WIP, so the unsafe transformation is acceptable until that work is complete.

The bogus assert with introduced in D63445.

llvm-svn: 363998
2019-06-20 23:03:55 +00:00
Sanjay Patel b342f026a4 [InstSimplify] simplify power-of-2 (single bit set) sequences
As discussed in PR42314:
https://bugs.llvm.org/show_bug.cgi?id=42314

Improving the canonicalization for these patterns:
rL363956
...means we should adjust/enhance the related simplification.

https://rise4fun.com/Alive/w1cp

  Name: isPow2 or zero
  %x = and i32 %xx, 2048
  %a = add i32 %x, -1
  %r = and i32 %a, %x
  =>
  %r = i32 0

llvm-svn: 363997
2019-06-20 22:55:28 +00:00
Eli Friedman 45270054bc [ARM GlobalISel] Tests for s64 G_ADD and G_SUB.
Forgot to commit these in r363989 (https://reviews.llvm.org/D63585)

llvm-svn: 363991
2019-06-20 22:00:07 +00:00
Matt Arsenault d88db6d7fc AMDGPU: Always use s33 for global scratch wave offset
Every called function could possibly need this to calculate the
absolute address of stack objectst, and this avoids inserting a copy
around every call site in the kernel. It's also somewhat cleaner to
keep this in a callee saved SGPR.

llvm-svn: 363990
2019-06-20 21:58:24 +00:00
Rainer Orth 6fde832b82 [profile] Solaris ld supports __start___llvm_prof_data etc. labels
Currently, many profiling tests on Solaris FAIL like

  Command Output (stderr):
  --
  Undefined                       first referenced
   symbol                             in file
  __llvm_profile_register_names_function /tmp/lit_tmp_Nqu4eh/infinite_loop-9dc638.o
  __llvm_profile_register_function    /tmp/lit_tmp_Nqu4eh/infinite_loop-9dc638.o

Solaris 11.4 ld supports the non-standard GNU ld extension of adding
__start_SECNAME and __stop_SECNAME labels to sections whose names are valid
as C identifiers.  Given that we already use Solaris 11.4-only features
like ld -z gnu-version-script-compat and fully working .preinit_array
support in compiler-rt, we don't need to worry about older versions of
Solaris ld.

The patch documents that support (although the comment in
lib/Transforms/Instrumentation/InstrProfiling.cpp
(needsRuntimeRegistrationOfSectionRange) is quite cryptic what it's
actually about), and adapts the affected testcase not to expect the
alternativeq __llvm_profile_register_functions and __llvm_profile_init.
It fixes all affected tests.

Tested on amd64-pc-solaris2.11.

Differential Revision: https://reviews.llvm.org/D41111

llvm-svn: 363984
2019-06-20 21:27:06 +00:00
Matt Arsenault 740322f1eb AMDGPU: Add intrinsics for DS GWS semaphore instructions
llvm-svn: 363983
2019-06-20 21:11:42 +00:00
Alina Sbirlea d0b11698cd [LICM & MSSA] Limit unsafe sinking and hoisting.
Summary:
The getClobberingMemoryAccess API checks for clobbering accesses in a loop by walking the backedge. This may check if a memory access is being
clobbered by the loop in a previous iteration, depending how smart AA got over the course of the updates in MemorySSA (it does not occur when built from scratch).
If no clobbering access is found inside the loop, it will optimize to an access outside the loop. This however does not mean that access is safe to sink.
Given:
```
for i
  load a[i]
  store a[i]
```
The access corresponding to the load can be optimized to outside the loop, and the load can be hoisted. But it is incorrect to sink it.
In order to sink the load, we'd need to check no Def clobbers the Use in the same iteration. With this patch we currently restrict sinking to either
Defs not existing in the loop, or Defs preceding the load in the same block. An easy extension is to ensure the load (Use) post-dominates all Defs.

Caught by PR42294.

This issue also shed light on the converse problem: hoisting stores in this same scenario would be illegal. With this patch we restrict
hoisting of stores to the case when their corresponding Defs are dominating all Uses in the loop.

Reviewers: george.burgess.iv

Subscribers: jlebar, Prazek, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63582

llvm-svn: 363982
2019-06-20 21:09:09 +00:00
Sanjay Patel 3207566dd6 [InstSimplify] add tests for known-not-a-power-of-2; NFC
I added a canonicalization to create this general pattern in:
rL363956

But as noted in PR42314:
https://bugs.llvm.org/show_bug.cgi?id=42314#c11

...we have a (potentially expensive) simplification for the version
of the code that we just canonicalized away from, so we should
add/adjust that code to match.

llvm-svn: 363981
2019-06-20 21:04:14 +00:00
Matt Arsenault 8ad1decf45 AMDGPU: Insert mem_viol check loop around GWS pre-GFX9
It is necessary to emit this loop around GWS operations in case the
wave is preempted pre-GFX9.

llvm-svn: 363979
2019-06-20 20:54:32 +00:00
Cameron McInally 9589db7a98 [NFC][SLP] Pre-commit unary FNeg test to X86/propagate_ir_flags.ll
llvm-svn: 363978
2019-06-20 20:53:51 +00:00
Leonard Chan 108a946319 Update LLVM test to not check for the EliminateAvailableExternallyPass
for lto-pre-link O2 pipeline runs.

llvm-svn: 363977
2019-06-20 20:51:58 +00:00
Leonard Chan 97dc622ab3 [clang][NewPM] Do not eliminate available_externally durng `-O2 -flto` runs
This fixes CodeGen/available-externally-suppress.c when the new pass manager is
turned on by default. available_externally was not emitted during -O2 -flto
runs when it should still be retained for link time inlining purposes. This can
be fixed by checking that we aren't LTOPrelinking when adding the
EliminateAvailableExternallyPass.

Differential Revision: https://reviews.llvm.org/D63580

llvm-svn: 363971
2019-06-20 19:44:51 +00:00
David Bolvansky 642ed40e57 [NFC] Add more tests for D46262
llvm-svn: 363970
2019-06-20 19:39:15 +00:00
David Bolvansky e0c1c3baf9 [NFC] Updated tests for D63546
llvm-svn: 363967
2019-06-20 19:30:56 +00:00
Craig Topper 9e1665f2d6 [X86] Add BLSI to isUseDefConvertible.
Summary:
BLSI sets the C flag is the input is not zero. So if its followed
by a TEST of the input where only the Z flag is consumed, we can
replace it with the opposite check of the C flag.

We should be able to do the same for BLSMSK and BLSR, but the
naive test case for those is being optimized to a subo by
CodeGenPrepare.

Reviewers: spatel, RKSimon

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63589

llvm-svn: 363957
2019-06-20 17:52:53 +00:00
Sanjay Patel 63311bfb83 [InstCombine] canonicalize check for power-of-2
The form that compares against 0 is better because:
1. It removes a use of the input value.
2. It's the more standard form for this pattern: https://graphics.stanford.edu/~seander/bithacks.html#DetermineIfPowerOf2
3. It results in equal or better codegen (tested with x86, AArch64, ARM, PowerPC, MIPS).

This is a root cause for PR42314, but probably doesn't completely answer the codegen request:
https://bugs.llvm.org/show_bug.cgi?id=42314

Alive proof:
https://rise4fun.com/Alive/9kG

  Name: is power-of-2
  %neg = sub i32 0, %x
  %a = and i32 %neg, %x
  %r = icmp eq i32 %a, %x
  =>
  %dec = add i32 %x, -1
  %a2 = and i32 %dec, %x
  %r = icmp eq i32 %a2, 0

  Name: is not power-of-2
  %neg = sub i32 0, %x
  %a = and i32 %neg, %x
  %r = icmp ne i32 %a, %x
  =>
  %dec = add i32 %x, -1
  %a2 = and i32 %dec, %x
  %r = icmp ne i32 %a2, 0

llvm-svn: 363956
2019-06-20 17:41:15 +00:00
Philip Reames 8c80d08052 [Tests] Add a tricky LFTR case for documentation purposes
Thought of this case while working on something else.  We appear to get it right in all of the variations I tried, but that's by accident.  So, add a test which would catch the potential bug.

llvm-svn: 363953
2019-06-20 17:16:53 +00:00
Amy Huang 7fac5c8d94 Store a pointer to the return value in a static alloca and let the debugger use that
as the variable address for NRVO variables.

Subscribers: hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D63361

llvm-svn: 363952
2019-06-20 17:15:21 +00:00
David Bolvansky 01511192b2 [InstCombine] cttz(-x) -> cttz(x)
Summary: Signedness does not change number of trailing zeros.

Reviewers: spatel, lebedev.ri, nikic

Reviewed By: spatel

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63534

llvm-svn: 363951
2019-06-20 17:04:14 +00:00
Matt Arsenault 5dbe4a9926 AMDGPU: Eliminate test usage of legacy FP elim attributes
llvm-svn: 363950
2019-06-20 17:03:27 +00:00
Matt Arsenault 5dc457cbe4 AMDGPU: Fix ignoring DisableFramePointerElim in leaf functions
The attribute can specify elimination for leaf or non-leaf, so it
should always be considered. I copied this bug from AArch64, which
probably should also be fixed.

llvm-svn: 363949
2019-06-20 17:03:23 +00:00
Evandro Menezes aa10f05044 [CodeGen] Fix formatting and comments (NFC)
llvm-svn: 363947
2019-06-20 16:34:00 +00:00
Stanislav Mekhanoshin e917b3b4b8 [AMDGPU] gfx10 tests. NFC.
llvm-svn: 363946
2019-06-20 16:29:40 +00:00
Sanjay Patel d729ed8d44 [InstCombine] add commuted variants for power-of-2 checks; NFC
llvm-svn: 363945
2019-06-20 16:27:23 +00:00
Matt Arsenault b7f87c0ecf AMDGPU: Treat undef as an inline immediate
This should only matter in vectors with an undef component, since a
full undef vector would have been folded out.

llvm-svn: 363941
2019-06-20 16:01:09 +00:00
Matt Arsenault fcce531752 AMDGPU: Make test functions hidden
Reduces amount of code in the function from eliminating the GOT load.

llvm-svn: 363940
2019-06-20 15:38:30 +00:00
Sanjay Patel 345473c791 [InstCombine] add tests for checking power-of-2; NFC
llvm-svn: 363938
2019-06-20 15:25:18 +00:00
Cameron McInally 4452c3b490 [NFC][SLP] Pre-commit unary FNeg test to X86/phi3.ll
llvm-svn: 363937
2019-06-20 15:17:17 +00:00
Simon Tatham 232db11020 [ARM] Add a batch of MVE integer instructions.
This includes integer arithmetic of various kinds (add/sub/multiply,
saturating and not), and the immediate forms of VMOV and VMVN that
load an immediate into all lanes of a vector.

Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover

Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62674

llvm-svn: 363936
2019-06-20 15:16:56 +00:00
Stanislav Mekhanoshin 0846c125f9 [AMDGPU] gfx1010 core wave32 changes
Differential Revision: https://reviews.llvm.org/D63204

llvm-svn: 363934
2019-06-20 15:08:34 +00:00
Simon Pilgrim 1d8093249f [DAGCombiner] Support (shl (zext (srl x, C)), C) -> (zext (shl (srl x, C), C)) non-uniform folds.
Use matchBinaryPredicate instead of isConstOrConstSplat to let us handle non-uniform shift cases. 

llvm-svn: 363929
2019-06-20 14:42:27 +00:00
Simon Pilgrim 72186a2494 [SLP][X86] Add lookahead reordering tests from D60897
llvm-svn: 363925
2019-06-20 12:52:58 +00:00
Fangrui Song 7064a437f8 [llvm-nm] Generalize ELF symbol types 'N' and 'n'
Reviewed By: grimar, jhenderson

Differential Revision: https://reviews.llvm.org/D63588

llvm-svn: 363918
2019-06-20 10:15:11 +00:00
Petar Avramovic 153bd24eda [MIPS GlobalISel] Select integer to floating point conversions
Select G_SITOFP and G_UITOFP for MIPS32.

Differential Revision: https://reviews.llvm.org/D63542

llvm-svn: 363912
2019-06-20 09:05:02 +00:00
Petar Avramovic 4b4dae1c76 [MIPS GlobalISel] Select floating point to integer conversions
Select G_FPTOSI and G_FPTOUI for MIPS32.

Differential Revision: https://reviews.llvm.org/D63541

llvm-svn: 363911
2019-06-20 08:52:53 +00:00
Craig Topper 3ba20e943e [X86] Add test cases showing missed opportunities to use the C flag from the BLSI instruction to avoid a TEST instruction
llvm-svn: 363909
2019-06-20 06:45:01 +00:00
Matt Arsenault c67c484f36 AMDGPU: Don't clobber VCC in MUBUF addr64 emulation
Introducing VCC defs during SIFixSGPRCopies is generally
problematic. Avoid it by starting with the VOP3 form with the general
condition register. This is the easiest to fix instance, but doesn't
solve any specific problems I'm looking at.

llvm-svn: 363904
2019-06-20 00:51:28 +00:00
Eli Friedman d88e28d13e [llvm-objdump] Switch between ARM/Thumb based on mapping symbols.
The ARMDisassembler changes allow changing between ARM and Thumb mode
based on the MCSubtargetInfo, rather than the Target, which simplifies
the other changes a bit.

I'm not really happy with adding more target-specific logic to
tools/llvm-objdump/, but there isn't any easy way around it: the logic
in question specifically applies to disassembling an object file, and
that code simply isn't located in lib/Target, at least at the moment.

Differential Revision: https://reviews.llvm.org/D60927

llvm-svn: 363903
2019-06-20 00:29:40 +00:00
Thomas Preud'homme a2ef1ba32f [FileCheck] Stop qualifying expressions as numeric
Summary:
Stop referring to "numeric expression", using simply the term
"expression" instead. Likewise for numeric operation since operations
are only used in numeric expressions.

Reviewers: jhenderson, jdenny, probinson, arichardson

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63500

llvm-svn: 363901
2019-06-19 23:47:24 +00:00
Thomas Preud'homme baae41ff76 FileCheck: Return parse error w/ Error & Expected
Summary:
Make use of Error and Expected to bubble up diagnostics and force
checking of errors in the callers.

Reviewers: jhenderson, jdenny, probinson, arichardson

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63125

llvm-svn: 363900
2019-06-19 23:47:10 +00:00
Matt Arsenault e24b34e9c9 AMDGPU: Undo sub x, c canonicalization for v2i16
Should avoid regression from D62341

llvm-svn: 363899
2019-06-19 23:37:43 +00:00
Matt Arsenault 532be255a5 AMDGPU: Add baseline test for vector sub x, c canonicalization
This will catch regressions from D62341, and show improvements from a
future patch to fix them.

llvm-svn: 363888
2019-06-19 22:37:08 +00:00
Simon Atanasyan f61c43c636 [mips] Mark the `lwupc` instruction as MIPS64 R6 only
The "The MIPS64 Instruction Set Reference Manual" [1] states that
the `lwupc` is MIPS64 Release 6 only. It should not be supported
for 32-bit CPUs.

[1] https://s3-eu-west-1.amazonaws.com/downloads-mips/documents/MD00087-2B-MIPS64BIS-AFP-6.06.pdf

llvm-svn: 363886
2019-06-19 22:08:06 +00:00
Philip Reames eda1ba65ca LFTR for multiple exit loops
Teach IndVarSimply's LinearFunctionTestReplace transform to handle multiple exit loops. LFTR does two key things 1) it rewrites (all) exit tests in terms of a common IV potentially eliminating one in the process and 2) it moves any offset/indexing/f(i) style logic out of the loop.

This turns out to actually be pretty easy to implement. SCEV already has all the information we need to know what the backedge taken count is for each individual exit. (We use that when computing the BE taken count for the loop as a whole.) We basically just need to iterate through the exiting blocks and apply the existing logic with the exit specific BE taken count. (The previously landed NFC makes this super obvious.)

I chose to go ahead and apply this to all loop exits instead of only latch exits as originally proposed. After reviewing other passes, the only case I could find where LFTR form was harmful was LoopPredication. I've fixed the latch case, and guards aren't LFTRed anyways. We'll have some more work to do on the way towards widenable_conditions, but that's easily deferred.

I do want to note that I added one bit after the review.  When running tests, I saw a new failure (no idea why didn't see previously) which pointed out LFTR can rewrite a constant condition back to a loop varying one.  This was theoretically possible with a single exit, but the zero case covered it in practice.  With multiple exits, we saw this happening in practice for the eliminate-comparison.ll test case because we'd compute a ExitCount for one of the exits which was guaranteed to never actually be reached.  Since LFTR ran after simplifyAndExtend, we'd immediately turn around and undo the simplication work we'd just done.  The solution seemed obvious, so I didn't bother with another round of review.

Differential Revision: https://reviews.llvm.org/D62625

llvm-svn: 363883
2019-06-19 21:58:25 +00:00
Philip Reames 80eb1ce7a0 [Tests] Autogen a test so that future changes are understandable
llvm-svn: 363882
2019-06-19 21:39:07 +00:00
Alina Sbirlea 109d2ea153 [MemorySSA] Cleanup trivial phis.
Summary:
This is unfortunately needed for correctness, if we are to extend the tolerance of the update API to the way simple loop unswitch is doing cloning.

In simple loop unswitch (as opposed to loop unswitch), not all blocks are cloned. This can create unreachable cloned blocks (no predecessor), which are later cleaned up.

In MemorySSA, the  APIs for supporting these kind of updates (clone + update exit blocks), make certain assumption on the integrity of the CFG. When cloning, if something was not cloned, it's values in MemorySSA default to LiveOnEntry. When updating exit blocks, it is safe to assume that we can first insert phis in the blocks merging two clones, then add additional phis in the IDF of the blocks that received phis. This no longer holds true if one of the clones being merged comes from an unreachable block. We'd conservatively need to add all phis before filling in their incoming definitions. In practice this restriction can be relaxed if we clean up trivial phis after the first round of insertion.

Reviewers: george.burgess.iv

Subscribers: jlebar, Prazek, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63354

llvm-svn: 363880
2019-06-19 21:33:09 +00:00
Matt Arsenault 4d000d2488 AMDGPU: Fix folding immediate into readfirstlane through reg_sequence
The def instruction for the vreg may not match, because it may be
folding through a reg_sequence. The assert was overly conservative and
not necessary. It's not actually important if DefMI really defined the
register, because the fold that will be done cares about the def of
the value that will be folded.

For some reason copies aren't making it through the reg_sequence,
although they should.

llvm-svn: 363876
2019-06-19 20:44:15 +00:00
Peter Collingbourne 2742eeb78e hwasan: Shrink outlined checks by 1 instruction.
Turns out that we can save an instruction by folding the right shift into
the compare.

Differential Revision: https://reviews.llvm.org/D63568

llvm-svn: 363874
2019-06-19 20:40:03 +00:00
Matt Arsenault 4d55d024be Reapply "AMDGPU: Add ds_gws_init / ds_gws_barrier intrinsics"
This reapplies r363678, using the correct chain for the CopyToReg for
v0. glueCopyToM0 counterintuitively changes the operands of the
original node.

llvm-svn: 363870
2019-06-19 19:55:27 +00:00
Yuanfang Chen 40a156b791 [llvm-readobj] Match GNU output for DT_RPATH and DT_RUNPATH when dumping dynamic symbol table.
Reviewers: jhenderson, grimar, MaskRay, rupprecht, espindola

Subscribers: emaste, nemanjai, arichardson, kbarton, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63347

llvm-svn: 363868
2019-06-19 19:31:07 +00:00
Yuanfang Chen fee7365b07 [llvm-objdump] Remove unnecessary indentation when dumping ELF data.
Reviewers: MaskRay, jhenderson, rupprecht

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63393

llvm-svn: 363858
2019-06-19 18:44:29 +00:00
Volkan Keles 61d7e35b22 Fix GlobalISel MachineVerifier tests. NFC.
These test were failing when building llvm with
`-DLLVM_DEFAULT_TARGET_TRIPLE=''`. Add `-march` to the
run line to fix the issue.

llvm-svn: 363854
2019-06-19 18:15:45 +00:00
Sanjay Patel b5640b6fe8 [x86] avoid vector load narrowing with extracted store uses (PR42305)
This is an exception to the rule that we should prefer xmm ops to ymm ops.
As shown in PR42305:
https://bugs.llvm.org/show_bug.cgi?id=42305
...the store folding opportunity with vextractf128 may result in better
perf by reducing the instruction count.

Differential Revision: https://reviews.llvm.org/D63517

llvm-svn: 363853
2019-06-19 18:13:47 +00:00
Sanjay Patel 33ef687d94 [x86] add test for unaligned 32-byte load/store splitting; NFC
llvm-svn: 363852
2019-06-19 18:06:59 +00:00
Simon Pilgrim 6016fb726c [TargetLowering] SimplifyDemandedBits ZERO_EXTEND_VECTOR_INREG -> ANY_EXTEND_VECTOR_INREG
Simplify ZERO_EXTEND_VECTOR_INREG if the extended bits are not required.

Matches what we already do for ZERO_EXTEND.

llvm-svn: 363850
2019-06-19 18:00:24 +00:00
Huihui Zhang 670778c762 [InstCombine] Fold icmp eq/ne (and %x, signbit), 0 -> %x s>=/s< 0 earlier
Summary:
To generate simplified IR, make sure fold
```
  (X & signbit) ==/!= 0) -> X s>=/s< 0;
```
is scheduled before fold
```
  ((X << Y) & C) == 0 -> (X & (C >> Y)) == 0.
```

https://rise4fun.com/Alive/fbdh

Reviewers: lebedev.ri, efriedma, spatel, craig.topper

Reviewed By: lebedev.ri

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63026

llvm-svn: 363845
2019-06-19 17:31:39 +00:00
Sanjay Patel 3e03bf6921 [InstSimplify] add a phi test with 1 incoming value; NFC
D63489 proposes to change this behavior, but there's no
direct -instsimplify test to verify that the transform exists.

llvm-svn: 363842
2019-06-19 17:23:29 +00:00
Simon Pilgrim 34279db355 [X86][SSE] Combine shuffles to ANY_EXTEND/ANY_EXTEND_VECTOR_INREG.
We already do this for ZERO_EXTEND/ZERO_EXTEND_VECTOR_INREG - this just extends the pattern matcher to recognize cases where we don't need the zeros in the extension.

llvm-svn: 363841
2019-06-19 17:21:15 +00:00
Evandro Menezes a7ed3a627b [AArch64] Improve jump tables testing (NFC)
Improve testing of the minimum and maximum sizes of jump tables.

llvm-svn: 363839
2019-06-19 16:59:34 +00:00
Simon Tatham 2f5188fd58 [ARM] Add MVE vector bit-operations (register inputs).
This includes all the obvious bitwise operations (AND, OR, BIC, ORN,
MVN) in register-to-register forms, and the immediate forms of
AND/OR/BIC/ORN; byte-order reverse instructions; and the VMOVs that
access a single lane of a vector.

Some of those VMOVs (specifically, the ones that access a 32-bit lane)
share an encoding with existing instructions that were disassembled as
accessing half of a d-register (e.g. `vmov.32 r0, d1[0]`), but in
8.1-M they're now written as accessing a quarter of a q-register (e.g.
`vmov.32 r0, q0[2]`). The older syntax is still accepted by the
assembler.

Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover

Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62673

llvm-svn: 363838
2019-06-19 16:43:53 +00:00
Evandro Menezes 54252b8243 [AArch64] Improve jump tables testing (NFC)
Improve testing of the minimum and maximum sizes of jump tables.

llvm-svn: 363837
2019-06-19 16:35:30 +00:00
James Henderson e20326ed33 [test][llvm-dwarfdump] Remove pointless CHECK-NOT lines
The original line was there from when this test was added, but it is
checking for a switch that doesn't exist, so really has no purpose, at
least any more.

llvm-svn: 363833
2019-06-19 16:31:59 +00:00
Hubert Tong 1f6ddfb6a3 [NFC][llvm-objcopy] Fix overly restrictive od output check
The check against the output of `od` in the affected tests expect a
specific input offset format. They also expect a specific offset value,
not consistent with the EXAMPLE section for `od` in POSIX.1-2017
Chapter 4, while using the `-j` option. In particular, the example shows
that the input offset begins at 0 following the bytes skipped.

This patch adjusts the matching of the input offset to be more generic.
In order to avoid false matches, it restricts the number of bytes to be
formatted.

llvm-svn: 363829
2019-06-19 16:04:24 +00:00
Hubert Tong e9983eed5a [NFC][LSR] Avoid undefined grep in pr2570.ll
greater-than-sign is not a BRE special character.

POSIX.1-2017 XBD Section 9.3.2 indicates that the interpretation of `\>`
is undefined. This patch replaces the pattern.

llvm-svn: 363828
2019-06-19 16:02:54 +00:00
Cameron McInally 7aa898e61e [DFSan] Add UnaryOperator visitor to DataFlowSanitizer
Differential Revision: https://reviews.llvm.org/D62815

llvm-svn: 363814
2019-06-19 15:11:41 +00:00
Cameron McInally a027cf4764 [Reassociate] Handle unary FNeg in the Reassociate pass
Differential Revision: https://reviews.llvm.org/D63445

llvm-svn: 363813
2019-06-19 14:59:14 +00:00
Bjorn Pettersson 16ff5fea87 [ConstantFolding] Add constant folding for smul.fix and smul.fix.sat
Summary:
This patch teaches ConstantFolding to constant fold
both scalar and vector variants of llvm.smul.fix and
llvm.smul.fix.sat.

As described in the LangRef rounding is unspecified for
these instrinsics. If the result cannot be represented
exactly the default behavior in ConstantFolding is to
round down towards negative infinity. If a target has a
preferred rounding that is different some kind of target
hook would be needed (same strategy as used by the
SelectionDAG legalizer).

Reviewers: nikic, leonardchan, RKSimon

Reviewed By: leonardchan

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63385

llvm-svn: 363811
2019-06-19 14:28:03 +00:00
Ulrich Weigand 3641b10f3d [SystemZ] Support vector load/store alignment hints
Vector load/store instructions support an optional alignment field
that the compiler can use to provide known alignment info to the
hardware.  If the field is used (and the information is correct),
the hardware may be able (on some models) to perform faster memory
accesses than otherwise.

This patch adds support for alignment hints in the assembler and
disassembler, and fills in known alignment during codegen.

llvm-svn: 363806
2019-06-19 14:20:00 +00:00
Simon Pilgrim c3994f77cb [TargetLowering] SimplifyDemandedBits SIGN_EXTEND_VECTOR_INREG -> ANY/ZERO_EXTEND_VECTOR_INREG
Simplify SIGN_EXTEND_VECTOR_INREG if the extended bits are not required/known zero.

Matches what we already do for SIGN_EXTEND.

llvm-svn: 363802
2019-06-19 13:58:02 +00:00
Fangrui Song 102b1efd53 [llvm-dwarfdump] --gdb-index: fix uninitialized TuListOffset
The test only checks the existence of the `Types CU list` line.
Unfortunately I can't make a better test because
{gcc,clang} -fuse-ld={lld,gold} --gdb-index do not give me a non-empty types CU list.

Reviewed By: ikudrin

Differential Revision: https://reviews.llvm.org/D63537

llvm-svn: 363800
2019-06-19 13:51:29 +00:00
Simon Pilgrim 128ce93c60 Revert rL363678 : AMDGPU: Add ds_gws_init / ds_gws_barrier intrinsics
There may or may not be additional work to handle this correctly on
SI/CI.
........
Breaks EXPENSIVE_CHECKS buildbots - http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win/builds/78/

llvm-svn: 363797
2019-06-19 13:00:54 +00:00
David Bolvansky e3cd19d330 [NFC] Added tests for D63534
llvm-svn: 363796
2019-06-19 12:59:37 +00:00
David Bolvansky 21fd232385 [NFC] Added tests for cttz(abs(x)) -> cttz(x) fold
llvm-svn: 363795
2019-06-19 12:55:39 +00:00
Simon Pilgrim 9eed5d2f78 [DAGCombiner] Support (shl (ext (shl x, c1)), c2) -> (shl (ext x), (add c1, c2)) non-uniform folds.
Use matchBinaryPredicate instead of isConstOrConstSplat to let us handle non-uniform shift cases. 

llvm-svn: 363793
2019-06-19 12:41:37 +00:00
Simon Pilgrim 8c49366c9b [DAGCombiner] Support (shl (ext (shl x, c1)), c2) -> 0 non-uniform folds.
Use matchBinaryPredicate instead of isConstOrConstSplat to let us handle non-uniform shift cases. 

This requires us to tweak matchBinaryPredicate to allow it to (optionally) handle constants with different type widths.

llvm-svn: 363792
2019-06-19 12:25:29 +00:00
Simon Pilgrim 85f70baa23 [X86] Add non-uniform (shl (ext (shl x, c1)), c2) -> (shl (ext x), (add c1, c2)) test
llvm-svn: 363791
2019-06-19 11:36:01 +00:00
Simon Pilgrim d954a53633 [DAGCombine] Fix (shl (ext (shl x, c1)), c2) -> (shl (ext x), (add c1, c2)) comment. NFCI.
We pre-extend, not post.

llvm-svn: 363787
2019-06-19 11:17:48 +00:00
Orlando Cazalet-Hyams 1251cac62a [DebugInfo@O2][LoopVectorize] pr39024: Vectorized code linenos step through loop even after completion
Summary:
Bug: https://bugs.llvm.org/show_bug.cgi?id=39024

The bug reports that a vectorized loop is stepped through 4 times and each step through the loop seemed to show a different path. I found two problems here:

A) An incorrect line number on a preheader block (for.body.preheader) instruction causes a step into the loop before it begins.
B) Instructions in the middle block have different line numbers which give the impression of another iteration.

In this patch I give all of the middle block instructions the line number of the scalar loop latch terminator branch. This seems to provide the smoothest debugging experience because the vectorized loops will always end on this line before dropping into the scalar loop. To solve problem A I have altered llvm::SplitBlockPredecessors to accommodate loop header blocks.

I have set up a separate review D61933 for a fix which is required for this patch.

Reviewers: samsonov, vsk, aprantl, probinson, anemet, hfinkel, jmorse

Reviewed By: hfinkel, jmorse

Subscribers: jmorse, javed.absar, eraman, kcc, bjope, jmellorcrummey, hfinkel, gbedwell, hiraditya, zzheng, llvm-commits

Tags: #llvm, #debug-info

Differential Revision: https://reviews.llvm.org/D60831

> llvm-svn: 363046

llvm-svn: 363786
2019-06-19 10:50:47 +00:00
Jay Foad 45d19fb470 [ConstantFolding] Fix assertion failure on non-power-of-two vector load.
Summary:
The test case does an (out of bounds) load from a global constant with
type <3 x float>. InstSimplify tried to turn this into an integer load
of the whole alloc size of the vector, which is 128 bits due to
alignment padding, and then bitcast this to <3 x vector> which failed
an assertion due to the type size mismatch.

The fix is to do an integer load of the normal size of the vector, with
no alignment padding.

Reviewers: tpr, arsenm, majnemer, dstuttard

Reviewed By: arsenm

Subscribers: hfinkel, wdng, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63375

llvm-svn: 363784
2019-06-19 10:28:48 +00:00
Lewis Revill 18737e81eb [RISCV] Allow parsing immediates that use tilde & exclaim
This patch allows immediates (and CSR alias immediates) which start with
a tilde token or an exclaim (!) token to be parsed as intended.

Differential Revision: https://reviews.llvm.org/D57320

llvm-svn: 363783
2019-06-19 10:27:24 +00:00
Lewis Revill 218aa0edb1 [RISCV] Fix failure to parse parenthesized immediates
Since the parser attempts to parse an operand as a register with
parentheses before parsing it as an immediate, immediates in
parentheses should not be parsed by parseRegister. However in the case
where the immediate does not start with an identifier, the LParen is not
unlexed and so the RParen causes an unexpected token error.

This patch adds the missing UnLex, and modifies the existing UnLex to
not use a buffered token, as it should always be unlexing an LParen.

Differential Revision: https://reviews.llvm.org/D57319

llvm-svn: 363782
2019-06-19 10:11:13 +00:00
Clement Courbet f7a6fb9f2c Fix r363773: Update Barcelona MCA tests.
llvm-svn: 363781
2019-06-19 10:00:36 +00:00
George Rimar b6e20937b3 [yaml2obj/obj2yaml] - Make RawContentSection::Info Optional<>
This allows to customize this field for "implicit" sections properly.

Differential revision: https://reviews.llvm.org/D63487

llvm-svn: 363777
2019-06-19 08:57:38 +00:00
Roman Lebedev 9f9691c032 [NFC][X86][MCA] Barcelona: add load/store/load-store-throughput tests
llvm-svn: 363775
2019-06-19 08:53:34 +00:00
Roman Lebedev 4358016b03 [NFC][X86][MCA] BdVer2: add load-store-throughput test
llvm-svn: 363774
2019-06-19 08:53:28 +00:00
Clement Courbet 4ef7c2868a [X86] Add missing properties on llvm.x86.sse.{st,ld}mxcsr
Summary:
llvm.x86.sse.stmxcsr only writes to memory.
llvm.x86.sse.ldmxcsr only reads from memory, and might generate an FPE.

Reviewers: craig.topper, RKSimon

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62896

llvm-svn: 363773
2019-06-19 08:44:31 +00:00
Lewis Revill 39263ac5d1 [RISCV] Add lowering of global TLS addresses
This patch adds lowering for global TLS addresses for the TLS models of
InitialExec, GlobalDynamic, LocalExec and LocalDynamic.

LocalExec support required using a 4-operand add instruction, which uses
the fourth operand to express a relocation on the symbol. The necessary
fixup is emitted when the instruction is emitted.

Differential Revision: https://reviews.llvm.org/D55305

llvm-svn: 363771
2019-06-19 08:40:59 +00:00
Alex Bradbury ec4e0809df [RISCV] Fix test after r363757
r363757 renamed ExpandISelPseudo to FinalizeISel, so the RUN line in
select-optimize-multiple.mir needed updating to refer to finalize-isel.

llvm-svn: 363762
2019-06-19 03:18:48 +00:00
Matt Arsenault 9cac4e6d14 Rename ExpandISelPseudo->FinalizeISel, delay register reservation
This allows targets to make more decisions about reserved registers
after isel. For example, now it should be certain there are calls or
stack objects in the frame or not, which could have been introduced by
legalization.

Patch by Matthias Braun

llvm-svn: 363757
2019-06-19 00:25:39 +00:00
Thomas Lively 1885747498 [WebAssembly] Optimize ISel for SIMD Boolean reductions
Summary:
Converting the result *.{all,any}_true to a bool at the source level
generates LLVM IR that compares the result to 0. This check is
redundant since these instructions already return either 0 or 1 and
therefore conform to the BooleanContents setting for WebAssembly. This
CL adds patterns to detect and remove such redundant operations on the
result of Boolean reductions.

Reviewers: dschuff, aheejin

Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63529

llvm-svn: 363756
2019-06-19 00:02:13 +00:00
Evandro Menezes 1933cbe866 [test] Change comment wording (NFC)
llvm-svn: 363751
2019-06-18 23:31:10 +00:00
Michael Trent c2885ded2b Print dylib load kind (weak, reexport, etc) in llvm-objdump -m -dylibs-used
Summary:
Historically llvm-objdump prints the path to a dylib as well as the
dylib's compatibility version and current version number. This change
extends this information by adding the kind of dylib load: weak,
reexport, etc.

rdar://51383512

Reviewers: pete, lhames

Reviewed By: pete

Subscribers: rupprecht, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62866

llvm-svn: 363746
2019-06-18 22:20:10 +00:00
Michael Liao 4f7f70e262 Recommit [SROA] Enhance SROA to handle `addrspacecast`ed allocas
[SROA] Enhance SROA to handle `addrspacecast`ed allocas

- Fix typo in original change
- Add additional handling to ensure all return pointers are properly
  casted.

Summary:
- After `addrspacecast` is allowed to be eliminated in SROA, the
  adjusting of storage pointer (from `alloca) needs to handle the
  potential different address spaces between the storage pointer (from
  alloca) and the pointer being used.

Reviewers: arsenm

Subscribers: wdng, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63501

llvm-svn: 363743
2019-06-18 21:41:13 +00:00
Matt Arsenault e8d8bb5170 InstCombine: Pre-commit test for reassociating nuw
D39417

llvm-svn: 363741
2019-06-18 21:32:51 +00:00
Huihui Zhang d16779a732 [ARM] Comply with rules on ARMv8-A thumb mode partial deprecation of IT.
Summary:
When identifing instructions that can be folded into a MOVCC instruction,
checking for a predicate operand is not enough, also need to check for
thumb2 function, with restrict-IT, is the machine instruction eligible for
ARMv8 IT or not.

Notes in ARMv8-A Architecture Reference Manual, section "Partial deprecation of IT"
  https://usermanual.wiki/Pdf/ARM20Architecture20Reference20ManualARMv8.1667877052.pdf

"ARMv8-A deprecates some uses of the T32 IT instruction. All uses of IT that apply to
instructions other than a single subsequent 16-bit instruction from a restricted set
are deprecated, as are explicit references to the PC within that single 16-bit
instruction. This permits the non-deprecated forms of IT and subsequent instructions
to be treated as a single 32-bit conditional instruction."

Reviewers: efriedma, lebedev.ri, t.p.northover, jmolloy, aemerson, compnerd, stoklund, ostannard

Reviewed By: ostannard

Subscribers: ostannard, javed.absar, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63474

llvm-svn: 363739
2019-06-18 20:55:09 +00:00
Sam Elliott 9f155bc6e5 [RISCV] Prevent re-ordering some adds after shifts
Summary:
DAGCombine will normally turn a `(shl (add x, c1), c2)` into `(add (shl x, c2), c1 << c2)`, where `c1` and `c2` are constants. This can be prevented by a callback in TargetLowering.

On RISC-V, materialising the constant `c1 << c2` can be more expensive than materialising `c1`, because materialising the former may take more instructions, and may use a register, where materialising the latter would not.

This patch implements the hook in RISCVTargetLowering to prevent this transform, in the cases where:
- `c1` fits into the immediate field in an `addi` instruction.
- `c1` takes fewer instructions to materialise than `c1 << c2`.

In future, DAGCombine could do the check to see whether `c1` fits into an add immediate, which might simplify more targets hooks than just RISC-V.

Reviewers: asb, luismarques, efriedma

Reviewed By: asb

Subscribers: xbolva00, lebedev.ri, craig.topper, lewis-revill, Jim, hiraditya, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, kito-cheng, shiva0217, jrtc27, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, psnobl, benna, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62857

llvm-svn: 363736
2019-06-18 20:38:08 +00:00
Sanjay Patel 413ed69b4b [x86] add another test for load splitting with extracted stores (PR42305); NFC
llvm-svn: 363732
2019-06-18 20:13:35 +00:00
Adrian Prantl fc5107cde6 Add debug location verification for !llvm.loop attachments.
This patch teaches the Verifier how to detect broken !llvm.loop
attachments as discussed in https://reviews.llvm.org/D60831. This
allows LLVM to warn and strip out the broken debug info before
attempting an LTO compilation with input generated by LLVM predating
https://reviews.llvm.org/rL361149.

rdar://problem/51631158

Differential Revision: https://reviews.llvm.org/D63499

[Re-applies r363725 without changes after fixing a broken testcase.]

llvm-svn: 363731
2019-06-18 20:09:09 +00:00
Adrian Prantl 1db8d4a866 Fix broken debug info in in an !llvm.loop attachment in this testcase.
llvm-svn: 363730
2019-06-18 20:07:53 +00:00
Adrian Prantl acc93d62e0 Revert Add debug location verification for !llvm.loop attachments.
This reverts r363725 (git commit 8ff822d61d)

llvm-svn: 363728
2019-06-18 19:54:17 +00:00
Adrian Prantl 8ff822d61d Add debug location verification for !llvm.loop attachments.
This patch teaches the Verifier how to detect broken !llvm.loop
attachments as discussed in https://reviews.llvm.org/D60831. This
allows LLVM to warn and strip out the broken debug info before
attempting an LTO compilation with input generated by LLVM predating
https://reviews.llvm.org/rL361149.

rdar://problem/51631158

Differential Revision: https://reviews.llvm.org/D63499

llvm-svn: 363725
2019-06-18 19:42:29 +00:00
Jordan Rupprecht 33e85ad956 Revert [SROA] Enhance SROA to handle `addrspacecast`ed allocas
This reverts r363711 (git commit 76a149ef81)

This causes stage2 build failures, e.g.:
http://lab.llvm.org:8011/builders/clang-x64-windows-msvc/builds/132/steps/stage%202%20build/logs/stdio
http://lab.llvm.org:8011/builders/ppc64le-lld-multistage-test/builds/87/steps/build-stage2-unified-tree/logs/stdio

llvm-svn: 363718
2019-06-18 18:40:04 +00:00
Michael Liao 76a149ef81 [SROA] Enhance SROA to handle `addrspacecast`ed allocas
Summary:
- After `addrspacecast` is allowed to be eliminated in SROA, the
  adjusting of storage pointer (from `alloca) needs to handle the
  potential different address spaces between the storage pointer (from
  alloca) and the pointer being used.

Reviewers: arsenm

Subscribers: wdng, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63501

llvm-svn: 363711
2019-06-18 17:58:49 +00:00
Sanjay Patel 223176f5d7 [x86] add test for load splitting with extracted store (PR42305); NFC
llvm-svn: 363704
2019-06-18 17:16:17 +00:00
Simon Tatham cfc70782d7 [ARM] Add MVE vector shift instructions.
This includes saturating and non-saturating shifts, both with
immediate shift count and with the shift counts given by another
vector register; VSHLC (in which the bits shifted out of each active
vector lane are shifted in to the next active lane); and also VMOVL,
which is enough like an immediate shift that it didn't fit too badly
in this category.

Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover

Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62672

llvm-svn: 363696
2019-06-18 16:19:59 +00:00
Simon Tatham faaf1a5366 [ARM] Add MVE integer vector min/max instructions.
Summary:
These form a small family of their own, to go with the floating-point
VMINNM/VMAXNM instructions added in a previous commit.

They introduce the first of many special cases in the mnemonic
recognition code, because VMIN with the E suffix used by the VPT
predication system needs to avoid being interpreted as the nonexistent
instruction 'VMI' with an ordinary 'NE' condition suffix.

Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover

Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62671

llvm-svn: 363695
2019-06-18 15:51:46 +00:00
Simon Pilgrim 9aa25be149 [TargetLowering] SimplifyDemandedVectorElts - support MUL and ANY_EXTEND_VECTOR_INREG
Also fold ANY_EXTEND_VECTOR_INREG -> BITCAST if we only need the bottom element.

Fixes temporary regression introduced in rL363693.

llvm-svn: 363694
2019-06-18 15:49:35 +00:00
Simon Pilgrim 9c8593934a [X86][AVX] extract_subvector(any_extend(x)) -> any_extend_vector_inreg(x)
Part of fixing the X86 regression noted in D63281 - I've split this into X86 and generic parts - the generic commit will be coming shortly and will fix the vector-reduce-mul-widen.ll regression introduced here.

llvm-svn: 363693
2019-06-18 15:30:50 +00:00
Simon Tatham ed4a602515 [ARM] Rename MVE instructions in Tablegen for consistency.
Summary:
Their names began with a mishmash of `MVE_`, `t2` and no prefix at
all. Now they all start with `MVE_`, which seems like a reasonable
choice on the grounds that (a) NEON is the thing they're most at risk
of being confused with, and (b) MVE implies Thumb-2, so a prefix
indicating MVE is strictly more specific than one indicating Thumb-2.

Reviewers: ostannard, SjoerdMeijer, dmgreen

Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63492

llvm-svn: 363690
2019-06-18 15:05:42 +00:00
Lewis Revill 74c8364954 [RISCV] Lower calls through PLT
This patch adds support for generating calls through the procedure
linkage table where required for a given ExternalSymbol or GlobalAddress
callee.

Differential Revision: https://reviews.llvm.org/D55304

llvm-svn: 363686
2019-06-18 14:29:45 +00:00
Fangrui Song 677423997d [llvm-readobj] Allow --hex-dump/--string-dump to dump multiple sections
1) `-x foo` currently dumps one `foo`. This change makes it dump all `foo`.
2) `-x foo -x foo` currently dumps `foo` twice. This change makes it dump `foo` once.
   In addition, if foo has section index 9, `-x foo -x 9` dumps `foo` once.
3) Give a warning instead of an error if `foo` does not exist.

The new behaviors match GNU readelf.

Also, print a new line as a separator between two section dumps.
GNU readelf uses two lines, but one seems good enough.

Reviewed By: grimar, jhenderson

Differential Revision: https://reviews.llvm.org/D63475

llvm-svn: 363683
2019-06-18 14:01:03 +00:00
Matt Arsenault 8d35dcd703 AMDGPU: Add ds_gws_init / ds_gws_barrier intrinsics
There may or may not be additional work to handle this correctly on
SI/CI.

llvm-svn: 363678
2019-06-18 13:19:57 +00:00
Simon Pilgrim 83bacd8d72 [SelectionDAG] Legalize vaargs that require vector splitting
This adds vector splitting for vaarg instructions during type legalization

Committed on behalf of @luke (Luke Lau)

Differential Revision: https://reviews.llvm.org/D60762

llvm-svn: 363671
2019-06-18 12:24:02 +00:00
Matt Arsenault bcb5ea0042 AMDGPU: Fold readlane from copy of SGPR or imm
These may be inserted to assert uniformity somewhere.

llvm-svn: 363670
2019-06-18 12:23:46 +00:00
Matt Arsenault 23f03f5059 AMDGPU: Fix iterator crash in AMDGPUPromoteAlloca
The lifetime intrinsic was erased, which was the next iterator.

llvm-svn: 363668
2019-06-18 12:23:44 +00:00
Matt Arsenault d5ce8ec778 AMDGPU/GlobalISel: RegBankSelect for amdgcn.div.scale
llvm-svn: 363667
2019-06-18 12:23:42 +00:00
Jonas Paulsson 5c64a8c4c6 [SystemZ] Fix AHIMuxK pseudo expansion.
Do not emit a copy if the source and destination registers are the same.

Review: Ulrich Weigand
llvm-svn: 363665
2019-06-18 12:10:02 +00:00
Graham Hunter 43854e3ccc [SVE][IR] Scalable Vector IR Type with pr42210 fix
Recommit of D32530 with a few small changes:
  - Stopped recursively walking through aggregates in
    the verifier, so that we don't impose too much
    overhead on large modules under LTO (see PR42210).
  - Changed tests to match; the errors are slightly
    different since they only report the array or
    struct that actually contains a scalable vector,
    rather than all aggregates which contain one in
    a nested member.
  - Corrected an older comment

Reviewers: thakis, rengolin, sdesmalen

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D63321

llvm-svn: 363658
2019-06-18 10:11:56 +00:00
Simon Pilgrim 6658bfb171 [X86] Regenerate promote.ll. NFC.
llvm-svn: 363657
2019-06-18 10:10:53 +00:00
Fangrui Song 291e11ea02 [llvm-objdump] Tidy up AMDGCNPrettyPrinter
llvm-svn: 363650
2019-06-18 06:35:18 +00:00
Craig Topper 02a445c245 [X86] Add i128 ctpop and i32/i64/i128 optsize test cases to popcnt.ll
Test cases for PR41151 and D59909.

llvm-svn: 363647
2019-06-18 04:52:49 +00:00
Craig Topper 587427716c [X86] Remove MOVDI2SSrm/MOV64toSDrm/MOVSS2DImr/MOVSDto64mr CodeGenOnly instructions.
The isel patterns for these use a bitcast and load/store, but
DAG combine should have canonicalized those away.

For the purposes of the memory folding table these opcodes can be
replaced by the MOVSSrm_alt/MOVSDrm_alt and MOVSSmr/MOVSDmr opcodes.

llvm-svn: 363644
2019-06-18 03:23:15 +00:00
Craig Topper 8582ecd8d9 [X86] Introduce new MOVSSrm/MOVSDrm opcodes that use VR128 register class.
Rename the old versions that use FR32/FR64 to MOVSSrm_alt/MOVSDrm_alt.

Use the new versions in patterns that previously used a COPY_TO_REGCLASS
to VR128. These patterns expect the upper bits to be zero. The
current set up appears to work, but I'm not sure we should be
enforcing upper bits being zero through a COPY_TO_REGCLASS.

I wanted to flip the arrangement and use a COPY_TO_REGCLASS to
FR32/FR64 for the patterns that need an f32/f64 result, but that
complicated fastisel and globalisel.

I've been doing some experiments with reducing some isel patterns
and ended up in a situation where I had a
(SUBREG_TO_REG (COPY_TO_RECLASS (VMOVSSrm), VR128)) and our
post-isel peephole was unable to avoid using an instruction for
the SUBREG_TO_REG due to the COPY_TO_REGCLASS. Having a VR128
instruction removes the COPY_TO_REGCLASS that was breaking this.

llvm-svn: 363643
2019-06-18 03:23:11 +00:00
Alex Brachet 7747700937 [llvm-strip] Error when using stdin twice
Summary: Implements bug [[ https://bugs.llvm.org/show_bug.cgi?id=42204 | 42204 ]]. llvm-strip now warns when the same input file is used more than once, and errors when stdin is used more than once.

Reviewers: jhenderson, rupprecht, espindola, alexshap

Reviewed By: jhenderson, rupprecht

Subscribers: emaste, arichardson, jakehehrlich, MaskRay, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63122

llvm-svn: 363638
2019-06-18 00:39:10 +00:00
Matt Arsenault 5a321b899e GlobalISel: Use the original flags when lowering fneg to fsub
This was ignoring the flag on fneg, and using the source instruction's
flags. Also fixes tests missing from r358702.

Note the expansion itself isn't correct without nnan, but that should
be fixed separately.

llvm-svn: 363637
2019-06-17 23:48:43 +00:00
Peter Collingbourne d57f7cc15e hwasan: Use bits [3..11) of the ring buffer entry address as the base stack tag.
This saves roughly 32 bytes of instructions per function with stack objects
and causes us to preserve enough information that we can recover the original
tags of all stack variables.

Now that stack tags are deterministic, we no longer need to pass
-hwasan-generate-tags-with-calls during check-hwasan. This also means that
the new stack tag generation mechanism is exercised by check-hwasan.

Differential Revision: https://reviews.llvm.org/D63360

llvm-svn: 363636
2019-06-17 23:39:51 +00:00
Peter Collingbourne fb9ce100d1 hwasan: Add a tag_offset DWARF attribute to instrumented stack variables.
The goal is to improve hwasan's error reporting for stack use-after-return by
recording enough information to allow the specific variable that was accessed
to be identified based on the pointer's tag. Currently we record the PC and
lower bits of SP for each stack frame we create (which will eventually be
enough to derive the base tag used by the stack frame) but that's not enough
to determine the specific tag for each variable, which is the stack frame's
base tag XOR a value (the "tag offset") that is unique for each variable in
a function.

In IR, the tag offset is most naturally represented as part of a location
expression on the llvm.dbg.declare instruction. However, the presence of the
tag offset in the variable's actual location expression is likely to confuse
debuggers which won't know about tag offsets, and moreover the tag offset
is not required for a debugger to determine the location of the variable on
the stack, so at the DWARF level it is represented as an attribute so that
it will be ignored by debuggers that don't know about it.

Differential Revision: https://reviews.llvm.org/D63119

llvm-svn: 363635
2019-06-17 23:39:41 +00:00
Amara Emerson 146882242f [GlobalISel][Localizer] Rewrite localizer to run in 2 phases, inter & intra block.
Inter-block localization is the same as what currently happens, except now it
only runs on the entry block because that's where the problematic constants with
long live ranges come from.

The second phase is a new intra-block localization phase which attempts to
re-sink the already localized instructions further right before one of the
multiple uses.

One additional change is to also localize G_GLOBAL_VALUE as they're constants
too. However, on some targets like arm64 it takes multiple instructions to
materialize the value, so some additional heuristics with a TTI hook have been
introduced attempt to prevent code size regressions when localizing these.

Overall, these changes improve CTMark code size on arm64 by 1.2%.

Full code size results:

Program                                         baseline       new       diff
------------------------------------------------------------------------------
 test-suite...-typeset/consumer-typeset.test    1249984      1217216     -2.6%
 test-suite...:: CTMark/ClamAV/clamscan.test    1264928      1232152     -2.6%
 test-suite :: CTMark/SPASS/SPASS.test          1394092      1361316     -2.4%
 test-suite...Mark/mafft/pairlocalalign.test    731320       714928      -2.2%
 test-suite :: CTMark/lencod/lencod.test        1340592      1324200     -1.2%
 test-suite :: CTMark/kimwitu++/kc.test         3853512      3820420     -0.9%
 test-suite :: CTMark/Bullet/bullet.test        3406036      3389652     -0.5%
 test-suite...ark/tramp3d-v4/tramp3d-v4.test    8017000      8016992     -0.0%
 test-suite...TMark/7zip/7zip-benchmark.test    2856588      2856588      0.0%
 test-suite...:: CTMark/sqlite3/sqlite3.test    765704       765704       0.0%
 Geomean difference                                                      -1.2%

Differential Revision: https://reviews.llvm.org/D63303

llvm-svn: 363632
2019-06-17 23:20:29 +00:00
Michael Berg f9bff2a55e Propagate fmf in IRTranslate for fneg
Summary: This case is related to D63405 in that we need to be propagating FMF on negates.

Reviewers: volkan, spatel, arsenm

Reviewed By: arsenm

Subscribers: wdng, javed.absar

Differential Revision: https://reviews.llvm.org/D63458

llvm-svn: 363631
2019-06-17 23:19:40 +00:00
Stanislav Mekhanoshin ca42687d62 [AMDGPU] gfx1010 subvector test. NFC.
llvm-svn: 363623
2019-06-17 21:55:06 +00:00
Volkan Keles 689509edab [test][AArch64] Relax the check line for G_BRJT in legalizer-info-validation.mir
Replace the specific number with a pattern to relax the test.

llvm-svn: 363621
2019-06-17 21:25:25 +00:00
Philip Reames 44475363e8 Teach getSCEVAtScope how to handle loop phis w/invariant operands in loops w/taken backedges
This patch really contains two pieces:
    Teach SCEV how to fold a phi in the header of a loop to the value on the backedge when a) the backedge is known to execute at least once, and b) the value is safe to use globally within the scope dominated by the original phi.
    Teach IndVarSimplify's rewriteLoopExitValues to allow loop invariant expressions which already exist (and thus don't need new computation inserted) even in loops where we can't optimize away other uses.

Differential Revision: https://reviews.llvm.org/D63224

llvm-svn: 363619
2019-06-17 21:06:17 +00:00
Daniel Sanders 184c8ee920 [globalisel] Fix iterator invalidation in the extload combines
Summary:
Change the way we deal with iterator invalidation in the extload combines as it
was still possible to neglect to visit a use. Even worse, it happened in the
in-tree test cases and the checks weren't good enough to detect it.

We now take a cheap copy of the use list before iterating over it. This
prevents iterator invalidation from occurring and has the nice side effect
of making the existing schedule-for-erase/schedule-for-insert mechanism
moot.

Reviewers: aditya_nandakumar

Reviewed By: aditya_nandakumar

Subscribers: rovka, kristof.beyls, javed.absar, volkan, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61813

llvm-svn: 363616
2019-06-17 20:56:31 +00:00
Stanislav Mekhanoshin 3138278287 [AMDGPU] Propagate function attributes thru bitcasts
AMDGPUPropagateAttributes will not work on function bitcatsts,
so move AMDGPUFixFunctionBitcasts before it.

Differential Revision: https://reviews.llvm.org/D63455

llvm-svn: 363614
2019-06-17 20:42:48 +00:00
Philip Reames fe8bd96ebd Fix a bug w/inbounds invalidation in LFTR (recommit)
Recommit r363289 with a bug fix for crash identified in pr42279.  Issue was that a loop exit test does not have to be an icmp, leading to a null dereference crash when new logic was exercised for that case.  Test case previously committed in r363601.

Original commit comment follows:

This contains fixes for two cases where we might invalidate inbounds and leave it stale in the IR (a miscompile). Case 1 is when switching to an IV with no dynamically live uses, and case 2 is when doing pre-to-post conversion on the same pointer type IV.

The basic scheme used is to prove that using the given IV (pre or post increment forms) would have to already trigger UB on the path to the test we're modifying. As such, our potential UB triggering use does not change the semantics of the original program.

As was pointed out in the review thread by Nikita, this is defending against a separate issue from the hasConcreteDef case. This is about poison, that's about undef. Unfortunately, the two are different, see Nikita's comment for a fuller explanation, he explains it well.

(Note: I'm going to address Nikita's last style comment in a separate commit just to minimize chance of subtle bugs being introduced due to typos.)

Differential Revision: https://reviews.llvm.org/D62939

llvm-svn: 363613
2019-06-17 20:32:22 +00:00
Nicolai Haehnle ae4fcb97dd AMDGPU/GFX10: Don't generate s_code_end padding in the asm-printer
Summary:
The purpose of the padding is to guard against stale code being
fetched into the instruction cache by the lowest level prefetching.
We're generating relocatable ELF here, and so the padding should
arguably be added by the linker. This is in fact what Mesa does.

This also fixes multi-part shaders for Mesa.

Change-Id: I6bfede58f20e9f337762ccf39ef9e0e263e69e82

Reviewers: arsenm, rampitec, t-tye

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63427

llvm-svn: 363602
2019-06-17 19:28:43 +00:00
Philip Reames 58c75565f3 Reduced test case for pr42279 in advance of the relevant re-commit + fix
llvm-svn: 363601
2019-06-17 19:27:45 +00:00
Nicolai Haehnle 8af7198c6c AMDGPU: Explicitly define a triple for some tests
Summary:
This is related to the changes to the groupstaticsize intrinsic in
D61494 which would otherwise make the related tests in these files
fail or much less useful.

Note that for some reason, SOPK generation is less effective in the
amdhsa OS, which is why I chose PAL. I haven't investigated this
deeper.

Change-Id: I6bb99569338f7a433c28b4c9eb1e3e036b00d166

Reviewers: arsenm

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63392

llvm-svn: 363600
2019-06-17 19:25:57 +00:00
Joseph Tremoulet daa1ae6142 [EarlyCSE] Fix hashing of self-compares
Summary:
Update compare normalization in SimpleValue hashing to break ties (when
the same value is being compared to itself) by switching to the swapped
predicate if it has a lower numerical value.  This brings the hashing in
line with isEqual, which already recognizes the self-compares with
swapped predicates as equal.

Fixes PR 42280.

Reviewers: spatel, efriedma, nikic, fhahn, uabelho

Reviewed By: nikic

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63349

llvm-svn: 363598
2019-06-17 19:11:28 +00:00
Alina Sbirlea 7a0098aa6e [MemorySSA] Don't use template when the clone is a simplified instruction.
Summary:
LoopRotate doesn't create a faithful clone of an instruction, it may
simplify it beforehand. Hence the clone of an instruction that has a
MemoryDef associated may not be a definition, but a use or not a memory
alternig instruction.
Don't rely on the template when the clone may be simplified.

Reviewers: george.burgess.iv

Subscribers: jlebar, Prazek, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63355

llvm-svn: 363597
2019-06-17 18:58:40 +00:00
Jessica Paquette 49537bbf74 [GlobalISel][AArch64] Fold G_SUB into G_ICMP when it's safe to do so
Basically porting over the behaviour in AArch64ISelLowering to GISel. See
emitComparison for reference.

When we have something like this:

```
  lhs = G_SUB 0, y
  ...
  G_ICMP lhs, rhs
```

We can fold away the G_SUB and produce a cmn instead, given that we produce
the same value in NZCV.

Add a test showing that the transformation works, and also showing that we
don't perform the transformation when it's unsafe.

Also factor out the CSet emission into emitCSetForICMP.

Differential Revision: https://reviews.llvm.org/D63163

llvm-svn: 363596
2019-06-17 18:40:06 +00:00
Simon Pilgrim 835999e48a [X86][SSE] Scalarize under-aligned XMM vector nt-stores (PR42026)
If a XMM non-temporal store has less than natural alignment, scalarize the vector - with SSE4A we can stay on the vector and use MOVNTSD(f64), else we must move to GPRs and use MOVNTI(i32/i64).

llvm-svn: 363592
2019-06-17 18:20:04 +00:00
Alina Sbirlea 05f77803f4 [MemorySSA] Add all MemoryPhis before filling their values.
Summary:
Add all MemoryPhis in IDF before filling in their incomign values.
Otherwise, a new Phi can be added that needs to become the incoming
value of another Phi.
Test fails the verification in verifyPrevDefInPhis.

Reviewers: george.burgess.iv

Subscribers: jlebar, Prazek, zzheng, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63353

llvm-svn: 363590
2019-06-17 18:16:53 +00:00
Stanislav Mekhanoshin a9191c8492 [AMDGPU] gfx1010 wavefrontsize intrinsic folding
Differential Revision: https://reviews.llvm.org/D63206

llvm-svn: 363588
2019-06-17 17:57:50 +00:00
Matt Arsenault 6d741f29ec AMDGPU: Fold readlane/readfirstlane calls
llvm-svn: 363587
2019-06-17 17:52:35 +00:00
Stanislav Mekhanoshin ad04e7ad42 [AMDGPU] Pass to propagate ABI attributes from kernels to the functions
The pass works in two modes:

Mode 1: Just set attributes starting from kernels. This can work at
the very beginning of opt and llc pipeline, but cannot clone functions
because it must be a function pass.

Mode 2: Actually clone functions for new attributes. This can only work
after all function passes in the opt pipeline because it has to be a
module pass.

Differential Revision: https://reviews.llvm.org/D63208

llvm-svn: 363586
2019-06-17 17:47:28 +00:00
Simon Pilgrim bb9adfdb4e [X86][AVX] Split under-aligned vector nt-stores.
If a YMM/ZMM non-temporal store has less than natural alignment, split the vector - either they will be satisfactorily aligned or will continue to be split until they are XMMs - at which point the legalizer will scalarize it.

llvm-svn: 363582
2019-06-17 17:22:38 +00:00
Warren Ristow 6452bdd29b [LV] Suppress vectorization in some nontemporal cases
When considering a loop containing nontemporal stores or loads for
vectorization, suppress the vectorization if the corresponding
vectorized store or load with the aligment of the original scaler
memory op is not supported with the nontemporal hint on the target.

This adds two new functions:
  bool isLegalNTStore(Type *DataType, unsigned Alignment) const;
  bool isLegalNTLoad(Type *DataType, unsigned Alignment) const;

to TTI, leaving the target independent default implementation as
returning true, but with overriding implementations for X86 that
check the legality based on available Subtarget features.

This fixes https://llvm.org/PR40759

Differential Revision: https://reviews.llvm.org/D61764

llvm-svn: 363581
2019-06-17 17:20:08 +00:00
Matt Arsenault 3e140066bc GlobalISel: Ignore callsite attributes when picking intrinsic type
A target intrinsic may be defined as possibly reading memory, but the
call site may have additional knowledge that it doesn't read
memory. The intrinsic lowering will expect the pessimistic assumption
of the intrinsic definition, so the chain should still be used.

I fixed the same bug in SelectionDAG in r287593.

llvm-svn: 363580
2019-06-17 17:01:35 +00:00
Matt Arsenault a7f09f3c9e GlobalISel: Verify intrinsics
I keep using the wrong instruction when manually writing tests. This
really needs to check the number of operands, but I don't see an easy
way to do that right now.

llvm-svn: 363579
2019-06-17 17:01:32 +00:00
Stanislav Mekhanoshin 5d00c3060e [AMDGPU] gfx1010 wave32 metadata
Differential Revision: https://reviews.llvm.org/D63207

llvm-svn: 363577
2019-06-17 16:48:56 +00:00
Tom Stellard 8b1c53b528 AMDGPU/GlobalISel: Implement select for G_ICMP and G_SELECT
Reviewers: arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, hiraditya, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60640

llvm-svn: 363576
2019-06-17 16:27:43 +00:00
Francis Visoiu Mistrih 34667519dc [Remarks] Extend -fsave-optimization-record to specify the format
Use -fsave-optimization-record=<format> to specify a different format
than the default, which is YAML.

For now, only YAML is supported.

llvm-svn: 363573
2019-06-17 16:06:00 +00:00
Simon Pilgrim 1c91e63897 [X86][SSE] Add tests for underaligned nt loads
Test both 'unaligned' (which we should just use regular unaligned loads) and 'subvector aligned' (which we should split)

llvm-svn: 363565
2019-06-17 14:38:17 +00:00
Simon Pilgrim 454e6b9010 [X86][SSE] Prevent misaligned non-temporal vector load/store combines
For loads, pre-SSE41 we can't perform NT loads at all, and after that we can only perform vector aligned loads, so if the alignment is less than for a xmm we'll just end up using the regular unaligned vector loads anyway.

First step towards fixing PR42026 - the next step for stores will be to use SSE4A movntsd where possible and to avoid the stack spill on SSE2 targets.

Differential Revision: https://reviews.llvm.org/D63246

llvm-svn: 363564
2019-06-17 14:26:10 +00:00
Matt Arsenault 1df203d78e InferAddressSpaces: Fix cloning original addrspacecast
If an addrspacecast needed to be inserted again, this was creating a
clone of the original cast for each user. Just use the original, which
also saves losing the value name.

llvm-svn: 363562
2019-06-17 14:13:29 +00:00
Matt Arsenault b10f097833 AMDGPU: Ignore subtarget for InferAddressSpaces
Even if the target doesn't have flat instructions, addrspace(0) is
still flat. It just happens to not work.

llvm-svn: 363561
2019-06-17 14:13:24 +00:00
Matt Arsenault f3b64d80bc AMDGPU: Mark exp/exp.compr as inaccessiblememonly
Should also be marked writeonly, but I think that would require
splitting the version with done set to a separate intrinsic

Test change is only from renumbering the attribute group numbers,
which for some reason the generated check lines consider.

llvm-svn: 363560
2019-06-17 13:52:24 +00:00
Sam Parker 1bd3d00e7e [CodeGen] Check for HardwareLoop Latch ExitBlock
The HardwareLoops pass finds exit blocks with a scevable exit count.
If the target specifies to update the loop counter in a register,
through a phi, we need to ensure that the exit block is a latch so
that we can insert the phi with the correct value for the incoming
edge.

Differential Revision: https://reviews.llvm.org/D63336

llvm-svn: 363556
2019-06-17 13:39:28 +00:00
Simon Pilgrim f1e2827170 [X86][SSE] Avoid unnecessary stack codegen in NT store codegen tests.
llvm-svn: 363552
2019-06-17 12:35:26 +00:00
Bjorn Pettersson 83773b77a5 [LV] Deny irregular types in interleavedAccessCanBeWidened
Summary:
Avoid that loop vectorizer creates loads/stores of vectors
with "irregular" types when interleaving. An example of
an irregular type is x86_fp80 that is 80 bits, but that
may have an allocation size that is 96 bits. So an array
of x86_fp80 is not bitcast compatible with a vector
of the same type.

Not sure if interleavedAccessCanBeWidened is the best
place for this check, but it solves the problem seen
in the added test case. And it is the same kind of check
that already exists in memoryInstructionCanBeWidened.

Reviewers: fhahn, Ayal, craig.topper

Reviewed By: fhahn

Subscribers: hiraditya, rkruppe, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63386

llvm-svn: 363547
2019-06-17 12:02:24 +00:00
Sander de Smalen 74ac20158a Test forward references in IntrinsicEmitter on Neon LD(2|3|4)
This patch tests the forward-referencing added in D62995 by changing
some existing intrinsics to use forward referencing of overloadable
parameters, rather than backward referencing.

This patch changes the TableGen definition/implementation of
llvm.aarch64.neon.ld2lane and llvm.aarch64.neon.ld2lane intrinsics
(and similar for ld3 and ld4). This change is intended to be
non-functional, since the behaviour of the intrinsics is
expected to be the same.

Reviewers: arsenm, dmgreen, RKSimon, greened, rnk

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D63189

llvm-svn: 363546
2019-06-17 12:01:53 +00:00
Luis Marques 2e46312ffd [DAGCombiner] [CodeGenPrepare] More comprehensive GEP splitting
Some GEPs were not being split, presumably because that split would just be 
undone by the DAGCombiner. Not performing those splits can prevent important 
optimizations, such as preventing the element indices / member offsets from 
being (partially) folded into load/store instruction immediates. This patch:

- Makes the splits also occur in the cases where the base address and the GEP 
  are in the same BB.
- Ensures that the DAGCombiner doesn't reassociate them back again.

Differential Revision: https://reviews.llvm.org/D60294

llvm-svn: 363544
2019-06-17 10:54:12 +00:00
Simon Pilgrim ef78e55205 [SelectionDAG] Fold insert_subvector(undef, extract_subvector(v, c), c) -> v in getNode
This is already done in DAGCombiner::visitINSERT_SUBVECTOR, but this helps a number of shuffles across different vector widths recognise when they come from the same source.

llvm-svn: 363542
2019-06-17 10:14:52 +00:00
Sam Parker 60d6fb2a63 [SCEV] Use NoWrapFlags when expanding a simple mul
Second functional change following on from rL362687. Pass the
NoWrapFlags from the MulExpr to InsertBinop when we're generating a
shl or mul.

Differential Revision: https://reviews.llvm.org/D61934

llvm-svn: 363540
2019-06-17 10:05:18 +00:00
Fangrui Song 46f9cbe28d [llvm-objdump] Use %08 instead of %016 to print leading addresses for 32-bit binaries
Reviewed By: grimar

Differential Revision: https://reviews.llvm.org/D63398

llvm-svn: 363539
2019-06-17 09:59:55 +00:00
Fangrui Song ac14f7b10c [lit] Delete empty lines at the end of lit.local.cfg NFC
llvm-svn: 363538
2019-06-17 09:51:07 +00:00
Roman Lebedev 25a043e78a [NFC][Codegen] Standalone tests for icmp eq/ne (urem %x, C), 0 -> icmp eq/ne %x, 0 fold (D63390)
llvm-svn: 363537
2019-06-17 09:50:50 +00:00
Sander de Smalen 5d6ee76c16 Describe stack-id as an enum
This patch changes MIR stack-id from an integer to an enum,
and adds printing/parsing support for this in MIR files. The default
stack-id '0' is now renamed to 'default'.

This should make MIR tests that have stack objects with different stack-ids
more descriptive. It also clarifies code operating on StackID.

Reviewers: arsenm, thegameg, qcolombet

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D60137

llvm-svn: 363533
2019-06-17 09:13:29 +00:00
Hans Wennborg a9e5d2f35d Re-commit r357452 (take 3): "SimplifyCFG SinkCommonCodeFromPredecessors: Also sink function calls without used results (PR41259)"
Third time's the charm.

This was reverted in r363220 due to being suspected of an internal benchmark
regression and a test failure, none of which turned out to be caused by this.

llvm-svn: 363529
2019-06-17 07:47:28 +00:00
Yevgeny Rouban ee62c40eae [SimplifyCFG] Fix prof branch_weights MD while removing unreachable switch cases
SimplifyCFG has a bug that results in inconsistent prof branch_weights metadata
if unreachable switch cases are removed. This patch fixes this bug by making use
of the newly introduced SwitchInstProfUpdateWrapper class (see patch D62122).
A new test is created.

Differential Revision: https://reviews.llvm.org/D62186

llvm-svn: 363527
2019-06-17 05:55:12 +00:00
Justin Hibbits 1d1cf30b73 PowerPC: Optimize SPE double parameter calling setup
Summary:
SPE passes doubles the same as soft-float, in register pairs as i32
types.  This is all handled by the target-independent layer.  However,
this is not optimal when splitting or reforming the doubles, as it
pushes to the stack and loads from, on either side.

For instance, to pass a double argument to a function, assuming the
double value is in r5, the sequence currently looks like this:

    evstdd      5, X(1)
    lwz         3, X(1)
    lwz         4, X+4(1)

Likewise, to form a double into r5 from args in r3 and r4:

    stw         3, X(1)
    stw         4, X+4(1)
    evldd       5, X(1)

This optimizes the fence to use SPE instructions.  Now, to pass a double
to a function:

    mr          4, 5
    evmergehi   3, 5, 5

And to form a double into r5 from args in r3 and r4:

    evmergelo   5, 3, 4

This is comparable to the way that gcc generates the double splits.

This also fixes a bug with expanding builtins to libcalls, where the
LowerCallTo() code path was generating intermediate illegal type nodes.

Reviewers: nemanjai, hfinkel, joerg

Subscribers: kbarton, jfb, jsji, llvm-commits

Differential Revision: https://reviews.llvm.org/D54583

llvm-svn: 363526
2019-06-17 03:15:23 +00:00
Seiya Nuta 4f15732067 [yaml2obj][MachO] Don't fill dummy data for virtual sections
Summary:
Currently, MachOWriter::writeSectionData writes dummy data (0xdeadbeef) to fill section data areas in the file even if the section is a virtual one. Since virtual sections don't occupy any space in the file, writing dummy data could results the  "OS.tell() - fileStart <= Sec.offset" assertion failure.

This patch fixes the bug by simply not writing any dummy data for virtual sections.

Reviewers: beanz, jhenderson, rupprecht, alexshap

Reviewed By: alexshap

Subscribers: compnerd, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62991

llvm-svn: 363525
2019-06-17 02:07:20 +00:00
Seiya Nuta 13de174b4c [llvm-objcopy] Add elf32-sparc and elf32-sparcel target
Summary:
The "sparc"/"sparcel" architectures appears in ArchMap (used by -B option) but not in OutputFormatMap (used by -I/-O option). Add their targets into OutputFormatMap for consistency.

Note that AFAIK there're no targets for 32-bit little-endian SPARC ("elf32-sparcel") in GNU binutils.

Reviewers: espindola, alexshap, rupprecht, jhenderson, compnerd, jakehehrlich

Reviewed By: jhenderson, compnerd, jakehehrlich

Subscribers: jyknight, emaste, arichardson, fedor.sergeev, jakehehrlich, MaskRay, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63238

llvm-svn: 363524
2019-06-17 02:03:45 +00:00
Roman Lebedev 5a663bd77a [InstSimplify] Fix addo/subo undef folds (PR42209)
Fix folds of addo and subo with an undef operand to be:

`@llvm.{u,s}{add,sub}.with.overflow` all fold to `{ undef, false }`,
 as per LLVM undef rules.
Same for commuted variants.

Based on the original version of the patch by @nikic.

Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=42209 | PR42209 ]]

Differential Revision: https://reviews.llvm.org/D63065

llvm-svn: 363522
2019-06-16 20:39:45 +00:00
Nicolai Haehnle 41abf2766e AMDGPU: Prepare for explicit absolute relocations in code generation
Summary:
We will use absolute relocations for LDS symbols.

Change-Id: I9a32795ed0ea835e433a787129cfe3c57ee9a325

Reviewers: arsenm, rampitec

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61492

llvm-svn: 363517
2019-06-16 17:43:37 +00:00
Nicolai Haehnle 6d71be4e67 AMDGPU: Be explicit about whether the high-word in SI_PC_ADD_REL_OFFSET is 0
Summary:
Instead of encoding a high-word of 0 using a fake TargetGlobalAddress,
just use a literal target constant. This simplifies some subsequent changes.

The generated assembly is now more explicit about the kind of relocation
that is to be used.

Change-Id: I066835202d23b5941fa7a358eb4b89e9b71ab6f8

Reviewers: arsenm, rampitec

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61491

llvm-svn: 363516
2019-06-16 17:32:01 +00:00
Nicolai Haehnle 490e83cd43 AMDGPU/GFX10: Support DLC bit in llvm.amdgcn.s.buffer.load intrinsic
Summary: Change-Id: Ie4c971462a7749740938c687144e77441dac2539

Reviewers: rampitec, arsenm

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62486

Change-Id: Iae59523edd75c74918d2118df6571a7b671717a0
llvm-svn: 363514
2019-06-16 17:14:12 +00:00
Stanislav Mekhanoshin 5250021672 [AMDGPU] gfx10 conditional registers handling
This is cpp source part of wave32 support, excluding overriden
getRegClass().

Differential Revision: https://reviews.llvm.org/D63351

llvm-svn: 363513
2019-06-16 17:13:09 +00:00
Sanjay Patel c8d88ad1a9 [CodeGenPrepare][x86] shift both sides of a vector select when profitable
This is based on the example/discussion in PR37428:
https://bugs.llvm.org/show_bug.cgi?id=37428

Proper vector shift instructions don't appear until AVX2, so we may generate several
extra instructions within a loop trying to compensate for that. It's difficult to
recover from that shift expansion later than this, so use the existing TLI hook and
splat analysis to enable better codegen.

This extends CGP functionality introduced with:
rL201655

Differential Revision: https://reviews.llvm.org/D63233

llvm-svn: 363511
2019-06-16 15:29:03 +00:00
Sanjay Patel d14389c0a5 [x86] split 256-bit vector selects if operands are vector concats
This is similar logic/motivation to the select splitting in D62969.

In D63233, the pattern changes so that we no longer have an extract_subvector of vselect,
but the operands of the select are still being concatenated.

The closest case is represented in either the first or last test diffs here - we have an
extra instruction, but we converted 3-4 ymm instructions into 4-5 xmm instructions.
I think that's the right trade-off for most AVX1 targets.

In the example based on PR37428:
https://bugs.llvm.org/show_bug.cgi?id=37428
...this makes the loop about 30% faster (tested on Haswell by compiling with -mavx).

Differential Revision: https://reviews.llvm.org/D63364

llvm-svn: 363508
2019-06-16 14:04:49 +00:00
Simon Pilgrim fcffc2facc [X86] CombineShuffleWithExtract - handle cases with different vector extract sources
Insert the shorter vector source into an undef vector of the longer vector source's type.

llvm-svn: 363507
2019-06-16 08:00:41 +00:00
Simon Pilgrim 90e87af303 [X86][AVX] Handle lane-crossing shuffle(extract_subvector(x,c1),extract_subvector(y,c2),m1) shuffles
Pull out the existing (non)lane-crossing fold into a helper lambda and use for lane-crossing unary shuffles as well.

Fixes PR34380

llvm-svn: 363500
2019-06-15 18:30:43 +00:00
Simon Pilgrim 990f3ceb67 [X86][AVX] Decode constant bits from insert_subvector(c1, c2, c3)
This mostly happens due to SimplifyDemandedVectorElts reducing a vector to insert_subvector(undef, c1, 0)

llvm-svn: 363499
2019-06-15 17:05:24 +00:00
Roman Lebedev 5dd61974f9 [NFC][MCA][X86] Add one more 'clear super register' pattern - movss/movsd load clears high XMM bits
llvm-svn: 363498
2019-06-15 16:12:13 +00:00
Roman Lebedev 680c43b73a [NFC][MCA][X86] Add baseline test coverage for AMD Barcelona (aka K10, fam10h)
Looking into sched model for that CPU ...

llvm-svn: 363497
2019-06-15 16:12:05 +00:00
Kang Zhang 2d51adcb57 [PowerPC] Set the innermost hot loop to align 32 bytes
Summary:
If the nested loop is an innermost loop, prefer to a 32-byte alignment, so that
we can decrease cache misses and branch-prediction misses. Actual alignment of
 the loop will depend on the hotness check and other logic in alignBlocks.

The old code will only align hot loop to 32 bytes when the LoopSize larger than
16 bytes and smaller than 32 bytes, this patch will align the innermost hot loop
 to 32 bytes not only for the hot loop whose size is 16~32 bytes.

Reviewed By: steven.zhang, jsji

Differential Revision: https://reviews.llvm.org/D61228

llvm-svn: 363495
2019-06-15 15:10:24 +00:00
Nikita Popov 8550fb386a [SCEV] Use unsigned/signed intersection type in SCEV
Based on D59959, this switches SCEV to use unsigned/signed range
intersection based on the sign hint. This will prefer non-wrapping
ranges in the relevant domain. I've left the one intersection in
getRangeForAffineAR() to use the smallest intersection heuristic,
as there doesn't seem to be any obvious preference there.

Differential Revision: https://reviews.llvm.org/D60035

llvm-svn: 363490
2019-06-15 09:15:52 +00:00
Nikita Popov 9145562b48 [SimplifyIndVar] Simplify non-overflowing saturating add/sub
If we can detect that saturating math that depends on an IV cannot
overflow, replace it with simple math. This is similar to the CVP
optimization from D62703, just based on a different underlying
analysis (SCEV vs LVI) that catches different cases.

Differential Revision: https://reviews.llvm.org/D62792

llvm-svn: 363489
2019-06-15 08:48:52 +00:00
Fangrui Song e1aa69f755 [RISCV] Regenerate remat.ll and atomic-rmw.ll after D43256
llvm-svn: 363487
2019-06-15 07:49:14 +00:00
Alex Brachet 899a3072f0 [objcopy] Error when --preserve-dates is specified with standard streams
Summary: llvm-objcopy/strip now error when -p is specified when reading from stdin or writing to stdout

Reviewers: jhenderson, rupprecht, espindola, alexshap

Reviewed By: jhenderson, rupprecht

Subscribers: emaste, arichardson, jakehehrlich, MaskRay, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63090

llvm-svn: 363485
2019-06-15 05:32:23 +00:00
Michael Berg ad6bb86b2d adding more fmf propagation for selects plus updated tests
llvm-svn: 363484
2019-06-15 04:53:51 +00:00
Fangrui Song 968b5f84af Revert "adding more fmf propagation for selects plus tests"
This reverts rL363474. -debug-only=isel was added to some tests that
don't specify `REQUIRES: asserts`. This causes failures on
-DLLVM_ENABLE_ASSERTIONS=off builds.

I chose to revert instead of fixing the tests because I'm not sure
whether we should add `REQUIRES: asserts` to more tests.

llvm-svn: 363482
2019-06-15 03:51:08 +00:00
Huihui Zhang dc2fd6a14e [InstCombine] Add tests to show missing fold opportunity for "icmp and shift" (nfc).
Summary:
For icmp pred (and (sh X, Y), C), 0

  When C is signbit, expect to fold (X << Y) & signbit ==/!= 0 into (X << Y) >=/< 0,
  rather than (X & (signbit >> Y)) != 0.

  When C+1 is power of 2, expect to fold (X << Y) & ~C ==/!= 0 into (X << Y) </>= C+1,
  rather than (X & (~C >> Y)) == 0.

For icmp pred (and X, (sh signbit, Y)), 0

  Expect to fold (X & (signbit l>> Y)) ==/!= 0 into (X << Y) >=/< 0
  Expect to fold (X & (signbit << Y)) ==/!= 0 into (X l>> Y) >=/< 0

  Reviewers: lebedev.ri, efriedma, spatel, craig.topper

  Reviewed By: lebedev.ri

  Subscribers: llvm-commits

  Tags: #llvm

  Differential Revision: https://reviews.llvm.org/D63025

llvm-svn: 363479
2019-06-15 00:33:41 +00:00
Matt Arsenault 9487278010 Reapply "GlobalISel: Avoid producing Illegal copies in RegBankSelect"
This reapplies r363410, avoiding null dereference if there is no
AltRegBank.

llvm-svn: 363478
2019-06-15 00:33:26 +00:00
Mitch Phillips 0d44f129bb Revert "GlobalISel: Avoid producing Illegal copies in RegBankSelect"
This patch breaks UBSan build bots. See
https://github.com/google/sanitizers/wiki/SanitizerBotReproduceBuild for
a guide as to how to reproduce the error.

This reverts commit c2864c0de0.
This reverts rL363410.

llvm-svn: 363476
2019-06-14 23:45:34 +00:00
Michael Berg 69394bedc5 adding more fmf propagation for selects plus tests
llvm-svn: 363474
2019-06-14 23:30:52 +00:00
Guozhi Wei d2210af332 [MBP] Move a latch block with conditional exit and multi predecessors to top of loop
Current findBestLoopTop can find and move one kind of block to top, a latch block has one successor. Another common case is:

    * a latch block
    * it has two successors, one is loop header, another is exit
    * it has more than one predecessors

If it is below one of its predecessors P, only P can fall through to it, all other predecessors need a jump to it, and another conditional jump to loop header. If it is moved before loop header, all its predecessors jump to it, then fall through to loop header. So all its predecessors except P can reduce one taken branch.

Differential Revision: https://reviews.llvm.org/D43256

llvm-svn: 363471
2019-06-14 23:08:59 +00:00
Akira Hatanaka a704a8f28c [ObjC][ARC] Delete ObjC runtime calls on global variables annotated
with 'objc_arc_inert'

Those calls are no-ops, so they can be safely deleted.

rdar://problem/49839633

Differential Revision: https://reviews.llvm.org/D62433

llvm-svn: 363468
2019-06-14 22:06:32 +00:00
Matt Arsenault aa41e92e17 AMDGPU: Avoid most waitcnts before calls
Currently you get extra waits, because waits are inserted for the
register dependencies of the call, and the function prolog waits on
everything.

Currently waits are still inserted on returns. It may make sense to
not do this, and wait in the caller instead.

llvm-svn: 363465
2019-06-14 21:52:26 +00:00
Francis Visoiu Mistrih 5501dda247 [Remarks][NFC] Improve testing and documentation of -foptimization-record-passes
This adds:

* documentation to the user manual
* nicer error message
* test for the error case
* test for the gold plugin

llvm-svn: 363463
2019-06-14 21:38:57 +00:00
Matt Arsenault 282dac717e SROA: Allow eliminating addrspacecasted allocas
There is a circular dependency between SROA and InferAddressSpaces
today that requires running both multiple times in order to be able to
eliminate all simple allocas and addrspacecasts. InferAddressSpaces
can't remove addrspacecasts when written to memory, and SROA helps
move pointers out of memory.

This should avoid inserting new commuting addrspacecasts with GEPs,
since there are unresolved questions about pointer wrapping between
different address spaces.

For now, don't replace volatile operations that don't match the alloca
addrspace, as it would change the address space of the access. It may
be still OK to insert an addrspacecast from the new alloca, but be
more conservative for now.

llvm-svn: 363462
2019-06-14 21:38:31 +00:00
Matt Arsenault e6efb6433f SROA: Add baseline test for addrspacecast changes
llvm-svn: 363460
2019-06-14 21:22:26 +00:00
Matt Arsenault bb0a610599 AMDGPU: Fix capitalized register names in asm constraints
This was a workaround a long time ago, but the canonical lower case
names work now.

llvm-svn: 363459
2019-06-14 21:16:06 +00:00
Matt Arsenault 9e5fa33378 AMDGPU: Fix dropping memref for ds append/consume
The way SelectionDAG treats memory operands is very frustrating, and
by default drops them unless a property is set on the pattern. There
is no pattern for manually selected instructions, so this requires
manually setting them.

llvm-svn: 363455
2019-06-14 21:01:24 +00:00
Matt Arsenault 1509fde891 AMDGPU: Add baseline test for call waitcnt insertion
llvm-svn: 363453
2019-06-14 21:01:23 +00:00
Sanjay Patel 501bb982b9 [x86] add test for 256-bit blendv with AVX targets; NFC
This is a reduction of the pattern seen in D63233.

llvm-svn: 363448
2019-06-14 20:03:42 +00:00
Amara Emerson f79d3bc724 [GlobalISel] Add a G_BRJT opcode.
This is a branch opcode that takes a jump table pointer, jump table index and an
index into the table to do an indirect branch.

We pass both the table pointer and JTI to allow targets like ARM64 to more
easily use the existing jump table compression optimization without having to
walk up the block to find a paired G_JUMP_TABLE.

Differential Revision: https://reviews.llvm.org/D63159

llvm-svn: 363434
2019-06-14 17:55:48 +00:00
Florian Hahn dcdd12b68c Revert Fix a bug w/inbounds invalidation in LFTR
Reverting because it breaks a green dragon build:
    http://green.lab.llvm.org/green/job/clang-stage2-Rthinlto/18208

This reverts r363289 (git commit eb88badff9)

llvm-svn: 363427
2019-06-14 17:23:09 +00:00
Valery Pykhtin ffeb01c113 [AMDGPU] Don't constrain callees with inlinehint from inlining on MaxBB check
Summary: Function bodies marked inline in an opencl source are eliminated but MaxBB check may prevent inlining them leaving undefined references.

Reviewers: rampitec, arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, Anastasia, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63337

llvm-svn: 363418
2019-06-14 16:37:33 +00:00
Kevin P. Neal fece7c6c83 [FPEnv] Lower STRICT_FP_EXTEND and STRICT_FP_ROUND nodes in preprocess phase of ISelLowering to mirror non-strict nodes on x86.
I recently discovered a bug on the x86 platform: The fp80 type was not handled well by x86 for constrained floating point nodes, as their regular counterparts are replaced by extending loads and truncating stores during the preprocess phase. Normally, platforms don't have this issue, as they don't typically attempt to perform such legalizations during instruction selection preprocessing. Before this change, strict_fp nodes survived until they were mutated to normal nodes, which happened shortly after preprocessing on other platforms. This modification lowers these nodes at the same phase while properly utilizing the chain.5

Submitted by:	Drew Wock <drew.wock@sas.com>
Reviewed by:	Craig Topper, Kevin P. Neal
Approved by:	Craig Topper
Differential Revision:	https://reviews.llvm.org/D63271

llvm-svn: 363417
2019-06-14 16:28:55 +00:00
Sanjay Patel 75312aa805 [x86] move vector shift tests for PR37428; NFC
As suggested in the post-commit thread for rL363392 - it's
wasteful to have so many runs for larger tests. AVX1/AVX2
is what shows the diff and probably what matters most going
forward.

llvm-svn: 363411
2019-06-14 15:23:09 +00:00
Matt Arsenault c2864c0de0 GlobalISel: Avoid producing Illegal copies in RegBankSelect
Avoid producing illegal register bank copies for reg_sequence and
phi. The default implementation assumes it is possible to pick any
operand's bank and use that for the result, introducing a copy for
operands with a different bank. This does not check for illegal
copies. It is not legal to introduce a VGPR->SGPR copy, so any VGPR
operand requires the result to be a VGPR.

The changes in getInstrMappingImpl aren't strictly necessary, since
AMDGPU now just bypasses this for reg_sequence/phi. This could be
replaced with an assert in case other targets run into this. It is
currently responsible for producing the error for unsatisfiable
copies, but this will be better served with a verifier check.

For phis, for now assume any undetermined operands must be
VGPRs. Eventually, this needs to be able to defer mapping these
operations. This also does not yet have a way to check for whether the
block is in a divergent region.

llvm-svn: 363410
2019-06-14 15:22:25 +00:00
Sanjay Patel 7ea378b940 [CodeGenPrepare] propagate debuginfo when copying a shuffle
llvm-svn: 363409
2019-06-14 15:05:35 +00:00
Matt Arsenault 492d71cc99 AMDGPU: Fold readlane intrinsics of constants
I'm not 100% sure about this, since I'm worried about IR transforms
that might end up introducing divergence downstream once replaced with
a constant, but I haven't come up with an example yet.

llvm-svn: 363406
2019-06-14 14:51:26 +00:00
Mikhail Maltsev d1cc2e1543 [ARM] Add MVE horizontal accumulation instructions
This is the family of vector instructions that combine all the lanes
in their input vector(s), and output a value in one or two GPRs.

Differential Revision: https://reviews.llvm.org/D62670

llvm-svn: 363403
2019-06-14 14:31:13 +00:00
George Rimar 0aecabae14 Revert "Revert r363377: [yaml2obj] - Allow setting custom section types for implicit sections."
LLD test case will be fixed in a following commit.

Original commit message:

[yaml2obj] - Allow setting custom section types for implicit sections.

We were hardcoding the final section type for sections that
are usually implicit. The patch fixes that.

This also fixes a few issues in existent test cases and removes
one precompiled object.

Differential revision: https://reviews.llvm.org/D63267

llvm-svn: 363401
2019-06-14 14:25:34 +00:00
Rui Ueyama 9f4e21c69a Revert r363377: [yaml2obj] - Allow setting custom section types for implicit sections.
This reverts commit r363377 because lld's ELF/invalid/undefined-local-symbol-in-dso.test
test started failing after this commit.

llvm-svn: 363394
2019-06-14 13:57:25 +00:00
Sanjay Patel e5a78cd90f [x86] add test for original example in PR37428; NFC
The reduced case may avoid complications seen in this larger function.

llvm-svn: 363392
2019-06-14 13:44:01 +00:00
Matt Arsenault 74d67c2086 AMDGPU: Fix printing trailing whitespace after s_endpgm
llvm-svn: 363384
2019-06-14 13:26:29 +00:00
George Rimar 3b523c0a2e [yaml2obj] - Allow setting custom section types for implicit sections.
We were hardcoding the final section type for sections that
are usually implicit. The patch fixes that.

This also fixes a few issues in existent test cases and removes
one precompiled object.

Differential revision: https://reviews.llvm.org/D63267

llvm-svn: 363377
2019-06-14 12:16:59 +00:00
James Henderson f7cfabb45d [llvm-readobj] Don't abort printing of dynamic table if string reference is invalid
If dynamic table is missing, output "dynamic strtab not found'. If the index is
out of range, output "Invalid Offset<..>".

https://bugs.llvm.org/show_bug.cgi?id=40807

Reviewed by: jhenderson, grimar, MaskRay

Differential Revision: https://reviews.llvm.org/D63084

Patch by Yuanfang Chen.

llvm-svn: 363374
2019-06-14 12:02:01 +00:00
George Rimar d6df7ded6e [llvm-readobj] - Do not fail to dump the object which has wrong type of .shstrtab.
Imagine we have object that has .shstrtab with type != SHT_STRTAB.
In this case, we fail to dump the object, though GNU readelf dumps it without
any issues and warnings.

This patch fixes that. It adds a code to ELFDumper.cpp which is based on the implementation of getSectionName from the ELF.h:

https://github.com/llvm-mirror/llvm/blob/master/include/llvm/Object/ELF.h#L608
https://github.com/llvm-mirror/llvm/blob/master/include/llvm/Object/ELF.h#L431
https://github.com/llvm-mirror/llvm/blob/master/include/llvm/Object/ELF.h#L539

The difference is that all non critical errors are ommitted what allows us to
improve the dumping on a tool side. Also, this opens a road for a follow-up that
should allow us to dump the section headers, but drop the section names in case if .shstrtab is completely absent and/or broken.

Differential revision: https://reviews.llvm.org/D63266

llvm-svn: 363371
2019-06-14 11:56:10 +00:00
Sjoerd Meijer 3058a62b90 [ARM] MVE VPT Block Pass
Initial commit of a new pass to create vector predication blocks, called VPT
blocks, that are supported by the Armv8.1-M MVE architecture.

This is a first naive implementation. I.e., for 2 consecutive predicated
instructions I1 and I2, for example, it will generate 2 VPT blocks:

VPST
I1
VPST
I2

A more optimal implementation would obviously put instructions in the same VPT
block when they are predicated on the same condition and when it is allowed to
do this:

VPTT
I1
I2

We will address this optimisation with follow up patches when the groundwork is
in. Creating VPT Blocks is very similar to IT Blocks, which is the reason I
added this to Thumb2ITBlocks.cpp. This allows reuse of the def use analysis
that we need for the more optimal implementation.

VPT blocks cannot be nested in IT blocks, and vice versa, and so these 2 passes
cannot interact with each other. Instructions allowed in VPT blocks must
be MVE instructions that are marked as VPT compatible.

Differential Revision: https://reviews.llvm.org/D63247

llvm-svn: 363370
2019-06-14 11:46:05 +00:00
George Rimar 43f62ff17c [yaml2obj] - Allow setting the custom Address for .strtab
Despite the fact that .strtab is non-allocatable,
there is no reason to disallow setting the custom address
for it.

The patch also adds a test case showing we can set any address
we want for other implicit sections.

Differential revision: https://reviews.llvm.org/D63137

llvm-svn: 363368
2019-06-14 11:13:32 +00:00
George Rimar cfa1a62a4c [yaml2obj] - Allow setting cutom Flags for implicit sections.
With this patch we get ability to set any flags we want
for implicit sections defined in YAML.

Differential revision: https://reviews.llvm.org/D63136

llvm-svn: 363367
2019-06-14 11:01:14 +00:00
Sam Parker 0cf9639a9c [SCEV] Pass NoWrapFlags when expanding an AddExpr
InsertBinop now accepts NoWrapFlags, so pass them through when
expanding a simple add expression.

This is the first re-commit of the functional changes from rL362687,
which was previously reverted.

Differential Revision: https://reviews.llvm.org/D61934

llvm-svn: 363364
2019-06-14 09:19:41 +00:00
Eugene Leviant d46ebd207b [llvm-objcopy][IHEX] Improve test case formatting. NFC
Differential revision: https://reviews.llvm.org/D63258

llvm-svn: 363359
2019-06-14 08:09:10 +00:00
Alex Brachet d54d4f9905 [llvm-objcopy] Changed command line parsing errors
Summary: Tidied up errors during command line parsing to be more consistent with the rest of llvm-objcopy errors.

Reviewers: jhenderson, rupprecht, espindola, alexshap

Reviewed By: jhenderson, rupprecht

Subscribers: emaste, arichardson, MaskRay, llvm-commits, jakehehrlich

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62973

llvm-svn: 363350
2019-06-14 02:04:02 +00:00
David Blaikie 4129e3e0f8 DebugInfo: Include enumerators in pubnames
This is consistent with GCC's behavior (which is the defacto standard
for pubnames). Though I find the presence of enumerators from enum
classes to be a bit confusing, possibly a bug on GCC's end (since they
can't be named unqualified, unlike the other names - and names nested in
classes don't go in pubnames, for instance - presumably because one must
name the class first & that's enough to limit the scope of the search)

llvm-svn: 363349
2019-06-14 01:58:56 +00:00
Tim Shen 4121bdc3d4 [X86] Add target triple for live-debug-values-fragments.mir
llvm-svn: 363348
2019-06-14 01:41:04 +00:00
Douglas Yung 5b188f8dac Add REQUIRES: zlib to test added in r363325 as the profile uses zlib compression.
llvm-svn: 363347
2019-06-14 01:08:50 +00:00
Stanislav Mekhanoshin c43e67bfff [AMDGPU] gfx1011/gfx1012 targets
Differential Revision: https://reviews.llvm.org/D63307

llvm-svn: 363344
2019-06-14 00:33:31 +00:00
Stanislav Mekhanoshin 68a2fef9ae [AMDGPU] gfx1010 wave32 icmp/fcmp intrinsic changes for wave32
Differential Revision: https://reviews.llvm.org/D63301

llvm-svn: 363339
2019-06-13 23:47:36 +00:00
Seiya Nuta b1027a480a [llvm-objcopy] Fix sparc target endianness
Summary: AFAIK, the "sparc" target is big endian and the target for 32-bit little-endian SPARC is denoted as "sparcel". This patch fixes the endianness of "sparc" target and adds "sparcel" target for 32-bit little-endian SPARC.

Reviewers: espindola, alexshap, rupprecht, jhenderson

Reviewed By: jhenderson

Subscribers: jyknight, emaste, arichardson, fedor.sergeev, jakehehrlich, MaskRay, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63251

llvm-svn: 363336
2019-06-13 23:24:12 +00:00
Amy Huang 49275272e3 Use fully qualified name when printing S_CONSTANT records
Summary:
Before it was using the fully qualified name only for static data members.
Now it does for all variable names to match MSVC.

Reviewers: rnk

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63012

llvm-svn: 363335
2019-06-13 22:53:43 +00:00
Amara Emerson fb0a40f064 [GlobalISel][IRTranslator] Add debug loc with line 0 to constants emitted into the entry block.
Constants, including G_GLOBAL_VALUE, are all emitted into the entry block which
lets us use the vreg def assuming it dominates all other users. However, it can
cause jumpy debug behaviour since the DebugLoc attached to these MIs are from
a user instruction that could be in a different block.

Fixes PR40887.

Differential Revision: https://reviews.llvm.org/D63286

llvm-svn: 363331
2019-06-13 22:15:35 +00:00
Jinsong Ji 1c88445840 [MachinePiepliner] Don't check boundary node in checkValidNodeOrder
This was exposed by PowerPC target enablement.

In ScheduleDAG, if we haven't seen any uses in this scheduling region,
we will create a dependence edge to ExitSU to model the live-out latency.
This is required for vreg defs with no in-region use, and prefetches with
no vreg def.

When we build NodeOrder in Scheduler, we ignore these boundary nodes.
However, when we check Succs in checkValidNodeOrder, we did not skip
them, so we still assume all the nodes have been sorted and in order in
Indices array. So when we call lower_bound() for ExitSU, it will return
Indices.end(), causing memory issues in following Node access.

Differential Revision: https://reviews.llvm.org/D63282

llvm-svn: 363329
2019-06-13 21:51:12 +00:00
Vedant Kumar 901d04fc6d [Coverage] Load code coverage data from archives
Support loading code coverage data from regular archives, thin archives,
and from MachO universal binaries which contain archives.

Testing: check-llvm, check-profile (with {A,UB}San enabled)

rdar://51538999

Differential Revision: https://reviews.llvm.org/D63232

llvm-svn: 363325
2019-06-13 20:48:57 +00:00
Shawn Landden 24f4085811 [SimplifyCFG] NFC, update Switch tests as a baseline.
Also add baseline tests to show effect of later patches.

There were a couple of regressions here that were never caught,
but my patch set that this is a preparation to will fix them.

This is the third attempt to land this patch.

Differential Revision: https://reviews.llvm.org/D61150

llvm-svn: 363319
2019-06-13 19:36:38 +00:00
Cameron McInally 79ec1a2957 Revert "[NFC][CodeGen] Add unary fneg tests to fp-fast.ll fp-fold.ll fp-in-intregs.ll fp-stack-compare-cmov.ll fp-stack-compare.ll fsxor-alignment.ll"
This reverts commit 1d85a7518c.

llvm-svn: 363317
2019-06-13 19:25:16 +00:00
Cameron McInally 07514a1b16 Revert "[NFC][CodeGen] Add unary fneg tests to fmul-combines.ll fnabs.ll"
This reverts commit 5c01140581.

llvm-svn: 363316
2019-06-13 19:25:12 +00:00
Cameron McInally 8984dbc27c Revert "[NFC][CodeGen] Add unary fneg tests to X86/fma_patterns_wide.ll"
This reverts commit f1b8c6ac4f.

llvm-svn: 363315
2019-06-13 19:25:09 +00:00
Cameron McInally 5d9271802b Revert "[NFC][CodeGen] Add unary fneg tests to X86/fma_patterns.ll"
This reverts commit 06de52674d.

llvm-svn: 363314
2019-06-13 19:25:06 +00:00
Cameron McInally d331e71bdb Revert "[NFC][CodeGen] Add unary fneg tests to X86/fma4-fneg-combine.ll"
This reverts commit f288a0685f.

llvm-svn: 363313
2019-06-13 19:25:03 +00:00
Cameron McInally 31da4f80d5 Revert "[NFC][CodeGen] Add unary fneg tests to X86/fma-scalar-combine.ll"
This reverts commit 3d2ee0053a.

llvm-svn: 363312
2019-06-13 19:25:00 +00:00
Cameron McInally d3eaa332e4 Revert "[NFC][CodeGen] Add unary fneg tests to X86/fma-intrinsics-x86.ll"
This reverts commit 169fc2b020.

llvm-svn: 363311
2019-06-13 19:24:57 +00:00
Cameron McInally 2aff82bfa6 Revert "[NFC][CodeGen] Add unary fneg tests to X86/fma4-intrinsics-x86.ll"
This reverts commit 66f286845c.

llvm-svn: 363310
2019-06-13 19:24:54 +00:00
Cameron McInally 0a3fe05047 Revert "[NFC][CodeGen] Add unary FNeg tests to some X86/ and XCore/ tests."
This reverts commit 4f3cf3853e.

llvm-svn: 363309
2019-06-13 19:24:51 +00:00
Cameron McInally a0d06a626f Revert "[NFC][CodeGen] Add unary FNeg tests to X86/fma-intrinsics-canonical.ll"
This reverts commit ee5881a88c.

llvm-svn: 363308
2019-06-13 19:24:47 +00:00
Cameron McInally a37d925d3d Revert "[NFC][CodeGen] Forgot 2 unary FNeg tests in X86/fma-intrinsics-canonical.ll"
This reverts commit 5f39a3096f.

llvm-svn: 363307
2019-06-13 19:24:44 +00:00
Cameron McInally e00198f7a8 Revert "[NFC][CodeGen] Add unary fneg tests to X86/fma-fneg-combine.ll"
This reverts commit 10c0855542.

llvm-svn: 363306
2019-06-13 19:24:41 +00:00
Cameron McInally ea28a063fd Revert "[NFC][CodeGen] Add unary FNeg tests to X86/combine-fcopysign.ll X86/dag-fmf-cse.ll X86/fast-isel-fneg.ll X86/fdiv.ll"
This reverts commit e04c4b6af8.

llvm-svn: 363305
2019-06-13 19:24:38 +00:00
Cameron McInally 4890457196 Revert "[NFC][CodeGen] Add unary FNeg tests to X86/avx512vl-intrinsics-fast-isel.ll X86/combine-fabs.ll"
This reverts commit 6fe46ec25d.

llvm-svn: 363304
2019-06-13 19:24:34 +00:00
Cameron McInally 21a29a9e65 Revert "[NFC][CodeGen] Add unary FNeg tests to X86/avx512vl-intrinsics-fast-isel.ll"
This reverts commit 2aa5ada267.

llvm-svn: 363303
2019-06-13 19:24:31 +00:00
Cameron McInally 7d4e7efd2e Revert "[NFC][CodeGen] Add unary FNeg tests to X86/avx512vl-intrinsics-fast-isel.ll"
This reverts commit 27a5db9de5.

llvm-svn: 363302
2019-06-13 19:24:28 +00:00
Cameron McInally 8608afa964 Revert "[NFC][CodeGen] Add unary FNeg tests to X86/avx512-intrinsics-fast-isel.ll"
This reverts commit 41e0b9f280.

llvm-svn: 363301
2019-06-13 19:24:24 +00:00
Cameron McInally 675be5db46 Revert "[NFC][CodeGen] Add unary FNeg tests to X86/avx512-intrinsics-fast-isel.ll"
This reverts commit aeb89f8b33.

llvm-svn: 363300
2019-06-13 19:24:21 +00:00
Stanislav Mekhanoshin 335f9883f0 [AMDGPU] gfx1010: small test change for wave32. NFC
llvm-svn: 363297
2019-06-13 19:05:04 +00:00
Sanjay Patel 5bf7f81aa8 [InstCombine] add test for failed libfunction prototype matching; NFC
llvm-svn: 363291
2019-06-13 18:26:10 +00:00
Philip Reames eb88badff9 Fix a bug w/inbounds invalidation in LFTR
This contains fixes for two cases where we might invalidate inbounds and leave it stale in the IR (a miscompile). Case 1 is when switching to an IV with no dynamically live uses, and case 2 is when doing pre-to-post conversion on the same pointer type IV.

The basic scheme used is to prove that using the given IV (pre or post increment forms) would have to already trigger UB on the path to the test we're modifying.  As such, our potential UB triggering use does not change the semantics of the original program.

As was pointed out in the review thread by Nikita, this is defending against a separate issue from the hasConcreteDef case. This is about poison, that's about undef. Unfortunately, the two are different, see Nikita's comment for a fuller explanation, he explains it well.

(Note: I'm going to address Nikita's last style comment in a separate commit just to minimize chance of subtle bugs being introduced due to typos.)

Differential Revision: https://reviews.llvm.org/D62939

llvm-svn: 363289
2019-06-13 18:23:13 +00:00
Sanjay Patel 4d93fb528e [InstCombine] auto-generate complete test checks; NFC
llvm-svn: 363286
2019-06-13 18:14:49 +00:00
David Bolvansky a9d8388e80 [NFC] Updated testcase for D54411/rL363284
llvm-svn: 363285
2019-06-13 18:13:03 +00:00
David Bolvansky 896ece41e4 [Codegen] Merge tail blocks with no successors after block placement
Summary:
I found the following case having tail blocks with no successors merging opportunities after block placement.

Before block placement:

bb0:
    ...
    bne a0, 0, bb2:

bb1:
    mv a0, 1
    ret 

bb2:
    ...

bb3:
    mv a0, 1
    ret

bb4:
    mv a0, -1
    ret

The conditional branch bne in bb0 is opposite to beq.

After block placement:

bb0:
    ...
    beq a0, 0, bb1

bb2:
    ...

bb4:
    mv a0, -1
    ret

bb1:
    mv a0, 1
    ret

bb3:
    mv a0, 1
    ret

After block placement, that appears new tail merging opportunity, bb1 and bb3 can be merged as one block. So the conditional constraint for merging tail blocks with no successors should be removed. In my experiment for RISC-V, it decreases code size.


Author of original patch: Jim Lin

Reviewers: haicheng, aheejin, craig.topper, rnk, RKSimon, Jim, dmgreen

Reviewed By: Jim, dmgreen

Subscribers: xbolva00, dschuff, javed.absar, sbc100, jgravelle-google, aheejin, kito-cheng, dmgreen, PkmX, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D54411

llvm-svn: 363284
2019-06-13 18:11:32 +00:00
Stanislav Mekhanoshin 2bda177da0 [AMDGPU] ImmArg and SourceOfDivergence for permlane/dpp
Added missing ImmArg and SourceOfDivergence to the crosslane
intrinsics.

Differential Revision: https://reviews.llvm.org/D63216

llvm-svn: 363276
2019-06-13 16:31:51 +00:00
Cameron McInally aeb89f8b33 [NFC][CodeGen] Add unary FNeg tests to X86/avx512-intrinsics-fast-isel.ll
Patch 2 of n.

llvm-svn: 363275
2019-06-13 15:54:20 +00:00
Joseph Tremoulet 3bc6e2a7aa [EarlyCSE] Ensure equal keys have the same hash value
Summary:
The logic in EarlyCSE that looks through 'not' operations in the
predicate recognizes e.g. that `select (not (cmp sgt X, Y)), X, Y` is
equivalent to `select (cmp sgt X, Y), Y, X`.  Without this change,
however, only the latter is recognized as a form of `smin X, Y`, so the
two expressions receive different hash codes.  This leads to missed
optimization opportunities when the quadratic probing for the two hashes
doesn't happen to collide, and assertion failures when probing doesn't
collide on insertion but does collide on a subsequent table grow
operation.

This change inverts the order of some of the pattern matching, checking
first for the optional `not` and then for the min/max/abs patterns, so
that e.g. both expressions above are recognized as a form of `smin X, Y`.

It also adds an assertion to isEqual verifying that it implies equal
hash codes; this fires when there's a collision during insertion, not
just grow, and so will make it easier to notice if these functions fall
out of sync again.  A new flag --earlycse-debug-hash is added which can
be used when changing the hash function; it forces hash collisions so
that any pair of values inserted which compare as equal but hash
differently will be caught by the isEqual assertion.

Reviewers: spatel, nikic

Reviewed By: spatel, nikic

Subscribers: lebedev.ri, arsenm, craig.topper, efriedma, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62644

llvm-svn: 363274
2019-06-13 15:24:11 +00:00
Diogo N. Sampaio 0be2d25ecc [FIX] Forces shrink wrapping to consider any memory access as aliasing with the stack
Summary:
Relate bug: https://bugs.llvm.org/show_bug.cgi?id=37472

The shrink wrapping pass prematurally restores the stack, at a point where the stack might still be accessed.
Taking an exception can cause the stack to be corrupted.

As a first approach, this patch is overly conservative, assuming that any instruction that may load or store could access
the stack.

Reviewers: dmgreen, qcolombet

Reviewed By: qcolombet

Subscribers: simpal01, efriedma, eli.friedman, javed.absar, llvm-commits, eugenis, chill, carwil, thegameg

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63152

llvm-svn: 363265
2019-06-13 13:56:19 +00:00
Simon Tatham 286e1d2c2d [ARM] Set up infrastructure for MVE vector instructions.
This commit prepares the way to start adding the main collection of
MVE instructions, which operate on the 128-bit vector registers.

The most obvious thing that's needed, and the simplest, is to add the
MQPR register class, which is like the existing QPR except that it has
fewer registers in it.

The more complicated part: MVE defines a system of vector predication,
in which instructions operating on 128-bit vector registers can be
constrained to operate on only a subset of the lanes, using a system
of prefix instructions similar to the existing Thumb IT, in that you
have one prefix instruction which designates up to 4 following
instructions as subject to predication, and within that sequence, the
predicate can be inverted by means of T/E suffixes ('Then' / 'Else').

To support instructions of this type, we've added two new Tablegen
classes `vpred_n` and `vpred_r` for standard clusters of MC operands
to add to a predicated instruction. Both include a flag indicating how
the instruction is predicated at all (options are T, E and 'not
predicated'), and an input register field for the register controlling
the set of active lanes. They differ from each other in that `vpred_r`
also includes an input operand for the previous value of the output
register, for instructions that leave inactive lanes unchanged.
`vpred_n` lacks that extra operand; it will be used for instructions
that don't preserve inactive lanes in their output register (either
because inactive lanes are zeroed, as the MVE load instructions do, or
because the output register isn't a vector at all).

This commit also adds the family of prefix instructions themselves
(VPT / VPST), and all the machinery needed to work with them in
assembly and disassembly (e.g. generating the 't' and 'e' mnemonic
suffixes on disassembled instructions within a predicated block)

I've added a couple of demo instructions that derive from the new
Tablegen base classes and use those two operand clusters. The bulk of
the vector instructions will come in followup commits small enough to
be manageable. (One exception is that I've added the full version of
`isMnemonicVPTPredicable` in the AsmParser, because it seemed
pointless to carefully split it up.)

Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover

Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62669

llvm-svn: 363258
2019-06-13 13:11:13 +00:00
Jeremy Morse bf2b2f08b0 [DebugInfo] Honour variable fragments in LiveDebugValues
This patch makes the LiveDebugValues pass consider fragments when propagating
DBG_VALUE insts between blocks, fixing PR41979. Fragment info for a variable
location is added to the open-ranges key, which allows distinct fragments to be
tracked separately. To handle overlapping fragments things become slightly
funkier. To avoid excessive searching for overlaps in the data-flow part of
LiveDebugValues, this patch:
 * Pre-computes pairings of fragments that overlap, for each DILocalVariable
 * During data-flow, whenever something happens that causes an open range to
   be terminated (via erase), any fragments pre-determined to overlap are
   also terminated.

The effect of which is that when encountering a DBG_VALUE fragment that
overlaps others, the overlapped fragments do not get propagated to other
blocks. We still rely on later location-list building to correctly handle
overlapping fragments within blocks.

It's unclear whether a mixture of DBG_VALUEs with and without fragmented
expressions are legitimate. To avoid suprises, this patch interprets a
DBG_VALUE with no fragment as overlapping any DBG_VALUE _with_ a fragment.

Differential Revision: https://reviews.llvm.org/D62904

llvm-svn: 363256
2019-06-13 12:51:57 +00:00
Dmitry Preobrazhensky 1fca3b1972 [AMDGPU][MC] Enabled constant expressions as operands of s_getreg/s_setreg
See bug 40820: https://bugs.llvm.org/show_bug.cgi?id=40820

Reviewers: artem.tamazov, arsenm

Differential Revision: https://reviews.llvm.org/D61125

llvm-svn: 363255
2019-06-13 12:46:37 +00:00
Simon Pilgrim a284f4fa7c [X86][AVX] Add broadcast(v4f64 hadd) test
llvm-svn: 363252
2019-06-13 11:42:32 +00:00
Simon Pilgrim 0baf136a4d [X86][SSE] Avoid assert for broadcast(horiz-op()) cases for non-f64 cases.
Based on fuzz test from @craig.topper

llvm-svn: 363251
2019-06-13 11:26:21 +00:00
Simon Pilgrim a6b87aa7ee [X86][SSE] Add tests for underaligned nt stores
Test both 'unaligned' (which we should scalarize) and 'subvector aligned' (which we should split)

llvm-svn: 363249
2019-06-13 10:41:56 +00:00
Chris Jackson 7b39513302 [llvm-nm] Additional lit tests for command line options
Differential Revision: https://reviews.llvm.org/D62955

llvm-svn: 363248
2019-06-13 10:39:36 +00:00
Simon Pilgrim e1aea85896 [X86][SSE] Add SSE4A nt store tests on X86 as well as X64
We should be able to use MOVNTSD (f64) instead of MOVNTI (i32) to reduce the number of ops 32-bit targets

Pulled out of D63246

llvm-svn: 363247
2019-06-13 10:30:12 +00:00
Jeremy Morse 181bf0cefb [DebugInfo] Use FrameDestroy to extend stack locations to end-of-function
We aim to ignore changes in variable locations during the prologue and
epilogue of functions, to avoid using space documenting location changes
that aren't visible. However in D61940 / r362951 this got ripped out as
the previous implementation was unsound.

Instead, use the FrameDestroy flag to identify when we're in the epilogue
of a function, and ignore variable location changes accordingly. This fits
in with existing code that examines the FrameSetup flag.

Some variable locations get shuffled in modified tests as they now cover
greater ranges, which is what would be expected. Some additional
single-location variables are generated too. Two tests are un-xfailed,
they were only xfailed due to r362951 deleting functionality they depended
on.

Apparently some out-of-tree backends don't accurately maintain FrameDestroy
flags -- if you're an out-of-tree maintainer and see changes in variable
locations disappear due to a faulty FrameDestroy flag, it's safe to back
this change out. The impact is just slightly more debug info than necessary.

Differential Revision: https://reviews.llvm.org/D62314

llvm-svn: 363245
2019-06-13 10:03:17 +00:00
Eugene Leviant 86b7f865ac [llvm-objcopy] Implement IHEX reader
This is the final part of IHEX format support in llvm-objcopy
Differential revision: https://reviews.llvm.org/D62583

llvm-svn: 363243
2019-06-13 09:56:14 +00:00
Sander de Smalen 51c2fa0e2a Improve reduction intrinsics by overloading result value.
This patch uses the mechanism from D62995 to strengthen the
definitions of the reduction intrinsics by letting the scalar
result/accumulator type be overloaded from the vector element type.

For example:

  ; The LLVM LangRef specifies that the scalar result must equal the
  ; vector element type, but this is not checked/enforced by LLVM.
  declare i32 @llvm.experimental.vector.reduce.or.i32.v4i32(<4 x i32> %a)

This patch changes that into:

  declare i32 @llvm.experimental.vector.reduce.or.v4i32(<4 x i32> %a)

Which has the type-constraint more explicit and causes LLVM to check
the result type with the vector element type.

Reviewers: RKSimon, arsenm, rnk, greened, aemerson

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D62996

llvm-svn: 363240
2019-06-13 09:37:38 +00:00
Owen Reynolds 8d59f5370d Revert [llvm-ar][test] Add to MRI test coverage
This reverts 363232 due to mru-utf8.test buildbot test failure

Differential Revision: https://reviews.llvm.org/D63197

llvm-svn: 363239
2019-06-13 09:02:33 +00:00
Sam Parker 9d28473a35 [ARM][TTI] Scan for existing loop intrinsics
TTI should report that it's not profitable to generate a hardware loop
if it, or one of its child loops, has already been converted.

Differential Revision: https://reviews.llvm.org/D63212

llvm-svn: 363234
2019-06-13 08:28:46 +00:00
Owen Reynolds 02eac87ba3 [llvm-ar][test] Add to MRI test coverage
This change adds tests to cover existing MRI script functionality.

Differential Revision: https://reviews.llvm.org/D63197

llvm-svn: 363232
2019-06-13 07:45:12 +00:00
Craig Topper b1daec0eae [X86] Correct instruction operands in evex-to-vex-compress.mir to be closer to real instructions.
$noreg was being used way more than it should have. We also had
xmm registers in addressing modes.

Mostly found by hacking the machine verifier to do some stricter
checking that happened to work for this test, but not sure if
generally applicable for other tests or other targets.

llvm-svn: 363231
2019-06-13 07:11:02 +00:00
Shawn Landden 8b142bcc3f [SimplifyCFG] reverting preliminary Switch patches again
This reverts 363226 and 363227, both NFC intended

I swear I fixed the test case that is failing, and ran
the tests, but I will look into it again.

llvm-svn: 363229
2019-06-13 05:26:17 +00:00
Shawn Landden c54b2011bd [SimplifyCFG] NFC, update Switch tests to better examine successive patches
Also add baseline tests to show effect of later patches.

There were a couple of regressions here that were never caught,
but my patch set that this is a preparation to will fix them.

Differential Revision: https://reviews.llvm.org/D61150

llvm-svn: 363226
2019-06-13 04:51:35 +00:00
Craig Topper 387acd64f3 [X86] Add tests for some the special cases in EVEX to VEX to the evex-to-vex-compress.mir test.
llvm-svn: 363224
2019-06-13 04:10:08 +00:00
Shawn Landden c6cba2957d [SimplifyCFG] revert the last commit.
I ran ALL the test suite locally, so I will look into this...

llvm-svn: 363223
2019-06-13 02:47:47 +00:00
Shawn Landden f93b99b2b6 [SimplifyCFG] NFC, update Switch tests to HEAD so I can
see if my changes change anything

Also add baseline tests to show effect of later patches.

Differential Revision: https://reviews.llvm.org/D61150

llvm-svn: 363222
2019-06-13 02:24:24 +00:00
David L. Jones c73fadaa84 Revert r361811: 'Re-commit r357452 (take 2): "SimplifyCFG SinkCommonCodeFromPredecessors ...'
We have observed some failures with internal builds with this revision.

- Performance regressions:
  - llvm's SingleSource/Misc evalloop shows performance regressions (although these may be red herrings).
  - Benchmarks for Abseil's SwissTable.
- Correctness:
  - Failures for particular libicu tests when building the Google AppEngine SDK (for PHP).

hwennborg has already been notified, and is aware of reproducer failures.

llvm-svn: 363220
2019-06-13 02:04:45 +00:00
Dinar Temirbulatov b2f45ba1e8 [SLP] Update propagate_ir_flags.ll test to check that we do retain the common subset, NFC.
llvm-svn: 363218
2019-06-13 00:19:50 +00:00
Philip Reames 0bded8442f [Tests] Highlight impact of multiple exit LFTR (D62625) as requested by reviewer
llvm-svn: 363217
2019-06-12 23:39:49 +00:00
Cameron McInally 41e0b9f280 [NFC][CodeGen] Add unary FNeg tests to X86/avx512-intrinsics-fast-isel.ll
Patch 1 of n.

llvm-svn: 363215
2019-06-12 22:50:44 +00:00
Sanjay Patel a1421e8347 [x86] add tests for vector shifts; NFC
llvm-svn: 363203
2019-06-12 21:30:06 +00:00
Cameron McInally 27a5db9de5 [NFC][CodeGen] Add unary FNeg tests to X86/avx512vl-intrinsics-fast-isel.ll
Patch 3 of 3 for X86/avx512vl-intrinsics-fast-isel.ll

llvm-svn: 363200
2019-06-12 20:56:59 +00:00
Jordan Rupprecht 565f1e2298 [llvm-readobj] Fix output interleaving issue caused by using multiple streams at the same time.
Summary:
Use llvm::fouts() as the default stream for outputing. No new stream
should be constructed to output at the same time.

https://bugs.llvm.org/show_bug.cgi?id=42140

Reviewers: jhenderson, grimar, MaskRay, phosek, rupprecht

Reviewed By: rupprecht

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63115

Patch by Yuanfang Chen!

llvm-svn: 363198
2019-06-12 20:16:22 +00:00
Cameron McInally 2aa5ada267 [NFC][CodeGen] Add unary FNeg tests to X86/avx512vl-intrinsics-fast-isel.ll
Patch 2 of 3 for X86/avx512vl-intrinsics-fast-isel.ll

llvm-svn: 363194
2019-06-12 19:39:42 +00:00
Philip Reames 00e481b75d [Tests] Autogen RLEV test and add tests for a future enhancement
llvm-svn: 363193
2019-06-12 19:23:10 +00:00
Philip Reames 851adc000c [Tests] Add tests to highlight sibling loop optimization order issue for exit rewriting
The issue addressed in r363180 is more broadly relevant.  For the moment, we don't actually get any of these cases because we a) restrict SCEV formation due to SCEExpander needing to preserve LCSSA, and b) don't iterate between loops.

llvm-svn: 363192
2019-06-12 19:04:51 +00:00
Stanislav Mekhanoshin 000f9cc62a [AMDGPU] more gfx1010 tests. NFC.
llvm-svn: 363190
2019-06-12 18:44:11 +00:00
Jordan Rupprecht 146a154e61 [llvm-ar][test] Relax lit directory assumptions in thin-archive.test
Summary: thin-archive.test assumes the Output/<testname> structure that lit creates. Rewrite the test in a way that still tests the same thing (creating via relative path and adding via absolute path) but doesn't assume this specific lit structure, making it possible to run in a lit emulator.

Reviewers: gbreynoo

Reviewed By: gbreynoo

Subscribers: llvm-commits, bkramer

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62930

llvm-svn: 363189
2019-06-12 18:41:27 +00:00
Stanislav Mekhanoshin 245b5ba344 [AMDGPU] gfx1010 dpp16 and dpp8
Differential Revision: https://reviews.llvm.org/D63203

llvm-svn: 363186
2019-06-12 18:02:41 +00:00
Stanislav Mekhanoshin 5f581c9f08 [AMDGPU] gfx1010 premlane instructions
Differential Revision: https://reviews.llvm.org/D63202

llvm-svn: 363185
2019-06-12 17:52:51 +00:00
Simon Atanasyan efc0d1a298 [Mips] Add s.d instruction alias for Mips1
Add support for s.d instruction for Mips1 which expands into two swc1
instructions.

Patch by Mirko Brkusanin.

Differential Revision: https://reviews.llvm.org/D63199

llvm-svn: 363184
2019-06-12 17:52:05 +00:00
Simon Pilgrim ef7d4fbe80 [X86][SSE] Avoid unnecessary stack codegen in NT merge-consecutive-stores codegen tests.
llvm-svn: 363181
2019-06-12 17:28:48 +00:00
Philip Reames e51c3d8b82 [SCEV] Teach computeSCEVAtScope benefit from one-input Phi. PR39673
SCEV does not propagate arguments through one-input Phis so as to make it easy for the SCEV expander (and related code) to preserve LCSSA.  It's not entirely clear this restriction is neccessary, but for the moment it exists.   For this reason, we don't analyze single-entry phi inputs.  However it is possible that when an this input leaves the loop through LCSSA Phi, it is a provable constant.  Missing that results in an order of optimization issue in loop exit value rewriting where we miss some oppurtunities based on order in which we visit sibling loops.

This patch teaches computeSCEVAtScope about this case. We can generalize it later, but so far we can only replace LCSSA Phis with their constant loop-exiting values.  We should probably also add similiar logic directly in the SCEV construction path itself.

Patch by: mkazantsev (with revised commit message by me)
Differential Revision: https://reviews.llvm.org/D58113

llvm-svn: 363180
2019-06-12 17:21:47 +00:00
Simon Pilgrim 5b0e0dd709 [X86][AVX] Fold concat(vpermilps(x,c),vpermilps(y,c)) -> vpermilps(concat(x,y),c)
Handles PSHUFD/PSHUFLW/PSHUFHW (AVX2) + VPERMILPS (AVX1).

An extra AVX1 PSHUFD->VPERMILPS combine will be added in a future commit.

llvm-svn: 363178
2019-06-12 16:38:20 +00:00
Sanjay Patel 64006896ac [InstCombine] add tests for fmin/fmax libcalls; NFC
llvm-svn: 363175
2019-06-12 15:29:40 +00:00
Sam Parker 3d42959dd8 Revert rL363156.
The patch was to fix buildbots, but rL363157 should now be fixing it
in a cleaner way.

llvm-svn: 363174
2019-06-12 15:28:00 +00:00
David Bolvansky 48365ec3e1 [NFC[ Updated tests for D54411
llvm-svn: 363173
2019-06-12 15:01:36 +00:00
Matt Arsenault f29366b1f5 StackProtector: Use PointerMayBeCaptured
This was using its own, outdated list of possible captures. This was
at minimum not catching cmpxchg and addrspacecast captures.

One change is now any volatile access is treated as capturing. The
test coverage for this pass is quite inadequate, but this required
removing volatile in the lifetime capture test.

Also fixes some infrastructure issues to allow running just the IR
pass.

Fixes bug 42238.

llvm-svn: 363169
2019-06-12 14:23:33 +00:00