20841 Commits

Author SHA1 Message Date
Alfred Huang 5b27072f57 [AMDGPU] Do not insert an instruction into worklist twice in movetovalu
moveToVALU() moves an instruction to the vector ALU and then visits
all instructions in its use chain. We do not want the same node to be
pushed to the visit worklist more than once.
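
The idea of never pushing a node twice can be sketched as a worklist paired with a set of already-enqueued nodes. This is only an illustration in Python, not the code from the patch; the function name `visit_use_chain`, the `users` mapping, and the node names are all hypothetical.

```python
from collections import deque


def visit_use_chain(start, users):
    """Visit `start` and every node reachable through `users`, each exactly once."""
    worklist = deque([start])
    enqueued = {start}  # every node that has ever been pushed
    order = []
    while worklist:
        node = worklist.popleft()
        order.append(node)
        for user in users.get(node, ()):
            if user not in enqueued:  # skip nodes that were already pushed
                enqueued.add(user)
                worklist.append(user)
    return order


# Hypothetical use chain: "c" is a user of both "a" and "b".
users = {"s_add": ["a", "b"], "a": ["c"], "b": ["c"]}
order = visit_use_chain("s_add", users)
```

Without the `enqueued` check, "c" would be pushed once per user and processed twice.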

Differential Revision: https://reviews.llvm.org/D34726

llvm-svn: 308039
2017-07-14 17:56:55 +00:00
Krzysztof Parzyszek 9c084fc55d [Hexagon] Add intrinsics for data cache operations
This is the LLVM part, adding definitions for
  void @llvm.hexagon.Y2.dccleana(i8*)
  void @llvm.hexagon.Y2.dccleaninva(i8*)
  void @llvm.hexagon.Y2.dcinva(i8*)
  void @llvm.hexagon.Y2.dczeroa(i8*)
  void @llvm.hexagon.Y4.l2fetch(i8*, i32)
  void @llvm.hexagon.Y5.l2fetch(i8*, i64)
The clang part will follow.

llvm-svn: 308032
2017-07-14 15:58:48 +00:00
Nirav Dave a8f63af9d1 Improve Aliasing of operations to static alloca
Recommitting after adding check to avoid miscomputing alias information
on addresses of the same base but different subindices.

Memory accesses offset from frame indices may alias, e.g., we
may merge writes from function arguments passed on the stack when they
are contiguous. As a result, when checking aliasing, we consider the
underlying frame index's offset from the stack pointer.

Static allocas are realized as stack objects in SelectionDAG, but their
offsets are not set until post-DAG, causing DAGCombiner's alias check to
consider accesses to static allocas to frequently alias. Modify isAlias
to consider access between static allocas and access from other frame
objects to be aliasing.

Many test changes are included here. Most are fixes for tests which
indirectly relied on our aliasing ability and needed to be modified to
preserve their original intent.

The remaining tests have minor improvements due to relaxed
ordering. The exception is CodeGen/X86/2011-10-19-widen_vselect.ll
which has a minor degradation even though the pre-legalized DAG is
improved.

Reviewers: rnk, mkuper, jonpa, hfinkel, uweigand

Reviewed By: rnk

Subscribers: sdardis, nemanjai, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D33345

llvm-svn: 308025
2017-07-14 13:56:21 +00:00
Zoran Jovanovic 0e03935182 Reverting commit 308011.
llvm-svn: 308017
2017-07-14 10:52:22 +00:00
Zoran Jovanovic d374c5993b [mips][microMIPS] Extending size reduction pass with ADDIUSP and ADDIUR1SP
Author: milena.vujosevic.janicic
Reviewers: sdardis
The patch extends size reduction pass for MicroMIPS.
The following instructions are examined and transformed, if possible:
ADDIU instruction is transformed into 16-bit instruction ADDIUSP
ADDIU instruction is transformed into 16-bit instruction ADDIUR1SP
Function InRange is changed to avoid left shifting of negative values, since 
that caused some sanitizer tests to fail (so the previous patch 
Differential Revision: https://reviews.llvm.org/D34511

llvm-svn: 308011
2017-07-14 10:13:11 +00:00
Diana Picus 87a7067983 [ARM] GlobalISel: Support G_BRCOND
Insert a TSTri to set the flags and a Bcc to branch based on their
values. This is a bit inefficient in the (common) cases where the
condition for the branch comes from a compare right before the branch,
since we set the flags both as part of the compare lowering and as part
of the branch lowering. We're going to live with that until we settle on
a principled way to handle this kind of situation, which occurs with
other patterns as well (combines might be the way forward here).

llvm-svn: 308009
2017-07-14 09:46:06 +00:00
Sam Parker 2893448576 [ARM] Allow rematerialization of ARM Thumb literal pool loads
Constants are crucial for code size in the ARM Thumb-1 instruction
set. The 16 bit instruction size often does not offer enough space
for immediate arguments. This means that additional instructions are
frequently used to load constants into registers. Since constants are
hoisted, this can lead to significant register spillage if they are
used multiple times in a single function. This can be avoided by
rematerialization, i.e. recomputing a constant instead of reloading
it from the stack. This patch fixes the rematerialization of literal
pool loads in the ARM Thumb instruction set.

Patch by Philip Ginsbach

Differential Revision: https://reviews.llvm.org/D33936

llvm-svn: 308004
2017-07-14 08:23:56 +00:00
Matt Arsenault 23e4df6a59 AMDGPU: Detect kernarg segment pointer
This is necessary to pass the kernarg segment pointer
to callee functions. Also don't unconditionally enable
for kernels.

llvm-svn: 307978
2017-07-14 00:11:13 +00:00
Stanislav Mekhanoshin dc2890a887 [AMDGPU] fcanonicalize optimization for GFX9+
Since GFX9 supports denorm modes for v_min_f32/v_max_f32, it
is possible to further optimize fcanonicalize and remove it
if applied to min/max given their operands are known not to be
an sNaN or that sNaNs are not supported.

Additionally we can remove fcanonicalize if denorms are supported
for the VT and we know that its argument is never a NaN.

Differential Revision: https://reviews.llvm.org/D35335

llvm-svn: 307976
2017-07-13 23:59:15 +00:00
Matt Arsenault 6b93046f29 AMDGPU: Annotate call graph with used features
Previously this wouldn't detect features used indirectly
in callee functions.

llvm-svn: 307967
2017-07-13 21:43:42 +00:00
Andrew Zhogin af3d5fe83b [X86][tests] Added rotate_vec.ll CodeGen test. NFC precommit for bug 33691 fix.
llvm-svn: 307937
2017-07-13 18:57:40 +00:00
Nemanja Ivanovic 3c7e276d24 [PowerPC] Ensure displacements for DQ-Form instructions are multiples of 16
As outlined in the PR, we didn't ensure that displacements for DQ-Form
instructions are multiples of 16. Since the instruction encoding encodes
a quad-word displacement, a sub-16 byte displacement is meaningless and
ends up being encoded incorrectly.
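
A minimal sketch of the check described above, in Python. It assumes the DQ field is 12 bits wide and holds the byte displacement divided by 16, so the displacement must also fit in a signed 16-bit range; the function name `encode_dq_displacement` is hypothetical and does not come from the patch.

```python
def encode_dq_displacement(disp):
    """Return the assumed 12-bit DQ field for a byte displacement.

    Raises ValueError if the displacement cannot be represented.
    """
    if disp % 16 != 0:
        # A quad-word displacement cannot express sub-16-byte offsets.
        raise ValueError(f"displacement {disp} is not a multiple of 16")
    if not -(1 << 15) <= disp < (1 << 15):
        raise ValueError(f"displacement {disp} does not fit in 16 signed bits")
    return (disp >> 4) & 0xFFF  # drop the four implied zero bits


field_for_32 = encode_dq_displacement(32)
field_for_minus_16 = encode_dq_displacement(-16)
```

Rejecting the value up front is one possible design; a real backend may instead pick a different instruction form when the offset is not a multiple of 16.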

Fixes https://bugs.llvm.org/show_bug.cgi?id=33671.

Differential Revision: https://reviews.llvm.org/D35007

llvm-svn: 307934
2017-07-13 18:17:10 +00:00
Martin Storsjo 68266faa31 [AArch64] Implement support for windows style vararg functions
Pass parameters properly in calls to such functions (pass all
floats in integer registers), and handle va_start properly (allocate
stack immediately below the arguments on the stack, to save the
register arguments into a single continuous array).

Differential Revision: https://reviews.llvm.org/D35006

llvm-svn: 307928
2017-07-13 17:03:12 +00:00
Matthew Simpson 06e6a6bdff [AArch64] Add preliminary support for ARMv8.1 SUB/AND atomics
This patch is a follow-up to r305893 and adds preliminary support for the
fetch_sub and fetch_and operations.

llvm-svn: 307913
2017-07-13 15:01:23 +00:00
Simon Dardis 250256f9c9 Reland "[mips] Fix multiprecision arithmetic."
For multiprecision arithmetic on MIPS, rather than using ISD::ADDE / ISD::ADDC,
get SelectionDAG to break down the operation into ISD::ADDs and ISD::SETCCs.

For MIPS, only the DSP ASE has a carry flag, so in the general case it is not
useful to directly support ISD::{ADDE, ADDC, SUBE, SUBC} nodes.

Also improve the generation code in such cases for targets with
TargetLoweringBase::ZeroOrOneBooleanContent by directly using the result of the
comparison node rather than using it in selects. Similarly for ISD::SUBE /
ISD::SUBC.
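
As a rough illustration of the lowering described above, the Python sketch below adds two 64-bit values held as 32-bit halves using only plain adds and an unsigned comparison. The carry is the 0/1 result of the comparison, added directly instead of being fed through a select. The function name and the 32-bit register width are assumptions made for the example.

```python
MASK32 = 0xFFFFFFFF


def add64_via_32bit_parts(a_lo, a_hi, b_lo, b_hi):
    """Add two 64-bit values given as (lo, hi) 32-bit halves without a carry flag."""
    lo = (a_lo + b_lo) & MASK32          # plain add on the low halves
    carry = int(lo < a_lo)               # unsigned less-than yields 0 or 1
    hi = (a_hi + b_hi + carry) & MASK32  # comparison result used directly
    return lo, hi


# 0x1_FFFFFFFF + 0x0_00000001 = 0x2_00000000
result = add64_via_32bit_parts(0xFFFFFFFF, 0x1, 0x1, 0x0)
```

The low sum wraps around exactly when a carry occurs, which is why comparing it against one of the operands recovers the carry bit.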

Address optimization breakage by moving the generation of MIPS specific integer
multiply-accumulate nodes to before legalization.

This resolves PR32713 and PR33424.

Thanks to Simonas Kazlauskas and Pirama Arumuga Nainar for reporting the issue!

Reviewers: slthakur

Differential Revision: https://reviews.llvm.org/D33494

The previous version of this patch was too aggressive in producing fused
integer multiply-addition instructions.

llvm-svn: 307906
2017-07-13 11:28:05 +00:00
Diana Picus c452175642 [ARM] GlobalISel: Support G_BR
This boils down to not crashing in reg bank select due to the lack of
register operands on this instruction, and adding some tests. The
instruction selection is already covered by the TableGen'erated code.

llvm-svn: 307904
2017-07-13 11:09:34 +00:00
Simon Pilgrim bb85cb16e3 [DAGCombiner] Fix issue with rotate combines asserting if the constant value types differ from the result type.
llvm-svn: 307900
2017-07-13 10:41:49 +00:00
Dylan McKay 9fb04071a2 [AVR] Fix indirect calls to function pointers
Patch by Carl Peto.

llvm-svn: 307888
2017-07-13 08:09:36 +00:00
Geoff Berry 6748abe24d [MIR] Add support for printing and parsing target MMO flags
Summary: Add target hooks for printing and parsing target MMO flags.
Targets may override getSerializableMachineMemOperandTargetFlags() to
return a mapping from string to flag value for target MMO values that
should be serialized/parsed in MIR output.

Add implementation of this hook for AArch64 SuppressPair MMO flag.

Reviewers: bogner, hfinkel, qcolombet, MatzeB

Subscribers: mcrosier, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D34962

llvm-svn: 307877
2017-07-13 02:28:54 +00:00
Matt Arsenault ce34ac588e AMDGPU: Fix converting unanalyzable global loads to SMRD
Not all memory dependence queries succeed, so this needs to
be conservative if it fails.

llvm-svn: 307861
2017-07-12 23:06:18 +00:00
Sanjay Patel ac29895173 [x86] add select-of-constant tests; NFC
We're using cmov in these cases, but we could reduce to simpler ops.

llvm-svn: 307859
2017-07-12 22:42:39 +00:00
Daniel Neilson 965613ef1b Add element atomic memset intrinsic
Summary: Continuing the work from https://reviews.llvm.org/D33240, this change introduces an element unordered-atomic memset intrinsic. This intrinsic is essentially memset with the implementation requirement that all stores used for the assignment are done with unordered-atomic stores of a given element size.

Reviewers: eli.friedman, reames, mkazantsev, skatkov

Reviewed By: reames

Subscribers: jfb, dschuff, sbc100, jgravelle-google, aheejin, efriedma, llvm-commits

Differential Revision: https://reviews.llvm.org/D34885

llvm-svn: 307854
2017-07-12 21:57:23 +00:00
Stanislav Mekhanoshin 5680b0ca9f [AMDGPU] fcanonicalize elimination optimization
We are using multiplication by 1.0 to flush denormals and quiet sNaNs.
It is possible to omit this multiplication if the source of the
fcanonicalize instruction is known to be flushed/quieted, i.e.
if it comes from another instruction known to do the normalization
and we are using IEEE mode to quiet sNaNs.

Differential Revision: https://reviews.llvm.org/D35218

llvm-svn: 307848
2017-07-12 21:20:28 +00:00
Sanjay Patel 4450e73b5e [x86] improve SBB optimizations for SETB/SETA with subtract
This is another step towards removing a combine that turns sext
into select of constants and preparing the backend for an IR
future where select is the canonical form.

Earlier commits in this area:
https://reviews.llvm.org/rL306040
https://reviews.llvm.org/rL306072
https://reviews.llvm.org/rL307404 (https://reviews.llvm.org/D34652)
https://reviews.llvm.org/rL307471

llvm-svn: 307821
2017-07-12 17:56:46 +00:00
Sanjay Patel 6d6c06879c [x86] add tests for improving sbb transforms; NFC
We're subtracting X from X the hard way...

llvm-svn: 307819
2017-07-12 17:44:50 +00:00
Justin Bogner 4fc696635d GlobalISel: Handle selection of G_IMPLICIT_DEF in AArch64
A generic variant of IMPLICIT_DEF was added in r306875, but this
survives to selection and hits a `Cannot Select`. Add handling that
converts the node to a regular IMPLICIT_DEF.

llvm-svn: 307817
2017-07-12 17:32:32 +00:00
Evandro Menezes 14ba3d7730 [CodeGen] Add dependency printer
Add SDep printer to make debugging sessions more productive.

Differential revision: https://reviews.llvm.org/D35144

llvm-svn: 307799
2017-07-12 15:30:59 +00:00
Davide Italiano a63981aaa9 [X86/FastIsel] Fall-back to SelectionDAG when lowering soft-floats.
FastIsel can't handle them, so we would end up crashing during
register class selection.
Fixes PR26522.

Differential Revision:  https://reviews.llvm.org/D35272

llvm-svn: 307797
2017-07-12 15:26:06 +00:00
Daniel Neilson 57226ef33c Add element atomic memmove intrinsic
Summary: Continuing the work from https://reviews.llvm.org/D33240, this change introduces an element unordered-atomic memmove intrinsic. This intrinsic is essentially memmove with the implementation requirement that all loads/stores used for the copy are done with unordered-atomic loads/stores of a given element size.

Reviewers: eli.friedman, reames, mkazantsev, skatkov

Reviewed By: reames

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D34884

llvm-svn: 307796
2017-07-12 15:25:26 +00:00
Simon Pilgrim 8dfbc772d7 [X86][SSE] Fix file check prefix warning breaking buildbots
llvm-svn: 307790
2017-07-12 13:41:13 +00:00
Kamil Rytarowski cce21c1dfe Make shell redirection construct portable
Summary:
NetBSD shell sh(1) does not support ">& /dev/null" construct.
This is a bashism. The portable and POSIX solution is to use:
"> /dev/null 2>&1".

This change fixes 22 Unexpected Failures on NetBSD/amd64
for the "check-llvm" target.

Sponsored by <The NetBSD Foundation>

Reviewers: joerg, dim, rnk

Reviewed By: joerg, rnk

Subscribers: rnk, davide, llvm-commits

Differential Revision: https://reviews.llvm.org/D35277

llvm-svn: 307789
2017-07-12 13:24:46 +00:00
John Brawn 97cc283117 [ARM] Adjust ifcvt heuristic for the diamond ifcvt case
When we have a diamond ifcvt the fallthrough block will have a branch at the end
of it that disappears when predicated, so discount it from the predication cost.

Differential Revision: https://reviews.llvm.org/D34952

llvm-svn: 307788
2017-07-12 13:23:10 +00:00
Simon Pilgrim ebbb969d21 [X86][SSE] Add 512-bit (iX bitcast(vXi1)) test cases
Improves test coverage for pre-AVX512 targets as well

llvm-svn: 307783
2017-07-12 12:44:10 +00:00
Diana Picus 21014df5e0 [ARM] GlobalISel: Select s64 G_FCMP
Very similar to how we select s32 G_FCMP, the only thing that is
different is the exact opcodes that we use.

llvm-svn: 307763
2017-07-12 09:01:54 +00:00
Michael Zuckerman fce5c67920 [X86][LLVM]Expanding Supports lowerInterleavedStore() in X86InterleavedAccess.
Adding base test for AVX512 

llvm-svn: 307761
2017-07-12 08:01:44 +00:00
Matthias Braun 053b084263 Specify complete target triple in test
This should fix the problems on the greendragon build.

llvm-svn: 307747
2017-07-12 01:16:50 +00:00
Konstantin Zhuravlyov bb80d3e1d3 Enhance synchscope representation
OpenCL 2.0 introduces the notion of memory scopes in atomic operations to
  global and local memory. These scopes restrict how synchronization is
  achieved, which can result in improved performance.

  This change extends existing notion of synchronization scopes in LLVM to
  support arbitrary scopes expressed as target-specific strings, in addition to
  the already defined scopes (single thread, system).

  The LLVM IR and MIR syntax for expressing synchronization scopes has changed
  to use *syncscope("<scope>")*, where <scope> can be "singlethread" (this
  replaces *singlethread* keyword), or a target-specific name. As before, if
  the scope is not specified, it defaults to CrossThread/System scope.

  Implementation details:
    - Mapping from synchronization scope name/string to synchronization scope id
      is stored in LLVM context;
    - CrossThread/System and SingleThread scopes are pre-defined to efficiently
      check for known scopes without comparing strings;
    - Synchronization scope names are stored in SYNC_SCOPE_NAMES_BLOCK in
      the bitcode.
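
The name-to-id mapping with pre-defined scopes could look roughly like the Python sketch below. The class name, the concrete id values, the use of an empty string for the default system scope, and the "agent" scope name are assumptions chosen for illustration, not details taken from this change.

```python
class SyncScopeRegistry:
    """Interns synchronization scope names as small integer ids."""

    SINGLE_THREAD = 0  # assumed id for the pre-defined single-thread scope
    SYSTEM = 1         # assumed id for the pre-defined cross-thread/system scope

    def __init__(self):
        # Pre-defined scopes get fixed ids so callers can compare ids,
        # not strings, when checking for the known scopes.
        self._ids = {"singlethread": self.SINGLE_THREAD, "": self.SYSTEM}

    def get_or_insert(self, name):
        """Return the id for `name`, allocating the next free id if it is new."""
        return self._ids.setdefault(name, len(self._ids))


registry = SyncScopeRegistry()
agent_id = registry.get_or_insert("agent")        # hypothetical target scope
same_agent_id = registry.get_or_insert("agent")   # same name, same id
```

Keeping the registry in one shared context object means every user of a scope name sees the same id.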

Differential Revision: https://reviews.llvm.org/D21723

llvm-svn: 307722
2017-07-11 22:23:00 +00:00
Sanjay Patel 7c026cb1af [x86] auto-generate full checks; NFC
llvm-svn: 307718
2017-07-11 22:04:36 +00:00
Michael Zuckerman 1fe5628aa0 reverting 307677.
llvm-svn: 307698
2017-07-11 19:46:11 +00:00
Tony Jiang 892f8c42dc [PPC] Fix one test case regression for patch https://reviews.llvm.org/D34337.
llvm-svn: 307691
2017-07-11 19:07:10 +00:00
Michael Zuckerman 4b6d01a008 [X86][LLVM]Expanding Supports lowerInterleavedStore() in X86InterleavedAccess.
Base test for AVX512.
Adding a new base test to trunk before committing the change to the test.

llvm-svn: 307677
2017-07-11 17:17:49 +00:00
Krzysztof Parzyszek f67cd8259d [Hexagon] Do not rely on callee-saved info in hasFP
llvm-svn: 307675
2017-07-11 17:11:54 +00:00
Tony Jiang d5acad053b [PPC] Fix two bugs in frame lowering.
1. The available program storage region of the red zone to compilers is 288
 bytes rather than 244 bytes.
2. The formula for negative number alignment calculation should be
y = x & ~(n-1) rather than y = (x + (n-1)) & ~(n-1).
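
The difference between the two formulas shows up as soon as x is negative. The short Python sketch below evaluates both for a power-of-two n; the helper names `align_down` and `align_up` are hypothetical and the sample offset is made up.

```python
def align_down(x, n):
    """y = x & ~(n-1): round x toward negative infinity to a multiple of n."""
    return x & ~(n - 1)


def align_up(x, n):
    """y = (x + (n-1)) & ~(n-1): round x toward positive infinity."""
    return (x + (n - 1)) & ~(n - 1)


offset = -20  # a negative frame offset, in bytes
down = align_down(offset, 16)
up = align_up(offset, 16)
```

For an offset of -20 the first formula gives -32 and the second gives -16, so only the first one keeps the aligned address at or below the original negative offset.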

Differential Revision: https://reviews.llvm.org/D34337

llvm-svn: 307672
2017-07-11 16:42:20 +00:00
Krzysztof Parzyszek c86e2ef3f5 [Hexagon] Add support for nontemporal loads and stores on HVX
Patch by Michael Wu.

Differential Revision: https://reviews.llvm.org/D35104

llvm-svn: 307671
2017-07-11 16:39:33 +00:00
Diana Picus 1e33c9c166 [ARM] GlobalISel: Tighten G_FCMP selection test. NFC
Use CHECK-NEXT for the comparison sequence, to make sure we don't get
any unexpected instructions in the middle of our flag manipulation
efforts.

llvm-svn: 307656
2017-07-11 12:34:33 +00:00
Guy Blank 509d1b2a5a [X86][AVX512] regenerate avx512-insert-extract.ll
llvm-svn: 307654
2017-07-11 11:51:49 +00:00
Diana Picus 069da27f49 [ARM] GlobalISel: Add reg mapping for s64 G_FCMP
Map the result into GPR and the operands into FPR.

llvm-svn: 307653
2017-07-11 11:47:45 +00:00
Diana Picus 84baba20db [ARM] GlobalISel: Tighten legalizer tests. NFC
Make sure that all the legalizer tests where the original instruction
needs to be removed check for the removal. We do this by adding
CHECK-NOT lines before and after the replacement sequence. This won't
catch pathological cases where the instruction remains somewhere in the
middle of the instruction sequence that's supposed to replace it, but
hopefully that won't occur in practice (since ideally we'd be setting
the insert point for the new instruction sequence either before or after
the original instruction and not fiddle with it while building the
sequence).

llvm-svn: 307647
2017-07-11 10:52:08 +00:00
Diana Picus 443135c6eb [ARM] GlobalISel: Fix oversight in G_FCMP legalization
We used to forget to erase the original instruction when replacing a
G_FCMP true/false. Fix this bug and make sure the tests check for it.

llvm-svn: 307639
2017-07-11 09:43:51 +00:00
Daniel Sanders fe12c0fa56 [globalisel][tablegen] Correct matching of intrinsic ID's.
TreePatternNode considers them to be plain integers but MachineInstr considers
them to be a distinct kind of operand.

The tweak to AArch64InstrInfo.td to produce a simple test case is a NFC for
everything except GlobalISelEmitter (confirmed by diffing the tablegenerated
files). GlobalISelEmitter is currently unable to infer the type of operands in
the Dst pattern from the operands in the Src pattern.

llvm-svn: 307634
2017-07-11 08:57:29 +00:00