Commit Graph

20841 Commits

Author SHA1 Message Date
Sanjay Patel 27cccc96c2 [x86] auto-generate complete checks for tests; NFC
These all used 'CHECK-NOT' which isn't necessary if we have complete checks.

llvm-svn: 306981
2017-07-02 14:50:35 +00:00
Simon Pilgrim 8971b2904e [X86][SSE] Attempt to combine 64-bit and 32-bit shuffles to unary shuffles before bit shifts
We are combining shuffles to bit shifts before unary permutes, which means we can't fold loads plus the destination register is destructive

llvm-svn: 306978
2017-07-02 14:16:25 +00:00
Simon Pilgrim 4cb5613c38 [X86][SSE] Attempt to combine 64-bit and 16-bit shuffles to unary shuffles before bit shifts
We are combining shuffles to bit shifts before unary permutes, which means we can't fold loads plus the destination register is destructive

The 32-bit shuffles are a bit tricky and will be dealt with in a later patch

llvm-svn: 306977
2017-07-02 13:19:10 +00:00
Simon Pilgrim 638af5f1c4 [X86][SSE] Add test showing missed opportunity to combine to pshuflw
We are combining shuffles to bit shifts before unary permutes, which means we can't fold loads plus the destination register is destructive

llvm-svn: 306976
2017-07-02 12:56:10 +00:00
Gadi Haber dc25c2b08b [X86] Rerun "update_llc_test_checks" tool on CodeGen tests. NFC.
This is NFC after rerunning the "update_llc_test_checks.py" tool on the CodeGen X86 tests in order to submit a patch.
Minor differences due to added "End of Function" lines.

Reviewers: zvi
Differential Revision: https://reviews.llvm.org/D34933

llvm-svn: 306973
2017-07-02 12:01:33 +00:00
Igor Breger 717bd36c83 [GlobalISel][X86] Support G_GLOBAL_VALUE operation.
Summary: Support G_GLOBAL_VALUE operation. For now most of the PIC configurations not implemented yet.

Reviewers: zvi, guyblank

Reviewed By: guyblank

Subscribers: rovka, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D34738

Conflicts:
	test/CodeGen/X86/GlobalISel/regbankselect-X86_64.mir

llvm-svn: 306972
2017-07-02 08:58:29 +00:00
Igor Breger b186a69aa5 [GlobalISel][X86] Support vector type G_UNMERGE_VALUES selection.
Summary:
Support vector type G_UNMERGE_VALUES selection.
For now G_UNMERGE_VALUES marked as legal for any type, so nothing to do in legalizer.

Reviewers: t.p.northover, qcolombet, zvi, guyblank

Reviewed By: guyblank

Subscribers: rovka, kristof.beyls, guyblank, llvm-commits

Differential Revision: https://reviews.llvm.org/D33665

llvm-svn: 306971
2017-07-02 08:15:49 +00:00
Hiroshi Inoue bb703e8960 fix trivial typos; NFC
suport -> support

llvm-svn: 306968
2017-07-02 03:24:54 +00:00
Simon Pilgrim 3bad6f3167 [X86][RDSEED] Split off i64 intrinsic tests and test i16/i32 on 32-bit target as well.
llvm-svn: 306961
2017-07-01 16:42:16 +00:00
Simon Pilgrim 2d320161e5 [X86][RDRAND] Split off i64 intrinsic tests and test i16/i32 on 32-bit target as well.
llvm-svn: 306960
2017-07-01 16:41:12 +00:00
Simon Pilgrim 2b679e1812 [X86] Removed reference to update_test_checks.py
llvm-svn: 306959
2017-07-01 16:34:29 +00:00
Simon Pilgrim ad7f0844ea [X86][AVX] Remove duplicate autogeneration note
llvm-svn: 306958
2017-07-01 16:32:02 +00:00
Eric Christopher 3df231a1f7 Remove the default ARMSubtarget from the ARM TargetMachine.
This enables us to ensure better LTO and code generation in the face of module linking.
Remove a report_fatal_error from the TargetMachine and replace it with an assert in ARMSubtarget - and remove the test that depended on the error. The assertion will still fire in the case that we were reporting before, but error reporting needs to be in front end tools if possible for options parsing.

llvm-svn: 306939
2017-07-01 03:41:53 +00:00
Teresa Johnson 32d95742b8 Recommit "r306541 - Add zero-length check to memcpy/memset load store loop expansion""
With fix for use-after-free errors. We can't add the new branch and
remove the old one until we are done with the Builder constructed for
the block.

llvm-svn: 306937
2017-07-01 03:24:10 +00:00
Eric Christopher 015dc2094e Rewrite ARM execute only support to avoid the use of a command line flag and unqualified ARMSubtarget lookup.
Paired with a clang commit to use the new behavior.

llvm-svn: 306927
2017-07-01 02:55:22 +00:00
Brian Gesiak 4ef3daafef [ORE] Add diagnostics hotness threshold
Summary:
Add an option to prevent diagnostics that do not meet a minimum hotness
threshold from being output. When generating optimization remarks for
large codebases with a ton of cold code paths, this option can be used
to limit the optimization remark output at a reasonable size. Discussion of
this change can be read here:
http://lists.llvm.org/pipermail/llvm-dev/2017-June/114377.html

Reviewers: anemet, davidxl, hfinkel

Reviewed By: anemet

Subscribers: qcolombet, javed.absar, fhahn, eraman, llvm-commits

Differential Revision: https://reviews.llvm.org/D34867

llvm-svn: 306912
2017-06-30 23:14:53 +00:00
Krzysztof Parzyszek 9eb75c4520 [Hexagon] Implement frame pointer elimination with -fomit-frame-pointer
It applies to leaf functions that are otherwise not required to have
a frame pointer.

llvm-svn: 306888
2017-06-30 21:21:40 +00:00
Richard Smith d0c0c13447 Fix ODR violations due to abuse of LLVM_YAML_IS_(FLOW_)?SEQUENCE_VECTOR
This is a short-term fix for PR33650 aimed to get the modules build bots green again.

Remove all the places where we use the LLVM_YAML_IS_(FLOW_)?SEQUENCE_VECTOR
macros to try to locally specialize a global template for a global type. That's
not how C++ works.

Instead, we now centrally define how to format vectors of fundamental types and
of string (std::string and StringRef). We use flow formatting for the former
cases, since that's the obvious right thing to do; in the latter case, it's
less clear what the right choice is, but flow formatting is really bad for some
cases (due to very long strings), so we pick block formatting. (Many of the
cases that were using flow formatting for strings are improved by this change.)

Other than the flow -> block formatting change for some vectors of strings,
this should result in no functionality change.

Differential Revision: https://reviews.llvm.org/D34907

Corresponding updates to clang, clang-tools-extra, and lld to follow.

llvm-svn: 306878
2017-06-30 20:56:57 +00:00
Tim Northover ff5e7e1295 GlobalISel: add G_IMPLICIT_DEF instruction.
It looks like there are two target-independent but not GISel instructions that
need legalization, IMPLICIT_DEF and PHI. These are already anomalies since
their operands have important LLTs attached, so to make things more uniform it
seems like a good idea to add generic variants. Starting with G_IMPLICIT_DEF.

llvm-svn: 306875
2017-06-30 20:27:36 +00:00
Sumanth Gundapaneni 8c5d59557d [Hexagon] Emit jump tables in text section based on a flag
This patch adds a new LLVM flag -hexagon-emit-jt-text which is defaulted to 
"false". The value "true" emits the switch generated jump tables in text section.
Differential Revision: https://reviews.llvm.org/D34820

llvm-svn: 306872
2017-06-30 20:21:48 +00:00
Sumanth Gundapaneni 19b74203b1 Revert "[Hexagon] Guard the generation of lookup table"
This reverts commit ae521f4192c3ed0202c047fec993cb59133dd1a0.
Wrong commit message

llvm-svn: 306871
2017-06-30 20:20:00 +00:00
Sumanth Gundapaneni cf73758dc8 [Hexagon] Guard the generation of lookup table
The llvm flag "-hexagon-emit-lookup-tables" guards the generation
of lookup table from a switch statement.

Differential Revision: https://reviews.llvm.org/D34819

llvm-svn: 306869
2017-06-30 20:10:28 +00:00
Tim Northover 2b5f03aa12 ARM: fix big-endian 64-bit cmpxchg.
On big-endian machines the high and low parts of the value accessed by ldrexd
and strexd are swapped around. To account for this we swap inputs and outputs
in ISelLowering.

Patch by Bharathi Seshadri.

llvm-svn: 306865
2017-06-30 19:51:02 +00:00
Sanjay Patel 1be7ea4ad5 [PowerPC] auto-generate check lines; NFC
The existing check lines were more flexible, but these are
small enough tests that there shouldn't be much question
about register allocation. I've been hand-modifying this 
file as I change the CGP memcmp expansion, but that's
more error-prone and time-consuming than just running the 
update script.

llvm-svn: 306861
2017-06-30 19:20:54 +00:00
Nirav Dave a35938d827 Revert "[DAG] Rewrite areNonVolatileConsecutiveLoads to use BaseIndexOffset"
This reverts commit r306819 which appears be exposing underlying
issues in a stage1 ppc64be build

llvm-svn: 306820
2017-06-30 12:56:02 +00:00
Nirav Dave c5a48c1ee8 [DAG] Rewrite areNonVolatileConsecutiveLoads to use BaseIndexOffset
As discussed in D34087, rewrite areNonVolatileConsecutiveLoads using
generic checks. Also, propagate missing local handling from there to
BaseIndexOffset checks.

Tests of note:

  * test/CodeGen/X86/build-vector* - Improved.
  * test/CodeGen/BPF/undef.ll - Improved store alignment allows an
    additional store merge

  * test/CodeGen/X86/clear_upper_vector_element_bits.ll - This is a
    case we already do not handle well. Here, the DAG is improved, but
    scheduling causes a code size degradation.

Reviewers: RKSimon, craig.topper, spatel, andreadb, filcab

Subscribers: nemanjai, llvm-commits

Differential Revision: https://reviews.llvm.org/D34472

llvm-svn: 306819
2017-06-30 12:23:41 +00:00
Simon Pilgrim e5e9232260 [X86] Updated 32-bit memcmp tests to run with/without SSE2
llvm-svn: 306816
2017-06-30 11:23:59 +00:00
Daniel Jasper 3b704ceba1 Revert "r306541 - Add zero-length check to memcpy/memset load store loop expansion"
Segfaults in non-optimized builds. I'll get a stack trace and a
reproducer to Teresa.

llvm-svn: 306793
2017-06-30 06:37:33 +00:00
Heejin Ahn ac62b05d05 [WebAssembly] Add support for exception handling instructions
Summary:
This adds backend support for throw, rethrow, try, and try_end instructions.
This needs the corresponding clang builtin support:
https://reviews.llvm.org/D34783
This follows the Wasm exception handling proposal in
https://github.com/WebAssembly/exception-handling/blob/master/proposals/Exceptions.md

Reviewers: sunfish, dschuff

Reviewed By: dschuff

Subscribers: jfb, sbc100, jgravelle-google

Differential Revision: https://reviews.llvm.org/D34826

llvm-svn: 306774
2017-06-30 00:43:15 +00:00
Eric Christopher ee837a59f7 Unified logic for computing target ABI in backend and front end by moving this common code to Support/TargetParser.
Modeled Triple::GNU after front end code (aapcs abi) and  updated tests that expect apcs abi.

Based heavily on a patch by Ana Pazos!

llvm-svn: 306768
2017-06-30 00:03:54 +00:00
Aditya Nandakumar 20f6207013 [GISel]: New Opcode G_FLOG/G_FLOG2
https://reviews.llvm.org/D34837

llvm-svn: 306766
2017-06-29 23:43:44 +00:00
Taewook Oh 0e35ea3b7c Remove redundant copy in recurrences
Summary:
If there is a chain of instructions formulating a recurrence, commuting operands can help removing a redundant copy. In the following example code,

```
BB#1: ; Loop Header
  %vreg0<def> = COPY %vreg13<kill>; GR32:%vreg0,%vreg13
  ...

BB#6: ; Loop Latch
  %vreg2<def> = COPY %vreg15<kill>; GR32:%vreg2,%vreg15
  %vreg10<def,tied1> = ADD32rr %vreg1<kill,tied0>, %vreg0<kill>, %EFLAGS<imp-def,dead>; GR32:%vreg10,%vreg1,%vreg0
  %vreg3<def,tied1> = ADD32rr %vreg2<kill,tied0>, %vreg10<kill>, %EFLAGS<imp-def,dead>; GR32:%vreg3,%vreg2,%vreg10
  CMP32ri8 %vreg3, 10, %EFLAGS<imp-def>; GR32:%vreg3
  %vreg13<def> = COPY %vreg3<kill>; GR32:%vreg13,%vreg3
  JL_1 <BB#1>, %EFLAGS<imp-use,kill>
```

Existing two-address generation pass generates following code:

```
BB#1:
  %vreg0<def> = COPY %vreg13<kill>; GR32:%vreg0,%vreg13
  ...

BB#6:
    Predecessors according to CFG: BB#5 BB#4
  %vreg2<def> = COPY %vreg15<kill>; GR32:%vreg2,%vreg15
  %vreg10<def> = COPY %vreg1<kill>; GR32:%vreg10,%vreg1
  %vreg10<def,tied1> = ADD32rr %vreg10<tied0>, %vreg0<kill>, %EFLAGS<imp-def,dead>; GR32:%vreg10,%vreg0
  %vreg3<def> = COPY %vreg10<kill>; GR32:%vreg3,%vreg10
  %vreg3<def,tied1> = ADD32rr %vreg3<tied0>, %vreg2<kill>, %EFLAGS<imp-def,dead>; GR32:%vreg3,%vreg2
  CMP32ri8 %vreg3, 10, %EFLAGS<imp-def>; GR32:%vreg3
  %vreg13<def> = COPY %vreg3<kill>; GR32:%vreg13,%vreg3
  JL_1 <BB#1>, %EFLAGS<imp-use,kill>
  JMP_1 <BB#7>
```

This is suboptimal because the assembly code generated has a redundant copy at the end of #BB6 to feed %vreg13 to BB#1:

```
.LBB0_6:
  addl  %esi, %edi
  addl  %ebx, %edi
  cmpl  $10, %edi
  movl  %edi, %esi
  jl  .LBB0_1
```

This redundant copy can be elimiated by making instructions in the recurrence chain to compute the value "into" the register that actually holds the feedback value. In this example, this can be achieved by commuting %vreg0 and %vreg1 to compute %vreg10. With that change, code after two-address generation becomes

```
BB#1:
  %vreg0<def> = COPY %vreg13<kill>; GR32:%vreg0,%vreg13
  ...

BB#6: derived from LLVM BB %bb7
    Predecessors according to CFG: BB#5 BB#4
  %vreg2<def> = COPY %vreg15<kill>; GR32:%vreg2,%vreg15
  %vreg10<def> = COPY %vreg0<kill>; GR32:%vreg10,%vreg0
  %vreg10<def,tied1> = ADD32rr %vreg10<tied0>, %vreg1<kill>, %EFLAGS<imp-def,dead>; GR32:%vreg10,%vreg1
  %vreg3<def> = COPY %vreg10<kill>; GR32:%vreg3,%vreg10
  %vreg3<def,tied1> = ADD32rr %vreg3<tied0>, %vreg2<kill>, %EFLAGS<imp-def,dead>; GR32:%vreg3,%vreg2
  CMP32ri8 %vreg3, 10, %EFLAGS<imp-def>; GR32:%vreg3
  %vreg13<def> = COPY %vreg3<kill>; GR32:%vreg13,%vreg3
  JL_1 <BB#1>, %EFLAGS<imp-use,kill>
  JMP_1 <BB#7>
```

and the final assembly does not have redundant copy:

```
.LBB0_6:
  addl  %edi, %eax
  addl  %ebx, %eax
  cmpl  $10, %eax
  jl  .LBB0_1
```

Reviewers: qcolombet, MatzeB, wmi

Reviewed By: wmi

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D31821

llvm-svn: 306758
2017-06-29 23:11:24 +00:00
Simon Dardis dede76f428 Revert "[mips] Fix multiprecision arithmetic."
This reverts commit r305389. This broke chromium builds, so reverting
while I investigate further.

llvm-svn: 306741
2017-06-29 20:59:47 +00:00
Krzysztof Parzyszek 0089419417 [Hexagon] Keep all phi nodes when building DFG in addr-mode-opt
The dead phis are needed for finding correct would-be reaching defs
in register propagation.

llvm-svn: 306690
2017-06-29 15:55:59 +00:00
Yonghong Song 5fbe01b12d bpf: remove unnecessary truncate operation
For networking-type bpf program, it often needs to access
packet data. A context data structure is provided to the bpf
programs with two fields:
        u32 data;
        u32 data_end;
User can access these two fields with ctx->data and ctx->data_end.
During program verification process, the kernel verifier modifies
the bpf program with loading of actual pointer value from kernel
data structure.
    r = ctx->data      ===> r = actual data start ptr
    r = ctx->data_end  ===> r = actual data end ptr

A typical program accessing ctx->data like
    char *data_ptr = (char *)(long)ctx->data
will result in a 32-bit load followed by a zero extension.
Such an operation is combined into a single LDW in DAG combiner
as bpf LDW does zero extension automatically.

In cases like the below (which can be a result of global value numbering
and partial redundancy elimination before insn selection):
B1:
   u32 a = load-32-bit &ctx->data
   u64 pa = zext a
   ...
B2:
   u32 b = load-32-bit &ctx->data
   u64 pb = zext b
   ...
B3:
   u32 m = PHI(a, b)
   u64 pm = zext m

In B3, "pm = zext m" cannot be removed, which although is legal
from compiler perspective, will generate incorrect code after
kernel verification.

This patch recognizes this pattern and traces through PHI node
to see whether the operand of "zext m" is defined with LDWs or not.
If it is, the "zext m" itself can be removed.

The patch also recognizes the pattern where the load and use of
the load value not in the same basic block, where truncate operation
may be removed as well.

The patch handles 1-byte, 2-byte and 4-byte truncation.

Two test cases are added to verify the transformation happens properly
for the above code pattern.

Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 306685
2017-06-29 15:18:54 +00:00
Nikolai Bozhenov 1925594ea0 [NFC] Use stdin for some tests instead of positional argument.
Summary: Otherwise unexpected matches with the path to the tests might happen.

Reviewers: rengolin, spatel, efriedma, RKSimon

Reviewed By: spatel

Subscribers: n.bozhenov, javed.absar, llvm-commits

Patch by Andrei Elovikov <andrei.elovikov@intel.com>

Differential Revision: https://reviews.llvm.org/D32994

llvm-svn: 306684
2017-06-29 14:51:54 +00:00
Hiroshi Inoue 6989caa931 [PowerPC] fix potential verification error on __tls_get_addr
This patch fixes a verification error with -verify-machineinstrs while expanding __tls_get_addr by not creating ADJCALLSTACKUP and ADJCALLSTACKDOWN if there is another ADJCALLSTACKUP in this basic block since nesting ADJCALLSTACKUP/ADJCALLSTACKDOWN is not allowed.

Here, ADJCALLSTACKUP and ADJCALLSTACKDOWN are created as a fence for instruction scheduling to avoid _tls_get_addr is scheduled before mflr in the prologue (https://bugs.llvm.org//show_bug.cgi?id=25839). So if another ADJCALLSTACKUP exists before _tls_get_addr, we do not need to create a new ADJCALLSTACKUP.

Differential Revision: https://reviews.llvm.org/D34347

llvm-svn: 306678
2017-06-29 14:13:38 +00:00
Daniel Jasper 559aa75382 Revert "r306529 - [X86] Correct dwarf unwind information in function epilogue"
I am 99% sure that this breaks the PPC ASAN build bot:
http://lab.llvm.org:8011/builders/sanitizer-ppc64be-linux/builds/3112/steps/64-bit%20check-asan/logs/stdio

If it doesn't go back to green, we can recommit (and fix the original
commit message at the same time :) ).

llvm-svn: 306676
2017-06-29 13:58:24 +00:00
Igor Breger 0cddd34876 [GlobalISel][X86] Support vector type G_MERGE_VALUES selection.
Summary:
Support vector type G_MERGE_VALUES selection. For now G_MERGE_VALUES marked as legal for any type, so nothing to do in legalizer.
Split from https://reviews.llvm.org/D33665

Reviewers: qcolombet, t.p.northover, zvi, guyblank

Reviewed By: guyblank

Subscribers: rovka, kristof.beyls, guyblank, llvm-commits

Differential Revision: https://reviews.llvm.org/D33958

llvm-svn: 306665
2017-06-29 12:08:28 +00:00
Simon Pilgrim 9a68e69c68 [X86][SSE] Dropped -mcpu from palignr tests
Use triple and attribute only for consistency 

Add AVX tests as well

llvm-svn: 306664
2017-06-29 11:13:39 +00:00
Simon Pilgrim e2eacbfc23 [X86][SSE] Regenerate shuffle test with update_llc_test_checks.py
llvm-svn: 306663
2017-06-29 11:11:37 +00:00
Simon Pilgrim 0afe97f480 [X86][SSE] Dropped -mcpu from vector shift tests
Use triple and attribute only for consistency 

llvm-svn: 306662
2017-06-29 11:09:53 +00:00
Simon Pilgrim 91539ce2d3 [X86][SSE] Dropped -mcpu from zero insertion tests
Use triple and attribute only for consistency 

llvm-svn: 306661
2017-06-29 11:08:11 +00:00
Michael Zuckerman 4bcb9c3349 [LLVM][X86][Goldmont] Adding new target-cpu: Goldmont
[LLVM SIDE]
Connecting the GoldMont processor to his feature.

Reviewers: 
1. igorb
2. zvi
3. delena
4. RKSimon
5. craig.topper        

Differential Revision: https://reviews.llvm.org/D34504

llvm-svn: 306658
2017-06-29 10:00:33 +00:00
Florian Hahn 08fdd040b5 [ARM] Add tGPRwithpc register class and use it for TBB/THH
Summary:
TBB and THH allow using a Thumb GPR or the PC as destination operand.
A few machine verifier failures where due to those instructions not
expecting PC as destination operand.

Add -verify-machineinstrs to test/CodeGen/ARM/jump-table-tbh.ll to add
test coverage even if expensive checks are disabled.



Reviewers: MatzeB, t.p.northover, jmolloy

Reviewed By: MatzeB

Subscribers: aemerson, javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D34610

llvm-svn: 306654
2017-06-29 08:45:31 +00:00
Zvi Rackover da3943d600 [X86] Adding shuffle tests demonstrating missed vcompress opportunities. NFC
llvm-svn: 306646
2017-06-29 06:22:01 +00:00
Matt Arsenault 7c525903ef AMDGPU: Remove SITypeRewriter
This was an old workaround for using v16i8 in some old intrinsics
for resource descriptors.

llvm-svn: 306603
2017-06-28 21:38:50 +00:00
Stanislav Mekhanoshin a45584bebe Fold fneg and fabs like multiplications
Given no NaNs and no signed zeroes it folds:

(fmul X, (select (fcmp X > 0.0), -1.0, 1.0)) -> (fneg (fabs X))
(fmul X, (select (fcmp X > 0.0), 1.0, -1.0)) -> (fabs X)

Differential Revision: https://reviews.llvm.org/D34579

llvm-svn: 306592
2017-06-28 20:25:50 +00:00
Chih-Hung Hsieh 514dafdae3 Another test commit.
llvm-svn: 306567
2017-06-28 17:12:51 +00:00
Krzysztof Parzyszek 3008594cd4 Missed a check for UndefVI in r306466
llvm-svn: 306553
2017-06-28 15:46:16 +00:00
Alexandros Lamprineas c0432d86aa [AArch64] AArch64CondBrTuningPass generates wrong branch instructions
Some conditional branch instructions generated by this pass are checking
the wrong condition code. The instructions TBZ and TBNZ are transformed
into B.GE and B.LT instead of B.PL and B.MI respectively. They should
only be checking the Negative bit.

Differential Revision: https://reviews.llvm.org/D34743

llvm-svn: 306550
2017-06-28 15:09:11 +00:00
John Brawn 75d76e5e95 [ARM] Improve if-conversion for M-class CPUs without branch predictors
The current heuristic in isProfitableToIfCvt assumes we have a branch predictor,
and so gives the wrong answer in some cases when we don't. This patch adds a
subtarget feature to indicate that a subtarget has no branch predictor, and
changes the heuristic in isProfitableToiIfCvt when it's present. This gives a
slight overall improvement in a set of embedded benchmarks on Cortex-M4 and
Cortex-M33.

Differential Revision: https://reviews.llvm.org/D34398

llvm-svn: 306547
2017-06-28 14:11:15 +00:00
Simon Pilgrim 48b30c3d55 [X86] Added BSWAP tests for illegal i64/i128/i256 'wide' scalar integers
llvm-svn: 306546
2017-06-28 14:07:50 +00:00
Simon Pilgrim 4f5fcb03ad [X86][SSE] Dropped -mcpu from vector bswap tests
Use triple and attribute only for consistency 

llvm-svn: 306545
2017-06-28 13:59:15 +00:00
Michael Zuckerman d0e663a697 [X86][LLVM][test]Expanding Supports lowerInterleavedStore() in X86InterleavedAccess test.
Exapnding the test to include AVX target. 
Adding base tast (to trunk) for Store strid=4 vf=32. 

llvm-svn: 306543
2017-06-28 13:42:45 +00:00
Teresa Johnson 538b8d25f0 Add zero-length check to memcpy/memset load store loop expansion
Summary:
I was testing using this expansion logic in other cases besides
NVPTX, and found some runtime failures due to the lack of a check
for a zero length memcpy/memset before the loop. There is already
such a check in the memmove expansion code though.

Reviewers: hfinkel

Subscribers: jholewinski, wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D34707

llvm-svn: 306541
2017-06-28 13:07:37 +00:00
Igor Breger 86cf07a32e [GlobalISel][X86] Test G_CONSTANT i32 0 TableGen'erated selection.NFC.
llvm-svn: 306537
2017-06-28 12:43:21 +00:00
Igor Breger d5b59cf914 [GlobalISel][X86] Support bitwise operations : G_AND, G_OR, G_XOR
Summary: Support G_AND, G_OR, G_XOR for i8/i16/i32/i64. Selection done via TableGen'erated code.

Reviewers: zvi, guyblank, aymanmus, m_zuckerman

Reviewed By: aymanmus

Subscribers: rovka, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D34605

llvm-svn: 306533
2017-06-28 11:39:04 +00:00
Michael Zuckerman f66840020c Reverting commit 306414 on behalf of @gadi.haber
llvm-svn: 306532
2017-06-28 11:23:31 +00:00
Simon Pilgrim b9fa16bc53 [X86][AVX2] Dropped -mcpu from avx2 arithmetic/intrinsics tests
Use triple and attribute only for consistency 

llvm-svn: 306531
2017-06-28 10:54:54 +00:00
Petar Jovanovic 7b3a38ec30 [X86] Correct dwarf unwind information in function epilogue
CFI instructions that set appropriate cfa offset and cfa register are now
inserted in emitEpilogue() in X86FrameLowering.

Majority of the changes in this patch:

1. Ensure that CFI instructions do not affect code generation.
2. Enable maintaining correct information about cfa offset and cfa register
in a function when basic blocks are reordered, merged, split, duplicated.

These changes are target independent and described below.

Changed CFI instructions so that they:

1. are duplicable
2. are not counted as instructions when tail duplicating or tail merging
3. can be compared as equal

Add information to each MachineBasicBlock about cfa offset and cfa register
that are valid at its entry and exit (incoming and outgoing CFI info). Add
support for updating this information when basic blocks are merged, split,
duplicated, created. Add a verification pass (CFIInfoVerifier) that checks
that outgoing cfa offset and register of predecessor blocks match incoming
values of their successors.

Incoming and outgoing CFI information is used by a late pass
(CFIInstrInserter) that corrects CFA calculation rule for a basic block if
needed. That means that additional CFI instructions get inserted at basic
block beginning to correct the rule for calculating CFA. Having CFI
instructions in function epilogue can cause incorrect CFA calculation rule
for some basic blocks. This can happen if, due to basic block reordering,
or the existence of multiple epilogue blocks, some of the blocks have wrong
cfa offset and register values set by the epilogue block above them.

Patch by Violeta Vukobrat.

Differential Revision: https://reviews.llvm.org/D18046

llvm-svn: 306529
2017-06-28 10:21:17 +00:00
Kristof Beyls eecb353d0e [ARM] Make -mcpu=generic schedule for an in-order core (Cortex-A8).
The benchmarking summarized in
http://lists.llvm.org/pipermail/llvm-dev/2017-May/113525.html showed
this is beneficial for a wide range of cores.

As is to be expected, quite a few small adaptations are needed to the
regressions tests, as the difference in scheduling results in:
- Quite a few small instruction schedule differences.
- A few changes in register allocation decisions caused by different
 instruction schedules.
- A few changes in IfConversion decisions, due to a difference in
 instruction schedule and/or the estimated cost of a branch mispredict.

llvm-svn: 306514
2017-06-28 07:07:03 +00:00
Stanislav Mekhanoshin d445455643 [AMDGPU] Add pattern for v_alignbit_b32 with immediate
If immediate in shift is less than 32 we can use alignbit too.

Differential Revision: https://reviews.llvm.org/D34729

llvm-svn: 306500
2017-06-28 02:52:39 +00:00
Stanislav Mekhanoshin eb40733bf0 Allow to truncate left shift with non-constant shift amount
That is pretty common for clang to produce code like
(shl %x, (and %amt, 31)). In this situation we can still perform
trunc (shl) into shl (trunc) conversion given the known value
range of shift amount.

Differential Revision: https://reviews.llvm.org/D34723

llvm-svn: 306499
2017-06-28 02:37:11 +00:00
Sanjay Patel 4b23fa0abf [CGP] add specialization for memcmp expansion with only one basic block
llvm-svn: 306485
2017-06-27 23:15:01 +00:00
Aditya Nandakumar cca75d2406 [GISel]: Add G_FEXP, G_FEXP2 opcodes
Also add IRTranslator support.
https://reviews.llvm.org/D34710

llvm-svn: 306475
2017-06-27 22:19:32 +00:00
Sanjay Patel 70b36f193d [CGP] eliminate a sub instruction in memcmp expansion
As noted in D34071, there are some IR optimization opportunities that could be 
handled by normal IR passes if this expansion wasn't happening so late in CGP.

Regardless of that, it seems wasteful to knowingly produce suboptimal IR here, 
so I'm proposing this change:
  %s = sub i32 %x, %y
  %r = icmp ne %s, 0
    =>
  %r = icmp ne %x, %y

Changing the predicate to 'eq' mimics what InstCombine would do, so that's just
an efficiency improvement if we decide this expansion should happen sooner.

The fact that the PowerPC backend doesn't eliminate the 'subf.' might be 
something for PPC folks to investigate separately.

Differential Revision: https://reviews.llvm.org/D34416

llvm-svn: 306471
2017-06-27 21:46:34 +00:00
Tim Northover 849fcca090 GlobalISel: verify that a COPY is trivial when created.
Without this check, COPY instructions can actually be one of the generic casts
in disguise. That's confusing and bad.

At some point during ISel this restriction has to be relaxed since the fully
selected instructions will usually use COPY for those purposes. Right now I
think it's possible that relaxation occurs during RegBankSelect (hence the
change there). I'm not convinced that's where it belongs long-term though.

llvm-svn: 306470
2017-06-27 21:41:40 +00:00
Krzysztof Parzyszek 0b7688e6c0 Create a PHI value when merging with a known undef live-in
Differential Revision: https://reviews.llvm.org/D34640

llvm-svn: 306466
2017-06-27 21:30:46 +00:00
Krzysztof Parzyszek 25173e4cba [Hexagon] Use proper predicate register state when expanding PS_vselect
llvm-svn: 306458
2017-06-27 19:59:46 +00:00
Stanislav Mekhanoshin e8bf6c9629 [AMDGPU] Add 2 new alignbit patterns
Differential Revision: https://reviews.llvm.org/D34655

llvm-svn: 306449
2017-06-27 19:10:47 +00:00
Stanislav Mekhanoshin c9bd53ab59 [AMDGPU] Simplify setcc (sext from i1 b), -1|0, cc
Depending on the compare code that can be either an argument of
sext or negate of it. This helps to avoid v_cndmask_b64 instruction
for sext. A reversed value can be further simplified and folded into
its parent comparison if possible.

Differential Revision: https://reviews.llvm.org/D34545

llvm-svn: 306446
2017-06-27 18:53:03 +00:00
Krzysztof Parzyszek 5ddd2e5899 [Hexagon] Update kills in hexagon-nvj even more properly than before
Account for the fact that both, the feeder and the compare can be moved
over instructions that kill registers.

llvm-svn: 306443
2017-06-27 18:37:16 +00:00
Matt Arsenault 836d786e86 RenameIndependentSubregs: Fix infinite loop
Apparently this replacement can really be substituting the
same as the original register. Avoid restarting the loop
when there's been no change in the register uses.

llvm-svn: 306441
2017-06-27 18:28:10 +00:00
Stanislav Mekhanoshin 6851ddf942 [AMDGPU] Combine and x, (sext cc from i1) => select cc, x, 0
Also factored out function to check if a boolean is an already
deserialized value which does not require v_cndmask_b32 to be
loaded. Added binary logical operators to its check.

Differential Revision: https://reviews.llvm.org/D34500

llvm-svn: 306439
2017-06-27 18:25:26 +00:00
Chih-Hung Hsieh ff680f0386 Another test commit
llvm-svn: 306420
2017-06-27 16:18:41 +00:00
Gadi Haber 13759a7ed6 Updated and extended the information about each instruction in HSW and SNB to include the following data:
•static latency
•number of uOps from which the instructions consists
•all ports used by the instruction

Reviewers: 
 RKSimon 
 zvi  
aymanmus  
m_zuckerman 

Differential Revision: https://reviews.llvm.org/D33897
 

llvm-svn: 306414
2017-06-27 15:05:13 +00:00
Sam Kolton a179d25b99 [AMDGPU] SDWA: several fixes for V_CVT and VOPC instructions
Summary:
1. Instruction V_CVT_U32_F32 allow omod operand (see SIInstrInfo.td:1435). In fact this operand shouldn't be allowed here. This fix checks if SDWA pseudo instruction has OMod operand and then copy it.
2. There were several problems with support of VOPC instructions in SDWA peephole pass.

Reviewers: tstellar, arsenm, vpykhtin, airlied, kzhuravl

Subscribers: wdng, nhaehnle, yaxunl, dstuttard, tpr, sarnex, t-tye

Differential Revision: https://reviews.llvm.org/D34626

llvm-svn: 306413
2017-06-27 15:02:23 +00:00
Matthew Simpson 0bd79f416a [AArch64] Update successor probabilities after ccmp-conversion
This patch modifies the conditional compares pass so that it keeps successor
probabilities up-to-date after the conversion. Previously, successor
probabilities were being normalized to a uniform distribution, even though they
may have been heavily biased prior to the conversion (e.g., if one of the edges
was the back edge of a loop). This loss of information affected passes later in
the pipeline.

Differential Revision: https://reviews.llvm.org/D34109

llvm-svn: 306412
2017-06-27 15:00:22 +00:00
Hiroshi Inoue 84aafee4fb [SelectionDAG] set dereferenceable flag in MergeConsecutiveStores to fix assetion failure
When SelectionDAG merges consecutive stores and loads in MergeConsecutiveStores, it does not set dereferenceable flag for a created load instruction. This results in an assertion failure if SelectionDAG commonizes this load instruction with other load instructions, as well as it may miss optimization opportunities.

This patch sat dereferenceable flag for the newly created load instruction if all the load instructions to be merged are dereferenceable.

Differential Revision: https://reviews.llvm.org/D34679

llvm-svn: 306404
2017-06-27 12:43:08 +00:00
Ayman Musa 721d97f7b8 Recommitting rL305465 after fixing bug in TableGen in rL306251 & rL306371
[X86][AVX512] Improve lowering of AVX512 compare intrinsics (remove redundant shift left+right instructions).

AVX512 compare instructions return v*i1 types.
In cases where the number of elements in the returned value are less than 8, clang adds zeroes to get a mask of v8i1 type.
Later on it's replaced with CONCAT_VECTORS, which then is lowered to many DAG nodes including insert/extract element and shift right/left nodes.
The fact that AVX512 compare instructions put the result in a k register and zeroes all its upper bits allows us to remove the extra nodes simply by copying the result to the required register class.

When lowering, identify these cases and transform them into an INSERT_SUBVECTOR node (marked legal), then catch this pattern in instructions selection phase and transform it into one avx512 cmp instruction.

Differential Revision: https://reviews.llvm.org/D33188

llvm-svn: 306402
2017-06-27 12:08:37 +00:00
Diana Picus 0e74a134f8 [ARM] GlobalISel: Support G_SELECT for pointers
All we need to do is mark it as legal, otherwise it's just like s32.

llvm-svn: 306390
2017-06-27 10:29:50 +00:00
Simon Pilgrim 71d8b67bea [X86][AVX512] Regenerate avx512 arithmetic tests
llvm-svn: 306389
2017-06-27 10:13:56 +00:00
Daniel Sanders cc36dbf55d [globalisel][tablegen] Add support for EXTRACT_SUBREG.
Summary:
After this patch, we finally have test cases that require multiple
instruction emission.

Depends on D33590

Reviewers: ab, qcolombet, t.p.northover, rovka, kristof.beyls

Subscribers: javed.absar, llvm-commits, igorb

Differential Revision: https://reviews.llvm.org/D33596

llvm-svn: 306388
2017-06-27 10:11:39 +00:00
Diana Picus 7145d22f81 [ARM] GlobalISel: Support G_SELECT for i32
* Mark as legal for (s32, i1, s32, s32)
* Map everything into GPRs
* Select to two instructions: a CMP of the condition against 0, to set
  the flags, and a MOVCCr to select between the two inputs based on the
  flags that we've just set

llvm-svn: 306382
2017-06-27 09:19:51 +00:00
Hiroshi Inoue 912eec378c [PowerPC] fix incorrect processor name for -mcpu in a test case
to surpress warnings. ppc970 should be 970 (or g5)

llvm-svn: 306380
2017-06-27 08:35:35 +00:00
Nicolai Haehnle 43cc6c4e0f AMDGPU: M0 operands to spill/restore opcodes are dead
Summary:
With scalar stores, M0 is clobbered and therefore marked as implicitly
defined. However, it is also dead.

This fixes an assertion when the Greedy Register Allocator decides to
optimize a spill/restore pair away again (via tryHintsRecoloring).

Reviewers: arsenm

Subscribers: qcolombet, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D33319

llvm-svn: 306375
2017-06-27 08:04:13 +00:00
Igor Breger 925f088bae [GlobalISel][X86] Add fp32/62 legalizer, regbank-select, selection tests for G_FADD, G_FSUB, G_FMUL, G_FDIV. NFC.
llvm-svn: 306370
2017-06-27 07:01:54 +00:00
Hiroshi Inoue 5102028f63 [PowerPC] set optimization level in SelectionDAGISel
PowerPC backend does not pass the current optimization level to SelectionDAGISel and so SelectionDAGISel works with the default optimization level regardless of the current optimization level.
This patch makes the PowerPC backend set the optimization level correctly.

Differential Revision: https://reviews.llvm.org/D34615

llvm-svn: 306367
2017-06-27 04:52:17 +00:00
Matthias Braun e2ae001982 ScheduleDAGInstrs: Fix fixupKills() adding too many kill flags.
Remove invalid shortcut in fixupKills(): A register needs to be marked
live even when we are not adding a kill flag. This is because a
partially live register must not get a kill flags, but it still needs to
be fully marked live when walking backwards.

llvm-svn: 306352
2017-06-27 00:58:48 +00:00
Wolfgang Pieb 9f65858235 DAGCombine: Make sure we only eliminate trunc/extend when the scales of truncation and extension match.
This fixes PR33368.

Reviewer: rksimon

Differential Revision:  https://reviews.llvm.org/D34069

llvm-svn: 306345
2017-06-26 23:05:51 +00:00
Sanjay Patel b859910eb2 [x86] add tests for missing sbb transforms; NFC
llvm-svn: 306337
2017-06-26 22:20:07 +00:00
Matt Arsenault 53fae0772a RenameIndependentSubregs: Fix iterator problem
Fixes bug 33597.

Use of substituteRegister in the tied operand case messes
up the register use iterator, causing some uses to be left
unprocessed.

llvm-svn: 306333
2017-06-26 21:33:36 +00:00
Tim Northover c2d5e6d637 AArch64: legalize G_EXTRACT operations.
This is the dual problem to legalizing G_INSERTs so most of the code and
testing was cribbed from there.

llvm-svn: 306328
2017-06-26 20:34:13 +00:00
Tim Northover 9ac3e42211 AArch64: remove all kill flags when extending register liveness.
When we forward a stored value to a load and eliminate it entirely we need to
make sure the liveness of the register is maintained all the way to its use.
Previously we only cleared liveness on the store doing the forwarding, but
there could be other killing uses in between.

We already do the right thing when the load has to be converted into something
else, it was just this one path that skipped it.

llvm-svn: 306318
2017-06-26 18:49:25 +00:00
Simon Pilgrim d58f051792 [X86][SSE] Check SSE2/SSE3 codegen tests on i686 and x86_64
llvm-svn: 306314
2017-06-26 18:20:46 +00:00
Matt Arsenault f28683cf51 AMDGPU: Setup SP/FP in callee function prolog/epilog
llvm-svn: 306312
2017-06-26 17:53:59 +00:00
Ulrich Weigand af98b748f6 [SystemZ] Fix missing emergency spill slot corner case
We sometimes need emergency spill slots for the register scavenger.
This may be the case when code needs to access a stack slot that
has an offset of 4096 or more relative to the stack pointer.

To make that determination, processFunctionBeforeFrameFinalized
currently simply checks the total stack frame size of the current
function.  But this is not enough, since code may need to access
stack slots in the caller's stack frame as well, in particular
incoming arguments stored on the stack.

This commit fixes the problem by taking argument slots into account.

llvm-svn: 306305
2017-06-26 16:50:32 +00:00
Simon Pilgrim f07663876a [X86][SSE] Add combine tests for PMULDQ/PMULUDQ
Found several missed optimizations while investigating replacing _mm_mul_epi32/_mm_mul_epu32 with generic implementations

llvm-svn: 306302
2017-06-26 16:22:52 +00:00
Ahmed Bougacha 58a197414e [X86][AVX-512] Don't raise inexact in ceil, floor, round, trunc.
The non-AVX-512 behavior was changed in r248266 to match N1778
(C bindings for IEEE-754 (2008)), which defined the four functions
to not raise the inexact exception ("rint" is still defined as raising
it).

Update the AVX-512 lowering of these functions to match that: it should
not be different.

llvm-svn: 306299
2017-06-26 16:00:24 +00:00
Tom Stellard eb8f1e27d9 AMDGPU/GlobalISel: Mark 32-bit G_SHL as legal
Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D34589

llvm-svn: 306298
2017-06-26 15:56:52 +00:00
Simon Pilgrim 0ad0e5802b [X86] Add test case for PR15981
llvm-svn: 306296
2017-06-26 15:53:11 +00:00
Sanjay Patel 15748d239e [x86] transform vector inc/dec to use -1 constant (PR33483)
Convert vector increment or decrement to sub/add with an all-ones constant:

add X, <1, 1...> --> sub X, <-1, -1...>
sub X, <1, 1...> --> add X, <-1, -1...>

The all-ones vector constant can be materialized using a pcmpeq instruction that is 
commonly recognized as an idiom (has no register dependency), so that's better than 
loading a splat 1 constant.

AVX512 uses 'vpternlogd' for 512-bit vectors because there is apparently no better
way to produce 512 one-bits.

The general advantages of this lowering are:
1. pcmpeq has lower latency than a memop on every uarch I looked at in Agner's tables, 
   so in theory, this could be better for perf, but...

2. That seems unlikely to affect any OOO implementation, and I can't measure any real 
   perf difference from this transform on Haswell or Jaguar, but...

3. It doesn't look like it from the diffs, but this is an overall size win because we 
   eliminate 16 - 64 constant bytes in the case of a vector load. If we're broadcasting 
   a scalar load (which might itself be a bug), then we're replacing a scalar constant 
   load + broadcast with a single cheap op, so that should always be smaller/better too.

4. This makes the DAG/isel output more consistent - we use pcmpeq already for padd x, -1 
   and psub x, -1, so we should use that form for +1 too because we can. If there's some
   reason to favor a constant load on some CPU, let's make the reverse transform for all
   of these cases (either here in the DAG or in a later machine pass).

This should fix:
https://bugs.llvm.org/show_bug.cgi?id=33483

Differential Revision: https://reviews.llvm.org/D34336

llvm-svn: 306289
2017-06-26 14:19:26 +00:00
Krzysztof Parzyszek 918e6d70bd [Hexagon] Handle cases when the aligned stack pointer is missing
llvm-svn: 306288
2017-06-26 14:17:58 +00:00
Jonas Paulsson 8c33647ba1 [SystemZ] Add a check against zero before calling getTestUnderMaskCond()
Csmith discovered that this function can be called with a zero argument,
in which case an assert for this triggered.

This patch also adds a guard before the other call to this function since
it was missing, although the test only covers the case where it was
discovered.

Reduced test case attached as CodeGen/SystemZ/int-cmp-54.ll.

Review: Ulrich Weigand
llvm-svn: 306287
2017-06-26 13:38:27 +00:00
Michael Zuckerman ce7e187f84 [X86][LLVM][test]Expanding Supports lowerInterleavedStore() in X86InterleavedAccess test.
Adding base tast (to trunk) for Store strid=4 vf=32. 

llvm-svn: 306286
2017-06-26 13:27:32 +00:00
Serguei Katkov 0e70206c8f This reverts commit r306272.
Revert "[MBP] do not rotate loop if it creates extra branch"

It breaks the sanitizer build bots. Need to fix this.

llvm-svn: 306276
2017-06-26 06:51:45 +00:00
Serguei Katkov b01fff06ed [MBP] do not rotate loop if it creates extra branch
This is a last fix for the corner case of PR32214. Actually this is not really corner case in general.

We should not do a loop rotation if we create an additional branch due to it.
Consider the case where we have a loop chain H, M, B, C , where
H is header with viable fallthrough from pre-header and exit from the loop
M - some middle block
B - backedge to Header but with exit from the loop also.
C - some cold block of the loop.

Let's H is determined as a best exit. If we do a loop rotation M, B, C, H we can introduce the extra branch.
Let's compute the change in number of branches:
+1 branch from pre-header to header
-1 branch from header to exit
+1 branch from header to middle block if there is such
-1 branch from cold bock to header if there is one

So if C is not a predecessor of H then we introduce extra branch.

This change actually prohibits rotation of the loop if both true
1) Best Exit has next element in chain as successor.
2) Last element in chain is not a predecessor of first element of chain.

Reviewers: iteratee, xur
Reviewed By: iteratee
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D34271

llvm-svn: 306272
2017-06-26 05:27:27 +00:00
Matt Arsenault 10fc062b2b AMDGPU: Partially fix implicit.buffer.ptr intrinsic handling
This should not be treated as a different version of
private_segment_buffer. These are distinct things with
different uses and register classes, and requires the
function argument info to have more context about the
function's type and environment.

Also add missing test coverage for the intrinsic, and
emit an error for HSA. This also encovers that the intrinsic
is broken unless there happen to be stack objects.

llvm-svn: 306264
2017-06-26 03:01:31 +00:00
Simon Pilgrim 9956364a1f [X86] Add test case for PR15705
llvm-svn: 306246
2017-06-25 16:12:45 +00:00
Elena Demikhovsky 72f991cded AVX-512: Fixed a crash during legalization of <3 x i8> type
The compiler fails with assertion during legalization of SETCC for <3 x i8> operands.
The result is extended to <4 x i8> and then truncated <4 x i1>. It does not happen on AVX2, because the final result of SETCC is <4 x i32>.

Differential Revision: https://reviews.llvm.org/D34503

llvm-svn: 306242
2017-06-25 13:36:20 +00:00
Igor Breger f5035d6ee5 [GlobalISel][X86] Support vector type G_EXTRACT selection.
Summary:
Support vector type G_EXTRACT selection. For now G_EXTRACT marked as legal for any type, so nothing to do in legalizer.
Split from https://reviews.llvm.org/D33665

Reviewers: qcolombet, t.p.northover, zvi, guyblank

Reviewed By: guyblank

Subscribers: guyblank, rovka, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D33957

llvm-svn: 306240
2017-06-25 11:42:17 +00:00
Hiroshi Inoue a85d24b73d fix trivial typos in comment, NFC
llvm-svn: 306211
2017-06-24 16:00:26 +00:00
Hiroshi Inoue 95f24dca98 [SelectionDAG] set dereferenceable flag when expanding memcpy/memmove
When SelectionDAG expands memcpy (or memmove) call into a sequence of load and store instructions, it disregards dereferenceable flag even the source pointer is known to be dereferenceable.
This results in an assertion failure if SelectionDAG commonizes a load instruction generated for memcpy with another load instruction for the source pointer.
This patch makes SelectionDAG to set the dereferenceable flag for the load instructions properly to avoid the assertion failure.

Differential Revision: https://reviews.llvm.org/D34467

llvm-svn: 306209
2017-06-24 15:17:38 +00:00
Rafael Espindola b05f4a7b25 Add missing %s to RUN line.
llvm-svn: 306199
2017-06-24 04:41:39 +00:00
Rafael Espindola 2c166857b3 Test the object file creation too.
This should *really* be a llvm-mc test, but the parser is broken.
See PR33579 for the parser bug.

llvm-svn: 306198
2017-06-24 04:31:45 +00:00
Nirav Dave 18c10c53d0 Update constants in complex-return test to prevent reduction to smaller constants
llvm-svn: 306192
2017-06-24 01:29:24 +00:00
Vadzim Dambrouski 9e0d3878fb [MSP430] Fix data layout string.
Summary:
Without this patch some types have incorrect size and/or alignment
according to the MSP430 EABI.

Reviewers: asl, awygle

Reviewed By: asl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D34561

llvm-svn: 306159
2017-06-23 21:11:45 +00:00
Nirav Dave cedfeb364f Add bitcast store-merge test.
llvm-svn: 306158
2017-06-23 20:52:14 +00:00
Krzysztof Parzyszek 717021772b Revert "[Hexagon] Handle decreasing of stack alignment in frame lowering"
This breaks passing of aligned function arguments.

llvm-svn: 306145
2017-06-23 19:47:04 +00:00
Chad Rosier 6db9ff64a8 [AArch64] Prefer Bcc to CBZ/CBNZ/TBZ/TBNZ when NZCV flags can be set for "free".
This patch contains a pass that transforms CBZ/CBNZ/TBZ/TBNZ instructions into a
conditional branch (Bcc), when the NZCV flags can be set for "free". This is
preferred on targets that have more flexibility when scheduling Bcc
instructions as compared to CBZ/CBNZ/TBZ/TBNZ (assuming all other variables are
equal). This can reduce register pressure and is also the default behavior for
GCC.

A few examples:

 add w8, w0, w1  -> cmn w0, w1             ; CMN is an alias of ADDS.
 cbz w8, .LBB_2  -> b.eq .LBB0_2           ; single def/use of w8 removed.

 add w8, w0, w1  -> adds w8, w0, w1        ; w8 has multiple uses.
 cbz w8, .LBB1_2 -> b.eq .LBB1_2

 sub w8, w0, w1       -> subs w8, w0, w1   ; w8 has multiple uses.
 tbz w8, #31, .LBB6_2 -> b.ge .LBB6_2

In looking at all current sub-target machine descriptions, this transformation
appears to be either positive or neutral.

Differential Revision: https://reviews.llvm.org/D34220.

llvm-svn: 306144
2017-06-23 19:20:12 +00:00
Sanjay Patel 3de6bad65f [x86] fix value types for SBB transform (PR33560)
I'm not sure yet why this wouldn't fail in the simple case,
but clearly I used the wrong value type with:
https://reviews.llvm.org/rL306040

...and the bug manifests with:
https://bugs.llvm.org/show_bug.cgi?id=33560

llvm-svn: 306139
2017-06-23 18:42:15 +00:00
Simon Pilgrim 19cee0d56c [X86][AVX] Regenerate i256 bitcasted store test
Check on slow/fast unaligned memory targets

llvm-svn: 306138
2017-06-23 18:34:56 +00:00
Simon Pilgrim dfa436079f Regenerate extract-store.ll tests
llvm-svn: 306131
2017-06-23 17:19:44 +00:00
Krzysztof Parzyszek bb2fcd1921 [Hexagon] Handle decreasing of stack alignment in frame lowering
llvm-svn: 306124
2017-06-23 16:53:59 +00:00
Tim Northover 4b4eec7009 GlobalISel: remove G_SEQUENCE instruction.
It was trying to do too many things. The basic lumping together of values for
legalization purposes is now handled by G_MERGE_VALUES. More complex things
involving gaps and odd sizes are handled by G_INSERT sequences.

llvm-svn: 306120
2017-06-23 16:15:55 +00:00
Tim Northover b57bf2ac79 GlobalISel: convert buildSequence to use non-deprecated instructions.
G_SEQUENCE is going away soon so as a first step the MachineIRBuilder needs to
be taught how to emulate it with alternatives. We use G_MERGE_VALUES where
possible, and a sequence of G_INSERTs if not.

llvm-svn: 306119
2017-06-23 16:15:37 +00:00
Ulrich Weigand eaf0051ba3 [SystemZ] Remove unnecessary serialization before volatile loads
This reverts the use of TargetLowering::prepareVolatileOrAtomicLoad
introduced by r196905.  Nothing in the semantics of the "volatile"
keyword or the definition of the z/Architecture actually requires
that volatile loads are preceded by a serialization operation, and
no other compiler on the platform actually implements this.

Since we've now seen a use case where this additional serialization
causes noticable performance degradation, this patch removes it.

The patch still leaves in the serialization before atomic loads,
which is now implemented directly in lowerATOMIC_LOAD.  (This also
seems overkill, but that can be addressed separately.)

llvm-svn: 306117
2017-06-23 15:56:14 +00:00
Sanjay Patel 021f32fd0f [x86] auto-generate complete checks; NFC
llvm-svn: 306114
2017-06-23 15:29:49 +00:00
Sanjay Patel 02469b63c2 [x86] auto-generate complete checks; NFC
llvm-svn: 306113
2017-06-23 15:22:27 +00:00
Tom Stellard af552dc352 AMDGPU/GlobalISel: Mark 32-bit G_AND as legal
Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D34349

llvm-svn: 306112
2017-06-23 15:17:17 +00:00
Sanjay Patel 563e5afa0e [x86] remove overridden target settings in test; NFC
r306109 was supposed to make this change, but I committed the wrong version.

llvm-svn: 306110
2017-06-23 15:06:30 +00:00
Sanjay Patel 8e06df4303 [x86] rename test file and auto-generate complete checks; NFC
The command-line params override the target setting in the file itself, so delete that.
Also, remove the cpu and arch because those don't matter and neither does the OS specification in the triple.

llvm-svn: 306109
2017-06-23 14:58:21 +00:00
Simon Pilgrim 859b48d2d3 [X86][AVX] Extended vector average tests
Added AVX1 tests and merged AVX1/AVX2/AVX512 checks where possible

llvm-svn: 306107
2017-06-23 14:38:00 +00:00
Jonas Paulsson 82f15a7168 [SystemZ] Fix trap issue and enable expensive checks.
The isBarrier/isTerminator flags have been removed from the SystemZ trap
instructions, so that tests do not fail with EXPENSIVE_CHECKS. This was just
an issue at -O0 and did not affect code output on benchmarks.

(Like Eli pointed out: "targets are split over whether they consider their
"trap" a terminator; x86, AArch64, and NVPTX don't, but ARM, MIPS, PPC, and
SystemZ do. We should probably try to be consistent here.". This is still the
case, although SystemZ has switched sides).

SystemZ now returns true in isMachineVerifierClean() :-)

These Generic tests have been modified so that they can be run with or without
EXPENSIVE_CHECKS: CodeGen/Generic/llc-start-stop.ll and
CodeGen/Generic/print-machineinstrs.ll

Review: Ulrich Weigand, Simon Pilgrim, Eli Friedman
https://bugs.llvm.org/show_bug.cgi?id=33047
https://reviews.llvm.org/D34143

llvm-svn: 306106
2017-06-23 14:30:46 +00:00
Simon Pilgrim dbd20ffee1 [X86][SSE] Dropped -mcpu from vector average tests
Use triple and attribute only for consistency 

llvm-svn: 306104
2017-06-23 14:16:50 +00:00
Simon Pilgrim dbf8f5ace7 [X86][SSE] Dropped -mcpu from scalar math tests
Use triple and attribute only for consistency 

llvm-svn: 306097
2017-06-23 13:07:20 +00:00
Simon Pilgrim 5d3d716815 [X86][SSE] Dropped -mcpu from insertps tests
Use triple and attribute only for consistency 

llvm-svn: 306092
2017-06-23 11:00:49 +00:00
Stefan Maksimovic b794c0a5ca [mips][msa] Splat.d endianness check
Before this change, it was always the first element of a vector that got splatted since the lower 6 bits of vshf.d $wd were always zero for little endian.
Additionally, masking has been performed for vshf via which splat.d is created.

Vshf has a property where if its first operand's elements have either bit 6 or 7 set, destination element is set to zero.
Initially masked with 63 to avoid this property, which would result in generation of and.v + vshf.d in all cases.
Masking with one results in generating a single splati.d instruction when possible.

Differential Revision: https://reviews.llvm.org/D32216

llvm-svn: 306090
2017-06-23 09:09:31 +00:00
Sanjay Patel 359ae44fb4 [x86] add/sub (X==0) --> sbb(cmp X, 1)
This is very similar to the transform in:
https://reviews.llvm.org/rL306040
...but in this case, we use cmp X, 1 to set the carry bit as needed.

Again, we can show that all of these are logically equivalent (although
InstCombine currently canonicalizes to a form not seen here), and if
we believe IACA, then this is the smallest/fastest code. Eg, with SNB:

| Num Of |              Ports pressure in cycles               |    |
|  Uops  |  0  - DV  |  1  |  2  -  D  |  3  -  D  |  4  |  5  |    |
---------------------------------------------------------------------
|   1    | 1.0       |     |           |           |     |     |    | cmp edi, 0x1
|   2    |           | 1.0 |           |           |     | 1.0 | CP | sbb eax, eax


The larger motivation is to clean up all select-of-constants combining/lowering 
because we're missing some common cases.

llvm-svn: 306072
2017-06-22 23:47:15 +00:00
Farhana Aleen 4b652a5335 Supported lowerInterleavedStore() in X86InterleavedAccess.
Reviewers: RKSimon, DavidKreitzer

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D32658

llvm-svn: 306068
2017-06-22 22:59:04 +00:00
Sanjay Patel ff051957fc [x86] add more tests for select --> sbb transform; NFC
These are siblings of the tests added with r306032.

llvm-svn: 306064
2017-06-22 22:17:05 +00:00
Jacob Gravelle a31ec61c46 [WebAssembly] WebAssemblyFastISel getelementptr variable index support
Summary:
Previously -fast-isel getelementptr would constant-fold non-constant i8
load/stores.

Reviewers: sunfish

Subscribers: jfb, dschuff, sbc100, llvm-commits

Differential Revision: https://reviews.llvm.org/D34044

llvm-svn: 306060
2017-06-22 21:26:08 +00:00
Krzysztof Parzyszek 9b7c1d2dcf [Hexagon] Properly update kill flags in HexagonNewValueJump
The feeder instruction will be moved to right before the compare, so
the updating code should not be looking for kills past the compare.

llvm-svn: 306059
2017-06-22 21:11:44 +00:00
Krzysztof Parzyszek 1a0da8d5a3 [Hexagon] Use LivePhysRegs to fix up kills in HexagonGenMux
Remove the previous, manual shuffling of the kill flags. 

llvm-svn: 306054
2017-06-22 20:43:02 +00:00
Craig Topper 792fc92be2 [AVX-512] Remove and autoupgrade the masked integer compare intrinsics
Summary:
These intrinsics aren't used by clang and haven't been for a while.

There's some really terrible codegen in the 32-bit target for avx512bw due to i64 not being legal. But as I said these intrinsics aren't used by clang even before this patch so this codegen reflects our clang behavior today.

Reviewers: spatel, RKSimon, zvi, igorb

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D34389

llvm-svn: 306047
2017-06-22 20:11:01 +00:00
Sanjay Patel 41a34e4111 [x86] add/sub (X==0) --> sbb(neg X)
Our handling of select-of-constants is lumpy in IR (https://reviews.llvm.org/D24480),
lumpy in DAGCombiner, and lumpy in X86ISelLowering. That's why we only had the 'sbb'
codegen in 1 out of the 4 tests. This is a step towards smoothing that out.

First, show that all of these IR forms are equivalent:
http://rise4fun.com/Alive/mx

Second, show that the 'sbb' version is faster/smaller. IACA output for SandyBridge
(later Intel and AMD chips are similar based on Agner's tables):

This is the "obvious" x86 codegen (what gcc appears to produce currently):

| Num Of |              Ports pressure in cycles               |    |
|  Uops  |  0  - DV  |  1  |  2  -  D  |  3  -  D  |  4  |  5  |    |
---------------------------------------------------------------------
|   1*   |           |     |           |           |     |     |    | xor eax, eax
|   1    | 1.0       |     |           |           |     |     | CP | test edi, edi
|   1    |           |     |           |           |     | 1.0 | CP | setnz al
|   1    |           | 1.0 |           |           |     |     | CP | neg eax


This is the adc version:
|   1*   |           |     |           |           |     |     |    | xor eax, eax
|   1    | 1.0       |     |           |           |     |     | CP | cmp edi, 0x1
|   2    |           | 1.0 |           |           |     | 1.0 | CP | adc eax, 0xffffffff


And this is sbb:
|   1    | 1.0       |     |           |           |     |     |    | neg edi
|   2    |           | 1.0 |           |           |     | 1.0 | CP | sbb eax, eax

If IACA is trustworthy, then sbb became a single uop in Broadwell, so this will be
clearly better than the alternatives going forward.

llvm-svn: 306040
2017-06-22 18:11:19 +00:00
Sanjay Patel 96e4e0967e [x86] add tests for select --> sbb transform; NFC
llvm-svn: 306032
2017-06-22 17:01:14 +00:00
David Stuttard 70e8bc1bf3 [AMDGPU] Add intrinsics for tbuffer load and store
Intrinsic already existed for llvm.SI.tbuffer.store

Needed tbuffer.load and also re-implementing the intrinsic as llvm.amdgcn.tbuffer.*

Added CodeGen tests for the 2 new variants added.
Left the original llvm.SI.tbuffer.store implementation to avoid issues with existing code

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, tpr

Differential Revision: https://reviews.llvm.org/D30687

llvm-svn: 306031
2017-06-22 16:29:22 +00:00
Krzysztof Parzyszek 9bdb460f64 [Hexagon] Fix typo in a testcase
llvm-svn: 306030
2017-06-22 16:25:46 +00:00
Krzysztof Parzyszek f63ad39e7d [Hexagon] Handle a global operand to A2_addi when creating duplexes
llvm-svn: 306012
2017-06-22 15:53:31 +00:00
whitequark cebe8241ca [X86] Add support for "probe-stack" attribute
This commit adds prologue code emission for stack probe function
calls.

Reviewed By: majnemer

Differential Revision: https://reviews.llvm.org/D34387

llvm-svn: 306010
2017-06-22 15:42:53 +00:00
Krzysztof Parzyszek 69ffba4595 [Hexagon] Recognize potential offset overflow for store-imm to stack
Reserve an extra scavenging stack slot if the offset field in store-
-immediate instructions may overflow.

llvm-svn: 306004
2017-06-22 14:11:23 +00:00
Sam Kolton ca5a30ed74 [AMDGPU] SDWA: remove support for VOP2 instructions that have only 64-bit encoding
Summary:
Despite that this instructions are listed in VOP2, they are treated as VOP3 in specs. They should not support SDWA.
There are no real instructions for them, but there are pseudo instructions.

Reviewers: arsenm, vpykhtin, cfang

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye

Differential Revision: https://reviews.llvm.org/D34403

llvm-svn: 305999
2017-06-22 12:42:14 +00:00
Kristof Beyls 9665249fd8 Don't conditionalize Neon instructions, even in IT blocks.
This has been deprecated since ARMARM v7-AR, release C.b, published back
in 2012.

This also removes test/CodeGen/Thumb2/ifcvt-neon.ll that originally was
introduced to check that conditionalization of Neon instructions did
happen when generating Thumb2. However, the test had evolved and was no
longer testing that. Rather than trying to adapt that test, this commit
introduces test/CodeGen/Thumb2/ifcvt-neon-deprecated.mir, since we can
now use the MIR framework to write nicer/more maintainable tests.

llvm-svn: 305998
2017-06-22 12:11:38 +00:00
Igor Breger 1c29be7e4f [GlobalISel][X86] Support vector type G_INSERT legalization/selection.
Summary:
Support vector type G_INSERT legalization/selection.
Split from https://reviews.llvm.org/D33665

Reviewers: qcolombet, t.p.northover, zvi, guyblank

Reviewed By: guyblank

Subscribers: guyblank, rovka, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D33956

llvm-svn: 305989
2017-06-22 09:43:35 +00:00
Florian Hahn b489e56ae2 [ARM] Add macro fusion for AES instructions.
Summary:
This patch adds a macro fusion using CodeGen/MacroFusion.cpp to pair AES
instructions back to back and adds FeatureFuseAES to enable the feature.

Reviewers: evandro, javed.absar, rengolin, t.p.northover

Reviewed By: javed.absar

Subscribers: aemerson, mgorny, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D34142

llvm-svn: 305988
2017-06-22 09:39:36 +00:00
Elena Demikhovsky 2dac0b4d58 AVX-512: Lowering Masked Gather intrinsic - fixed a bug
Masked gather for vector length 2 is lowered incorrectly for element type i32.
The type <2 x i32> was automatically extended to <2 x i64> and we generated VPGATHERQQ instead of VPGATHERQD.
The type <2 x float> is extended to <4 x float>, so there is no bug for this type, but the sequence may be more optimal.

In this patch I'm fixing <2 x i32>bug and optimizing <2 x float> sequence for GATHERs only. The same fix should be done for Scatters as well.

Differential revision: https://reviews.llvm.org/D34343

llvm-svn: 305987
2017-06-22 06:47:41 +00:00
Sam Kolton 3c4933fcc6 [AMDGPU] SDWA: add support for GFX9 in peephole pass
Summary:
Added support based on merged SDWA pseudo instructions. Now peephole allow one scalar operand, omod and clamp modifiers.
Added several subtarget features for GFX9 SDWA.
This diff also contains changes from D34026.
Depends D34026

Reviewers: vpykhtin, rampitec, arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye

Differential Revision: https://reviews.llvm.org/D34241

llvm-svn: 305986
2017-06-22 06:26:41 +00:00
Davide Italiano 7a6c5c12ad Revert "[Target] Implement the ".rdata" MIPS assembly directive."
This reverts commit r305949 and r305950 as they didn't have the
correct commit message.

llvm-svn: 305973
2017-06-22 00:11:41 +00:00
Stanislav Mekhanoshin 3ed38c601a [AMDGPU] Add FP_CLASS to the add/setcc combine
This is one of the nodes which also compile as v_cmp_*.

Differential Revision: https://reviews.llvm.org/D34485

llvm-svn: 305970
2017-06-21 23:46:22 +00:00
Stanislav Mekhanoshin a8b26936d0 [AMDGPU] Combine add and adde, sub and sube
If one of the arguments of adde/sube is zero we can fold another
add/sub into it.

Differential Revision: https://reviews.llvm.org/D34374

llvm-svn: 305964
2017-06-21 22:30:01 +00:00
Stanislav Mekhanoshin e3eb42cef6 [AMDGPU] simplify add x, *ext (setcc) => addc|subb x, 0, setcc
This simplification allows to avoid generating v_cndmask_b32
to serialize condition code between compare and use.

Differential Revision: https://reviews.llvm.org/D34300

llvm-svn: 305962
2017-06-21 22:05:06 +00:00
Nirav Dave 6919b9e9f0 Add Aarch64 ldst-opt test.
llvm-svn: 305951
2017-06-21 20:50:07 +00:00
Davide Italiano cae62546ac [Target/Mips] Add test associated with r305949.
llvm-svn: 305950
2017-06-21 20:42:34 +00:00
Davide Italiano 9b8e3d308f [Solaris] emit .init_array instead of .ctors on Solaris (Sparc/x86)
Patch by Fedor Sergeev.

Differential Revision:  https://reviews.llvm.org/D33868

llvm-svn: 305948
2017-06-21 20:36:32 +00:00
Krzysztof Parzyszek fd048cc0ec [Hexagon] Handle more types of immediate operands in expand-condsets
llvm-svn: 305943
2017-06-21 19:21:30 +00:00
Lei Huang 84dbbfdeb9 [PowerPC] define target hook isReallyTriviallyReMaterializable()
Define target hook isReallyTriviallyReMaterializable() to explicitly specify
PowerPC instructions that are trivially rematerializable.  This will allow
the MachineLICM pass to accurately identify PPC instructions that should always
be hoisted.

Differential Revision: https://reviews.llvm.org/D34255

llvm-svn: 305932
2017-06-21 17:17:56 +00:00
Christof Douma 1ee68828b2 [AARCH64][LSE] Preliminary support for ARMv8.1 LSE Atomics.
Added test file for ARMv8.1 LSE Atomics that I forgot to include in
commit r305893.

Patch by Ananth Jasty.

Differential Revision: https://reviews.llvm.org/D33586

Change-Id: Ic1ad8ed87c1b584c4c791b459a686c866a3c3087
llvm-svn: 305918
2017-06-21 15:18:39 +00:00
Simon Pilgrim 550cb7e82c [X86][SSE] Dropped -mcpu from 256-bit vector shuffle tests
Use triple and attribute only for consistency 

llvm-svn: 305916
2017-06-21 14:51:23 +00:00
Simon Pilgrim 9d0c2b7bad [X86][SSE] Dropped -mcpu from 128-bit vector shuffle tests
Use triple and attribute only for consistency 

llvm-svn: 305913
2017-06-21 14:23:02 +00:00
Simon Pilgrim 5309b7d5c9 [X86][SSE] Regenerate merge store tests
llvm-svn: 305910
2017-06-21 13:46:42 +00:00
Simon Pilgrim e74e08fe61 [X86][SSE] Dropped -mcpu from vector blend shuffle tests and regenerate
Use triple and attribute only for consistency 

llvm-svn: 305909
2017-06-21 13:45:33 +00:00
Simon Pilgrim 98aab7c6fc [X86][SSE] Dropped -mcpu from vector shuffle tests
Use triple and attribute only for consistency 

llvm-svn: 305908
2017-06-21 13:26:52 +00:00
Simon Pilgrim 6d5d6b542b [X86][SSE] Dropped -mcpu from vector zero extend tests
Use triple and attribute only for consistency 

llvm-svn: 305907
2017-06-21 13:17:14 +00:00
Simon Pilgrim c388ec32e0 [X86][SSE] Dropped -mcpu from variable shuffle tests
Use triple and attribute only for consistency 

llvm-svn: 305906
2017-06-21 13:15:41 +00:00
Simon Pilgrim 73814a2594 [X86][AVX] Add AVX1 shuffle truncation tests
llvm-svn: 305905
2017-06-21 12:58:56 +00:00
Simon Pilgrim db6c3fa872 [X86][SSE] Add SSE2/SSE42 shuffle truncation tests
llvm-svn: 305904
2017-06-21 12:58:19 +00:00
Zvi Rackover 845ca8fba9 [X86] Rerun the update_llc_test_checks tool on test. NFC.
llvm-svn: 305897
2017-06-21 11:21:43 +00:00
Strahinja Petrovic d280ea4f76 [MIPS] Fix for selecting of DINS/INS instruction
This patch adds one more condition in selection DINS/INS
instruction, which fixes MultiSource/Applications/JM/ldecod/
for mips32r2 (and mips64r2 n32 abi).

Differential Revision: https://reviews.llvm.org/D33725

llvm-svn: 305888
2017-06-21 09:25:51 +00:00
Florian Hahn 80e485179e [AArch64] Preserve register flags when promoting a load from store.
Summary:
This patch updates promoteLoadFromStore to use the store MachineOperand as the
source operand of the of the new instruction instead of creating a new
register MachineOperand. This way, the existing register flags are
preserved. 

This fixes PR33468 (https://bugs.llvm.org/show_bug.cgi?id=33468). 


Reviewers: MatzeB, t.p.northover, junbuml

Reviewed By: MatzeB

Subscribers: aemerson, rengolin, javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D34402

llvm-svn: 305885
2017-06-21 08:47:23 +00:00
Guy Blank 52d73fce85 [DAGCombiner] Add another combine from build vector to shuffle
Add support for combining a build vector to a shuffle.
When the build vector is of extracted elements from 2 vectors (vec1, vec2) where vec2 is 2 times smaller than vec1.

llvm-svn: 305883
2017-06-21 07:38:41 +00:00
Dean Michael Berris 28ecff5cf1 [XRay] Reduce synthetic references emitted by XRay
Summary:
When we're building with XRay instrumentation, we use a trick that
preserves references from the function to a function sled index. This
index table lives in a separate section, and without this trick the
linker is free to garbage-collect this section and all the segments it
refers to. Until we're able to tell the linkers to preserve these
sections, we use this reference trick to keep around both the index and
the entries in the instrumentation map.

Before this change we emitted both a synthetic reference to the label in
the instrumentation map, and to the entry in the function map index.
This change removes the first synthetic reference and only emits one
synthetic reference to the index -- the index entry has the references
to the labels in the instrumentation map, so the linker will still
preserve those if the function itself is preserved.

This reduces the amount of synthetic references we emit from 16 bytes to
just 8 bytes in x86_64, and similarly to other platforms.

Reviewers: dblaikie

Subscribers: javed.absar, kpw, pelikan, llvm-commits

Differential Revision: https://reviews.llvm.org/D34340

llvm-svn: 305880
2017-06-21 06:39:42 +00:00
Serguei Katkov 0b0dc57dd8 [ImplicitNullChecks] Uphold an invariant in areMemoryOpsAliased
Right now areMemoryOpsAliased has an assertion justified as:

MMO1 should have a value due it comes from operation we'd like to use
as implicit null check.
assert(MMO1->getValue() && "MMO1 should have a Value!");
However, it is possible for that invariant to not be upheld in the
following situation (conceptually):

Null check %RAX
NotNullSucc:

%RAX = LEA %RSP, 16            // I0
%RDX = MOV64rm %RAX            // I1
With the current code, we will have an early exit from
ImplicitNullChecks::isSuitableMemoryOp on I0 with SR_Unsuitable.
However, I1 will look plausible (since it loads from %RAX) and
will go ahead and call areMemoryOpsAliased(I1, I0). This will cause
us to fail the assert mentioned above since I1 does not load from an
IR level value and thus is allowed to have a non-Value base address.

The fix is to bail out earlier whenever we see an unsuitable
instruction overwrite PointerReg. This would guarantee that when we
call areMemoryOpsAliased, we're guaranteed to be looking at an
instruction that loads from or stores to an IR level value.

Original Patch Author: sanjoy
Reviewers: sanjoy, mkazantsev, reames
Reviewed By: sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D34385

llvm-svn: 305879
2017-06-21 06:38:23 +00:00
Stanislav Mekhanoshin a9d846c6ef [AMDGPU] Fix illegal shrink of V_SUBB_U32 and V_ADDC_U32
If there is an immediate operand we shall not shrink V_SUBB_U32
and V_ADDC_U32, it does not fit e32 encoding.

Differential Revison: https://reviews.llvm.org/D34291

llvm-svn: 305840
2017-06-20 20:33:44 +00:00
Aditya Nandakumar c6a419123a [GISel]: Add G_FMA opcode for fused multiply adds
https://reviews.llvm.org/D34372

Reviewed by dsanders

llvm-svn: 305824
2017-06-20 19:25:23 +00:00
Matt Arsenault ff3f912e74 AMDGPU: Do operand folding in program order
Before it was possible to partially fold use instructions
before the defs. After the xor is folded into a copy, the same
mov can end up in the fold list twice, so on the second attempt
it will fail expecting to see a register to fold.

llvm-svn: 305821
2017-06-20 18:56:32 +00:00
Matthias Braun 7a482e2302 RegisterScavenging: Followup to r305625
This does some improvements/cleanup to the recently introduced
scavengeRegisterBackwards() functionality:

- Rewrite findSurvivorBackwards algorithm to use the existing
  LiveRegUnit::accumulateBackward() code. This also avoids the Available
  and Candidates bitset and just need 1 LiveRegUnit instance
  (= 1 bitset).
- Pick registers in allocation order instead of register number order.

llvm-svn: 305817
2017-06-20 18:43:14 +00:00
Matt Arsenault 76858f5a1d AMDGPU: Preserve undef when folding register operands
If the source was a copy of an undef register, this would
produce a read of an undefined register which is a verifier
error.

llvm-svn: 305816
2017-06-20 18:41:31 +00:00
Stanislav Mekhanoshin 465a1ff193 [AMDGPU] Eliminate SGPR to VGPR copy when possible
SGPRs are generally cheaper, so try to use them over VGPRs.

Differential Revision: https://reviews.llvm.org/D34130

llvm-svn: 305815
2017-06-20 18:32:42 +00:00
Matt Arsenault 7f67b35901 AMDGPU: Fix crash with undef vreg input operand
llvm-svn: 305814
2017-06-20 18:28:02 +00:00
Sanjay Patel 0656629b87 [x86] enable CGP memcmp() expansion for 2/4/8 byte sizes
There are a couple of potential improvements as seen in the IR and asm:
1. We're unnecessarily extending to a larger type to compare values.
2. The codegen for (select cond, 1, -1) could avoid a cmov.
(or we could change the order of the compares, so we have a select with 0 operand)

llvm-svn: 305802
2017-06-20 15:58:30 +00:00
Simon Pilgrim 4822b5b649 [X86][SSE] Relax 0/-1 vector element insertion to work for any vector with >=16bit elements
Shuffle lowering/combining now does a good job for 256/512-bit vectors - we don't need to prevent this

llvm-svn: 305801
2017-06-20 15:19:02 +00:00
Tim Northover 208ddc5bdc DAG: correctly legalize UMULO.
We were incorrectly sign extending into the high word (as you would for
SMULO) when legalizing UMULO in terms of a wider full multiplication.

Patch by James Duley.

llvm-svn: 305800
2017-06-20 15:01:38 +00:00
Daniel Sanders a6e2cebf98 [globalisel][tablegen] Add support for COPY_TO_REGCLASS.
Summary:
As part of this
* Emitted instructions now have named MachineInstr variables associated
  with them. This isn't particularly important yet but it's a small step
  towards multiple-insn emission.
* constrainSelectedInstRegOperands() is no longer hardcoded. It's now added
  as the ConstrainOperandsToDefinitionAction() action. COPY_TO_REGCLASS uses
  an alternate constraint mechanism ConstrainOperandToRegClassAction() which
  supports arbitrary constraints such as that defined by COPY_TO_REGCLASS.

Reviewers: ab, qcolombet, t.p.northover, rovka, kristof.beyls, aditya_nandakumar

Reviewed By: ab

Subscribers: javed.absar, igorb, llvm-commits

Differential Revision: https://reviews.llvm.org/D33590

llvm-svn: 305791
2017-06-20 12:36:34 +00:00
Simon Pilgrim b4a77fe83a Fixed test name. NFCI.
llvm-svn: 305787
2017-06-20 10:24:06 +00:00
Igor Breger 1dcd5e8dc8 [GlobalISel][X86] Get correct RegClass for given RegBank.
Summary:
In some cases RegClass depends on target feature. Hight (16-31) vector registers exist only if AVX512f available.
Split from https://reviews.llvm.org/D33665

Reviewers: qcolombet, t.p.northover, zvi, guyblank

Reviewed By: t.p.northover, guyblank

Subscribers: guyblank, rovka, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D33952

Conflicts:
	test/CodeGen/X86/GlobalISel/select-memop-scalar.mir

llvm-svn: 305784
2017-06-20 09:15:10 +00:00
Igor Breger 14535f0fc2 [GlobalISel] combine not symmetric merge/unmerge nodes.
Summary:
In some cases legalization ends up with not symmetric merge/unmerge nodes.
Transform it to merge/unmerge nodes.

Reviewers: t.p.northover, qcolombet, zvi

Reviewed By: t.p.northover

Subscribers: rovka, kristof.beyls, guyblank, llvm-commits

Differential Revision: https://reviews.llvm.org/D33626

llvm-svn: 305783
2017-06-20 08:54:17 +00:00
Igor Breger 22ab175658 [GlobalISel][X86] add legalizer mir tests. NFC
llvm-svn: 305781
2017-06-20 08:30:48 +00:00
Alexandros Lamprineas 2b2b420563 [ARM] Support constant pools in data when generating execute-only code.
Resubmission of r305387, which was reverted at r305390. The Address
Sanitizer caught a stack-use-after-scope of a Twine variable. This
is now fixed by passing the Twine directly as a function parameter.

The ARM backend asserts against constant pool lowering when it generates
execute-only code in order to prevent the generation of constant pools in
the text section. It appears that target independent optimizations might
generate DAG nodes that represent constant pools. By lowering such nodes
as global addresses we don't violate the semantics of execute-only code
and also it is guaranteed that execute-only behaves correct with the
position-independent addressing modes that support execute-only code.

Differential Revision: https://reviews.llvm.org/D33773

llvm-svn: 305776
2017-06-20 07:20:52 +00:00