32702 Commits

Author SHA1 Message Date
Zheng Chen 04377a81ae [PowerPC] Set instruction count as the first priority of LSR.
On PowerPC, make instruction count the first priority of the LSR cost model by default.
Add an option, ppc-lsr-no-insns-cost, to revert to the default LSR cost model.

Reviewed By: steven.zhang, jsji

Differential Revision: https://reviews.llvm.org/D72683
2020-02-16 21:04:55 -05:00
Simon Pilgrim b85df2e185 [X86] combineX86ShuffleChain - add support for combining 512-bit shuffles to PALIGNR 2020-02-16 16:13:26 +00:00
Simon Pilgrim c9c1c2b335 [X86] combineX86ShuffleChain - add support for combining 512-bit shuffles to bit shifts 2020-02-16 16:13:25 +00:00
Sanjay Patel e48b536be6 [x86] form broadcast of scalar memop even with >1 use
The unseen logic diff occurs because MayFoldLoad() is defined like this:

static bool MayFoldLoad(SDValue Op) {
  return Op.hasOneUse() && ISD::isNormalLoad(Op.getNode());
}

The test diffs here all seem ok to me on screen/paper, but it's hard to know
if that will lead to universally better perf for all targets. For example,
if a target implements broadcast from mem as multiple uops, we would have to
weigh the potential reduction of instructions and register pressure vs.
possible increase in number of uops. I don't know if we can make a truly
informed decision on this at compile-time.

The motivating case that I'm looking at in PR42024:
https://bugs.llvm.org/show_bug.cgi?id=42024
...resembles the diff in extract-concat.ll, but we're not going to change the
larger example there without at least 1 other fix.
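
The one-use restriction quoted above can be modeled without any LLVM headers. The following is a minimal sketch only: `ToyValue`, `mayFoldLoadToy` and `mayFormBroadcastFromLoadToy` are hypothetical names, and the relaxed check is an assumption based on this commit's title, not the actual patch.

```cpp
#include <cassert>

// Hypothetical stand-in for an SDValue: just enough state to model the check.
struct ToyValue {
  bool IsNormalLoad; // models ISD::isNormalLoad(Op.getNode())
  unsigned NumUses;  // models the number of users of the value
};

// Same shape as the quoted helper: foldable only when the value is a plain
// load with exactly one use.
bool mayFoldLoadToy(const ToyValue &Op) {
  return Op.NumUses == 1 && Op.IsNormalLoad;
}

// Assumed relaxation: accept a plain load regardless of its use count, so a
// scalar load with several users can still feed a broadcast.
bool mayFormBroadcastFromLoadToy(const ToyValue &Op) {
  return Op.IsNormalLoad;
}
```

A load with two users is rejected by the first predicate and accepted by the second, which is the kind of difference the test diffs would reflect.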

Differential Revision: https://reviews.llvm.org/D74088
2020-02-16 10:32:56 -05:00
Simon Pilgrim 5d22b6a87f [X86] Add test cases showing failure to simplify target shuffles to bit shifts 2020-02-15 23:34:31 +00:00
Simon Pilgrim c1186d50f9 [X86][AVX512] Split AVX512F and AVX512BW shuffle combining tests
Split off shuffle combine tests that use AVX512F intrinsics, so we can test it with/without AVX512BW support.
2020-02-15 22:48:52 +00:00
Fangrui Song 46788a21f9 [X86][AsmPrinter] PrintSymbolOperand: prefer to lower ELF MO_GlobalAddress to .Lfoo$local 2020-02-15 13:45:29 -08:00
Simon Pilgrim 34a054ce71 [X86] combineX86ShuffleChain - add support for combining to X86ISD::ROTLI
Refactors matchShuffleAsBitRotate to allow use by both lowerShuffleAsBitRotate and matchUnaryPermuteShuffle.
2020-02-15 20:04:54 +00:00
Simon Pilgrim 4abbaceea0 [X86] Add test showing failure to combine shuffle to bit rotation 2020-02-15 19:23:00 +00:00
Craig Topper 3f7649799b [X86] Move combineIncDecVector logic from Select to PreprocessISelDAG.
This allows it to work properly with masked inc/dec for avx512. Those
would have a vselect as the root node so didn't get a chance to call
combineIncDecVector.

This also simplifies the logic because we don't have to manage
the topological ordering.
2020-02-15 09:59:12 -08:00
David Green da147ef0a5 [AArch64] Fixup kill flags on BSL generation
This hopefully fixes up the expensive checks bot.
2020-02-15 11:44:23 +00:00
Fangrui Song 6b14814e10 [AsmPrinter] Omit unique ID for .stack_sizes
Follow-up for D74006.
2020-02-14 21:25:06 -08:00
Fangrui Song 895cad1a13 [AsmPrinter][XRay] Omit unique ID for xray_instr_map and xray_fn_idx
Follow-up for D74006.
2020-02-14 21:10:46 -08:00
Diogo Sampaio 8bc790f9e6 [AArch64][FPenv] Update chain of int to fp conversion
Summary:
When using strict fp, it is required to update the
chain when performing integer type promotion of an
operand of an integer to floating point conversion.

Reviewers: craig.topper, john.brawn

Reviewed By: craig.topper

Subscribers: kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74597
2020-02-15 05:07:34 +00:00
Fangrui Song f554e27224 [AsmPrinter] Omit unique ID for __patchable_function_entries sections
Follow-up for D74006.

When the integrated assembler is used, we use SHF_LINK_ORDER.  The
linked-to symbol is part of ELFSectionKey, thus we can omit the unique
ID.
2020-02-14 20:54:54 -08:00
Fangrui Song 0fbe221543 [MC][ELF] Make linked-to symbol name part of ELFSectionKey
https://bugs.llvm.org/show_bug.cgi?id=44775

This rule has been implemented by GNU as https://sourceware.org/ml/binutils/2020-02/msg00028.html (binutils >= 2.35)

It allows us to simplify

```
.section .foo,"o",foo,unique,0
.section .foo,"o",bar,unique,1  # different section
```

to

```
.section .foo,"o",foo
.section .foo,"o",bar  # different section
```

We consider the two `.foo` different even if the linked-to symbols foo and bar
are defined in the same section.  This is a deliberate choice so that we don't
need to know the section where foo and bar are defined beforehand.
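
The effect of keying sections on the linked-to symbol name can be sketched with a plain ordered map. This is an illustration under stated assumptions: `ToySectionKey` and `ToySectionTable` are hypothetical names and do not reproduce the real ELFSectionKey layout or the MC API.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <tuple>

// Hypothetical, simplified section key; not the real ELFSectionKey.
struct ToySectionKey {
  std::string SectionName;
  std::string LinkedToName; // linked-to symbol name for "o" sections
  unsigned UniqueID;

  bool operator<(const ToySectionKey &Other) const {
    return std::tie(SectionName, LinkedToName, UniqueID) <
           std::tie(Other.SectionName, Other.LinkedToName, Other.UniqueID);
  }
};

class ToySectionTable {
  std::map<ToySectionKey, unsigned> Sections;

public:
  // Returns the index of the section for this key, creating it on first use.
  unsigned getOrCreate(const std::string &Name, const std::string &LinkedTo,
                       unsigned UniqueID = 0) {
    assert(!Name.empty() && "section name must not be empty");
    ToySectionKey Key{Name, LinkedTo, UniqueID};
    unsigned NextIndex = static_cast<unsigned>(Sections.size());
    return Sections.emplace(Key, NextIndex).first->second;
  }

  std::size_t size() const { return Sections.size(); }
};
```

With the linked-to name in the key, `.foo` linked to `foo` and `.foo` linked to `bar` map to different entries without a distinct unique ID, while a repeated request for the same pair returns the existing entry.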

Differential Revision: https://reviews.llvm.org/D74006
2020-02-14 20:03:04 -08:00
Matt Arsenault 8d8d46b57a AMDGPU/GlobalISel: Fix missing impdef of scc on boolean bit ops 2020-02-14 22:35:30 -05:00
Shiva Chen 1cae2f9d19 [RISCV] Correct the CallPreservedMask for the function call in an interrupt handler
CallPreservedMask is used to describe the register liveness after a
function call. The function call in an interrupt handler should use the same
CallPreservedMask as normal functions, so that only callee-saved registers
can live through the function call.
2020-02-15 09:14:04 +08:00
Matt Arsenault 65dbdc329f AMDGPU: Don't preserve analyses with div64 IR expansion
The dominator tree needs to be updated, but that isn't handled now.
2020-02-14 20:06:02 -05:00
Matt Arsenault dc3e499dd4 AMDGPU/GlobalISel: Fix G_EXTRACT of 96-bit results
This would assert on an unhandled size in getRegSplitParts.
2020-02-14 15:57:40 -08:00
Matt Arsenault 630b47e518 AMDGPU: Use generated checks for memcpy expansion 2020-02-14 15:57:40 -08:00
Matt Arsenault 60fea2713d AMDGPU/GlobalISel: Improve 16-bit bswap
Match the new DAG behavior and use v_perm_b32 when available. Also
does better on SI/CI by expanding 16-bit swaps. Also fix
non-power-of-2 cases.
2020-02-14 15:57:39 -08:00
Matt Arsenault 9ec668606b AMDGPU: Add option to disable CGP division expansion
The division expansions in AMDGPUCodeGenPrepare can't be relied on for
correctness, since they punt to later optimization and possibly
legalization in some cases. We still need a way to be able to write
tests for the legalizer versions of the expansion. This is mostly for
GlobalISel, since the optimizations the expansion is expecting aren't
implemented.

The interaction with the flag to expand 64-bit division in the IR is
pretty confusing, but these flags have different purposes.
2020-02-14 11:37:07 -08:00
Sanjay Patel 63ed0eceaf [x86] remove stray test assertions; NFC
I updated the prefix and forgot to manually remove the old names
as part of rG6071fc57a45.f
2020-02-14 14:28:50 -05:00
Sanjay Patel 6071fc57a4 [x86] regenerate complete test checks for sqrt{est}; NFC
The existing checks were trying to test both CPU-specific
codegen and generic codegen with explicit attributes for
the various sqrt estimate possibilities, but that was hard
to decipher and update (D69989).

Instead generate the complete results for various CPUs,
and that makes it clear which models have slow/fast sqrt
attributes along with all of the other potential diffs
(FMA, AVX2, scheduling).

Also, explicitly add the function attributes corresponding
to whether DAZ/FTZ denorm settings are expected.
2020-02-14 14:21:28 -05:00
Matt Arsenault 34d9a16e54 AMDGPU: Add option to expand 64-bit integer division in IR
I didn't realize we were already expanding 24/32-bit division here.
already. Use the available IntegerDivision utilities. This uses loops,
so produces significantly smaller code than the inline DAG expansion.

This now requires width reductions of 64-bit divisions before
introducing the expanded loops.

This helps work around missing legalization in GlobalISel for
division, the only remaining core instructions that didn't
work at all.

I think this is plausibly a better implementation than exists in the
DAG, although turning it on by default misses out on the constant
value optimizations and also needs benchmarking.
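
To illustrate why a loop-based expansion is compact, here is a minimal sketch of a shift-subtract (restoring) unsigned 64-bit division. It is not the algorithm used by the IntegerDivision utilities; `udiv64Loop` and `UDivResult` are hypothetical names chosen for this example.

```cpp
#include <cassert>
#include <cstdint>

struct UDivResult {
  uint64_t Quotient;
  uint64_t Remainder;
};

// One loop iteration per dividend bit, so the code size does not grow with
// the number of bits the way a fully unrolled inline expansion does.
UDivResult udiv64Loop(uint64_t Dividend, uint64_t Divisor) {
  assert(Divisor != 0 && "division by zero");
  uint64_t Quotient = 0;
  uint64_t Remainder = 0;
  for (int Bit = 63; Bit >= 0; --Bit) {
    // If the top bit is set, the shifted remainder no longer fits in 64 bits,
    // so it is certainly larger than the divisor.
    bool Carry = (Remainder >> 63) != 0;
    Remainder = (Remainder << 1) | ((Dividend >> Bit) & 1);
    if (Carry || Remainder >= Divisor) {
      Remainder -= Divisor; // wraps correctly modulo 2^64 when Carry is set
      Quotient |= uint64_t(1) << Bit;
    }
  }
  return {Quotient, Remainder};
}
```

The results can be compared against the native `/` and `%` operators for any nonzero divisor.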
2020-02-14 11:16:08 -08:00
Craig Topper 391cc4dd41 [X86] Use ZERO_EXTEND instead of SIGN_EXTEND in the fast isel handling of convert_from_fp16. 2020-02-14 10:57:12 -08:00
Craig Topper fc0c72b2df [X86] Add AVX512 support to the fast isel code for Intrinsic::convert_from_fp16/convert_to_fp16. 2020-02-14 10:57:11 -08:00
Matt Arsenault bfbfa18591 GlobalISel: Lower s64->s16 G_FPTRUNC
This is more or less directly ported from the AMDGPU custom lowering
for FP_TO_FP16. I made a few minor fixups (using G_UNMERGE_VALUES
instead of creating shift/trunc to extract the two halves, and zexting
an inverted compare instead of select_cc).

This also does not include the fast math expansion in the DAG, which
converts to f32 and then to f16. I think that belongs in a
pre-legalize combine instead.
2020-02-14 10:46:58 -08:00
Volkan Keles 187686a22f [GlobalISel] LegalizationArtifactCombiner: Fix a bug in tryCombineMerges
Like COPY instructions explained in D70616, we don't check the constraints
when combining G_UNMERGE_VALUES. Use the same logic used in D70616 to check
if registers can be replaced, or a COPY instruction needs to be built.

https://reviews.llvm.org/D70564
2020-02-14 10:45:58 -08:00
Brian Cain bf3b86bc2f [Hexagon] v67+ HVX register pairs should support either direction
Assembler now permits pairs like 'v0:1', which are encoded
differently from the odd-first pairs like 'v1:0'.

The compiler will require more work to leverage these new register
pairs.
2020-02-14 12:43:43 -06:00
Matt Arsenault 8c2c0b3637 AMDGPU: Improve i16/v2i16 bswap 2020-02-14 09:53:22 -08:00
Matt Arsenault e0fd2d6d62 AMDGPU: Add baseline tests for 16-bit bswap 2020-02-14 09:34:13 -08:00
Matt Arsenault a257bde420 AMDGPU/GlobalISel: Handle G_BSWAP 2020-02-14 09:09:44 -08:00
Pavel Iliin b6a9fe2099 [AArch64] Add BIT/BIF support.
This patch adds generation of the SIMD bitwise insert BIT/BIF instructions.
In the absence of GCC-like functionality for optimal constraint satisfaction
during register allocation, the bitwise insert and select patterns are matched
by a pseudo bitwise select BSP instruction whose def is not tied.
After register allocation it is expanded, with the def tied,
to BSL/BIT/BIF depending on the operand registers.
This gets rid of redundant moves.

Reviewers: t.p.northover, samparker, dmgreen

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D74147
2020-02-14 14:19:39 +00:00
Simon Pilgrim 2492075add [X86][SSE] lowerShuffleAsBitRotate - lower vXi8 shuffles to ROTL on pre-SSSE3 targets
Without PSHUFB we are better using ROTL (expanding to OR(SHL,SRL)) than using the generic v16i8 shuffle lowering - but if we can widen to v8i16 or more then the existing shuffles are still the better option.

REAPPLIED: Original commit rG11c16e71598d was reverted at rGde1d90299b16 as it wasn't accounting for later lowering. This version emits ROTLI or the OR(VSHLI/VSRLI) directly to avoid the issue.
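
As a scalar illustration of the idea, a byte shuffle that swaps the two bytes of every 16-bit lane is the same operation as rotating each lane left by 8, and the rotate itself can be written as OR(SHL, SRL). The sketch below uses hypothetical helper names, assumes a little-endian lane layout, and does not reflect how the X86 lowering code is structured.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Rotate-left of one 16-bit lane, expanded to OR(SHL, SRL).
uint16_t rotl16(uint16_t X, unsigned Amt) {
  assert(Amt > 0 && Amt < 16 && "rotate amount must be in (0, 16)");
  return static_cast<uint16_t>((X << Amt) | (X >> (16 - Amt)));
}

// v16i8 shuffle with mask <1,0,3,2,...>: swap the bytes of each 16-bit lane.
std::array<uint8_t, 16> swapAdjacentBytes(const std::array<uint8_t, 16> &V) {
  std::array<uint8_t, 16> R{};
  for (unsigned I = 0; I < 16; ++I)
    R[I] = V[I ^ 1];
  return R;
}

// The same result computed by rotating every 16-bit lane by 8 bits.
std::array<uint8_t, 16> rotateLanesBy8(const std::array<uint8_t, 16> &V) {
  std::array<uint8_t, 16> R{};
  for (unsigned I = 0; I < 16; I += 2) {
    uint16_t Lane = static_cast<uint16_t>(V[I] | (V[I + 1] << 8));
    uint16_t Rot = rotl16(Lane, 8);
    R[I] = static_cast<uint8_t>(Rot & 0xFF);
    R[I + 1] = static_cast<uint8_t>(Rot >> 8);
  }
  return R;
}
```

Both helpers produce identical vectors for any input, which is why such a shuffle can be matched as a bit rotate.
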
2020-02-14 11:55:18 +00:00
Kazushi (Jam) Marukawa 60431bd728 [VE] Support for PIC (global data and calls)
Summary: Support for PIC with tests for global variables and function calls.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D74536
2020-02-14 09:50:02 +01:00
Liu, Chen3 ec89335c47 [X86] Fix the bug that _mm_mask_cvtsepi64_epi32 generates a result without
zeroing the upper 64 bits.

Differential Revision : https://reviews.llvm.org/D74552
2020-02-14 09:26:06 +08:00
Thomas Lively 918e90559b [WebAssembly] Make stack pointer args inhibit tail calls
Summary:
Also make return calls terminator instructions so epilogues are
inserted before them rather than after them. Together, these changes
make WebAssembly's tail call optimization more stack-safe.

Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73943
2020-02-13 16:43:53 -08:00
Pavel Iliin b23ec43973 [AArch64][NFC] Update test checks.
This NFC commit replaces several llc test checks with automatically generated ones.
2020-02-14 00:13:15 +00:00
Craig Topper c2e8a421ac [X86] Don't widen 128/256-bit strict compares with vXi1 result to 512-bits on KNL.
If we widen the compare we might trigger a spurious exception from
the garbage data.

We have two choices here: explicitly force the upper bits to zero,
or use a legacy VEX vcmpps/pd instruction and convert the XMM/YMM
result to a mask register.

I've chosen to go with the second option. I'm not sure which is
really best. In some cases we could get rid of the zeroing since
the producing instruction probably already zeroed it. But we lose
the ability to fold a load. So which is best is dependent on
surrounding code.

Differential Revision: https://reviews.llvm.org/D74522
2020-02-13 13:26:40 -08:00
Thomas Lively e252293d06 [WebAssembly] Add cbrt function signatures
Summary:
Fixes a crash in the backend where optimizations produce calls to the
cbrt runtime functions. Fixes PR 44227.

Reviewers: aheejin

Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74259
2020-02-13 13:18:42 -08:00
Matt Arsenault 5adbf7d57f AMDGPU/GlobalISel: Make G_TRUNC legal
This is required to be legal. I'm not sure how we were getting away
without defining any rules for it.
2020-02-13 15:25:52 -05:00
Frederic Bastien 019ab61e25 [NVPTX, LSV] Move the LSV optimization pass to later when the graph is cleaner
This allows it to recognize more loads as being consecutive when the loads' addresses are complex at the start.

Differential Revision: https://reviews.llvm.org/D74444
2020-02-13 12:15:38 -08:00
Matt Arsenault cfa60ff2c7 AMDGPU/GlobalISel: Add missing tests for cmpxchg selection 2020-02-13 10:26:55 -08:00
Yuanfang Chen 4ad7685258 Revert "Revert "Reland "[Support] make report_fatal_error `abort` instead of `exit`"""
This reverts commit 80a34ae311 with fixes.

Previously, since bots turning on EXPENSIVE_CHECKS are essentially turning on
MachineVerifierPass by default on X86 and the fact that
inline-asm-avx-v-constraint-32bit.ll and inline-asm-avx512vl-v-constraint-32bit.ll
are not expected to generate functioning machine code, this would go
down to `report_fatal_error` in MachineVerifierPass. Pass
`-verify-machineinstrs=0` here to make the intent explicit.
2020-02-13 10:16:06 -08:00
Yuanfang Chen 17122ec10a Revert "Revert "Revert "Reland "[Support] make report_fatal_error `abort` instead of `exit`""""
This reverts commit bb51d24330.
2020-02-13 10:08:05 -08:00
Yuanfang Chen bb51d24330 Revert "Revert "Reland "[Support] make report_fatal_error `abort` instead of `exit`"""
This reverts commit 80a34ae311 with fixes.

On bots llvm-clang-x86_64-expensive-checks-ubuntu and
llvm-clang-x86_64-expensive-checks-debian only,
llc returns 0 for these two tests unexpectedly. I tweaked the RUN line a little
bit in the hope that LIT is the culprit since this change is not in the
codepath these tests are testing.
llvm\test\CodeGen\X86\inline-asm-avx-v-constraint-32bit.ll
llvm\test\CodeGen\X86\inline-asm-avx512vl-v-constraint-32bit.ll
2020-02-13 10:02:53 -08:00
Matt Arsenault bfe3779459 AMDGPU: Use v_perm_b32 to implement bswap
Also greatly improve i64 lowering. LegalizeIntegerTypes does the
correct narrowing if i64 isn't legal. Just work around this for
SelectionDAG by making i64 legal and splitting in the patterns.
2020-02-13 09:45:31 -08:00
John Brawn 0ec5797296 [ARM] Fix infinite loop when lowering STRICT_FP_EXTEND
If the target has FP64 but not FP16 then we have custom lowering for FP_EXTEND
and STRICT_FP_EXTEND with type f64. However if the extend is from f32 to f64 the
current implementation will cause an infinite loop for STRICT_FP_EXTEND due to
emitting a merge_values of the original node which after replacement becomes a
merge_values of itself.

Fix this by not doing anything for f32 to f64 extend when we have FP64, though
for STRICT_FP_EXTEND we have to do the strict-to-nonstrict mutation as that
doesn't happen automatically for opcodes with custom lowering.

Differential Revision: https://reviews.llvm.org/D74559
2020-02-13 16:12:50 +00:00