Commit Graph

35824 Commits

Author SHA1 Message Date
Craig Topper f21f835ee8 [X86] Improve demanded bits for X86ISD::BEXTR.
If the control is constant, we can figure out exactly which bits
of the input are demanded.
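For illustration (a hypothetical example, not from the patch), with the BMI control encoding of start in bits 7:0 and length in bits 15:8, a control of 0x0408 extracts four bits starting at bit 8, so only bits 8..11 of the input are demanded:

```
declare i32 @llvm.x86.bmi.bextr.32(i32, i32)

define i32 @demanded_bits(i32 %x) {
  ; control = 0x0408 (1032): start = 8, length = 4 -> demanded bits 0xF00
  %r = call i32 @llvm.x86.bmi.bextr.32(i32 %x, i32 1032)
  ret i32 %r
}
```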

Differential Revision: https://reviews.llvm.org/D88072
2020-09-23 10:51:02 -07:00
Andrew Wei c2deacd929 [AArch64] Fix ldst optimization of non-immediate store offset
When matching a store instruction for ldst opt, we should make sure the store is in 'reg+imm' form like the load;
otherwise, isLdOffsetInRangeOfSt will hit an assertion since it calls getImm() directly.
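A hypothetical IR sketch of the mismatch: the store below can select to a register-offset STR (no immediate for getImm() to return), while the load uses an immediate offset:

```
define i64 @f(i64* %p, i64 %idx) {
  %s = getelementptr i64, i64* %p, i64 %idx
  store i64 0, i64* %s                    ; may become a register-offset store
  %l = getelementptr i64, i64* %p, i64 1
  %v = load i64, i64* %l                  ; immediate-offset load
  ret i64 %v
}
```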

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D87905
2020-09-23 23:00:13 +08:00
Sebastian Neubauer a343b9b032 Revert "[AMDGPU] Insert waitcnt after returning from call"
This reverts commit ca907bfb57.

According to michel.daenzer,
> This completely broke the Mesa radeonsi driver on Navi 14. Xorg +
> xterm come up with major corruption & psychedelic colours.
2020-09-23 17:16:39 +02:00
Cameron McInally db40a74344 [SVE] Lower fixed length ISD::VECREDUCE_ADD to Scalable
Differential Revision: https://reviews.llvm.org/D87796
2020-09-23 09:08:07 -05:00
Sam Parker 00c34f72fb [NFC][ARM] Pre-commit tail predication test 2020-09-23 14:29:26 +01:00
Matt Arsenault c463fd136e GlobalISel: Fix truncating shift amount in trunc (shl) combine
The shift amount type does not necessarily match the result type. This
was inserting a trunc from s32 to s32, which asserted. Just preserve
the original shift amount type which can be legalized later.
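A minimal sketch of the shape that reaches the combine; in GlobalISel the shift amount is a separate constant whose type need not match the s64 shift result:

```
define i32 @trunc_shl(i64 %x) {
  %s = shl i64 %x, 8
  %t = trunc i64 %s to i32
  ret i32 %t
}
```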
2020-09-23 09:07:50 -04:00
Matt Arsenault af0207f2ba AMDGPU: Check global FP atomics match default FP mode
We would always select global FP atomics from atomicrmw fadd, although
they have a hardcoded FP mode.
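For reference, a sketch of the kind of input involved: an atomicrmw fadd on a global (addrspace 1) pointer, which should only select to the global FP atomic when the function's FP mode matches the instruction's hardcoded one:

```
define float @global_fadd(float addrspace(1)* %ptr, float %val) {
  %r = atomicrmw fadd float addrspace(1)* %ptr, float %val seq_cst
  ret float %r
}
```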
2020-09-23 09:07:50 -04:00
Kerry McLaughlin d0149ba9b4 [SVE][CodeGen] Lower legal integer -> floating point conversions
This patch adds new ISD nodes, SCVTF_MERGE_PASSTHRU &
UCVTF_MERGE_PASSTHRU, which are used to lower both legal
scalable vector [S|U]INT_TO_FP operations and the following intrinsics:
 - llvm.aarch64.sve.scvtf
 - llvm.aarch64.sve.ucvtf
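A minimal example of a legal scalable conversion that lowers through these nodes:

```
define <vscale x 4 x float> @scvtf(<vscale x 4 x i32> %v) {
  %c = sitofp <vscale x 4 x i32> %v to <vscale x 4 x float>
  ret <vscale x 4 x float> %c
}
```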

Reviewed By: sdesmalen, efriedma

Differential Revision: https://reviews.llvm.org/D87913
2020-09-23 11:53:53 +01:00
Sebastian Neubauer ca907bfb57 [AMDGPU] Insert waitcnt after returning from call
When memory operations are outstanding on function calls, either the
caller or the callee can insert a waitcnt to ensure that all reads are
finished.
Calls need some time to be executed, so if the callee inserts the
waitcnt, filling the instruction buffer and waiting for memory will be
interleaved, hiding some latency. This comes at the cost of having a
waitcnt inside functions that may not be needed as no memory operations
are outstanding.

For function calls, this is already implemented. The same principle
applies to returns: If the caller inserts a waitcnt after the call, the
callee does not have to wait and the return and memory operation can be
run in parallel.

This commit implements waiting in the caller after returning from a
function call.

Differential Revision: https://reviews.llvm.org/D87674
2020-09-23 12:17:59 +02:00
Piotr Sobczak 8d7fd73c3a [AMDGPU] Fix merging m0 inits
Fix incorrect merges of m0 inits in loops.

It was assumed that if a clobbering instruction appears in
the same block as an init, and the clobbering instruction
does not dominate the init, then it does not interfere with
the init.

This does not hold in the presence of loops, where the
clobbering instruction can interfere with the init in
another iteration.

To fix this, do not check for block equality and defer the
decision to the predecessor check.

Differential Revision: https://reviews.llvm.org/D87882
2020-09-23 09:13:43 +02:00
Albion Fung d7eb917a7c [PowerPC] Implementation of 128-bit Binary Vector Mod and Sign Extend builtins
This patch implements 128-bit Binary Vector Mod and Sign Extend builtins for Power10.

Differential: https://reviews.llvm.org/D87394#inline-815858
2020-09-23 01:18:14 -05:00
Philip Reames e1a3271ebb [AArch64] Teach analyzeBranch to remove branch equivalent to fallthrough
The motivation here is that MachineBlockPlacement relies on analyzeBranch to remove branches to fallthrough blocks when the branch is not fully analyzable. With the introduction of the FAULTING_OP pseudo for implicit null checking (see D87861), this case becomes important. Note that it's hard to otherwise exercise this path, as BranchFolding handles any fully analyzable branch sequence without using this interface.

p.s. For anyone who saw my comment in the original review, what I thought was an issue in BranchFolding originally turned out to simply be a bug in my patch. (Now fixed.)

Differential Revision: https://reviews.llvm.org/D88035
2020-09-22 14:38:27 -07:00
Congzhe Cao 4edb3d3646 [AArch64] Avoid pairing loads with same result reg
When pairing ldr instructions into an ldp instruction, we cannot pair two ldrs
when one destination register is a sub- or super-register of the other.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D86906
2020-09-22 16:25:08 -04:00
Amy Kwan 079757b551 [PowerPC] Implement Vector String Isolate Builtins in Clang/LLVM
This patch implements the vector string isolate (predicate and non-predicate
versions) builtins. The predicate builtins are custom selected within PPCISelDAGToDAG.

Differential Revision: https://reviews.llvm.org/D87671
2020-09-22 11:31:44 -05:00
Amy Kwan b3147058de [PowerPC] Implement the 128-bit Vector Divide Extended Builtins in Clang/LLVM
This patch implements the 128-bit vector divide extended builtins in Clang/LLVM.
These builtins map to the vdivesq and vdiveuq instructions respectively.

Differential Revision: https://reviews.llvm.org/D87729
2020-09-22 11:31:44 -05:00
Michael Liao 534f6e1718 [PeepholeOptimizer] Enhance the redundant COPY elimination.
- Eliminate redundant COPYs from the same register & subregister pair.

Differential Revision: https://reviews.llvm.org/D87939
2020-09-22 10:11:37 -04:00
Stefan Pintilie 7e78d89052 [PowerPC] Fix for compiler side issue in PCRelative Local Exec
Stop combining loads and stores with PPCISD::ADD_TLS before we can merge the
node with TLS_LOCAL_EXEC_MAT_ADDR. The issue is that
TLS_LOCAL_EXEC_MAT_ADDR cannot be selected by itself and requires the previous
ADD_TLS node that goes with it. However, we sometimes try to combine ADD_TLS
with loads and stores that come after it. If this happens then the ADD_TLS is
removed and TLS_LOCAL_EXEC_MAT_ADDR cannot be selected.

While this bug fix addresses the issue, it may not be ideal from a performance
perspective, as we may be able to add patterns that combine TLS_LOCAL_EXEC_MAT_ADDR,
ADD_TLS, and the load or store that follows all in one. However,
this is beyond the scope of this patch.

Reviewed By: NeHuang

Differential Revision: https://reviews.llvm.org/D88030
2020-09-22 08:28:06 -05:00
Meera Nakrani a3d0dce260 [ARM][TTI] Prevents constants in a min(max) or max(min) pattern from being hoisted when in a loop
Changes the TTI function getIntImmCostInst to take an additional Instruction parameter,
which lets us check whether the constant is part of a min(max())/max(min()) pattern that will match SSAT.
We can then mark the constant as free to prevent it being hoisted so SSAT can still be generated.
This required minor changes in some non-ARM backends to allow for the optional parameter to be included.
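For reference, a hypothetical sketch of the pattern in the icmp/select idiom; clamping to [-32768, 32767] matches SSAT #16, so the two clamp constants are marked free and will not be hoisted out of a containing loop:

```
define i32 @ssat16(i32 %x) {
  %c1  = icmp slt i32 %x, 32767
  %min = select i1 %c1, i32 %x, i32 32767
  %c2  = icmp sgt i32 %min, -32768
  %sat = select i1 %c2, i32 %min, i32 -32768
  ret i32 %sat
}
```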

Differential Revision: https://reviews.llvm.org/D87457
2020-09-22 11:54:10 +00:00
Esme-Yi c7ff6e0fe1 [NFC][PowerPC]Add tests for multiply-by-constant. 2020-09-22 10:25:02 +00:00
Jay Foad 892ef2e3c0 [AMDGPU] More codegen patterns for v2i16/v2f16 build_vector
It's simpler to do this at codegen time than to do ad-hoc constant
folding of machine instructions in SIFoldOperands.

Differential Revision: https://reviews.llvm.org/D88028
2020-09-22 10:41:38 +01:00
Sam Parker b4fa884a73 [ARM] Improve VPT predicate tracking
The VPTBlock has been modified to track the 'global' state of the
VPR, as well as the state for each block. Each object now just holds
a list of instructions that make up the block, while static structures
hold the predicate information. This enables global access for
querying how both a VPT block and individual instructions are
predicated. These changes now allow us, again, to handle more
complicated cases where multiple instructions build a predicate
and/or where the same predicate is used in multiple blocks.

It doesn't, however, get us back to before the tracking was 'fixed'
as some extra logic will be required to properly handle VPT
instructions. Currently a VPT could be effectively predicated because
of its inputs, but the existing logic will not detect that and so
will refuse to perform the transformation. This can be seen in
remat-vctp.ll test where we still don't perform the transform.

Differential Revision: https://reviews.llvm.org/D87681
2020-09-22 10:40:27 +01:00
Muhammad Omair Javaid 73a6a164b8 Revert "Reapply Revert "RegAllocFast: Rewrite and improve""
This reverts commit 55f9f87da2.

Breaks the following buildbots:
http://lab.llvm.org:8011/builders/lldb-arm-ubuntu/builds/4306
http://lab.llvm.org:8011/builders/lldb-aarch64-ubuntu/builds/9154
2020-09-22 14:40:06 +05:00
Sam Parker a0c1dcc318 [ARM] Remove MVEDomain from VLDR/STR of P0
Remove the domain from the instructions and create a shouldInspect
helper for LowOverheadLoops which queries it or a vpr operand.

Differential Revision: https://reviews.llvm.org/D87900
2020-09-22 09:05:50 +01:00
Amara Emerson e3f5046e44 [AArch64][GlobalISel] Merge selection of vector-vector G_ASHR/G_LSHR and support more cases.
The vector-immediate cases are handled elsewhere in an earlier commit.
2020-09-21 16:04:52 -07:00
Amara Emerson a513fdec90 [AArch64][GlobalISel] Add a post-legalize combine for lowering vector-immediate G_ASHR/G_LSHR.
In order to select the immediate forms using the imported patterns, we need to
lower them into new G_VASHR/G_VLSHR target generic ops. Add a combine to do this
by matching a build_vector of constant operands.

With this, we get selection for free.
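A sketch of the vector-immediate form the combine matches; at the generic MIR level the shift amount below becomes a G_BUILD_VECTOR of G_CONSTANTs:

```
define <4 x i32> @ashr_imm(<4 x i32> %v) {
  %r = ashr <4 x i32> %v, <i32 3, i32 3, i32 3, i32 3>
  ret <4 x i32> %r
}
```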
2020-09-21 16:04:52 -07:00
Amara Emerson 825203daae [AArch64][GlobalISel] Make <4 x s16> G_ASHR and G_LSHR legal.
Selection support for these is coming up.
2020-09-21 15:32:48 -07:00
Martin Storsjö 36c64af9d7 [CodeGen] [WinException] Only produce handler data at the end of the function if needed
If we are going to write handler data (that is written as variable
length data following the unwind info in .xdata), we need to
emit the handler data immediately, but for cases where no such
info is going to be written, skip emitting it right away. (Unwind
info for all remaining functions that haven't had it emitted
directly is emitted at the end.)

This does slightly change the ordering of sections (triggering a
bunch of updates to DebugInfo/COFF tests), but the change should be
benign.

This also matches GCC's assembly output, which doesn't output
.seh_handlerdata unless it actually is needed.

For ARM64, the unwind info can be packed into the runtime function
entry itself (leaving no data in the .xdata section at all), but
that can only be done if there's no follow-on data in the .xdata
section. If emission of the unwind info is triggered via
EmitWinEHHandlerData (or the .seh_handlerdata directive), which
implicitly switches to the .xdata section, there's a chance of the
caller wanting to pass further data there, so the packed format
can't be used in that case.

Differential Revision: https://reviews.llvm.org/D87448
2020-09-21 23:42:59 +03:00
Matt Arsenault 55f9f87da2 Reapply Revert "RegAllocFast: Rewrite and improve"
This reverts commit dbd53a1f0c.

Needed lldb test updates
2020-09-21 15:45:27 -04:00
Momchil Velikov 742250bf62 [ARM][CMSE] Issue an error if passing arguments through memory across
security boundary

It was never supported and that part was accidentally omitted when
upstreaming D76518.

Differential Revision: https://reviews.llvm.org/D86478

Change-Id: If6ba9506eb0431c87a1d42a38aa60e47ce263039
2020-09-21 17:26:10 +01:00
Denis Antrushin ee86688b81 [Statepoints][ISEL] gc.relocate uniquification should be based on SDValue, not IR Value.
When exporting statepoint results to virtual registers we try to avoid
generating exports for duplicated inputs. But we erroneously use
IR Value* to check if inputs are duplicated. Instead, we should use
SDValue, because even different IR values can get lowered to the same
SDValue.
I'm adding a (degenerate) test case which emphasizes the importance of this
feature for invoke statepoints.
If we fail to export only unique values we will end up with something
like this:

  %0 = STATEPOINT
  %1 = COPY %0

landing_pad:
  <use of %1>

And when the exceptional path is taken, %1 is left uninitialized (the COPY is
never executed).

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D87695
2020-09-21 19:44:46 +07:00
Paul Walker f3fa954b5b [SVE] Change definition of reduction ISD nodes to have an SVE vector result type.
The current nodes, AArch64::SMAXV_PRED for example, are defined to
return a NEON vector result. This is incorrect because they modify
the complete SVE register, so the nodes are changed to reflect that.

This patch also adds nodes for UADDV_PRED and SADDV_PRED, which
unifies the handling of all SVE reductions.

NOTE: Floating-point reductions are already implemented correctly,
so this patch is essentially making everything consistent with those.
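For illustration, a reduction that lowers through SMAXV_PRED (assuming the ACLE-style SVE intrinsic below); the node's result is now modelled as a full SVE vector with the scalar in element 0:

```
declare i32 @llvm.aarch64.sve.smaxv.nxv4i32(<vscale x 4 x i1>, <vscale x 4 x i32>)

define i32 @smaxv(<vscale x 4 x i1> %pg, <vscale x 4 x i32> %v) {
  %r = call i32 @llvm.aarch64.sve.smaxv.nxv4i32(<vscale x 4 x i1> %pg, <vscale x 4 x i32> %v)
  ret i32 %r
}
```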

Differential Revision: https://reviews.llvm.org/D87843
2020-09-21 13:16:28 +01:00
Paul Walker 6457455248 [SVE] Use NEON for extract_vector_elt when the index is in range.
Patch also adds missing patterns for unpacked vector types and
extracts of element zero.
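For example, lane 1 of an nxv4i32 vector lies within the low 128 bits, so the extract can use a NEON lane move (illustrative sketch):

```
define i32 @extract_lane1(<vscale x 4 x i32> %v) {
  %e = extractelement <vscale x 4 x i32> %v, i32 1
  ret i32 %e
}
```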

Differential Revision: https://reviews.llvm.org/D87842
2020-09-21 13:12:28 +01:00
David Green f4c5cadbcb [ARM] Select f32 constants with vmov.f16
This adds lowering for f32 values using vmov.f16, which zeroes the
top bits whilst setting the lower bits to a pattern. This range of
values does not often come up, except where an f16 constant value has
been converted to an f32.

Differential Revision: https://reviews.llvm.org/D87790
2020-09-21 11:10:47 +01:00
Sam Parker 13c73632c7 [NFC][ARM] More tail predication tests.
Add mir tests for use/def of P0.
2020-09-21 10:58:05 +01:00
Lucas Prates 53d238a961 [CodeGen] Fixing inconsistent ABI mangling of values in SelectionDAGBuilder
SelectionDAGBuilder was inconsistently mangling values based on ABI
calling conventions when getting them through copyFromRegs, causing
duplicate value type conversions for function arguments. The check for
whether mangling was required was based on the value's originating
instruction and was performed outside of, and in spite of, the regular
Calling Convention Lowering.

The issue could be observed in a scenario such as:

```
%arg1 = load half, half* %const, align 2
%arg2 = call fastcc half @someFunc()
call fastcc void @otherFunc(half %arg1, half %arg2)
; Here, %arg2 was incorrectly mangled twice, as the CallConv data from
; the call to @someFunc() was taken into consideration for the check
; when getting the value for processing the call to @otherFunc(...),
; after the proper conversion had taken place when lowering the return
; value of the first call.
```

This patch fixes the issue by disregarding the Calling Convention
information for such copyFromRegs, making sure the ABI mangling is
properly contained in the Calling Convention Lowering.

This fixes Bugzilla #47454.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D87844
2020-09-21 10:05:34 +01:00
Qiu Chaofan 1d782c2987 [PowerPC] Pass nofpexcept flag to custom lowered constrained ops
This is a follow-up of D86605. For a strict FP DAG node, if its FP
exception behavior metadata is 'ignore', it should carry the nofpexcept flag.
But during custom lowering, this flag isn't passed down.

This is also seen on the X86 target.
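A minimal sketch: with "fpexcept.ignore" the DAG node should carry nofpexcept, and the flag must survive any custom lowering of the strict op:

```
declare float @llvm.experimental.constrained.fadd.f32(float, float, metadata, metadata)

define float @strict_fadd(float %a, float %b) strictfp {
  %r = call float @llvm.experimental.constrained.fadd.f32(float %a, float %b, metadata !"round.dynamic", metadata !"fpexcept.ignore")
  ret float %r
}
```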

Reviewed By: uweigand

Differential Revision: https://reviews.llvm.org/D87390
2020-09-21 10:44:25 +08:00
Fangrui Song d06485685d [XRay] Change mips to use version 2 sled (PC-relative address)
Follow-up to D78590. All targets use PC-relative addresses now.

Reviewed By: atanasyan, dberris

Differential Revision: https://reviews.llvm.org/D87977
2020-09-20 17:59:57 -07:00
Craig Topper a74b1faba2 [X86] Make reduceMaskedLoadToScalarLoad/reduceMaskedStoreToScalarStore work for avx512 after type legalization.
The scalar elements of the vXi1 build_vector will have been type legalized to i8 by padding with 0s. So we can't check for all ones. Instead we should just look at bit 0 of the constant.
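An illustrative case: a constant mask with only the first lane active can be narrowed to a plain scalar load, and after legalization for AVX512 the mask constant's elements are i8 values, so only bit 0 of each is meaningful:

```
declare <4 x float> @llvm.masked.load.v4f32.p0v4f32(<4 x float>*, i32, <4 x i1>, <4 x float>)

define <4 x float> @one_lane(<4 x float>* %p) {
  %v = call <4 x float> @llvm.masked.load.v4f32.p0v4f32(<4 x float>* %p, i32 4, <4 x i1> <i1 true, i1 false, i1 false, i1 false>, <4 x float> zeroinitializer)
  ret <4 x float> %v
}
```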

Differential Revision: https://reviews.llvm.org/D87863
2020-09-20 13:54:20 -07:00
Craig Topper c89b3af0e3 [X86] Pre-commit test cases for D87863. NFC 2020-09-20 13:53:05 -07:00
Craig Topper 4e8c028158 [X86] Stop reduceMaskedLoadToScalarLoad/reduceMaskedStoreToScalarStore from creating scalar i64 load/stores in 32-bit mode
If we emit a scalar i64 load/store it will get type legalized to two i32 load/stores.
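For illustration, a single-active-lane v2i64 masked store; on a 32-bit target the transform now skips this rather than emit a scalar i64 store:

```
declare void @llvm.masked.store.v2i64.p0v2i64(<2 x i64>, <2 x i64>*, i32, <2 x i1>)

define void @one_lane_store(<2 x i64> %v, <2 x i64>* %p) {
  call void @llvm.masked.store.v2i64.p0v2i64(<2 x i64> %v, <2 x i64>* %p, i32 8, <2 x i1> <i1 true, i1 false>)
  ret void
}
```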

Differential Revision: https://reviews.llvm.org/D87862
2020-09-20 13:46:59 -07:00
Craig Topper 9b1c98c0fb [X86] Add 32-bit command lines to masked_store.ll and masked_load.ll 2020-09-20 13:46:59 -07:00
David Green 29bd8ea110 [ARM] Constant fold VMOVrh
This adds simple constant folding for VMOVrh, to constant fold fp16
constants to integer values. It can help especially with soft calling
conventions, but some of the results are not optimal as we end up
loading using a vldr. This will be improved in a follow-up patch.
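A sketch of a typical source (hypothetical): under a soft-float calling convention the f16 return value travels in a GPR, producing a VMOVrh of a constant that can now fold to the integer 0x3C00:

```
define half @const_one() {
  ret half 0xH3C00    ; fp16 1.0, bit pattern 0x3C00
}
```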

Differential Revision: https://reviews.llvm.org/D87789
2020-09-20 21:32:51 +01:00
Simon Pilgrim 0bfeede669 [X86][SSE] Fold EXTEND_VECTOR_INREG(EXTRACT_SUBVECTOR(EXTEND(X),0)) -> EXTEND_VECTOR_INREG(X) 2020-09-20 18:39:12 +01:00
Simon Pilgrim bb0078e591 [X86][SSE] Fold SIGN_EXTEND(SIGN_EXTEND_VECTOR_INREG(X)) -> SIGN_EXTEND_VECTOR_INREG(X)
It should be possible to make this generic, but we're not great at checking legality of *_EXTEND_VECTOR_INREG ops so I'm conservatively putting this inside X86ISelLowering.cpp
2020-09-20 18:39:12 +01:00
Simon Pilgrim 15c8306056 [X86][SSE] Fold EXTEND_VECTOR_INREG(EXTEND_VECTOR_INREG(X)) -> EXTEND_VECTOR_INREG(X)
It should be possible to make this generic, but we're not great at checking legality of *_EXTEND_VECTOR_INREG ops so I'm conservatively putting this inside X86ISelLowering.cpp
2020-09-20 16:33:02 +01:00
Simon Pilgrim a0c8793ce6 [X86][SSE] Enable ZERO_EXTEND_VECTOR_INREG shuffle combining on SSE41 targets.
Allows ZERO_EXTEND_VECTOR_INREG to be shuffle combined on all targets where it is legal.
2020-09-20 16:05:10 +01:00
Amara Emerson 5a50f8b39f [AArch64][GlobalISel] Add legalization and selection support for <4 x s16> G_SHL. 2020-09-18 23:32:01 -07:00
Craig Topper 58ecbbcdcd [X86] Fix copy paste mistake in @ccnp flag.
We were treating @ccp and @ccnp the same.
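A hypothetical sketch using the x86 flag-output constraint syntax; @ccnp should now produce setnp rather than @ccp's setp:

```
define i32 @flag_np(i32 %a, i32 %b) {
  %r = call i32 asm "cmpl $2, $1", "={@ccnp},r,r"(i32 %a, i32 %b)
  ret i32 %r
}
```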
2020-09-18 21:28:01 -07:00
Craig Topper 5e6baf78e5 [X86] Invert the compares in inline-asm-flag-output.ll so that the setcc instruction condition matches the test name. NFC
Also add nounwind to the tests to remove cfi directives.
2020-09-18 21:23:53 -07:00
Eric Christopher dbd53a1f0c Temporarily Revert "RegAllocFast: Rewrite and improve"
as it's breaking a few tests in the lldb test suite.

Bot: http://lab.llvm.org:8011/builders/lldb-arm-ubuntu/builds/4226/steps/test/logs/stdio

This reverts commit c8757ff3aa.
2020-09-18 18:11:21 -07:00