Commit Graph

51809 Commits

Author SHA1 Message Date
Stanislav Mekhanoshin 93f15c922f [AMDGPU] gfx1010 loop alignment
Differential Revision: https://reviews.llvm.org/D61529

llvm-svn: 359935
2019-05-03 21:17:29 +00:00
Mandeep Singh Grang 5dc8aeb26d [COFF, ARM64] Fix ABI implementation of struct returns
Summary:
Refer the ABI doc at: https://docs.microsoft.com/en-us/cpp/build/arm64-windows-abi-conventions?view=vs-2019#return-values

Related clang patch: D60349

Reviewers: rnk, efriedma, TomTan, ssijaric

Reviewed By: rnk, efriedma

Subscribers: mstorsjo, javed.absar, kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60348

llvm-svn: 359934
2019-05-03 21:12:36 +00:00
Brian Cain 3428c9daef [hexagon] change AsmParser assertion to error
For immediates that can't be evaluated in assembler-mapped instructions, we
should return 'invalid operand' instead of assert.

llvm-svn: 359905
2019-05-03 16:50:38 +00:00
Craig Topper a8f3840c62 [X86] Allow assembly parser to accept x/y/z suffixes on non-memory vfpclassps/pd and on memory forms in intel syntax
The x/y/z suffix is needed to disambiguate the memory form in at&t syntax since no xmm/ymm/zmm register is mentioned.

But we should also allow it for the register and broadcast forms where its not needed for consistency. This matches gas.

The printing code will still only use the suffix for the memory form where it is needed.

llvm-svn: 359903
2019-05-03 16:15:15 +00:00
Simon Pilgrim b323d5ec7c [X86] LowerToHorizontalOp - Tidyup calls to getHopForBuildVector. NFCI.
Merge the if() tests for the various HADD/SUB + Subtarget tests

llvm-svn: 359901
2019-05-03 15:56:06 +00:00
Matt Arsenault 657ef48a88 AMDGPU: Select VOP3 form of sub
The VOP3 form should always be the preferred selection form to be
shrunk later.

The r600 sub test needs to be split out because it asserts on the
arguments in the new test during the calling convention lowering.

llvm-svn: 359899
2019-05-03 15:37:07 +00:00
Matt Arsenault cfd0ca38b0 AMDGPU: Support shrinking add with FI in SIFoldOperands
Avoids test regression in a future patch

llvm-svn: 359898
2019-05-03 15:21:53 +00:00
Matt Arsenault 344d68d3c9 AMDGPU: Remove redundant patterns for shifts
llvm-svn: 359895
2019-05-03 15:08:36 +00:00
Matt Arsenault ada33314a2 AMDGPU: Remove redundant patterns for sub
There were 2 patterns for sub, one selecting to sub and one to
subrev. Only one of these will succeed, so remove the reversed one.

llvm-svn: 359894
2019-05-03 15:08:35 +00:00
Matt Arsenault 0446fbe45e AMDGPU: Replace shrunk instruction with dummy implicit_def
This was broken if the original operand was killed. The kill flag
would appear on both instructions, and fail the verifier. Keep the
kill flag, but remove the operands from the old instruction. This has
an added benefit of really reducing the use count for future folds.

Ideally the pass would be structured more like what PeepholeOptimizer
does to avoid this hack to avoid breaking instruction iterators.

llvm-svn: 359891
2019-05-03 14:40:10 +00:00
Simon Pilgrim bfdd0f75a8 [X86] Remove repeated variables. NFCI.
llvm-svn: 359889
2019-05-03 14:37:00 +00:00
Simon Pilgrim aa49be4926 Avoid cppcheck operator precedence warnings. NFCI.
Prefer ((X & Y) ? A : B) to (X & Y ? A : B)

llvm-svn: 359884
2019-05-03 13:50:38 +00:00
Matt Arsenault 2c8936fd26 AMDGPU: Fix incorrect commute with sub when folding immediates
When a fold of an immediate into a sub/subrev required shrinking the
instruction, the wrong VOP2 opcode was used. This was using the VOP2
equivalent of the original instruction, not the commuted instruction
with the inverted opcode.

llvm-svn: 359883
2019-05-03 13:42:56 +00:00
Simon Pilgrim a359ef192b [X86] LowerMULH - remove unused Lo/Hi vector indices. NFCI.
Leftover from before we had the extract128BitVector helpers.

llvm-svn: 359871
2019-05-03 10:32:07 +00:00
Simon Pilgrim 88f9117168 Reduce variable scope to just the if() block its actually used in. NFCI.
llvm-svn: 359869
2019-05-03 10:13:41 +00:00
Craig Topper d724360695 [X86] Add more one checks to masked compare patterns that were missed in r358358.
This covers the patterns we use for widening 128/256 comparisons to 512-bit when
AVX512VL isn't supported.

llvm-svn: 359863
2019-05-03 07:14:05 +00:00
Eli Friedman 7238353848 [AArch64][MC] Reject "add x0, x1, w2, lsl #1" etc.
Looks like just a minor oversight in the parsing code.

Fixes https://bugs.llvm.org/show_bug.cgi?id=41504.

Differential Revision: https://reviews.llvm.org/D60840

llvm-svn: 359855
2019-05-03 00:59:52 +00:00
Craig Topper bf29238e1a [X86] Remove LEA16r references from X86FixupLEAs. NFCI
As far as I know, we never emit LEA16r

llvm-svn: 359840
2019-05-02 22:46:23 +00:00
Craig Topper e1e38d4248 [X86] Correct the register class for specific mask register constraints in getRegForInlineAsmConstraint when the VT is a scalar type
The default impementation in the base class for TargetLowering::getRegForInlineAsmConstraint doesn't work for mask registers when the VT is a scalar type integer types since the only legal mask types are vXi1. So we end up just getting whatever the first register class that contains the register. Currently this appears to be VK1, but its really dependent on the order tablegen outputs the register classes.

Some code in the caller ends up looking up the type for this register class and find v1i1 then generates a copyfromreg from the physical k-register with the v1i1 type. Then it generates an any_extend from v1i1 to the scalar VT which isn't legal. This bad any_extend sticks around until isel where it selects a MOVZX32rr8 with a v1i1 input or maybe a i8 input. Not sure but eventually we pick up a copy from VK1 to GR8 in MachineIR which isn't supported. This leads to a failure in physical register copying.

This patch uses the scalar type to find a VK class of the right size. In the attached test case this will be VK16. This causes a bitcast from vk16 to i16 to be generated instead of an any_extend. This will be properly iseled to a VK16 to GR32 copy and a GR32->GR16 extract_subreg.

Fixes PR41678

Differential Revision: https://reviews.llvm.org/D61453

llvm-svn: 359837
2019-05-02 22:26:40 +00:00
Evandro Menezes 111df108e6 [AArch64] Update for Exynos
Fix the forwarding of multiplication results for Exynos M4.

llvm-svn: 359834
2019-05-02 22:01:39 +00:00
Craig Topper 47d8865a38 [X86] Remove string literal from an if. NFC
This if used to be an assert that got refactored into an if, but left the string literal behind.

Fixes PR41718

llvm-svn: 359833
2019-05-02 21:57:18 +00:00
Sanjay Patel 284472be6d [SelectionDAG] remove constant folding limitations based on FP exceptions
We don't have FP exception limits in the IR constant folder for the binops (apart from strict ops),
so it does not make sense to have them here in the DAG either. Nothing else in the backend tries
to preserve exceptions (again outside of strict ops), so I don't see how this could have ever
worked for real code that cares about FP exceptions.

There are still cases (examples: unary opcodes in SDAG, FMA in IR) where we are trying (at least
partially) to preserve exceptions without even asking if the target supports FP exceptions. Those
should be corrected in subsequent patches.

Real support for FP exceptions requires several changes to handle the constrained/strict FP ops.

Differential Revision: https://reviews.llvm.org/D61331

llvm-svn: 359791
2019-05-02 14:47:59 +00:00
Simon Pilgrim df8daf0ef4 [X86][SSE] lowerAddSubToHorizontalOp - enable ymm extraction+fold
Limiting scalar hadd/hsub generation to the lowest xmm looks to be unnecessary - we will be extracting one upper xmm whatever, and we can remove a shuffle by using the hop which is inline with what shouldUseHorizontalOp expects to happen anyway.

Testing on btver2 (the main target for fast-hops) shows this is beneficial even for float ops where we have a 'shuffle' to extract the float result:
https://godbolt.org/z/0R-U-K

Differential Revision: https://reviews.llvm.org/D61426

llvm-svn: 359786
2019-05-02 14:00:55 +00:00
Simon Pilgrim 9fa56f7829 [X86][SSE] Move shouldUseHorizontalOp inside isHorizontalBinOp. NFCI.
Matches what we do for lowerAddSubToHorizontalOp and will make it easier to peek through subvectors to help fix PR39921

llvm-svn: 359782
2019-05-02 12:18:24 +00:00
Diana Picus 1136ea2d44 [ARM GlobalISel] Fixup r359768
Get rid of local variable used only in assertion.

llvm-svn: 359772
2019-05-02 10:08:29 +00:00
Diana Picus 06a61ccc42 [ARM GlobalISel] Select extensions to < 32 bits
Select G_SEXT and G_ZEXT with destination types smaller than 32 bits in
the exact same way as 32 bits. This overwrites the higher bits, but that
should be ok since all legal users of types smaller than 32 bits ignore
those bits anyway.

llvm-svn: 359768
2019-05-02 09:28:00 +00:00
Diana Picus 53bcf6f2e7 [ARM GlobalISel] Legalize extensions to < 32 bits
Make it legal to extend from e.g. s1 to s8 or s16.

llvm-svn: 359766
2019-05-02 09:21:46 +00:00
Kang Zhang 1a0d6d6899 [NFC][PowerPC] Return early if the element type is not byte-sized in combineBVOfConsecutiveLoads
Summary:
Based on the Eli Friedman's comments in https://reviews.llvm.org/D60811 , we'd better return early if the element type is not byte-sized in `combineBVOfConsecutiveLoads`.

Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D61076

llvm-svn: 359764
2019-05-02 08:15:13 +00:00
Stanislav Mekhanoshin 64399da8b8 [AMDGPU] gfx1010 lost VOP2 forms of some add/sub
Add legalization of V_ADD_I32, V_SUB_I32, V_SUBREV_I32.

Differential Revision:

llvm-svn: 359757
2019-05-02 04:26:35 +00:00
Stanislav Mekhanoshin 5cf8167735 [AMDGPU] gfx1010 allows VOP3 to have a literal
Differential Revision: https://reviews.llvm.org/D61413

llvm-svn: 359756
2019-05-02 04:01:39 +00:00
Stanislav Mekhanoshin f2baae0abb [AMDGPU] gfx1010 constant bus limit
Constant bus limit has increased to 2 with GFX10.

Differential Revision: https://reviews.llvm.org/D61404

llvm-svn: 359754
2019-05-02 03:47:23 +00:00
Craig Topper b929a0062e [X86] Remove the redundant suffix in vfpclassp[d,s]'s broadcasting variant
The broadcasting variant for instruction vfpclassp[d,s] shouldn't use suffix q/l. So remove them from the template.

Patch by Pengfei Wang

Differential Revision: https://reviews.llvm.org/D61295

llvm-svn: 359753
2019-05-02 03:25:50 +00:00
Jessica Paquette a3843fe6f4 [GlobalISel][AArch64] Use fmov for G_FCONSTANT when possible
This adds support for using fmov rather than a standard mov to materialize
G_FCONSTANT when it's safe to do so.

Update arm64-fast-isel-materialize.ll and select-constant.mir to show that the
selection is correct.

llvm-svn: 359734
2019-05-01 22:39:43 +00:00
Simon Pilgrim 9f04d97cd7 [X86][SSE] Fold scalar horizontal add/sub for non-0/1 element extractions
We already perform horizontal add/sub if we extract from elements 0 and 1, this patch extends it to non-0/1 element extraction indices (as long as they are from the lowest 128-bit vector).

Differential Revision: https://reviews.llvm.org/D61263

llvm-svn: 359707
2019-05-01 17:13:35 +00:00
Stanislav Mekhanoshin 3b7925f035 [AMDGPU] gfx1010 GCNRegBankReassign pass
Reassign registers to reduce register bank conflicts.

Differential Revision: https://reviews.llvm.org/D61344

llvm-svn: 359704
2019-05-01 16:49:31 +00:00
Stanislav Mekhanoshin c29d491596 [AMDGPU] gfx1010 GCNNSAReassign pass
Convert NSA into non-NSA images.

Differential Revision: https://reviews.llvm.org/D61341

llvm-svn: 359700
2019-05-01 16:40:49 +00:00
Stanislav Mekhanoshin 692560dc98 [AMDGPU] gfx1010 MIMG implementation
Differential Revision: https://reviews.llvm.org/D61339

llvm-svn: 359698
2019-05-01 16:32:58 +00:00
Stanislav Mekhanoshin a224f68a10 [AMDGPU] gfx1010 DS implementation
Differential Revision: https://reviews.llvm.org/D61332

llvm-svn: 359696
2019-05-01 16:11:11 +00:00
Simon Pilgrim f5bdff7747 Fix 80 column violation. NFCI.
llvm-svn: 359694
2019-05-01 16:01:49 +00:00
Simon Pilgrim 6711b9699a [X86][SSE] Add demanded elts support X86ISD::PMULDQ\PMULUDQ
Add to SimplifyDemandedVectorEltsForTargetNode and SimplifyDemandedBitsForTargetNode

llvm-svn: 359686
2019-05-01 14:50:50 +00:00
Simon Pilgrim 3d6899e369 [X86][SSE] Add SSE vector shift support to SimplifyDemandedVectorEltsForTargetNode vector splitting
llvm-svn: 359680
2019-05-01 13:51:09 +00:00
Simon Pilgrim ba372c6e62 [X86][SSE] Split 512-bit -> 128-bit vector directly in SimplifyDemandedVectorEltsForTargetNode
llvm-svn: 359678
2019-05-01 12:48:42 +00:00
Simon Pilgrim 951a6b4579 [X86][SSE] Add 512-bit vector support to SimplifyDemandedVectorEltsForTargetNode vector splitting
llvm-svn: 359677
2019-05-01 12:37:41 +00:00
Simon Pilgrim 37c2419cc7 [X86][SSE] Add X86ISD::PACKSS\PACKUS to SimplifyDemandedVectorEltsForTargetNode vector splitting
llvm-svn: 359673
2019-05-01 11:29:36 +00:00
Simon Pilgrim 3353cee06c [X86][SSE] Add X86ISD::UNPCKL\UNPCK to SimplifyDemandedVectorEltsForTargetNode vector splitting
llvm-svn: 359670
2019-05-01 11:08:03 +00:00
Simon Pilgrim f7b978a71b [X86][SSE] Move extract_subvector(pshufb) fold to SimplifyDemandedVectorEltsForTargetNode
This lets us hit more cases than combineExtractSubvector and allows us reuse more code.

llvm-svn: 359669
2019-05-01 10:58:38 +00:00
Simon Pilgrim a7d107a3e0 [X86] SimplifyDemandedVectorEltsForTargetNode - pull out vector halving code. NFCI.
Pull out the HADD/HSUB code to halve vector widths if the upper half isn't used - prep work to adding support for other opcodes.

llvm-svn: 359667
2019-05-01 10:38:10 +00:00
Simon Pilgrim 99eefe94b5 [X86][SSE] Extract i1 elements from vXi1 bool vectors
This is an alternative to D59669 which more aggressively extracts i1 elements from vXi1 bool vectors using a MOVMSK.

Differential Revision: https://reviews.llvm.org/D61189

llvm-svn: 359666
2019-05-01 10:02:22 +00:00
Craig Topper dd66acef96 [X86FixupLEAs] Hoist the calls to isLEA out of the 3 separate functions and put it in the basic block instruction loop. NFC
Now need to check it 3 different times. Just do it once at the top of the loop.

llvm-svn: 359658
2019-05-01 06:53:03 +00:00
David L. Jones fccb505f0f Revert "[llvm] r359313 - [PowerPC] Update P9 vector costs for insert/extract element"
This causes segfaults during optimized builds. More details, including a reproducer, are on the llvm-commits thread for r359313.

llvm-svn: 359648
2019-05-01 05:01:03 +00:00