Commit Graph

191 Commits

Author SHA1 Message Date
Michael Kuperstein e6d59fdca5 [X86] Add costs for non-AVX512 single-source permutation integer shuffles
Differential Revision: https://reviews.llvm.org/D29416

llvm-svn: 293932
2017-02-02 20:27:13 +00:00
Jonas Paulsson 8e2f948ef0 [TargetTransformInfo] Refactor and improve getScalarizationOverhead()
Refactoring to remove duplications of this method.

New method getOperandsScalarizationOverhead() that looks at the present unique
operands and add extract costs for them. Old behaviour was to just add extract
costs for one operand of the type always, which still happens in
getArithmeticInstrCost() if no operands are provided by the caller.

This is a good start of improving on this, but there are more places
that can be improved by using getOperandsScalarizationOverhead().

Review: Hal Finkel
https://reviews.llvm.org/D29017

llvm-svn: 293155
2017-01-26 07:03:25 +00:00
Mohammed Agabaria 20caee95e1 [X86] enable memory interleaving for X86\SLM arch.
Differential Revision: https://reviews.llvm.org/D28547

llvm-svn: 293040
2017-01-25 09:14:48 +00:00
Simon Pilgrim 3e5b525699 Remove trailing whitespace. NFCI.
llvm-svn: 292613
2017-01-20 15:15:59 +00:00
Simon Pilgrim 0da4d2bc03 [CostModel][X86] Removed unused cost. NFCI.
SHL v8i32 is already handled in the SSE41 cost table

llvm-svn: 292612
2017-01-20 15:14:38 +00:00
Simon Pilgrim 6ed996cdf0 [CostModel][X86] Fix AVX512BW vector shift costs for vXi16 types
We already have patterns in place to support 128/256-bit shifts without AVX512VL

llvm-svn: 292077
2017-01-15 20:44:00 +00:00
Simon Pilgrim d419b73a42 [CostModel][X86] Updated vXi64 ASHR costs on AVX512 targets now that D28604 has landed
llvm-svn: 292023
2017-01-14 19:24:23 +00:00
Simon Pilgrim 5a81fefad3 [X86][AVX512BW] Vectorize v64i8 vector shifts
Differential Revision: https://reviews.llvm.org/D28447

llvm-svn: 291665
2017-01-11 10:36:51 +00:00
Mohammed Agabaria 2c96c43388 [X86] updating TTI costs for arithmetic instructions on X86\SLM arch.
updated instructions:
pmulld, pmullw, pmulhw, mulsd, mulps, mulpd, divss, divps, divsd, divpd, addpd and subpd.

special optimization case which replaces pmulld with pmullw\pmulhw\pshuf seq. 
In case if the real operands bitwidth <= 16.

Differential Revision: https://reviews.llvm.org/D28104 

llvm-svn: 291657
2017-01-11 08:23:37 +00:00
Simon Pilgrim 9c58950eeb [CostModel][X86] Fixed vXi8 uniform shift costs.
The 'fast' costs should only work for shifts by uniform constants (uniform non-constant are lowered using the slow default implementation).

Logical shifts were not taking into account that we must mask the psrlw result, so the costs needed to be doubled.

Added missing AVX2/AVX512BW costs as well.

llvm-svn: 291391
2017-01-08 14:14:36 +00:00
Simon Pilgrim 1fa5487c05 [CostModel][X86] Moved legal uniform shift costs earlier.
XOP was prematurely matching, doubling the cost of ashr/lshr uniform shifts.

llvm-svn: 291390
2017-01-08 13:12:03 +00:00
Simon Pilgrim 9681c407b4 [CostModel][X86] Update SSE41/AVX1 vXi32 SHL costs
SSE41 provides pmulld which allows the simpler pslld/paddd/cvttps2dq/pmulld pattern than SSE2's use of pmuludq.

llvm-svn: 291372
2017-01-07 22:27:43 +00:00
Simon Pilgrim a470296367 [CostModel][X86] Fix AVX2 v16i16 shift 'splat' costs.
llvm-svn: 291366
2017-01-07 22:08:09 +00:00
Simon Pilgrim 82e3e05fe2 [CostModel][X86] Match 256-bit vector shift 'splat' costs for AVX2 and above
We were matching against general vector shift costs before the uniform splat costs

llvm-svn: 291365
2017-01-07 21:47:10 +00:00
Simon Pilgrim e70644dab7 [CostModel][X86] Generalized cost calculation of SHL by constant -> MUL conversion.
llvm-svn: 291364
2017-01-07 21:33:00 +00:00
Simon Pilgrim 725997154d [CostModel][X86] Merge separate AVX1 cost LUTs. NFCI.
llvm-svn: 291355
2017-01-07 18:19:25 +00:00
Simon Pilgrim a4109d6433 [CostModel][AVX512BW] Add v32i16 vector shift costs for avx512bw targets.
llvm-svn: 291354
2017-01-07 17:54:10 +00:00
Simon Pilgrim df7de7a87e [CostModel][X86] Added missing AVX2 arithmetic costs.
Allows us to correctly fall through to the lower AVX1 costs if look up failed.

llvm-svn: 291353
2017-01-07 17:27:39 +00:00
Simon Pilgrim 100eae1ee0 [CostModel][X86] Reordered AVX1 arithmetic cost LUT into descending target order. NFCI.
llvm-svn: 291352
2017-01-07 17:03:51 +00:00
Simon Pilgrim a1b8e2c725 [X86][AVX512] Use lowerShuffleAsRepeatedMaskAndLanePermute for non-VBMI v64i8 shuffles (PR31470)
llvm-svn: 291347
2017-01-07 15:37:50 +00:00
Simon Pilgrim d8333372bc [CostModel][X86] Fix 512-bit SDIV/UDIV 'big' costs.
Set the costs on the lowest target that supports the type.

llvm-svn: 291229
2017-01-06 11:12:53 +00:00
Simon Pilgrim aa186c632d [CostModel][X86] Tidyup arithmetic costs code. NFCI.
Remove unnecessary braces, remove one use variables and keep LUTs to similar naming convention.

llvm-svn: 291187
2017-01-05 22:48:02 +00:00
Simon Pilgrim 4c050c2190 [CostModel][X86] Move vXi32 MUL costs into existing tables. NFCI.
llvm-svn: 291165
2017-01-05 19:42:43 +00:00
Simon Pilgrim 6f72eba606 Remove trailing whitespace. NFCI.
llvm-svn: 291163
2017-01-05 19:24:25 +00:00
Simon Pilgrim 5b06e4d319 [CostModel][X86] Reordered SSE42 arithmetic cost LUT into descending order. NFCI.
llvm-svn: 291162
2017-01-05 19:19:39 +00:00
Simon Pilgrim a8bf97569a [CostModel][X86] Move vXi64 MUL costs into existing tables. NFCI.
Removes need for yet another LUT.

llvm-svn: 291158
2017-01-05 19:01:50 +00:00
Simon Pilgrim 430d34fc14 [CostModel][X86] Strip unused 256-bit vector shift costs. NFCI.
Remove SSE2 256-bit entries - AVX targets will have used the SSE42 costs instead.

llvm-svn: 291152
2017-01-05 18:36:48 +00:00
Simon Pilgrim b01e844241 [CostModel][X86] Include the cost of 256-bit upper subvector extract/insertion in AVX1 v4i64 MUL
Matches other MUL/ADD/SUB 256-bit case on AVX1

llvm-svn: 291149
2017-01-05 18:20:25 +00:00
Simon Pilgrim f74700aa8c [CostModel][X86] Merged SK_PermuteSingleSrc/SK_PermuteTwoSrc into common shuffle cost LUTs. NFCI.
llvm-svn: 291146
2017-01-05 17:56:19 +00:00
Simon Pilgrim bca02f9e20 [CostModel][X86] Add support for broadcast shuffle costs
Currently only for broadcasts with input and output of the same width.

Differential Revision: https://reviews.llvm.org/D27811

llvm-svn: 291122
2017-01-05 15:56:08 +00:00
Simon Pilgrim a62395a4bd [CostModel][X86] Pulled out common type legalization code
llvm-svn: 291109
2017-01-05 14:33:32 +00:00
Mohammed Agabaria 23599ba794 Currently isLikelyComplexAddressComputation tries to figure out if the given stride seems to be 'complex' and need some extra cost for address computation handling.
This code seems to be target dependent which may not be the same for all targets.
Passed the decision whether the given stride is complex or not to the target by sending stride information via SCEV to getAddressComputationCost instead of 'IsComplex'.

Specifically at X86 targets we dont see any significant address computation cost in case of the strided access in general.

Differential Revision: https://reviews.llvm.org/D27518

llvm-svn: 291106
2017-01-05 14:03:41 +00:00
Mohammed Agabaria 189e2d29ba [Test Commit] fixing some format issue in X86TTI to match clang-format output.
llvm-svn: 291095
2017-01-05 09:51:02 +00:00
Simon Pilgrim bb895f3e9c [CostModel][X86] Updated vXi8 and vXi16 Reverse/Alternate shuffle costs
Actual codegen is much better than the extract+insert patterns that was assumed.

llvm-svn: 290962
2017-01-04 14:01:33 +00:00
Simon Pilgrim 939b8cd708 [X86] Merged Reverse/Alternate shuffle cost tables. NFCI.
As discussed on D27811, merged the shuffle cost LUTs and use the shuffle kind to perform the lookup instead of the ISD opcode.

llvm-svn: 290956
2017-01-04 12:08:41 +00:00
Elena Demikhovsky d96200d60a Fixed shuffle-reverse cost on AVX-512.
(This changed was approved in https://reviews.llvm.org/D28118, but Simon asked to submit it separately).

llvm-svn: 290812
2017-01-02 11:44:10 +00:00
Elena Demikhovsky 21706cbd24 AVX-512 Loop Vectorizer: Cost calculation for interleave load/store patterns.
X86 target does not provide any target specific cost calculation for interleave patterns.It uses the common target-independent calculation, which gives very high numbers. As a result, the scalar version is chosen in many cases. The situation on AVX-512 is even worse, since we have 3-src shuffles that significantly reduce the cost.

In this patch I calculate the cost on AVX-512. It will allow to compare interleave pattern with gather/scatter and choose a better solution (PR31426).

* Shiffle-broadcast cost will be changed in Simon's upcoming patch.

Differential Revision: https://reviews.llvm.org/D28118

llvm-svn: 290810
2017-01-02 10:37:52 +00:00
Simon Pilgrim 081abbb164 [X86][SSE] Improve lowering of vXi64 multiplies
As mentioned on PR30845, we were performing our vXi64 multiplication as:

AloBlo = pmuludq(a, b);
AloBhi = pmuludq(a, psrlqi(b, 32));
AhiBlo = pmuludq(psrlqi(a, 32), b);
return AloBlo + psllqi(AloBhi, 32)+ psllqi(AhiBlo, 32);

when we could avoid one of the upper shifts with:

AloBlo = pmuludq(a, b);
AloBhi = pmuludq(a, psrlqi(b, 32));
AhiBlo = pmuludq(psrlqi(a, 32), b);
return AloBlo + psllqi(AloBhi + AhiBlo, 32);

This matches the lowering on gcc/icc.

Differential Revision: https://reviews.llvm.org/D27756

llvm-svn: 290267
2016-12-21 20:00:10 +00:00
Simon Pilgrim 2f7f0e7a48 [CostModel][X86] Updated reverse shuffle costs
llvm-svn: 289819
2016-12-15 14:24:07 +00:00
Simon Pilgrim 841d7ca463 [X86][AVX512] Add support for v2i64 fptosi/fptoui/sitofp/uitofp on AVX512DQ-only targets
Use 512-bit instructions with subvector insertion/extraction like we do in a number of similar circumstances

llvm-svn: 287882
2016-11-24 14:46:55 +00:00
Simon Pilgrim 4e9b9cbee9 [X86][AVX512] Add support for v4i64 fptosi/fptoui/sitofp/uitofp on AVX512DQ-only targets
Use 512-bit instructions with subvector insertion/extraction like we do in a number of similar circumstances

llvm-svn: 287762
2016-11-23 14:01:18 +00:00
Simon Pilgrim 03cd8f887c [CostModel][X86] Add missing AVX512DQ v8i64 fptosi/sitofp costs
llvm-svn: 287760
2016-11-23 13:42:09 +00:00
Simon Pilgrim 779da8e5ea [CostModel][X86] Added mul costs for vXi8 vectors
More realistic v16i8/v32i8/v64i8 MUL costs - we have to extend to vXi16, use PMULLW and then truncate the result

llvm-svn: 286838
2016-11-14 15:54:24 +00:00
Simon Pilgrim 27fed8e5d6 [X86][AVX] Fixed v16i16/v32i8 ADD/SUB costs on AVX1 subtargets
Add explicit v16i16/v32i8 ADD/SUB costs, matching the costs of v4i64/v8i32 - they were missing for some reason.

This has side effects on the LV max bandwidth tests (AVX1 now prefers 128-bit vectors vs AVX2 which still prefers 256-bit)

llvm-svn: 286832
2016-11-14 14:45:16 +00:00
Simon Pilgrim d02c55204b [VectorLegalizer] Expansion of CTLZ using CTPOP when possible
This patch avoids scalarization of CTLZ by instead expanding to use CTPOP (ref: "Hacker's Delight") when the necessary operations are available.

This also adds the necessary cost models for X86 SSE2 targets (the main beneficiary) to ensure vectorization only happens when its useful.

Differential Revision: https://reviews.llvm.org/D25910

llvm-svn: 286233
2016-11-08 14:10:28 +00:00
Alexey Bataev d07c731d86 Improved cost model for FDIV and FSQRT, by Andrew Tischenko
There is a bug describing poor cost model for floating point operations:
Bug 29083 - [X86][SSE] Improve costs for floating point operations. This
patch is the second one in series of patches dealing with cost model.

Differential Revision: https://reviews.llvm.org/D25722

llvm-svn: 285564
2016-10-31 12:10:53 +00:00
Simon Pilgrim d23219b9ee [X86][AVX512] Fix MUL v8i64 costs on non-AVX512DQ targets
llvm-svn: 285329
2016-10-27 18:32:06 +00:00
Simon Pilgrim 820e1326d7 [X86][AVX512DQ] Improve lowering of MUL v2i64 and v4i64
With DQI but without VLX, lower v2i64 and v4i64 MUL operations with v8i64 MUL (vpmullq).

Updated cost table accordingly.

Differential Revision: https://reviews.llvm.org/D26011

llvm-svn: 285304
2016-10-27 15:27:00 +00:00
Simon Pilgrim 6ac1e98b09 [X86][SSE] Add SSE41/AVX1 costs for vector shifts.
We were defaulting to SSE2 costs which weren't taking into account the availability of PBLENDW/PBLENDVB to improve merging of per-element shift results.

llvm-svn: 284939
2016-10-23 16:49:04 +00:00
Michael Kuperstein b2443ed62b [X86] Enable interleaved memory access by default
This lets the loop vectorizer generate interleaved memory accesses on x86.

Differential Revision: https://reviews.llvm.org/D25350

llvm-svn: 284779
2016-10-20 21:04:31 +00:00