llvm-project

Commit Graph

Author	SHA1	Message	Date
Dorit Nuzman	38bbf81ade	recommit 344472 after fixing build failure on ARM and PPC. llvm-svn: 344475	2018-10-14 08:50:06 +00:00
Dorit Nuzman	5118c68cde	revert 344472 due to failures. llvm-svn: 344473	2018-10-14 07:21:20 +00:00
Dorit Nuzman	8174368955	[IAI,LV] Add support for vectorizing predicated strided accesses using masked interleave-group The vectorizer currently does not attempt to create interleave-groups that contain predicated loads/stores; predicated strided accesses can currently be vectorized only using masked gather/scatter or scalarization. This patch makes predicated loads/stores candidates for forming interleave-groups during the Loop-Vectorizer's analysis, and adds the proper support for masked-interleave-groups to the Loop-Vectorizer's planning and transformation stages. The patch also extends the TTI API to allow querying the cost of masked interleave groups (which each target can control); Targets that support masked vector loads/stores may choose to enable this feature and allow vectorizing predicated strided loads/stores using masked wide loads/stores and shuffles. Reviewers: Ayal, hsaito, dcaballe, fhahn, javed.absar Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D53011 llvm-svn: 344472	2018-10-14 07:06:16 +00:00
Matthias Braun	d6131c9633	X86/TargetTransformInfo: Report div/rem constant immediate costs as TCC_Free DIV/REM by constants should always be expanded into mul/shift/etc. patterns. Unfortunately the ConstantHoisting pass runs too early at a point where the pattern isn't expanded yet. However after ConstantHoisting hoisted some immediate the result may not expand anymore. Also the hoisting typically doesn't make sense because it operates on immediates that will change completely during the expansion. Report DIV/REM as TCC_Free so ConstantHoisting will not touch them. Differential Revision: https://reviews.llvm.org/D53174 llvm-svn: 344315	2018-10-11 23:14:35 +00:00
Craig Topper	a72012c206	[X86] Correct the cost of (v4i32 (fptoui (v4f64))) under AVX512F. Summary: This was inheriting the cost from the AVX table, but should be legal under AVX512. Reviewers: RKSimon Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D51267 llvm-svn: 340708	2018-08-26 18:47:44 +00:00
Craig Topper	dd0ef801f8	Recommit r338204 "[X86] Correct the immediate cost for 'add/sub i64 %x, 0x80000000'." This checks in a more direct way without triggering a UBSAN error. llvm-svn: 338273	2018-07-30 17:29:57 +00:00
Dean Michael Berris	927b3da6c9	Revert "[X86] Correct the immediate cost for 'add/sub i64 %x, 0x80000000'." This reverts commit r338204. llvm-svn: 338236	2018-07-30 09:45:09 +00:00
Craig Topper	5daa032546	[X86] Correct the immediate cost for 'add/sub i64 %x, 0x80000000'. X86 normally requires immediates to be a signed 32-bit value which would exclude i64 0x80000000. But for add/sub we can negate the constant and use the opposite instruction. llvm-svn: 338204	2018-07-28 18:21:46 +00:00
Craig Topper	ba208b07b6	[X86] Use alignTo and divideCeil to make some code more readable. NFC llvm-svn: 338203	2018-07-28 18:21:45 +00:00
Simon Pilgrim	dc113dc7ed	[CostModel][X86] Add SREM/UREM general and constant costs (PR38056) We penalize general SDIV/UDIV costs but don't do the same for SREM/UREM. This patch makes general vector SREM/UREM x20 as costly as scalar, the same approach as we do for SDIV/UDIV. The patch also extends the existing SDIV/UDIV constant costs for SREM/UREM - at the moment this means the additional cost of a MUL+SUB (see D48975). Differential Revision: https://reviews.llvm.org/D48980 llvm-svn: 336486	2018-07-07 16:53:30 +00:00
Simon Pilgrim	8c3765dc6b	[CostModel][X86] Add UDIV/UREM by pow2 costs Normally InstCombine would have simplified these to SRL/AND instructions but we may still see these during SLP vectorization etc. llvm-svn: 336371	2018-07-05 16:56:28 +00:00
Simon Pilgrim	2a9cde026c	[X86][AVX] Reduce v4f64/v4i64 shuffle costs (PR37882) These were being over cautious for costs for one/two op general shuffles - VSHUFPD doesn't have to replicate the same shuffle in both lanes like VSHUFPS does. llvm-svn: 335216	2018-06-21 11:37:13 +00:00
Simon Pilgrim	e39fa6cbbb	[CostModel] Replace ShuffleKind::SK_Alternate with ShuffleKind::SK_Select (PR33744) As discussed on PR33744, this patch relaxes ShuffleKind::SK_Alternate which requires shuffle masks to only match an alternating pattern from its 2 sources: e.g. v4f32: <0,5,2,7> or <4,1,6,3> This seems far too restrictive as most SIMD hardware will implement it using a general blend/bit-select instruction, so replaces it with SK_Select, permitting elements from either source as long as they are inline: e.g. v4f32: <0,5,2,7>, <4,1,6,3>, <0,1,6,7>, <4,1,2,3> etc. This initial patch just updates the name and cost model shuffle mask analysis, later patch reviews will update SLP to better utilise this - it still limits itself to SK_Alternate style patterns. Differential Revision: https://reviews.llvm.org/D47985 llvm-svn: 334513	2018-06-12 16:12:29 +00:00
Simon Pilgrim	4162d77744	[TTI] Add uniform/non-uniform constant Pow2 detection to TargetTransformInfo::getInstructionThroughput This enables us to detect more fast path sdiv cases under cost analysis. This patch also enables us to handle non-uniform-constant pow2 cases for X86 SDIV costs. Found while working on D46276 Future patches can then extend the vectorizers to more fully support non-uniform pow2 cases. Differential Revision: https://reviews.llvm.org/D46637 llvm-svn: 332969	2018-05-22 10:40:09 +00:00
Adrian Prantl	5f8f34e459	Remove \brief commands from doxygen comments. We've been running doxygen with the autobrief option for a couple of years now. This makes the \brief markers into our comments redundant. Since they are a visual distraction and we don't want to encourage more \brief markers in new code either, this patch removes them all. Patch produced by for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done Differential Revision: https://reviews.llvm.org/D46290 llvm-svn: 331272	2018-05-01 15:54:18 +00:00
Simon Pilgrim	2faf606fb6	[CostModel][X86] Remove hard coded SDIV/UDIV vector costs Algorithmically compute the 'x20' SDIV/UDIV vector costs - this is necessary for PR36550 when DIV costs will be driven from the scheduler models. llvm-svn: 330870	2018-04-25 20:59:16 +00:00
Simon Pilgrim	58e03a09db	[CostModel][X86] Recursive call for cost of imul for packed v16i16 constant shift left. Don't just assume cost = 1. llvm-svn: 330834	2018-04-25 15:22:03 +00:00
Simon Pilgrim	80ce1dde44	[CostModel][X86] Fix v32i16/v64i8 SETCC costs on AVX512BW targets llvm-svn: 329498	2018-04-07 13:24:33 +00:00
Craig Topper	a985919d3e	[X86] Update cost model for Goldmont. Add fsqrt costs for Silvermont Add fdiv costs for Goldmont using table 16-17 of the Intel Optimization Manual. Also add overrides for FSQRT for Goldmont and Silvermont. Reviewers: RKSimon Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D44644 llvm-svn: 328451	2018-03-25 15:58:12 +00:00
Simon Pilgrim	9929f90740	[X86][SSE] Reduce FADD/FSUB/FMUL costs on later targets (PR36280) Agner's tables indicate that for SSE42+ targets (Core2 and later) we can reduce the FADD/FSUB/FMUL costs down to 1, which should fix the Himeno benchmark. Note: the AVX512 FDIV costs look rather dodgy, but this isn't part of this patch. Differential Revision: https://reviews.llvm.org/D43733 llvm-svn: 326133	2018-02-26 22:10:17 +00:00
Simon Pilgrim	cb9a02f60e	[X86][SSE] Increase PMULLD costs to better match hardware Until Skylake, most hardware could only issue a PMULLD op every other cycle llvm-svn: 324823	2018-02-10 19:27:10 +00:00
Sanjay Patel	d7c702b451	[LoopStrengthReduce, x86] don't add cost for a cmp that will be macro-fused (PR35681) In the motivating case from PR35681 and represented by the macro-fuse-cmp test: https://bugs.llvm.org/show_bug.cgi?id=35681 ...there's a 37 -> 31 byte size win for the loop because we eliminate the big base address offsets. SPEC2017 on Ryzen shows no significant perf difference. Differential Revision: https://reviews.llvm.org/D42607 llvm-svn: 324289	2018-02-05 23:43:05 +00:00
Simon Pilgrim	eb07016156	Spelling mistake in comment. NFCI. llvm-svn: 323752	2018-01-30 12:18:51 +00:00
Craig Topper	0d797a34d8	[X86] Add support for passing 'prefer-vector-width' function attribute into X86Subtarget and exposing via X86's getRegisterWidth TTI interface. This will cause the vectorizers to do some limiting of the vector widths they create. This is not a strict limit. There are reasons I know of that the loop vectorizer will generate larger vectors for. I've written this in such a way that the interface will only return a properly supported width(0/128/256/512) even if the attribute says something funny like 384 or 10. This has been split from D41895 with the remainder in a follow up commit. llvm-svn: 323015	2018-01-20 00:26:08 +00:00
Alexey Bataev	771ec9f399	[COST]Fix PR35865: Fix cost model evaluation for shuffle on X86. Summary: If the vector type is transformed to non-vector single type, the compiler may crash trying to get vector information about non-vector type. Reviewers: RKSimon, spatel, mkuper, hfinkel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D41862 llvm-svn: 322106	2018-01-09 19:08:22 +00:00
Craig Topper	8b0f185c31	[X86] Simplify the TTI code for getInterleavedMemoryOpCost around for AVX512BW. NFCI Previously the lambda for AVX512 passed out a flag that indicated whether AVX512BW was required and that was checked against the AVX512BW subtarget flag outside. This patch changes the interface to pass the AVX512BW subtarget bit in and return its value if we detect 16 or 8 bit types. llvm-svn: 319919	2017-12-06 18:40:46 +00:00
Sanjay Patel	0de1a4bc2d	[PartiallyInlineLibCalls][x86] add TTI hook to allow sqrt inlining to depend on arg rather than result This should fix PR31455: https://bugs.llvm.org/show_bug.cgi?id=31455 Differential Revision: https://reviews.llvm.org/D28314 llvm-svn: 319094	2017-11-27 21:15:43 +00:00
Craig Topper	ea37e201ec	[X86] Don't report gather is legal on Skylake CPUs when AVX2/AVX512 is disabled. Allow gather on SKX/CNL/ICL when AVX512 is disabled by using AVX2 instructions. Summary: This adds a new fast gather feature bit to cover all CPUs that support fast gather that we can use independent of whether the AVX512 feature is enabled. I'm only using this new bit to qualify AVX2 codegen. AVX512 is still implicitly assuming fast gather to keep tests working and to match the scatter behavior. Test command lines have been added for these two cases. Reviewers: magabari, delena, RKSimon, zvi Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D40282 llvm-svn: 318983	2017-11-25 18:09:37 +00:00
Craig Topper	d5b5bbe22f	[X86] Spell penryn correctly in some comments. NFC llvm-svn: 318855	2017-11-22 18:23:40 +00:00
Mohammed Agabaria	115f68ea3e	[LV][X86] Support of AVX2 Gathers code generation and update the LV with this This patch depends on: https://reviews.llvm.org/D35348 Support of pattern selection of masked gathers of AVX2 (X86\AVX2 code gen) Update LoopVectorize to generate gathers for AVX2 processors. Reviewers: delena, zvi, RKSimon, craig.topper, aaboud, igorb Reviewed By: delena, RKSimon Differential Revision: https://reviews.llvm.org/D35772 llvm-svn: 318641	2017-11-20 08:18:12 +00:00
David Blaikie	b3bde2ea50	Fix a bunch more layering of CodeGen headers that are in Target All these headers already depend on CodeGen headers so moving them into CodeGen fixes the layering (since CodeGen depends on Target, not the other way around). llvm-svn: 318490	2017-11-17 01:07:10 +00:00
Mohammed Agabaria	6e6d5326a1	[TTI][X86] update costs of interleaved load\store of i64\double This patch contains more accurate cost of interleaved load\store of stride 2 for the types int64\double on AVX2. Reviewers: delena, RKSimon, craig.topper, dorit Reviewed By: dorit Differential Revision: https://reviews.llvm.org/D40008 llvm-svn: 318385	2017-11-16 09:38:32 +00:00
Craig Topper	46a5d58b8c	[X86] Update TTI to report that v1iX/v1fX types aren't legal for masked gather/scatter/load/store. The type legalizer will try to scalarize these operations if it sees them, but there is no handling for scalarizing them. This leads to a fatal error. With this change they will now be scalarized by the mem intrinsic scalarizing pass before SelectionDAG. llvm-svn: 318380	2017-11-16 06:02:05 +00:00
Alexey Bataev	e25a6fd390	[SLP] Fix PR35047: Fix default cost model for cast op in X86. Summary: The cost calculation for default case on X86 target does not always follow the correct way because of a missing 4-th argument in `BaseT::getCastInstrCost()` call. Added this missing parameter. Reviewers: hfinkel, mkuper, RKSimon, spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D39687 llvm-svn: 317576	2017-11-07 14:23:44 +00:00
Mohammed Agabaria	6691758364	[LV][X86] update the cost of interleaving mem. access of floats Recommit: This patch contains update of the costs of interleaved loads of v8f32 of stride 3 and 8. fixed the location of the lit test it works with make check-all. Differential Revision: https://reviews.llvm.org/D39403 llvm-svn: 317471	2017-11-06 10:56:20 +00:00
Mohammed Agabaria	acd69dbc7c	[REVERT][LV][X86] update the cost of interleaving mem. access of floats reverted my changes will be committed later after fixing the failure This patch contains update of the costs of interleaved loads of v8f32 of stride 3 and 8. Differential Revision: https://reviews.llvm.org/D39403 llvm-svn: 317433	2017-11-05 09:36:54 +00:00
Mohammed Agabaria	f74c767de6	[LV][X86] update the cost of interleaving mem. access of floats This patch contains update of the costs of interleaved loads of v8f32 of stride 3 and 8. Differential Revision: https://reviews.llvm.org/D39403 llvm-svn: 317432	2017-11-05 09:06:23 +00:00
Clement Courbet	b2c3eb8cf1	[CodeGen][ExpandMemcmp] Allow memcmp to expand to vector loads (2). - Targets that want to support memcmp expansions now return the list of supported load sizes. - Expansion codegen does not assume that all power-of-two load sizes smaller than the max load size are valid. For examples, this is not the case for x86(32bit)+sse2. Fixes PR34887. llvm-svn: 316905	2017-10-30 14:19:33 +00:00
Michael Zuckerman	49293264cc	[AVX512][AVX2]Cost calculation for interleave load/store patterns {v8i8,v16i8,v32i8,v64i8} This patch adds accurate instructions cost. The formula presents two cases(stride 3 and stride 4) and calculates the cost according to the VF and stride. Reviewers: 1. delena 2. Farhana 3. zvi 4. dorit 5. Ayal Differential Revision: https://reviews.llvm.org/D38762 Change-Id: If4cfbd4ac0e63694e8144cb78c7fa34850647ff7 llvm-svn: 316072	2017-10-18 11:41:55 +00:00
Clement Courbet	2807c0a442	[CodeGenPrepare][NFC] Rename TargetTransformInfo::expandMemCmp -> TargetTransformInfo::enableMemCmpExpansion. Summary: Right now there are two functions with the same name, one does the work and the other one returns true if expansion is needed. Rename TargetTransformInfo::expandMemCmp to make it more consistent with other members of TargetTransformInfo. Remove the unused Instruction* parameter. Differential Revision: https://reviews.llvm.org/D38165 llvm-svn: 314096	2017-09-25 06:35:16 +00:00
Sanjay Patel	6fd4391ddd	[DivRempairs] add a pass to optimize div/rem pairs (PR31028) This is intended to be a superset of the functionality from D31037 (EarlyCSE) but implemented as an independent pass, so there's no stretching of scope and feature creep for an existing pass. I also proposed a weaker version of this for SimplifyCFG in D30910. And I initially had almost this same functionality as an addition to CGP in the motivating example of PR31028: https://bugs.llvm.org/show_bug.cgi?id=31028 The advantage of positioning this ahead of SimplifyCFG in the pass pipeline is that it can allow more flattening. But it needs to be after passes (InstCombine) that could sink a div/rem and undo the hoisting that is done here. Decomposing remainder may allow removing some code from the backend (PPC and possibly others). Differential Revision: https://reviews.llvm.org/D37121 llvm-svn: 312862	2017-09-09 13:38:18 +00:00
Alexey Bataev	6dd29fccb8	[SLP] Support for horizontal min/max reduction. SLP vectorizer supports horizontal reductions for Add/FAdd binary operations. Patch adds support for horizontal min/max reductions. Function getReductionCost() is split to getArithmeticReductionCost() for binary operation reductions and getMinMaxReductionCost() for min/max reductions. Patch fixes PR26956. Differential revision: https://reviews.llvm.org/D27846 llvm-svn: 312791	2017-09-08 13:49:36 +00:00
Zvi Rackover	25799d93f0	X86: Improve AVX512 fptoui lowering Summary: Add patterns for fptoui <16 x float> to <16 x i8> fptoui <16 x float> to <16 x i16> Reviewers: igorb, delena, craig.topper Reviewed By: craig.topper Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D37505 llvm-svn: 312704	2017-09-07 07:40:34 +00:00
Tobias Grosser	d7eb619299	Model cache size and associativity in TargetTransformInfo Summary: We add the precise cache sizes and associativity for the following Intel architectures: - Penryn - Nehalem - Westmere - Sandy Bridge - Ivy Bridge - Haswell - Broadwell - Skylake - Kabylake For several months, Polly has used a performance model for BLAS computations that derives optimal cache and register tile sizes from cache and latency information (based on ideas from "Analytical Modeling Is Enough for High-Performance BLIS", by Tze Meng Low published at TOMS 2016). While bootstrapping this model, these target values have been kept in Polly. However, as our implementation is now rather mature, it seems time to teach LLVM itself about cache sizes. Interestingly, L1 and L2 cache sizes are pretty constant across micro-architectures, hence a set of architecture specific default values seems like a good start. They can be expanded to more target specific values, in case certain newer architectures require different values. For now a set of Intel architectures are provided. Just as a little teaser, for a simple gemm kernel this model allows us to improve performance from 1.2s to 0.27s. For gemm kernels with less optimal memory layouts even larger speedups can be reported. Reviewers: Meinersbur, bollu, singam-sanjay, hfinkel, gareevroman, fhahn, sebpop, efriedma, asb Reviewed By: fhahn, asb Subscribers: lsaba, asb, pollydev, llvm-commits Differential Revision: https://reviews.llvm.org/D37051 llvm-svn: 311647	2017-08-24 09:46:25 +00:00
Elena Demikhovsky	f58f838495	Changed basic cost of store operation on X86 Store operation takes 2 UOps on X86 processors. The exact cost calculation affects several optimization passes including loop unrolling. This change compensates performance degradation caused by https://reviews.llvm.org/D34458 and shows improvements on some benchmarks. Differential Revision: https://reviews.llvm.org/D35888 llvm-svn: 311285	2017-08-20 12:34:29 +00:00
Simon Pilgrim	c63f93a197	[CostModel][X86][XOP] Improve costs for XOP shuffles VPPERM/VPERMIL2PD/VPERMIL2PS all provide more effective 2-input shuffles than regular AVX instructions llvm-svn: 311005	2017-08-16 13:50:20 +00:00
Simon Pilgrim	b59c2d9d73	[CostModel][X86] Add SSE2 two-src shuffle costs llvm-svn: 310654	2017-08-10 19:32:35 +00:00
Simon Pilgrim	7354531b82	[CostModel][X86] Add avx1 two-src shuffle costs llvm-svn: 310650	2017-08-10 19:02:51 +00:00
Simon Pilgrim	ac2e50a4ca	[CostModel][X86] Add avx2 two-src shuffle costs llvm-svn: 310645	2017-08-10 18:29:34 +00:00
Simon Pilgrim	702e5fa391	[CostModel][X86] Improve single src shuffle costs Add missing SK_PermuteSingleSrc costs for AVX2 targets and earlier, also added some of the simpler SK_PermuteTwoSrc costs to support splitting of SK_PermuteSingleSrc shuffles llvm-svn: 310632	2017-08-10 17:27:20 +00:00
Evgeny Stupachenko	c675290680	Reapply fix PR23384 (part 3 of 3) r304824 (was reverted in r305720). The root cause of reverting was fixed - PR33514. Summary: The patch makes instruction count the highest priority for LSR solution for X86 (previously registers had highest priority). Reviewers: qcolombet Differential Revision: http://reviews.llvm.org/D30562 From: Evgeny Stupachenko <evstupac@gmail.com> <evgeny.v.stupachenko@intel.com> llvm-svn: 310289	2017-08-07 19:56:34 +00:00
Simon Pilgrim	7b89ab5887	Strip trailing whitespace. NFCI. llvm-svn: 309584	2017-07-31 17:09:27 +00:00
Alexey Bataev	3e9b3eb91d	[Cost] Rename getReductionCost() to getArithmeticReductionCost(), NFC. llvm-svn: 309563	2017-07-31 14:19:32 +00:00
Mohammed Agabaria	eb09a810e6	[X86][CM] update add\sub costs of vectors of 64 in X86\SLM arch this patch updates the cost of addq\subq (add\subtract of vectors of 64bits) based on the performance numbers of SLM arch. Differential Revision: https://reviews.llvm.org/D33983 llvm-svn: 306974	2017-07-02 12:16:15 +00:00
Dorit Nuzman	e0e0f1ddb0	[AVX2] [TTI CostModel] Add cost of interleaved loads/stores for AVX2 The cost of an interleaved access was only implemented for AVX512. For other X86 targets an overly conservative Base cost was returned, resulting in avoiding vectorization where it is actually profitable to vectorize. This patch starts to add costs for AVX2 for most prominent cases of interleaved accesses (stride 3,4 chars, for now). Note1: Improvements of up to ~4x were observed in some of EEMBC's rgb workloads; There is also a known issue of 15-30% degradations on some of these workloads, associated with an interleaved access followed by type promotion/widening; the resulting shuffle sequence is currently inefficient and will be improved by a series of patches that extend the X86InterleavedAccess pass (such as D34601 and more to follow). Note 2: The costs in this patch do not reflect port pressure penalties which can be very dominant in the case of interleaved accesses since most of the shuffle operations are restricted to a single port. Further tuning, that may incorporate these considerations, will be done on top of the upcoming improved shuffle sequences (that is, along with the abovementioned work to extend X86InterleavedAccess pass). Differential Revision: https://reviews.llvm.org/D34023 llvm-svn: 306238	2017-06-25 08:26:25 +00:00
Sanjay Patel	0656629b87	[x86] enable CGP memcmp() expansion for 2/4/8 byte sizes There are a couple of potential improvements as seen in the IR and asm: 1. We're unnecessarily extending to a larger type to compare values. 2. The codegen for (select cond, 1, -1) could avoid a cmov. (or we could change the order of the compares, so we have a select with 0 operand) llvm-svn: 305802	2017-06-20 15:58:30 +00:00
Hans Wennborg	ca69fc1cb7	Revert r304824 "Fix PR23384 (part 3 of 3)" This seems to be interacting badly with ASan somehow, causing false reports of heap-buffer overflows: PR33514. > Summary: > The patch makes instruction count the highest priority for > LSR solution for X86 (previously registers had highest priority). > > Reviewers: qcolombet > > Differential Revision: http://reviews.llvm.org/D30562 > > From: Evgeny Stupachenko <evstupac@gmail.com> llvm-svn: 305720	2017-06-19 17:57:15 +00:00
Evgeny Stupachenko	3b88291581	Fix PR23384 (part 3 of 3) Summary: The patch makes instruction count the highest priority for LSR solution for X86 (previously registers had highest priority). Reviewers: qcolombet Differential Revision: http://reviews.llvm.org/D30562 From: Evgeny Stupachenko <evstupac@gmail.com> llvm-svn: 304824	2017-06-06 20:04:16 +00:00
Anna Thomas	b2a212c070	[Atomics][LoopIdiom] Recognize unordered atomic memcpy Summary: Expanding the loop idiom test for memcpy to also recognize unordered atomic memcpy. The only difference in recognizing an unordered atomic memcpy instead of a normal memcpy is that the loads and/or stores involved are unordered atomic operations. Background: http://lists.llvm.org/pipermail/llvm-dev/2017-May/112779.html Patch by Daniel Neilson! Reviewers: reames, anna, skatkov Reviewed By: reames, anna Subscribers: llvm-commits, mzolotukhin Differential Revision: https://reviews.llvm.org/D33243 llvm-svn: 304806	2017-06-06 16:45:25 +00:00
Simon Pilgrim	6bba6068be	[X86][AVX512] Add 512-bit vector ctpop costs + tests llvm-svn: 303342	2017-05-18 10:42:34 +00:00
Simon Pilgrim	23ef26728a	[X86][AVX512] Add 512-bit vector ctlz costs + tests llvm-svn: 303300	2017-05-17 21:02:18 +00:00
Simon Pilgrim	d0365967c4	[X86][AVX512] Add 512-bit vector cttz costs + tests llvm-svn: 303293	2017-05-17 20:22:54 +00:00
Simon Pilgrim	a9a92a1a6a	[X86][AVX512] Add 512-bit vector bitreverse costs + tests llvm-svn: 303283	2017-05-17 19:20:20 +00:00
Simon Pilgrim	d0ef9d8e93	[X86][AVX1] Account for cost of extract/insert of 256-bit shifts llvm-svn: 303023	2017-05-14 20:52:11 +00:00
Simon Pilgrim	f96b4ab92d	[X86][AVX2] Fix costs for v4i64 ashr by splat llvm-svn: 303022	2017-05-14 20:25:42 +00:00
Simon Pilgrim	de4467b182	[X86][AVX1] Account for cost of extract/insert of 256-bit shifts by splat llvm-svn: 303021	2017-05-14 20:02:34 +00:00
Simon Pilgrim	d3f0d03cc5	[X86][AVX1] Account for cost of extract/insert of 256-bit SDIV/UDIV by mul sequences llvm-svn: 303017	2017-05-14 18:52:15 +00:00
Simon Pilgrim	5bef9c627e	[X86][XOP] XOP's general v16i8 shifts will be used instead of v8i16 shift + mask. Tweak cost model to match what lowering actually does. llvm-svn: 303013	2017-05-14 17:59:46 +00:00
Simon Pilgrim	aa8dffb69b	[X86][SSE] Account for cost of extract/insert of v32i8 vector shifts llvm-svn: 303012	2017-05-14 17:36:07 +00:00
Simon Pilgrim	4599eaa09a	[X86][XOP] Account for cost of extract/insert of 256-bit vector shifts llvm-svn: 303010	2017-05-14 13:38:53 +00:00
Simon Pilgrim	2d1c6d6e8d	[X86][AVX1] Improve 256-bit vector costs for integer unary intrinsics. Account for subvector extraction/insertion, helps prevent the vectorizers from selecting 256-bit vectors that will have to be split anyhow on AVX1 targets. llvm-svn: 302378	2017-05-07 20:58:55 +00:00
Jonas Paulsson	fccc7d66c3	[SystemZ] TargetTransformInfo cost functions implemented. getArithmeticInstrCost(), getShuffleCost(), getCastInstrCost(), getCmpSelInstrCost(), getVectorInstrCost(), getMemoryOpCost(), getInterleavedMemoryOpCost() implemented. Interleaved access vectorization enabled. BasicTTIImpl::getCastInstrCost() improved to check for legal extending loads, in which case the cost of the z/sext instruction becomes 0. Review: Ulrich Weigand, Renato Golin. https://reviews.llvm.org/D29631 llvm-svn: 300052	2017-04-12 11:49:08 +00:00
Keno Fischer	1ec5dd85a2	[X86 TTI] Implement LSV hook Summary: LSV wants to know the maximum size that can be loaded to a vector register. On X86, this always matches the maximum register width. Implement this accordingly and add a test to make sure that LSV can vectorize up to the maximum permissible width on X86. Reviewers: delena, arsenm Reviewed By: arsenm Subscribers: wdng, llvm-commits Differential Revision: https://reviews.llvm.org/D31504 llvm-svn: 299589	2017-04-05 20:51:38 +00:00
Simon Pilgrim	06c70adcf0	[X86] Add missing BITREVERSE costs for SSE2 vectors and i8/i16/i32/i64 scalars Prep work for PR31810 llvm-svn: 297876	2017-03-15 19:34:55 +00:00
Simon Pilgrim	a0b0b74b9a	Align cost model columns. NFCI. llvm-svn: 297824	2017-03-15 11:57:42 +00:00
Jonas Paulsson	a48ea231c0	[TargetTransformInfo] getIntrinsicInstrCost() scalarization estimation improved getIntrinsicInstrCost() used to only compute scalarization cost based on types. This patch improves this so that the actual arguments are checked when they are available, in order to handle only unique non-constant operands. Tests updates: Analysis/CostModel/X86/arith-fp.ll Transforms/LoopVectorize/AArch64/interleaved_cost.ll Transforms/LoopVectorize/ARM/interleaved_cost.ll The improvement in getOperandsScalarizationOverhead() to differentiate on constants made it necessary to update the interleaved_cost.ll tests even though they do not relate to intrinsics. Review: Hal Finkel https://reviews.llvm.org/D29540 llvm-svn: 297705	2017-03-14 06:35:36 +00:00
Michael Kuperstein	e6d59fdca5	[X86] Add costs for non-AVX512 single-source permutation integer shuffles Differential Revision: https://reviews.llvm.org/D29416 llvm-svn: 293932	2017-02-02 20:27:13 +00:00
Jonas Paulsson	8e2f948ef0	[TargetTransformInfo] Refactor and improve getScalarizationOverhead() Refactoring to remove duplications of this method. New method getOperandsScalarizationOverhead() that looks at the present unique operands and add extract costs for them. Old behaviour was to just add extract costs for one operand of the type always, which still happens in getArithmeticInstrCost() if no operands are provided by the caller. This is a good start of improving on this, but there are more places that can be improved by using getOperandsScalarizationOverhead(). Review: Hal Finkel https://reviews.llvm.org/D29017 llvm-svn: 293155	2017-01-26 07:03:25 +00:00
Mohammed Agabaria	20caee95e1	[X86] enable memory interleaving for X86\SLM arch. Differential Revision: https://reviews.llvm.org/D28547 llvm-svn: 293040	2017-01-25 09:14:48 +00:00
Simon Pilgrim	3e5b525699	Remove trailing whitespace. NFCI. llvm-svn: 292613	2017-01-20 15:15:59 +00:00
Simon Pilgrim	0da4d2bc03	[CostModel][X86] Removed unused cost. NFCI. SHL v8i32 is already handled in the SSE41 cost table llvm-svn: 292612	2017-01-20 15:14:38 +00:00
Simon Pilgrim	6ed996cdf0	[CostModel][X86] Fix AVX512BW vector shift costs for vXi16 types We already have patterns in place to support 128/256-bit shifts without AVX512VL llvm-svn: 292077	2017-01-15 20:44:00 +00:00
Simon Pilgrim	d419b73a42	[CostModel][X86] Updated vXi64 ASHR costs on AVX512 targets now that D28604 has landed llvm-svn: 292023	2017-01-14 19:24:23 +00:00
Simon Pilgrim	5a81fefad3	[X86][AVX512BW] Vectorize v64i8 vector shifts Differential Revision: https://reviews.llvm.org/D28447 llvm-svn: 291665	2017-01-11 10:36:51 +00:00
Mohammed Agabaria	2c96c43388	[X86] updating TTI costs for arithmetic instructions on X86\SLM arch. updated instructions: pmulld, pmullw, pmulhw, mulsd, mulps, mulpd, divss, divps, divsd, divpd, addpd and subpd. special optimization case which replaces pmulld with pmullw\pmulhw\pshuf seq. In case if the real operands bitwidth <= 16. Differential Revision: https://reviews.llvm.org/D28104 llvm-svn: 291657	2017-01-11 08:23:37 +00:00
Simon Pilgrim	9c58950eeb	[CostModel][X86] Fixed vXi8 uniform shift costs. The 'fast' costs should only work for shifts by uniform constants (uniform non-constant are lowered using the slow default implementation). Logical shifts were not taking into account that we must mask the psrlw result, so the costs needed to be doubled. Added missing AVX2/AVX512BW costs as well. llvm-svn: 291391	2017-01-08 14:14:36 +00:00
Simon Pilgrim	1fa5487c05	[CostModel][X86] Moved legal uniform shift costs earlier. XOP was prematurely matching, doubling the cost of ashr/lshr uniform shifts. llvm-svn: 291390	2017-01-08 13:12:03 +00:00
Simon Pilgrim	9681c407b4	[CostModel][X86] Update SSE41/AVX1 vXi32 SHL costs SSE41 provides pmulld which allows the simpler pslld/paddd/cvttps2dq/pmulld pattern than SSE2's use of pmuludq. llvm-svn: 291372	2017-01-07 22:27:43 +00:00
Simon Pilgrim	a470296367	[CostModel][X86] Fix AVX2 v16i16 shift 'splat' costs. llvm-svn: 291366	2017-01-07 22:08:09 +00:00
Simon Pilgrim	82e3e05fe2	[CostModel][X86] Match 256-bit vector shift 'splat' costs for AVX2 and above We were matching against general vector shift costs before the uniform splat costs llvm-svn: 291365	2017-01-07 21:47:10 +00:00
Simon Pilgrim	e70644dab7	[CostModel][X86] Generalized cost calculation of SHL by constant -> MUL conversion. llvm-svn: 291364	2017-01-07 21:33:00 +00:00
Simon Pilgrim	725997154d	[CostModel][X86] Merge separate AVX1 cost LUTs. NFCI. llvm-svn: 291355	2017-01-07 18:19:25 +00:00
Simon Pilgrim	a4109d6433	[CostModel][AVX512BW] Add v32i16 vector shift costs for avx512bw targets. llvm-svn: 291354	2017-01-07 17:54:10 +00:00
Simon Pilgrim	df7de7a87e	[CostModel][X86] Added missing AVX2 arithmetic costs. Allows us to correctly fall through to the lower AVX1 costs if look up failed. llvm-svn: 291353	2017-01-07 17:27:39 +00:00
Simon Pilgrim	100eae1ee0	[CostModel][X86] Reordered AVX1 arithmetic cost LUT into descending target order. NFCI. llvm-svn: 291352	2017-01-07 17:03:51 +00:00
Simon Pilgrim	a1b8e2c725	[X86][AVX512] Use lowerShuffleAsRepeatedMaskAndLanePermute for non-VBMI v64i8 shuffles (PR31470) llvm-svn: 291347	2017-01-07 15:37:50 +00:00
Simon Pilgrim	d8333372bc	[CostModel][X86] Fix 512-bit SDIV/UDIV 'big' costs. Set the costs on the lowest target that supports the type. llvm-svn: 291229	2017-01-06 11:12:53 +00:00
Simon Pilgrim	aa186c632d	[CostModel][X86] Tidyup arithmetic costs code. NFCI. Remove unnecessary braces, remove one use variables and keep LUTs to similar naming convention. llvm-svn: 291187	2017-01-05 22:48:02 +00:00
Simon Pilgrim	4c050c2190	[CostModel][X86] Move vXi32 MUL costs into existing tables. NFCI. llvm-svn: 291165	2017-01-05 19:42:43 +00:00
Simon Pilgrim	6f72eba606	Remove trailing whitespace. NFCI. llvm-svn: 291163	2017-01-05 19:24:25 +00:00
Simon Pilgrim	5b06e4d319	[CostModel][X86] Reordered SSE42 arithmetic cost LUT into descending order. NFCI. llvm-svn: 291162	2017-01-05 19:19:39 +00:00
Simon Pilgrim	a8bf97569a	[CostModel][X86] Move vXi64 MUL costs into existing tables. NFCI. Removes need for yet another LUT. llvm-svn: 291158	2017-01-05 19:01:50 +00:00
Simon Pilgrim	430d34fc14	[CostModel][X86] Strip unused 256-bit vector shift costs. NFCI. Remove SSE2 256-bit entries - AVX targets will have used the SSE42 costs instead. llvm-svn: 291152	2017-01-05 18:36:48 +00:00
Simon Pilgrim	b01e844241	[CostModel][X86] Include the cost of 256-bit upper subvector extract/insertion in AVX1 v4i64 MUL Matches other MUL/ADD/SUB 256-bit case on AVX1 llvm-svn: 291149	2017-01-05 18:20:25 +00:00
Simon Pilgrim	f74700aa8c	[CostModel][X86] Merged SK_PermuteSingleSrc/SK_PermuteTwoSrc into common shuffle cost LUTs. NFCI. llvm-svn: 291146	2017-01-05 17:56:19 +00:00
Simon Pilgrim	bca02f9e20	[CostModel][X86] Add support for broadcast shuffle costs Currently only for broadcasts with input and output of the same width. Differential Revision: https://reviews.llvm.org/D27811 llvm-svn: 291122	2017-01-05 15:56:08 +00:00
Simon Pilgrim	a62395a4bd	[CostModel][X86] Pulled out common type legalization code llvm-svn: 291109	2017-01-05 14:33:32 +00:00
Mohammed Agabaria	23599ba794	Currently isLikelyComplexAddressComputation tries to figure out if the given stride seems to be 'complex' and need some extra cost for address computation handling. This code seems to be target dependent which may not be the same for all targets. Passed the decision whether the given stride is complex or not to the target by sending stride information via SCEV to getAddressComputationCost instead of 'IsComplex'. Specifically at X86 targets we dont see any significant address computation cost in case of the strided access in general. Differential Revision: https://reviews.llvm.org/D27518 llvm-svn: 291106	2017-01-05 14:03:41 +00:00
Mohammed Agabaria	189e2d29ba	[Test Commit] fixing some format issue in X86TTI to match clang-format output. llvm-svn: 291095	2017-01-05 09:51:02 +00:00
Simon Pilgrim	bb895f3e9c	[CostModel][X86] Updated vXi8 and vXi16 Reverse/Alternate shuffle costs Actual codegen is much better than the extract+insert patterns that was assumed. llvm-svn: 290962	2017-01-04 14:01:33 +00:00
Simon Pilgrim	939b8cd708	[X86] Merged Reverse/Alternate shuffle cost tables. NFCI. As discussed on D27811, merged the shuffle cost LUTs and use the shuffle kind to perform the lookup instead of the ISD opcode. llvm-svn: 290956	2017-01-04 12:08:41 +00:00
Elena Demikhovsky	d96200d60a	Fixed shuffle-reverse cost on AVX-512. (This change was approved in https://reviews.llvm.org/D28118, but Simon asked to submit it separately). llvm-svn: 290812	2017-01-02 11:44:10 +00:00
Elena Demikhovsky	21706cbd24	AVX-512 Loop Vectorizer: Cost calculation for interleave load/store patterns. X86 target does not provide any target specific cost calculation for interleave patterns. It uses the common target-independent calculation, which gives very high numbers. As a result, the scalar version is chosen in many cases. The situation on AVX-512 is even worse, since we have 3-src shuffles that significantly reduce the cost. In this patch I calculate the cost on AVX-512. It will allow comparing the interleave pattern with gather/scatter and choosing the better solution (PR31426). * Shuffle-broadcast cost will be changed in Simon's upcoming patch. Differential Revision: https://reviews.llvm.org/D28118 llvm-svn: 290810	2017-01-02 10:37:52 +00:00
Simon Pilgrim	081abbb164	[X86][SSE] Improve lowering of vXi64 multiplies As mentioned on PR30845, we were performing our vXi64 multiplication as: AloBlo = pmuludq(a, b); AloBhi = pmuludq(a, psrlqi(b, 32)); AhiBlo = pmuludq(psrlqi(a, 32), b); return AloBlo + psllqi(AloBhi, 32)+ psllqi(AhiBlo, 32); when we could avoid one of the upper shifts with: AloBlo = pmuludq(a, b); AloBhi = pmuludq(a, psrlqi(b, 32)); AhiBlo = pmuludq(psrlqi(a, 32), b); return AloBlo + psllqi(AloBhi + AhiBlo, 32); This matches the lowering on gcc/icc. Differential Revision: https://reviews.llvm.org/D27756 llvm-svn: 290267	2016-12-21 20:00:10 +00:00
Simon Pilgrim	2f7f0e7a48	[CostModel][X86] Updated reverse shuffle costs llvm-svn: 289819	2016-12-15 14:24:07 +00:00
Simon Pilgrim	841d7ca463	[X86][AVX512] Add support for v2i64 fptosi/fptoui/sitofp/uitofp on AVX512DQ-only targets Use 512-bit instructions with subvector insertion/extraction like we do in a number of similar circumstances llvm-svn: 287882	2016-11-24 14:46:55 +00:00
Simon Pilgrim	4e9b9cbee9	[X86][AVX512] Add support for v4i64 fptosi/fptoui/sitofp/uitofp on AVX512DQ-only targets Use 512-bit instructions with subvector insertion/extraction like we do in a number of similar circumstances llvm-svn: 287762	2016-11-23 14:01:18 +00:00
Simon Pilgrim	03cd8f887c	[CostModel][X86] Add missing AVX512DQ v8i64 fptosi/sitofp costs llvm-svn: 287760	2016-11-23 13:42:09 +00:00
Simon Pilgrim	779da8e5ea	[CostModel][X86] Added mul costs for vXi8 vectors More realistic v16i8/v32i8/v64i8 MUL costs - we have to extend to vXi16, use PMULLW and then truncate the result llvm-svn: 286838	2016-11-14 15:54:24 +00:00
Simon Pilgrim	27fed8e5d6	[X86][AVX] Fixed v16i16/v32i8 ADD/SUB costs on AVX1 subtargets Add explicit v16i16/v32i8 ADD/SUB costs, matching the costs of v4i64/v8i32 - they were missing for some reason. This has side effects on the LV max bandwidth tests (AVX1 now prefers 128-bit vectors vs AVX2 which still prefers 256-bit) llvm-svn: 286832	2016-11-14 14:45:16 +00:00
Simon Pilgrim	d02c55204b	[VectorLegalizer] Expansion of CTLZ using CTPOP when possible This patch avoids scalarization of CTLZ by instead expanding to use CTPOP (ref: "Hacker's Delight") when the necessary operations are available. This also adds the necessary cost models for X86 SSE2 targets (the main beneficiary) to ensure vectorization only happens when its useful. Differential Revision: https://reviews.llvm.org/D25910 llvm-svn: 286233	2016-11-08 14:10:28 +00:00
Alexey Bataev	d07c731d86	Improved cost model for FDIV and FSQRT, by Andrew Tischenko There is a bug describing poor cost model for floating point operations: Bug 29083 - [X86][SSE] Improve costs for floating point operations. This patch is the second one in series of patches dealing with cost model. Differential Revision: https://reviews.llvm.org/D25722 llvm-svn: 285564	2016-10-31 12:10:53 +00:00
Simon Pilgrim	d23219b9ee	[X86][AVX512] Fix MUL v8i64 costs on non-AVX512DQ targets llvm-svn: 285329	2016-10-27 18:32:06 +00:00
Simon Pilgrim	820e1326d7	[X86][AVX512DQ] Improve lowering of MUL v2i64 and v4i64 With DQI but without VLX, lower v2i64 and v4i64 MUL operations with v8i64 MUL (vpmullq). Updated cost table accordingly. Differential Revision: https://reviews.llvm.org/D26011 llvm-svn: 285304	2016-10-27 15:27:00 +00:00
Simon Pilgrim	6ac1e98b09	[X86][SSE] Add SSE41/AVX1 costs for vector shifts. We were defaulting to SSE2 costs which weren't taking into account the availability of PBLENDW/PBLENDVB to improve merging of per-element shift results. llvm-svn: 284939	2016-10-23 16:49:04 +00:00
Michael Kuperstein	b2443ed62b	[X86] Enable interleaved memory access by default This lets the loop vectorizer generate interleaved memory accesses on x86. Differential Revision: https://reviews.llvm.org/D25350 llvm-svn: 284779	2016-10-20 21:04:31 +00:00
Simon Pilgrim	365be4f95c	[CostModel][X86] Fixed AVX1/AVX512 sdiv/udiv uniformconst costs for 256/512 bit integer vectors We weren't checking for uniform const costs before the general cost, resulting in very high estimates. llvm-svn: 284755	2016-10-20 18:00:35 +00:00
Simon Pilgrim	025e26dd32	[CostModel][X86] Fixed AVX1/AVX512 sdiv/udiv general costs for 256/512 bit integer vectors We weren't accounting for legal types on every subtarget, meaning that many of the costs were using defaults. We still don't correctly cost (or test) the 512-bit sdiv/udiv by uniform const cases, nor the power-of-2 cases. llvm-svn: 284744	2016-10-20 16:39:11 +00:00
Simon Pilgrim	4ddc92b6cd	[X86][SSE] Add lowering to cvttpd2dq/cvttps2dq for sitofp v2f64/2f32 to 2i32 As discussed on PR28461 we currently miss the chance to lower "fptosi <2 x double> %arg to <2 x i32>" to cvttpd2dq due to its use of illegal types. This patch adds support for fptosi to 2i32 from both 2f64 and 2f32. It also recognises that cvttpd2dq zeroes the upper 64-bits of the xmm result (similar to D23797) - we still don't do this for the cvttpd2dq/cvttps2dq intrinsics - this can be done in a future patch. Differential Revision: https://reviews.llvm.org/D23808 llvm-svn: 284459	2016-10-18 07:42:15 +00:00
Alexey Bataev	b271a58e37	NFC: The Cost Model specialization, by Andrey Tischenko The current Cost Model implementation is very inaccurate and has to be updated, improved, re-implemented to be able to take into account the concrete CPU models and the concrete targets where this Cost Model is being used. For example, the Latency Cost Model should be differ from Code Size Cost Model, etc. This patch is the first step to launch the developing and implementation of a new Cost Model generation. Differential Revision: https://reviews.llvm.org/D25186 llvm-svn: 284012	2016-10-12 13:24:13 +00:00
Justin Bogner	b03fd12cef	Replace "fallthrough" comments with LLVM_FALLTHROUGH This is a mechanical change of comments in switches like fallthrough, fall-through, or fall-thru to use the LLVM_FALLTHROUGH macro instead. llvm-svn: 278902	2016-08-17 05:10:15 +00:00
Charles Davis	e9c32c7ed3	Revert "[X86] Support the "ms-hotpatch" attribute." This reverts commit r278048. Something changed between the last time I built this--it takes awhile on my ridiculously slow and ancient computer--and now that broke this. llvm-svn: 278053	2016-08-08 21:20:15 +00:00
Charles Davis	0822aa118e	[X86] Support the "ms-hotpatch" attribute. Summary: Based on two patches by Michael Mueller. This is a target attribute that causes a function marked with it to be emitted as "hotpatchable". This particular mechanism was originally devised by Microsoft for patching their binaries (which they are constantly updating to stay ahead of crackers, script kiddies, and other ne'er-do-wells on the Internet), but is now commonly abused by Windows programs to hook API functions. This mechanism is target-specific. For x86, a two-byte no-op instruction is emitted at the function's entry point; the entry point must be immediately preceded by 64 (32-bit) or 128 (64-bit) bytes of padding. This padding is where the patch code is written. The two byte no-op is then overwritten with a short jump into this code. The no-op is usually a `movl %edi, %edi` instruction; this is used as a magic value indicating that this is a hotpatchable function. Reviewers: majnemer, sanjoy, rnk Subscribers: dberris, llvm-commits Differential Revision: https://reviews.llvm.org/D19908 llvm-svn: 278048	2016-08-08 21:01:39 +00:00
Michael Kuperstein	3ceac2bbd5	[LV, X86] Be more optimistic about vectorizing shifts. Shifts with a uniform but non-constant count were considered very expensive to vectorize, because the splat of the uniform count and the shift would tend to appear in different blocks. That made the splat invisible to ISel, and we'd scalarize the shift at codegen time. Since r201655, CodeGenPrepare sinks those splats to be next to their use, and we are able to select the appropriate vector shifts. This updates the cost model to take this into account by making shifts by a uniform cheap again. Differential Revision: https://reviews.llvm.org/D23049 llvm-svn: 277782	2016-08-04 22:48:03 +00:00
Simon Pilgrim	5d5ca9c0cb	[X86][SSE] Add initial costs for vector CTTZ/CTLZ llvm-svn: 277716	2016-08-04 10:51:41 +00:00
Igor Breger	f44b79d08e	[AVX512] Don't use i128 masked gather/scatter/load/store. Do more accurately dataWidth check. Differential Revision: http://reviews.llvm.org/D23055 llvm-svn: 277435	2016-08-02 09:15:28 +00:00
Simon Pilgrim	1b4f511aaa	[X86][SSE] Add cost model values for CTPOP of vectors This patch adds costs for the vectorized implementations of CTPOP, the default values were seriously underestimating the cost of these and was encouraging vectorization on targets where serialized use of POPCNT would be much better. Differential Revision: https://reviews.llvm.org/D22456 llvm-svn: 276104	2016-07-20 10:41:28 +00:00
Simon Pilgrim	285d9e4d60	Strip trailing whitespace llvm-svn: 275726	2016-07-17 19:02:27 +00:00
Michael Kuperstein	f0c59330e9	[X86] Make some cast costs more precise Make some AVX and AVX512 cast costs more precise. Based on part of a patch by Elena Demikhovsky (D15604). Differential Revision: http://reviews.llvm.org/D22064 llvm-svn: 275106	2016-07-11 21:39:44 +00:00
Sanjay Patel	04b3496d9b	[x86] fix cost of SINT_TO_FP for i32 --> float (PR21356, PR28434) This is "cvtdq2ps" which does not appear to be particularly slow on any CPU according to Agner's tables. Choosing "5" as a cost here as suggested in: https://llvm.org/bugs/show_bug.cgi?id=21356 ...but it seems very conservative given that the instruction is fully pipelined, and I think these costs are supposed to model throughput. Note that related costs are also most likely too high, but this fixes PR21356 and partly fixes PR28434. llvm-svn: 274658	2016-07-06 19:15:54 +00:00
Michael Kuperstein	1b62e0e91f	[X86] Sort cast cost tables. NFC. Cast cost tables are now sorted, for each cast type, lexicographically on [source base type, source vector width, dest base type, base vector width]. llvm-svn: 274653	2016-07-06 18:26:48 +00:00
Simon Pilgrim	356e823b51	[X86][SSE] Add cost model for BSWAP of vectors The BSWAP of vector types is quite efficiently implemented using vector shuffles on SSE/AVX targets, we should reflect the typical cost of this to encourage vectorization. Differential Revision: http://reviews.llvm.org/D21521 llvm-svn: 273217	2016-06-20 23:08:21 +00:00
Simon Pilgrim	3fc09f7be6	[CostModel][X86][SSE] Updated costs for vector BITREVERSE ops on SSSE3+ targets To account for the fast PSHUFB implementation now available llvm-svn: 272484	2016-06-11 19:23:02 +00:00
Michael Kuperstein	9a0542a792	[X86] Add costs for SSE zext/sext to v4i64 to TTI The costs are somewhat hand-wavy, but should be much closer to the truth than what we get from BasicTTI. Differential Revision: http://reviews.llvm.org/D21156 llvm-svn: 272406	2016-06-10 17:01:05 +00:00
Sanjay Patel	aedc347b29	[x86] avoid code explosion from LoopVectorizer for gather loop (PR27826) By making pointer extraction from a vector more expensive in the cost model, we avoid the vectorization of a loop that is very likely to be memory-bound: https://llvm.org/bugs/show_bug.cgi?id=27826 There are still bugs related to this, so we may need a more general solution to avoid vectorizing obviously memory-bound loops when we don't have HW gather support. Differential Revision: http://reviews.llvm.org/D20601 llvm-svn: 270729	2016-05-25 17:27:54 +00:00
Simon Pilgrim	14000b3cea	[CostModel][X86][XOP] Added XOP costmodel for BITREVERSE Now that we have a nice fast VPPERM solution. Added framework for future intrinsic costs as well. llvm-svn: 270537	2016-05-24 08:17:50 +00:00
Simon Pilgrim	eec3a95f95	[X86][SSE] Improve cost model for i64 vector comparisons on pre-SSE42 targets As discussed on PR24888, until SSE42 we don't have access to PCMPGTQ for v2i64 comparisons, but the cost models don't reflect this, resulting in over-optimistic vectorization. This patch adds SSE2 'base level' costs that match what a typical target is capable of and only reduces the v2i64 costs at SSE42. Technically SSE41 provides a PCMPEQQ v2i64 equality test, but as getCmpSelInstrCost doesn't give us a way to discriminate between comparison test types we can't easily make use of this, otherwise we could split the cost of integer equality and greater-than tests to give better costings of each. Differential Revision: http://reviews.llvm.org/D20057 llvm-svn: 268972	2016-05-09 21:14:38 +00:00
Ashutosh Nema	468558a061	[X86]: Changing cost for “TRUNCATE v16i32 to v16i8” in SSE4.1 mode. Summary: rL256194 transforms truncations between vectors of integers into PACKUS/PACKSS operations during DAG combine. This generates better code for truncate, so cost of truncate needs to be changed but looks like it got changed only in SSE2 table Whereas this change is also applicable for SSE4.1, so the cost of truncate needs to be changed for that as well. Cost of “TRUNCATE v16i32 to v16i8” & “TRUNCATE v16i16 to v16i8” should be same in SSE4.1 & SSE2 table. Removing their cost from SSE4.1, so it will fall back to SSE2. Reviewers: Simon Pilgrim llvm-svn: 267123	2016-04-22 08:34:05 +00:00
Mehdi Amini	867e91468b	Do not use getGlobalContext()... ever. This code was creating a new type in the global context, regardless of which context the user is sitting in, what can possibly go wrong? From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 266275	2016-04-14 04:36:40 +00:00
Sanjay Patel	4c7d094451	fix typo; NFC llvm-svn: 265442	2016-04-05 19:27:39 +00:00
Sanjay Patel	9f6c4d50b4	[x86] fix cost model inaccuracy for vector memory ops The irony of this patch is that one CPU that is affected is AMD Jaguar, and Jaguar has a completely double-pumped AVX implementation. But getting the cost model to reflect that is a much bigger problem. The small goal here is simply to improve on the lie that !AVX2 == SandyBridge. Differential Revision: http://reviews.llvm.org/D18000 llvm-svn: 263069	2016-03-09 22:23:33 +00:00
Igor Breger	4d94d4d5f7	AVX512BW: Support llvm intrinsic masked vector load/store for i8/i16 element types on SKX Differential Revision: http://reviews.llvm.org/D17913 llvm-svn: 262803	2016-03-06 12:38:58 +00:00
Igor Breger	6d421419db	AVX1 : Enable vector masked_load/store to AVX1. Use AVX1 FP instructions (vmaskmovps/pd) in place of the AVX2 int instructions (vpmaskmovd/q). Differential Revision: http://reviews.llvm.org/D16528 llvm-svn: 258675	2016-01-25 10:17:11 +00:00
Elena Demikhovsky	5494698828	Implemented cost model for masked gather and scatter operations The cost is calculated for all X86 targets. When gather/scatter instruction is not supported we calculate the cost of scalar sequence. Differential revision: http://reviews.llvm.org/D15677 llvm-svn: 256519	2015-12-28 20:10:59 +00:00
Cong Hou	8df93ce455	[X86][SSE] Transform truncations between vectors of integers into X86ISD::PACKUS/PACKSS operations during DAG combine. This patch transforms truncation between vectors of integers into X86ISD::PACKUS/PACKSS operations during DAG combine. We don't do it in lowering phase because after type legalization, the original truncation will be turned into a BUILD_VECTOR with each element that is extracted from a vector and then truncated, and from them it is difficult to do this optimization. This greatly improves the performance of truncations on some specific types. Cost table is updated accordingly. Differential revision: http://reviews.llvm.org/D14588 llvm-svn: 256194	2015-12-21 20:42:43 +00:00
Craig Topper	074e845260	[X86] Prevent constant hoisting for a couple compare immediates that the selection DAG knows how to optimize into a shift. This allows "icmp ugt %a, 4294967295" and "icmp uge %a, 4294967296" to be optimized into right shifts by 32 which can fold the immediate into the shift instruction. These patterns show up with some regularity in real code. Unfortunately, since getImmCost can't see the icmp predicate we can't tell if we're only catching these specific cases. llvm-svn: 256126	2015-12-20 18:41:54 +00:00
Cong Hou	59898d8c68	[X86][SSE] Update the cost table for integer-integer conversions on SSE2/SSE4.1. Previously in the conversion cost table there are no entries for integer-integer conversions on SSE2. This will result in imprecise costs for certain vectorized operations. This patch adds those entries for SSE2 and SSE4.1. The cost numbers are counted from the result of running llc on the new test case in this patch. Differential revision: http://reviews.llvm.org/D15132 llvm-svn: 255315	2015-12-11 00:31:39 +00:00
Elena Demikhovsky	a1a40cce9f	AVX-512: Updated cost of FP/SINT/UINT conversion operations I checked and updated the cost of AVX-512 conversion operations. Added cost of conversion operations in DQ mode. Conversion of illegal types that requires vector split is not calculated right now (like for other X86 targets). Differential Revision: http://reviews.llvm.org/D15074 llvm-svn: 254494	2015-12-02 08:59:47 +00:00
Elena Demikhovsky	1ca72e1846	Pointers in Masked Load, Store, Gather, Scatter intrinsics The masked intrinsics support all integer and floating point data types. I added the pointer type to this list. Added tests for CodeGen and for Loop Vectorizer. Updated the Language Reference. Differential Revision: http://reviews.llvm.org/D14150 llvm-svn: 253544	2015-11-19 07:17:16 +00:00
Cong Hou	da4e8aeec6	[X86] A small fix in X86/X86TargetTransformInfo.cpp: check a value type is simple before calling getSimpleVT(). llvm-svn: 251538	2015-10-28 18:15:46 +00:00
Craig Topper	4b27576001	Remove templates from CostTableLookup functions. All instantiations had the same type. This also lets us remove the versions of the functions that took a statically sized array as we can rely on ArrayRef implicit conversion now. llvm-svn: 251490	2015-10-28 04:02:12 +00:00
Craig Topper	ee0c859788	Convert cost table lookup functions to return a pointer to the entry or nullptr instead of the index. This avoid mentioning the table name an extra time and allows the lookup to be done directly in the ifs by relying on the bool conversion of the pointer. While there make use of ArrayRef and std::find_if. llvm-svn: 251382	2015-10-27 04:14:24 +00:00
Elena Demikhovsky	092858588a	Scalarizer for masked.gather and masked.scatter intrinsics. When the target does not support these intrinsics they should be converted to a chain of scalar load or store operations. If the mask is not constant, the scalarizer will build a chain of conditional basic blocks. I added isLegalMaskedGather() isLegalMaskedScatter() APIs. Differential Revision: http://reviews.llvm.org/D13722 llvm-svn: 251237	2015-10-25 15:37:55 +00:00
Craig Topper	eda02a905e	Remove two unnecessary conversions from MVT to EVT. NFC llvm-svn: 251219	2015-10-25 03:15:29 +00:00
Elena Demikhovsky	7ad0d563a5	Partially reverted changes from r250686 Clang runtime failure was reported. Assertion failed: (isExtended() && "Type is not extended!"), function getTypeForEVT I'll need to add a proper handling for PointerType in masked load/store intrinsics. llvm-svn: 250995	2015-10-22 06:20:29 +00:00
Elena Demikhovsky	20662e39f1	Removed parameter "Consecutive" from isLegalMaskedLoad() / isLegalMaskedStore(). Originally I planned to use the same interface for masked gather/scatter and set isConsecutive to "false" in this case. Now I'm implementing masked gather/scatter and see that the interface is inconvenient. I want to add interfaces isLegalMaskedGather() / isLegalMaskedScatter() instead of using the "Consecutive" parameter in the existing interfaces. Differential Revision: http://reviews.llvm.org/D13850 llvm-svn: 250686	2015-10-19 07:43:38 +00:00
Simon Pilgrim	a18ae9bd70	[CostModel] Fixed AVX integer shift costs Targets with AVX but without AVX2 were incorrectly reporting costs of 256-bit integer shifts. llvm-svn: 250611	2015-10-17 13:23:38 +00:00
Hans Wennborg	083ca9bb32	Fix Clang-tidy modernize-use-nullptr warnings in source directories and generated files; other minor cleanups. Patch by Eugene Zelenko! Differential Revision: http://reviews.llvm.org/D13321 llvm-svn: 249482	2015-10-06 23:24:35 +00:00
Craig Topper	79dd1bf094	[X86] Teach constant hoisting that ANDs with 64-bit immediates in the range 0x80000000-0xffffffff can be handled cheaply and don't need to be hoisted. Most importantly, this keeps constant hoisting from preventing instruction selections ability to turn an AND with 0xffffffff into a move into a 32-bit subregister. llvm-svn: 249370	2015-10-06 02:50:24 +00:00
Simon Pilgrim	3d11c994f7	[X86][XOP] Added support for the lowering of 128-bit vector shifts to XOP shift instructions The XOP shifts just have logical/arithmetic versions and the left/right shifts are controlled by whether the value is positive/negative. Because of this I've added new X86ISD nodes instead of trying to force them to use the existing shift nodes. Additionally Excavator cores (bdver4) support XOP and AVX2 - meaning that it should use the AVX2 shifts when it can and fall back to XOP in other cases. Differential Revision: http://reviews.llvm.org/D8690 llvm-svn: 248878	2015-09-30 08:17:50 +00:00
Chandler Carruth	93205eb966	[TTI] Make the cost APIs in TargetTransformInfo consistently use 'int' rather than 'unsigned' for their costs. For something like costs in particular there is a natural "negative" value, that of savings or saved cost. As a consequence, there is a lot of code that subtracts or creates negative values based on cost, all of which is prone to awkwardness or bugs when dealing with an unsigned type. Similarly, we never want these values to wrap, as that would cause Very Bad code generation (likely perceived as an infinite loop as we try to emit over 2^32 instructions or some such insanity). All around 'int' seems a much better fit for these basic metrics. I've added asserts to ensure that at least the TTI interface never returns negative numbers here. If we ever have a use case for negative numbers, we can remove this, but this way a bug where someone used '-1' to produce a 'very large' cost will be caught by the assert. This passes all tests, and is also UBSan clean. No functional change intended. Differential Revision: http://reviews.llvm.org/D11741 llvm-svn: 244080	2015-08-05 18:08:10 +00:00
Eric Christopher	d566fb12a1	Rename hasCompatibleFunctionAttributes->areInlineCompatible based on suggestions. Currently the function is only used for inline purposes and this is more descriptive for the use. llvm-svn: 243578	2015-07-29 22:09:48 +00:00
Simon Pilgrim	86478c6909	[X86][SSE] Vectorize i64 ASHR operations This patch vectorizes the v2i64/v4i64 ASHR shift operations - the last remaining integer vector shifts that are still being transferred to/from the scalar unit to be completed. Differential Revision: http://reviews.llvm.org/D11439 llvm-svn: 243569	2015-07-29 20:31:45 +00:00
Simon Pilgrim	e2c244f3b4	[X86][SSE] Reordered cast vectorization costs. NFCI. Reordered the data tables at the top and placed the lookups after. The first stage in the yak shaving necessary to get more accurate costs for a variety of targets given the recent improvements to SINT_TO_FP/UINT_TO_FP/SIGN_EXTEND vector lowering. llvm-svn: 242643	2015-07-19 15:36:12 +00:00
Simon Pilgrim	59764dccfb	[X86][SSE] Updated SHL/LSHR i64 vectorization costs. This was missed in D8416. llvm-svn: 242621	2015-07-18 20:06:30 +00:00
NAKAMURA Takumi	0b305db8df	Prune trailing whitespaces and CRs. llvm-svn: 242117	2015-07-14 04:03:49 +00:00
Simon Pilgrim	64cc4ad0a2	[X86][SSE] Vectorized v4i32 non-uniform shifts. While the v4i32 shl operation is already vectorized using a cvttps2dq/pmulld pattern, the lshr/ashr operations are still scalarized. This patch adds vectorization support for non-uniform v4i32 shift operations - it splats constant shift amounts to allow them to use the immediate sse shift instructions, or extracts/zero-extends non-constant shift amounts. The individual results are then blended together. Differential Revision: http://reviews.llvm.org/D11063 llvm-svn: 241989	2015-07-12 11:15:19 +00:00
Mehdi Amini	44ede33a69	Make TargetLowering::getPointerTy() taking DataLayout as an argument Summary: This change is part of a series of commits dedicated to have a single DataLayout during compilation by using always the one owned by the module. Reviewers: echristo Subscribers: jholewinski, ted, yaron.keren, rafael, llvm-commits Differential Revision: http://reviews.llvm.org/D11028 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 241775	2015-07-09 02:09:04 +00:00
Simon Pilgrim	8fbf1c1f4a	[X86][SSE] Vectorized i64 uniform constant SRA shifts This patch adds vectorization support for uniform constant i64 arithmetic shift right operators. Differential Revision: http://reviews.llvm.org/D9645 llvm-svn: 241514	2015-07-06 22:35:19 +00:00
Eric Christopher	e100226879	Implement TargetTransformInfo::hasCompatibleFunctionAttributes for X86. This checks subtarget feature compatibility for inlining by verifying that the callee is a strict subset of the caller's features. This includes the cpu as part of the subtarget we can get via the incoming functions as the backend takes CPUs as feature sets. This allows us to inline things like: int foo() { return baz(); } int __attribute__((target("sse4.2"))) bar() { return foo(); } so that generic code can be inlined into specialized functions. llvm-svn: 241221	2015-07-02 01:11:50 +00:00
Simon Pilgrim	5965680d53	[X86][SSE] Vectorized i8 and i16 shift operators This patch ensures that SHL/SRL/SRA shifts for i8 and i16 vectors avoid scalarization. It builds on the existing i8 SHL vectorized implementation of moving the shift bits up to the sign bit position and separating the 4, 2 & 1 bit shifts with several improvements: 1 - SSE41 targets can use (v)pblendvb directly with the sign bit instead of performing a comparison to feed into a VSELECT node. 2 - pre-SSE41 targets were masking + comparing with an 0x80 constant - we avoid this by using the fact that a set sign bit means a negative integer which can be compared against zero to then feed into VSELECT, avoiding the need for a constant mask (zero generation is much cheaper). 3 - SRA i8 needs to be unpacked to the upper byte of a i16 so that the i16 psraw instruction can be correctly used for sign extension - we have to do more work than for SHL/SRL but perf tests indicate that this is still beneficial. The i16 implementation is similar but simpler than for i8 - we have to do 8, 4, 2 & 1 bit shifts but less shift masking is involved. SSE41 use of (v)pblendvb requires that the i16 shift amount is splatted to both bytes however. Tested on SSE2, SSE41 and AVX machines. Differential Revision: http://reviews.llvm.org/D9474 llvm-svn: 239509	2015-06-11 07:46:37 +00:00
Simon Pilgrim	0be4fa761f	[X86][AVX2] Vectorized i16 shift operators Part of D9474, this patch extends AVX2 v16i16 types to 2 x 8i32 vectors and uses i32 shift variable shifts before packing back to i16. Adds AVX2 tests for v8i16 and v16i16 llvm-svn: 238149	2015-05-25 17:49:13 +00:00
Wei Mi	062c74484d	[X86] Disable loop unrolling in loop vectorization pass when VF is 1. The patch disables unrolling in the loop vectorization pass when VF==1 on x86 architecture, by setting MaxInterleaveFactor to 1. Unrolling in the loop vectorization pass may introduce the cost of overflow check, memory boundary check and extra prologue/epilogue code when the regular unroller will unroll the loop another time. Disabling it when VF==1 removes the unnecessary cost on x86. The same can be done for other platforms after verifying interleaving/memory bound checking to be not perf critical on those platforms. Differential Revision: http://reviews.llvm.org/D9515 llvm-svn: 236613	2015-05-06 17:12:25 +00:00
Chandler Carruth	93dcdc47db	[PM] Switch the TargetMachine interface from accepting a pass manager base which it adds a single analysis pass to, to instead return the type erased TargetTransformInfo object constructed for that TargetMachine. This removes all of the pass variants for TTI. There is now a single TTI pass in the Analysis layer. All of the Analysis <-> Target communication is through the TTI's type erased interface itself. While the diff is large here, it is nothing more than code motion to make types available in a header file for use in a different source file within each target. I've tried to keep all the doxygen comments and file boilerplate in line with this move, but let me know if I missed anything. With this in place, the next step to making TTI work with the new pass manager is to introduce a really simple new-style analysis that produces a TTI object via a callback into this routine on the target machine. Once we have that, we'll have the building blocks necessary to accept a function argument as well. llvm-svn: 227685	2015-01-31 11:17:59 +00:00
Chandler Carruth	705b185f90	[PM] Change the core design of the TTI analysis to use a polymorphic type erased interface and a single analysis pass rather than an extremely complex analysis group. The end result is that the TTI analysis can contain a type erased implementation that supports the polymorphic TTI interface. We can build one from a target-specific implementation or from a dummy one in the IR. I've also factored all of the code into "mix-in"-able base classes, including CRTP base classes to facilitate calling back up to the most specialized form when delegating horizontally across the surface. These aren't as clean as I would like and I'm planning to work on cleaning some of this up, but I wanted to start by putting into the right form. There are a number of reasons for this change, and this particular design. The first and foremost reason is that an analysis group is complete overkill, and the chaining delegation strategy was so opaque, confusing, and high overhead that TTI was suffering greatly for it. Several of the TTI functions had failed to be implemented in all places because of the chaining-based delegation making there be no checking of this. A few other functions were implemented with incorrect delegation. The message to me was very clear working on this -- the delegation and analysis group structure was too confusing to be useful here. The other reason of course is that this is much more natural fit for the new pass manager. This will lay the ground work for a type-erased per-function info object that can look up the correct subtarget and even cache it. Yet another benefit is that this will significantly simplify the interaction of the pass managers and the TargetMachine. See the future work below. The downside of this change is that it is very, very verbose. I'm going to work to improve that, but it is somewhat an implementation necessity in C++ to do type erasure. 
=/ I discussed this design really extensively with Eric and Hal prior to going down this path, and afterward showed them the result. No one was really thrilled with it, but there doesn't seem to be a substantially better alternative. Using a base class and virtual method dispatch would make the code much shorter, but as discussed in the update to the programmer's manual and elsewhere, a polymorphic interface feels like the more principled approach even if this is perhaps the least compelling example of it. ;] Ultimately, there is still a lot more to be done here, but this was the huge chunk that I couldn't really split things out of because this was the interface change to TTI. I've tried to minimize all the other parts of this. The follow up work should include at least: 1) Improving the TargetMachine interface by having it directly return a TTI object. Because we have a non-pass object with value semantics and an internal type erasure mechanism, we can narrow the interface of the TargetMachine to just do what we need: build and return a TTI object that we can then insert into the pass pipeline. 2) Make the TTI object be fully specialized for a particular function. This will include splitting off a minimal form of it which is sufficient for the inliner and the old pass manager. 3) Add a new pass manager analysis which produces TTI objects from the target machine for each function. This may actually be done as part of #2 in order to use the new analysis to implement #2. 4) Work on narrowing the API between TTI and the targets so that it is easier to understand and less verbose to type erase. 5) Work on narrowing the API between TTI and its clients so that it is easier to understand and less verbose to forward. 6) Try to improve the CRTP-based delegation. I feel like this code is just a bit messy and exacerbating the complexity of implementing the TTI in each target. Many thanks to Eric and Hal for their help here. 
I ended up blocked on this somewhat more abruptly than I expected, and so I appreciate getting it sorted out very quickly. Differential Revision: http://reviews.llvm.org/D7293 llvm-svn: 227669	2015-01-31 03:43:40 +00:00
Elena Demikhovsky	a3232f764e	Implemented cost model for masked load/store operations. llvm-svn: 227035	2015-01-25 08:44:46 +00:00
Elena Demikhovsky	fb81b93e17	Masked Load/Store - Changed the order of parameters in intrinsics. No functional changes. The documentation is coming. llvm-svn: 224829	2014-12-25 07:49:20 +00:00
Elena Demikhovsky	3fcafa2cdb	Loop Vectorizer minor changes in the code - some comments, function names, indentation. Reviewed here: http://reviews.llvm.org/D6527 llvm-svn: 224218	2014-12-14 09:43:50 +00:00
Elena Demikhovsky	f1de34b84d	Masked Load / Store Intrinsics - the CodeGen part. I'm recommitting the codegen part of the patch. The vectorizer part will be sent for review again. Masked Vector Load and Store Intrinsics. Introduced new target-independent intrinsics in order to support masked vector loads and stores. The loop vectorizer optimizes loops containing conditional memory accesses by generating these intrinsics for existing targets AVX2 and AVX-512. The vectorizer asks the target about availability of masked vector loads and stores. Added SDNodes for masked operations and lowering patterns for X86 code generator. Examples: <16 x i32> @llvm.masked.load.v16i32(i8* %addr, <16 x i32> %passthru, i32 4 /* align */, <16 x i1> %mask) declare void @llvm.masked.store.v8f64(i8* %addr, <8 x double> %value, i32 4, <8 x i1> %mask) Scalarizer for other targets (not AVX2/AVX-512) will be done in a separate patch. http://reviews.llvm.org/D6191 llvm-svn: 223348	2014-12-04 09:40:44 +00:00
Michael Liao	5bf9578ce4	[X86] Clean up whitespace as well as minor coding style llvm-svn: 223339	2014-12-04 05:20:33 +00:00
Duncan P. N. Exon Smith	9bc81fbe92	Revert "Masked Vector Load and Store Intrinsics." This reverts commit r222632 (and follow-up r222636), which caused a host of LNT failures on an internal bot. I'll respond to the commit on the list with a reproduction of one of the failures. Conflicts: lib/Target/X86/X86TargetTransformInfo.cpp llvm-svn: 222936	2014-11-28 21:29:14 +00:00
Craig Topper	8c5128bf1b	Add missing override keywords. llvm-svn: 222634	2014-11-23 09:40:13 +00:00
Elena Demikhovsky	9e5089a938	Masked Vector Load and Store Intrinsics. Introduced new target-independent intrinsics in order to support masked vector loads and stores. The loop vectorizer optimizes loops containing conditional memory accesses by generating these intrinsics for existing targets AVX2 and AVX-512. The vectorizer asks the target about availability of masked vector loads and stores. Added SDNodes for masked operations and lowering patterns for X86 code generator. Examples: <16 x i32> @llvm.masked.load.v16i32(i8* %addr, <16 x i32> %passthru, i32 4 /* align */, <16 x i1> %mask) declare void @llvm.masked.store.v8f64(i8* %addr, <8 x double> %value, i32 4, <8 x i1> %mask) Scalarizer for other targets (not AVX2/AVX-512) will be done in a separate patch. http://reviews.llvm.org/D6191 llvm-svn: 222632	2014-11-23 08:07:43 +00:00
Elena Demikhovsky	d5e95b57e0	AVX-512: SINT_TO_FP cost model and some bugfixes Checked some corner cases, for example translation of <8 x i1> to <8 x double> llvm-svn: 221883	2014-11-13 11:46:16 +00:00
Quentin Colombet	360460ba64	[X86] Custom lower UINT_TO_FP from v4f32 to v4i32, and for v8f32 to v8i32 if AVX2 is available. According to IACA, the new lowering has a throughput of 8 cycles instead of 13 with the previous one. Although this lowering kicks in for some SPEC benchmarks, the performance improvement was within the noise. Correctness testing has been done for the whole range of uint32_t with the following program: uint4 v = (uint4) {0,1,2,3}; uint32_t i; //Check correctness over entire range for uint4 -> float4 conversion for( i = 0; i < 1U << (32-2); i++ ) { float4 t = test(v); float4 c = correct(v); if( 0xf != _mm_movemask_ps( t == c )) { printf( "Error @ %vx: %vf vs. %vf\n", v, c, t); return -1; } v += 4; } Where "correct" is the old lowering and "test" the new one. The patch adds a test case for the two custom lowering instructions. It also modifies the vector cost model, which is why cast.ll and uitofp.ll are modified. 2009-02-26-MachineLICMBug.ll is also modified because we now hoist 7 instructions instead of 4 (3 more constant loads). <rdar://problem/18153096> llvm-svn: 221657	2014-11-11 02:23:47 +00:00
Elena Demikhovsky	27012478d2	AVX-512: added cost for some AVX-512 instructions llvm-svn: 217863	2014-09-16 07:57:37 +00:00
Sanjay Patel	b653de1ada	Rename getMaximumUnrollFactor -> getMaxInterleaveFactor; also rename option names controlling this variable. "Unroll" is not the appropriate name for this variable. Clang already uses the term "interleave" in pragmas and metadata for this. Differential Revision: http://reviews.llvm.org/D5066 llvm-svn: 217528	2014-09-10 17:58:16 +00:00
Karthik Bhat	7f33ff7dea	Allow vectorization of division by uniform power of 2. This patch adds support to recognize division by uniform power of 2 and modifies the cost table to vectorize division by uniform power of 2 whenever possible. Updates the cost model for the Loop and SLP Vectorizers. The cost table is currently only updated for the X86 backend. Thanks to Hal, Andrea, Sanjay for the review. (http://reviews.llvm.org/D4971) llvm-svn: 216371	2014-08-25 04:56:54 +00:00
Eric Christopher	d913448b38	Remove the TargetMachine forwards for TargetSubtargetInfo based information and update all callers. No functional change. llvm-svn: 214781	2014-08-04 21:25:23 +00:00
Adam Nemet	2820a5b9e9	[X86] AVX512: Enable it in the Loop Vectorizer This lets us experiment with 512-bit vectorization without passing force-vector-width manually. The code generated for a simple integer memset loop is properly vectorized. Disassembly is still broken for it though :(. llvm-svn: 212634	2014-07-09 18:22:33 +00:00

417 Commits