llvm-project

Commit Graph

Author	SHA1	Message	Date
Simon Pilgrim	4629b46bba	[SLPVectorizer] Regenerate test. Missed var name llvm-svn: 290970	2017-01-04 16:01:55 +00:00
Simon Pilgrim	1d5b0377af	Regenerate test. llvm-svn: 290969	2017-01-04 15:52:41 +00:00
Michael Kuperstein	cd7ad7130f	[InstCombine] Canonicalize insert splat sequences into an insert + shuffle This adds a combine that canonicalizes a chain of inserts which broadcasts a value into a single insert + a splat shufflevector. This fixes PR31286. Differential Revision: https://reviews.llvm.org/D27992 llvm-svn: 290641	2016-12-28 00:18:08 +00:00
Alexey Bataev	4160264e30	[TEST] Initial commit of tests for minmax horizontal reductions. llvm-svn: 289817	2016-12-15 13:21:29 +00:00
Matthew Simpson	92ce0230b5	[SLP] Fix sign-extends for type-shrinking This patch ensures the correct minimum bit width during type-shrinking. Previously when type-shrinking, we always sign-extended values back to their original width. However, if we are going to sign-extend, and the sign bit is unknown, we have to increase the minimum bit width by one bit so the sign-extend will fill the upper bits correctly. If the sign bit is known to be zero, we can perform a zero-extend instead. This should fix PR31243. Reference: https://llvm.org/bugs/show_bug.cgi?id=31243 Differential Revision: https://reviews.llvm.org/D27466 llvm-svn: 289470	2016-12-12 21:11:04 +00:00
Sanjoy Das	3336f681e3	[Verifier] Add verification for TBAA metadata Summary: This change adds some verification in the IR verifier around struct path TBAA metadata. Other than some basic sanity checks (e.g. we get constant integers where we expect constant integers), this checks: - That by the time an struct access tuple `(base-type, offset)` is "reduced" to a scalar base type, the offset is `0`. For instance, in C++ you can't start from, say `("struct-a", 16)`, and end up with `("int", 4)` -- by the time the base type is `"int"`, the offset better be zero. In particular, a variant of this invariant is needed for `llvm::getMostGenericTBAA` to be correct. - That there are no cycles in a struct path. - That struct type nodes have their offsets listed in an ascending order. - That when generating the struct access path, you eventually reach the access type listed in the tbaa tag node. Reviewers: dexonsmith, chandlerc, reames, mehdi_amini, manmanren Subscribers: mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D26438 llvm-svn: 289402	2016-12-11 20:07:15 +00:00
Alexey Bataev	4f0d469d45	[SLP] Fix for PR6246: vectorization for scalar ops on vector elements. When trying to vectorize trees that start at insertelement instructions function tryToVectorizeList() uses vectorization factor calculated as MinVecRegSize/ScalarTypeSize. But sometimes it does not work as tree cost for this fixed vectorization factor is too high. Patch tries to improve the situation. It tries different vectorization factors from max(PowerOf2Floor(NumberOfVectorizedValues), MinVecRegSize/ScalarTypeSize) to MinVecRegSize/ScalarTypeSize and tries to choose the best one. Differential Revision: https://reviews.llvm.org/D27215 llvm-svn: 289043	2016-12-08 11:57:51 +00:00
Simon Pilgrim	e633741c3a	[SLPVectorizer][X86] Tests to show missed buildvector sitofp/fptosi vectorizations e.g. buildvector(sitofp(i32), sitofp(i32), sitofp(i32), sitofp(i32)) --> sitofp(buildvector(i32, i32, i32, i32)) llvm-svn: 288807	2016-12-06 13:29:55 +00:00
Renato Golin	5b8e7ecdb3	Revert "[SLP] Fix for PR6246: vectorization for scalar ops on vector elements." This reverts commit r288497, as it broke the AArch64 build of Compiler-RT's builtins (twice: once in r288412 and once in r288497). We should investigate this offline. llvm-svn: 288508	2016-12-02 16:56:26 +00:00
Alexey Bataev	e8e94a7176	[SLP] Fix for PR6246: vectorization for scalar ops on vector elements. When trying to vectorize trees that start at insertelement instructions function tryToVectorizeList() uses vectorization factor calculated as MinVecRegSize/ScalarTypeSize. But sometimes it does not work as tree cost for this fixed vectorization factor is too high. Patch tries to improve the situation. It tries different vectorization factors from max(PowerOf2Floor(NumberOfVectorizedValues), MinVecRegSize/ScalarTypeSize) to MinVecRegSize/ScalarTypeSize and tries to choose the best one. Differential Revision: https://reviews.llvm.org/D27215 llvm-svn: 288497	2016-12-02 12:20:22 +00:00
Simon Pilgrim	c70d3796fb	[SLPVectorizer][X86] Add tests for vectorization of buildvector of scalar fp-ops (PR6246) llvm-svn: 288492	2016-12-02 10:54:46 +00:00
Artem Belevich	704395a25a	Revert "[SLP] Fix for PR6246: vectorization for scalar ops on vector elements." This reverts r288412 which causes severe compile-time regression. llvm-svn: 288431	2016-12-01 22:52:15 +00:00
Alexey Bataev	2c01af5904	[SLP] Fix for PR6246: vectorization for scalar ops on vector elements. When trying to vectorize trees that start at insertelement instructions function tryToVectorizeList() uses vectorization factor calculated as MinVecRegSize/ScalarTypeSize. But sometimes it does not work as tree cost for this fixed vectorization factor is too high. Patch tries to improve the situation. It tries different vectorization factors from max(PowerOf2Floor(NumberOfVectorizedValues), MinVecRegSize/ScalarTypeSize) to MinVecRegSize/ScalarTypeSize and tries to choose the best one. Differential Revision: https://reviews.llvm.org/D27215 llvm-svn: 288412	2016-12-01 20:06:53 +00:00
Alexey Bataev	62af7252f1	[SLP] Fixed cost model for horizontal reduction. Currently when cost of scalar operations is evaluated the vector type is used for scalar operations. Patch fixes this issue and fixes evaluation of the vector operations cost. Several test showed that vector cost model is too optimistic. It allowed vectorization of 8 or less add/fadd operations, though scalar code is faster. Actually, only for 16 or more operations vector code provides better performance. Differential Revision: https://reviews.llvm.org/D26277 llvm-svn: 288398	2016-12-01 18:42:42 +00:00
Alexey Bataev	fc617690ab	[SLP] Additional tests with the cost of vector operations. llvm-svn: 288377	2016-12-01 17:26:54 +00:00
Alexey Bataev	e59a8351d0	Revert "[SLP] Additional tests with the cost of vector operations." This reverts commit a61718435fc4118c82f8aa6133fd81f803789c1e. llvm-svn: 288371	2016-12-01 16:45:04 +00:00
Alexey Bataev	2ff768475d	[SLP] Additional tests with the cost of vector operations. llvm-svn: 288369	2016-12-01 16:11:48 +00:00
Alexey Bataev	e951e5eb7b	[SLP] Add a new test for tree vectorization starting from insertelement instruction. llvm-svn: 288148	2016-11-29 15:37:52 +00:00
Alexey Bataev	4fa063ebc9	[SLPVectorizer] Improved support of partial tree vectorization. Currently SLP vectorizer tries to vectorize a binary operation and dies immediately after unsuccessful the first unsuccessfull attempt. Patch tries to improve the situation, trying to vectorize all binary operations of all children nodes in the binop tree. Differential Revision: https://reviews.llvm.org/D25517 llvm-svn: 288115	2016-11-29 08:21:14 +00:00
Mohammad Shahid	2f5cb60b07	[SLP] Add new and update existing lit testfor providing more context to incoming patch for vectorization of jumbled load Change-Id: Ifb9091bb0f84c1937c2c8bd2fc345734f250d2f9 llvm-svn: 287992	2016-11-27 03:35:31 +00:00
Simon Pilgrim	841d7ca463	[X86][AVX512] Add support for v2i64 fptosi/fptoui/sitofp/uitofp on AVX512DQ-only targets Use 512-bit instructions with subvector insertion/extraction like we do in a number of similar circumstances llvm-svn: 287882	2016-11-24 14:46:55 +00:00
Alexey Bataev	2eaacda53e	[SLP] Add more tests for SLP Vectorizer. llvm-svn: 287801	2016-11-23 20:10:32 +00:00
Simon Pilgrim	4e9b9cbee9	[X86][AVX512] Add support for v4i64 fptosi/fptoui/sitofp/uitofp on AVX512DQ-only targets Use 512-bit instructions with subvector insertion/extraction like we do in a number of similar circumstances llvm-svn: 287762	2016-11-23 14:01:18 +00:00
Simon Pilgrim	03cd8f887c	[CostModel][X86] Add missing AVX512DQ v8i64 fptosi/sitofp costs llvm-svn: 287760	2016-11-23 13:42:09 +00:00
Craig Topper	07f1c15995	[AVX-512] Support FCOPYSIGN for v16f32 and v8f64 Summary: This extends FCOPYSIGN support to 512-bit vectors. I've also added tests to show what the 128-bit and 256-bit cases look like with broadcast loads. Reviewers: delena, zvi, RKSimon, spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D26791 llvm-svn: 287298	2016-11-18 02:25:34 +00:00
Vyacheslav Klochkov	b3dc774a99	Fixed the lost FastMathFlags for CALL operations in SLPVectorizer. Reviewer: Michael Zolotukhin. Differential Revision: https://reviews.llvm.org/D26575 llvm-svn: 287064	2016-11-16 00:55:50 +00:00
Vyacheslav Klochkov	f1a12fe0f5	Fixed the lost FastMathFlags for FCmp operations in SLPVectorizer. Reviewer: Michael Zolotukhin. Differential Revision: https://reviews.llvm.org/D26543 llvm-svn: 286626	2016-11-11 19:55:29 +00:00
Simon Pilgrim	d02c55204b	[VectorLegalizer] Expansion of CTLZ using CTPOP when possible This patch avoids scalarization of CTLZ by instead expanding to use CTPOP (ref: "Hacker's Delight") when the necessary operations are available. This also adds the necessary cost models for X86 SSE2 targets (the main beneficiary) to ensure vectorization only happens when its useful. Differential Revision: https://reviews.llvm.org/D25910 llvm-svn: 286233	2016-11-08 14:10:28 +00:00
Alexey Bataev	46c0278e7d	[SLP] Fix for PR30626: Compiler crash inside SLP Vectorizer. After successfull horizontal reduction vectorization attempt for PHI node vectorizer tries to update root binary op by combining vectorized tree and the ReductionPHI node. But during vectorization this ReductionPHI can be vectorized itself and replaced by the `undef` value, while the instruction itself is marked for deletion. This 'marked for deletion' PHI node then can be used in new binary operation, causing "Use still stuck around after Def is destroyed" crash upon PHI node deletion. Also the test is fixed to make it perform actual testing. Differential Revision: https://reviews.llvm.org/D25671 llvm-svn: 285286	2016-10-27 12:02:28 +00:00
Simon Pilgrim	4ddc92b6cd	[X86][SSE] Add lowering to cvttpd2dq/cvttps2dq for sitofp v2f64/2f32 to 2i32 As discussed on PR28461 we currently miss the chance to lower "fptosi <2 x double> %arg to <2 x i32>" to cvttpd2dq due to its use of illegal types. This patch adds support for fptosi to 2i32 from both 2f64 and 2f32. It also recognises that cvttpd2dq zeroes the upper 64-bits of the xmm result (similar to D23797) - we still don't do this for the cvttpd2dq/cvttps2dq intrinsics - this can be done in a future patch. Differential Revision: https://reviews.llvm.org/D23808 llvm-svn: 284459	2016-10-18 07:42:15 +00:00
Simon Pilgrim	cfef627b1f	[SLPVectorizer][X86] Add 512-bit sitofp/uitofp tests llvm-svn: 283756	2016-10-10 14:28:06 +00:00
Simon Pilgrim	2c0733c678	[SLPVectorizer][X86] Add avx512 sitofp/uitofp tests llvm-svn: 283751	2016-10-10 14:14:31 +00:00
Simon Pilgrim	6cadb5610e	[SLPVectorizer][X86] Fixed alignments of scalar loads in sitofp/uitofp tests Fixed copy+paste vector alignment to correct for per-element scalar loads Increased to 512-bit data sizes in preparation of avx512 tests llvm-svn: 283748	2016-10-10 14:10:41 +00:00
Alexey Bataev	6ad5da7c81	[SLPVectorizer] Fix for PR25748: reduction vectorization after loop unrolling. The next code is not vectorized by the SLPVectorizer: ``` int test(unsigned int *p) { int sum = 0; for (int i = 0; i < 8; i++) sum += p[i]; return sum; } ``` During optimization this loop is fully unrolled and SLPVectorizer is unable to vectorize it. Patch tries to fix this problem. Differential Revision: https://reviews.llvm.org/D24796 llvm-svn: 283535	2016-10-07 09:39:22 +00:00
Alexey Bataev	7e217c2402	[SLPVectorizer] Add a test with non-vectorizable IR. llvm-svn: 283225	2016-10-04 15:07:23 +00:00
Sanjay Patel	d27a21874b	[x86, SSE/AVX] allow 128/256-bit lowering for copysign vector intrinsics (PR30433) This should fix: https://llvm.org/bugs/show_bug.cgi?id=30433 There are a couple of open questions about the codegen: 1. Should we let scalar ops be scalars and avoid vector constant loads/splats? 2. Should we have a pass to combine constants such as the inverted pair that we have here? Differential Revision: https://reviews.llvm.org/D25165 llvm-svn: 283119	2016-10-03 16:38:27 +00:00
Simon Pilgrim	04e249b128	[SLPVectorizer][X86] Added fptosi/fptoui tests llvm-svn: 283048	2016-10-01 19:35:59 +00:00
Simon Pilgrim	567c4fbdae	[SLPVectorizer][X86] Added fcopysign tests llvm-svn: 283046	2016-10-01 17:00:26 +00:00
Simon Pilgrim	cceeb2a4fa	[SLPVectorizer][X86] Added fabs tests llvm-svn: 283045	2016-10-01 16:54:01 +00:00
Simon Pilgrim	34032cba14	Rename tests llvm-svn: 281863	2016-09-18 20:25:41 +00:00
Matthew Simpson	df2ab917ad	[SLP] Avoid signed integer overflow The test case included with r279125 exposed an existing signed integer overflow. Since getTreeCost can return INT_MAX, we can't sum this cost together with other costs, such as getReductionCost. This patch removes the possibility of assigning a cost of INT_MAX. Since we were previously using INT_MAX as an indicator for "should not vectorize", we now explicitly check this condition with "isTreeTinyAndNotFullyVectorizable" before computing a cost. This patch adds a run-line to the test case used for r279125 that ensures we don't vectorize. Previously, this line would vectorize the test case by chance due to undefined behavior in the cost calculation. Differential Revision: https://reviews.llvm.org/D23723 llvm-svn: 279562	2016-08-23 20:48:50 +00:00
Matthew Simpson	235e479984	Reapply "[SLP] Initialize VectorizedValue when gathering" The test case included in r279125 exposed existing undefined behavior in the SLP vectorizer that it did not introduce. This patch reapplies the original patch, but modifies the test case to avoid hitting the undefined behavior. This allows us to close PR28330 while keeping the UBSan bot happy. The undefined behavior the original test uncovered will be addressed in a follow-on patch. Reference: https://llvm.org/bugs/show_bug.cgi?id=28330 llvm-svn: 279370	2016-08-20 14:49:02 +00:00
Vitaly Buka	cc7db13bf0	Revert "[SLP] Initialize VectorizedValue when gathering" to fix ubsan bot. This reverts commit r279125. https://reviews.llvm.org/D23410 llvm-svn: 279363	2016-08-20 07:09:39 +00:00
Matthew Simpson	11db6b6b8c	[SLP] Initialize VectorizedValue when gathering We abort building vectorizable trees in some cases (e.g., if the maximum recursion depth is reached, if the region size is too large, etc.). If this happens for a reduction, we can be left with a root entry that needs to be gathered. For these cases, we need make sure we actually set VectorizedValue to the resulting vector. This patch ensures we properly set VectorizedValue, and it also ensures the insertelement sequence generated for the gathers is inserted at the correct location. Reference: https://llvm.org/bugs/show_bug.cgi?id=28330 Differential Revison: https://reviews.llvm.org/D23410 llvm-svn: 279125	2016-08-18 19:50:32 +00:00
Simon Pilgrim	5d5ca9c0cb	[X86][SSE] Add initial costs for vector CTTZ/CTLZ llvm-svn: 277716	2016-08-04 10:51:41 +00:00
Simon Pilgrim	9e201eac32	[SLPVectorizer][X86] Added vXi8/vXi16 sitofp/uitofp tests Dropped useless 2i32-2f32 test llvm-svn: 277281	2016-07-30 21:01:34 +00:00
Simon Pilgrim	f5134a2867	[SLPVectorizer][X86] Added SITOFP/UITOFP vectorization tests llvm-svn: 277275	2016-07-30 18:43:30 +00:00
Michael Kuperstein	38e7298093	[SLPVectorizer] Vectorize reverse-order loads in horizontal reductions When vectorizing a tree rooted at a store bundle, we currently try to sort the stores before building the tree, so that the stores can be vectorized. For other trees, the order of the root bundle - which determines the order of all other bundles - is arbitrary. That is bad, since if a leaf bundle of consecutive loads happens to appear in the wrong order, we will not vectorize it. This is partially mitigated when the root is a binary operator, by trying to build a "reversed" tree when that's considered profitable. This patch extends the workaround we have for binops to trees rooted in a horizontal reduction. This fixes PR28474. Differential Revision: https://reviews.llvm.org/D22554 llvm-svn: 276477	2016-07-22 21:28:48 +00:00
Simon Pilgrim	1b4f511aaa	[X86][SSE] Add cost model values for CTPOP of vectors This patch adds costs for the vectorized implementations of CTPOP, the default values were seriously underestimating the cost of these and was encouraging vectorization on targets where serialized use of POPCNT would be much better. Differential Revision: https://reviews.llvm.org/D22456 llvm-svn: 276104	2016-07-20 10:41:28 +00:00
Simon Pilgrim	1b2ab113fb	[SLPVectorizer][X86] Added sqrt vectorization tests llvm-svn: 275788	2016-07-18 13:20:54 +00:00
Simon Pilgrim	4ca42e232d	[SLPVectorizer][X86] Added fma vectorization tests llvm-svn: 274889	2016-07-08 17:19:13 +00:00
Elena Demikhovsky	971fbfda1e	Vector GEP test: renamed + some comments Differential revision: http://reviews.llvm.org/D21957 llvm-svn: 274611	2016-07-06 08:11:23 +00:00
Elena Demikhovsky	6f2ec8104a	Fixed crash of SLP Vectorizer on KNL The bug is connected to vector GEPs. https://llvm.org/bugs/show_bug.cgi?id=28313 llvm-svn: 273919	2016-06-27 20:07:00 +00:00
Simon Pilgrim	bc35f9f702	[SLPVectorizer][X86] Added ceil/floor/nearbyint/rint/trunc vectorization tests llvm-svn: 273420	2016-06-22 14:07:46 +00:00
Simon Pilgrim	356e823b51	[X86][SSE] Add cost model for BSWAP of vectors The BSWAP of vector types is quite efficiently implemented using vector shuffles on SSE/AVX targets, we should reflect the typical cost of this to encourage vectorization. Differential Revision: http://reviews.llvm.org/D21521 llvm-svn: 273217	2016-06-20 23:08:21 +00:00
Sean Silva	e0a9e66040	[PM] Port SLPVectorizer to the new PM This uses the "runImpl" approach to share code with the old PM. Porting to the new PM meant abandoning the anonymous namespace enclosing most of SLPVectorizer.cpp which is a bit of a bummer (but not a big deal compared to having to pull the pass class into a header which the new PM requires since it calls the constructor directly). llvm-svn: 272766	2016-06-15 08:43:40 +00:00
Simon Pilgrim	3fc09f7be6	[CostModel][X86][SSE] Updated costs for vector BITREVERSE ops on SSSE3+ targets To account for the fast PSHUFB implementation now available llvm-svn: 272484	2016-06-11 19:23:02 +00:00
Michael Zolotukhin	987ab631fa	[SLPVectorizer] Handle GEP with differing constant index types Summary: This fixes PR27617. Bug description: The SLPVectorizer asserts on encountering GEPs with different index types, such as i8 and i64. The patch includes a simple relaxation of the assert to allow constants being of different types, along with a regression test that will provoke the unrelaxed assert. Reviewers: nadav, mzolotukhin Subscribers: JesperAntonsson, llvm-commits, mzolotukhin Differential Revision: http://reviews.llvm.org/D20685 Patch by Jesper Antonsson! llvm-svn: 272206	2016-06-08 21:55:16 +00:00
Simon Pilgrim	ba319ded5e	[Analysis] Enabled BITREVERSE as a vectorizable intrinsic Allows XOP to vectorize BITREVERSE - other targets will follow as their costmodels improve. llvm-svn: 271803	2016-06-04 20:21:07 +00:00
Guozhi Wei	b994f4cdbc	[SLP] Pass in correct alignment when query memory access cost This patch fixes bug https://llvm.org/bugs/show_bug.cgi?id=27897. When query memory access cost, current SLP always passes in alignment value of 1 (unaligned), so it gets a very high cost of scalar memory access, and wrongly vectorize memory loads in the test case. It can be fixed by simply giving correct alignment. llvm-svn: 271333	2016-05-31 20:41:19 +00:00
Simon Pilgrim	45964c3742	[SLPVectorizer][X86] Regenerated SEXT/ZEXT cast vectorization tests Added 256-bit vector test as well llvm-svn: 268811	2016-05-06 22:22:18 +00:00
Simon Pilgrim	2def0a878a	[SLPVectorizer][X86] Added BSWAP/BITREVERSE vectorization tests llvm-svn: 268803	2016-05-06 21:41:55 +00:00
Simon Pilgrim	a2220ea456	[SLPVectorizer][X86] Added CTPOP/CTLZ/CTTZ vectorization tests llvm-svn: 268800	2016-05-06 21:33:01 +00:00
David Majnemer	13d5526392	[SLPVectorizer] Add operand bundles to vectorized functions SLPVectorizing a call site should result in further propagation of its bundles. llvm-svn: 268004	2016-04-29 07:09:51 +00:00
Arch D. Robison	0e61034018	[SLPVectorizer] Extend SLP Vectorizer to deal with aggregates. The refactoring portion part was done as r267748. http://reviews.llvm.org/D14185 llvm-svn: 267899	2016-04-28 16:11:45 +00:00
Matthew Simpson	e5dfb08fcb	[TTI] Add hook for vector extract with extension This change adds a new hook for estimating the cost of vector extracts followed by zero- and sign-extensions. The motivating example for this change is the SMOV and UMOV instructions on AArch64. These instructions move data from vector to general purpose registers while performing the corresponding extension (sign-extend for SMOV and zero-extend for UMOV) at the same time. For these operations, TargetTransformInfo can assume the extensions are free and only report the cost of the vector extract. The SLP vectorizer has been updated to make use of the new hook. Differential Revision: http://reviews.llvm.org/D18523 llvm-svn: 267725	2016-04-27 15:20:21 +00:00
Adrian Prantl	75819aedf6	[PR27284] Reverse the ownership between DICompileUnit and DISubprogram. Currently each Function points to a DISubprogram and DISubprogram has a scope field. For member functions the scope is a DICompositeType. DIScopes point to the DICompileUnit to facilitate type uniquing. Distinct DISubprograms (with isDefinition: true) are not part of the type hierarchy and cannot be uniqued. This change removes the subprograms list from DICompileUnit and instead adds a pointer to the owning compile unit to distinct DISubprograms. This would make it easy for ThinLTO to strip unneeded DISubprograms and their transitively referenced debug info. Motivation ---------- Materializing DISubprograms is currently the most expensive operation when doing a ThinLTO build of clang. We want the DISubprogram to be stored in a separate Bitcode block (or the same block as the function body) so we can avoid having to expensively deserialize all DISubprograms together with the global metadata. If a function has been inlined into another subprogram we need to store a reference the block containing the inlined subprogram. Attached to https://llvm.org/bugs/show_bug.cgi?id=27284 is a python script that updates LLVM IR testcases to the new format. http://reviews.llvm.org/D19034 <rdar://problem/25256815> llvm-svn: 266446	2016-04-15 15:57:41 +00:00
David Majnemer	12fd50410d	[SLPVectorizer] Vectorizing the libm sqrt to llvm's sqrt intrinsic requires nnan To quote the langref "Unlike sqrt in libm, however, llvm.sqrt has undefined behavior for negative numbers other than -0.0 (which allows for better optimization, because there is no need to worry about errno being set). llvm.sqrt(-0.0) is defined to return -0.0 like IEEE sqrt." This means that it's unsafe to replace sqrt with llvm.sqrt unless the call is annotated with nnan. Thanks to Hal Finkel for pointing this out! llvm-svn: 265521	2016-04-06 07:04:53 +00:00
David Majnemer	25d03dbcde	[SLPVectorizer] Vectorize libcalls of sqrt We didn't realize that we could transform the libcall into a vectorized intrinsic. llvm-svn: 265493	2016-04-06 00:14:59 +00:00
David Majnemer	6f1f85f0e1	[SLPVectorizer] Don't insert an extractelement before a catchswitch A catchswitch cannot be preceded by another instruction in the same basic block (other than a PHI node). Instead, insert the extract element right after the materialization of the vectorized value. This isn't optimal but is a reasonable compromise given the constraints of WinEH. This fixes PR27163. llvm-svn: 265157	2016-04-01 17:28:15 +00:00
Adrian Prantl	b8089516a5	testcase gardening: update the emissionKind enum to the new syntax. (NFC) llvm-svn: 265081	2016-04-01 00:16:49 +00:00
Adrian Prantl	b939a25707	Move the DebugEmissionKind enum from DIBuilder into DICompileUnit. This mostly cosmetic patch moves the DebugEmissionKind enum from DIBuilder into DICompileUnit. DIBuilder is not the right place for this enum to live in — a metadata consumer should not have to include DIBuilder.h. I also added a Verifier check that checks that the emission kind of a DICompileUnit is actually legal. http://reviews.llvm.org/D18612 <rdar://problem/25427165> llvm-svn: 265077	2016-03-31 23:56:58 +00:00
Paul Robinson	51fa0a87c3	Fix tests that used CHECK-NEXT-NOT and CHECK-DAG-NOT. FileCheck actually doesn't support combo suffixes. Differential Revision: http://reviews.llvm.org/D17588 llvm-svn: 262054	2016-02-26 19:40:34 +00:00
Matthew Simpson	92821cb4a8	Reapply commit r259357 with a fix for PR26629 Commit r259357 was reverted because it caused PR26629. We were assuming all roots of a vectorizable tree could be truncated to the same width, which is not the case in general. This commit reapplies the patch along with a fix and a new test case to ensure we don't regress because of this issue again. This should fix PR26629. llvm-svn: 261212	2016-02-18 14:14:40 +00:00
David Majnemer	f48bcb2bd9	Revert "Reapply commit r258404 with fix." This reverts commit r259357, it caused PR26629. llvm-svn: 261137	2016-02-17 19:02:36 +00:00
Matthew Simpson	45dee06177	Add test case missing from r259357 (NFC) llvm-svn: 259385	2016-02-01 19:09:24 +00:00
Matthew Simpson	c578d67407	Reapply commit r258404 with fix. The previous patch caused PR26364. The fix is to ensure that we don't enter a cycle when iterating over use-def chains. llvm-svn: 259357	2016-02-01 13:38:29 +00:00
David Majnemer	b2416bd2a7	Revert "Reapply commit r258404 with fix" This reverts commit r258929, it caused PR26364. llvm-svn: 259148	2016-01-29 02:43:22 +00:00
Matthew Simpson	b95861d35e	Reapply commit r258404 with fix This patch is the second attempt to reapply commit r258404. There was bug in the initial patch and subsequent fix (mentioned below). The initial patch caused an assertion because we were computing smaller type sizes for instructions that cannot be demoted. The fix first determines the instructions that will be demoted, and then applies the smaller type size to only those instructions. This should fix PR26239 and PR26307. llvm-svn: 258929	2016-01-27 13:43:27 +00:00
Matthew Simpson	61d5a18469	Revert "Reapply commit r258404 with fix" This commit exposes a crash in computeKnownBits on the Chromium buildbots. Reverting to investigate. Reference: https://llvm.org/bugs/show_bug.cgi?id=26307 llvm-svn: 258812	2016-01-26 15:45:49 +00:00
Matthew Simpson	cfe5e2c846	Reapply commit r25804 with fix We were hitting an assertion because we were computing smaller type sizes for instructions that cannot be demoted. The fix first determines the instructions that will be demoted, and then applies the smaller type size to only those instructions. This should fix PR26239. llvm-svn: 258705	2016-01-25 19:24:29 +00:00
Matthew Simpson	486bace5cc	Revert "[SLP] Truncate expressions to minimum required bit width" This reverts commit r258404. llvm-svn: 258408	2016-01-21 17:17:20 +00:00
Matthew Simpson	cb17d72170	[SLP] Truncate expressions to minimum required bit width This change attempts to produce vectorized integer expressions in bit widths that are narrower than their scalar counterparts. The need for demotion arises especially on architectures in which the small integer types (e.g., i8 and i16) are not legal for scalar operations but can still be used in vectors. Like similar work done within the loop vectorizer, we rely on InstCombine to perform the actual type-shrinking. We use the DemandedBits analysis and ComputeNumSignBits from ValueTracking to determine the minimum required bit width of an expression. Differential revision: http://reviews.llvm.org/D15815 llvm-svn: 258404	2016-01-21 16:31:55 +00:00
Matthew Simpson	57fe1b10db	Reapply r257800 with fix The fix uniques the bundle of getelementptr indices we are about to vectorize since it's possible for the same index to be used by multiple instructions. The original commit message is below. [SLP] Vectorize the index computations of getelementptr instructions. This patch seeds the SLP vectorizer with getelementptr indices. The primary motivation in doing so is to vectorize gather-like idioms beginning with consecutive loads (e.g., g[a[0] - b[0]] + g[a[1] - b[1]] + ...). While these cases could be vectorized with a top-down phase, seeding the existing bottom-up phase with the index computations avoids the complexity, compile-time, and phase ordering issues associated with a full top-down pass. Only bundles of single-index getelementptrs with non-constant differences are considered for vectorization. llvm-svn: 257918	2016-01-15 18:51:51 +00:00
Matthew Simpson	9258e013a2	Revert "[SLP] Vectorize the index computations of getelementptr instructions." This reverts commit r257800. llvm-svn: 257888	2016-01-15 13:10:46 +00:00
Keno Fischer	81e2e9ef86	Reapply r257105 "[Verifier] Check that debug values have proper size" I originally reapplied this in 257550, but had to revert again due to bot breakage. The only change in this version is to allow either the TypeSize or the TypeAllocSize of the variable to be the one represented in debug info (hopefully in the future we can figure out how to encode the difference). Additionally, several bot failures following r257550, were due to optimizer bugs now fixed in r257787 and r257795. r257550 commit message was: ``` The follow extra changes were made to test cases: Manually making the variable be the actual type instead of a pointer to avoid pointer-size differences in generic code: LLVM :: DebugInfo/Generic/2010-03-24-MemberFn.ll LLVM :: DebugInfo/Generic/2010-04-06-NestedFnDbgInfo.ll LLVM :: DebugInfo/Generic/2010-05-03-DisableFramePtr.ll LLVM :: DebugInfo/Generic/varargs.ll Delete sizing information from debug info for the same reason (but the presence of the pointer was important to the test case): LLVM :: DebugInfo/Generic/restrict.ll LLVM :: DebugInfo/Generic/tu-composite.ll LLVM :: Linker/type-unique-type-array-a.ll LLVM :: Linker/type-unique-simple2.ll Fixing an incorrect DW_OP_deref LLVM :: DebugInfo/Generic/2010-05-03-OriginDIE.ll Fixing a missing DW_OP_deref LLVM :: DebugInfo/Generic/incorrect-variable-debugloc.ll Additionally, clang should no longer complain during bootstrap should no longer happen after r257534. The original commit message was: `` Summary: Teach the Verifier to make sure that the storage size given to llvm.dbg.declare or the value size given to llvm.dbg.value agree with what is declared in DebugInfo. This is implicitly assumed in a number of passes (e.g. in SROA). Additionally this catches a number of common mistakes, such as passing a pointer when a value was intended or vice versa. One complication comes from stack coloring which modifies the original IR when it merges allocas in order to make sure that if AA falls back to the IR it gets the correct result. However, given this new invariant, indiscriminately replacing one alloca by a different (differently sized one) is no longer valid. Fix this by just undefing out any use of the alloca in a dbg.declare in this case. Additionally, I had to fix a number of test cases. Of particular note: - I regenerated dbg-changes-codegen-branch-folding.ll from the given source as it was affected by the bug fixed in r256077 - two-cus-from-same-file.ll was changed to avoid having a variable-typed debug variable as that would depend on the target, even though this test is supposed to be generic - I had to manually declared size/align for reference type. See also the discussion for D14275/r253186. - fpstack-debuginstr-kill.ll required changing `double` to `long double` - most others were just a question of adding OP_deref `` ``` llvm-svn: 257850	2016-01-15 00:46:17 +00:00
Matthew Simpson	791fd160c3	[SLP] Vectorize the index computations of getelementptr instructions. This patch seeds the SLP vectorizer with getelementptr indices. The primary motivation in doing so is to vectorize gather-like idioms beginning with consecutive loads (e.g., g[a[0] - b[0]] + g[a[1] - b[1]] + ...). While these cases could be vectorized with a top-down phase, seeding the existing bottom-up phase with the index computations avoids the complexity, compile-time, and phase ordering issues associated with a full top-down pass. Only bundles of single-index getelementptrs with non-constant differences are considered for vectorization. Differential Revision: http://reviews.llvm.org/D14829 llvm-svn: 257800	2016-01-14 20:46:27 +00:00
Keno Fischer	78e5c9e6e2	Re-Revert r257105 (Verifier debug info changes) While I investigate some new buildbot failures. This was originally reapplied as r257550 and r257558. llvm-svn: 257563	2016-01-13 02:31:14 +00:00
Keno Fischer	25916079ff	Reapply r257105 "[Verifier] Check that debug values have proper size" The follow extra changes were made to test cases: Manually making the variable be the actual type instead of a pointer to avoid pointer-size differences in generic code: LLVM :: DebugInfo/Generic/2010-03-24-MemberFn.ll LLVM :: DebugInfo/Generic/2010-04-06-NestedFnDbgInfo.ll LLVM :: DebugInfo/Generic/2010-05-03-DisableFramePtr.ll LLVM :: DebugInfo/Generic/varargs.ll Delete sizing information from debug info for the same reason (but the presence of the pointer was important to the test case): LLVM :: DebugInfo/Generic/restrict.ll LLVM :: DebugInfo/Generic/tu-composite.ll LLVM :: Linker/type-unique-type-array-a.ll LLVM :: Linker/type-unique-simple2.ll Fixing an incorrect DW_OP_deref LLVM :: DebugInfo/Generic/2010-05-03-OriginDIE.ll Fixing a missing DW_OP_deref LLVM :: DebugInfo/Generic/incorrect-variable-debugloc.ll Additionally, clang should no longer complain during bootstrap should no longer happen after r257534. The original commit message was: ``` Summary: Teach the Verifier to make sure that the storage size given to llvm.dbg.declare or the value size given to llvm.dbg.value agree with what is declared in DebugInfo. This is implicitly assumed in a number of passes (e.g. in SROA). Additionally this catches a number of common mistakes, such as passing a pointer when a value was intended or vice versa. One complication comes from stack coloring which modifies the original IR when it merges allocas in order to make sure that if AA falls back to the IR it gets the correct result. However, given this new invariant, indiscriminately replacing one alloca by a different (differently sized one) is no longer valid. Fix this by just undefing out any use of the alloca in a dbg.declare in this case. Additionally, I had to fix a number of test cases. Of particular note: - I regenerated dbg-changes-codegen-branch-folding.ll from the given source as it was affected by the bug fixed in r256077 - two-cus-from-same-file.ll was changed to avoid having a variable-typed debug variable as that would depend on the target, even though this test is supposed to be generic - I had to manually declared size/align for reference type. See also the discussion for D14275/r253186. - fpstack-debuginstr-kill.ll required changing `double` to `long double` - most others were just a question of adding OP_deref ``` llvm-svn: 257550	2016-01-13 00:31:44 +00:00
Keno Fischer	ea33a25816	Temporarily revert r257105 "[Verifier] Check that debug values have proper size" Looks like there's a case where clang generates debug info that triggers the new verifier check. Reverting while investigating. llvm-svn: 257107	2016-01-07 22:39:11 +00:00
Keno Fischer	b3326be6ad	[Verifier] Check that debug values have proper size Summary: Teach the Verifier to make sure that the storage size given to llvm.dbg.declare or the value size given to llvm.dbg.value agree with what is declared in DebugInfo. This is implicitly assumed in a number of passes (e.g. in SROA). Additionally this catches a number of common mistakes, such as passing a pointer when a value was intended or vice versa. One complication comes from stack coloring which modifies the original IR when it merges allocas in order to make sure that if AA falls back to the IR it gets the correct result. However, given this new invariant, indiscriminately replacing one alloca by a different (differently sized one) is no longer valid. Fix this by just undefing out any use of the alloca in a dbg.declare in this case. Additionally, I had to fix a number of test cases. Of particular note: - I regenerated dbg-changes-codegen-branch-folding.ll from the given source as it was affected by the bug fixed in r256077 - two-cus-from-same-file.ll was changed to avoid having a variable-typed debug variable as that would depend on the target, even though this test is supposed to be generic - I had to manually declared size/align for reference type. See also the discussion for D14275/r253186. - fpstack-debuginstr-kill.ll required changing `double` to `long double` - most others were just a question of adding OP_deref Reviewers: aprantl Differential Revision: http://reviews.llvm.org/D14276 llvm-svn: 257105	2016-01-07 22:18:37 +00:00
Charlie Turner	b69b92855d	[NFC] Update horizontal reduction test cases. These testcases no longer need to specify -slp-vectorize-hor, since it was enabled by default in r252733. llvm-svn: 255783	2015-12-16 17:22:24 +00:00
Mehdi Amini	b0e3192a48	Fix SLPVectorizer commutativity reordering The SLPVectorizer had a very crude way of trying to benefit from associativity: it tried to optimize for splat/broadcast or in order to have the same operator on the same side. This is benefitial to the cost model and allows more vectorization to occur. This patch improve the logic and make the detection optimal (locally, we don't look at the full tree but only at the immediate children). Should fix https://llvm.org/bugs/show_bug.cgi?id=25247 Reviewers: mzolotukhin Differential Revision: http://reviews.llvm.org/D13996 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 252337	2015-11-06 20:17:51 +00:00
Peter Collingbourne	d4bff30370	DI: Reverse direction of subprogram -> function edge. Previously, subprograms contained a metadata reference to the function they described. Because most clients need to get or set a subprogram for a given function rather than the other way around, this created unneeded inefficiency. For example, many passes needed to call the function llvm::makeSubprogramMap() to build a mapping from functions to subprograms, and the IR linker needed to fix up function references in a way that caused quadratic complexity in the IR linking phase of LTO. This change reverses the direction of the edge by storing the subprogram as function-level metadata and removing DISubprogram's function field. Since this is an IR change, a bitcode upgrade has been provided. Fixes PR23367. An upgrade script for textual IR for out-of-tree clients is attached to the PR. Differential Revision: http://reviews.llvm.org/D14265 llvm-svn: 252219	2015-11-05 22:03:56 +00:00
Charlie Turner	ab3215fa11	[SLP] Be more aggressive about reduction width selection. Summary: This change could be way off-piste, I'm looking for any feedback on whether it's an acceptable approach. It never seems to be a problem to gobble up as many reduction values as can be found, and then to attempt to reduce the resulting tree. Some of the workloads I'm looking at have been aggressively unrolled by hand, and by selecting reduction widths that are not constrained by a vector register size, it becomes possible to profitably vectorize. My test case shows such an unrolling which SLP was not vectorizing (on neither ARM nor X86) before this patch, but with it does vectorize. I measure no significant compile time impact of this change when combined with D13949 and D14063. There are also no significant performance regressions on ARM/AArch64 in SPEC or LNT. The more principled approach I thought of was to generate several candidate tree's and use the cost model to pick the cheapest one. That seemed like quite a big design change (the algorithms seem very much one-shot), and would likely be a costly thing for compile time. This seemed to do the job at very little cost, but I'm worried I've misunderstood something! Reviewers: nadav, jmolloy Subscribers: mssimpso, llvm-commits, aemerson Differential Revision: http://reviews.llvm.org/D14116 llvm-svn: 251428	2015-10-27 17:59:03 +00:00
Charlie Turner	cd6e8cf8c2	[SLP] Try a bit harder to find reduction PHIs Summary: Currently, when the SLP vectorizer considers whether a phi is part of a reduction, it dismisses phi's whose incoming blocks are not the same as the block containing the phi. For the patterns I'm looking at, extending this rule to allow phis whose incoming block is a containing loop latch allows me to vectorize certain workloads. There is no significant compile-time impact, and combined with D13949, no performance improvement measured in ARM/AArch64 in any of SPEC2000, SPEC2006 or LNT. Reviewers: jmolloy, mcrosier, nadav Subscribers: mssimpso, nadav, aemerson, llvm-commits Differential Revision: http://reviews.llvm.org/D14063 llvm-svn: 251425	2015-10-27 17:54:16 +00:00
Charlie Turner	74c387feb7	[SLP] Treat SelectInsts as reduction values. Summary: Certain workloads, in particular sum-of-absdiff loops, can be vectorized using SLP if it can treat select instructions as reduction values. The test case is a bit awkward. The AArch64 cost model needs some tuning to not be so pessimistic about selects. I've had to tweak the SLP threshold here. Reviewers: jmolloy, mzolotukhin, spatel, nadav Subscribers: nadav, mssimpso, aemerson, llvm-commits Differential Revision: http://reviews.llvm.org/D13949 llvm-svn: 251424	2015-10-27 17:49:11 +00:00
Michael Zolotukhin	fc783e91e0	[SLP] Don't vectorize loads of non-packed types (like i1, i2). Summary: Given an array of i2 elements, 4 consecutive scalar loads will be lowered to i8-sized loads and thus will access 4 consecutive bytes in memory. If we vectorize these loads into a single <4 x i2> load, it'll access only 1 byte in memory. Hence, we should prohibit vectorization in such cases. PS: Initial patch was proposed by Arnold. Reviewers: aschwaighofer, nadav, hfinkel Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D13277 llvm-svn: 248943	2015-09-30 21:05:43 +00:00
Erik Eckstein	91c49810f2	SLPVectorizer: add a test to check if the minimum region size works. This is an addition to rL248917. llvm-svn: 248923	2015-09-30 17:28:19 +00:00
Erik Eckstein	848c1aa452	SLPVectorizer: limit the scheduling region size per basic block. Usually large blocks are not a problem. But if a large block (> 10k instructions) contains many (potential) chains of vector instructions, and those chains are spread over a wide range of instructions, then scheduling becomes a compile time problem. This change introduces a limit for the accumulate scheduling region size of a block. For real-world functions this limit will never be exceeded (it's about 10x larger than the maximum value seen in the test-suite and external test suite). llvm-svn: 248917	2015-09-30 17:00:44 +00:00

1 2 3 4 5 ...

322 Commits