llvm-project

Commit Graph

Author	SHA1	Message	Date
Vasileios Porpodas	55ce296d6f	[SLP][TTI] Refactoring of `getShuffleCost` `Args` to work like `getArithmeticInstrCost` Before this patch `Args` was used to pass a broadcast's arguments by SLP. This patch changes this. `Args` is now used for passing the operands of the shuffle. Differential Revision: https://reviews.llvm.org/D124202	2022-04-26 11:11:29 -07:00
Vasileios Porpodas	957ada4164	[AArch64][NFC] Deleted llvm/test/Analysis/CostModel/AArch64/splat-load.ll test This test is no longer necessary as it is a subset of: llvm/test/Analysis/CostModel/AArch64/shuffle-load.ll Differential Revision: https://reviews.llvm.org/D124456	2022-04-26 10:22:11 -07:00
David Green	1159984802	[CostModel] Add fptoi_sat costmodel tests. NFC	2022-04-25 18:44:35 +01:00
Florian Hahn	0a5db8912c	[MemorySSA] Use -simple-loop-unswitch instead of -loop-unswitch in test.	2022-04-25 09:22:52 +01:00
Florian Hahn	cd81ecba2c	[MemorySSA] Generate check lines for test. This is to ensure we produce the same code when switching to SimpleLoopUnswitch.	2022-04-25 09:02:42 +01:00
David Green	091c2f953d	[AArch64] Add some splat of load cost model tests. NFC They do not work yet, but we can hopefully adjust the cost for them to get them to be recognized	2022-04-22 09:38:06 +01:00
Vasileios Porpodas	e83ad23daf	[TTI] Pre-commit cost model tests splat-loads.	2022-04-21 14:45:51 -07:00
Andrew Litteken	3de29ad209	[IRSim] Ignore debug instructions when creating canonical numbering When constructing canonical relationships between two regions, the first instruction of a basic block from the first region is used to find the corresponding basic block from the second region. However, debug instructions are not included in similarity matching, and therefore do not have a canonical numbering. This patch makes sure to ignore the debug instructions when finding the first instruction in a basic block. Reviewer: paquette Differential Revision: https://reviews.llvm.org/D123903	2022-04-19 13:18:28 -05:00
Roman Lebedev	be5c15c7ae	[NFC][Costmodel][LV][X86] Refresh one or two interleaved load/store tests	2022-04-15 17:43:18 +03:00
Congzhe Cao	557b131c88	[DA] Refactor with a better API Refactor from iteratively using BitCastInst::getOperand() to using stripPointerCasts() instead. This is an improvement since now we are able to analyze more cases, please refer to test cases added in this patch. Reviewed By: Meinersbur, #loopoptwg Differential Revision: https://reviews.llvm.org/D123559	2022-04-13 14:51:48 -04:00
Johannes Doerfert	9dc7da3f9c	[GlobalsModRef][FIX] Ensure we honor synchronizing effects of intrinsics This is a long standing problem that resurfaces once in a while [0]. There might actually be two problems because I'm not 100% sure if the issue underlying https://reviews.llvm.org/D115302 would be solved by this or not. Anyway. In 2008 we thought intrinsics do not read/write globals passed to them: `d4133ac315` This is not correct given that intrinsics can synchronize threads and cause effects to effectively become visible. NOTE: I did not yet modify any tests but only tried out the reproducer of https://github.com/llvm/llvm-project/issues/54851. Fixes: https://github.com/llvm/llvm-project/issues/54851 [0] https://discourse.llvm.org/t/bug-gvn-memdep-bug-in-the-presence-of-intrinsics/59402 Differential Revision: https://reviews.llvm.org/D123531	2022-04-12 16:42:50 -05:00
Nikita Popov	2121dc5b15	[llvm-lto] Remove support for legacy pass manager This removes support for the legacy pass manager in llvm-lto and llvm-lto2. In this case I've dropped the use-new-pm option entirely, as I don't think this is considered part of the public interface. This also makes -debug-pass-manager work with llvm-lto, because that was needed to migrate some tests to NewPM. Differential Revision: https://reviews.llvm.org/D123376	2022-04-11 09:40:17 +02:00
LiaoChunyu	505fce5a9e	[RISCV] Add basic code modeling for llvm.experimental.stepvector intrinsic The llvm.experimental.stepvector intrinsic on scalable vectors crashes due to an invalid cost when the code is run through the loop unroller. Reviewed By: kito-cheng Differential Revision: https://reviews.llvm.org/D122782	2022-04-11 10:19:23 +08:00
Florian Hahn	3c14836093	[LAA] Add test with simpler load of pointer select. Add a simpler test for D114487/D108699.	2022-04-10 23:54:41 +02:00
David Green	fa784f6382	[AArch64] Insert subvector costs An insert subvector under aarch64 can often be done as a single lane mov operation. For example a v4i8 inserted into a v16i8 is a s-reg mov, so long as the index is a multiple of 4. This teaches the cost model that, using code copied over from the X86 backend. Some of the costs (v16i16_4_0) are still high because they get matched as a SK_Select, not an SK_InsertSubvector. D120879 has some codegen tests for inserting subvectors, which were added as llvm/test/CodeGen/AArch64/insert-subvector.ll. Differential Revision: https://reviews.llvm.org/D120880	2022-04-07 19:27:41 +01:00
David Green	750bf3582a	[AArch64] Increase cost of v2i64 multiplies The cost of a v2i64 multiply was special cased in D92208 as scalarized into 4extract + 2insert + 2*mul. Scalarizing to/from GPR registers is expensive though, and the cost wasn't high enough to prevent vectorizing in places where it can be detrimental for performance. This increases it so that the cost of copying to/from GPRs is 2 each, with the total cost increasing to 14. So long as umull/smull are handled correctly (as in D123006) this seems to lead to better vectorization factors and better performance. Differential Revision: https://reviews.llvm.org/D123007	2022-04-04 17:42:20 +01:00
David Green	2abaa027d9	[AArch64] Teach the costmodel about widening muls A vector mul(sext, sext) or mul(zext, zext) will be code generated as a single smull or umull instruction. This most notably affects v2i64 multiplies, which are otherwise not legal and need to be expanded. The oneuse check has also been slightly changed, as it is already checked from the use of isWideningInstruction in getCastInstrCost. Differential Revision: https://reviews.llvm.org/D123006	2022-04-04 12:45:04 +01:00
David Green	2e2f38a1ac	[AArch64] Add widening arithmetic cost tests. NFC	2022-04-04 12:19:45 +01:00
Dávid Bolvanský	fb65aaf0be	[NFCI] Fixed missing colon in CHECK directives - part 2	2022-04-03 14:42:59 +02:00
Dávid Bolvanský	f02a0a69af	[NFCI] Fixed missing colon in CHECK directives	2022-04-03 11:52:38 +02:00
Simon Pilgrim	d663166acb	[CostModel][X86] Reduce cost of v2i64 icmp base cost on SSE2 targets Based off the script from D103695, we were exaggerating the cost of the v2i64 comparison expansion using instruction count instead of effective throughput	2022-03-30 09:11:55 +01:00
Johannes Doerfert	a81fff8afd	Reapply "[Intrinsics] Add `nocallback` to the default intrinsic attributes" This reverts commit `c5f789050d` and reapplies `7aea3ea8c3` with additional test changes.	2022-03-25 09:36:50 -05:00
Arthur Eubanks	d051c566cd	[test] Remove the last couple uses of -analyze in llvm/test	2022-03-23 11:31:12 -07:00
David Green	c56dd20f69	[AArch64] Add extra insert subvector cost model tests. NFC	2022-03-22 12:20:19 +00:00
Yeting Kuo	ecd7a0132a	[RISCV] Add basic cost model for vector casting To model the cost of vector casts, the patch treats most vector casts like their scalar form and assigns a cost of 1 to the vector forms of casts that are free as scalars. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D121771	2022-03-22 14:17:08 +08:00
Simon Pilgrim	5dde9c1286	[CostModel][X86] Reduce cost of extracting bool vector elements For constant indices, these are now just a MOVMSK+TEST/BT	2022-03-18 19:02:47 +00:00
Florian Hahn	1b7ef6aac8	[BasicAA] Account for wrapping when using abs(VarIndex) >= abs(Scale). The patch adds an extra check to only set MinAbsVarIndex if abs(V * Scale) won't wrap. In the absence of IsNSW, try to use the bitwidths of the original V and Scale to rule out wrapping. Attempt to model https://alive2.llvm.org/ce/z/HE8ZKj The code in the else if below probably needs the same treatment, but I need to come up with a test first. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D121695	2022-03-18 14:41:15 +00:00
Simon Pilgrim	4455c5cdea	[CostModel][X86] Update RUN -passes=* to double quotes to appease update scripts on windows	2022-03-18 11:44:18 +00:00
Craig Topper	bbd2ecf9f0	[RISCV] Add +experimental-zvfh extension to cover half types in vectors. Currently we allow half types in vectors if the scalar Zfh extension is enabled. This behavior is not in line with the vector spec. For f32 and f64 types, the Zve32f, Zve64f, Zve64d, and V explicitly control the availability of floating point types in vectors. In order to make our compiler compliant, we either need to remove all support for half in vectors or we need an extension to control it. Draft spec here https://github.com/riscv/riscv-v-spec/pull/780 Reviewed By: kito-cheng Differential Revision: https://reviews.llvm.org/D121345	2022-03-17 10:04:02 -07:00
Florian Hahn	e5822ded56	[FunctionAttrs] Infer argmemonly . This patch adds initial argmemonly inference, by checking the underlying objects of locations returned by MemoryLocation. I think this should cover most cases, except function calls to other argmemonly functions. I'm not sure if there's a reason why we don't infer those yet. Additional argmemonly can improve codegen in some cases. It also makes it easier to come up with a C reproducer for `7662d1687b` (already fixed, but I'm trying to see if C/C++ fuzzing could help to uncover similar issues.) Compile-time impact: NewPM-O3: +0.01% NewPM-ReleaseThinLTO: +0.03% NewPM-ReleaseLTO+g: +0.05% https://llvm-compile-time-tracker.com/compare.php?from=067c035012fc061ad6378458774ac2df117283c6&to=fe209d4aab5b593bd62d18c0876732ddcca1614d&stat=instructions Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D121415	2022-03-16 10:24:33 +00:00
Nikita Popov	57d57b1afd	[AAEval] Make compatible with opaque pointers With opaque pointers, we cannot use the pointer element type to determine the LocationSize for the AA query. Instead, -aa-eval tests are now required to have an explicit load or store for any pointer they want to compute alias results for, and the load/store types are used to determine the location size. This may affect ordering of results, and sorting within one result, as the type is not considered part of the sorted string anymore. To somewhat minimize the churn, printing still uses faux typed pointer notation.	2022-03-16 10:02:11 +01:00
Florian Hahn	a9772a7148	[BasicAA] Add test showing incorrect noalias result with wrapping. @mul_may_overflow_var_nonzero_minabsvarindex_one_index shows BasicAA incorrectly determining noalias for (%gep.917, i8* %gep.idx). If %v == 10581764700698480926, %idx == 917 and the GEPs alias. https://alive2.llvm.org/ce/z/yzDgnn	2022-03-15 12:32:07 +00:00
Nikita Popov	04b717c423	[TLI] Check that malloc argument has type size_t DSE assumes that this is the case when forming a calloc from a malloc + memset pair. For tests, either update the malloc signature or change the data layout.	2022-03-14 17:22:24 +01:00
David Sherwood	e7b89c2fc3	Add BasicTTIImpl cost model for llvm.get.active.lane.mask intrinsic The vectoriser sometimes generates predicated vector loops using the llvm.get.active.lane.mask intrinsic so it's important that we are able to calculate a valid cost for the call instruction. When SVE is enabled we are able to use a single whilelo instruction for some vector types - in such cases I've marked the cost as 1. For all other cases I've set the cost according to how the intrinsic will be expanded. Tests added here: Analysis/CostModel/AArch64/sve-intrinsics.ll Analysis/CostModel/ARM/active_lane_mask.ll Analysis/CostModel/RISCV/active_lane_mask.ll Differential Revision: https://reviews.llvm.org/D121109	2022-03-14 09:35:05 +00:00
Yeting Kuo	ae7c6647f3	[RISCV] Add basic code modeling for fixed length vector reduction. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D121447	2022-03-14 11:04:31 +08:00
Florian Hahn	aa590e5823	[AArch64] Improve costs for some conversions to fp16. Currently the cost model under-estimates the cost of certain FP16 conversions. This patch updates getCastInstrCost to return more accurate costs for the cases improved in `c2ed9fd054`. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D113700	2022-03-11 10:27:39 +00:00
Florian Hahn	697f55e368	[AArch64] Move fp16 cast tests. Move FP16 tests to fp16cast function, as suggested in D113700.	2022-03-10 12:22:06 +00:00
Arthur Eubanks	16823adf2a	[test] Modify some tests to remove implicit -basic-aa in legacy PM RUN lines	2022-03-08 14:35:06 -08:00
Arthur Eubanks	b81d5baa0f	[test] Use new PM for -aa-eval tests	2022-03-08 14:15:53 -08:00
Roman Lebedev	2f80ea7f4f	[NFC][LV] Use different braces in debug output The analysis passes output function name encapsulated in `'` braces, but LV uses `"`. Harmonizing this may help in creating an update script for the LV costmodel test checks. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D121105	2022-03-07 19:32:37 +03:00
David Green	43b638241a	[AArch64] Use NPM for cost model tests. NFC As per the other tests, this switches the run lines back to using the NPM via -passes='print<cost-model>' -cost-kind=throughput 2>&1 -disable-output	2022-03-07 08:57:50 +00:00
Arthur Eubanks	f909aed671	Revert "[SCEV] Infer ranges for SCC consisting of cycled Phis" This reverts commit `fc539b0004`. Causes miscompiles, see D110620.	2022-03-04 19:52:44 -08:00
Alex Tsao	89f15fc687	[RISCV] Add cost modelling for masked memory op The patch adds very basic cost model for masked memory op on scalable vector. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D117884	2022-03-03 20:47:58 +08:00
David Green	47f4cd9c3d	[AArch64] Update costs for some fp16 converts This updates the costs for FP16 converts, as some of them were pretty high. Differential Revision: https://reviews.llvm.org/D120771	2022-03-03 11:17:24 +00:00
David Green	65c0e45a37	[AArch64] Vector shifts cost 1 The costs of vector shifts was 2 as opposed to 1, as the nodes are marked custom. Fix this like the others and mark the nodes as cheap. Differential Revision: https://reviews.llvm.org/D120773	2022-03-03 10:42:57 +00:00
David Green	97e0366d67	[AArch64] Add some fp16 conversion cost tests. NFC	2022-03-02 18:07:14 +00:00
Nikita Popov	98cfcae4e9	Revert "[RISCV] Add cost modelling for masked memory op" This reverts commit `76f243b53b`. The newly added test fails.	2022-03-02 17:32:10 +01:00
Alex Tsao	76f243b53b	[RISCV] Add cost modelling for masked memory op The patch adds very basic cost model for masked memory op on scalable vector. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D117884	2022-03-02 22:48:41 +08:00
David Green	02de975259	[AArch64] Add some tests for the cost of extending an extract. NFC	2022-03-02 14:47:32 +00:00
David Green	62c2b070d5	[AArch64] Add simple arithmetic cost model test. NFC	2022-03-01 23:31:02 +00:00
David Green	2e7c35ea12	[AArch64] Cleanup and extend cast costs. NFC	2022-02-26 17:59:02 +00:00
David Green	5fe8307b70	[AArch64] Add scalar min/max costs. NFC The vector costs were already added, this adds scalar variants to complete the test coverage.	2022-02-25 17:11:24 +00:00
Max Kazantsev	fc539b0004	[SCEV] Infer ranges for SCC consisting of cycled Phis Our current strategy of computing ranges of SCEVUnknown Phis was to simply compute the union of ranges of all its inputs. In order to avoid infinite recursion, we mark Phis as pending and conservatively return full set for them. As result, even simplest patterns of cycled phis always have a range of full set. This patch makes this logic a bit smarter. We basically do the same, but instead of taking inputs of single Phi we find its strongly connected component (SCC) and compute the union of all inputs that come into this SCC from outside. Processing entire SCC together has one more advantage: we can set range for all of them at once, because the only thing that happens to them is the same value is being passed between those Phis. So, despite we spend more time analyzing a single Phi, overall we may save time by not processing other SCC members, so amortized compile time spent should be approximately the same. Differential Revision: https://reviews.llvm.org/D110620 Reviewed By: reames	2022-02-17 18:03:52 +07:00
Pavel Kosov	37fa99eda0	[SchedModels][CortexA55] Add ASIMD integer instructions Depends on D114642 Original review https://reviews.llvm.org/D112201 OS Laboratory. Huawei Russian Research Institute. Saint-Petersburg Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D117003	2022-02-17 13:41:57 +03:00
Serguei Katkov	194899caef	[MemoryDependency] Relax the re-ordering of atomic store and unordered load/store Atomic store with Release semantic allows re-ordering of unordered load/store before the store. Implement it. Reviewers: reames Reviewed By: reames Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D119844	2022-02-17 10:53:25 +07:00
Roman Lebedev	ae48af582b	[NFC][SCEV] Recognize umin_seq when operand is zext'ed in zero-check zext(umin(x,y)) == umin(zext(x),zext(y)) zext(x) == 0 -> x == 0 While it is not a very likely scenario, we probably should not expect that instcombine already dropped such a redundant zext, but handle directly. Moreover, perhaps there was no ZExtInst, and SCEV somehow managed to pull out said zext out of the SCEV expression.	2022-02-16 22:16:02 +03:00
Roman Lebedev	3c7d48ed90	[NFC][SCEV] Recognize umin_seq when operand is zext'ed in umin but not in zero-check zext(umin(x,y)) == umin(zext(x),zext(y)) zext(x) == 0 -> x == 0 Extra leading zeros do not affect the result of comparison with zero, nor do they matter for the unsigned min/max, so we should not be dissuaded when we find a zero-extensions, but instead we should just skip it.	2022-02-16 22:16:02 +03:00
Roman Lebedev	21c6c43e6f	[NFC][SCEV] Add tests for umin_seq recognition with interfering zext's	2022-02-16 22:16:01 +03:00
Philip Reames	b59f135f16	Precommit tests from D119844, expanded with additional coverage	2022-02-16 07:55:43 -08:00
Serguei Katkov	15f1cffb3a	[MemoryDependency] Relax the re-ordering with volatile store. Volatile store does not provide any special rules for reordering with atomics. Usual must alias analysis is enough here. This makes the behavior similar to how volatile load is handled. Reviewers: reames, nikic Reviewed By: reames Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D119818	2022-02-16 10:58:48 +07:00
Serguei Katkov	2e487da3cb	[MemoryDependency] Add a test for re-ordering with volatile load/store.	2022-02-16 10:27:11 +07:00
Roman Lebedev	65715ac72a	[SCEV] Generalize umin_seq matching Since we don't greedily flatten `umin_seq(a, umin(b, c))` into `umin_seq(a, b, c)`, just looking at the operands of the outer-level `umin` is not sufficient, and we need to recurse into all same-typed `umin`'s.	2022-02-11 21:58:19 +03:00
Roman Lebedev	c234809ff8	[SCEV] Recognize `x == 0 ? 0 : umin_seq(..., x, ...) -> umin_seq(x, umin_seq(...))`	2022-02-11 21:58:19 +03:00
Roman Lebedev	281421693b	[SCEV] Recognize `x == 0 ? 0 : umin(..., x, ...) -> umin_seq(x, umin(...))` That is the canonical expansion for umin_seq, so we really should roundtrip it.	2022-02-11 21:58:19 +03:00
Roman Lebedev	4d0c0e6cc2	[SCEV] `createNodeForSelectOrPHIInstWithICmpInstCond()`: generalize eq handling The current logic was: https://alive2.llvm.org/ce/z/j8muXk but in reality the offset to the Y in the 'true' hand does not need to exist: https://alive2.llvm.org/ce/z/MNQ7DZ https://alive2.llvm.org/ce/z/S2pMQD To catch that, instead of computing the Y's in both hands and checking their equality, compute Y and C, and check that C is 0 or 1.	2022-02-11 21:58:19 +03:00
Roman Lebedev	bfce0ca203	[NFC][SCEV] Add more tests for umin_seq recognition	2022-02-11 21:58:18 +03:00
Roman Lebedev	93c93fd08f	[NFC][SCEV] Add some tests for select->umax recognition Apparently we didn't have any tests for that codepath?	2022-02-11 21:58:18 +03:00
Roman Lebedev	8df8b488e3	[NFC][SCEV] Autogenerate checklines in a test to simplify further updates	2022-02-11 01:21:45 +03:00
Roman Lebedev	9766a0cca0	[SCEV] Recognize `cond ? i1 0 : i1 y` as `umin_seq ~cond, x` By definition, `umin_seq` has the exact same poison stopping properties the original `select` had: https://alive2.llvm.org/ce/z/N6XwV-	2022-02-10 17:42:55 +03:00
Roman Lebedev	418604fd90	[SCEV] Recognize `cond ? i1 x : i1 1` as `~umin_seq cond, ~x` By definition, `umin_seq` has the exact same poison stopping properties the original `select` had: https://alive2.llvm.org/ce/z/aqe9GK	2022-02-10 17:42:55 +03:00
Roman Lebedev	49d9acc242	[SCEV] Recognize logical `or` as `not umin_seq (not, not)` By definition, `umin_seq` has the exact same poison stopping properties the original `select` had: https://alive2.llvm.org/ce/z/MUfbTL	2022-02-10 17:42:55 +03:00
Roman Lebedev	16bc24e7be	[SCEV] Recognize logical `and` as `umin_seq` By definition, `umin_seq` has the exact same poison stopping properties the original `select` had: https://alive2.llvm.org/ce/z/59KuZZ	2022-02-10 17:42:55 +03:00
Roman Lebedev	73990ff8a7	[SCEV] Recognize binary `xor` as bit-wise `add` https://alive2.llvm.org/ce/z/ULuZxB We could transparently handle wider bitwidths, by effectively casting iN to <N x i1> and performing the `add` bit/element -wise, the expression will be rather large, so let's not do that for now.	2022-02-10 17:42:55 +03:00
Roman Lebedev	503541fa93	[SCEV] Recognize binary `and` as bit-wise `umin` https://alive2.llvm.org/ce/z/aKAr94 We could transparently handle wider bitwidths, by effectively casting iN to <N x i1> and performing the `umin` bit/element -wise, the expression will be rather large, so let's not do that for now.	2022-02-10 17:42:54 +03:00
Roman Lebedev	e7e0834f07	[SCEV] Recognize binary `or` as bit-wise `umax` https://alive2.llvm.org/ce/z/SMEaoc We could transparently handle wider bitwidths, by effectively casting iN to <N x i1> and performing the `umax` bit/element -wise, the expression will be rather large, so let's not do that for now.	2022-02-10 17:42:54 +03:00
Roman Lebedev	0e6e559bf7	[NFC][SCEV] Add some tests with logical operations and whatnot	2022-02-10 17:42:54 +03:00
Arthur Eubanks	7aadf98d2b	[test] Replace `-analyze -divergence` with `-passes='print<divergence>'`	2022-02-09 16:09:14 -08:00
Arthur Eubanks	3ebab227d9	[test] Remove one more unnecessary -analyze RUN line There is a new PM equivalent RUN line.	2022-02-09 16:05:43 -08:00
Arthur Eubanks	f72b76cde5	[test] Replace/remove some 'opt -analyze' RUN lines	2022-02-09 15:49:53 -08:00
Arthur Eubanks	15ba588d6d	[test] Migrate '-analyze -cost-model' to '-passes=print<cost-model>'	2022-02-09 15:42:16 -08:00
David Green	b55d4c2ad8	Revert "[LV] Remove `LoopVectorizationCostModel::useEmulatedMaskMemRefHack()`" This reverts commit `77a0da926c` as we've received multiple reports of this significantly impacting performance, in ways that don't seem to just be target specific cost models going wrong. I would offer some reproducers, but the test changes here seem to be full of them! Reverting for now and hopefully we can remove the "hack" more carefully as we go.	2022-02-09 20:02:54 +00:00
Craig Topper	09629215c2	[RISCV] Add a really basic cost model for SK_Splice. While testing scalable vectors I found that if we generate a vector splice intrinsic and run the code through the loop unroller, we'll crash due to an invalid cost. This adds a basic cost based on the 2 slide instructions used by the lowering in D119303. We probably need to factor LMUL into this, but that's true for arithmetic instructions too. So I've ignored for the moment. Reviewed By: ABataev Differential Revision: https://reviews.llvm.org/D119316	2022-02-09 11:43:31 -08:00
Arthur Eubanks	ff31020ee6	[OpaquePtr][LoopAccessAnalysis] Support opaque pointers Previously we relied on the pointee type to determine what type we need to do runtime pointer access checks. With opaque pointers, we can access a pointer with more than one type, so now we keep track of all the types we're accessing a pointer's memory with. Also some other minor getPointerElementType() removals. Reviewed By: #opaque-pointers, nikic Differential Revision: https://reviews.llvm.org/D119047	2022-02-09 09:11:27 -08:00
Roman Lebedev	77a0da926c	[LV] Remove `LoopVectorizationCostModel::useEmulatedMaskMemRefHack()` D43208 extracted `useEmulatedMaskMemRefHack()` from legality into cost model. What it essentially does is prevents scalarized vectorization of masked memory operations: ``` // TODO: Cost model for emulated masked load/store is completely // broken. This hack guides the cost model to use an artificially // high enough value to practically disable vectorization with such // operations, except where previously deployed legality hack allowed // using very low cost values. This is to avoid regressions coming simply // from moving "masked load/store" check from legality to cost model. // Masked Load/Gather emulation was previously never allowed. // Limited number of Masked Store/Scatter emulation was allowed. ``` While i don't really understand about what specifically `is completely broken` was talking about, i believe that at least on X86 with AVX2-or-later, this is no longer true. (or at least, i would like to know what is still broken). So i would like to follow suit after D111460, and like wise disable that hack for AVX2+. But since this was added for X86 specifically, let's just instead completely remove this hack. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D114779	2022-02-07 16:08:31 +03:00
Arthur Eubanks	34de63c37f	[test] Remove unnecessary require<> in LoopAccessAnalysis tests These function analyses are always available in loop passes.	2022-02-04 18:03:55 -08:00
Nikita Popov	46f9e45ef0	[Statepoint] Update gc.statepoint calls in tests with elementtype (NFC) This updates tests for the LangRef change in D117890.	2022-02-04 14:15:41 +01:00
Malhar Jajoo	778b455dd6	[LAA] Add Memory dependence remarks. Adds new optimization remarks when vectorization fails. More specifically, new remarks are added for following 4 cases: - Backward dependency - Backward dependency that prevents Store-to-load forwarding - Forward dependency that prevents Store-to-load forwarding - Unknown dependency It is important to note that only one of the sources of failures (to vectorize) is reported by the remarks. This source of failure may not be first in program order. A regression test has been added to test the following cases: a) Loop can be vectorized: No optimization remark is emitted b) Loop can not be vectorized: In this case an optimization remark will be emitted for one source of failure. Reviewed By: sdesmalen, david-arm Differential Revision: https://reviews.llvm.org/D108371	2022-02-02 12:07:51 +00:00
Florian Hahn	17ebd68ae6	[AArch64] Fix costs of float vector compare/selects pairs. The current cost-model overestimates the cost of vector compares & selects for ordered floating point compares. This patch fixes that by extending the existing logic for integer predicates. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D118256	2022-01-31 10:18:29 +00:00
William S. Moses	99d2582164	[ScalarEvolution] Handle <= and >= in non infinite loops Extend scalar evolution to handle >= and <= if a loop is known to be finite and the induction variable guards the condition. Specifically, with these assumptions lhs <= rhs is equivalent to lhs < rhs + 1 and lhs >= rhs to lhs > rhs -1. In the case of lhs <= rhs, this is true since the only case these are not equivalent is when rhs == unsigned/signed intmax, which would have resulted in an infinite loop. In the case of lhs >= rhs, this is true since the only case these are not equivalent is when rhs == unsigned/signed intmin, which would again have resulted in an infinite loop. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D118090	2022-01-28 17:41:08 -05:00
William S. Moses	0d04c77856	[ScalarEvolution] Mark a loop as finite if in a willreturn function A limited version of (https://reviews.llvm.org/D118090) that only marks a loop as finite if in a willreturn function. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D118429	2022-01-28 14:17:05 -05:00
Evgeniy Brevnov	d7424939a6	[BasicAA] Add support for memmove intrinsic Currently, basic AA has special support for llvm.memcpy.* intrinsics. This change extends this support to any memory transfer operation, and in particular to the llvm.memmove.* intrinsic. Reviewed By: reames, nikic Differential Revision: https://reviews.llvm.org/D117095	2022-01-28 18:19:36 +07:00
Florian Hahn	cb3df1a299	[AArch64] Add vector compare/select tests with UNE predicate. Precommit some additional tests for D118256.	2022-01-27 14:20:40 +00:00
Florian Hahn	e6ebd2c72d	[AArch64] Add float vector compare/select cost-model tests.	2022-01-26 16:27:29 +00:00
Alban Bridonneau	2feddb37b4	Implement correct cost for SVE bitcasts We have some bitcasts which we know will be simplified, so their cost is zero. Reviewed By: david-arm, sdesmalen Differential Revision: https://reviews.llvm.org/D118019	2022-01-26 14:25:44 +00:00
Evgeniy Brevnov	0e55d4fab0	[AA] Refine ModRefInfo for llvm.memcpy.* in presence of operand bundles Presence of operand bundles changes semantics with respect to ModRef. In particular, the spec says: "From the compiler's perspective, deoptimization operand bundles make the call sites they're attached to at least readonly. They read through all of their pointer typed operands (even if they're not otherwise escaped) and the entire visible heap. Deoptimization operand bundles do not capture their operands except during deoptimization, in which case control will not be returned to the compiled frame". Fix handling of llvm.memcpy.* according to the spec. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D118033	2022-01-25 10:15:23 +07:00
Stanislav Mekhanoshin	bb1fe36977	[AMDGPU] Make v8i16/v8f16 legal Differential Revision: https://reviews.llvm.org/D117721	2022-01-24 11:51:08 -08:00
Evgeniy Brevnov	b4b6d6374e	[NFC] New test case for BasicAA and memcpy/memmove with deopt New test checks results of BasicAA for llvm.memcpy.*/llvm.memmove.* intrinsics in presence of deopt bundle. By specification, the expected result for unrelated global memory should be Ref. Currently this is not the case and will be fixed in upcoming patches. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D118031	2022-01-24 19:53:29 +07:00
eopXD	3cf15af2da	[RISCV] Remove experimental prefix from rvv-related extensions. Extensions affected: +v, +zve, +zvl Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D117860	2022-01-22 20:18:40 -08:00
Shao-Ce SUN	a0a76fee0c	[RISCV] update zfh and zfhmin extension to v1.0 `zfh` and `zfhmin` have been ratified, with version 1.0. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D117098	2022-01-15 09:21:24 +08:00
Roman Lebedev	8dcba20674	[SCEV] `getSequentialMinMaxExpr()`: relax 2-op umin_seq w/ constant to umin Currently, `computeExitLimitFromCondFromBinOp()` does that directly.	2022-01-14 17:07:48 +03:00
Roman Lebedev	f34742d7c1	[NFC][SCEV] Add test with umin_seq w/ 1op and constant	2022-01-14 17:07:48 +03:00
Roman Lebedev	c86a982d7d	[SCEV] `getSequentialMinMaxExpr()`: rewrite deduplication to be fully recursive Since we don't merge/expand non-sequential umin exprs into umin_seq exprs, we may have umin_seq(umin(umin_seq())) chain, and the innermost umin_seq can have duplicate operands still.	2022-01-14 15:42:26 +03:00
Roman Lebedev	2c0c619541	[NFC][SCEV] Add test where it's the innermost umin_seq that has a duplicate operand	2022-01-14 01:15:45 +03:00
Roman Lebedev	993792bd1a	[SCEV] Don't consider umin_seq scev expr to be more complex than ptrtoint scev expr Let's consider sequential min/max expression family to be more complex than their non-sequential counterparts, preserving internal ordering within them.	2022-01-13 23:59:47 +03:00
Roman Lebedev	f14b575194	[NFC][SCEV] Add test for umin_seq complexity ordering	2022-01-13 23:59:47 +03:00
David Sherwood	ef1ca4d3e9	[AArch64] Fix incorrect use of MVT::getVectorNumElements in AArch64TTIImpl::getVectorInstrCost If we are inserting into or extracting from a scalable vector we do not know the number of elements at runtime, so we can only let the index wrap for fixed-length vectors. Tests added here: Analysis/CostModel/AArch64/sve-insert-extract.ll Differential Revision: https://reviews.llvm.org/D117099	2022-01-13 09:27:14 +00:00
Andrew Litteken	4ff4e7ea30	[CostModel] Use cost of target trunc type when it is the only use of a non-register sized load The code size cost model for most targets uses the legalization cost for the type of the pointer of a load. If this load is followed directly by a trunc instruction, and is the only use of the result of the load, only one instruction is generated in the target assembly language. This adds a check for this case, and uses the target type of the trunc instruction if so. This did not show any changes in CTMark code size benchmarks. Reviewers: paquette, samparker, dmgreen Differential Revision: https://reviews.llvm.org/D109388	2022-01-12 18:03:50 -06:00
Philip Reames	e838949bee	[GlobalsModRef] Apply indirect-global rule to all globals initialized from noalias calls Extend the existing malloc-family specific optimization to all noalias calls. This allows us to handle allocation wrappers, and removes a dependency on a lib-func check in favor of generic attribute usage. Differential Revision: https://reviews.llvm.org/D116980	2022-01-11 08:44:31 -08:00
Roman Lebedev	5ceb070bbb	[SCEV] `getSequentialMinMaxExpr()`: look into `umin` when deduplicating operands We could just merge all umin into umin_seq, but that is likely a pessimization, so don't do that, but pretend that we did for the purpose of deduplication.	2022-01-11 18:51:57 +03:00
Roman Lebedev	b2be7dcf5b	[NFC][SCEV] More tests with operand-wise redundant operands of umin of umin_seq	2022-01-11 18:51:56 +03:00
Roman Lebedev	138d5c750b	[NFC][SCEV] Add more tests for umin_seq with redundant operands	2022-01-11 17:51:51 +03:00
Roman Lebedev	5e16650792	[SCEV] `getSequentialMinMaxExpr()`: keep only the first instance of an operand Having the same operand more than once doesn't change the outcome here, neither reduction-wise nor poison-wise. We must keep the first instance specifically though.	2022-01-11 16:51:53 +03:00
Roman Lebedev	36075942f9	[SCEV] Add test for umin_seq with duplicate operands	2022-01-11 16:51:52 +03:00
Roman Lebedev	76a0abbc13	[SCEV] Reenable umin_seq support and fix the `computeSCEVAtScope()` This reverts commit `f62f47f5e1`.	2022-01-11 16:03:35 +03:00
Roman Lebedev	e0772cf00f	[NFC][SCEV] Add reproducers for umin_seq crashes As reported in https://reviews.llvm.org/D116766#3233042	2022-01-11 16:03:35 +03:00
Philip Reames	f62f47f5e1	Partial revert of `82fb4f4` Two crashes have been reported. This change disables the new logic while leaving the new node in tree. Hopefully, that's enough to allow investigation without breakage while avoiding massive churn.	2022-01-10 18:18:34 -08:00
Philip Reames	ed7ae1af72	Add coverage of GlobalsModRef's indirect global case	2022-01-10 15:54:26 -08:00
Roman Lebedev	82fb4f4b22	[SCEV] Sequential/in-order `UMin` expression As discussed in https://github.com/llvm/llvm-project/issues/53020 / https://reviews.llvm.org/D116692, SCEV is forbidden from reasoning about 'backedge taken count' if the branch condition is a poison-safe logical operation, which is conservatively correct, but is severely limiting. Instead, we should have a way to express those poison blocking properties in SCEV expressions. The proposed semantics is: ``` Sequential/in-order min/max SCEV expressions are non-commutative variants of commutative min/max SCEV expressions. If none of their operands are poison, then they are functionally equivalent, otherwise, if the operand that represents the saturation point* of given expression, comes before the first poison operand, then the whole expression is not poison, but is said saturation point. ``` * saturation point - the maximal/minimal possible integer value for the given type The lowering is straight-forward: ``` compare each operand to the saturation point, perform sequential in-order logical-or (poison-safe!) ordered reduction over those checks, and if reduction returned true then return saturation point else return the naive min/max reduction over the operands ``` https://alive2.llvm.org/ce/z/Q7jxvH (2 ops) https://alive2.llvm.org/ce/z/QCRrhk (3 ops) Note that we don't need to check the last operand: https://alive2.llvm.org/ce/z/abvHQS Note that this is not commutative: https://alive2.llvm.org/ce/z/FK9e97 That allows us to handle the patterns in question. Reviewed By: nikic, reames Differential Revision: https://reviews.llvm.org/D116766	2022-01-10 20:51:26 +03:00
Simon Pilgrim	5eb47961c4	[CostModel][X86] Update ROTL/ROTR vXi8/vXi16 costs on AVX512BW targets Refresh based off recent improvements to codegen and the helper script from D103695	2022-01-10 13:18:25 +00:00
David Green	bc615e436c	[AArch64] Update addo and subo costs Similar to D116732, this adds basic scalar sadd_with_overflow, uadd_with_overflow, ssub_with_overflow and usub_with_overflow costs for aarch64, which are usually quite efficiently lowered. Differential Revision: https://reviews.llvm.org/D116734	2022-01-07 16:20:23 +00:00
Florian Hahn	7d9827f5cd	[LoopVersioning] Move loop-versioning test to correct directory. The moved test was incorrectly placed in Analysis/LoopAccessAnalysis as it runs loop-versioning.	2022-01-07 14:35:13 +00:00
Roman Lebedev	6a563e2570	[NFC][SCEV][IndVars] Add more tests for exit count w/ `select` See https://github.com/llvm/llvm-project/issues/53020	2022-01-07 01:30:21 +03:00
David Green	c65270cf96	[AArch64] Add basic umulo and smulo costs This adds some AArch64 specific smul_with_overflow and umul_with_overflow costs, overriding the default costs. The code generation for these mul with overflow intrinsics is usually better than the default expansion on AArch64. The costs come from https://godbolt.org/z/zEzYhMWqo with various types, or llvm/test/CodeGen/AArch64/arm64-xaluo.ll. Differential Revision: https://reviews.llvm.org/D116732	2022-01-06 17:22:47 +00:00
Nikita Popov	f430c1eb64	[Tests] Add elementtype attribute to indirect inline asm operands (NFC) This updates LLVM tests for D116531 by adding elementtype attributes to operands that correspond to indirect asm constraints.	2022-01-06 14:23:51 +01:00
Daniil Suchkov	524abc68f2	Introduce NewPM .dot printers for DomTree This patch adds a couple of NewPM function passes (dot-dom and dot-dom-only) that dump DomTree into .dot files. Reviewed-By: aeubanks Differential Revision: https://reviews.llvm.org/D116629	2022-01-05 23:25:40 +00:00
Nico Weber	085f078307	Revert "Revert D109159 "[amdgpu] Enable selection of `s_cselect_b64`."" This reverts commit `859ebca744`. The change contained many unrelated changes and e.g. restored unit test failures for the old lld port.	2022-01-05 13:10:25 -05:00
David Salinas	859ebca744	Revert D109159 "[amdgpu] Enable selection of `s_cselect_b64`." This reverts commit `640beb38e7`. That commit caused performance degradation in Quicksilver test QS:sGPU and a functional test failure in (rocPRIM rocprim.device_segmented_radix_sort). Reverting until we have a better solution to s_cselect_b64 codegen cleanup Change-Id: Ibf8e397df94001f248fba609f072088a46abae08 Reviewed By: kzhuravl Differential Revision: https://reviews.llvm.org/D115960 Change-Id: Id169459ce4dfffa857d5645a0af50b0063ce1105	2022-01-05 17:57:32 +00:00
Jun Ma	80e56ad9ae	[TTI] Return invalid cost for scalable vector in getShuffleCost Differential Revision: https://reviews.llvm.org/D116362	2022-01-05 18:59:11 +08:00
Philip Reames	b061d86c69	[SCEV] Compute exit count from overflow check expressed w/ x.with.overflow intrinsics This ports the logic we generate in instcombine for a single use x.with.overflow check for use in SCEV's analysis. The result is that we can prove trip counts for many checks, and (through existing logic) often discharge them. Motivation comes from compiling a simple example with -ftrapv. Differential Revision: https://reviews.llvm.org/D116499	2022-01-04 09:44:23 -08:00
Philip Reames	e18157c26b	Add extra test for D116499 requested in review	2022-01-04 09:44:23 -08:00
Philip Reames	0b09313cd5	[funcattrs] Infer writeonly argument attribute [part 2] This builds on the code from D114963, and extends it to handle calls both direct and indirect. With the revised code structure (from series of previously landed NFCs), this is pretty straight forward. One thing to note is that we can not infer writeonly for arguments which might be captured. If the pointer can be read back by the caller, and then read through, we have no way to track that. This is the same restriction we have for readonly, except that we get no mileage out of the "callee can be readonly" exception since a writeonly param on a readonly function is either a) readnone or b) UB. This means we can't actually infer much unless nocapture has already been inferred. Differential Revision: https://reviews.llvm.org/D115003	2022-01-04 09:07:54 -08:00
Florian Hahn	d8276208be	[LAA] Remove overeager assertion for aggregate types. `0a00d64` turned an early exit here into an assertion, but the assertion can be triggered, as PR52920 shows. The later code is agnostic to the accessed type, so just drop the assert. The patch also adds tests for LAA directly and loop-load-elimination to show the behavior is sane.	2022-01-04 15:20:35 +00:00
Philip Reames	65035e0d06	Precommit SCEV symbolic w.overflow exit tests	2022-01-02 11:43:31 -08:00
Philip Reames	f28c8e46c9	Autogen a SCEV test for ease of update	2022-01-02 11:30:29 -08:00
Daniil Fukalov	a2120f6b44	[NFC][AMDGPU][CostModel] Add tests for AMDGPU cost model, part 2.	2021-12-22 22:33:57 +03:00
Daniil Fukalov	deaedab14a	[NFC][AMDGPU][CostModel] Add tests for AMDGPU cost model.	2021-12-22 22:32:09 +03:00
Ricky Zhou	9927a06f74	[AA] Handle callbr instructions in alias analysis Before this change, AAResults::getModRefInfo() was missing a case for callbr instructions (asm goto), which may read/write memory. In PR52735, this led to a miscompile where a load was incorrectly eliminated. Add this missing case, as well as an assert verifying that all memory-accessing instructions are handled properly. Fixes #52735. Differential Revision: https://reviews.llvm.org/D115992	2021-12-18 18:49:17 +01:00
Matthew Devereau	e00f22c1b1	[AArch64][SVE] Teach cost model that masked loads/stores are cheap Reduce the cost of VLS masked loads/stores to make the vectorizer emit them more frequently.	2021-12-17 15:04:45 +00:00
Florian Hahn	f5f421e0ee	[SCEV] Apply loop guards in reverse order. This patch updates applyLoopGuards to first collect all conditions and then applies them in reverse order. This ensures the SCEVs with the shortest dependency chains are constructed first, limiting the required stack size. This fixes a crash reported in D113578. Note that the order conditions are applied can impact the accuracy of the result, mostly due to missing min/max simplifications when constructing SCEVs. The changed test highlights the impact of the evaluation order. I will follow up with a SCEV patch to improve min/max simplifications to get the same results for both orders.	2021-12-16 10:52:37 +00:00
Florian Hahn	eea568927b	[SCEV] Add test where result depends on order loop guards are applied. This patch adds 2 test cases where we fail to determine a tight bound on the backedge taken count because the ULT condition is applied before the signed conditions. The order the conditions are applied impacts which min/max folds are applied.	2021-12-15 19:10:28 +00:00
Alexandros Lamprineas	61bb8b5d40	[AArch64] Convert sra(X, elt_size(X)-1) to cmlt(X, 0) CMLT has twice the execution throughput of SSHR on Arm out-of-order cores. Differential Revision: https://reviews.llvm.org/D115457	2021-12-14 16:03:02 +00:00
Daniil Fukalov	e5c64b45be	[CostModel][AMDGPU] Fix intrinsics costs estimations. 1. Fixed costs inconsistency for llvm.fma.vXf16 intrinsics. 2. Added tests for llvm.sadd.sat, llvm.ssub.sat, llvm.uadd.sat, llvm.usub.sat intrinsics since they have special processing in cost model. 3. Minor intrinsics' costs tests update and refinement. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D115385	2021-12-13 17:17:34 +03:00
Sameer Sahasrabuddhe	1d0244aed7	Reapply CycleInfo: Introduce cycles as a generalization of loops Reverts `02940d6d22`. Fixes breakage in the modules build. LLVM loops cannot represent irreducible structures in the CFG. This change introduces the concept of cycles as a generalization of loops, along with a CycleInfo analysis that discovers a nested hierarchy of such cycles. This is based on Havlak (1997), Nesting of Reducible and Irreducible Loops. The cycle analysis is implemented as a generic template and then instantiated for LLVM IR and Machine IR. The template relies on a new GenericSSAContext template which must be specialized when used for each IR. This review is a restart of an older review request: https://reviews.llvm.org/D83094 Original implementation by Nicolai Hähnle <nicolai.haehnle@amd.com>, with recent refactoring by Sameer Sahasrabuddhe <sameer.sahasrabuddhe@amd.com> Differential Revision: https://reviews.llvm.org/D112696	2021-12-10 14:36:43 +05:30
David Sherwood	8b0448ce5d	[AArch64][Analysis] Add on overhead costs for SVE gathers and scatters This patch adds on an overhead cost for gathers and scatters, which is a rough estimate based on performance investigations I have performed on SVE hardware for various micro-benchmarks. Differential Revision: https://reviews.llvm.org/D115143	2021-12-09 16:02:59 +00:00
Florian Hahn	3c55acc4a6	[MemoryLocation] Support memset_pattern{4,8} in getForArgument. memset_pattern{4,8} behave as memset_pattern16, with the only difference being the size of the pattern location. Reviewed By: ab Differential Revision: https://reviews.llvm.org/D114905	2021-12-08 19:39:45 +00:00
Jolanta Jensen	77b2bb5567	[LAA] Use type sizes when determining dependence. In the isDependence function the code does not try hard enough to determine the dependence between types. If the types are different it simply gives up, whereas in fact what we really care about are the type sizes. I've changed the code to compare sizes instead of types. Reviewed By: fhahn, sdesmalen Differential Revision: https://reviews.llvm.org/D108763	2021-12-08 15:00:58 +00:00
Haohai Wen	d2c093e79d	[CostModel][X86] Add i64 mul cost for avx512 as 1cy i64 mul cost is 1cy for all cpu that support avx512. Currently all X86 cpu uses i64 mul cost in X64 cost table which is not true for cpu that support avx512 (skx, icx). Reviewed By: pengfei, RKSimon Differential Revision: https://reviews.llvm.org/D115016	2021-12-08 11:29:08 +08:00
Jonas Devlieghere	02940d6d22	Revert "CycleInfo: Introduce cycles as a generalization of loops" This reverts commit `0fe61ecc2c` because it breaks the modules build. https://green.lab.llvm.org/green/job/clang-stage2-rthinlto/4858/ https://green.lab.llvm.org/green/view/LLDB/job/lldb-cmake/39112/	2021-12-07 13:06:34 -08:00
Cullen Rhodes	698584f89b	[IR] Remove unbounded as possible value for vscale_range minimum The default for min is changed to 1. The behaviour of -mvscale-{min,max} in Clang is also changed such that 16 is the max vscale when targeting SVE and no max is specified. Reviewed By: sdesmalen, paulwalker-arm Differential Revision: https://reviews.llvm.org/D113294	2021-12-07 09:52:21 +00:00
Sameer Sahasrabuddhe	0fe61ecc2c	CycleInfo: Introduce cycles as a generalization of loops LLVM loops cannot represent irreducible structures in the CFG. This change introduces the concept of cycles as a generalization of loops, along with a CycleInfo analysis that discovers a nested hierarchy of such cycles. This is based on Havlak (1997), Nesting of Reducible and Irreducible Loops. The cycle analysis is implemented as a generic template and then instantiated for LLVM IR and Machine IR. The template relies on a new GenericSSAContext template which must be specialized when used for each IR. This review is a restart of an older review request: https://reviews.llvm.org/D83094 Original implementation by Nicolai Hähnle <nicolai.haehnle@amd.com>, with recent refactoring by Sameer Sahasrabuddhe <sameer.sahasrabuddhe@amd.com> Differential Revision: https://reviews.llvm.org/D112696	2021-12-07 12:02:34 +05:30
Florian Hahn	a9125792b3	[MemoryLocation] Support missing atomic intrinsics in getForArg. getForArgument is missing support for atomic memory transfer intrinsics. In terms of accessed locations they behave like regular memory transfer intrinsics and we already support them as such in getForSource/getForDest.	2021-12-04 22:18:39 +00:00
Florian Hahn	89f0f2771a	[BasicAA] Add atomic mem intrinsic tests.	2021-12-04 15:44:33 +00:00
David Green	255ad73424	[ARM] Make MVE v2i1 predicates legal MVE can treat v16i1, v8i1, v4i1 and v2i1 as different views onto the same 16bit VPR.P0 register, with v2i1 holding two 8 bit values for the two halves. This was never treated as a legal type in llvm in the past as there are not many 64bit instructions and no 64bit compares. There are a few instructions that could use it though, notably a VSELECT (as it can handle any size using the underlying v16i8 VPSEL), AND/OR/XOR for similar reasons, some gathers/scatter and long multiplies and VCTP64 instructions. This patch goes through and makes v2i1 a legal type, handling all the cases that fall out of that. It also makes VSELECT legal for v2i64 as a side benefit. A lot of the codegen changes as a result - usually in way that is a little better or a little worse, but still expensive. Costs can change a little too in the process, again in a way that expensive things remain expensive. A lot of the tests that changed are mainly to ensure correctness - the code can hopefully be improved in the future where it comes up in practice. The intrinsics currently remain using the v4i1 they previously did to emulate a v2i1. This will be changed in a followup patch but this one was already large enough. Differential Revision: https://reviews.llvm.org/D114449	2021-12-03 14:05:41 +00:00
Nikita Popov	49d040ac97	[SCEV] Fix ValuesAtScopesUsers consistency Fixes verification failure reported at: https://reviews.llvm.org/rGc9f9be0381d1 The issue is that getSCEVAtScope() might compute a result without inserting it in the ValuesAtScopes map in degenerate cases, specifically if the ValuesAtScopes entry is invalidated during the calculation. Arguably we should still insert the result if no existing placeholder is found, but for now just tweak the logic to only update ValuesAtScopesUsers if ValuesAtScopes is updated.	2021-12-03 10:03:10 +01:00
Florian Hahn	829b29b619	[MemoryLocation] strcat/strncat/strcpy read/write after their args. strcpy/strcat/strncat access memory starting from the passed in pointers. Construct memory locations for their args using getAfter. Discussed in D114872. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D114969	2021-12-03 08:48:23 +00:00
Daniil Fukalov	ab05ab59a7	[CostModel][AMDGPU] Fix instructions costs estimation for vector types. 1. Fixed vector instructions costs estimations inconsistency - removed different logic for "not simple types" since it biases costs for these types. 2. Fixed legalization penalty for vectors too big for the target: changed from overwrite default legalization cost value estimation to added penalty. 3. Fixed a few typos in tests. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D114893	2021-12-03 03:08:08 +03:00
Philip Reames	740057d185	[funcattrs] Infer writeonly argument attribute This change extends the current logic for inferring readonly and readnone argument attributes to also infer writeonly. This change is deliberately minimal; there's a couple of areas for follow up. * I left out all call handling and thus any benefit from the SCC walk. When examining the test changes, I realized the existing code is imprecise, and am going to fix that in its own revision before adding in the writeonly handling. (Mostly because updating the tests is hard when I, the human, can't figure out whether the result is correct.) * I left out handling for storing a value (as opposed to storing to a pointer). This should benefit readonly/readnone as well, and applies to a bunch of other instructions. Seemed worth having as a separate review. Differential Revision: https://reviews.llvm.org/D114963	2021-12-02 13:04:09 -08:00
Florian Hahn	222442ec2d	[BasicAA] Add tests for strcat/strncat/strcpy.	2021-12-02 17:38:07 +00:00
Florian Hahn	639a78a4bf	[MemoryLocation] Support strncpy in getForArgument. The size argument of strncpy can be used as bound for the size of its pointer arguments. strncpy is guaranteed to write N bytes and reads up to N bytes. Reviewed By: xbolva00 Differential Revision: https://reviews.llvm.org/D114871	2021-12-02 14:18:05 +00:00
Florian Hahn	9f9e8ba114	[MemoryLocation] Support memset_chk in getForArgument. The size argument for memset_chk is an upper bound for the size of the pointer argument. memset_chk may write less than the specified length, if it exceeds the specified max size and aborts. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D114870	2021-12-02 13:45:58 +00:00
Florian Hahn	47616c8855	[BasicAA] Add tests for memset_pattern{4,8,16}. This also removes the existing memset_pattern.ll test, which was relying on GVN. It is also covered by the new test directly.	2021-12-02 11:50:32 +00:00
Florian Hahn	524ad6babb	[BasicAA] Add memset_chk libfunc tests.	2021-12-01 14:15:46 +00:00
Florian Hahn	c6bd63803f	[BasicAA] Add strncpy libfunc tests.	2021-12-01 14:15:40 +00:00
Roman Lebedev	8cd782487f	[X86][LoopVectorize] "Fix" `X86TTIImpl::getAddressComputationCost()` We ask `TTI.getAddressComputationCost()` about the cost of computing vector address, and then multiply it by the vector width. This doesn't make any sense, it implies that we'd do a vector GEP and then scalarize the vector of pointers, but there is no such thing in the vectorized IR, we perform scalar GEP's. This is especially bad on X86, and was effectively prohibiting any scalarized vectorization of gathers/scatters, because `X86TTIImpl::getAddressComputationCost()` says that cost of vector address computation is `10` as compared to `1` for scalar. The computed costs are similar to the ones with D111222+D111220, but we end up without masked memory intrinsics that we'd then have to expand later on, without much luck. (D111363) Differential Revision: https://reviews.llvm.org/D111460	2021-11-30 10:47:56 +03:00
Nikita Popov	77dd579827	[SCEV] Remove incorrect assert Fix assertion failure reported on D113349 by removing the assert. While the produced expression should be equivalent, it may not be strictly the same, e.g. due to lazy nowrap flag updates. Similar to what the main createSCEV() code does, simply retain the old value map entry if one already exists.	2021-11-29 17:09:12 +01:00
Roman Lebedev	7e73c2a66a	[X86][Costmodel] `getInterleavedMemoryOpCostAVX512()`: masked load can not be folded into a shuffle The mask on the shuffle is for the output, not the input. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D114697	2021-11-29 18:37:07 +03:00
Roman Lebedev	5e96553608	[NFC][X86][LV][Costmodel] Add most basic test for masked interleaved load	2021-11-29 16:46:19 +03:00
Roman Lebedev	cffe3a084f	[X86][Costmodel] Now that `getReplicationShuffleCost()` is good, update `getInterleavedMemoryOpCostAVX512()` ... to actually ask about i1-elt-wide mask, since that is what will probably be used on AVX512. This unblocks D111460. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D114316	2021-11-29 14:41:48 +03:00
Nikita Popov	2b160e95c8	Reland [SCEV] Fix and validate ValueExprMap/ExprValueMap consistency Relative to the previous landing attempt, this introduces an additional flag on forgetMemoizedResults() to not remove SCEVUnknown phis from the value map. The invalidation after BECount calculation wants to leave these alone and skips them in its own use-def walk, but we can still end up invalidating them via forgetMemoizedResults() if there is another IR value with the same SCEV. This is intended as a temporary workaround only, and the need for this should go away once the getBackedgeTakenInfo() invalidation is refactored in the spirit of D114263. ----- This adds validation for consistency of ValueExprMap and ExprValueMap, and fixes identified issues: * Addrec construction directly wrote to ValueExprMap in a few places, without updating ExprValueMap. Add a helper to ensures they stay consistent. The adjustment in forgetSymbolicName() explicitly drops the old value from the map, so that we don't rely on it being overwritten. * forgetMemoizedResultsImpl() was dropping the SCEV from ExprValueMap, but not dropping the corresponding entries from ValueExprMap. Differential Revision: https://reviews.llvm.org/D113349	2021-11-27 12:37:15 +01:00
Zarko Todorovski	7f7dac7126	[NFC][llvm] Inclusive language: reword uses of sanity test and check Part of continuing work to use more inclusive language. Reworded uses of sanity check and sanity test in llvm/test/	2021-11-25 07:21:42 -05:00
Graham Hunter	dee810e117	[NFC][LAA] Precommit tests for forked pointers Precommit for https://reviews.llvm.org/D108699	2021-11-24 16:20:35 +00:00
Peter Waller	787b66eb5f	[LoopAccessAnalysis][SVE] Bail out for scalable vectors The supplied test case, reduced from real world code, crashes with a 'Invalid size request on a scalable vector.' error. Since it's similar in spirit to an existing LAA test, rename the file to generalize it to both. Differential Revision: https://reviews.llvm.org/D114155	2021-11-24 15:52:20 +00:00
Roman Lebedev	cd8d219536	[X86][Costmodel] `getReplicationShuffleCost()`: promote 1 bit-wide elements to 32 bit when have AVX512DQ I believe, this effectively completes `X86TTIImpl::getReplicationShuffleCost()` for AVX512, other than the question of handling plain AVX512F, where we end up with some really ugly "shuffles", but then is there any CPU's that support AVX512, but not AVX512DQ/AVX512BW? Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D114315	2021-11-24 17:23:15 +03:00
Florian Mayer	6c06d8e310	[stack-safety] Check SCEV constraints at memory instructions. Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D113160	2021-11-23 15:29:23 -08:00
Roman Lebedev	704d92607d	[X86][TTI] Finish costmodel for AVX512BW's VPMOVM2[BW] / VPMOV[BW]2M instructions Apparently my methodology was suboptimal, and not only did miss all the +VL tuples, i also missed some plain tuples. I believe, this adds everything missing. Indeed, these manual costmodels are just not okay long-term. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D114334	2021-11-22 14:31:34 +03:00
Roman Lebedev	8d09dd61c3	[X86][TTI] Costmodel for AVX512DQ's VPMOVM2[DQ] / VPMOV[DQ]2M instructions Much like the VPMOVM2[BW] / VPMOV[BW]2M from AVX512BW, these either sign-extend the mask register into a vector, or pack the mask from vector register. Apparently, we didn't even have MCA tests for these, added in rG2f364f6f0d3a2420ca78cbd80abb186657180e05, so i'm just guessing that their perf characteristics are optimal. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D114314	2021-11-22 14:31:34 +03:00
Sjoerd Meijer	4d21b64464	[BPI] Look-up tables for non-loop branches. NFC. This adds and uses look-up tables for non-loop branch probabilities, which have probabilities directly encoded into the tables for the different condition codes. Compared to having this logic inlined in different functions, as used to be the case, I think this is more compact and thus also easier to check/cross reference. This also adds a test for pointer heuristics that was missing. Differential Revision: https://reviews.llvm.org/D114009	2021-11-22 10:30:42 +00:00
Roman Lebedev	df70cf5e14	[NFC][X86][Costmodel] Actually test +prefer-256-bit in replication-shuffle-related tests :( While -prefer-256-bit indeed becomes complete with D114314, the real-world (the one with +prefer-256-bit) coverage is lacking. Hilarious.	2021-11-21 01:25:49 +03:00
Roman Lebedev	da47a63e03	[NFC][X86][Costmodel] Add AVX512DQ runlines to trunc.ll/extend.ll	2021-11-20 13:55:13 +03:00
Roman Lebedev	049799c311	[X86][Costmodel] `getReplicationShuffleCost()`: promote 1 bit-wide elements to 8 bit when have AVX512BW+AVX512VBMI If in addition to AVX512BW (that provides `{k}<->{i8,i16}` casts and i16 shuffles), we have AVX512VBMI, which provides i8 shuffles, we are in an optimal situation. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D114071	2021-11-19 15:58:10 +03:00
Roman Lebedev	a751084bb4	[X86][Costmodel] `trunc v16i8 to v8i1` can appear after legalization, cost is same as for `trunc v8i8 to v8i1` Note that there are many other missing costs, i'm only adding the ones that are queried from `getReplicationShuffleCost()` for the existing (quite exhaustive) test coverage. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D114070	2021-11-19 15:57:32 +03:00
Roman Lebedev	a50fdd3fc9	[X86][Costmodel] `getReplicationShuffleCost()`: promote 1 bit-wide elements to 16 bit when have AVX512BW Here we get pretty lucky. AVX512F does not provide any instructions to convert between a `k` vector mask and a vector, but AVX512BW adds `{k}<->nX{i8,i16}`conversions, and just as it happens, with AVX512BW we have a i16 shuffle. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D113915	2021-11-19 15:55:41 +03:00
Philip Reames	ea12c2cb9c	[SCEV] Move mustprogress based no-self-wrap logic so it applies to all exit conditions This change moves logic which we'd added specifically for less than tests so that it applies to equalities and greater than tests as well. The basic idea is that if we can show an IV cycles infinitely through the same series on self-wrap, and that the exit condition must be taken to prevent UB, we can conclude that it must be taken before self-wrap and thus infer said flag. The motivation here is simple loops with unsigned induction variables w/non-one steps and inequality tests. A toy example would be: for (unsigned i = 0; i != N; i += 2) { body; } If body contains no side effects, and this is a mustprogress function, we can assume that this must be a finite loop and thus that the exit count is N/2. Differential Revision: https://reviews.llvm.org/D103991	2021-11-18 10:07:44 -08:00
Philip Reames	100df68496	[SCEV] Add test coverage for invertible functions of IVs	2021-11-18 08:56:45 -08:00
Florian Hahn	da9f2ba3b1	[SCEV] Reorder operands checks in collectConditions. The initial two cases require a SCEVConstant as RHS. Pull up the condition to check and swap SCEVConstants from below. Also remove a redundant check & swap if RHS is SCEVUnknown.	2021-11-18 09:36:16 +00:00
Florian Hahn	dd6281c4c1	[SCEV] Add additional guard tests with swapped condition ops.	2021-11-18 09:35:19 +00:00
Philip Reames	0623f52a46	Autogen a test for ease of update	2021-11-17 17:20:57 -08:00
Philip Reames	ad69402f3e	[SCEVAA] Avoid forming malformed pointer diff expressions This solves the same crash as in D104503, but with a different approach. The test case test_non_dom demonstrates a case where scev-aa crashes today. (If exercised either by -eval-aa or -licm.) The basic problem is that SCEV-AA expects to be able to compute a pointer difference between two SCEVs for any two pair of pointers we do an alias query on. For (valid, but out of scope) reasons, we can end up asking whether expressions in different sub-loops can alias each other. This results in a subtraction expression being formed where neither operand dominates the other. The approach this patch takes is to leverage the "defining scope" notion we introduced for flag semantics to detect and disallow the formation of the problematic SCEV. This ends up being relatively straight forward on that new infrastructure. This change does hint that we should probably be verifying a similar property for all SCEVs somewhere, but I'll leave that to a follow on change. Differential Revision: D114112	2021-11-17 12:38:04 -08:00
Florian Hahn	e8b55cf7b7	[SCEV] Apply loop guards when computing max BTC for arbitrary steps. Similar to other cases in the current function (e.g. when the step is 1 or -1), applying loop guards can lead to tighter upper bounds for the backedge-taken counts. Fixes PR52464. Reviewed By: reames, nikic Differential Revision: https://reviews.llvm.org/D113578	2021-11-17 11:00:49 +00:00
Roman Lebedev	496ccb543e	[NFC][X86][Costmodel] Improve test coverage for i32->i64 vector *ext	2021-11-17 12:02:50 +03:00
Roman Lebedev	2037ec725f	[X86][Costmodel] `ext v64i1 to v32i16` can appear after legalization, cost is same as for `ext v32i1 to v32i16` Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D113914	2021-11-17 12:02:50 +03:00
Roman Lebedev	23b194bf18	[X86][Costmodel] `trunc v32i16 to v64i1` can appear after legalization, cost is same as for `trunc v32i16 to v32i1` Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D113913	2021-11-17 12:02:50 +03:00
Philip Reames	8d85e945b2	[SCEV] Canonicalize X - urem X, Y patterns There are multiple possible ways to represent the X - urem X, Y pattern. SCEV was not canonicalizing, and thus, depending on which you analyzed, you could get different results. The sub representation appears to produce strictly inferior results in practice, so I decided to canonicalize to the Y * X/Y version. The motivation here is that runtime unroll produces the sub X - (and X, Y-1) pattern when Y is a power of two. SCEV is thus unable to recognize that an unrolled loop exits because we don't figure out that the new unrolled step evenly divides the trip count of the unrolled loop. After instcombine runs, we convert to the andn form which SCEV recognizes, so essentially, this is just fixing a nasty pass ordering dependency. The ARM loop hardware interaction in the test diff is opaque to me, but the comments in the review from others with knowledge of the infrastructure appear to indicate these are improvements in loop recognition, not regressions. Differential Revision: https://reviews.llvm.org/D114018	2021-11-16 11:59:21 -08:00
Philip Reames	3dd6d5b628	[tests] Add coverage for different forms of X - urem X, Y	2021-11-16 09:26:34 -08:00
Philip Reames	56ae2cfecf	autogen a SCEV test file	2021-11-16 09:26:34 -08:00
Florian Hahn	b7aec4f08e	[SCEV] Support rewriting ZExt expressions with loop guard info. So far, applying loop guard information has been restricted to SCEVUnknown. In a few cases, like PR40961 and PR52464, this leads to SCEV failing to determine tight upper bounds for the backedge taken count. This patch adjusts SCEVLoopGuardRewriter and applyLoopGuards to support re-writing ZExt expressions. This is a first step towards fixing PR40961 and PR52464. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D113577	2021-11-16 11:16:07 +00:00
David Green	309f1e4ac8	[ARM] Add datalayout to costmodel tests. NFC This adds a sensible datalayout to the ARM cost model tests, to prevent the costs reported being incorrect for the size of pointers.	2021-11-16 09:49:42 +00:00
Roman Lebedev	7114c60e8f	[NFC][X86][Costmodel] Improve test coverage for {i8,i16,i32,i64}->i1 vector trunc	2021-11-15 20:46:48 +03:00
Roman Lebedev	949103dc36	[NFC][X86][Costmodel] Improve test coverage for i1->{i8,i16,i32,i64} vector *ext	2021-11-15 20:46:48 +03:00
Roman Lebedev	bc35d5fe2f	[NFC][X86][Costmodel] Add i1 replication shuffle costmodel test coverage	2021-11-15 20:02:52 +03:00
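
The "[SCEV] Canonicalize X - urem X, Y patterns" entry above relies on three spellings of the same unsigned value: `X - (X urem Y)`, `Y * (X udiv Y)`, and, when `Y` is a power of two, `X - (X & (Y-1))`. The following is a minimal sketch that checks this arithmetic identity by brute force over small 32-bit values. It is not LLVM code and does not use any SCEV API; the function names (`subUremForm`, `mulUdivForm`, `subMaskForm`, `isPowerOfTwo`, `formsAgree`) are hypothetical and introduced only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: the subtraction form, X - (X urem Y). Requires Y != 0.
constexpr uint32_t subUremForm(uint32_t X, uint32_t Y) { return X - (X % Y); }

// Hypothetical helper: the multiply-of-divide form, Y * (X udiv Y).
// Y * (X / Y) never exceeds X, so this cannot overflow. Requires Y != 0.
constexpr uint32_t mulUdivForm(uint32_t X, uint32_t Y) { return Y * (X / Y); }

// Hypothetical helper: the mask form, X - (X & (Y - 1)).
// Only equivalent to the other two when Y is a power of two.
constexpr uint32_t subMaskForm(uint32_t X, uint32_t Y) {
  return X - (X & (Y - 1));
}

// True when Y has exactly one bit set.
constexpr bool isPowerOfTwo(uint32_t Y) { return Y != 0 && (Y & (Y - 1)) == 0; }

// Brute-force check: returns true if the forms agree for every X in
// [0, Limit) and every non-zero Y in [1, Limit).
bool formsAgree(uint32_t Limit) {
  for (uint32_t X = 0; X < Limit; ++X) {
    for (uint32_t Y = 1; Y < Limit; ++Y) {
      if (subUremForm(X, Y) != mulUdivForm(X, Y))
        return false;
      // The mask form is only compared for power-of-two divisors.
      if (isPowerOfTwo(Y) && subMaskForm(X, Y) != mulUdivForm(X, Y))
        return false;
    }
  }
  return true;
}
```

Calling `formsAgree(256)` from a test driver exercises every pair below 256. The check covers only the value-level identity; which form an analysis can reason about most precisely is a separate question that the commit message addresses.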

3474 Commits