llvm-project

Commit Graph

Author	SHA1	Message	Date
Daniil Fukalov	a2120f6b44	[NFC][AMDGPU][CostModel] Add tests for AMDGPU cost model, part 2.	2021-12-22 22:33:57 +03:00
Daniil Fukalov	deaedab14a	[NFC][AMDGPU][CostModel] Add tests for AMDGPU cost model.	2021-12-22 22:32:09 +03:00
Ricky Zhou	9927a06f74	[AA] Handle callbr instructions in alias analysis Before this change, AAResults::getModRefInfo() was missing a case for callbr instructions (asm goto), which may read/write memory. In PR52735, this led to a miscompile where a load was incorrectly eliminated. Add this missing case, as well as an assert verifying that all memory-accessing instructions are handled properly. Fixes #52735. Differential Revision: https://reviews.llvm.org/D115992	2021-12-18 18:49:17 +01:00
Matthew Devereau	e00f22c1b1	[AArch64][SVE] Teach cost model that masked loads/stores are cheap Reduce the cost of VLS masked loads/stores to make the vectorizer emit them more frequently.	2021-12-17 15:04:45 +00:00
Florian Hahn	f5f421e0ee	[SCEV] Apply loop guards in reverse order. This patch updates applyLoopGuards to first collect all conditions and then applies them in reverse order. This ensures the SCEVs with the shortest dependency chains are constructed first, limiting the required stack size. This fixes a crash reported in D113578. Note that the order conditions are applied can impact the accuracy of the result, mostly due to missing min/max simplifications when constructing SCEVs. The changed test highlights the impact of the evaluation order. I will follow up with a SCEV patch to improve min/max simplifications to get the same results for both orders.	2021-12-16 10:52:37 +00:00
Florian Hahn	eea568927b	[SCEV] Add test where result depends on order loop guards are applied. This patch adds 2 test cases where we fail to determine a tight bound on the backedge taken count because the ULT condition is applied before the signed conditions. The order the conditions are applied impacts which min/max folds are applied.	2021-12-15 19:10:28 +00:00
Alexandros Lamprineas	61bb8b5d40	[AArch64] Convert sra(X, elt_size(X)-1) to cmlt(X, 0) CMLT has twice the execution throughput of SSHR on Arm out-of-order cores. Differential Revision: https://reviews.llvm.org/D115457	2021-12-14 16:03:02 +00:00
Daniil Fukalov	e5c64b45be	[CostModel][AMDGPU] Fix intrinsics costs estimations. 1. Fixed costs inconsistency for llvm.fma.vXf16 intrinsics. 2. Added tests for llvm.sadd.sat, llvm.ssub.sat, llvm.uadd.sat, llvm.usub.sat intrinsics since they have special processing in cost model. 3. Minor intrinsics' costs tests update and refinement. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D115385	2021-12-13 17:17:34 +03:00
Sameer Sahasrabuddhe	1d0244aed7	Reapply CycleInfo: Introduce cycles as a generalization of loops Reverts `02940d6d22`. Fixes breakage in the modules build. LLVM loops cannot represent irreducible structures in the CFG. This change introduces the concept of cycles as a generalization of loops, along with a CycleInfo analysis that discovers a nested hierarchy of such cycles. This is based on Havlak (1997), Nesting of Reducible and Irreducible Loops. The cycle analysis is implemented as a generic template and then instantiated for LLVM IR and Machine IR. The template relies on a new GenericSSAContext template which must be specialized when used for each IR. This review is a restart of an older review request: https://reviews.llvm.org/D83094 Original implementation by Nicolai Hähnle <nicolai.haehnle@amd.com>, with recent refactoring by Sameer Sahasrabuddhe <sameer.sahasrabuddhe@amd.com> Differential Revision: https://reviews.llvm.org/D112696	2021-12-10 14:36:43 +05:30
David Sherwood	8b0448ce5d	[AArch64][Analysis] Add on overhead costs for SVE gathers and scatters This patch adds on an overhead cost for gathers and scatters, which is a rough estimate based on performance investigations I have performed on SVE hardware for various micro-benchmarks. Differential Revision: https://reviews.llvm.org/D115143	2021-12-09 16:02:59 +00:00
Florian Hahn	3c55acc4a6	[MemoryLocation] Support memset_pattern{4,8} in getForArgument. memset_pattern{4,8} behave as memset_pattern16, with the only difference being the size of the pattern location. Reviewed By: ab Differential Revision: https://reviews.llvm.org/D114905	2021-12-08 19:39:45 +00:00
Jolanta Jensen	77b2bb5567	[LAA] Use type sizes when determining dependence. In the isDependence function the code does not try hard enough to determine the dependence between types. If the types are different it simply gives up, whereas in fact what we really care about are the type sizes. I've changed the code to compare sizes instead of types. Reviewed By: fhahn, sdesmalen Differential Revision: https://reviews.llvm.org/D108763	2021-12-08 15:00:58 +00:00
Haohai Wen	d2c093e79d	[CostModel][X86] Add i64 mul cost for avx512 as 1cy i64 mul cost is 1cy for all cpu that support avx512. Currently all X86 cpu uses i64 mul cost in X64 cost table which is not true for cpu that support avx512 (skx, icx). Reviewed By: pengfei, RKSimon Differential Revision: https://reviews.llvm.org/D115016	2021-12-08 11:29:08 +08:00
Jonas Devlieghere	02940d6d22	Revert "CycleInfo: Introduce cycles as a generalization of loops" This reverts commit `0fe61ecc2c` because it breaks the modules build. https://green.lab.llvm.org/green/job/clang-stage2-rthinlto/4858/ https://green.lab.llvm.org/green/view/LLDB/job/lldb-cmake/39112/	2021-12-07 13:06:34 -08:00
Cullen Rhodes	698584f89b	[IR] Remove unbounded as possible value for vscale_range minimum The default for min is changed to 1. The behaviour of -mvscale-{min,max} in Clang is also changed such that 16 is the max vscale when targeting SVE and no max is specified. Reviewed By: sdesmalen, paulwalker-arm Differential Revision: https://reviews.llvm.org/D113294	2021-12-07 09:52:21 +00:00
Sameer Sahasrabuddhe	0fe61ecc2c	CycleInfo: Introduce cycles as a generalization of loops LLVM loops cannot represent irreducible structures in the CFG. This change introduces the concept of cycles as a generalization of loops, along with a CycleInfo analysis that discovers a nested hierarchy of such cycles. This is based on Havlak (1997), Nesting of Reducible and Irreducible Loops. The cycle analysis is implemented as a generic template and then instantiated for LLVM IR and Machine IR. The template relies on a new GenericSSAContext template which must be specialized when used for each IR. This review is a restart of an older review request: https://reviews.llvm.org/D83094 Original implementation by Nicolai Hähnle <nicolai.haehnle@amd.com>, with recent refactoring by Sameer Sahasrabuddhe <sameer.sahasrabuddhe@amd.com> Differential Revision: https://reviews.llvm.org/D112696	2021-12-07 12:02:34 +05:30
Florian Hahn	a9125792b3	[MemoryLocation] Support missing atomic intrinsics in getForArg. getForArgument is missing support for atomic memory transfer intrinsics. In terms of accessed locations they behave like regular memory transfer intrinsics and we already support them as such in getForSource/getForDest.	2021-12-04 22:18:39 +00:00
Florian Hahn	89f0f2771a	[BasicAA] Add atomic mem intrinsic tests.	2021-12-04 15:44:33 +00:00
David Green	255ad73424	[ARM] Make MVE v2i1 predicates legal MVE can treat v16i1, v8i1, v4i1 and v2i1 as different views onto the same 16bit VPR.P0 register, with v2i1 holding two 8 bit values for the two halves. This was never treated as a legal type in llvm in the past as there are not many 64bit instructions and no 64bit compares. There are a few instructions that could use it though, notably a VSELECT (as it can handle any size using the underlying v16i8 VPSEL), AND/OR/XOR for similar reasons, some gathers/scatter and long multiplies and VCTP64 instructions. This patch goes through and makes v2i1 a legal type, handling all the cases that fall out of that. It also makes VSELECT legal for v2i64 as a side benefit. A lot of the codegen changes as a result - usually in a way that is a little better or a little worse, but still expensive. Costs can change a little too in the process, again in a way that expensive things remain expensive. A lot of the tests that changed are mainly to ensure correctness - the code can hopefully be improved in the future where it comes up in practice. The intrinsics currently remain using the v4i1 they previously did to emulate a v2i1. This will be changed in a followup patch but this one was already large enough. Differential Revision: https://reviews.llvm.org/D114449	2021-12-03 14:05:41 +00:00
Nikita Popov	49d040ac97	[SCEV] Fix ValuesAtScopesUsers consistency Fixes verification failure reported at: https://reviews.llvm.org/rGc9f9be0381d1 The issue is that getSCEVAtScope() might compute a result without inserting it in the ValuesAtScopes map in degenerate cases, specifically if the ValuesAtScopes entry is invalidated during the calculation. Arguably we should still insert the result if no existing placeholder is found, but for now just tweak the logic to only update ValuesAtScopesUsers if ValuesAtScopes is updated.	2021-12-03 10:03:10 +01:00
Florian Hahn	829b29b619	[MemoryLocation] strcat/strncat/strcpy read/write after their args. strcpy/strcat/strncat access memory starting from the passed in pointers. Construct memory locations for their args using getAfter. Discussed in D114872. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D114969	2021-12-03 08:48:23 +00:00
Daniil Fukalov	ab05ab59a7	[CostModel][AMDGPU] Fix instructions costs estimation for vector types. 1. Fixed vector instructions costs estimations inconsistency - removed different logic for "not simple types" since it biases costs for these types. 2. Fixed legalization penalty for vectors too big for the target: changed from overwrite default legalization cost value estimation to added penalty. 3. Fixed a few typos in tests. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D114893	2021-12-03 03:08:08 +03:00
Philip Reames	740057d185	[funcattrs] Infer writeonly argument attribute This change extends the current logic for inferring readonly and readnone argument attributes to also infer writeonly. This change is deliberately minimal; there's a couple of areas for follow up. * I left out all call handling and thus any benefit from the SCC walk. When examining the test changes, I realized the existing code is imprecise, and am going to fix that in its own revision before adding in the writeonly handling. (Mostly because updating the tests is hard when I, the human, can't figure out whether the result is correct.) * I left out handling for storing a value (as opposed to storing to a pointer). This should benefit readonly/readnone as well, and applies to a bunch of other instructions. Seemed worth having as a separate review. Differential Revision: https://reviews.llvm.org/D114963	2021-12-02 13:04:09 -08:00
Florian Hahn	222442ec2d	[BasicAA] Add tests for strcat/strncat/strcpy.	2021-12-02 17:38:07 +00:00
Florian Hahn	639a78a4bf	[MemoryLocation] Support strncpy in getForArgument. The size argument of strncpy can be used as bound for the size of its pointer arguments. strncpy is guaranteed to write N bytes and reads up to N bytes. Reviewed By: xbolva00 Differential Revision: https://reviews.llvm.org/D114871	2021-12-02 14:18:05 +00:00
Florian Hahn	9f9e8ba114	[MemoryLocation] Support memset_chk in getForArgument. The size argument for memset_chk is an upper bound for the size of the pointer argument. memset_chk may write less than the specified length, if it exceeds the specified max size and aborts. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D114870	2021-12-02 13:45:58 +00:00
Florian Hahn	47616c8855	[BasicAA] Add tests for memset_pattern{4,8,16}. This also removes the existing memset_pattern.ll test, which was relying on GVN. It is also covered by the new test directly.	2021-12-02 11:50:32 +00:00
Florian Hahn	524ad6babb	[BasicAA] Add memset_chk libfunc tests.	2021-12-01 14:15:46 +00:00
Florian Hahn	c6bd63803f	[BasicAA] Add strncpy libfunc tests.	2021-12-01 14:15:40 +00:00
Roman Lebedev	8cd782487f	[X86][LoopVectorize] "Fix" `X86TTIImpl::getAddressComputationCost()` We ask `TTI.getAddressComputationCost()` about the cost of computing vector address, and then multiply it by the vector width. This doesn't make any sense, it implies that we'd do a vector GEP and then scalarize the vector of pointers, but there is no such thing in the vectorized IR, we perform scalar GEP's. This is especially bad on X86, and was effectively prohibiting any scalarized vectorization of gathers/scatters, because `X86TTIImpl::getAddressComputationCost()` says that cost of vector address computation is `10` as compared to `1` for scalar. The computed costs are similar to the ones with D111222+D111220, but we end up without masked memory intrinsics that we'd then have to expand later on, without much luck. (D111363) Differential Revision: https://reviews.llvm.org/D111460	2021-11-30 10:47:56 +03:00
Nikita Popov	77dd579827	[SCEV] Remove incorrect assert Fix assertion failure reported on D113349 by removing the assert. While the produced expression should be equivalent, it may not be strictly the same, e.g. due to lazy nowrap flag updates. Similar to what the main createSCEV() code does, simply retain the old value map entry if one already exists.	2021-11-29 17:09:12 +01:00
Roman Lebedev	7e73c2a66a	[X86][Costmodel] `getInterleavedMemoryOpCostAVX512()`: masked load can not be folded into a shuffle The mask on the shuffle is for the output, not the input. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D114697	2021-11-29 18:37:07 +03:00
Roman Lebedev	5e96553608	[NFC][X86][LV][Costmodel] Add most basic test for masked interleaved load	2021-11-29 16:46:19 +03:00
Roman Lebedev	cffe3a084f	[X86][Costmodel] Now that `getReplicationShuffleCost()` is good, update `getInterleavedMemoryOpCostAVX512()` ... to actually ask about i1-elt-wide mask, since that is what will probably be used on AVX512. This unblocks D111460. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D114316	2021-11-29 14:41:48 +03:00
Nikita Popov	2b160e95c8	Reland [SCEV] Fix and validate ValueExprMap/ExprValueMap consistency Relative to the previous landing attempt, this introduces an additional flag on forgetMemoizedResults() to not remove SCEVUnknown phis from the value map. The invalidation after BECount calculation wants to leave these alone and skips them in its own use-def walk, but we can still end up invalidating them via forgetMemoizedResults() if there is another IR value with the same SCEV. This is intended as a temporary workaround only, and the need for this should go away once the getBackedgeTakenInfo() invalidation is refactored in the spirit of D114263. ----- This adds validation for consistency of ValueExprMap and ExprValueMap, and fixes identified issues: * Addrec construction directly wrote to ValueExprMap in a few places, without updating ExprValueMap. Add a helper to ensure they stay consistent. The adjustment in forgetSymbolicName() explicitly drops the old value from the map, so that we don't rely on it being overwritten. * forgetMemoizedResultsImpl() was dropping the SCEV from ExprValueMap, but not dropping the corresponding entries from ValueExprMap. Differential Revision: https://reviews.llvm.org/D113349	2021-11-27 12:37:15 +01:00
Zarko Todorovski	7f7dac7126	[NFC][llvm] Inclusive language: reword uses of sanity test and check Part of continuing work to use more inclusive language. Reworded uses of sanity check and sanity test in llvm/test/	2021-11-25 07:21:42 -05:00
Graham Hunter	dee810e117	[NFC][LAA] Precommit tests for forked pointers Precommit for https://reviews.llvm.org/D108699	2021-11-24 16:20:35 +00:00
Peter Waller	787b66eb5f	[LoopAccessAnalysis][SVE] Bail out for scalable vectors The supplied test case, reduced from real world code, crashes with a 'Invalid size request on a scalable vector.' error. Since it's similar in spirit to an existing LAA test, rename the file to generalize it to both. Differential Revision: https://reviews.llvm.org/D114155	2021-11-24 15:52:20 +00:00
Roman Lebedev	cd8d219536	[X86][Costmodel] `getReplicationShuffleCost()`: promote 1 bit-wide elements to 32 bit when have AVX512DQ I believe, this effectively completes `X86TTIImpl::getReplicationShuffleCost()` for AVX512, other than the question of handling plain AVX512F, where we end up with some really ugly "shuffles", but then are there any CPUs that support AVX512, but not AVX512DQ/AVX512BW? Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D114315	2021-11-24 17:23:15 +03:00
Florian Mayer	6c06d8e310	[stack-safety] Check SCEV constraints at memory instructions. Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D113160	2021-11-23 15:29:23 -08:00
Roman Lebedev	704d92607d	[X86][TTI] Finish costmodel for AVX512BW's VPMOVM2[BW] / VPMOV[BW]2M instructions Apparently my methodology was suboptimal, and not only did i miss all the +VL tuples, i also missed some plain tuples. I believe, this adds everything missing. Indeed, these manual costmodels are just not okay long-term. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D114334	2021-11-22 14:31:34 +03:00
Roman Lebedev	8d09dd61c3	[X86][TTI] Costmodel for AVX512DQ's VPMOVM2[DQ] / VPMOV[DQ]2M instructions Much like the VPMOVM2[BW] / VPMOV[BW]2M from AVX512BW, these either sign-extend the mask register into a vector, or pack the mask from vector register. Apparently, we didn't even have MCA tests for these, added in rG2f364f6f0d3a2420ca78cbd80abb186657180e05, so i'm just guessing that their perf characteristics are optimal. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D114314	2021-11-22 14:31:34 +03:00
Sjoerd Meijer	4d21b64464	[BPI] Look-up tables for non-loop branches. NFC. This adds and uses look-up tables for non-loop branch probabilities, which have probabilities directly encoded into the tables for the different condition codes. Compared to having this logic inlined in different functions, as it used to be the case, I think this is more compact and thus also easier to check/cross reference. This also adds a test for pointer heuristics that was missing. Differential Revision: https://reviews.llvm.org/D114009	2021-11-22 10:30:42 +00:00
Roman Lebedev	df70cf5e14	[NFC][X86][Costmodel] Actually test +prefer-256-bit in replication-shuffle-related tests :( While -prefer-256-bit indeed becomes complete with D114314, the real-world (the one with +prefer-256-bit) coverage is lacking. Hilarious.	2021-11-21 01:25:49 +03:00
Roman Lebedev	da47a63e03	[NFC][X86][Costmodel] Add AVX512DQ runlines to trunc.ll/extend.ll	2021-11-20 13:55:13 +03:00
Roman Lebedev	049799c311	[X86][Costmodel] `getReplicationShuffleCost()`: promote 1 bit-wide elements to 8 bit when have AVX512BW+AVX512VBMI If in addition to AVX512BW (that provides `{k}<->{i8,i16}` casts and i16 shuffles), we have AVX512VBMI, which provides i8 shuffles, we are in an optimal situation. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D114071	2021-11-19 15:58:10 +03:00
Roman Lebedev	a751084bb4	[X86][Costmodel] `trunc v16i8 to v8i1` can appear after legalization, cost is same as for `trunc v8i8 to v8i1` Note that there are many other missing costs, i'm only adding the ones that are queried from `getReplicationShuffleCost()` for the existing (quite exhaustive) test coverage. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D114070	2021-11-19 15:57:32 +03:00
Roman Lebedev	a50fdd3fc9	[X86][Costmodel] `getReplicationShuffleCost()`: promote 1 bit-wide elements to 16 bit when have AVX512BW Here we get pretty lucky. AVX512F does not provide any instructions to convert between a `k` vector mask and a vector, but AVX512BW adds `{k}<->nX{i8,i16}`conversions, and just as it happens, with AVX512BW we have a i16 shuffle. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D113915	2021-11-19 15:55:41 +03:00
Philip Reames	ea12c2cb9c	[SCEV] Move mustprogress based no-self-wrap logic so it applies to all exit conditions This change moves logic which we'd added specifically for less than tests so that it applies to equalities and greater than tests as well. The basic idea is that if we can show an IV cycles infinitely through the same series on self-wrap, and that the exit condition must be taken to prevent UB, we can conclude that it must be taken before self-wrap and thus infer said flag. The motivation here is simple loops with unsigned induction variables w/non-one steps and inequality tests. A toy example would be: for (unsigned i = 0; i != N; i += 2) { body; } If body contains no side effects, and this is a mustprogress function, we can assume that this must be a finite loop and thus that the exit count is N/2. Differential Revision: https://reviews.llvm.org/D103991	2021-11-18 10:07:44 -08:00
Philip Reames	100df68496	[SCEV] Add test coverage for invertible functions of IVs	2021-11-18 08:56:45 -08:00


3190 Commits