llvm-project

Commit Graph

Author	SHA1	Message	Date
Clement Courbet	ff41fc07b1	Revert "[AA] Teach BasicAA to recognize basic GEP range information." We have found a miscompile with this change, reverting while working on a reproducer. This reverts commit `455b60ccfb`.	2021-10-06 16:49:10 +02:00
Simon Pilgrim	0776924a17	[CostModel][X86] getCmpSelInstrCost - treat BAD_PREDICATEs the same as the worst case cost predicates for ICMP/FCMP instructions As suggested on D111024, we should treat getCmpSelInstrCost calls without a specific predicate as matching the worst case predicate cost. These regressions will be addressed with a mixture of D111024 and fixing other specific getCmpSelInstrCost calls to have realistic predicates.	2021-10-06 10:14:56 +01:00
Philip Reames	e64ed3c8df	[test] autogen a couple of additional tests	2021-10-05 18:58:08 -07:00
Philip Reames	c59c32caa0	[test] factor out reliance on noundef return value	2021-10-05 14:45:48 -07:00
Philip Reames	5020e104a1	[test] rework recently added SCEV tests These are meant to check a future patch which recurses through operands of SCEVs, but because all SCEVs are trivially bounded by function entry, we need to arrange the trivial scope not to be valid. (i.e. we specifically need a lower defining scope)	2021-10-05 14:42:53 -07:00
Philip Reames	94c1c56cc5	[tests] Cover cases we could infer SCEV flags, but don't	2021-10-05 13:16:16 -07:00
Roman Lebedev	f92961d238	[NFC] Fixup newly-added costmodel tests to actually test what they should	2021-10-05 21:35:47 +03:00
Roman Lebedev	200edc152b	[NFC][X86][LV] Add basic costmodel test coverage for not-fully-interleaved i32 loads The coverage could have cumulative explosion here, so i'm adding only the most basic cases, and hoping it's enough, though more can be added if needed.	2021-10-05 19:39:50 +03:00
Roman Lebedev	3f9b235482	[X86][Costmodel] Load/store i64/f64 Stride=6 VF=8 interleaving costs The only sched models that for cpu's that support avx2 but not avx512 are: haswell, broadwell, skylake, zen1-3 For load we have: https://godbolt.org/z/1jfGddcre - for intels `Block RThroughput: =36.0`; for ryzens, `Block RThroughput: =12.0` So could pick cost of `36`. For store we have: https://godbolt.org/z/ao9srMT8r - for intels `Block RThroughput: =30.0`; for ryzens, `Block RThroughput: =12.0` So we could pick cost of `30`. I'm directly using the shuffling asm the llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D111094	2021-10-05 16:58:58 +03:00
Roman Lebedev	e2784c5d8c	[X86][Costmodel] Load/store i64/f64 Stride=6 VF=4 interleaving costs The only sched models that for cpu's that support avx2 but not avx512 are: haswell, broadwell, skylake, zen1-3 For load we have: https://godbolt.org/z/rc8jYxW6M - for intels `Block RThroughput: =18.0`; for ryzens, `Block RThroughput: =6.0` So could pick cost of `18`. For store we have: https://godbolt.org/z/9PhPEr65G - for intels `Block RThroughput: =15.0`; for ryzens, `Block RThroughput: =6.0` So we could pick cost of `15`. I'm directly using the shuffling asm the llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D111093	2021-10-05 16:58:58 +03:00
Roman Lebedev	3960693048	[X86][Costmodel] Load/store i64/f64 Stride=6 VF=2 interleaving costs The only sched models that for cpu's that support avx2 but not avx512 are: haswell, broadwell, skylake, zen1-3 For load we have: https://godbolt.org/z/onese7rec - for intels `Block RThroughput: =6.0`; for ryzens, `Block RThroughput: =3.0` So could pick cost of `6`. For store we have: https://godbolt.org/z/bMd7dddnT - for intels `Block RThroughput: =8.0`; for ryzens, `Block RThroughput: <=6.0` So we could pick cost of `8`. I'm directly using the shuffling asm the llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D111092	2021-10-05 16:58:58 +03:00
Roman Lebedev	79d6d12d95	[X86][Costmodel] Load/store i32/f32 Stride=6 VF=16 interleaving costs This one required quite a bit of assembly surgery, but i think it's in the right ballpark. The only sched models that for cpu's that support avx2 but not avx512 are: haswell, broadwell, skylake, zen1-3 For load we have: https://godbolt.org/z/na97Kb96o - for intels `Block RThroughput: <=64.0`; for ryzens, `Block RThroughput: <=32.0` So could pick cost of `64`. For store we have: https://godbolt.org/z/GG1WeoKar - for intels `Block RThroughput: =66.0`; for ryzens, `Block RThroughput: <=27.5` So we could pick cost of `66`. I'm directly using the shuffling asm the llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D111091	2021-10-05 16:58:58 +03:00
Roman Lebedev	2996a2b50f	[X86][Costmodel] Load/store i32/f32 Stride=6 VF=8 interleaving costs The only sched models that for cpu's that support avx2 but not avx512 are: haswell, broadwell, skylake, zen1-3 For load we have: https://godbolt.org/z/jK85GWKaK - for intels `Block RThroughput: =31.0`; for ryzens, `Block RThroughput: <=17.0` So could pick cost of `31`. For store we have: https://godbolt.org/z/hPWWhEEf9 - for intels `Block RThroughput: =33.0`; for ryzens, `Block RThroughput: <=13.8` So we could pick cost of `33`. I'm directly using the shuffling asm the llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D111089	2021-10-05 16:58:57 +03:00
Roman Lebedev	d51532d8aa	[X86][Costmodel] Load/store i32/f32 Stride=6 VF=4 interleaving costs The only sched models that for cpu's that support avx2 but not avx512 are: haswell, broadwell, skylake, zen1-3 For load we have: https://godbolt.org/z/szEj1ceee - for intels `Block RThroughput: =15.0`; for ryzens, `Block RThroughput: <=8.8` So could pick cost of `15`. For store we have: https://godbolt.org/z/81bq4fTo1 - for intels `Block RThroughput: =12.0`; for ryzens, `Block RThroughput: <=10.0` So we could pick cost of `12`. I'm directly using the shuffling asm the llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D111087	2021-10-05 16:58:57 +03:00
Roman Lebedev	764fd5f463	[X86][Costmodel] Load/store i32/f32 Stride=6 VF=2 interleaving costs The only sched models that for cpu's that support avx2 but not avx512 are: haswell, broadwell, skylake, zen1-3 For load we have: https://godbolt.org/z/aec96Thee - for intels `Block RThroughput: =6.0`; for ryzens, `Block RThroughput: <=3.3` So could pick cost of `6`. For store we have: https://godbolt.org/z/aec96Thee - for intels `Block RThroughput: =9.0`; for ryzens, `Block RThroughput: <=3.0` So we could pick cost of `9`. I'm directly using the shuffling asm the llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D111083	2021-10-05 16:58:57 +03:00
Roman Lebedev	c800119c46	[X86][Costmodel] Load/store i64/f64 Stride=4 VF=8 interleaving costs The only sched models that for cpu's that support avx2 but not avx512 are: haswell, broadwell, skylake, zen1-3 For load we have: https://godbolt.org/z/3M3hbq7n8 - for intels `Block RThroughput: =20.0`; for ryzens, `Block RThroughput: =8.0` So could pick cost of `20`. For store we have: https://godbolt.org/z/zvnPYWTx7 - for intels `Block RThroughput: =20.0`; for ryzens, `Block RThroughput: =8.0` So we could pick cost of `20`. I'm directly using the shuffling asm the llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D111076	2021-10-05 16:58:57 +03:00
Roman Lebedev	000ce0bfd5	[X86][Costmodel] Load/store i64/f64 Stride=4 VF=4 interleaving costs The only sched models that for cpu's that support avx2 but not avx512 are: haswell, broadwell, skylake, zen1-3 For load we have: https://godbolt.org/z/MTKdzjvnr - for intels `Block RThroughput: =8.0`; for ryzens, `Block RThroughput: <=4.0` So could pick cost of `8`. For store we have: https://godbolt.org/z/cMYEvqoah - for intels `Block RThroughput: =8.0`; for ryzens, `Block RThroughput: <=4.0` So we could pick cost of `8`. I'm directly using the shuffling asm the llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D111075	2021-10-05 16:58:57 +03:00
Roman Lebedev	dcc2b0d933	[X86][Costmodel] Load/store i64/f64 Stride=4 VF=2 interleaving costs The only sched models that for cpu's that support avx2 but not avx512 are: haswell, broadwell, skylake, zen1-3 For load we have: https://godbolt.org/z/z197317d1 - for intels `Block RThroughput: =6.0`; for ryzens, `Block RThroughput: =2.0` So could pick cost of `6`. For store we have: https://godbolt.org/z/8dzszjf9q - for intels `Block RThroughput: =6.0`; for ryzens, `Block RThroughput: <=4.0` So we could pick cost of `6`. I'm directly using the shuffling asm the llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D111073	2021-10-05 16:58:57 +03:00
Roman Lebedev	7d91037fd2	[X86][Costmodel] Load/store i32/f32 Stride=4 VF=16 interleaving costs This one required quite a bit of assembly surgery, but the trend continues, so i think this is right. The only sched models that for cpu's that support avx2 but not avx512 are: haswell, broadwell, skylake, zen1-3 For load we have: https://godbolt.org/z/EKWdj8cKT - for intels `Block RThroughput: <=32.0`; for ryzens, `Block RThroughput: <=24.0` So could pick cost of `32`. For store we have: https://godbolt.org/z/zj4bb9P75 - for intels `Block RThroughput: =32.0`; for ryzens, `Block RThroughput: <=16.0` So we could pick cost of `32`. I'm directly using the shuffling asm the llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D111064	2021-10-05 16:58:57 +03:00
Roman Lebedev	4aee1e5b93	[X86][Costmodel] Load/store i32/f32 Stride=4 VF=8 interleaving costs The only sched models that for cpu's that support avx2 but not avx512 are: haswell, broadwell, skylake, zen1-3 For load we have: https://godbolt.org/z/a6rxMG6ec - for intels `Block RThroughput: =16.0`; for ryzens, `Block RThroughput: <=12.0` So could pick cost of `16`. For store we have: https://godbolt.org/z/ced1bdqc9 - for intels `Block RThroughput: =16.0`; for ryzens, `Block RThroughput: <=8.0` So we could pick cost of `16`. I'm directly using the shuffling asm the llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D111063	2021-10-05 16:58:57 +03:00
Roman Lebedev	3c2e22b795	[X86][Costmodel] Load/store i32/f32 Stride=4 VF=4 interleaving costs The only sched models that for cpu's that support avx2 but not avx512 are: haswell, broadwell, skylake, zen1-3 For load we have: https://godbolt.org/z/avq1oz98W - for intels `Block RThroughput: =8.0`; for ryzens, `Block RThroughput: =4.0` So could pick cost of `8`. For store we have: https://godbolt.org/z/89PGMc1qs - for intels `Block RThroughput: =6.0`; for ryzens, `Block RThroughput: <=6.0` So we could pick cost of `6`. I'm directly using the shuffling asm the llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D111061	2021-10-05 16:58:57 +03:00
Roman Lebedev	b6234c1edf	[X86][Costmodel] Load/store i32/f32 Stride=4 VF=2 interleaving costs Finally, we are getting to the heavy-hitter stuff! The only sched models that for cpu's that support avx2 but not avx512 are: haswell, broadwell, skylake, zen1-3 For load we have: https://godbolt.org/z/7crGWoar6 - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: <=2.0` So could pick cost of `4`. For store we have: https://godbolt.org/z/T8aq3MszM - for intels `Block RThroughput: =5.0`; for ryzens, `Block RThroughput: <=2.0` So we could pick cost of `5`. I'm directly using the shuffling asm the llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D111060	2021-10-05 16:58:56 +03:00
Nikita Popov	30001af84e	[BasicAA] Ignore CanBeFreed in minimal extent reasoning When determining NoAlias based on object size and dereferenceability information, we can ignore frees for the same reason we can ignore possible null pointers (if null is not a valid pointer): Actually accessing the null pointer / freed pointer would be immediate UB, and AA results are only valid under the assumption of an access. This addresses a minor regression from D110745. Differential Revision: https://reviews.llvm.org/D111028	2021-10-04 22:08:57 +02:00
Roman Lebedev	dee4d699b2	[NFC][X86][LV] Add costmodel test coverage for interleaved i64/f64 load/store stride=6	2021-10-04 20:57:35 +03:00
Roman Lebedev	c4dd0fe4b3	[NFC][X86][LV] Add costmodel test coverage for interleaved i32/f32 load/store stride=6	2021-10-04 20:57:35 +03:00
Roman Lebedev	b8c7d5229c	[NFC][X86][LV] Add costmodel test coverage for interleaved i64/f64 load/store stride=4	2021-10-04 17:31:57 +03:00
Roman Lebedev	f38cbd7859	[NFC][X86][LV] Add costmodel test coverage for interleaved i32/f32 load/store stride=4	2021-10-04 17:31:57 +03:00
Roman Lebedev	cef0a693b6	[X86][Costmodel] Load/store i64/f64 Stride=3 VF=16 interleaving costs This required a huge amount of assembly surgery, but i think this is about right. The only sched models that for cpu's that support avx2 but not avx512 are: haswell, broadwell, skylake, zen1-3 For load we have: https://godbolt.org/z/z11crMEcj - for intels `Block RThroughput: =20.0`; for ryzens, `Block RThroughput: <=18.0` So could pick cost of `20`. For store we have: https://godbolt.org/z/eqT4ze3j4 - for intels `Block RThroughput: =24.0`; for ryzens, `Block RThroughput: <=16.0` So we could pick cost of `24`. I'm directly using the shuffling asm the llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D111031	2021-10-04 14:35:17 +03:00
Roman Lebedev	ede0611e79	[X86][Costmodel] Load/store i64/f64 Stride=3 VF=8 interleaving costs This one required quite a bit of assembly surgery. The only sched models that for cpu's that support avx2 but not avx512 are: haswell, broadwell, skylake, zen1-3 For load we have: https://godbolt.org/z/oYWv4cTnK - for intels `Block RThroughput: =10.0`; for ryzens, `Block RThroughput: <=8.0` So pick cost of `10`. For store we have: https://godbolt.org/z/33GMhrsG9 - for intels `Block RThroughput: =12.0`; for ryzens, `Block RThroughput: <=8.0` So pick cost of `12`. I'm directly using the shuffling asm the llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D111027	2021-10-04 14:35:01 +03:00
Roman Lebedev	eb9a694c17	[X86][Costmodel] Load/store i64/f64 Stride=3 VF=4 interleaving costs This one required quite a bit of assembly surgery. The only sched models that for cpu's that support avx2 but not avx512 are: haswell, broadwell, skylake, zen1-3 For load we have: https://godbolt.org/z/Tce3osvcz - for intels `Block RThroughput: =5.0`; for ryzens, `Block RThroughput: <=4.0` So pick cost of `5`. For store we have: https://godbolt.org/z/oc3arEcnE - for intels `Block RThroughput: =6.0`; for ryzens, `Block RThroughput: <=4.0` So pick cost of `6`. I'm directly using the shuffling asm the llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D111026	2021-10-04 14:34:47 +03:00
Roman Lebedev	d3bbe781ea	[X86][Costmodel] Load/store i64/f64 Stride=3 VF=2 interleaving costs The only sched models that for cpu's that support avx2 but not avx512 are: haswell, broadwell, skylake, zen1-3 For load we have: https://godbolt.org/z/sz5qdKnr4 - for intels `Block RThroughput: =1.0`; for ryzens, `Block RThroughput: <=1.0` So pick cost of `1`. For store we have: https://godbolt.org/z/Kzdjff63v - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: <=3.0` So pick cost of `4`. I'm directly using the shuffling asm the llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D111025	2021-10-04 14:34:33 +03:00
Roman Lebedev	4ca5bc07af	[X86][Costmodel] Load/store i32/f32 Stride=3 VF=16 interleaving costs The only sched models that for cpu's that support avx2 but not avx512 are: haswell, broadwell, skylake, zen1-3 For load we have: https://godbolt.org/z/5fqrh4qqo - for intels `Block RThroughput: =14.0`; for ryzens, `Block RThroughput: <=12.0` So pick cost of `14`. For store we have: https://godbolt.org/z/5fqrh4qqo - for intels `Block RThroughput: =22.0`; for ryzens, `Block RThroughput: <=16.0` So pick cost of `22`. I'm directly using the shuffling asm the llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D111022	2021-10-04 14:34:19 +03:00
Roman Lebedev	198aa84973	[X86][Costmodel] Load/store i32/f32 Stride=3 VF=8 interleaving costs The only sched models that for cpu's that support avx2 but not avx512 are: haswell, broadwell, skylake, zen1-3 For load we have: https://godbolt.org/z/zdz5Ga6fs - for intels `Block RThroughput: =7.0`; for ryzens, `Block RThroughput: <=6.0` So pick cost of `7`. For store we have: https://godbolt.org/z/qn71513ac - for intels `Block RThroughput: =11.0`; for ryzens, `Block RThroughput: <=8.0` So pick cost of `11`. I'm directly using the shuffling asm the llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D111021	2021-10-04 14:34:05 +03:00
Roman Lebedev	a93411c3af	[X86][Costmodel] Load/store i32/f32 Stride=3 VF=4 interleaving costs The only sched models that for cpu's that support avx2 but not avx512 are: haswell, broadwell, skylake, zen1-3 For load we have: https://godbolt.org/z/d8PdhEszo - for intels `Block RThroughput: =3.0`; for ryzens, `Block RThroughput: <=3.0` So pick cost of `3`. For store we have: https://godbolt.org/z/WojonfG5n - for intels `Block RThroughput: =5.0`; for ryzens, `Block RThroughput: <=3.0` So pick cost of `5`. I'm directly using the shuffling asm the llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D111020	2021-10-04 14:34:03 +03:00
Roman Lebedev	3e93fcdfc8	[X86][Costmodel] Load/store i32/f32 Stride=3 VF=2 interleaving costs The only sched models that for cpu's that support avx2 but not avx512 are: haswell, broadwell, skylake, zen1-3 For load we have: https://godbolt.org/z/z8qa14bs3 - for intels `Block RThroughput: =3.0`; for ryzens, `Block RThroughput: =1.5` So pick cost of `3`. For store we have: https://godbolt.org/z/GYGajoc4K - for intels `Block RThroughput: <=4.0`; for ryzens, `Block RThroughput: <=2.0` So pick cost of `4`. I'm directly using the shuffling asm the llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D111019	2021-10-04 14:31:50 +03:00
Philip Reames	35ab211c37	[SCEV] Use trivial bound on defining scope of all SCEVs when computing flags This addresses a comment from review on D109845. Even for SCEVs for which we can't find true bounds without recursing through operands, entry to the function forms a trivial upper bound. In some cases, this trivial bound is enough to prove safety of flag inference.	2021-10-03 16:01:30 -07:00
Philip Reames	d02db32644	[SCEV] Use full logic when inferring flags on add and gep This is a follow-on to D109845. With that landed, we will have fixed all known instances of pr51817, and can thus start inferring flags more aggressively with greatly reduced risk of miscompiles. This patch simply applies the same inference logic used in that patch to our other major flag inference path. We can still do much better here (on both paths), but this is our first step. Differential Revision: https://reviews.llvm.org/D111003	2021-10-03 15:32:15 -07:00
Philip Reames	f39978b84f	[SCEV] Correctly propagate nowrap flags across scopes when folding invariant add through addrec This fixes a violation of the wrap flag rules introduced in `c4048d8f`. This is an alternate fix to D106852. The basic problem being fixed is that we infer a set of flags which is valid at some inner scope S1 (usually by correctly propagating them from IR), and then (incorrectly) extend them to a SCEV in scope S2 where S1 != S2. This is not in general safe per the wrap flags semantics recently defined. In this patch, I include a simple inference step to handle the case where we can prove that S2 is the preheader of the loop S1, and that entry into S2 implies execution of S1. See the code for a more detailed explanation. One worry I have with this patch is that I might be over-fitting what shows up in tests - and thus hiding negative impact we'd see in the real world. My best defense is that the rule used here very closely follows the one used to propagate the flags from IR to the inner add to start with, and thus if one is reasonable, so probably is the other. Curious what others think about that piece. The test diffs are roughly as expected. Mostly analysis only, with two transform changes. Oddly, the result looks better in the loop-idiom test, and I don't understand the PPC output enough to tell. Nothing terrible looking though. (For context, without the scope inference peephole, the test delta includes a couple of vectorization tests. Again, not super concerning, but slightly more so.) Differential Revision: https://reviews.llvm.org/D109845	2021-10-03 15:19:33 -07:00
Roman Lebedev	67f1ee2e38	[X86][Costmodel] Load/store i16 Stride=3 VF=32 interleaving costs The only sched models that for cpu's that support avx2 but not avx512 are: haswell, broadwell, skylake, zen1-3 For load we have: https://godbolt.org/z/rMaYr67hz - for intels `Block RThroughput: =56.0`; for ryzens, `Block RThroughput: <=17.8` So pick cost of `56`. For store we have: https://godbolt.org/z/eMsbKqnvv - for intels `Block RThroughput: <=54.0`; for ryzens, `Block RThroughput: <=15.0` So pick cost of `54`. I'm directly using the shuffling asm the llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D111018	2021-10-03 23:40:35 +03:00
Roman Lebedev	3cbc0a07f9	[X86][Costmodel] Load/store i16 Stride=3 VF=16 interleaving costs The only sched models that for cpu's that support avx2 but not avx512 are: haswell, broadwell, skylake, zen1-3 For load we have: https://godbolt.org/z/1T6MMzeh3 - for intels `Block RThroughput: =28.0`; for ryzens, `Block RThroughput: <=8.5` So pick cost of `28`. For store we have: https://godbolt.org/z/1T6MMzeh3 - for intels `Block RThroughput: <=27.0`; for ryzens, `Block RThroughput: <=7.0` So pick cost of `27`. I'm directly using the shuffling asm the llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D111017	2021-10-03 23:40:21 +03:00
Roman Lebedev	72f8a9244a	[X86][Costmodel] Load/store i16 Stride=3 VF=8 interleaving costs The only sched models that for cpu's that support avx2 but not avx512 are: haswell, broadwell, skylake, zen1-3 For load we have: https://godbolt.org/z/Mh9MnnT8W - for intels `Block RThroughput: =9.0`; for ryzens, `Block RThroughput: <=2.3` So pick cost of `9`. For store we have: https://godbolt.org/z/Mh9MnnT8W - for intels `Block RThroughput: <=12.0`; for ryzens, `Block RThroughput: <=3.3` So pick cost of `12`. I'm directly using the shuffling asm the llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D111016	2021-10-03 23:40:05 +03:00
Roman Lebedev	04f1469cb4	[X86][Costmodel] Load/store i16 Stride=3 VF=4 interleaving costs The only sched models that for cpu's that support avx2 but not avx512 are: haswell, broadwell, skylake, zen1-3 For load we have: https://godbolt.org/z/sP4j1173f - for intels `Block RThroughput: =7.0`; for ryzens, `Block RThroughput: <=3.0` So pick cost of `7`. For store we have: https://godbolt.org/z/sP4j1173f - for intels `Block RThroughput: =6.0`; for ryzens, `Block RThroughput: <=2.0` So pick cost of `6`. I'm directly using the shuffling asm the llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D111015	2021-10-03 23:39:51 +03:00
Roman Lebedev	8e8fb77aa4	[X86][Costmodel] Load/store i16 Stride=3 VF=2 interleaving costs The only sched models that for cpu's that support avx2 but not avx512 are: haswell, broadwell, skylake, zen1-3 For load we have: https://godbolt.org/z/xnE988aej - for intels `Block RThroughput: =5.0`; for ryzens, `Block RThroughput: <=2.5` So pick cost of `5`. For store we have: https://godbolt.org/z/rMGT31Tnh - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: <=2.0` So pick cost of `4`. I'm directly using the shuffling asm the llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D111014	2021-10-03 23:39:36 +03:00
Roman Lebedev	a5e5883ef5	[X86][Costmodel] Load/store i8 Stride=6 VF=32 interleaving costs The only sched models that for cpu's that support avx2 but not avx512 are: haswell, broadwell, skylake, zen1-3 For load we have: https://godbolt.org/z/c1jjKqP7b - for intels `Block RThroughput: <=82.0`; for ryzens, `Block RThroughput: <=26.0` So pick cost of `82`. For store we have: https://godbolt.org/z/YM4ErY8x7 - for intels `Block RThroughput: <=90.0`; for ryzens, `Block RThroughput: <=25.5` So pick cost of `90`. I'm directly using the shuffling asm the llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D111013	2021-10-03 23:39:22 +03:00
Roman Lebedev	bd5ba437fd	[X86][Costmodel] Load/store i8 Stride=6 VF=16 interleaving costs The only sched models that for cpu's that support avx2 but not avx512 are: haswell, broadwell, skylake, zen1-3 For load we have: https://godbolt.org/z/Gz8hhqfTM - for intels `Block RThroughput: <=43.0`; for ryzens, `Block RThroughput: <=14.0` So pick cost of `43`. For store we have: https://godbolt.org/z/9vrdssYa8 - for intels `Block RThroughput: <=27.0`; for ryzens, `Block RThroughput: <=12.0` So pick cost of `27`. I'm directly using the shuffling asm the llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D111012	2021-10-03 23:39:08 +03:00
Roman Lebedev	0b27f9c088	[X86][Costmodel] Load/store i8 Stride=6 VF=8 interleaving costs The only sched models that for cpu's that support avx2 but not avx512 are: haswell, broadwell, skylake, zen1-3 For load we have: https://godbolt.org/z/v98qPTTf6 - for intels `Block RThroughput: =18.0`; for ryzens, `Block RThroughput: =6.0` So pick cost of `18`. For store we have: https://godbolt.org/z/rn5T9E8q6 - for intels `Block RThroughput: <=16.0`; for ryzens, `Block RThroughput: <=4.5` So pick cost of `16`. I'm directly using the shuffling asm the llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D111011	2021-10-03 23:38:54 +03:00
Roman Lebedev	6fe4cce558	[X86][Costmodel] Load/store i8 Stride=6 VF=4 interleaving costs The only sched models that for cpu's that support avx2 but not avx512 are: haswell, broadwell, skylake, zen1-3 For load we have: https://godbolt.org/z/4sWhs396o - for intels `Block RThroughput: =14.0`; for ryzens, `Block RThroughput: <=7.0` So pick cost of `14`. For store we have: https://godbolt.org/z/4sWhs396o - for intels `Block RThroughput: =9.0`; for ryzens, `Block RThroughput: <=3.0` So pick cost of `9`. I'm directly using the shuffling asm the llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D111010	2021-10-03 23:38:40 +03:00
Roman Lebedev	396b95e5c9	[X86][Costmodel] Load/store i8 Stride=6 VF=2 interleaving costs The only sched models that for cpu's that support avx2 but not avx512 are: haswell, broadwell, skylake, zen1-3 For load we have: https://godbolt.org/z/jvj6jzns5 - for intels `Block RThroughput: =6.0`; for ryzens, `Block RThroughput: <=3.0` So pick cost of `6`. For store we have: https://godbolt.org/z/ros7eebMP - for intels `Block RThroughput: =7.0`; for ryzens, `Block RThroughput: <=3.0` So pick cost of `7`. I'm directly using the shuffling asm the llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D111008	2021-10-03 23:38:10 +03:00
Roman Lebedev	025ce15435	[NFC][X86][LV] Add costmodel test coverage for interleaved i64/f64 load/store stride=3	2021-10-03 17:52:11 +03:00
Roman Lebedev	f3c6c76cfd	[NFC][X86][LV] Add costmodel test coverage for interleaved i32/f32 load/store stride=3	2021-10-03 16:49:51 +03:00
Roman Lebedev	e311cdd18d	[NFC][X86][LV] Add costmodel test coverage for interleaved i8 load/store stride=6	2021-10-03 14:33:59 +03:00
Roman Lebedev	acb459574a	[X86][Costmodel] Load/store i8 Stride=4 VF=32 interleaving costs While we already model this tuple, the load cost is divergent from reality, so fix it. The only sched models that for cpu's that support avx2 but not avx512 are: haswell, broadwell, skylake, zen1-3 For load we have: https://godbolt.org/z/zWMhhnPYa - for intels `Block RThroughput: =56.0`; for ryzens, `Block RThroughput: <=24.0` So pick cost of `56`. For store we have: https://godbolt.org/z/vnqqjWx51 - for intels `Block RThroughput: =12.0`; for ryzens, `Block RThroughput: <=4.0` So pick cost of `12`. I'm directly using the shuffling asm the llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D110971	2021-10-02 13:40:21 +03:00
Roman Lebedev	0e71ae6da8	[X86][Costmodel] Load/store i8 Stride=4 VF=16 interleaving costs While we already model this tuple, the values are divergent from reality, so fix them. The only sched models that for cpu's that support avx2 but not avx512 are: haswell, broadwell, skylake, zen1-3 For load we have: https://godbolt.org/z/TrGW7cKsE - for intels `Block RThroughput: =24.0`; for ryzens, `Block RThroughput: <=12.0` So pick cost of `24`. For store we have: https://godbolt.org/z/Mh7qaqEfe - for intels `Block RThroughput: =8.0`; for ryzens, `Block RThroughput: <=4.0` So pick cost of `8`. I'm directly using the shuffling asm the llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D110970	2021-10-02 13:40:21 +03:00
Roman Lebedev	74e4a0e327	[X86][Costmodel] Load/store i8 Stride=4 VF=8 interleaving costs While we already model this tuple, the values are divergent from reality, so fix them. The only sched models that for cpu's that support avx2 but not avx512 are: haswell, broadwell, skylake, zen1-3 For load we have: https://godbolt.org/z/v7746Wcf7 - for intels `Block RThroughput: =12.0`; for ryzens, `Block RThroughput: <=6.0` So pick cost of `12`. For store we have: https://godbolt.org/z/aEeEohEbP - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: <=2.0` So pick cost of `4`. I'm directly using the shuffling asm the llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D110969	2021-10-02 13:40:20 +03:00
Roman Lebedev	ae08362cb8	[X86][Costmodel] Load/store i8 Stride=4 VF=4 interleaving costs While we already model this tuple, the store cost is divergent from reality, so fix it. The only sched models that for cpu's that support avx2 but not avx512 are: haswell, broadwell, skylake, zen1-3 For load we have: https://godbolt.org/z/1n4bPh7Tn - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: <=2.0` So pick cost of `4`. For store we have: https://godbolt.org/z/r8K9sveqo - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: <=2.0` So pick cost of `4`. I'm directly using the shuffling asm the llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D110968	2021-10-02 13:40:20 +03:00
Roman Lebedev	935b9693ae	[X86][Costmodel] Load/store i8 Stride=4 VF=2 interleaving costs While we already model this tuple, the values are divergent from reality, so fix them. The only sched models that for cpu's that support avx2 but not avx512 are: haswell, broadwell, skylake, zen1-3 For load we have: https://godbolt.org/z/KP6nn36zs - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: <=2.0` So pick cost of `4`. For store we have: https://godbolt.org/z/ov95zhrq6 - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: <=2.0` So pick cost of `4`. I'm directly using the shuffling asm the llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D110966	2021-10-02 13:40:20 +03:00
Roman Lebedev	448c939839	[X86][Costmodel] Load/store i8 Stride=3 VF=32 interleaving costs For VF=16, costs are correct. For VF=32, load cost is divergent. The only sched models that for cpu's that support avx2 but not avx512 are: haswell, broadwell, skylake, zen1-3 For load we have: https://godbolt.org/z/qKjevqf4W - for intels `Block RThroughput: <=14.0`; for ryzens, `Block RThroughput: <=4.5` So pick cost of `14`. For store we have: https://godbolt.org/z/xTssTq319 - for intels `Block RThroughput: =13.0`; for ryzens, `Block RThroughput: <=5.5` So pick cost of `13`. I'm directly using the shuffling asm the llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D110961	2021-10-02 13:39:15 +03:00
Roman Lebedev	d1460c88a6	[X86][Costmodel] Load/store i8 Stride=3 VF=8 interleaving costs While we already model this tuple, the values are divergent from reality, so fix them. The only sched models that for cpu's that support avx2 but not avx512 are: haswell, broadwell, skylake, zen1-3 For load we have: https://godbolt.org/z/1jeocxj55 - for intels `Block RThroughput: =6.0`; for ryzens, `Block RThroughput: <=3.0` So pick cost of `6`. For store we have: https://godbolt.org/z/fr7xfa3K5 - for intels `Block RThroughput: =6.0`; for ryzens, `Block RThroughput: <=2.0` So pick cost of `6`. I'm directly using the shuffling asm the llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D110960	2021-10-02 13:39:15 +03:00
Roman Lebedev	f1df2d8eaf	[X86][Costmodel] Load/store i8 Stride=3 VF=4 interleaving costs While we already model this tuple, the values are divergent from reality, so fix them. The only sched models that for cpu's that support avx2 but not avx512 are: haswell, broadwell, skylake, zen1-3 For load we have: https://godbolt.org/z/obWz3PrfK - for intels `Block RThroughput: =3.0`; for ryzens, `Block RThroughput: <=1.5` So pick cost of `3`. For store we have: https://godbolt.org/z/orjPshn3h - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: <=2.0` So pick cost of `4`. I'm directly using the shuffling asm the llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D110958	2021-10-02 13:39:10 +03:00
Roman Lebedev	8a3c64c3a2	[X86][Costmodel] Load/store i8 Stride=3 VF=2 interleaving costs While we already model this tuple, the values are divergent from reality, so fix them. The only sched models that for cpu's that support avx2 but not avx512 are: haswell, broadwell, skylake, zen1-3 For load we have: https://godbolt.org/z/WYscYMcW4 - for intels `Block RThroughput: =3.0`; for ryzens, `Block RThroughput: <=1.5` So pick cost of `3`. For store we have: https://godbolt.org/z/e9qvYdbbs - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: <=2.0` So pick cost of `4`. I'm directly using the shuffling asm the llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D110956	2021-10-02 13:39:05 +03:00
Philip Reames	91dfc0840d	[test] add coverage for a SCEVUnknown scoped value in isSCEVExprNeverPoison Note that a couple of the "negative" tests also end up showing miscompiles due to D109845 which is not yet fixed.	2021-10-01 16:39:23 -07:00
Philip Reames	2ca8a3f213	[SCEV] Stop blindly propagating flags from inbound geps to SCEV nodes This fixes a violation of the wrap flag rules introduced in `c4048d8f`. This was also noted in the (very old) PR23527. The issue being fixed is that we assume the inbounds flag on any GEP implies that all users of any gep (or add) which happens to map to that SCEV would also be UB if the (other) gep overflowed. That's simply not true. In terms of the test diffs, I don't see anything seriously problematic. The lost flags are expected (given the semantic restriction on when it's legal to tag the SCEV), and there are several cases where the previously inferred flags are unsound per the new semantics. The only common trend I noticed when looking at the deltas is that by not considering branch on poison as immediate UB in ValueTracking, we do miss a few cases we could reclaim. We may be able to claw some of these back with the follow-up ideas mentioned in PR51817. It's worth noting that most of the changes are analysis result only changes. The two transform changes are pretty minimal. In one case, we miss the opportunity to infer a nuw (correctly). In the other, we fail to fold an exit and produce a loop invariant form instead. This one is probably over-reduced as the program appears to be undefined in practice, and neither before nor after exploits that. Differential Revision: https://reviews.llvm.org/D109789	2021-10-01 16:30:44 -07:00
Philip Reames	24cde2f602	[SCEV] Remove invariant requirement from isSCEVExprNeverPoison This code is attempting to prove that I must execute if we enter the defining scope of the SCEV which will be created from I. In the case where it found a defining addrec scope, it had a rather odd restriction that all of the other operands must be loop invariant in that addrec's loop. As near as I can tell here, we really only need an upper bound on the defining scope. If we can prove the stronger property, then we must also have proven the property on the exact defining scope as well. In practice, the actual effect of this change is narrow. The compile time restriction at the top of the routine basically limits us to I being an arithmetic instruction in some loop L with both an addrec operand in L and an unknown operand in L. Possible to demonstrate, but the main value of the change is removing unneeded code. Differential Revision: https://reviews.llvm.org/D110892	2021-10-01 15:57:37 -07:00
Philip Reames	d0bca006bb	[test] split flags-from-poison.ll to allow ease of autogen update	2021-10-01 15:35:09 -07:00
Nikita Popov	b084b98abe	[BasicAA] Make test more robust (NFC) When taking into account the fact that GEP indices are truncated to 32-bits in this test, the "path dependence" goes away, so inferring MustAlias for all pointers would be correct. As this goes against the spirit of the test, change it to extend from i16 instead.	2021-10-01 22:57:01 +02:00
Nikita Popov	b7ff048915	[BasicAA] Add additional truncation tests (NFC) These show that the known bits and non-zero heuristics are incorrect when truncation is involved.	2021-10-01 22:57:01 +02:00
Roman Lebedev	53d7bdbfbf	[NFC][X86][LV] Improve costmodel test coverage for interleaved i8 load/store stride=4	2021-10-01 22:49:06 +03:00
Nikita Popov	04a6f80e9b	[BasicAA] Add additional 32-bit truncation test (NFC) This is a variant with a variable index, in which case the pointer size adjustment is not performed.	2021-10-01 21:20:59 +02:00
Roman Lebedev	727a359979	[NFC][X86][LV] Improve costmodel test coverage for interleaved i8 load/store stride=3	2021-10-01 18:47:25 +03:00
Roman Lebedev	3e260efdfc	[X86][Costmodel] Load/store i64/f64 Stride=2 VF=16 interleaving costs The only sched models that for cpu's that support avx2 but not avx512 are: haswell, broadwell, skylake, zen1-3 For load we have: https://godbolt.org/z/1WMTojvfW - for intels `Block RThroughput: =16.0`; for ryzens, `Block RThroughput: <=8.0` So pick cost of `16`. For store we have: https://godbolt.org/z/1WMTojvfW - for intels `Block RThroughput: =16.0`; for ryzens, `Block RThroughput: <=16.0` So pick cost of `16`. I'm directly using the shuffling asm the llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D110840	2021-10-01 17:48:14 +03:00
Roman Lebedev	abd37de63e	[X86][Costmodel] Load/store i64/f64 Stride=2 VF=8 interleaving costs The only sched models that for cpu's that support avx2 but not avx512 are: haswell, broadwell, skylake, zen1-3 For load we have: https://godbolt.org/z/PGYbYKPq8 - for intels `Block RThroughput: =8.0`; for ryzens, `Block RThroughput: <=4.0` So pick cost of `8`. For store we have: https://godbolt.org/z/PGYbYKPq8 - for intels `Block RThroughput: =8.0`; for ryzens, `Block RThroughput: <=8.0` So pick cost of `8`. I'm directly using the shuffling asm the llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D110838	2021-10-01 17:48:14 +03:00
Roman Lebedev	71bc31b907	[X86][Costmodel] Load/store i64/f64 Stride=2 VF=4 interleaving costs The only sched models that for cpu's that support avx2 but not avx512 are: haswell, broadwell, skylake, zen1-3 For load we have: https://godbolt.org/z/j5co1qWEW - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: <=2.0` So pick cost of `4`. For store we have: https://godbolt.org/z/j5co1qWEW - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: <=4.0` So pick cost of `4`. I'm directly using the shuffling asm the llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D110837	2021-10-01 17:48:14 +03:00
Roman Lebedev	612e5b05a2	[X86][Costmodel] Load/store i64/f64 Stride=2 VF=2 interleaving costs The only sched models that for cpu's that support avx2 but not avx512 are: haswell, broadwell, skylake, zen1-3 For load we have: https://godbolt.org/z/8a1cfGeMn - for intels `Block RThroughput: =2.0`; for ryzens, `Block RThroughput: =1.0` So pick cost of `2`. For store we have: https://godbolt.org/z/jMdcM47bx - for intels `Block RThroughput: =2.0`; for ryzens, `Block RThroughput: <=2.0` So pick cost of `2`. I'm directly using the shuffling asm the llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D110835	2021-10-01 17:48:14 +03:00
Roman Lebedev	ea76cb87ee	[X86][Costmodel] Load/store i32/f32 Stride=2 VF=32 interleaving costs The only sched models that for cpu's that support avx2 but not avx512 are: haswell, broadwell, skylake, zen1-3 Here for `store` pattern we are starting to have spilling, so accurate modelling may be problematic, although if i drop the spilling, the measurements don't change. For load we have: https://godbolt.org/z/1oTTnncbx - for intels `Block RThroughput: =16.0`; for ryzens, `Block RThroughput: <=8.0` So pick cost of `16`. For store we have: https://godbolt.org/z/1oTTnncbx - for intels `Block RThroughput: =16.0`; for ryzens, `Block RThroughput: =8.0` So pick cost of `16`. I'm directly using the shuffling asm the llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D110761	2021-10-01 17:48:14 +03:00
Roman Lebedev	80cd8da78d	[X86][Costmodel] Load/store i32/f32 Stride=2 VF=16 interleaving costs The only sched models that for cpu's that support avx2 but not avx512 are: haswell, broadwell, skylake, zen1-3 For load we have: https://godbolt.org/z/M9eev3xe8 - for intels `Block RThroughput: =8.0`; for ryzens, `Block RThroughput: <=4.0` So pick cost of `8`. For store we have: https://godbolt.org/z/M9eev3xe8 - for intels `Block RThroughput: =8.0`; for ryzens, `Block RThroughput: =4.0` So pick cost of `8`. I'm directly using the shuffling asm the llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D110756	2021-10-01 17:48:14 +03:00
Roman Lebedev	3a0643e9c2	[X86][Costmodel] Load/store i32/f32 Stride=2 VF=8 interleaving costs The only sched models that for cpu's that support avx2 but not avx512 are: haswell, broadwell, skylake, zen1-3 For load we have: https://godbolt.org/z/n8aMKeo4E - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: <=2.0` So pick cost of `4`. For store we have: https://godbolt.org/z/n8aMKeo4E - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: =2.0` So pick cost of `4`. I'm directly using the shuffling asm the llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D110755	2021-10-01 17:48:13 +03:00
Roman Lebedev	b12aeaec9a	[X86][Costmodel] Load/store i32/f32 Stride=2 VF=4 interleaving costs The only sched models that for cpu's that support avx2 but not avx512 are: haswell, broadwell, skylake, zen1-3 For load we have: https://godbolt.org/z/EM5Ean7bd - for intels `Block RThroughput: =2.0`; for ryzens, `Block RThroughput: =1.0` So pick cost of `2`. For store we have: https://godbolt.org/z/EM5Ean7bd - for intels `Block RThroughput: =2.0`; for ryzens, `Block RThroughput: <=2.0` So pick cost of `2`. I'm directly using the shuffling asm the llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D110754	2021-10-01 17:48:13 +03:00
Roman Lebedev	f44d9009c2	[X86][Costmodel] Load/store i32/f32 Stride=2 VF=2 interleaving costs The only sched models that for cpu's that support avx2 but not avx512 are: haswell, broadwell, skylake, zen1-3 For load we have: https://godbolt.org/z/4rY96hnGT - for intels `Block RThroughput: =2.0`; for ryzens, `Block RThroughput: =1.0` So pick cost of `2`. For store we have: https://godbolt.org/z/vbo37Y3r9 - for intels `Block RThroughput: =1.0`; for ryzens, `Block RThroughput: =0.5` So pick cost of `1`. I'm directly using the shuffling asm the llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D110753	2021-10-01 17:48:13 +03:00
Florian Hahn	413b7ac6b5	[BasicAA] Add test showing 32 bit overflow issue for GEPs. This patch adds additional tests with i64 GEP indices for 32 bit pointers. @mustalias_overflow_in_32_bit_add_mul_gep highlights a case where BasicAA currently incorrectly determines noalias. Modeled in Alive2 for 32 bit pointers: https://alive2.llvm.org/ce/z/HHjQgb Modeled in Alive2 for 64 bit pointers: https://alive2.llvm.org/ce/z/DoWK2c	2021-10-01 11:37:56 +01:00
Philip Reames	bdb5aa65b1	[test] Add tests covering a missing opt in SCEV's isSCEVExprNeverPoison	2021-09-30 16:15:06 -07:00
Florian Hahn	1fbdbb5595	Revert "Recommit "[SCEV] Look through single value PHIs." (take 2)" This reverts commit `764d9aa979`. This patch exposed a few additional cases where SCEV expressions are not properly invalidated. See PR52024, PR52023.	2021-09-30 20:53:51 +01:00
Craig Topper	765348298c	[CostModel] Update default cost model for sadd/ssub overflow to match TargetLowering The expansion for these was updated in https://reviews.llvm.org/D47927 but the cost model was not adjusted. I believe the cost model was also incorrect for the old expansion. The expansion prior to D47927 used 3 icmps using LHS, RHS, and Result to calculate their signs. Then 2 icmps to compare the signs. Followed by an And. The previous cost model was using 3 icmps and 2 selects. Digging back through git blame, those 2 selects in the cost model used to be 2 icmps, but were changed in https://reviews.llvm.org/D90681. Differential Revision: https://reviews.llvm.org/D110739	2021-09-30 09:41:14 -07:00
Daniil Fukalov	cf362ff4ca	[NFC][AMDGPU] Improve cost model tests coverage.	2021-09-30 18:13:17 +03:00
Roman Lebedev	6be397eb35	[NFC][X86][LV] Add costmodel test coverage for interleaved i64/f64 load/store stride=2	2021-09-30 17:31:18 +03:00
Roman Lebedev	6776bcfeb6	[NFC][Costmodel][LV][X86] Add test coverage for f32 interleaved load/store stride=2	2021-09-30 14:29:35 +03:00
Clement Courbet	455b60ccfb	[AA] Teach BasicAA to recognize basic GEP range information. The information can be implicit (from `ValueTracking`) or explicit. This implements the backend part of the following RFC https://groups.google.com/g/llvm-dev/c/T9o51zB1JY. We still need to settle on how to best represent the information in the IR, but this is a separate discussion. Differential Revision: https://reviews.llvm.org/D109746	2021-09-30 08:29:32 +02:00
Roman Lebedev	52912fe7ae	[NFC][X86][LV] Add costmodel test coverage for interleaved i32 load/store stride=2	2021-09-29 22:16:59 +03:00
Daniil Fukalov	6a187f9a57	[NFC][AMDGPU] Add missing gfx90a test cases to fsub.ll.	2021-09-29 21:55:54 +03:00
Roman Lebedev	2d42a192e0	[X86][Costmodel] Load/store i8 Stride=2 VF=32 interleaving costs The only sched models that for cpu's that support avx2 but not avx512 are: haswell, broadwell, skylake, zen1-3 For load we have: https://godbolt.org/z/xz6x7c35P - for intels `Block RThroughput: =6.0`; for ryzens, `Block RThroughput: <=2.5` So pick cost of `6`. For store we have: https://godbolt.org/z/xz6x7c35P - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: <=2.0` So pick cost of `4`. I'm directly using the shuffling asm the llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D110709	2021-09-29 21:52:45 +03:00
Roman Lebedev	bac60c55e0	[X86][Costmodel] Load/store i8 Stride=2 VF=16 interleaving costs The only sched models that for cpu's that support avx2 but not avx512 are: haswell, broadwell, skylake, zen1-3 For load we have: https://godbolt.org/z/a9hv4z47v - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: =2.0` So pick cost of `4`. For store we have: https://godbolt.org/z/6GfPn1b79 - for intels `Block RThroughput: =3.0`; for ryzens, `Block RThroughput: <=2.0` So pick cost of `3`. I'm directly using the shuffling asm the llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D110708	2021-09-29 21:52:45 +03:00
Roman Lebedev	1962185671	[X86][Costmodel] Load/store i8 Stride=2 VF=8 interleaving costs The only sched models that for cpu's that support avx2 but not avx512 are: haswell, broadwell, skylake, zen1-3 Identical to VF=2. For load we have: https://godbolt.org/z/4TEbdzbMM - for intels `Block RThroughput: =2.0`; for ryzens, `Block RThroughput: <=1.0` So pick cost of `2`. For store we have: https://godbolt.org/z/MYfzGPf3Y - for intels `Block RThroughput: =1.0`; for ryzens, `Block RThroughput: <=0.5` So pick cost of `1`. I'm directly using the shuffling asm the llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D110705	2021-09-29 21:52:45 +03:00
Roman Lebedev	08face1f9a	[X86][Costmodel] Load/store i8 Stride=2 VF=4 interleaving costs The only sched models that for cpu's that support avx2 but not avx512 are: haswell, broadwell, skylake, zen1-3 Identical to VF=2. For load we have: https://godbolt.org/z/sGE41GYo7 - for intels `Block RThroughput: =2.0`; for ryzens, `Block RThroughput: <=1.0` So pick cost of `2`. For store we have: https://godbolt.org/z/ba5r3s9xa - for intels `Block RThroughput: =1.0`; for ryzens, `Block RThroughput: <=0.5` So pick cost of `1`. I'm directly using the shuffling asm the llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D110704	2021-09-29 21:52:45 +03:00
Roman Lebedev	7d52628eb0	[X86][Costmodel] Load/store i8 Stride=2 VF=2 interleaving costs The only sched models that for cpu's that support avx2 but not avx512 are: haswell, broadwell, skylake, zen1-3 For load we have: https://godbolt.org/z/caKqjr9hb - for intels `Block RThroughput: =2.0`; for ryzens, `Block RThroughput: <=1.0` So pick cost of `2`. For store we have: https://godbolt.org/z/6TTn3eKj8 - for intels `Block RThroughput: =1.0`; for ryzens, `Block RThroughput: <=0.5` So pick cost of `1`. I'm directly using the shuffling asm the llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D110702	2021-09-29 21:52:44 +03:00
Simon Pilgrim	17f1fc1e54	[TTI] BasicTTI::getInterleavedMemoryOpCost(): use getScalarizationOverhead() getScalarizationOverhead() results in a somewhat better cost estimation than counting the insertion/extraction costs directly. Notably, this is still overestimating the costs. Original Patch by: @lebedev.ri (Roman Lebedev) Differential Revision: https://reviews.llvm.org/D110713	2021-09-29 16:41:53 +01:00
Roman Lebedev	c13b4b6b0d	[NFC][X86][LV] Add costmodel test coverage for interleaved i8 load/store stride=2	2021-09-29 15:28:05 +03:00
Roman Lebedev	ff05e25a84	[NFC][X86][LV] Add some test coverage for [un]masked gather/scatter While we did have test coverage for the intrinsics, i don't believe there was LV-based test coverage.	2021-09-29 14:28:49 +03:00
Simon Pilgrim	bddc04bc4c	[CostModel][X86] Add SSE2/AVX1/AVX512BW test coverage for i16 interleaved load/store	2021-09-28 18:00:56 +01:00
Roman Lebedev	b6b7860954	[X86][Costmodel] Load/store i16 Stride=6 VF=16 interleaving costs The only sched models that for cpu's that support avx2 but not avx512 are: haswell, broadwell, skylake, zen1-3 For this tuple, measuring becomes problematic since there's a lot of spilling going on, but apparently all these memory ops do not affect worst-case estimate at all here. For load we have: https://godbolt.org/z/5qGb9odP6 - for intels `Block RThroughput: <=106.0`; for ryzens, `Block RThroughput: <=34.8` So pick cost of `106`. For store we have: https://godbolt.org/z/KrWcv4Ph7 - for intels `Block RThroughput: =58.0`; for ryzens, `Block RThroughput: <=20.5` So pick cost of `58`. I'm directly using the shuffling asm the llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D110593	2021-09-28 19:15:08 +03:00
Roman Lebedev	24e42f7d28	[X86][Costmodel] Load/store i16 Stride=6 VF=8 interleaving costs The only sched models that for cpu's that support avx2 but not avx512 are: haswell, broadwell, skylake, zen1-3 For load we have: https://godbolt.org/z/3Tc5s897j - for intels `Block RThroughput: =39.0`; for ryzens, `Block RThroughput: <=13.5` So pick cost of `39`. For store we have: https://godbolt.org/z/fo1h9E67e - for intels `Block RThroughput: =21.0`; for ryzens, `Block RThroughput: <=12.0` So pick cost of `21`. I'm directly using the shuffling asm the llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D110592	2021-09-28 19:15:07 +03:00
Roman Lebedev	b3011bcc78	[X86][Costmodel] Load/store i16 Stride=6 VF=4 interleaving costs The only sched models that for cpu's that support avx2 but not avx512 are: haswell, broadwell, skylake, zen1-3 For load we have: https://godbolt.org/z/1Wcaf9c7T - for intels `Block RThroughput: =9.0`; for ryzens, `Block RThroughput: <=4.5` So pick cost of `9`. For store we have: https://godbolt.org/z/1Wcaf9c7T - for intels `Block RThroughput: =15.0`; for ryzens, `Block RThroughput: <=6.0` So pick cost of `15`. I'm directly using the shuffling asm the llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D110591	2021-09-28 19:15:01 +03:00
Roman Lebedev	aa93c55889	[X86][Costmodel] Load/store i16 Stride=6 VF=2 interleaving costs The only sched models that for cpu's that support avx2 but not avx512 are: haswell, broadwell, skylake, zen1-3 For load we have: https://godbolt.org/z/bhscej4WM - for intels `Block RThroughput: =13.0`; for ryzens, `Block RThroughput: <=7.0` So pick cost of `13`. For store we have: https://godbolt.org/z/Yf4Pfnxbq - for intels `Block RThroughput: =10.0`; for ryzens, `Block RThroughput: <=3.5` So pick cost of `10`. I'm directly using the shuffling asm the llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D110590	2021-09-28 19:14:56 +03:00
Max Kazantsev	00be84f910	Recommit "[Test] Add more tests with cycled phis"	2021-09-28 19:36:47 +07:00
Max Kazantsev	a91145f75a	Revert "[Test] Add more tests with cycled phis" This reverts commit `7128a545b3`. Need to regenerate tests after rebase.	2021-09-28 19:32:26 +07:00
Max Kazantsev	7128a545b3	[Test] Add more tests with cycled phis	2021-09-28 19:04:12 +07:00
Florian Hahn	764d9aa979	Recommit "[SCEV] Look through single value PHIs." (take 2) This reverts commit `8fdac7cb7a`. The issue causing the revert has been fixed a while ago in `60b852092c`. Original message: Now that SCEVExpander can preserve LCSSA form, we do not have to worry about LCSSA form when trying to look through PHIs. SCEVExpander will take care of inserting LCSSA PHI nodes as required. This increases precision of the analysis in some cases. Reviewed By: mkazantsev, bmahjour Differential Revision: https://reviews.llvm.org/D71539	2021-09-28 10:32:17 +01:00
Roman Lebedev	2a7a768dad	[X86][Costmodel] Load/store i16 Stride=4 VF=32 interleaving costs The only sched models that for cpu's that support avx2 but not avx512 are: haswell, broadwell, skylake, zen1-3 For this tuple, measuring becomes problematic since there's a lot of spilling going on, but apparently all these memory ops do not affect worst-case estimate at all here. For load we have: https://godbolt.org/z/zP4hd8MT6 - for intels `Block RThroughput: =150.0`; for ryzens, `Block RThroughput: <=59` So pick cost of `150`. For store we have: https://godbolt.org/z/vKb8zTK8E - for intels `Block RThroughput: =64.0`; for ryzens, `Block RThroughput: <=24.0` So pick cost of `64`. I'm directly using the shuffling asm the llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D110548	2021-09-27 22:20:01 +03:00
Roman Lebedev	ee5a050e2e	[X86][Costmodel] Load/store i16 Stride=4 VF=16 interleaving costs The only sched models that for cpu's that support avx2 but not avx512 are: haswell, broadwell, skylake, zen1-3 For load we have: https://godbolt.org/z/Wd9cKab83 - for intels `Block RThroughput: =75.0`; for ryzens, `Block RThroughput: <=29.5` So pick cost of `75`. (note that `# 32-byte Reload` does not affect throughput there.) For store we have: https://godbolt.org/z/Wd9cKab83 - for intels `Block RThroughput: =32.0`; for ryzens, `Block RThroughput: <=12.0` So pick cost of `32`. I'm directly using the shuffling asm the llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D110543	2021-09-27 22:20:01 +03:00
Roman Lebedev	5615d6a6dd	[X86][Costmodel] Load/store i16 Stride=4 VF=8 interleaving costs The only sched models that for cpu's that support avx2 but not avx512 are: haswell, broadwell, skylake, zen1-3 For load we have: https://godbolt.org/z/dd8T5P471 - for intels `Block RThroughput: =33.0`; for ryzens, `Block RThroughput: <=14.5` So pick cost of `33`. For store we have: https://godbolt.org/z/zPxcKWhn4 - for intels `Block RThroughput: =10.0`; for ryzens, `Block RThroughput: <=6.0` So pick cost of `10`. I'm directly using the shuffling asm the llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D110541	2021-09-27 22:20:01 +03:00
Roman Lebedev	df2b42d12e	[X86][Costmodel] Load/store i16 Stride=4 VF=4 interleaving costs The only sched models that for cpu's that support avx2 but not avx512 are: haswell, broadwell, skylake, zen1-3 For load we have: https://godbolt.org/z/rnsf639Wh - for intels `Block RThroughput: =17.0`; for ryzens, `Block RThroughput: <=7.5` So pick cost of `17`. For store we have: https://godbolt.org/z/565KKrcY6 - for intels `Block RThroughput: =6.0`; for ryzens, `Block RThroughput: =2.0` So pick cost of `6`. I'm directly using the shuffling asm the llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D110537	2021-09-27 22:20:01 +03:00
Roman Lebedev	45caac91c4	[X86][Costmodel] Load/store i16 Stride=4 VF=2 interleaving costs The only sched models that for cpu's that support avx2 but not avx512 are: haswell, broadwell, skylake, zen1-3 For load we have: https://godbolt.org/z/5EYc6r9nh - for intels `Block RThroughput: =6.0`; for ryzens, `Block RThroughput: <=3.0` So pick cost of `6`. For store we have: https://godbolt.org/z/z61e5d6GE - for intels `Block RThroughput: =2.0`; for ryzens, `Block RThroughput: <=1.0` So pick cost of `2`. I'm directly using the shuffling asm the llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D110536	2021-09-27 22:20:01 +03:00
Daniil Fukalov	1f73f0c19d	[NFC][AMDGPU] Update cost model tests: 1. Convert to generated tests. 2. Added code-size case in few places.	2021-09-27 19:26:02 +03:00
Roman Lebedev	7424deb743	[X86][Costmodel] Load/store i16 Stride=2 VF=32 interleaving costs The only sched models that for cpu's that support avx2 but not avx512 are: haswell, broadwell, skylake, zen1-3 For load we have: https://godbolt.org/z/q6GbK89br - for intels `Block RThroughput: =18.0`; for ryzens, `Block RThroughput: <=7.0` So pick cost of `18`. For store we have: https://godbolt.org/z/Yzfoo5TnW - for intels `Block RThroughput: =8.0`; for ryzens, `Block RThroughput: <=4.0` So pick cost of `8`. I'm directly using the shuffling asm the llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D110507	2021-09-27 14:21:12 +03:00
Roman Lebedev	a5113e9445	[X86][Costmodel] Load/store i16 Stride=2 VF=16 interleaving costs The only sched models that for cpu's that support avx2 but not avx512 are: haswell, broadwell, skylake, zen1-3 For load we have: https://godbolt.org/z/Y1E7qnjz8 - for intels `Block RThroughput: =9.0`; for ryzens, `Block RThroughput: <=3.5` So pick cost of `9`. For store we have: https://godbolt.org/z/Y1E7qnjz8 - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: <=2.0` So pick cost of `4`. I'm directly using the shuffling asm the llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D110506	2021-09-27 14:20:11 +03:00
Roman Lebedev	70c90cc5bd	[X86][Costmodel] Load/store i16 Stride=2 VF=8 interleaving costs The only sched models that for cpu's that support avx2 but not avx512 are: haswell, broadwell, skylake, zen1-3 For load we have: https://godbolt.org/z/e5YE99a4P - for intels `Block RThroughput: =6.0`; for ryzens, `Block RThroughput: =2.0` So pick cost of `6`. For store we have: https://godbolt.org/z/3vM4KsE1n - for intels `Block RThroughput: =3.0`; for ryzens, `Block RThroughput: <=2.0` So pick cost of `3`. I'm directly using the shuffling asm the llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D110505	2021-09-27 14:18:29 +03:00
Roman Lebedev	49e532aa52	[X86][Costmodel] Load/store i16 Stride=2 VF=4 interleaving costs The only sched models that for cpu's that support avx2 but not avx512 are: haswell, broadwell, skylake, zen1-3 For load we have: https://godbolt.org/z/1j3nf3dro - for intels `Block RThroughput: =2.0`; for ryzens, `Block RThroughput: <=1.0` So pick cost of `2`. For store we have: https://godbolt.org/z/4n1zvP37j - for intels `Block RThroughput: =1.0`; for ryzens, `Block RThroughput: <=0.5` So pick cost of `1`. I'm directly using the shuffling asm the llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D110504	2021-09-27 14:15:25 +03:00
Max Kazantsev	4992220ea7	[Test] Regenerate test checks with autogen script	2021-09-27 16:55:59 +07:00
Max Kazantsev	0bd9162fd7	[Test] Add test showing that SCEV cannot properly infer ranges of cycled phis	2021-09-27 15:01:43 +07:00
Roman Lebedev	d9413f46b3	[X86][Costmodel] Load/store i16 VF=2 interleaving costs The only sched models that for cpu's that support avx2 but not avx512 are: haswell, broadwell, skylake, zen1-3 For load we have: https://godbolt.org/z/M8vEKs5jY - for intels `Block RThroughput: =2.0`; for ryzens, `Block RThroughput: <=1.0` So pick cost of `2`. For store we have: https://godbolt.org/z/Kx1nKz7je - for intels `Block RThroughput: =1.0`; for ryzens, `Block RThroughput: <=0.5` So pick cost of `1`. I'm directly using the shuffling asm the llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D103144	2021-09-26 19:13:23 +03:00
Simon Pilgrim	3538ee763d	[CostModel][X86] Improve AVX1/AVX2 v16i32->v16i16/v16i8 truncation costs (PR51972) Based off worst case btver2 (AVX1) and haswell (AVX2) llvm-mca reports	2021-09-26 13:43:46 +01:00
Simon Pilgrim	8c83bd3bd4	[CostModel][X86] Adjust vXi32 multiply costs if it can be performed using PMADDWD Update the costs to match the codegen from combineMulToPMADDWD - not only can we use PMADDWD is its zero-extended, but also if its a constant or sign-extended from a vXi16 (which can be replaced with a zero-extension).	2021-09-25 16:28:48 +01:00
Daniil Fukalov	4f28a2eb03	[NFC] Refactor tests to improve readability.	2021-09-24 01:57:30 +03:00
Simon Pilgrim	c931d35216	[CostModel][X86] Increase i64 mul cost from 1 to 2 Only the most recent cpus support really 1cy 64-bit multiplies, and the X64 cost table represents a realistic worst case. The 1cy value was also discouraging vectorization when most vXi64 PMULDQ expansions aren't actually slower than scalarization. Noticed while investigating PR51436.	2021-09-23 14:48:21 +01:00
Florian Mayer	36daf074d9	[hwasan] also omit safe mem[cpy\|mov\|set]. Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D109816	2021-09-22 11:08:27 +01:00
Antonio Frighetto	43d6991c2a	[IR] Look through bitcast in hasFnAttribute() A logic incompleteness may lead MemorySSA to be too conservative in its results. Specifically, when dealing with a call of kind `call i32 bitcast (i1 (i1)* @test to i32 (i32)*)(i32 %1)`, where the function `test` is declared with readonly attribute, the bitcast is not looked through, obscuring function attributes. Hence, some methods of CallBase (e.g., doesNotReadMemory) could provide suboptimal results. Differential Revision: https://reviews.llvm.org/D109888	2021-09-21 21:57:02 +02:00
David Spickett	92c9b28347	Revert "[AArch64][SVE] Teach cost model that masked loads/stores are cheap" This reverts commit `734708e04f`. Due to build failures on the 2 stage SVE VLS bot. https://lab.llvm.org/buildbot/#/builders/176/builds/908/steps/11/logs/stdio	2021-09-20 08:45:18 +00:00
Nikita Popov	80110aafa0	[Tests] Fix incorrect noalias metadata Mostly this fixes cases where !noalias or !alias.scope were passed a scope rather than a scope list. In some cases I opted to drop the metadata entirely instead, because it is not really relevant to the test.	2021-09-18 20:51:00 +02:00
Philip Reames	df7c2bcf4e	precommit tests for D109457	2021-09-16 12:43:22 -07:00
Philip Reames	f79ce5875f	autogen a SCEV test for ease of update	2021-09-16 12:19:30 -07:00
Max Kazantsev	e4da0f9657	[Test] Add test showing missing opportunity in range inference for SCEV	2021-09-16 15:40:56 +07:00
Philip Reames	248e430f37	precommit test for D109845/D106852	2021-09-15 12:53:55 -07:00
Philip Reames	9bdb19cca2	[SCEV] (udiv X, Y) * Y is always NUW Motivated by the removal done in D109782. This implements the correct flag part generically. Differential Revision: https://reviews.llvm.org/D109786	2021-09-15 11:34:50 -07:00
Philip Reames	a92f11b682	switch a couple of SCEV tests to autogen for ease of update	2021-09-15 11:11:07 -07:00
Simon Pilgrim	0767e43d87	[CostModel][X86] Adjust bitreverse/ctpop/ctlz/cttz AVX2+ costs based on llvm-mca reports Based off the worse case numbers generated by D103695, the AVX2/512 bit reversing/counting costs were higher than necessary (based off instruction counts instead of actual throughput).	2021-09-15 13:04:40 +01:00
Philip Reames	baff4b4105	[test] precommit another test for D109786	2021-09-14 15:31:44 -07:00
Philip Reames	162aed4824	[test] precommit test for D109786	2021-09-14 15:28:26 -07:00
Philip Reames	336291e777	autogen a test for ease of update in later patch	2021-09-14 14:48:47 -07:00
Florian Hahn	e248d69036	Recommit "[LAA] Support pointer phis in loop by analyzing each incoming pointer." SCEV does not look through non-header PHIs inside the loop. Such phis can be analyzed by adding separate accesses for each incoming pointer value. This results in 2 more loops vectorized in SPEC2000/186.crafty and avoids regressions when sinking instructions before vectorizing. Fixes PR50296, PR50288. Reviewed By: Meinersbur Differential Revision: https://reviews.llvm.org/D102266	2021-09-14 11:19:12 +01:00
Florian Mayer	5b5d774f5d	[hwasan] Respect returns attribute when tracking values. Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D109233	2021-09-13 20:52:24 +01:00
Florian Hahn	4c84a0f24c	[LAA] Add additional pointer phi tests.	2021-09-13 10:05:31 +01:00
Florian Mayer	57335b6e2e	[stack-safety] Allow to determine safe accesses. Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D109503	2021-09-10 19:23:54 +01:00
Philip Reames	eede4846a9	[SCEV] Allow negative steps for LT exit count computation for unsigned comparisons This bit of code is incredibly suspicious. It allows fully unknown (but potentially negative) steps, but not steps known to be negative. The comment about scev flag inference is worrying, but also not correct to my knowledge. At best, this might be covering up some related miscompile. However, there's no test in tree for it, the review history doesn't include obvious motivation, and the C++ example doesn't appear to give wrong results when hand translated to IR. I think it's time to remove this and see what falls out. During review, there were concerns raised about the correctness of the corresponding signed case. This change was deliberately narrowed to the unsigned case which has been audited and appears correct for negative values. We need to get back to the known-negative signed case, but that'll be a future patch if nothing falls out from this one. Differential Revision: https://reviews.llvm.org/D104140	2021-09-09 14:09:29 -07:00
Eli Friedman	8f792707c4	[ScalarEvolution] Fix pointer/int confusion in howManyLessThans. In general, howManyLessThans doesn't really want to work with pointers at all; the result is an integer, and the operands of the icmp are effectively integers. However, isLoopEntryGuardedByCond doesn't like extra ptrtoint casts, so the arguments to isLoopEntryGuardedByCond need to be computed without those casts. Somehow, the values got mixed up with the recent howManyLessThans improvements; fix the confused values, and add a better comment to explain what's happening. Differential Revision: https://reviews.llvm.org/D109465	2021-09-09 12:38:33 -07:00
Eli Friedman	0375734439	[NFC] Add extra test for D106331	2021-09-08 14:18:47 -07:00
Michael Kruse	088577a38e	[Delinearization] Require byte offset to be zero. Users of delinearization assume that the offset into the array element is zero. In most cases it will indeed be zero, but if it is not, the delinearization has to fail since it violates that assumption without the API even allowing to signal to the caller that the byte offset is non-zero. This bug caused Polly to miscompile blender (526.blender_r from SPEC CPU 2017) in -polly-process-unprofitable mode. The SCEV expression incorrectly delinearized has been reduced in the test case byte_offset.ll. The dropped offset into the array element of size 4 (a float) is ((sext i32 %mul7.i4534 to i64) + {(sext i32 %i1 to i64),+,((sext i32 (1 + ((1 + %shl.i.i) * (1 + %shl.i.i)) + %shl.i.i) to i64) * (sext i32 %i1 to i64))}<%for.body703>). This significant component was just dropped, and the wrong pointer was computed when regenerating code from the remaining delinearized subscripts. This occurred during blender's subsurface scattering implementation. As a result, blender's rendering diverged from the reference image. Patch D108885 would also fix the API. Reviewed By: bmahjour Differential Revision: https://reviews.llvm.org/D109133	2021-09-08 16:02:37 -05:00
Arthur Eubanks	b493124ae2	[MemorySSA] Support invariant.group metadata The implementation is mostly copied from MemDepAnalysis. We want to look at all loads and stores to the same pointer operand. Bitcasts and zero GEPs of a pointer are considered the same pointer value. We choose the most dominating instruction. Since updating MemorySSA with invariant.group is non-trivial, for now handling of invariant.group is not cached in any way, so it's part of the walker. The number of loads/stores with invariant.group is small for now anyway. We can revisit if this actually noticeably affects compile times. To avoid invariant.group affecting optimized uses, we need to have optimizeUsesInBlock() not use invariant.group in any way. Co-authored-by: Piotr Padlewski <prazek@google.com> Reviewed By: asbirlea, nikic, Prazek Differential Revision: https://reviews.llvm.org/D109134	2021-09-08 13:06:12 -07:00
Philip Reames	6cdca906c7	[SCEV] Use no-self-wrap flags infered from exit structure to compute trip count The basic problem being solved is that we largely give up when encountering a trip count involving an IV which is not an addrec. We will fall back to the brute force constant eval, but that doesn't have the information about the fact that we can't cycle back through the same set of values. There's a high level design question of whether this is the right place to handle this, and if not, where that place is. The major alternative here would be to return a conservative upper bound, and then rely on two invocations of indvars to add the facts to the narrow IV, and then reconstruct SCEV. (I have not implemented the alternative and am not 100% sure this would work out.) That's arguably more in line with existing code, but I find this substantially easier to reason about. During review, no one expressed a strong opinion, so we went with this one. Differential Revision: D108651	2021-09-07 17:00:02 -07:00
David Sherwood	5dcf4b4fe0	[SVE][NFC] Add SVE cost model tests for gathers/scatters We previously didn't have any tests to defend the cost model for gathers and scatters using SVE without a vscale_range attribute. I've added tests to existing files: Analysis/CostModel/AArch64/sve-gather.ll Analysis/CostModel/AArch64/sve-scatter.ll Differential Revision: https://reviews.llvm.org/D109055	2021-09-07 14:13:37 +01:00
Nikita Popov	8d54c8a0c3	[SCEV] Fix applyLoopGuards() with range check idiom (PR51760) Due to a typo, this replaced %x with umax(C1, umin(C2, %x + C3)) rather than umax(C1, umin(C2, %x)). This didn't make a difference for the existing tests, because the result is only used for range calculation, and %x will usually have an unknown starting range, and the additional offset keeps it unknown. However, if %x already has a known range, we may compute a result range that is too small.	2021-09-06 22:22:41 +02:00
Andrew Litteken	bd4b1b5f6d	[IRSim] Adding support for recognizing branch similarity The current IRSimilarityIdentifier does not try to find similarity across blocks, this patch provides a mechanism to compare two branches against one another, to find similarity across basic blocks, rather than just within them. This adds a step in the similarity identification process that labels all of the basic blocks so that we can identify the relative branching locations. Within an IRSimilarityCandidate we use these relative locations to determine whether if the branching to other relative locations in the same region is the same between branches. If they are, we consider them similar. We do not consider the relative location of the branch if the target branch is outside of the region. In this case, both branches must exit to a location outside the region, but the exact relative location does not matter. Reviewers: paquette, yroux Differential Revision: https://reviews.llvm.org/D106989	2021-09-06 11:55:38 -07:00
Simon Pilgrim	f114ef3731	[CostModel][X86] Add generic costs for vXi32 MUL -> v2Xi16 PMADDDW folds Based off the improved fold in D108522 This should eventually allow us to replace the SLM only cost patterns with generic versions.	2021-09-05 16:08:11 +01:00
Simon Pilgrim	9962ebaee5	[CostModel][X86] Add vXi32 multiply pattern tests Add tests for vXi32 multiplies where the operands have been extended from vXi8/vXi16	2021-09-05 16:08:11 +01:00
Arthur Eubanks	bd020bbbd2	[test] Cleanup tests with -enable-new-pm in llvm/test/Analysis	2021-09-04 16:06:10 -07:00
Arthur Eubanks	d896f22fda	[test] Cleanup legacy PM tests in llvm/test/Analyis/ScalarEvolution	2021-09-04 15:57:30 -07:00
Arthur Eubanks	813a7f1ad7	[MemorySSA] Properly handle liveOnEntry in the walker printer Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D109177	2021-09-02 12:51:27 -07:00
Arthur Eubanks	a270de359f	[test] Remove missed RUN line after D109040	2021-09-02 11:44:45 -07:00
Arthur Eubanks	50153213c8	[test][NewPM] Remove RUN lines using -analyze Only tests in llvm/test/Analysis. -analyze is legacy PM-specific. This only touches files with `-passes`. I looked through everything and made sure that everything had a new PM equivalent. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D109040	2021-09-02 11:38:14 -07:00
Nikita Popov	c86e1ce73b	[SCEVExpander] Simplify pointer overflow check This is a followup to D104662 to generate slightly nicer code for pointer overflow checks. Bypass expandAddToGEP and instead explicitly generate i8 GEPs. This saves some bitcasts and negates the value in a more obvious way. In particular, this prevents SCEV from looking through the umul.with.overflow, same as in the integer case. The wrapping-pointer-ni.ll test deserves a comment: Previously, this generated a typed GEP which used the umulo argument rather than the multiplication result. This results in more compact IR in that case, but effectively does the multiplication twice, the second one is just hidden in the GEP. Reusing the umulo result seems pretty reasonable to me. Differential Revision: https://reviews.llvm.org/D109093	2021-09-02 20:15:59 +02:00
Roman Lebedev	3f1f08f0ed	Revert @llvm.isnan intrinsic patchset. Please refer to https://lists.llvm.org/pipermail/llvm-dev/2021-September/152440.html (and that whole thread.) TLDR: the original patch had no prior RFC, yet it had some changes that really need a proper RFC discussion. It won't be productive to discuss such an RFC, once it's actually posted, while said patch is already committed, because that introduces bias towards already-committed stuff, and the tree is potentially in broken state meanwhile. While the end result of discussion may lead back to the current design, it may also not lead to the current design. Therefore i take it upon myself to revert the tree back to last known good state. This reverts commit `4c4093e6e3`. This reverts commit `0a2b1ba33a`. This reverts commit `d9873711cb`. This reverts commit `791006fb8c`. This reverts commit `c22b64ef66`. This reverts commit `72ebcd3198`. This reverts commit `5fa6039a5f`. This reverts commit `9efda541bf`. This reverts commit `94d3ff09cf`.	2021-09-02 13:53:56 +03:00
David Sherwood	d581d94385	[SVE] Fix the FP arithmetic instruction costs for SVE Several FP instructions (fadd, fsub, etc.) were incorrectly assigned a higher cost for SVE because they have custom lowering, however we know they are legal. This patch explicitly assigns a cost of 2 to these opcodes. Tests added here: Analysis/CostModel/AArch64/arith-fp-sve.ll Differential Revision: https://reviews.llvm.org/D108993	2021-09-02 09:55:13 +01:00
Arthur Eubanks	1c503e923a	[test] Precommit/fix up existing test for MemorySSA/invariant.group	2021-09-01 22:58:17 -07:00
Arthur Eubanks	7b08d9da55	Reland [MemorySSA] Add pass to print results of MemorySSA walker Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D109028	2021-09-01 18:58:57 -07:00
Arthur Eubanks	0f63496ea4	Revert "[MemorySSA] Add pass to print results of MemorySSA walker" This reverts commit `8f98477c2d`. Breaks bots	2021-09-01 18:45:19 -07:00
Arthur Eubanks	8f98477c2d	[MemorySSA] Add pass to print results of MemorySSA walker Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D109028	2021-09-01 18:29:15 -07:00
Philip Reames	29fa37ec9f	[SCEV] If max BTC is zero, then so is the exact BTC [2 of 2] This extends D108921 into a generic rule applied to constructing ExitLimits along all paths. The remaining paths (primarily howFarToZero) don't have the same reasoning about UB sensitivity as the howManyLessThan ones did. Instead, the remain cause for max counts being more precise than exact counts is that we apply context sensitive loop guards on the max path, and not on the exact path. That choice is mildly suspect, but out of scope of this patch. The MVETailPredication.cpp change deserves a bit of explanation. We were previously figuring out that two SCEVs happened to be equal because the happened to be identical. When we optimized one with context sensitive information, but not the other, we lost the ability to prove them equal. So, cover this case by subtracting and then applying loop guards again. Without this, we see changes in test/CodeGen/Thumb2/mve-blockplacement.ll Differential Revision: https://reviews.llvm.org/D109015	2021-09-01 11:51:48 -07:00
David Sherwood	f024a4818d	[NFC] Re-run update_analyze_test_checks on Analysis/CostModel/AArch64/sve-intrinsics.ll	2021-09-01 12:09:58 +01:00
David Sherwood	930d5077f4	Revert "[NFC] Re-run update_analyze_test_checks on Analysis/CostModel/AArch64/sve-intrinsics.ll" This reverts commit `aeb2bd68dc`.	2021-09-01 11:52:29 +01:00
David Sherwood	aeb2bd68dc	[NFC] Re-run update_analyze_test_checks on Analysis/CostModel/AArch64/sve-intrinsics.ll	2021-09-01 11:44:02 +01:00
Philip Reames	c49503a76d	[SCEV] Add a testcase for zero max btc with non-constant exact btc Reduced from the ArchiveCommandLine.ll case seen in D108848.	2021-08-31 11:00:41 -07:00
Philip Reames	6600e1759b	[SCEV] If max BTC is zero, then so is the exact BTC [1 of N] This patch is specifically the howManyLessThan case. There will be a couple of followon patches for other codepaths. The subtle bit is explaining why the two codepaths have a difference while both are correct. The test case with modifications is a good example, so let's discuss in terms of it. * The previous exact bounds for this example of (-126 + (126 smax %n))<nsw> can evaluate to either 0 or 1. Both are "correct" results, but only one of them results in a well defined loop. If %n were 127 (the only possible value producing a trip count of 1), then the loop must execute undefined behavior. As a result, we can ignore the TC computed when %n is 127. All other values produce 0. * The max taken count computation uses the limit (i.e. the maximum value END can be without resulting in UB) to restrict the bound computation. As a result, it returns 0 which is also correct. WARNING: The logic above only holds for a single exit loop. The current logic for max trip count would be incorrect for multiple exit loops, except that we never call computeMaxBECountForLT except when we can prove either a) no overflow occurs in this IV before exit, or b) this is the sole exit. An alternate approach here would be to add the limit logic to the symbolic path. I haven't played with this extensively, but I'm hesitant because a) the term is optional and b) I'm not sure it'll reliably simplify away. As such, the resulting code quality from expansion might actually get worse. This was noticed while trying to figure out why D108848 wasn't NFC, but is otherwise standalone. Differential Revision: https://reviews.llvm.org/D108921	2021-08-31 08:50:11 -07:00
Philip Reames	301fbf9b81	[SCEV] Clarify the overflow precondition of computeMaxBECountForLT [NFC] And add a test case to illustrate that we do in fact produce the right result for the multiple exit case. I have gotten myself confused at least three times when reading this code, so clarify to prevent future confusion.	2021-08-30 09:49:17 -07:00
Daniil Fukalov	5b3fad4966	[AMDGPU][CostModel] Update shuffle instruction tests. NFC. New tests ported over from test/Analysis/CostModel/AArch64/shuffle-other.ll.	2021-08-30 19:17:27 +03:00
Matthew Devereau	9b830c798e	[AArch64][SVE] Teach cost model masked gathers/scatters are cheap Tell the cost model to use the scalable calculation for non-neon fixed vector. This results in a cheaper cost for fixed-length SVE masked gathers/scatters allowing the vectorizor to emit them more frequently.	2021-08-26 11:17:47 +01:00
Philip Reames	4d235bf75d	[tests] Add a couple tests for intersection of `ec8d87e` and D108651	2021-08-24 14:29:36 -07:00
Philip Reames	ec8d87e9f5	[SCEV] Infer nuw from nw for addrecs This was previously committed in `914836b`, and reverted due to confusion on the status of the review. Differential Revision: https://reviews.llvm.org/D108601	2021-08-24 14:24:05 -07:00
Philip Reames	35b0b1a64a	[test] Precommit tests for D108651	2021-08-24 14:18:58 -07:00
Philip Reames	58582bae63	Revert "[SCEV] Infer nsw/nuw from nw for addrecs" This reverts commit `914836b1c8`. Further comments on review came up after initial approval. Reverting while addressing.	2021-08-24 09:28:37 -07:00
Philip Reames	914836b1c8	[SCEV] Infer nsw/nuw from nw for addrecs If we know an addrec doesn't self-wrap, the increment is strictly positive, and the start value is the smallest representable value, then we know that the corresponding wrap type can not occur. Differential Revision: https://reviews.llvm.org/D108601	2021-08-24 08:53:21 -07:00
Simon Pilgrim	9efda541bf	[CostModel][X86] Add costs for f32/f64 scalar and vector types. The f16 half types are still pretty useless as we don't have it as a legal type (we treat them as i16 most of the time)	2021-08-20 14:31:12 +01:00
Bjorn Pettersson	d52f506192	[NewPM] Use parameterized syntax for a couple of more passes A couple of passes that are parameterized in new-PM used different pass names (in cmd line interface) while using the same pass class name. This patch updates the PassRegistry to model pass parameters more properly using PASS_WITH_PARAMS. Reason for the change is to ensure that we have a 1-1 mapping between class name and pass name (when disregarding the params). With a 1-1 mapping it is more obvious which pass name to use in options such as -debug-only, -print-after etc. The opt -passes syntax is changed for the following passes: early-cse-memssa => early-cse<memssa> post-inline-ee-instrument => ee-instrument<post-inline> loop-extract-single => loop-extract<single> lower-matrix-intrinsics-minimal => lower-matrix-intrinsics<minimal> This patch is not updating pass names in docs/Passes.rst. Not quite sure what the status is for that document (e.g. when it comes to listing pass paramters). It is only loop-extract-single that is mentioned in Passes.rst today, out of the passes mentioned above. Differential Revision: https://reviews.llvm.org/D108362	2021-08-20 14:59:21 +02:00
Simon Pilgrim	72ebcd3198	[CostModel][X86] Add isnan half/float/double costs tests	2021-08-19 18:07:06 +01:00
Simon Pilgrim	9419729b6a	[CostModel][X86] Add VPOPCNTDQ/BITALG ctpop costs VPOPCNTDQ + BITALG add ctpop instructions for vXi64/vXi32 + vXi16/vXi8 vector types respectively	2021-08-19 15:40:09 +01:00
Simon Pilgrim	2d60fdd7aa	[CostModel][X86] Add VPOPCNT/BITALG test coverage for ctpop/cttz costs	2021-08-19 14:05:58 +01:00
Matthew Devereau	734708e04f	[AArch64][SVE] Teach cost model that masked loads/stores are cheap Reduce the cost of VLS masked loads/stores to make the vectorizor emit them more frequently.	2021-08-19 13:01:33 +01:00
Peter Collingbourne	6f85225ef3	StackLifetime: Remove asserts for multiple lifetime intrinsics. According to the langref, it is valid to have multiple consecutive lifetime start or end intrinsics on the same object. For llvm.lifetime.start: "If ptr [...] is a stack object that is already alive, it simply fills all bytes of the object with poison." For llvm.lifetime.end: "Calling llvm.lifetime.end on an already dead alloca is no-op." However, we currently fail an assertion in such cases. I've observed the assertion failure when the loop vectorization pass duplicates the intrinsic. We can conservatively handle these intrinsics by ignoring all but the first one, which can be implemented by removing the assertions. Differential Revision: https://reviews.llvm.org/D108337	2021-08-18 18:45:28 -07:00
Nikita Popov	3dd8c9176b	[LICM] Remove AST-based implementation MSSA-based LICM has been enabled by default for a few years now. This drops the old AST-based implementation. Using loop(licm) will result in a fatal error, the use of loop-mssa(licm) is required (or just licm, which defaults to loop-mssa). Note that the core canSinkOrHoistInst() logic has to retain AST support for now, because it is shared with LoopSink. Differential Revision: https://reviews.llvm.org/D108244	2021-08-18 20:21:53 +02:00
David Sherwood	219d4518fc	[Analysis][AArch64] Make fixed-width ordered reductions slightly more expensive For tight loops like this: float r = 0; for (int i = 0; i < n; i++) { r += a[i]; } it's better not to vectorise at -O3 using fixed-width ordered reductions on AArch64 targets. Although the resulting number of instructions in the generated code ends up being comparable to not vectorising at all, there may be additional costs on some CPUs, for example perhaps the scheduling is worse. It makes sense to deter vectorisation in tight loops. Differential Revision: https://reviews.llvm.org/D108292	2021-08-18 17:01:56 +01:00
Dylan Fleming	ef198cd99e	[SVE] Remove usage of getMaxVScale for AArch64, in favour of IR Attribute Removed AArch64 usage of the getMaxVScale interface, replacing it with the vscale_range(min, max) IR Attribute. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D106277	2021-08-17 14:42:47 +01:00
Nikita Popov	735a590471	[MemorySSA] Remove -enable-mssa-loop-dependency option This option has been enabled by default for quite a while now. The practical impact of removing the option is that MSSA use cannot be disabled in default pipelines (both LPM and NPM) and in manual LPM invocations. NPM can still choose to enable/disable MSSA using loop vs loop-mssa. The next step will be to require MSSA for LICM and drop the AST-based implementation entirely. Differential Revision: https://reviews.llvm.org/D108075	2021-08-16 20:59:37 +02:00
Nikita Popov	e11354c0a4	[Tests] Remove explicit -enable-mssa-loop-dependency options (NFC) This is enabled by default. Drop explicit uses in preparation for removing the option. Also drop RUN lines that are now the same (typically modulo a -verify-memoryssa option).	2021-08-14 21:21:07 +02:00
Florian Hahn	f999312872	Recommit "[Matrix] Overload stride arg in matrix.columnwise.load/store." This reverts the revert `28c04794df`. The failing MLIR test that caused the revert should be fixed in this version. Also includes a PPC test fix previously in `1f87c7c478`.	2021-08-12 18:31:57 +01:00
Florian Hahn	a72cd6353c	Revert "[Matrix] Update column.major.load call in PPC test." Dependent commit `a1ef81de35` has been reverted in `a1ef81de35`.	2021-08-12 13:13:52 +01:00
Florian Hahn	1f87c7c478	[Matrix] Update column.major.load call in PPC test. `a1ef81de35` adjusted the definition of the intrinsic, but did not update a PowerPC test. Fix the test by updating the call & declaration of @llvm.matrix.column.major.load.	2021-08-12 11:26:33 +01:00
Archibald Elliott	b764b1ef2f	[NFC][X86] New Test Requires Asserts D105263 introduced this new test. It fails when asserts are disabled, due to using a debug option on opt. Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D107805	2021-08-10 10:22:04 +01:00
Wang, Pengfei	6f7f5b54c8	[X86] AVX512FP16 instructions enabling 1/6 1. Enable FP16 type support and basic declarations used by following patches. 2. Enable new instructions VMOVW and VMOVSH. Ref.: https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D105263	2021-08-10 12:46:01 +08:00
Nikita Popov	88003cea1c	[MemCpyOpt] Remove MemDepAnalysis-based implementation The MemorySSA-based implementation has been enabled for a few months (since D94376). This patch drops the old MDA-based implementation entirely. I've kept this to only the basic cleanup of dropping various conditions -- the code could be further cleaned up now that there is only one implementation. Differential Revision: https://reviews.llvm.org/D102113	2021-08-07 22:35:44 +02:00
Zheng Chen	30b0c455b1	[LoopCacheAnalysis]: handle mismatch type for Numerator and CacheLineSize fix an assertion due to mismatch type for Numerator and CacheLineSize in loop cache analysis pass. Reviewed By: bmahjour Differential Revision: https://reviews.llvm.org/D107618	2021-08-06 16:51:09 +00:00
David Green	649cf4514d	[AArch64] Expand the SVE min/max reduction costs to NEON This takes the existing SVE costing for the various min/max reduction intrinsics and expands it to NEON, where I believe it applies equally well. In the process it changes the lowering to use min/max cost, as opposed to summing up the cost of ICmp+Select. Differential Revision: https://reviews.llvm.org/D106239	2021-08-05 23:23:24 +01:00
Bardia Mahjour	0e08891ec1	[DA] control compile-time spent by MIV tests Function exploreDirections() in DependenceAnalysis implements a recursive algorithm for refining direction vectors. This algorithm has worst-case complexity of O(3^(n+1)) where n is the number of common loop levels. In this patch I'm adding a threshold to control the amount of time we spend in doing MIV tests (which most of the time end up resulting in over pessimistic direction vectors anyway). Reviewed By: Meinersbur Differential Revision: https://reviews.llvm.org/D107159	2021-08-05 09:50:11 -04:00
Irina Dobrescu	b01417d3c5	[AArch64] Optimise min/max lowering in ISel Differential Revision: https://reviews.llvm.org/D106561	2021-08-02 13:40:21 +01:00
Sjoerd Meijer	46a861af3d	[CostModel][AArch64] Add some shuffle concat tests. NFC. Test ported over from test/Analysis/CostModel/ARM/shuffle.ll.	2021-08-02 12:11:00 +01:00

3190 Commits