llvm-project

Commit Graph

Author	SHA1	Message	Date
Piotr Sobczak	67b979466a	[InstCombine][AMDGPU] Simplify tbuffer loads Summary: Add missing tbuffer loads intrinsics in SimplifyDemandedVectorElts. Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66926 llvm-svn: 370475	2019-08-30 14:20:04 +00:00
Haojian Wu	ed170c9bf9	Remove an extra ";", NFC. llvm-svn: 370465	2019-08-30 12:09:31 +00:00
Hideto Ueno	6381b143f6	[Attributor] Implement AANoAliasCallSiteArgument initialization Summary: This patch adds an appropriate `initialize` method for `AANoAliasCallSiteArgument`. Reviewers: jdoerfert, sstefan1 Reviewed By: jdoerfert Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66927 llvm-svn: 370456	2019-08-30 10:00:32 +00:00
Roman Lebedev	5c9f3cfec7	[LoopIdiomRecognize] BCmp loop idiom recognition Summary: @mclow.lists brought up this issue up in IRC. It is a reasonably common problem to compare some two values for equality. Those may be just some integers, strings or arrays of integers. In C, there is `memcmp()`, `bcmp()` functions. In C++, there exists `std::equal()` algorithm. One can also write that function manually. libstdc++'s `std::equal()` is specialized to directly call `memcmp()` for various types, but not `std::byte` from C++2a. https://godbolt.org/z/mx2ejJ libc++ does not do anything like that, it simply relies on simple C++'s `operator==()`. https://godbolt.org/z/er0Zwf (GOOD!) So likely, there exists a certain performance opportunities. Let's compare performance of naive `std::equal()` (no `memcmp()`) with one that is using `memcmp()` (in this case, compiled with modified compiler). {F8768213} ``` #include <algorithm> #include <cmath> #include <cstdint> #include <iterator> #include <limits> #include <random> #include <type_traits> #include <utility> #include <vector> #include "benchmark/benchmark.h" template <class T> bool equal(T* a, T* a_end, T* b) noexcept { for (; a != a_end; ++a, ++b) { if (a != b) return false; } return true; } template <typename T> std::vector<T> getVectorOfRandomNumbers(size_t count) { std::random_device rd; std::mt19937 gen(rd()); std::uniform_int_distribution<T> dis(std::numeric_limits<T>::min(), std::numeric_limits<T>::max()); std::vector<T> v; v.reserve(count); std::generate_n(std::back_inserter(v), count, [&dis, &gen]() { return dis(gen); }); assert(v.size() == count); return v; } struct Identical { template <typename T> static std::pair<std::vector<T>, std::vector<T>> Gen(size_t count) { auto Tmp = getVectorOfRandomNumbers<T>(count); return std::make_pair(Tmp, std::move(Tmp)); } }; struct InequalHalfway { template <typename T> static std::pair<std::vector<T>, std::vector<T>> Gen(size_t count) { auto V0 = getVectorOfRandomNumbers<T>(count); auto V1 = V0; V1[V1.size() / size_t(2)]++; // just change the value. return std::make_pair(std::move(V0), std::move(V1)); } }; template <class T, class Gen> void BM_bcmp(benchmark::State& state) { const size_t Length = state.range(0); const std::pair<std::vector<T>, std::vector<T>> Data = Gen::template Gen<T>(Length); const std::vector<T>& a = Data.first; const std::vector<T>& b = Data.second; assert(a.size() == Length && b.size() == a.size()); benchmark::ClobberMemory(); benchmark::DoNotOptimize(a); benchmark::DoNotOptimize(a.data()); benchmark::DoNotOptimize(b); benchmark::DoNotOptimize(b.data()); for (auto _ : state) { const bool is_equal = equal(a.data(), a.data() + a.size(), b.data()); benchmark::DoNotOptimize(is_equal); } state.SetComplexityN(Length); state.counters["eltcnt"] = benchmark::Counter(Length, benchmark::Counter::kIsIterationInvariant); state.counters["eltcnt/sec"] = benchmark::Counter(Length, benchmark::Counter::kIsIterationInvariantRate); const size_t BytesRead = 2 * sizeof(T) * Length; state.counters["bytes_read/iteration"] = benchmark::Counter(BytesRead, benchmark::Counter::kDefaults, benchmark::Counter::OneK::kIs1024); state.counters["bytes_read/sec"] = benchmark::Counter( BytesRead, benchmark::Counter::kIsIterationInvariantRate, benchmark::Counter::OneK::kIs1024); } template <typename T> static void CustomArguments(benchmark::internal::Benchmark* b) { const size_t L2SizeBytes = []() { for (const benchmark::CPUInfo::CacheInfo& I : benchmark::CPUInfo::Get().caches) { if (I.level == 2) return I.size; } return 0; }(); // What is the largest range we can check to always fit within given L2 cache? const size_t MaxLen = L2SizeBytes / /total bufs/ 2 / /maximal elt size/ sizeof(T) / /safety margin/ 2; b->RangeMultiplier(2)->Range(1, MaxLen)->Complexity(benchmark::oN); } BENCHMARK_TEMPLATE(BM_bcmp, uint8_t, Identical) ->Apply(CustomArguments<uint8_t>); BENCHMARK_TEMPLATE(BM_bcmp, uint16_t, Identical) ->Apply(CustomArguments<uint16_t>); BENCHMARK_TEMPLATE(BM_bcmp, uint32_t, Identical) ->Apply(CustomArguments<uint32_t>); BENCHMARK_TEMPLATE(BM_bcmp, uint64_t, Identical) ->Apply(CustomArguments<uint64_t>); BENCHMARK_TEMPLATE(BM_bcmp, uint8_t, InequalHalfway) ->Apply(CustomArguments<uint8_t>); BENCHMARK_TEMPLATE(BM_bcmp, uint16_t, InequalHalfway) ->Apply(CustomArguments<uint16_t>); BENCHMARK_TEMPLATE(BM_bcmp, uint32_t, InequalHalfway) ->Apply(CustomArguments<uint32_t>); BENCHMARK_TEMPLATE(BM_bcmp, uint64_t, InequalHalfway) ->Apply(CustomArguments<uint64_t>); ``` {F8768210} ``` $ ~/src/googlebenchmark/tools/compare.py --no-utest benchmarks build-{old,new}/test/llvm-bcmp-bench RUNNING: build-old/test/llvm-bcmp-bench --benchmark_out=/tmp/tmpb6PEUx 2019-04-25 21:17:11 Running build-old/test/llvm-bcmp-bench Run on (8 X 4000 MHz CPU s) CPU Caches: L1 Data 16K (x8) L1 Instruction 64K (x4) L2 Unified 2048K (x4) L3 Unified 8192K (x1) Load Average: 0.65, 3.90, 4.14 --------------------------------------------------------------------------------------------------- Benchmark Time CPU Iterations UserCounters... --------------------------------------------------------------------------------------------------- <...> BM_bcmp<uint8_t, Identical>/512000 432131 ns 432101 ns 1613 bytes_read/iteration=1000k bytes_read/sec=2.20706G/s eltcnt=825.856M eltcnt/sec=1.18491G/s BM_bcmp<uint8_t, Identical>_BigO 0.86 N 0.86 N BM_bcmp<uint8_t, Identical>_RMS 8 % 8 % <...> BM_bcmp<uint16_t, Identical>/256000 161408 ns 161409 ns 4027 bytes_read/iteration=1000k bytes_read/sec=5.90843G/s eltcnt=1030.91M eltcnt/sec=1.58603G/s BM_bcmp<uint16_t, Identical>_BigO 0.67 N 0.67 N BM_bcmp<uint16_t, Identical>_RMS 25 % 25 % <...> BM_bcmp<uint32_t, Identical>/128000 81497 ns 81488 ns 8415 bytes_read/iteration=1000k bytes_read/sec=11.7032G/s eltcnt=1077.12M eltcnt/sec=1.57078G/s BM_bcmp<uint32_t, Identical>_BigO 0.71 N 0.71 N BM_bcmp<uint32_t, Identical>_RMS 42 % 42 % <...> BM_bcmp<uint64_t, Identical>/64000 50138 ns 50138 ns 10909 bytes_read/iteration=1000k bytes_read/sec=19.0209G/s eltcnt=698.176M eltcnt/sec=1.27647G/s BM_bcmp<uint64_t, Identical>_BigO 0.84 N 0.84 N BM_bcmp<uint64_t, Identical>_RMS 27 % 27 % <...> BM_bcmp<uint8_t, InequalHalfway>/512000 192405 ns 192392 ns 3638 bytes_read/iteration=1000k bytes_read/sec=4.95694G/s eltcnt=1.86266G eltcnt/sec=2.66124G/s BM_bcmp<uint8_t, InequalHalfway>_BigO 0.38 N 0.38 N BM_bcmp<uint8_t, InequalHalfway>_RMS 3 % 3 % <...> BM_bcmp<uint16_t, InequalHalfway>/256000 127858 ns 127860 ns 5477 bytes_read/iteration=1000k bytes_read/sec=7.45873G/s eltcnt=1.40211G eltcnt/sec=2.00219G/s BM_bcmp<uint16_t, InequalHalfway>_BigO 0.50 N 0.50 N BM_bcmp<uint16_t, InequalHalfway>_RMS 0 % 0 % <...> BM_bcmp<uint32_t, InequalHalfway>/128000 49140 ns 49140 ns 14281 bytes_read/iteration=1000k bytes_read/sec=19.4072G/s eltcnt=1.82797G eltcnt/sec=2.60478G/s BM_bcmp<uint32_t, InequalHalfway>_BigO 0.40 N 0.40 N BM_bcmp<uint32_t, InequalHalfway>_RMS 18 % 18 % <...> BM_bcmp<uint64_t, InequalHalfway>/64000 32101 ns 32099 ns 21786 bytes_read/iteration=1000k bytes_read/sec=29.7101G/s eltcnt=1.3943G eltcnt/sec=1.99381G/s BM_bcmp<uint64_t, InequalHalfway>_BigO 0.50 N 0.50 N BM_bcmp<uint64_t, InequalHalfway>_RMS 1 % 1 % RUNNING: build-new/test/llvm-bcmp-bench --benchmark_out=/tmp/tmpQ46PP0 2019-04-25 21:19:29 Running build-new/test/llvm-bcmp-bench Run on (8 X 4000 MHz CPU s) CPU Caches: L1 Data 16K (x8) L1 Instruction 64K (x4) L2 Unified 2048K (x4) L3 Unified 8192K (x1) Load Average: 1.01, 2.85, 3.71 --------------------------------------------------------------------------------------------------- Benchmark Time CPU Iterations UserCounters... --------------------------------------------------------------------------------------------------- <...> BM_bcmp<uint8_t, Identical>/512000 18593 ns 18590 ns 37565 bytes_read/iteration=1000k bytes_read/sec=51.2991G/s eltcnt=19.2333G eltcnt/sec=27.541G/s BM_bcmp<uint8_t, Identical>_BigO 0.04 N 0.04 N BM_bcmp<uint8_t, Identical>_RMS 37 % 37 % <...> BM_bcmp<uint16_t, Identical>/256000 18950 ns 18948 ns 37223 bytes_read/iteration=1000k bytes_read/sec=50.3324G/s eltcnt=9.52909G eltcnt/sec=13.511G/s BM_bcmp<uint16_t, Identical>_BigO 0.08 N 0.08 N BM_bcmp<uint16_t, Identical>_RMS 34 % 34 % <...> BM_bcmp<uint32_t, Identical>/128000 18627 ns 18627 ns 37895 bytes_read/iteration=1000k bytes_read/sec=51.198G/s eltcnt=4.85056G eltcnt/sec=6.87168G/s BM_bcmp<uint32_t, Identical>_BigO 0.16 N 0.16 N BM_bcmp<uint32_t, Identical>_RMS 35 % 35 % <...> BM_bcmp<uint64_t, Identical>/64000 18855 ns 18855 ns 37458 bytes_read/iteration=1000k bytes_read/sec=50.5791G/s eltcnt=2.39731G eltcnt/sec=3.3943G/s BM_bcmp<uint64_t, Identical>_BigO 0.32 N 0.32 N BM_bcmp<uint64_t, Identical>_RMS 33 % 33 % <...> BM_bcmp<uint8_t, InequalHalfway>/512000 9570 ns 9569 ns 73500 bytes_read/iteration=1000k bytes_read/sec=99.6601G/s eltcnt=37.632G eltcnt/sec=53.5046G/s BM_bcmp<uint8_t, InequalHalfway>_BigO 0.02 N 0.02 N BM_bcmp<uint8_t, InequalHalfway>_RMS 29 % 29 % <...> BM_bcmp<uint16_t, InequalHalfway>/256000 9547 ns 9547 ns 74343 bytes_read/iteration=1000k bytes_read/sec=99.8971G/s eltcnt=19.0318G eltcnt/sec=26.8159G/s BM_bcmp<uint16_t, InequalHalfway>_BigO 0.04 N 0.04 N BM_bcmp<uint16_t, InequalHalfway>_RMS 29 % 29 % <...> BM_bcmp<uint32_t, InequalHalfway>/128000 9396 ns 9394 ns 73521 bytes_read/iteration=1000k bytes_read/sec=101.518G/s eltcnt=9.41069G eltcnt/sec=13.6255G/s BM_bcmp<uint32_t, InequalHalfway>_BigO 0.08 N 0.08 N BM_bcmp<uint32_t, InequalHalfway>_RMS 30 % 30 % <...> BM_bcmp<uint64_t, InequalHalfway>/64000 9499 ns 9498 ns 73802 bytes_read/iteration=1000k bytes_read/sec=100.405G/s eltcnt=4.72333G eltcnt/sec=6.73808G/s BM_bcmp<uint64_t, InequalHalfway>_BigO 0.16 N 0.16 N BM_bcmp<uint64_t, InequalHalfway>_RMS 28 % 28 % Comparing build-old/test/llvm-bcmp-bench to build-new/test/llvm-bcmp-bench Benchmark Time CPU Time Old Time New CPU Old CPU New --------------------------------------------------------------------------------------------------------------------------------------- <...> BM_bcmp<uint8_t, Identical>/512000 -0.9570 -0.9570 432131 18593 432101 18590 <...> BM_bcmp<uint16_t, Identical>/256000 -0.8826 -0.8826 161408 18950 161409 18948 <...> BM_bcmp<uint32_t, Identical>/128000 -0.7714 -0.7714 81497 18627 81488 18627 <...> BM_bcmp<uint64_t, Identical>/64000 -0.6239 -0.6239 50138 18855 50138 18855 <...> BM_bcmp<uint8_t, InequalHalfway>/512000 -0.9503 -0.9503 192405 9570 192392 9569 <...> BM_bcmp<uint16_t, InequalHalfway>/256000 -0.9253 -0.9253 127858 9547 127860 9547 <...> BM_bcmp<uint32_t, InequalHalfway>/128000 -0.8088 -0.8088 49140 9396 49140 9394 <...> BM_bcmp<uint64_t, InequalHalfway>/64000 -0.7041 -0.7041 32101 9499 32099 9498 ``` What can we tell from the benchmark? * Performance of naive equality check somewhat improves with element size, maxing out at eltcnt/sec=1.58603G/s for uint16_t, or bytes_read/sec=19.0209G/s for uint64_t. I think, that instability implies performance problems. * Performance of `memcmp()`-aware benchmark always maxes out at around bytes_read/sec=51.2991G/s for every type. That is 2.6x the throughput of the naive variant! * eltcnt/sec metric for the `memcmp()`-aware benchmark maxes out at eltcnt/sec=27.541G/s for uint8_t (was: eltcnt/sec=1.18491G/s, so 24x) and linearly decreases with element size. For uint64_t, it's ~4x+ the elements/second. * The call obvious is more pricey than the loop, with small element count. As it can be seen from the full output {F8768210}, the `memcmp()` is almost universally worse, independent of the element size (and thus buffer size) when element count is less than 8. So all in all, bcmp idiom does indeed pose untapped performance headroom. This diff does implement said idiom recognition. I think a reasonable test coverage is present, but do tell if there is anything obvious missing. Now, quality. This does succeed to build and pass the test-suite, at least without any non-bundled elements. {F8768216} {F8768217} This transform fires 91 times: ``` $ /build/test-suite/utils/compare.py -m loop-idiom.NumBCmp result-new.json Tests: 1149 Metric: loop-idiom.NumBCmp Program result-new MultiSourc...Benchmarks/7zip/7zip-benchmark 79.00 MultiSource/Applications/d/make_dparser 3.00 SingleSource/UnitTests/vla 2.00 MultiSource/Applications/Burg/burg 1.00 MultiSourc.../Applications/JM/lencod/lencod 1.00 MultiSource/Applications/lemon/lemon 1.00 MultiSource/Benchmarks/Bullet/bullet 1.00 MultiSourc...e/Benchmarks/MallocBench/gs/gs 1.00 MultiSourc...gs-C/TimberWolfMC/timberwolfmc 1.00 MultiSourc...Prolangs-C/simulator/simulator 1.00 ``` The size changes are: I'm not sure what's going on with SingleSource/UnitTests/vla.test yet, did not look. ``` $ /build/test-suite/utils/compare.py -m size..text result-{old,new}.json --filter-hash Tests: 1149 Same hash: 907 (filtered out) Remaining: 242 Metric: size..text Program result-old result-new diff test-suite...ingleSource/UnitTests/vla.test 753.00 833.00 10.6% test-suite...marks/7zip/7zip-benchmark.test 1001697.00 966657.00 -3.5% test-suite...ngs-C/simulator/simulator.test 32369.00 32321.00 -0.1% test-suite...plications/d/make_dparser.test 89585.00 89505.00 -0.1% test-suite...ce/Applications/Burg/burg.test 40817.00 40785.00 -0.1% test-suite.../Applications/lemon/lemon.test 47281.00 47249.00 -0.1% test-suite...TimberWolfMC/timberwolfmc.test 250065.00 250113.00 0.0% test-suite...chmarks/MallocBench/gs/gs.test 149889.00 149873.00 -0.0% test-suite...ications/JM/lencod/lencod.test 769585.00 769569.00 -0.0% test-suite.../Benchmarks/Bullet/bullet.test 770049.00 770049.00 0.0% test-suite...HMARK_ANISTROPIC_DIFFUSION/128 NaN NaN nan% test-suite...HMARK_ANISTROPIC_DIFFUSION/256 NaN NaN nan% test-suite...CHMARK_ANISTROPIC_DIFFUSION/64 NaN NaN nan% test-suite...CHMARK_ANISTROPIC_DIFFUSION/32 NaN NaN nan% test-suite...ENCHMARK_BILATERAL_FILTER/64/4 NaN NaN nan% Geomean difference nan% result-old result-new diff count 1.000000e+01 10.00000 10.000000 mean 3.152090e+05 311695.40000 0.006749 std 3.790398e+05 372091.42232 0.036605 min 7.530000e+02 833.00000 -0.034981 25% 4.243300e+04 42401.00000 -0.000866 50% 1.197370e+05 119689.00000 -0.000392 75% 6.397050e+05 639705.00000 -0.000005 max 1.001697e+06 966657.00000 0.106242 ``` I don't have timings though. And now to the code. The basic idea is to completely replace the whole loop. If we can't fully kill it, don't transform. I have left one or two comments in the code, so hopefully it can be understood. Also, there is a few TODO's that i have left for follow-ups: * widening of `memcmp()`/`bcmp()` * step smaller than the comparison size * Metadata propagation * more than two blocks as long as there is still a single backedge? * ??? Reviewers: reames, fhahn, mkazantsev, chandlerc, craig.topper, courbet Reviewed By: courbet Subscribers: hiraditya, xbolva00, nikic, jfb, gchatelet, courbet, llvm-commits, mclow.lists Tags: #llvm Differential Revision: https://reviews.llvm.org/D61144 llvm-svn: 370454	2019-08-30 09:51:23 +00:00
Sanjay Patel	65f1c04000	[InstCombine] reduce duplicated code; NFC llvm-svn: 370399	2019-08-29 19:36:18 +00:00
Alina Sbirlea	4b87023bae	Revert enabling MemorySSA. Breaks sanitizers bots. Differential Revision: https://reviews.llvm.org/D58311 llvm-svn: 370397	2019-08-29 19:01:23 +00:00
Florian Hahn	f9cdb98f40	[LoopUnrollAndJam] Use Lazy strategy for DTU. We can also apply the earlier updates to the lazy DTU, instead of applying them directly. Reviewers: kuhar, brzycki, asbirlea, SjoerdMeijer Reviewed By: brzycki, asbirlea, SjoerdMeijer Differential Revision: https://reviews.llvm.org/D66918 llvm-svn: 370391	2019-08-29 17:47:58 +00:00
Alina Sbirlea	6289ee941d	[MemorySSA & LoopPassManager] Enable MemorySSA as loop dependency. Update tests. Summary: I'm not planning to check this in at the moment, but feedback is very welcome, in particular how this affects performance. The feedback obtains here will guide the next steps towards enabling this. This patch enables the use of MemorySSA in the loop pass manager. Passes that currently use MemorySSA: - EarlyCSE Passes that use MemorySSA after this patch: - EarlyCSE - LICM - SimpleLoopUnswitch Loop passes that update MemorySSA (and do not use it yet, but could use it after this patch): - LoopInstSimplify - LoopSimplifyCFG - LoopUnswitch - LoopRotate - LoopSimplify - LCSSA Loop passes that do not update MemorySSA: - IndVarSimplify - LoopDelete - LoopIdiom - LoopSink - LoopUnroll - LoopInterchange - LoopUnrollAndJam - LoopVectorize - LoopReroll - IRCE Reviewers: chandlerc, george.burgess.iv, davide, sanjoy, gberry Subscribers: jlebar, Prazek, dmgreen, jdoerfert, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58311 llvm-svn: 370384	2019-08-29 17:08:13 +00:00
Michael Liao	001871dee8	[SimplifyCFG] Skip sinking common lifetime markers of `alloca`. Summary: - Similar to the workaround in fix of PR30188, skip sinking common lifetime markers of `alloca`. They are mostly left there after inlining functions in branches. Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66950 llvm-svn: 370376	2019-08-29 16:12:05 +00:00
Roman Lebedev	9f35d2b564	[SimplifyCFG] FoldTwoEntryPHINode(): don't bailout on i1 PHI's if we can hoist a 'not' from incoming values Summary: As it can be seen in the tests in D65143/D65144, even though we have formed an '@llvm.umul.with.overflow' and got rid of potential for division-by-zero, the control flow remains, we still have that branch. We have this condition: ``` // Don't fold i1 branches on PHIs which contain binary operators // These can often be turned into switches and other things. if (PN->getType()->isIntegerTy(1) && (isa<BinaryOperator>(PN->getIncomingValue(0)) \|\| isa<BinaryOperator>(PN->getIncomingValue(1)) \|\| isa<BinaryOperator>(IfCond))) return false; ``` which was added back in rL121764 to help with `select` formation i think? That check prevents us to flatten the CFG here, even though we know we no longer need that guard and will be able to drop everything but the '@llvm.umul.with.overflow' + `not`. As it can be seen from tests, we end here because the `not` is being sinked into the PHI's incoming values by InstCombine, so we can't workaround this by hoisting it to after PHI. Thus i suggest that we relax that check to not bailout if we'd get to hoist the `not`. Reviewers: craig.topper, spatel, fhahn, nikic Reviewed By: spatel Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65147 llvm-svn: 370349	2019-08-29 12:47:34 +00:00
Roman Lebedev	473a063a5e	[InstCombine] Fold '((%x * %y) u/ %x) != %y' to '@llvm.umul.with.overflow' + overflow bit extraction Summary: `((%x * %y) u/ %x) != %y` is one of (3?) common ways to check that some unsigned multiplication (will not) overflow. Currently, we don't catch it. We could: ``` $ /repositories/alive2/build-Clang-unknown/alive -root-only ~/llvm-patch1.ll Processing /home/lebedevri/llvm-patch1.ll.. ---------------------------------------- Name: no overflow %o0 = mul i4 %y, %x %o1 = udiv i4 %o0, %x %r = icmp ne i4 %o1, %y ret i1 %r => %n0 = umul_overflow i4 %x, %y %o0 = extractvalue {i4, i1} %n0, 0 %o1 = udiv %o0, %x %r = extractvalue {i4, i1} %n0, 1 ret %r Done: 1 Optimization is correct! ---------------------------------------- Name: no overflow %o0 = mul i4 %y, %x %o1 = udiv i4 %o0, %x %r = icmp eq i4 %o1, %y ret i1 %r => %n0 = umul_overflow i4 %x, %y %o0 = extractvalue {i4, i1} %n0, 0 %o1 = udiv %o0, %x %n1 = extractvalue {i4, i1} %n0, 1 %r = xor %n1, -1 ret i1 %r Done: 1 Optimization is correct! ``` Reviewers: nikic, spatel, efriedma, xbolva00, RKSimon Reviewed By: nikic Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65144 llvm-svn: 370348	2019-08-29 12:47:20 +00:00
Roman Lebedev	fb38b7aab3	[InstCombine] Fold '(-1 u/ %x) u< %y' to '@llvm.umul.with.overflow' + overflow bit extraction Summary: `(-1 u/ %x) u< %y` is one of (3?) common ways to check that some unsigned multiplication (will not) overflow. Currently, we don't catch it. We could: ``` ---------------------------------------- Name: no overflow %o0 = udiv i4 -1, %x %r = icmp ult i4 %o0, %y => %o0 = udiv i4 -1, %x %n0 = umul_overflow i4 %x, %y %r = extractvalue {i4, i1} %n0, 1 Done: 1 Optimization is correct! ---------------------------------------- Name: no overflow, swapped %o0 = udiv i4 -1, %x %r = icmp ugt i4 %y, %o0 => %o0 = udiv i4 -1, %x %n0 = umul_overflow i4 %x, %y %r = extractvalue {i4, i1} %n0, 1 Done: 1 Optimization is correct! ---------------------------------------- Name: overflow %o0 = udiv i4 -1, %x %r = icmp uge i4 %o0, %y => %o0 = udiv i4 -1, %x %n0 = umul_overflow i4 %x, %y %n1 = extractvalue {i4, i1} %n0, 1 %r = xor %n1, -1 Done: 1 Optimization is correct! ---------------------------------------- Name: overflow %o0 = udiv i4 -1, %x %r = icmp ule i4 %y, %o0 => %o0 = udiv i4 -1, %x %n0 = umul_overflow i4 %x, %y %n1 = extractvalue {i4, i1} %n0, 1 %r = xor %n1, -1 Done: 1 Optimization is correct! ``` As it can be observed from tests, while simply forming the `@llvm.umul.with.overflow` is easy, if we were looking for the inverted answer, then more work needs to be done to cleanup the now-pointless control-flow that was guarding against division-by-zero. This is being addressed in follow-up patches. Reviewers: nikic, spatel, efriedma, xbolva00, RKSimon Reviewed By: nikic, xbolva00 Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65143 llvm-svn: 370347	2019-08-29 12:47:08 +00:00
Roman Lebedev	f13b0e3ed8	[InstCombine] Shift amount reassociation in bittest: trunc-of-lshr (PR42399) Summary: Finally, the fold i was looking forward to :) The legality check is muddy, i doubt i've groked the full generalization, but it handles all the cases i care about, and can come up with: https://rise4fun.com/Alive/26j I.e. we can perform the fold if any of the following is true: * The shift amount is either zero or one less than widest bitwidth * Either of the values being shifted has at most lowest bit set * The value that is being shifted by `shl` (which is not truncated) should have no less leading zeros than the total shift amount; * The value that is being shifted by `lshr` (which is truncated) should have no less leading zeros than the widest bit width minus total shift amount minus one I strongly suspect there is some better generalization, but i'm not aware of it as of right now. For now i also avoided using actual `computeKnownBits()`, but restricted it to constants. Reviewers: spatel, nikic, xbolva00 Reviewed By: spatel Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66383 llvm-svn: 370324	2019-08-29 10:26:23 +00:00
Simon Pilgrim	920b04011b	Fix variable set but no used warnings on NDEBUG builds. NFCI. llvm-svn: 370319	2019-08-29 10:08:45 +00:00
Simon Pilgrim	ef9c6a7077	Fix variable set but no used warning on NDEBUG builds. NFCI. llvm-svn: 370317	2019-08-29 09:58:47 +00:00
Hideto Ueno	cbab334e40	[Attributor] Deduce "noalias" attribute Summary: This patch adds very basic deduction for noalias. Reviewers: jdoerfert, sstefan1 Reviewed By: jdoerfert Tags: LLVM Differential Revision: https://reviews.llvm.org/D66207 llvm-svn: 370295	2019-08-29 05:52:00 +00:00
Florian Hahn	3177b92231	[LoopUnroll] Use Lazy strategy for DTU used for MergeBlockIntoPredecessor. We do not access the DT in the loop, so we do not have to apply updates eagerly. We can apply them lazyly and flush them after we are done merging blocks. As follow-up work, we might be able to use the DTU above as well, instead of manually updating the DT. This brings the example from PR43134 from ~100s to ~4s for a relase + assertions build on my machine. Reviewers: efriedma, kuhar, asbirlea, brzycki Reviewed By: kuhar, brzycki Differential Revision: https://reviews.llvm.org/D66911 llvm-svn: 370292	2019-08-29 04:26:29 +00:00
Johannes Doerfert	bf112139ac	[Attributor] Improve messages in iteration verify mode When we now verify the iteration count we will see the actual count and the expected count before the assertion is triggered. llvm-svn: 370285	2019-08-29 01:29:44 +00:00
Johannes Doerfert	62a9c1da78	[Attributor][Fix] Indicate change correctly llvm-svn: 370283	2019-08-29 01:26:58 +00:00
Johannes Doerfert	1aac182f31	[Attributor] Fix typo llvm-svn: 370282	2019-08-29 01:26:09 +00:00
Artur Pilipenko	925afc1ce7	Fix for "DICompileUnit not listed in llvm.dbg.cu" verification error after ... ...cloning a function from a different module Currently when a function with debug info is cloned from a different module, the cloned function may have hanging DICompileUnits, so that the module with the cloned function fails debug info verification. The proposed fix inserts all DICompileUnits reachable from the cloned function to "llvm.dbg.cu" metadata operands of the cloned function module. Reviewed By: aprantl, efriedma Differential Revision: https://reviews.llvm.org/D66510 Patch by Oleg Pliss (Oleg.Pliss@azul.com) llvm-svn: 370265	2019-08-28 21:27:50 +00:00
Julian Lettner	3ae9b9d5e4	[ASan] Make insertion of version mismatch guard configurable By default ASan calls a versioned function `__asan_version_mismatch_check_vXXX` from the ASan module constructor to check that the compiler ABI version and runtime ABI version are compatible. This ensures that we get a predictable linker error instead of hard-to-debug runtime errors. Sometimes, however, we want to skip this safety guard. This new command line option allows us to do just that. rdar://47891956 Reviewed By: kubamracek Differential Revision: https://reviews.llvm.org/D66826 llvm-svn: 370258	2019-08-28 20:40:55 +00:00
Sanjay Patel	2d4b6777c4	[InstCombine] clean up wrap propagation for reassociated ops; NFCI Always true/false checks were flagged by static analysis; https://bugs.llvm.org/show_bug.cgi?id=43143 I have not confirmed the logic difference in propagating nsw vs. nuw, but presumably we would have noticed a bug by now if that was wrong. llvm-svn: 370248	2019-08-28 18:58:06 +00:00
Pirama Arumuga Nainar	19205abaaa	[ValueMapper] NFC: Remove dead code to pause metadata mapping Summary: This functionality was added when Mapper::mapMetadata was recursive. It is no longer needed after r265456, which switched it to be iterative. Reviewers: dexonsmith, srhines Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66860 llvm-svn: 370236	2019-08-28 17:43:14 +00:00
Johannes Doerfert	f7ca0fe1c8	[Attributor] Regularly clear dependences to remove spurious ones As dependences between abstract attributes can become stale, e.g., if one was sufficient to imply another one at some point but it has since been wakened to the point it is not usable for the formerly implied one. To weed out spurious dependences, and thereby eliminate unneeded updates, we introduce an option to determine how often the dependence cache is cleared and recomputed during the fixpoint iteration. Note that the initial value was determined such that we see a positive result on our tests. Differential Revision: https://reviews.llvm.org/D63315 llvm-svn: 370230	2019-08-28 16:58:52 +00:00
Simon Pilgrim	1d8a886c59	Reduce scope of variable only used in a local pattern match. NFCI. llvm-svn: 370224	2019-08-28 16:06:33 +00:00
Craig Topper	f79d8a064c	[InstCombine] Disable recursion in foldGEPICmp for vector pointer GEPs Due to missing vector support in this function, recursion can generate worse code in some cases. llvm-svn: 370221	2019-08-28 15:40:34 +00:00
Simon Pilgrim	b569624049	Fix uninitialized variable warning in cppcheck. NFCI. InstCombiner::MaxArraySizeForCombine is set outside the constructor so we need to ensure it has a default initialization value. llvm-svn: 370220	2019-08-28 15:19:49 +00:00
David Bolvansky	af118bb6d0	[NFC] Added a comment to avoid possible confusion llvm-svn: 370217	2019-08-28 15:04:48 +00:00
Simon Pilgrim	316bfb0f48	Remove duplicate 'BitWidth' variable. NFCI. llvm-svn: 370212	2019-08-28 14:37:44 +00:00
Johannes Doerfert	07a5c129c6	[Attributor] Restrict liveness and return information to functions Summary: Until we have proper call-site information we should not recompute liveness and return information for each call site. This patch directly uses the function versions and introduces TODOs at the usage sites. The required iterations to get to the fixpoint are most of the time reduced by this change and we always avoid work duplication. Reviewers: sstefan1, uenoku Subscribers: hiraditya, bollu, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66562 llvm-svn: 370208	2019-08-28 14:09:14 +00:00
Simon Pilgrim	284118ce3b	InstCombiner::visitSelectInst - rename Pred to MinMaxPred to stop shadow variable warning. NFCI. We have a lot of Predicate variables, all similarly named.... llvm-svn: 370207	2019-08-28 14:05:38 +00:00
Ayal Zaks	d15df0ede5	[LV] Fold tail by masking - handle reductions Allow vectorizing loops that have reductions when tail is folded by masking. A select is introduced in VPlan, choosing between the last value carried by the loop-exit/live-out instruction of the reduction, and the penultimate value carried by the reduction phi, according to the "i < n" mask of fold-tail. This select replaces the last value as the live-out value of the loop. Differential Revision: https://reviews.llvm.org/D66720 llvm-svn: 370173	2019-08-28 09:02:23 +00:00
David Bolvansky	05bda8b4e5	Annotate return values of allocation functions with dereferenceable_or_null Summary: Example define dso_local noalias i8* @_Z6maixxnv() local_unnamed_addr #0 { entry: %call = tail call noalias dereferenceable_or_null(64) i8* @malloc(i64 64) #6 ret i8* %call } Reviewers: jdoerfert Reviewed By: jdoerfert Subscribers: aaron.ballman, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66651 llvm-svn: 370168	2019-08-28 08:28:20 +00:00
Fangrui Song	6964027315	[LoopFusion] Fix another -Wunused-function in -DLLVM_ENABLE_ASSERTIONS=off build llvm-svn: 370156	2019-08-28 03:12:40 +00:00
Craig Topper	5bbb604bb5	[InstCombine] Disable some portions of foldGEPICmp for GEPs that return a vector of pointers. Fix other portions. llvm-svn: 370114	2019-08-27 21:38:56 +00:00
Philip Reames	2694522f13	[Loads/SROA] Remove blatantly incorrect code and fix a bug revealed in the process The code we had isSafeToLoadUnconditionally was blatantly wrong. This function takes a "Size" argument which is supposed to describe the span loaded from. Instead, the code use the size of the pointer passed (which may be unrelated!) and only checks that span. For any Size > LoadSize, this can and does lead to miscompiles. Worse, the generic code just a few lines above correctly handles the cases which are valid. So, let's delete said code. Removing this code revealed two issues: 1) As noted by jdoerfert the removed code incorrectly handled external globals. The test update in SROA is to stop testing incorrect behavior. 2) SROA was confusing bytes and bits, but this wasn't obvious as the Size parameter was being essentially ignored anyway. Fixed. Differential Revision: https://reviews.llvm.org/D66778 llvm-svn: 370102	2019-08-27 19:34:43 +00:00
David Bolvansky	0c2692108c	[InstCombine] Fold select with ctlz to cttz Summary: Handle pattern [0]: int ctz(unsigned int a) { int c = __clz(a & -a); return a ? 31 - c : c; } In reality, the compiler can generate much better code for cttz, so fold away this pattern. https://godbolt.org/z/c5kPtV [0] https://community.arm.com/community-help/f/discussions/2114/count-trailing-zeros Reviewers: spatel, nikic, lebedev.ri, dmgreen, hfinkel Reviewed By: hfinkel Subscribers: hfinkel, javed.absar, kristof.beyls, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66308 llvm-svn: 370037	2019-08-27 10:22:40 +00:00
Johannes Doerfert	39681e733c	[Attributor] Introduce an API to delete stuff Summary: During the fixpoint iteration, including the manifest stage, we should not delete stuff as other abstract attributes might have a reference to the value. Through the API this can now be done safely at the very end. Reviewers: uenoku, sstefan1 Subscribers: hiraditya, bollu, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66779 llvm-svn: 370014	2019-08-27 04:57:54 +00:00
Vitaly Buka	aeca56964f	msan, codegen, instcombine: Keep more lifetime markers used for msan Reviewers: eugenis Subscribers: hiraditya, cfe-commits, #sanitizers, llvm-commits Tags: #clang, #sanitizers, #llvm Differential Revision: https://reviews.llvm.org/D66695 llvm-svn: 369979	2019-08-26 22:15:50 +00:00
Evgeniy Stepanov	ed4fefb0df	[hwasan] Fix test failure in r369721. Try harder to emulate "old runtime" in the test. To get the old behavior with the new runtime library, we need both disable personality function wrapping and enable landing pad instrumentation. llvm-svn: 369977	2019-08-26 21:44:55 +00:00
Philip Reames	cf3b555973	Add a clarify comment for meaning of SafePointes [NFC] Extracted from D66688 as requested. llvm-svn: 369962	2019-08-26 20:48:35 +00:00
Philip Reames	b92c971099	[InstCombine] icmp eq/ne (gep inbounds P, Idx..), null -> icmp eq/ne P, null for vectors Extend the transform introduced in https://reviews.llvm.org/D66608 to work for vector geps as well. Differential Revision: https://reviews.llvm.org/D66671 llvm-svn: 369949	2019-08-26 19:11:49 +00:00
Johannes Doerfert	b504eb8bb5	[Attributor] Adjust and test the iteration bound of tests Summary: Try to verify how many iterations we need for a fixpoint in our tests. This patch adjust the way we count to make it easier to follow. It also adjusts the bounds to actually account for a fixpoint and not only the minimum number to pass all checks. Reviewers: uenoku, sstefan1 Subscribers: hiraditya, bollu, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66757 llvm-svn: 369945	2019-08-26 18:55:47 +00:00
Johannes Doerfert	a4a308cc25	[Attributor] Further cut down on non-determinism llvm-svn: 369936	2019-08-26 17:51:23 +00:00
Johannes Doerfert	19b0043641	[Attributor] Allow explicit dependence tracking By default, the Attributor tracks potential dependences between abstract attributes based on the issued Attributor::getAAFor queries. This simplifies the development of new abstract attributes but it can also lead to spurious dependences that might increase compile time and make internalization harder (D63312). With this patch, abstract attributes can opt-out of implicit dependence tracking and instead register dependences explicitly. It is up to the implementation to make sure all existing dependences are registered. Differential Revision: https://reviews.llvm.org/D63314 llvm-svn: 369935	2019-08-26 17:48:05 +00:00
Bjorn Pettersson	d804bd17de	[LoopUnroll] Handle certain PHIs in full unrolling properly Summary: When reconstructing the CFG of the loop after unrolling, LoopUnroll could in some cases remove the phi operands of loop-carried values instead of preserving them, resulting in undef phi values after loop unrolling. When doing this reconstruction, avoid removing incoming phi values for phis in the successor blocks if the successor is the block we are jumping to anyway. Patch-by: ebevhan Reviewers: fhahn, efriedma Reviewed By: fhahn Subscribers: bjope, lebedev.ri, zzheng, dmgreen, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66334 llvm-svn: 369886	2019-08-26 09:29:53 +00:00
Roman Lebedev	9cf08c6de1	[Constant] Add 'isElementWiseEqual()' method Promoting it from InstCombine's tryToReuseConstantFromSelectInComparison(). Return true if this constant and a constant 'Y' are element-wise equal. This is identical to just comparing the pointers, with the exception that for vectors, if only one of the constants has an `undef` element in some lane, the constants still match. llvm-svn: 369842	2019-08-24 06:49:51 +00:00
Roman Lebedev	de19f749e0	[InstCombine] matchThreeWayIntCompare(): commutativity awareness Summary: `matchThreeWayIntCompare()` looks for ``` select i1 (a == b), i32 Equal, i32 (select i1 (a < b), i32 Less, i32 Greater) ``` but both of these selects/compares can be in it's commuted form, so out of 8 variants, only the two most basic ones is handled. This fixes regression being introduced in D66232. Reviewers: spatel, nikic, efriedma, xbolva00 Reviewed By: spatel Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66607 llvm-svn: 369841	2019-08-24 06:49:36 +00:00
Roman Lebedev	2c75fe7f2a	[InstCombine] Try to reuse constant from select in leading comparison Summary: If we have e.g.: ``` %t = icmp ult i32 %x, 65536 %r = select i1 %t, i32 %y, i32 65535 ``` the constants `65535` and `65536` are suspiciously close. We could perform a transformation to deduplicate them: ``` Name: ult %t = icmp ult i32 %x, 65536 %r = select i1 %t, i32 %y, i32 65535 => %t.inv = icmp ugt i32 %x, 65535 %r = select i1 %t.inv, i32 65535, i32 %y ``` https://rise4fun.com/Alive/avb While this may seem esoteric, this should certainly be good for vectors (less constant pool usage) and for opt-for-size - need to have only one constant. But the real fun part here is that it allows further transformation, in particular it finishes cleaning up the `clamp` folding, see e.g. `canonicalize-clamp-with-select-of-constant-threshold-pattern.ll`. We start with e.g. ``` %dont_need_to_clamp_positive = icmp sle i32 %X, 32767 %dont_need_to_clamp_negative = icmp sge i32 %X, -32768 %clamp_limit = select i1 %dont_need_to_clamp_positive, i32 -32768, i32 32767 %dont_need_to_clamp = and i1 %dont_need_to_clamp_positive, %dont_need_to_clamp_negative %R = select i1 %dont_need_to_clamp, i32 %X, i32 %clamp_limit ``` without this patch we currently produce ``` %1 = icmp slt i32 %X, 32768 %2 = icmp sgt i32 %X, -32768 %3 = select i1 %2, i32 %X, i32 -32768 %R = select i1 %1, i32 %3, i32 32767 ``` which isn't really a `clamp` - both comparisons are performed on the original value, this patch changes it into ``` %1.inv = icmp sgt i32 %X, 32767 %2 = icmp sgt i32 %X, -32768 %3 = select i1 %2, i32 %X, i32 -32768 %R = select i1 %1.inv, i32 32767, i32 %3 ``` and then the magic happens! Some further transform finishes polishing it and we finally get: ``` %t1 = icmp sgt i32 %X, -32768 %t2 = select i1 %t1, i32 %X, i32 -32768 %t3 = icmp slt i32 %t2, 32767 %R = select i1 %t3, i32 %t2, i32 32767 ``` which is beautiful and just what we want. Proofs for `getFlippedStrictnessPredicateAndConstant()` for de-canonicalization: https://rise4fun.com/Alive/THl Proofs for the fold itself: https://rise4fun.com/Alive/THl Reviewers: spatel, dmgreen, nikic, xbolva00 Reviewed By: spatel Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66232 llvm-svn: 369840	2019-08-24 06:49:25 +00:00
Fangrui Song	eb70ac0249	[LoopFusion] Fix -Wunused-function in -DLLVM_ENABLE_ASSERTIONS=off build llvm-svn: 369836	2019-08-24 02:50:42 +00:00
Peter Collingbourne	5b31ac5096	hwasan: Fix use of uninitialized memory. Reported by e.g. http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/23071/steps/build%20with%20ninja/logs/stdio llvm-svn: 369815	2019-08-23 21:37:20 +00:00
Johannes Doerfert	5a5a139917	[Attributor] Manifest alignment in load and store instructions Summary: We can now manifest alignment information in load/store instructions if the pointer is known to have a better alignment. Reviewers: uenoku, sstefan1, lebedev.ri Subscribers: hiraditya, bollu, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66567 llvm-svn: 369804	2019-08-23 20:20:10 +00:00
Benjamin Kramer	dc5f805d31	Do a sweep of symbol internalization. NFC. llvm-svn: 369803	2019-08-23 19:59:23 +00:00
Philip Reames	9cb059fdcc	Fix a bug in just submitted rL369789 Started implementing the vector case and realized the scalar case hadn't handled the GEP producing a different type than the base correctly. It's entertaining seeing what slips through review when we're focused on the 'hard' parts. :( Also adding an extra vector test as it happened to be in workspace and wasn't worth separating. llvm-svn: 369795	2019-08-23 18:27:57 +00:00
Philip Reames	5b02cfa0b3	[InstCombine] icmp eq/ne (gep inbounds P, Idx..), null -> icmp eq/ne P, null This generalizes the isGEPKnownNonNull rule from ValueTracking to apply when we do not know if the base is non-null, and thus need to replace one condition with another. The core notion is that since an inbounds GEP can only form null if the base pointer is null and the offset is zero. However, if the offset is non-zero, the the "inbounds" marker makes the result poison. Thus, we're free to ignore the case where the offset is non-zero. Similarly, there's no case under which a non-null base can result in a null result without generating poison. Differential Revision: https://reviews.llvm.org/D66608 llvm-svn: 369789	2019-08-23 17:58:58 +00:00
Johannes Doerfert	23400e618b	[Attributor] Manifest constant return values Summary: If the unique return value is a constant we now replace call uses with that constant. Reviewers: sstefan1, uenoku Subscribers: hiraditya, bollu, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66551 llvm-svn: 369785	2019-08-23 17:41:37 +00:00
Johannes Doerfert	785fad3202	[Attributor] Deal with shrinking dereferenceability in a loop Summary: If we have a loop in which the dereferenceability of a pointer decreases we did slowly decrease it iteration by iteration, leading to a timeout. With this patch we detect such circular reasoning and indicate a fixpoint early. Reviewers: uenoku, sstefan1 Subscribers: hiraditya, bollu, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66558 llvm-svn: 369784	2019-08-23 17:29:23 +00:00
Sanjay Patel	5a5d44e801	[SLP] use range-for loops, fix formatting; NFC These are part of D57059, but that patch doesn't apply cleanly to trunk at this point, so we might as well remove some of the noise. llvm-svn: 369776	2019-08-23 16:22:32 +00:00
Cameron McInally	688f3bc240	[Reassoc] Small fix to support unary FNeg in NegateValue(...) Differential Revision: https://reviews.llvm.org/D66612 llvm-svn: 369772	2019-08-23 15:49:38 +00:00
Johannes Doerfert	2f2d7c3add	[Attributor][Fix] Deal with "growing" dereferenceability Summary: If we have a negative inbounds offset dereferenceabily "grows". However, until we do not handle the overflow that can occur in the dereferenceable bytes and the problem with loops, we simply do not grow the state. Reviewers: sstefan1, uenoku Subscribers: hiraditya, bollu, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66557 llvm-svn: 369771	2019-08-23 15:45:46 +00:00
Johannes Doerfert	deb9ea3a8c	[Attributor][NFCI] Avoid lookups when resolving returned values If the number of potentially returned values not change since the last traversal we do not need to visit the returned values again. This works as we only add values to the returned values set now. Differential Revision: https://reviews.llvm.org/D66484 llvm-svn: 369770	2019-08-23 15:42:19 +00:00
Sanjay Patel	9182467886	[SLP] fix formatting; NFC These are part of D57059, but that patch doesn't apply cleanly to trunk at this point, so we might as well remove some of the noise. llvm-svn: 369769	2019-08-23 15:26:12 +00:00
Johannes Doerfert	9543f1498c	[Attributor] FIX: Treat new attributes as changed ones Summary: When we have new attributes and we end the fixpoint iteration because the iteration limit is reached, we need to treat the new ones as if they changed in the last iteration, as they might have. This adds a test for which we should not derive anything regardless of the iteration limit, e.g., if we abort there should not be any attributes manifested in the IR. Reviewers: uenoku, sstefan1 Subscribers: hiraditya, bollu, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66549 llvm-svn: 369768	2019-08-23 15:24:57 +00:00
Johannes Doerfert	695089ecfb	[Attributor][NFCI] Try to avoid potential non-deterministic behavior This commit replaces sets with set vectors in an effort to make the behavior of the Attributor deterministic. llvm-svn: 369767	2019-08-23 15:23:49 +00:00
Teresa Johnson	ea314fd476	[ThinLTO] Fix handling of weak interposable symbols Summary: Keep aliasees alive if their alias is live, otherwise we end up with an alias to a declaration, which is invalid. This can happen when the aliasee is weak and non-prevailing. This fix exposed the fact that we were then attempting to internalize the weak symbol, which was not exported as it was not prevailing. We should not internalize interposable symbols in general, unless this is the prevailing copy, since it can lead to incorrect inlining and other optimizations. Most of the changes in this patch are due to the restructuring required to pass down the prevailing callback. Finally, while implementing the test cases, I found that in the case of a weak aliasee that is still marked not live because its alias isn't live, after dropping the definition we incorrectly marked the declaration with weak linkage when resolving prevailing symbols in the module. This was due to some special case handling for symbols marked WeakLinkage in the summary located before instead of after a subsequent check for the symbol being a declaration. It turns out that we don't actually need this special case handling any more (looking back at the history, when that was added the code was structured quite differently) - we will correctly mark with weak linkage further below when the definition hasn't been dropped. Fixes PR42542. Reviewers: pcc Subscribers: mehdi_amini, inglorion, steven_wu, dexonsmith, dang, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66264 llvm-svn: 369766	2019-08-23 15:18:58 +00:00
Philip Reames	2a52583d67	[IndVars] Fix a bug noticed by inspection We were computing the loop exit value, but not ensuring the addrec belonged to the loop whose exit value we were computing. I couldn't actually trip this; the test case shows the basic setup which might trip this, but none of the variations I've tried actually do. llvm-svn: 369730	2019-08-23 04:03:23 +00:00
Fangrui Song	3fc933af8b	[AlignmentFromAssumptions] getNewAlignmentDiff(): use getURemExpr() The alignment is calculated incorrectly, thus sometimes it doesn't generate aligned mov instructions, as shown by the example below: ``` // b.cc typedef long long index; extern "C" index g_tid; extern "C" index g_num; void add3(float* __restrict__ a, float* __restrict__ b, float* __restrict__ c) { index n = 641024; index m = 161024; index k = 41024; index tid = g_tid; index num = g_num; __builtin_assume_aligned(a, 32); __builtin_assume_aligned(b, 32); __builtin_assume_aligned(c, 32); for (index i0=tidk; i0<m; i0+=numk) for (index i1=0; i1<nm; i1+=m) for (index i2=0; i2<k; i2++) c[i1+i0+i2] = b[i0+i2] + a[i1+i0+i2]; } ``` Compile with `clang b.cc -Ofast -march=skylake -mavx2 -S` ``` vmovaps -224(%rdi,%rbx,4), %ymm0 vmovups -192(%rdi,%rbx,4), %ymm1 # should be movaps vmovups -160(%rdi,%rbx,4), %ymm2 # should be movaps vmovups -128(%rdi,%rbx,4), %ymm3 # should be movaps vaddps -224(%rsi,%rbx,4), %ymm0, %ymm0 vaddps -192(%rsi,%rbx,4), %ymm1, %ymm1 vaddps -160(%rsi,%rbx,4), %ymm2, %ymm2 vaddps -128(%rsi,%rbx,4), %ymm3, %ymm3 vmovaps %ymm0, -224(%rdx,%rbx,4) vmovups %ymm1, -192(%rdx,%rbx,4) # should be movaps vmovups %ymm2, -160(%rdx,%rbx,4) # should be movaps vmovups %ymm3, -128(%rdx,%rbx,4) # should be movaps ``` Differential Revision: https://reviews.llvm.org/D66575 Patch by Dun Liang llvm-svn: 369723	2019-08-23 02:17:04 +00:00
Peter Collingbourne	21a1814417	hwasan: Untag unwound stack frames by wrapping personality functions. One problem with untagging memory in landing pads is that it only works correctly if the function that catches the exception is instrumented. If the function is uninstrumented, we have no opportunity to untag the memory. To address this, replace landing pad instrumentation with personality function wrapping. Each function with an instrumented stack has its personality function replaced with a wrapper provided by the runtime. Functions that did not have a personality function to begin with also get wrappers if they may be unwound past. As the unwinder calls personality functions during stack unwinding, the original personality function is called and the function's stack frame is untagged by the wrapper if the personality function instructs the unwinder to keep unwinding. If unwinding stops at a landing pad, the function is still responsible for untagging its stack frame if it resumes unwinding. The old landing pad mechanism is preserved for compatibility with old runtimes. Differential Revision: https://reviews.llvm.org/D66377 llvm-svn: 369721	2019-08-23 01:28:44 +00:00
Peter Collingbourne	2452d7030b	IR. Change strip* family of functions to not look through aliases. I noticed another instance of the issue where references to aliases were being replaced with aliasees, this time in InstCombine. In the instance that I saw it turned out to be only a QoI issue (a symbol ended up being missing from the symbol table due to the last reference to the alias being removed, preventing HWASAN from symbolizing a global reference), but it could easily have manifested as incorrect behaviour. Since this is the third such issue encountered (previously: D65118, D65314) it seems to be time to address this common error/QoI issue once and for all and make the strip* family of functions not look through aliases. Includes a test for the specific issue that I saw, but no doubt there are other similar bugs fixed here. As with D65118 this has been tested to make sure that the optimization isn't load bearing. I built Clang, Chromium for Linux, Android and Windows as well as the test-suite and there were no size regressions. Differential Revision: https://reviews.llvm.org/D66606 llvm-svn: 369697	2019-08-22 19:56:14 +00:00
Hideto Ueno	70576cac52	[Attributor][NFC] Move DerefState to header and use StateWrapper Summary: In D65402, I want to get DerefState from AADereferenceable but it was not allowed. This patch moves DerefState definition into Attributor.h and makes AADerefenceable inherit StateWrapper. Reviewers: jdoerfert, sstefan1 Reviewed By: jdoerfert Subscribers: hiraditya, jfb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66585 llvm-svn: 369653	2019-08-22 14:18:29 +00:00
Serguei Katkov	036e636aa7	[Loop Peeling] Fix silly bug in metadata update. We must update loop metedata before we moved to parent loop if it is present. llvm-svn: 369637	2019-08-22 10:06:46 +00:00
Johannes Doerfert	d98f975089	[Attributor] Fix: Gracefully handle non-instruction users Function can have users that are not instructions, e.g., bitcasts. For now, we simply give up when we see them. llvm-svn: 369588	2019-08-21 21:48:56 +00:00
Johannes Doerfert	5427aa843b	[Attributor][NFC] Fix copy & paste error llvm-svn: 369577	2019-08-21 20:57:20 +00:00
Johannes Doerfert	2db8528fb4	[Attributor][NFC] Remove leftover semicolon llvm-svn: 369576	2019-08-21 20:56:56 +00:00
Johannes Doerfert	d410805d57	[Attributor] Use existing unreachable instead of introducing new ones So far we split the unreachable off and placed a new one, this is not necessary. llvm-svn: 369575	2019-08-21 20:56:41 +00:00
Florian Hahn	b5e52bfd83	[GVN] Do PHI translations across all edges between the load and the unavailable pred. Currently we do not properly translate addresses with PHIs if LoadBB != LI->getParent(), because PHITranslateAddr expects a direct predecessor as argument, because it considers all instructions outside of the current block to not requiring translation. The amount of cases that trigger this should be very low, as most single predecessor blocks should be folded into their predecessor by GVN before we actually start with value numbering. It is still not guaranteed to happen, so we should do PHI translation along all edges between the loads' block and the predecessor where we have to place a load. There are a few test cases showing current limits of the PHI translation, which could be improved later. Reviewers: spatel, reames, efriedma, john.brawn Reviewed By: efriedma Tags: #llvm Differential Revision: https://reviews.llvm.org/D65020 llvm-svn: 369570	2019-08-21 20:06:50 +00:00
Philip Reames	764b0fd5a3	[instcombine] icmp eq/ne (sub C, Y), C -> icmp eq/ne Y, 0 Noticed while looking at pr43028. llvm-svn: 369541	2019-08-21 15:51:57 +00:00
Sanjay Patel	e728259278	[InstCombine] narrow icmp with extended operands of different widths An intermediate extend is used to widen the narrow operand to the width of the other (wider) operand. At that point, we have the same logic as the existing transform that was restricted to folds of equal width zext/sext. This mostly solves PR42700: https://bugs.llvm.org/show_bug.cgi?id=42700 llvm-svn: 369519	2019-08-21 11:56:08 +00:00
Stefan Stipanovic	26121ae4d0	[Attributor] Liveness for internal functions. For an internal function, if all its call sites are dead, the body of the function is considered dead. Reviewers: jdoerfert, uenoku Subscribers: hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D66155 llvm-svn: 369470	2019-08-20 23:16:57 +00:00
Alexandre Ganea	21e9603030	[Sanitizer] Remove unused functions Differential Revision: https://reviews.llvm.org/D66503 llvm-svn: 369468	2019-08-20 22:56:40 +00:00
Michael Liao	a99086dbdd	[Attributor] Remove unused variable. NFC. llvm-svn: 369444	2019-08-20 21:02:31 +00:00
Wenlei He	5adace352d	[AutoFDO] Make call targets order deterministic for sample profile Summary: StringMap is used for storing call target to frequency map for AutoFDO. However the iterating order of StringMap is non-deterministic, which leads to non-determinism in AutoFDO profile output. Now new API getSortedCallTargets and SortCallTargets are added for deterministic ordering and output. Roundtrip test for text profile and binary profile is added. Reviewers: wmi, davidxl, danielcdh Subscribers: hiraditya, mgrang, llvm-commits, twoh Tags: #llvm Differential Revision: https://reviews.llvm.org/D66191 llvm-svn: 369440	2019-08-20 20:52:00 +00:00
Sanjay Patel	292b1087f4	[InstCombine] add helper function for icmp+zext/sext; NFC llvm-svn: 369421	2019-08-20 18:15:17 +00:00
Sanjay Patel	2e68e4d60e	[InstCombine] make fold for icmp with sext more efficient; NFC We were creating 2 instructions and relying on a subsequent fold to invert a not(icmp). Create the final icmp directly instead. llvm-svn: 369411	2019-08-20 17:03:22 +00:00
Sanjay Patel	a90ee0eeb6	[InstCombine] improve readability for icmp with cast folds; NFC 1. Update function name and stale code comments. 2. Use variable names that are less ambiguous. 3. Move operand checks into the function as early exits. llvm-svn: 369390	2019-08-20 14:56:44 +00:00
Jinsong Ji	cda334ba54	[BlockExtractor] Avoid assert with wrong line format Summary: When the line format is wrong, we may end up accessing out of bound memory. eg: the test with invalide line will cause assert. Assertion `idx < size()' failed The fix is to report fatal when we found mismatched line format. Reviewers: qcolombet, volkan Reviewed By: qcolombet Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66444 llvm-svn: 369389	2019-08-20 14:46:02 +00:00
Sanjay Patel	f99d254aae	[InstCombine] simplify min/max of min/max with same operands (PR35607) This is the original integer variant requested in: https://bugs.llvm.org/show_bug.cgi?id=35607 As noted in the TODO and several similar TODOs around this block, we could do this in instsimplify, but then it would cost more because we would be trying to match min/max via ValueTracking in 2 different places. There are 4 commuted variants for each of smin/smax/umin/umax that are not matched here. There are also icmp predicate variants that are not included in the affected test file because they are already handled by instsimplify by folding the final icmp to true/false. https://rise4fun.com/Alive/3KVc Name: smax(smax, smin) %c1 = icmp slt i32 %x, %y %c2 = icmp slt i32 %y, %x %min = select i1 %c1, i32 %x, i32 %y %max = select i1 %c2, i32 %x, i32 %y %c3 = icmp sgt i32 %max, %min %r = select i1 %c3, i32 %max, i32 %min => %r = %max Name: smin(smax, smin) %c1 = icmp slt i32 %x, %y %c2 = icmp slt i32 %y, %x %min = select i1 %c1, i32 %x, i32 %y %max = select i1 %c2, i32 %x, i32 %y %c3 = icmp sgt i32 %max, %min %r = select i1 %c3, i32 %min, i32 %max => %r = %min Name: umax(umax, umin) %c1 = icmp ult i32 %x, %y %c2 = icmp ult i32 %y, %x %min = select i1 %c1, i32 %x, i32 %y %max = select i1 %c2, i32 %x, i32 %y %c3 = icmp ult i32 %min, %max %r = select i1 %c3, i32 %max, i32 %min => %r = %max Name: umin(umax, umin) %c1 = icmp ult i32 %x, %y %c2 = icmp ult i32 %y, %x %min = select i1 %c1, i32 %x, i32 %y %max = select i1 %c2, i32 %x, i32 %y %c3 = icmp ult i32 %min, %max %r = select i1 %c3, i32 %min, i32 %max => %r = %min llvm-svn: 369386	2019-08-20 13:39:17 +00:00
Fangrui Song	f182617352	[Attributor] Fix -Wunused-variable in -DLLVM_ENABLE_ASSERTIONS=off builds after r369331 llvm-svn: 369334	2019-08-20 07:21:43 +00:00
Johannes Doerfert	12cbbab9d9	[Attributor] Create abstract attributes on-demand Before, we create the set of abstract attributes initially and then dealt with the fact hat a lookup could fail, e.g., return a nullptr. This patch will ensure we always return a valid object from a lookup, allowing us not only to remove the nullptr checks but also to grow the set of abstract attributes "in-flight" on-demand. One can now start from those that have the best chance of improving performance without the need to specify all they might depend on. While this introduces some boilerplate, the usage of attributes is much easier and cleaner now. Reviewers: uenoku, sstefan1 Subscribers: hiraditya, bollu, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66276 llvm-svn: 369331	2019-08-20 06:15:50 +00:00
Johannes Doerfert	169af994bc	[Attributor][NFC] Cleanup statistics code llvm-svn: 369330	2019-08-20 06:09:56 +00:00
Johannes Doerfert	cfcca1a5b1	[Attributor] Use structured deduction for AADereferenceable Summary: This is analogous to D66128 but for AADereferenceable. We have the logic concentrated in the floating value updateImpl and we use the combiner helper classes for arguments and return values. The regressions will go away with "on-demand" attribute creation. Improvements are already visible in the existing tests. Reviewers: uenoku, sstefan1 Subscribers: hiraditya, bollu, jfb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66272 llvm-svn: 369329	2019-08-20 06:08:35 +00:00
Johannes Doerfert	b9b8791fed	[Attributor] Use structured deduction for AANonNull Summary: What D66126 did for AAAlign, this patch does for AANonNull. Agian, the logic becomes more concise and localized. Again, returned poiners are not annotated properly but that will not be an issue if this lands with the "on-demand" generation of attributes. First improvements due to the genericValueTraversal are already visible. Reviewers: sstefan1, uenoku Subscribers: hiraditya, bollu, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66128 llvm-svn: 369328	2019-08-20 06:02:39 +00:00
Johannes Doerfert	028b2aa56a	[Attributor] Fix the "clamp" operator The clamp operator should not take the known of the given state as the known is potentially based on assumed information. This also adds TODOs to guide improvements. llvm-svn: 369327	2019-08-20 05:57:01 +00:00
Dinar Temirbulatov	081c57989e	[SLP][NFC] Avoid repetitive calls to getSameOpcode() We can avoid repetitive calls getSameOpcode() for already known tree elements by keeping MainOp and AltOp in TreeEntry. Differential Revision: https://reviews.llvm.org/D64700 llvm-svn: 369315	2019-08-20 00:22:04 +00:00
Johannes Doerfert	de7674ce76	Recommit "[Attributor] Fix: Do not partially resolve returned calls." This reverts commit `b1752f670f`. Fixed the issue with a different commit, reapply this one as it was, afaik, not broken. llvm-svn: 369303	2019-08-19 21:35:31 +00:00
Evgeniy Stepanov	55ccd16354	Refactor isPointerOffset (NFC). Summary: Simplify the API using Optional<> and address comments in https://reviews.llvm.org/D66165 Reviewers: vitalybuka Subscribers: hiraditya, llvm-commits, ostannard, pcc Tags: #llvm Differential Revision: https://reviews.llvm.org/D66317 llvm-svn: 369300	2019-08-19 21:08:04 +00:00
Johannes Doerfert	056f1b5cc7	Re-apply fixed "[Attributor] Fix: Make sure we set the changed flag" This reverts commit `cedd0d9a6e`. Re-apply the original commit but make sure the variables are initialized (even if they are not used) so UBSan is not complaining. llvm-svn: 369294	2019-08-19 19:14:10 +00:00
Alina Sbirlea	1a3fdaf6a6	[MemorySSA] Rename uses when inserting memory uses. Summary: When inserting uses from outside the MemorySSA creation, we don't normally need to rename uses, based on the assumption that there will be no inserted Phis (if Def existed that required a Phi, that Phi already exists). However, when dealing with unreachable blocks, MemorySSA will optimize away Phis whose incoming blocks are unreachable, and these Phis end up being re-added when inserting a Use. There are two potential solutions here: 1. Analyze the inserted Phis and clean them up if they are unneeded (current method for cleaning up trivial phis does not cover this) 2. Leave the Phi in place and rename uses, the same way as whe inserting defs. This patch use approach 2. Resolves first test in PR42940. Reviewers: george.burgess.iv Subscribers: Prazek, sanjoy.google, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66033 llvm-svn: 369291	2019-08-19 18:57:40 +00:00
Sanjay Patel	b38bac3699	[SLP] reduce duplicated code; NFC llvm-svn: 369250	2019-08-19 11:39:56 +00:00
David L. Jones	cedd0d9a6e	Revert [Attributor] Fix: Make sure we set the changed flag This reverts r369159 (git commit `cbaf1fdea2`) r369160 caused a test to fail under UBSAN. See thread on llvm-commits. llvm-svn: 369241	2019-08-19 08:00:08 +00:00
David L. Jones	b1752f670f	Revert [Attributor] Fix: Do not partially resolve returned calls. This reverts r369160 (git commit `f72d9b1c97`) r369160 caused some tests to fail under UBSAN. See thread on llvm-commits. llvm-svn: 369236	2019-08-19 07:16:24 +00:00
Roman Lebedev	9b957d3321	[InstCombine] Cherry-pick NFC cleanups of foldShiftIntoShiftInAnotherHandOfAndInICmp() from D66383 llvm-svn: 369207	2019-08-18 12:26:33 +00:00
Alina Sbirlea	f92109dc01	[MemorySSA] Loop passes should mark MSSA preserved when available. This patch applies only to the new pass manager. Currently, when MSSA Analysis is available, and pass to each loop pass, it will be preserved by that loop pass. Hence, mark the analysis preserved based on that condition, vs the current `EnableMSSALoopDependency`. This leaves the global flag to affect only the entry point in the loop pass manager (in FunctionToLoopPassAdaptor). llvm-svn: 369181	2019-08-17 01:02:12 +00:00
Sanjay Patel	a53ad0e157	Revert r367891 - "[InstCombine] combine mul+shl separated by zext" This reverts commit `5dbb90bfe1`. As noted in the post-commit thread for r367891, this can create a multiply that is lowered to a libcall that may not exist. We need to improve the backend decomposition for integer multiply before trying to re-land this (if it's still worthwhile after doing the backend work). llvm-svn: 369174	2019-08-16 23:36:28 +00:00
Jian Cai	16fa8b0970	Reland "[ARM] push LR before __gnu_mcount_nc" This relands r369147 with fixes to unit tests. https://reviews.llvm.org/D65019 llvm-svn: 369173	2019-08-16 23:30:16 +00:00
Johannes Doerfert	f72d9b1c97	[Attributor] Fix: Do not partially resolve returned calls. By partially resolving returned calls we did not record that they were not fully resolved which caused odd behavior down the line. We could also end up with some, but not all, returned values of the callee in the returned values map of the caller, another odd behavior we want to avoid. llvm-svn: 369160	2019-08-16 21:59:52 +00:00
Johannes Doerfert	cbaf1fdea2	[Attributor] Fix: Make sure we set the changed flag The flag was updated before we actually run the visitor callback so we might miss updates. llvm-svn: 369159	2019-08-16 21:55:01 +00:00
Johannes Doerfert	6dedc78d9d	[Attributor] Add all missing attribute definitions/symbols As a preparation to "on-demand" abstract attribute generation we need implementations for all attributes (as they can be queried and then created on-demand where we now fail to find one). Reviewers: uenoku, sstefan1 Subscribers: hiraditya, bollu, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66129 llvm-svn: 369155	2019-08-16 21:31:11 +00:00
Jian Cai	2d957cfe02	Revert "[ARM] push LR before __gnu_mcount_nc" This reverts commit `f4cf3b9593`. llvm-svn: 369149	2019-08-16 20:40:21 +00:00
Jian Cai	f4cf3b9593	[ARM] push LR before __gnu_mcount_nc Push LR register before calling __gnu_mcount_nc as it expects the value of LR register to be the top value of the stack on ARM32. Differential Revision: https://reviews.llvm.org/D65019 llvm-svn: 369147	2019-08-16 20:21:08 +00:00
Johannes Doerfert	234eda563d	[Attributor] Towards a more structured deduction pattern Summary: This is the first commit aiming to structure the attribute deduction. The base idea is that we have default propagation patterns as listed below on top of which we can add specific, e.g., context sensitive, logic. Deduction patterns used in this patch: - argument states are determined from call site argument states, see AAAlignArgument and AAArgumentFromCallSiteArguments. - call site argument states are determined as if they were floating values, see AAAlignCallSiteArgument and AAAlignFloating. - floating value states are determined by traversing the def-use chain and combining the states determined for the leaves, see AAAlignFloating and genericValueTraversal. - call site return states are determined from function return states, see AAAlignCallSiteReturned and AACallSiteReturnedFromReturned. - function return states are determined from returned value states, see AAAlignReturned and AAReturnedFromReturnedValues. Through this strategy all logic for alignment is concentrated in the AAAlignFloating::updateImpl method. Note: This commit works on its own but is part of a larger change that involves "on-demand" creation of abstract attributes that will participate in the fixpoint iteration. Without this part, we sometimes do not have an AAAlign abstract attribute to query, loosing information we determined before. All tests have appropriate FIXMEs and the information will be recovered once we added all parts. Reviewers: sstefan1, uenoku Subscribers: hiraditya, bollu, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66126 llvm-svn: 369144	2019-08-16 19:51:23 +00:00
Johannes Doerfert	66cf87e290	[Attributor][NFC] Introduce aliases for call site attributes Until we have call site specific liveness and/or value information there is no need to do call site specific deduction. Though, we need the symbols in follow up patches that make Attributor::getAAFor return a reference. llvm-svn: 369143	2019-08-16 19:49:00 +00:00
Johannes Doerfert	fe6dbadc0d	[Attributor] Introduce initialize calls and move code to keep attributes concise Summary: This patch should not change the behavior except that the added initialize methods might indicate an optimistic fixpoint earlier. The code movement is done to keep the attribute definitions in a single block where it makes sense. No functional changes intended there. Reviewers: uenoku, sstefan1 Subscribers: hiraditya, bollu, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66258 llvm-svn: 369142	2019-08-16 19:36:17 +00:00
Sanjay Patel	39eb2324f7	[InstCombine] canonicalize a scalar-select-of-vectors to vector select This pattern may arise more frequently with an enhancement to SLP vectorization suggested in PR42755: https://bugs.llvm.org/show_bug.cgi?id=42755 ...but we should handle this pattern to make things easier for the backend either way. For all in-tree targets that I looked at, codegen for typical vector sizes looks better when we change to a vector select, so this is safe to do without a cost model (in other words, as a target-independent canonicalization). For example, if the condition of the select is a scalar, we end up with something like this on x86: vpcmpgtd %xmm0, %xmm1, %xmm0 vpextrb $12, %xmm0, %eax testb $1, %al jne LBB0_2 ## %bb.1: vmovaps %xmm3, %xmm2 LBB0_2: vmovaps %xmm2, %xmm0 Rather than the splat-condition variant: vpcmpgtd %xmm0, %xmm1, %xmm0 vpshufd $255, %xmm0, %xmm0 ## xmm0 = xmm0[3,3,3,3] vblendvps %xmm0, %xmm2, %xmm3, %xmm0 Differential Revision: https://reviews.llvm.org/D66095 llvm-svn: 369140	2019-08-16 18:51:30 +00:00
Vasileios Porpodas	1d254f3dae	[SLPVectorizer] Make the scheduler aware of the TreeEntry operands. Summary: The scheduler's dependence graph gets the use-def dependencies by accessing the operands of the instructions in a bundle. However, buildTree_rec() may change the order of the operands in TreeEntry, and the scheduler is currently not aware of this. This is not causing any functional issues currently, because reordering is restricted to the operands of a single instruction. Once we support operand reordering across multiple TreeEntries, as shown here: http://www.llvm.org/devmtg/2019-04/slides/Poster-Porpodas-Supernode_SLP.pdf , the scheduler will need to get the correct operands from TreeEntry and not from the individual instructions. In short, this patch: - Connects the scheduler's bundle with the corresponding TreeEntry. It introduces new TE and Lane fields in ScheduleData. - Moves the location where the operands of the TreeEntry are initialized. This used to take place in newTreeEntry() setting one operand at a time, but is now moved pre-order just before the recursion of buildTree_rec(). This is required because the scheduler needs to access both operands of the TreeEntry in tryScheduleBundle(). - Updates the scheduler to access the instruction operands through the TreeEntry operands instead of accessing the instruction operands directly. Reviewers: ABataev, RKSimon, dtemirbulatov, Ayal, dorit, hfinkel Reviewed By: ABataev Subscribers: hiraditya, llvm-commits, lebedev.ri, rcorcs Tags: #llvm Differential Revision: https://reviews.llvm.org/D62432 llvm-svn: 369131	2019-08-16 17:21:18 +00:00
Evandro Menezes	05e9c2ac2e	[InstCombine] Simplify pow(2.0, itofp(y)) to ldexp(1.0, y) Simplify `pow(2.0, itofp(y))` to `ldexp(1.0, y)`. Differential revision: https://reviews.llvm.org/D65979 llvm-svn: 369120	2019-08-16 15:33:41 +00:00
Roman Lebedev	16244fccfe	[InstCombine] Shift amount reassociation in bittest: trunc-of-shl (PR42399) Summary: This is continuation of D63829 / https://bugs.llvm.org/show_bug.cgi?id=42399 I thought naive pattern would solve my issue, but nope, it involved truncation, thus more folds needed.. This isn't really the fold i'm interested in, i need trunc-of-lshr, but i'we decided to start with `shl` because it's simpler. In this case, no extra legality checks are needed: https://rise4fun.com/Alive/CAb We should be careful about not increasing instruction count, since we need to produce `zext` because `and` is done in wider type. Reviewers: spatel, nikic, xbolva00 Reviewed By: spatel Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66057 llvm-svn: 369117	2019-08-16 15:10:41 +00:00
Simon Pilgrim	59894d4668	[SLPVectorizer] Silence null dereference warning. NFCI. cppcheck + MSVC analyzer both over zealously warn that we might dereference a null Bundle pointer - add an assertion to check for null to silence the warning, plus its a good idea to check that we succeeded in finding a schedule bundle anyway.... llvm-svn: 369094	2019-08-16 10:28:23 +00:00
Evgeniy Stepanov	75344955fc	Move isPointerOffset function to ValueTracking (NFC). Summary: To be reused in MemTag sanitizer. Reviewers: pcc, vitalybuka, ostannard Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66165 llvm-svn: 369062	2019-08-15 22:58:28 +00:00
Jonas Devlieghere	0eaee545ee	[llvm] Migrate llvm::make_unique to std::make_unique Now that we've moved to C++14, we no longer need the llvm::make_unique implementation from STLExtras.h. This patch is a mechanical replacement of (hopefully) all the llvm::make_unique instances across the monorepo. llvm-svn: 369013	2019-08-15 15:54:37 +00:00
Dorit Nuzman	d57d73daed	[LV] fold-tail predication should be respected even with assume_safety assume_safety implies that loads under "if's" can be safely executed speculatively (unguarded, unmasked). However this assumption holds only for the original user "if's", not those introduced by the compiler, such as the fold-tail "if" that guards us from loading beyond the original loop trip-count. Currently the combination of fold-tail and assume-safety pragmas results in ignoring the fold-tail predicate that guards the loads, generating unmasked loads. This patch fixes this behavior. Differential Revision: https://reviews.llvm.org/D66106 Reviewers: Ayal, hsaito, fhahn llvm-svn: 368973	2019-08-15 07:12:14 +00:00
Gor Nishanov	efe0093404	[coroutine] Fixes "cannot move instruction since its users are not dominated by CoroBegin" problem. Summary: Fixes https://bugs.llvm.org/show_bug.cgi?id=36578 and https://bugs.llvm.org/show_bug.cgi?id=36296. Supersedes: https://reviews.llvm.org/D55966 One of the fundamental transformation that CoroSplit pass performs before splitting the coroutine is to find which values need to survive between suspend and resume and provide a slot for them in the coroutine frame to spill and restore the value as needed. Coroutine frame becomes available once the storage for it was allocated and that point is marked in the pre-split coroutine with a llvm.coro.begin intrinsic. FE normally puts all of the user-authored code that would be accessing those values after llvm.coro.begin, however, sometimes instructions accessing those values would end up prior to coro.begin. For example, writing out a value of the parameter into the alloca done by the FE or instructions that are added by the optimization passes such as SROA when it rewrites allocas. Prior to this change, CoroSplit pass would try to move instructions that may end up accessing the values in the coroutine frame after CoroBegin. However it would run into problems (report_fatal_error) if some of the values would be used both in the allocation function (for example allocator is passed as a parameter to a coroutine) and in the use-authored body of the coroutine. To handle this case and to simplify the instruction moving logic, this change removes all of the instruction moving. Instead, we only change the uses of the spilled values that are dominated by coro.begin and leave other instructions intact. Before: ``` %var = alloca i32 %1 = getelementptr .. %var; ; will move this one after coro.begin %f = call i8* @llvm.coro.begin( ``` After: ``` %var = alloca i32 %1 = getelementptr .. %var; stays put %f = call i8* @llvm.coro.begin( ``` If we discover that there is a potential write into an alloca, prior to coro.begin we would copy its value from the alloca into the spill slot in the coroutine frame. Before: ``` %var = alloca i32 store .. %var ; will move this one after coro.begin %f = call i8* @llvm.coro.begin( ``` After: ``` %var = alloca i32 store .. %var ;stays put %f = call i8* @llvm.coro.begin( %tmp = load %var store %tmp, %spill.slot.for.var ``` Note: This change does not handle array allocas as that is something that C++ FE does not produce, but, it can be added in the future if need arises Reviewers: llvm-commits, modocache, ben-clayton, tks2103, rjmccall Reviewed By: modocache Subscribers: bartdesmet Differential Revision: https://reviews.llvm.org/D66230 llvm-svn: 368949	2019-08-15 00:48:51 +00:00
Johannes Doerfert	54f6be7b83	[Attributor] Try to fix "missing field 'RetInsts' initializer" warning http://lab.llvm.org:8011/builders/lld-x86_64-darwin13/builds/35674/steps/build_Lld/logs/stdio llvm-svn: 368938	2019-08-14 22:32:29 +00:00
Johannes Doerfert	5304b72a81	[Attributor][NFC] Make debug output consistent llvm-svn: 368931	2019-08-14 22:04:28 +00:00
Philip Reames	7b0515176b	[SCEV] Rename getMaxBackedgeTakenCount to getConstantMaxBackedgeTakenCount [NFC] llvm-svn: 368930	2019-08-14 21:58:13 +00:00
Johannes Doerfert	4395b31d99	[Attributor][NFC] Try to eliminate warnings (debug build + fall through) llvm-svn: 368928	2019-08-14 21:46:28 +00:00
Johannes Doerfert	17b578bc75	[Attributor][NFC] Introduce statistics macros for new positions llvm-svn: 368927	2019-08-14 21:46:25 +00:00
Johannes Doerfert	e1e844d6b0	[Attributor][NFC] Add merge/join/clamp operators to the IntegerState Differential Revision: https://reviews.llvm.org/D66146 llvm-svn: 368925	2019-08-14 21:35:20 +00:00
Johannes Doerfert	6a1274a52e	[Attributor] Use the AANoNull attribute directly in AADereferenceable Summary: Instead of constantly keeping track of the nonnull status with the dereferenceable information we can simply query the nonnull attribute whenever we need the information (debug + manifest). Reviewers: sstefan1, uenoku Subscribers: hiraditya, bollu, jfb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66113 llvm-svn: 368924	2019-08-14 21:31:32 +00:00
Johannes Doerfert	def9928204	[Attributor] Use liveness during the creation of AAReturnedValues Summary: As one of the first attributes, and one of the complex ones, AAReturnedValues was not using liveness but we filtered the result after the fact. This change adds liveness usage during the creation. The algorithm is also improved and shorter. The new algorithm will collect returned values over time using the generic facilities that work with liveness already, e.g., genericValueTraversal which does not look at dead PHI node predecessors. A test to show how this leads to better results is included. Note: Unresolved calls and resolved calls are now tracked explicitly. Reviewers: uenoku, sstefan1 Subscribers: hiraditya, bollu, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66120 llvm-svn: 368922	2019-08-14 21:29:37 +00:00
Johannes Doerfert	9a1a1f96d9	[Attributor] Do not update or manifest dead attributes Summary: If the associated context instruction is assumed dead we do not need to update or manifest the state. Reviewers: sstefan1, uenoku Subscribers: hiraditya, bollu, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66116 llvm-svn: 368921	2019-08-14 21:25:08 +00:00
Johannes Doerfert	710ebb03ed	[Attributor] Use IRPosition consistently Summary: The next attempt to clean up the Attributor interface before we grow it further. Before, we used a combination of two values (associated + anchor) and an argument number (or -1) to determine a location. This was very fragile. The new system uses exclusively IR positions and we restrict the generation of IR positions to special constructor methods that verify internal constraints we have. This will catch misuse early. The auto-conversion, e.g., in getAAFor, is now performed through the SubsumingPositionIterator. This iterator takes an IR position and allows to visit all IR positions that "subsume" the given one, e.g., function attributes "subsume" argument attributes of that function. For a detailed breakdown see the class comment of SubsumingPositionIterator. This patch also introduces the IRPosition::getAttrs() to extract IR attributes at a certain position. The method knows how to look up in different positions that are equivalent, e.g., the argument position for call site arguments. We also introduce three new positions kinds such that we have all IR positions where attributes can be placed and one for "floating" values. Reviewers: sstefan1, uenoku Subscribers: hiraditya, bollu, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65977 llvm-svn: 368919	2019-08-14 21:18:01 +00:00
Dinar Temirbulatov	da0435a690	[SLP][NFC] Use pointers to address to ScalarToTreeEntry elements, instead of indexes. llvm-svn: 368906	2019-08-14 19:46:50 +00:00
Philip Reames	6cca3ad43e	[RLEV] Rewrite loop exit values for multiple exit loops w/o overall loop exit count We already supported rewriting loop exit values for multiple exit loops, but if any of the loop exits were not computable, we gave up on all loop exit values. This patch generalizes the existing code to handle individual computable loop exits where possible. As discussed in the review, this is a starting point for figuring out a better API. The code is a bit ugly, but getting it in lets us test as we go. Differential Revision: https://reviews.llvm.org/D65544 llvm-svn: 368898	2019-08-14 18:27:57 +00:00
Matt Arsenault	dbc1f207fa	InferAddressSpaces: Move target intrinsic handling to TTI I'm planning on handling intrinsics that will benefit from checking the address space enums. Don't bother moving the address collection for now, since those won't need th enums. llvm-svn: 368895	2019-08-14 18:13:00 +00:00
Matt Arsenault	0eac2a2963	InferAddressSpaces: Remove unnecessary check for ConstantInt The IR is invalid if this isn't a constant since immarg was added. llvm-svn: 368893	2019-08-14 18:01:42 +00:00
David Bolvansky	f94460d4b6	[SLC] Dereferenceable annonation - handle valid null pointers Reviewers: jdoerfert, reames Reviewed By: jdoerfert Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66161 llvm-svn: 368884	2019-08-14 17:15:20 +00:00
David Bolvansky	0e0fbae1a4	[BuildLibCalls] Noalias annotation Summary: I think this is better solution than annotating callsites in IC/SLC. Reviewers: jdoerfert Reviewed By: jdoerfert Subscribers: MaskRay, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66217 llvm-svn: 368875	2019-08-14 16:50:06 +00:00
Bill Wendling	cc2bebe039	Ignore indirect branches from callbr. Summary: We can't speculate around indirect branches: indirectbr and invoke. The callbr instruction needs to be included here. Reviewers: nickdesaulniers, manojgupta, chandlerc Reviewed By: chandlerc Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66200 llvm-svn: 368873	2019-08-14 16:44:07 +00:00
Simon Pilgrim	828a89e244	Fix "not all control paths return a value" MSVC warnings. NFCI. llvm-svn: 368831	2019-08-14 11:31:05 +00:00
Simon Pilgrim	3f40bdb558	Fix "not all control paths return a value" MSVC warning. NFCI. llvm-svn: 368830	2019-08-14 11:29:56 +00:00
Simon Pilgrim	8bba4798c2	Fix "not all control paths return a value" MSVC warnings. NFCI. llvm-svn: 368829	2019-08-14 11:29:16 +00:00
Roman Lebedev	32f1e1a01d	[InstCombine] Refactor getFlippedStrictnessPredicateAndConstant() out of canonicalizeCmpWithConstant(), NFCI I'd like to use it elsewhere, hopefully without reinventing the wheel. No functional change intended so far. llvm-svn: 368820	2019-08-14 09:57:20 +00:00
Dorit Nuzman	491ca2425d	[LV] Fold-tail flag This is the compiler-flag equivalent of the Predicate pragma (https://reviews.llvm.org/D65197), to direct the vectorizer to fold the remainder-loop into the main-loop using predication. Differential Revision: https://reviews.llvm.org/D66108 Reviewers: Ayal, hsaito, fhahn, SjoerdMeije llvm-svn: 368801	2019-08-14 05:22:20 +00:00
David L. Jones	d4edd9d97e	Revert '[LICM] Make Loop ICM profile aware' and 'Fix pass dependency for LICM' This reverts r368526 (git commit `7e71aa24bc`) This reverts r368542 (git commit `cb5a90fd31`) llvm-svn: 368800	2019-08-14 04:50:33 +00:00
John McCall	a318c55073	Coroutines: adjust for SVN r358739 CallSite has been removed in favour of CallBase. Adjust the coroutine split to account for that. llvm-svn: 368798	2019-08-14 03:54:25 +00:00
John McCall	3bbf207fbc	Don't run a full verifier pass in coro-splitting's private pipeline. Potentially addresses rdar://49022293. llvm-svn: 368797	2019-08-14 03:54:18 +00:00
John McCall	5f60b68c68	Remove unreachable blocks before splitting a coroutine. The suspend-crossing algorithm is not correct in the presence of uses that cannot be reached on some successor path from their defs. llvm-svn: 368796	2019-08-14 03:54:13 +00:00
John McCall	2133feec93	Support swifterror in coroutine lowering. The support for swifterror allocas should work in all lowerings. The support for swifterror arguments only really works in a lowering with prototypes where you can ensure that the prototype also has a swifterror argument; I'm not really sure how it could possibly be made to work in the switch lowering. llvm-svn: 368795	2019-08-14 03:54:05 +00:00

1 2 3 4 5 ...

22423 Commits