While the original implementation added in D85787 / ae7f08812e
is not incorrect, it is known to be suboptimal.
In particular, while it is not incorrect to use the basic block
in which the original `insertvalue` instruction is located
as the merge point, that is not necessarily optimal,
as `@test6` shows.
We should look at all the AggElts, and, if they are all defined
in the same basic block, then that is the basic block we should use.
On the RawSpeed library, this catches +4% (+50) more cases.
On the vanilla LLVM test-suite, this catches +12% (+92) more cases.
In a following patch, UseBB will be detected later,
so capturing it is potentially error-prone (capture by ref vs by val).
Also, a parametrized UseBB will likely be needed
for multiple levels of PHI indirection later on anyway.
This is NFC at the moment, because right now we always insert the PHI
into the same basic block in which the original `insertvalue` instruction
is, but that will change.
Also fixes the addition of the suffix to the value names.
isWriteAtEndOfFunction needs to check all memory uses of Def, which is
much more expensive than getting the underlying objects in practice.
Switch the call order, as recommended by the TODO, which was added as
per an earlier review.
This shaves off a bit of compile-time.
Currently the code does not account for the fact that getDomMemoryDef
can be called with ScanLimit == 0, if we reached the limit while
processing an earlier access. Also tighten the check a bit more and bump
the scan limit now that it is handled properly.
In some cases, this brings a 2x speedup in terms of compile-time.
With gcc 6.3.0, I hit the following compilation bug.
../lib/Transforms/InstCombine/InstCombineVectorOps.cpp:937:2: error: extra ‘;’ [-Werror=pedantic]
};
^
cc1plus: all warnings being treated as errors
The error is introduced by Commit ae7f08812e ("[InstCombine]
Aggregate reconstruction simplification (PR47060)")
This pattern happens in clang C++ exception lowering code, on the unwind branch.
We end up having a `landingpad` block after each `invoke`, where RAII
cleanup is performed, and the elements of an aggregate `{i8*, i32}`
holding exception info are `extractvalue`'d, and we then branch to a common block
that takes the extracted `i8*` and `i32` elements (via `phi` nodes),
forms a new aggregate, and finally `resume`s the exception.
The problem is that, if the cleanup block is effectively empty,
it shouldn't be there at all: there should be no `landingpad` and `resume`,
and said `invoke` should be a plain `call`.
Indeed, we do that simplification in e.g. SimplifyCFG `SimplifyCFGOpt::simplifyResume()`.
But the thing is, all this extra `extractvalue` + `phi` + `insertvalue` cruft,
while it is pointless, does not look like "empty cleanup block".
So `SimplifyCFGOpt::simplifyResume()` fails, and the exception has a
higher cost than it could have on the unwind branch :S
This doesn't happen *that* often, but it will basically happen once per C++
function with a complex CFG that calls more than one other function
that isn't known to be `nounwind`.
I think this is a missing fold in InstCombine, so I've implemented it.
I think the algorithm/implementation is rather self-explanatory:
1. Find a chain of `insertvalue`'s that fully tell us the initializer of the aggregate.
2. For each element, try to find from which aggregate it was extracted.
If it was extracted from the aggregate with identical type,
from identical element index, great.
3. If all elements were found to have been extracted from the same aggregate,
then we can just use said original source aggregate directly,
instead of re-creating it.
4. If we fail to find said aggregate when looking only in the current block,
we need to be PHI-aware: we might have a different source aggregate when coming
from each predecessor (see the sketch after this list).
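As a hedged illustration (hypothetical IR, not taken from the actual tests), the simplest single-block case looks like this:
```
%e0 = extractvalue { i8*, i32 } %agg, 0
%e1 = extractvalue { i8*, i32 } %agg, 1
%t  = insertvalue { i8*, i32 } undef, i8* %e0, 0
%r  = insertvalue { i8*, i32 } %t, i32 %e1, 1
; all elements come from %agg at matching indices, so %r can simply be
; replaced by %agg
```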
I'm not sure if this already handles everything, and there are some FIXMEs;
I'll deal with all that later in follow-ups.
I'd be fine with going with post-commit review here code-wise,
but just in case there are thoughts, I'm posting this.
On RawSpeed, for example, this has the following effect:
```
| statistic name | baseline | proposed | Δ | % | abs(%) |
|---------------------------------------------------|---------:|---------:|------:|--------:|-------:|
| instcombine.NumAggregateReconstructionsSimplified | 0 | 1253 | 1253 | 0.00% | 0.00% |
| simplifycfg.NumInvokes | 948 | 1355 | 407 | 42.93% | 42.93% |
| instcount.NumInsertValueInst | 4382 | 3210 | -1172 | -26.75% | 26.75% |
| simplifycfg.NumSinkCommonCode | 574 | 458 | -116 | -20.21% | 20.21% |
| simplifycfg.NumSinkCommonInstrs | 1154 | 921 | -233 | -20.19% | 20.19% |
| instcount.NumExtractValueInst | 29017 | 26397 | -2620 | -9.03% | 9.03% |
| instcombine.NumDeadInst | 166618 | 174705 | 8087 | 4.85% | 4.85% |
| instcount.NumPHIInst | 51526 | 50678 | -848 | -1.65% | 1.65% |
| instcount.NumLandingPadInst | 20865 | 20609 | -256 | -1.23% | 1.23% |
| instcount.NumInvokeInst | 34023 | 33675 | -348 | -1.02% | 1.02% |
| simplifycfg.NumSimpl | 113634 | 114708 | 1074 | 0.95% | 0.95% |
| instcombine.NumSunkInst | 15030 | 14930 | -100 | -0.67% | 0.67% |
| instcount.TotalBlocks | 219544 | 219024 | -520 | -0.24% | 0.24% |
| instcombine.NumCombined | 644562 | 645805 | 1243 | 0.19% | 0.19% |
| instcount.TotalInsts | 2139506 | 2135377 | -4129 | -0.19% | 0.19% |
| instcount.NumBrInst | 156988 | 156821 | -167 | -0.11% | 0.11% |
| instcount.NumCallInst | 1206144 | 1207076 | 932 | 0.08% | 0.08% |
| instcount.NumResumeInst | 5193 | 5190 | -3 | -0.06% | 0.06% |
| asm-printer.EmittedInsts | 948580 | 948299 | -281 | -0.03% | 0.03% |
| instcount.TotalFuncs | 11509 | 11507 | -2 | -0.02% | 0.02% |
| inline.NumDeleted | 97595 | 97597 | 2 | 0.00% | 0.00% |
| inline.NumInlined | 210514 | 210522 | 8 | 0.00% | 0.00% |
```
So we manage to increase the number of `invoke` -> `call` conversions in SimplifyCFG by almost a half,
and there is a very apparent decrease in instruction and basic block count.
On vanilla llvm-test-suite:
```
| statistic name | baseline | proposed | Δ | % | abs(%) |
|---------------------------------------------------|---------:|---------:|------:|--------:|-------:|
| instcombine.NumAggregateReconstructionsSimplified | 0 | 744 | 744 | 0.00% | 0.00% |
| instcount.NumInsertValueInst | 2705 | 2053 | -652 | -24.10% | 24.10% |
| simplifycfg.NumInvokes | 1212 | 1424 | 212 | 17.49% | 17.49% |
| instcount.NumExtractValueInst | 21681 | 20139 | -1542 | -7.11% | 7.11% |
| simplifycfg.NumSinkCommonInstrs | 14575 | 14361 | -214 | -1.47% | 1.47% |
| simplifycfg.NumSinkCommonCode | 6815 | 6743 | -72 | -1.06% | 1.06% |
| instcount.NumLandingPadInst | 14851 | 14712 | -139 | -0.94% | 0.94% |
| instcount.NumInvokeInst | 27510 | 27332 | -178 | -0.65% | 0.65% |
| instcombine.NumDeadInst | 1438173 | 1443371 | 5198 | 0.36% | 0.36% |
| instcount.NumResumeInst | 2880 | 2872 | -8 | -0.28% | 0.28% |
| instcombine.NumSunkInst | 55187 | 55076 | -111 | -0.20% | 0.20% |
| instcount.NumPHIInst | 321366 | 320916 | -450 | -0.14% | 0.14% |
| instcount.TotalBlocks | 886816 | 886493 | -323 | -0.04% | 0.04% |
| instcount.TotalInsts | 7663845 | 7661108 | -2737 | -0.04% | 0.04% |
| simplifycfg.NumSimpl | 886791 | 887171 | 380 | 0.04% | 0.04% |
| instcount.NumCallInst | 553552 | 553733 | 181 | 0.03% | 0.03% |
| instcombine.NumCombined | 3200512 | 3201202 | 690 | 0.02% | 0.02% |
| instcount.NumBrInst | 741794 | 741656 | -138 | -0.02% | 0.02% |
| simplifycfg.NumHoistCommonInstrs | 14443 | 14445 | 2 | 0.01% | 0.01% |
| asm-printer.EmittedInsts | 7978085 | 7977916 | -169 | 0.00% | 0.00% |
| inline.NumDeleted | 73188 | 73189 | 1 | 0.00% | 0.00% |
| inline.NumInlined | 291959 | 291968 | 9 | 0.00% | 0.00% |
```
Roughly similar effect; fewer instructions and blocks in total.
See also: rGe492f0e03b01a5e4ec4b6333abb02d303c3e479e.
Compile-time wise, this appears to be roughly geomean-neutral:
http://llvm-compile-time-tracker.com/compare.php?from=39617aaed95ac00957979bc1525598c1be80e85e&to=b59866cf30420da8f8e3ca239ed3bec577b23387&stat=instructions
And this is a win size-wise in general:
http://llvm-compile-time-tracker.com/compare.php?from=39617aaed95ac00957979bc1525598c1be80e85e&to=b59866cf30420da8f8e3ca239ed3bec577b23387&stat=size-text
See https://bugs.llvm.org/show_bug.cgi?id=47060
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D85787
This change added a new inline advisor that takes optimization remarks from previous inlining as input, and provides the decision as advice so current inlining can replay inline decisions of a different compilation. Dwarf inline stack with line and discriminator is used as anchor for call sites including call context. The change can be useful for Inliner tuning as it provides a channel to allow external input for tweaking inline decisions. Existing alternatives like alwaysinline attribute is per-function, not per-callsite. Per-callsite inline intrinsic can be another solution (not yet existing), but it's intrusive to implement and also does not differentiate call context.
A switch -sample-profile-inline-replay=<inline_remarks_file> is added to hook up the new inline advisor with SampleProfileLoader's inline decision for replay. Since SampleProfileLoader does top-down inlining, inline decision can be specialized for each call context, hence we should be able to replay inlining accurately. However with a bottom-up inliner like CGSCC inlining, the replay can be limited due to lack of specialization for different call context. Apart from that limitation, the new inline advisor can still be used by regular CGSCC inliner later if needed for tuning purpose.
This is a resubmit of https://reviews.llvm.org/D83743
When adding elements while iterating, the iterator can be
invalidated, which could cause errors. This fixes the issue by using
indices instead of iterators.
This patch internalizes non-exact functions and replaces their uses
with the internalized version. Doing this enables the analysis of
non-exact functions.
We can do this because non-exact functions with the same name
whose linkage is `linkonce_odr` or `weak_odr` should have the same
semantics, so we can safely internalize and replace uses of them (the
result of the other version of this function should be the same).
Note that not all functions can be internalized, e.g., functions with
`linkonce` or `weak` linkage.
For now, when specified on the command line, we internalize all functions
that meet the requirements without calculating the cost of such
internalization.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D84167
This reverts commit 6dbf0cfcf7.
That commit caused failed assertions, e.g. like this:
$ cat sprintf-strcpy.c
char *ptr; void func(void) { ptr += sprintf(ptr, "%s", ""); }
$ clang -c sprintf-strcpy.c -O2 -target x86_64-linux-gnu
clang: ../lib/IR/Value.cpp:473: void llvm::Value::doRAUW(llvm::Value*,
llvm::Value::ReplaceMetadataUses): Assertion `New->getType() ==
getType() && "replaceAllUses of value with new value of different
type!"' failed.
Transformation creates big strings for big C values, so bail out for C > 128.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D86004
This would be a problem if the entire instrumented function was a call
to e.g. memcpy.
Use FnPrologueEnd Instruction* instead of ActualFnStart BB*.
Differential Revision: https://reviews.llvm.org/D86001
This allows us to add additional instrumentation before the function start,
without splitting the first BB.
Differential Revision: https://reviews.llvm.org/D85985
Have the front-end use the `nounwind` attribute on atomic libcalls.
This prevents us from seeing `invoke __atomic_load` in MSAN, which
is problematic as it has no successor for instrumentation to be added.
When getUserCost was transitioned to use an explicit CostKind,
TCK_CodeSize was used even though the original kind was implicitly
SizeAndLatency, so restore this behaviour. We now only query for
CodeSize when optimising for minsize.
I expect this to not change anything, as I think all targets will
currently return the same value for CodeSize and SizeAndLatency. Indeed,
I see no changes in the test suite for Arm, AArch64 and X86.
Differential Revision: https://reviews.llvm.org/D85829
This lets us support the scenario where a binary is linked from a mix
of object files with both instrumented and non-instrumented globals.
This is likely to occur on Android where the decision of whether to use
instrumented globals is based on the API level, which is user-facing.
Previously, in this scenario, it was possible for the comdat from
one of the object files with non-instrumented globals to be selected,
and since this comdat did not contain the note it would mean that the
note would be missing in the linked binary and the globals' shadow
memory would be left uninitialized, leading to a tag mismatch failure
at runtime when accessing one of the instrumented globals.
It is harmless to include the note when targeting a runtime that does
not support instrumenting globals because it will just be ignored.
Differential Revision: https://reviews.llvm.org/D85871
This is a fixup to commit 43bdac2906, to make sure the
address space from the original load pointer is retained in the
vector pointer.
Resolves problem with
Assertion `castIsValid(op, S, Ty) && "Invalid cast!"' failed.
due to address space mismatch.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D85912
InstCombine adds users of a transformed instruction to the working list to
process on the same iteration. However, gc.relocate may have a hidden
user (the next gc.relocate) which is connected through the gc.statepoint intrinsic, and
there is no direct def-use chain between them.
In this case, if the next gc.relocate has already been processed, it will not be added
to the worklist and will not be able to be processed on the same iteration.
Suppose we have the following case:
A = gc.relocate(null)
B = statepoint(A)
C = gc.relocate(B, hidden(A))
If C has already been considered, then after the replacement of A with null,
the statepoint instruction B will be added to the queue, but not C.
C can be processed only on the next iteration.
If the chain of relocations is long, many iterations may be required.
This change is to reduce the number of iterations needed to meet the latest changes
related to reducing the infinite loop threshold.
This is a quick (not best) fix. In follow-up patches I plan to move gc.relocate
handling into the statepoint handler. This should also help to remove unused gc live
entries in the statepoint bundle.
Reviewers: reames, dantrushin
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D75598
When removing instructions from unreachable blocks, and only debug info
intrinsics were removed, InstCombine could incorrectly return a false
Modified status.
This is fixed by making removeAllNonTerminatorAndEHPadInstructions()
also return how many debug info intrinsics that were removed, and take
that into account.
This was caught using the check introduced by D80916.
Reviewed By: majnemer
Differential Revision: https://reviews.llvm.org/D85839
We are re-using tryToMergePartialOverlappingStores, which requires
Earlier to dominate Later. In the long run,
tryToMergePartialOverlappingStores should be re-written using MemorySSA.
Fixes PR46513.
This is a retry of rL300977 which was reverted because of infinite loops.
We have fixed all of the known places where that would happen, but there's
still a chance that this patch will cause infinite loops.
This matches the demanded bits behavior in the DAG and should fix:
https://bugs.llvm.org/show_bug.cgi?id=32706
Differential Revision: https://reviews.llvm.org/D32255
These are not correctness issues.
In visitUDivOperand(), if the (potential) divisor is undef, then the udiv is
already UB, so it is not incorrect to keep undef as the shift amount.
But that is suboptimal.
We could instead simply drop that select, picking the other operand.
Afterwards, getLogBase2() could assert that there is no undef in the divisor.
While x*undef is undef, shift-by-undef is poison,
which we must avoid introducing.
Also log2(iN undef) is *NOT* iN undef, because log2(iN undef) u< N.
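A hedged sketch of the select-dropping part (hypothetical IR):
```
%d = select i1 %c, i32 undef, i32 8
%r = udiv i32 %x, %d
  =>
%r = lshr i32 %x, 3   ; if %c were true, the udiv would divide by undef anyway,
                      ; so we may pick the defined arm and emit the shift
```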
See https://bugs.llvm.org/show_bug.cgi?id=47133
Commit 9385aaa848 ("[sancov] Fix PR33732") added zeroext to
__sanitizer_cov_trace(_const)?_cmp[1248] parameters for x86_64 only,
however, it is useful on other targets, in particular, on SystemZ: it
fixes swap-cmp.test.
Therefore, use it on all targets. This is safe: if the target ABI does not
require zero extension for a particular parameter, zeroext is simply
ignored. A similar change has been implemented as part of commit
3bc439bdff ("[MSan] Add instrumentation for SystemZ"), and there were
no problems with it.
Reviewed By: morehouse
Differential Revision: https://reviews.llvm.org/D85689
When TTI was updated to use an explicit cost, TCK_CodeSize was used
although the default implicit cost would have been the hand-wavey
cost of size and latency. So, revert back to this behaviour. This is
not expected to have (much) impact on targets since most (all?) of
them return the same value for SizeAndLatency and CodeSize.
When optimising for size, the logic has been changed to query
CodeSize costs instead of SizeAndLatency.
This patch also adds a testing option in the unroller so that
OptSize thresholds can be specified.
Differential Revision: https://reviews.llvm.org/D85723
When visiting load and store instructions in SROA skip scalable vectors.
This is relevant in the implementation of the 'arm_sve_vector_bits'
attribute that is used to define VLS types, where an alloca of a
fixed-length vector could be bitcasted to scalable. See D85128 for more
information.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D85725
Based on post-commit discussion in D81766, Hexagon sets this to "0".
I'll see if I can come up with a test, but making the obvious
code fix first to unblock that target.
When turning on -debug-info-kind=constructor we ran into a "fragment covers
entire variable" error during thinlto. The fragment is currently always
emitted if there is no type size, but sometimes the variable has a
forward declared struct type which doesn't have a size.
This changes the code to get the type size from the GlobalVariable instead.
Differential Revision: https://reviews.llvm.org/D85572
Without this patch, we attempt to distribute And over Xor even in
unsafe circumstances like so:
undef & (true ^ true) ==> (undef & true) ^ (undef & true)
and evaluate it to undef instead of false. Note that "true ^ true"
may show up implicitly with one true being part of a PHI node.
This patch fixes the problem by teaching SimplifyUsingDistributiveLaws
to not use undef as part of simplifications.
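A hedged sketch of the unsafe case, with the "true" arriving via a PHI (hypothetical IR):
```
%t = phi i1 [ true, %pred1 ], [ true, %pred2 ]
%x = xor i1 %t, true        ; evaluates to false
%r = and i1 undef, %x       ; must be false, but distributing And over Xor
                            ; would give (undef & %t) ^ (undef & true) = undef
```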
Reviewers: spatel, aqjune, nikic, lebedev.ri, fhahn, jdoerfert
Differential Revision: https://reviews.llvm.org/D85687
Introduce a helper on Instruction which can be used to update the debug
location after hoisting.
Use this in GVN and LICM, where we were mistakenly introducing new line
0 locations after hoisting (the docs recommend dropping the location in
this case).
For more context, see the discussion in https://reviews.llvm.org/D60913.
Differential Revision: https://reviews.llvm.org/D85670
Adds the binary format goff and the operating system zos to the triple
class. goff is selected as the default binary format if zos is chosen as
the operating system. No further functionality is added.
Reviewers: efriedma, tahonermann, hubert.reinterpretcast, MaskRay
Reviewed By: efriedma, tahonermann, hubert.reinterpretcast
Differential Revision: https://reviews.llvm.org/D82081
The entries in VectorizableTree are not necessarily ordered by their
position in basic blocks. Collect them and order them by dominance so
later instructions are guaranteed to be visited first. For instructions
in different basic blocks, we only scan to the beginning of the block,
so their order does not matter, as long as all instructions in a basic
block are grouped together. Using dominance ensures a deterministic order.
The modified test case contains an example where we compute a wrong
spill cost (2) without this patch, even though there is no call between
any instruction in the bundle.
This seems to have limited practical impact, e.g. on X86 with a recent
Intel Xeon CPU with -O3 -march=native -flto on MultiSource,SPEC2000,SPEC2006
there are no binary changes.
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D82444
SCEVExpander already tracks which instructions have been inserted in
InsertedValues/InsertedPostIncValues. This patch adds an additional
vector to collect the instructions in insertion order. This can then be
used to remove exactly the instructions inserted by the expander.
This replaces ExpandedValuesCleaner, which in some cases might remove
values not inserted by the expander (e.g. if a value was dead before
insertion and is then used during expansion).
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D84327
This patch enables `AAValueSimplify` to use information from `AAPotentialValues`
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D85668
for invoke instructions.
We see a warning of "No debug information found in function foo: Function
profile not used" in a case. The function foo is called by an invoke
instruction. It has no debug information because it has attribute((nodebug))
in the definition. It shouldn't have a profile instance in the sample profile,
but the compiler thinks it does; that turns out to be a compiler bug in
findCalleeFunctionSamples. The bug was exposed when sample-profile-merge-inlinee
was enabled recently.
Currently in findCalleeFunctionSamples, CalleeName is unset and is empty for
invoke instructions. For an empty CalleeName, findFunctionSamplesAt will treat
the call as an indirect call and will return any inline instance profile at
the same location as the instruction. That leads to a wrong profile being
returned to function foo.
The patch sets CalleeName when the instruction is an invoke.
Differential Revision: https://reviews.llvm.org/D85664
A GlobalAlias is an address-taken user of its aliased function.
canRenameComdatFunc has excluded such cases.
Reviewed By: davidxl
Differential Revision: https://reviews.llvm.org/D85597
Instructions defined in the original inner loop preheader may depend on
values defined in the outer loop header, but the inner loop header will
become the entry block in the loop nest. Move the instructions from the
preheader to the outer loop header, so we do not break dominance. We
also have to check for unsafe instructions in the preheader. If there
are no unsafe instructions, all instructions should be movable.
Currently we move all instructions except the terminator and rely on
LICM to hoist out invariant instructions later.
Fixes PR45743
Values defined in the outer loop header could be used in the inner loop
latch. In that case, we need to create LCSSA phis for them, because after
interchanging they will be defined in the new inner loop and used in the
new outer loop.
This patch adds noundef to return value and arguments of standard I/O functions.
With this patch, passing undef or poison to the functions becomes undefined
behavior in LLVM IR. Since undef/poison is lowered from operations having UB in C/C++,
passing undef to them was already UB in source.
With this patch, the functions cannot return undef or poison anymore as well.
According to the C17 standard, ungetc/ungetwc/fgetpos/ftell can generate an unspecified
value; 3.19.3 says an unspecified value is a valid value of the relevant type,
and using an unspecified value is unspecified behavior, which is not UB, so it
cannot be undef (using undef is UB when e.g. it is used as a branch condition).
— The value of the file position indicator after a successful call to the ungetc function for a text stream, or the ungetwc function for any stream, until all pushed-back characters are read or discarded (7.21.7.10, 7.29.3.10).
— The details of the value stored by the fgetpos function (7.21.9.1).
— The details of the value returned by the ftell function for a text stream (7.21.9.4).
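For illustration, a hedged sketch of a declaration after this patch (hypothetical; the exact attribute placement is decided by BuildLibCalls):
```
declare noundef i32 @puts(i8* noundef)
```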
In the long run, most of the functions listed in BuildLibCalls should have noundefs; to remove redundant diffs which will anyway disappear in the future, I added noundef to a few more non-I/O functions as well.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D85345
Making use of undef is not safe if the simplification result is not used
to replace all uses of the result. This leads to problems in NewGVN,
which does not replace all uses in the IR directly. See PR33165 for more
details.
This patch adds an option to SimplifyQuery to disable the use of undef.
Note that I've only guarded uses of isa<UndefValue>/m_Undef where
SimplifyQuery is currently available. If we agree on the general
direction, I'll update the remaining uses.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D84792
Add support for (if enabled) splitting cold functions into a separate section
in order to further boost locality of hot code.
Authored by: rjf (Ruijie Fang)
Reviewed by: hiraditya,rcorcs,vsk
Differential Revision: https://reviews.llvm.org/D85331
This patch was adjusted to match the most basic pattern that starts with an insertelement
(so there's no extract created here). Hopefully, that removes any concern about
interfering with other passes. Ie, the transform should almost always be profitable.
We could make an argument that this could be part of canonicalization, but we
conservatively try not to create vector ops from scalar ops in passes like instcombine.
If the transform is not profitable, the backend should be able to re-scalarize the load.
Differential Revision: https://reviews.llvm.org/D81766
Currently the SCEVExpander tries to re-use existing casts, even if they
are not at the exact insertion point at which it was asked to create the cast.
To do so, in some cases it creates a new cast at the insertion point and
updates all users to use the new cast.
This behavior is problematic, because it changes the IR outside of the
instructions created during the expansion. Therefore we cannot
completely undo all changes made during expansion.
This re-use should be only an extra optimization, so only using the new
cast in the expanded instructions should not be a correctness issue.
There are many cases in which equivalent instructions are created during
expansion.
This patch also adjusts findInsertPointAfter to skip instructions
inserted during expansion. This enables re-using existing casts without
renaming any uses, by picking a better insertion point.
Reviewed By: efriedma, lebedev.ri
Differential Revision: https://reviews.llvm.org/D84399
SimplifyCFG has two main folds for resumes - one when resume is directly
using the landingpad, and the other one where resume is using a PHI node.
While for the first case, we were already correctly ignoring all the
PHI nodes, and both the debug info intrinsics and lifetime intrinsics,
in the PHI-based one, we weren't ignoring PHIs in the resume block,
and weren't ignoring lifetime intrinsics. That is clearly a bug.
On the RawSpeed library, this results in +9.34% (+81) more invoke->call folds,
-0.19% (-39) landing pads, -0.24% (-81) invoke instructions,
but +51 call instructions and -132 basic blocks.
The run-time performance impact, though, appears to be within the noise.
This patch adds an optimization that folds select(freeze(icmp eq/ne x, y), x, y)
to x or y.
This was needed to resolve a slowdown after D84940 is applied.
I tried to bake this logic into foldSelectInstWithICmp, but it wasn't clear.
This patch conservatively writes the pattern in a separate function,
foldSelectWithFrozenICmp.
The output does not need freeze; https://alive2.llvm.org/ce/z/X49hNE (from @nikic)
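A hedged sketch of the fold (illustrative IR):
```
%c    = icmp eq i32 %x, %y
%c.fr = freeze i1 %c
%s    = select i1 %c.fr, i32 %x, i32 %y
  =>
%s    = %y   ; if %x == %y both arms are equal; otherwise the false arm is taken
```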
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D85533
No verification for pass managers, since it is not needed.
No verification for skipped loop passes, since the asserted condition is not used.
Add a BeforeNonSkippedPass callback for this. The callback needs more
inputs than its parameters to work, so the callback is added on-the-fly.
Reviewed By: aeubanks, asbirlea
Differential Revision: https://reviews.llvm.org/D84977
This is a split patch of D80991.
This patch introduces AAPotentialValues and its interface only.
For more detail of AAPotentialValues abstract attribute, see the original patch.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D83283
This patch is a follow up of D84733.
If a function has the noundef attribute in the returned position, instructions that return undef or poison values cause UB.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D85178
Negator knows how to do this, but the one-use reasoning is getting
a bit muddy here; we don't really want to increase the instruction count,
so we need to both lie that this is "IsNegation" and have a one-use check
on the outermost LHS value.
Multiplication is commutative, and either of operands can be negative,
so if the RHS is a negated power-of-two, we should try to make it
true power-of-two (which will allow us to turn it into a left-shift),
by trying to sink the negation down into the LHS operand.
But we shouldn't re-invent the logic for sinking negation;
let's just use the Negator for that.
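A hedged sketch of the intended end result (illustrative IR):
```
%r = mul i8 %x, -32
  =>
%x.neg = sub i8 0, %x
%r     = shl i8 %x.neg, 5
```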
Tests and original patch by: Simon Pilgrim @RKSimon!
Differential Revision: https://reviews.llvm.org/D85446
Summary:
This patch takes the index operands of `insertelement`/`insertvalue`
into account while generating seed elements for `findBuildAggregate()`.
Previously, this function kept the original order of the `insert`s.
Also, this patch optimizes `findBuildAggregate()`, preventing
redundant temporary vector allocations and multiple reversals.
Fixes llvm.org/pr44067
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D83779
This is a simple patch that folds freeze(undef) into a proper constant after inspecting its uses.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D84948
Arm MVE has multiple instructions such as VMLAVA.s8, which (in this
case) can take two 128bit vectors, sign extend the inputs to i32,
multiply them together, and sum the result into a 32bit general
purpose register. So taking 16 i8's as inputs, they can multiply and
accumulate the result into a single i32 without any rounding/truncating
along the way. There are also reduction instructions for plain integer
add and min/max, and operations that sum into a pair of 32bit registers
together treated as a 64bit integer (even though MVE does not have a
plain 64bit addition instruction). So giving the vectorizer the ability
to use these instructions both enables us to vectorize at higher
bitwidths, and to vectorize things we previously could not.
In order to do that we need a way to represent that the reduction
operation, specified with an llvm.experimental.vector.reduce intrinsic when
vectorizing for Arm, occurs inside the loop, not after it like most
reductions. This patch attempts to do that, teaching the vectorizer
about in-loop reductions. It does this through a vplan recipe
representing the reductions that the original chain of reduction
operations is replaced by. Cost modelling is currently just done through
a prefersInloopReduction TTI hook (which follows in a later patch).
Differential Revision: https://reviews.llvm.org/D75069
This is the last JumpThreading patch for getting the performance numbers shown at
https://reviews.llvm.org/D84940#2184653 .
This patch makes ProcessBlock call ProcessBranchOnPHI when the branch condition
is freeze(phi) as well (originally it called the function only when the condition
was a phi).
Since what ProcessBranchOnPHI does is to duplicate the basic block into
predecessors if profitable, it is still valid when the condition is freeze(phi)
too.
```
p = phi [a, pred1] [b, pred2]
p.fr = freeze p
br p.fr, ...
=>
pred1:
p.fr = freeze a
br p.fr, ...
pred2:
p.fr2 = freeze b
br p.fr2, ...
```
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D85029
We were erroneously only doing that for old-style abs/nabs,
but we have no such legality check on the condition of the select.
https://rise4fun.com/Alive/xBHS
MSan removes readnone/readonly and similar attributes from callees,
because after MSan instrumentation those attributes no longer apply.
This change removes the attributes from call sites, as well.
Failing to do this may cause DSE of paramTLS stores before calls to
readonly/readnone functions.
Differential Revision: https://reviews.llvm.org/D85259
This reverts commit e9761688e4. It breaks the build:
```
~/src/llvm-project/llvm/lib/Analysis/IVDescriptors.cpp:868:10: error: no viable conversion from returned value of type 'SmallVector<[...], 8>' to function return type 'SmallVector<[...], 4>'
return ReductionOperations;
```
Arm MVE has multiple instructions such as VMLAVA.s8, which (in this
case) can take two 128bit vectors, sign extend the inputs to i32,
multiply them together, and sum the result into a 32bit general
purpose register. So taking 16 i8's as inputs, they can multiply and
accumulate the result into a single i32 without any rounding/truncating
along the way. There are also reduction instructions for plain integer
add and min/max, and operations that sum into a pair of 32bit registers
together treated as a 64bit integer (even though MVE does not have a
plain 64bit addition instruction). So giving the vectorizer the ability
to use these instructions both enables us to vectorize at higher
bitwidths, and to vectorize things we previously could not.
In order to do that we need a way to represent that the reduction
operation, specified with an llvm.experimental.vector.reduce intrinsic when
vectorizing for Arm, occurs inside the loop, not after it like most
reductions. This patch attempts to do that, teaching the vectorizer
about in-loop reductions. It does this through a vplan recipe
representing the reductions that the original chain of reduction
operations is replaced by. Cost modelling is currently just done through
a prefersInloopReduction TTI hook (which follows in a later patch).
Differential Revision: https://reviews.llvm.org/D75069
This was the most obvious regression in
f5df5cd5586ae9cfb2d9e53704dfc76f47aff149.
We really don't want to do this if the original/outermost subtraction
isn't a negation, and therefore doesn't go away - just sinking negation
isn't a win. We actually appear to be missing folds, so hoist it.
https://rise4fun.com/Alive/tiVe
This reverts commit ac70b37a00,
which reverted commit 8aeb2fe13a,
because codegen tests got broken and I needed time to investigate.
This shows some regressions in tests, but they are all around GEPs,
so I'm not really sure how important those are.
https://rise4fun.com/Alive/1Gn
This is a simple patch that makes freeze a zero-cost instruction, as bitcast already is.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D85023
It is technically legal for optimizations to create an alloca that is
used by more than one dbg.declare, if one or both of them are inlined
instances of aliasing variables.
Differential Revision: https://reviews.llvm.org/D85172
This is the last remaining use of ConstantProp; migrate it to InstSimplify with the goal of removing ConstantProp.
Add -hexagon-instsimplify option to enable skipping of instsimplify in
tests that can't handle the extra optimization.
Differential Revision: https://reviews.llvm.org/D85047
If a section is supposed to hold elements of type T, then the
corresponding CreateSecStartEnd()'s Ty parameter represents T*.
Forwarding it to GlobalVariable constructor causes the resulting
GlobalVariable's type to be T*, and its SSA value type to be T**, which
is one indirection too many. This issue is mostly masked by pointer
casts, however, the global variable still gets an incorrect alignment,
which causes SystemZ to choose wrong instructions to access the
section.
This patch tries to improve the readability and maintainability
of createVectorizedLoopSkeleton by reorganizing some lines,
updating some of the comments and breaking it up into
smaller logical units.
Reviewed By: pjeeva01
Differential Revision: https://reviews.llvm.org/D83824
Teach SCCP to create notconstant lattice values from inequality
assumes and nonnull metadata, and update getConstant() to make
use of them. Additionally isOverdefined() needs to be changed to
consider notconstant an overdefined value.
Handling inequality branches is delayed until our branch on undef
story in other passes has been improved.
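A hedged sketch of the assume-based case (illustrative IR):
```
%nonnull = icmp ne i8* %p, null
call void @llvm.assume(i1 %nonnull)
%c = icmp eq i8* %p, null   ; getConstant() can now fold this to false
```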
Differential Revision: https://reviews.llvm.org/D83643
As discussed in D84949, this removes the constraint on casts since it does not
cause compile-time degradation.
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D85188
Compared to the optimized code with branch conditions never frozen,
limiting the type of freeze's operand causes generation of suboptimal code in
some cases.
I would like to suggest removing the constraint, as this patch does.
If the number of freeze instructions becomes significant, this can be revisited.
Differential Revision: https://reviews.llvm.org/D84949
D68041 placed `__profc_`, `__profd_` and (if it exists) `__profvp_` in different comdat groups.
There are some issues:
* Cost: one or two additional section headers (`.group` section(s)): 64 or 128 bytes on ELF64.
* `__profc_`, `__profd_` and (if it exists) `__profvp_` should be retained or
discarded together. Placing them into separate comdat groups is conceptually inferior.
* If the prevailing group does not include `__profvp_` (value profiling not
used) but a non-prevailing group from another translation unit has `__profvp_`
(the function is inlined into another and triggers value profiling), there
will be a stray `__profvp_` if --gc-sections is not enabled.
This has been fixed by 3d6f53018f.
Actually, we can reuse an existing symbol (we choose `__profd_`) as the group
signature to avoid a string in the string table (the sole reason that D68041
could improve code size is that `__profv_` was an otherwise unused symbol which
wasted string table space). This saves one or two section headers.
For a -DCMAKE_BUILD_TYPE=Release -DLLVM_BUILD_INSTRUMENTED=IR build, `ninja
clang lld`, the patch has saved 10.5MiB (2.2%) for the total .o size.
Reviewed By: davidxl
Differential Revision: https://reviews.llvm.org/D84723
We might want this if we find out that using the MustExecute analysis is too expensive.
By default we do the analysis because its complexity does not exceed the complexity
of whole loop copying in unswitching. Follow-up for D84925.
Differential Revision: https://reviews.llvm.org/D85001
Reviewed By: asbirlea
Currently, ArgPromotion may leave metadata uses of promoted values,
which will end up in the wrong function, creating invalid IR.
PR33641 fixed this for dead arguments, but it can also be triggered by
arguments with users that are promoted (see the updated test case).
We also have to drop uses of them after promoting them. We need to do
this after dealing with the non-metadata uses, so I also moved the empty
use case to the loop that deals with updating the arguments of the new
function.
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D85127
Extend the memop value profile buckets to be more flexible (could accommodate a
mix of individual values and ranges) and to cover more value ranges (from 11 to
22 buckets).
Disabled behind a flag (to be enabled separately) and the existing code to be
removed later.
Differential Revision: https://reviews.llvm.org/D81682
Freeze always returns a defined value. This also prevents msan from
checking the input shadow, which happened because freeze wasn't
explicitly visited.
Differential Revision: https://reviews.llvm.org/D85040
The 1st try at this (rG2265d01f2a5b) exposed what looks like
unspecified behavior in C/C++ resulting in test variations.
The arguments to BinaryOperator::CreateAnd() were both IRBuilder
function calls, and the order in which they execute determines
the order of the new instructions in the IR. But the order of
function arg evaluation is not fixed by the rules of C/C++, so
depending on compiler config, the test would fail because the
test expected a single fixed ordering of instructions.
Original commit message:
I tried to use m_Deferred() on this, but didn't find
a clean way to do that.
http://bugs.llvm.org/PR46955
https://alive2.llvm.org/ce/z/2h6QTq
No widening decisions will be computed for instructions outside the
loop. Do not try to get a widening decision. The load/store will be just
a scalar load, so treating it as normal should be fine, I think.
Fixes PR46950.
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D85087
This patch makes it possible to handle nonnull attribute violations at callsites in AAUndefinedBehavior.
If a null pointer is passed to a callee at a callsite and the corresponding argument of the callee has the nonnull attribute, the behavior of the callee is undefined.
In this patch, only violations of argument nonnull attributes are handled.
Violations of nonnull attributes in the returned position can be handled similarly, and I will implement that in a follow-up patch.
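A hedged sketch of the handled case (illustrative IR):
```
declare void @callee(i8* nonnull)
...
call void @callee(i8* null)   ; passing null to a nonnull argument is UB,
                              ; which AAUndefinedBehavior can now exploit
```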
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D84733
This is a split patch of D80991.
This patch introduces AAPotentialValues and its interface only.
For more detail of AAPotentialValues abstract attribute, see the original patch.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D83283
As mentioned on D70376, LVI can currently cause performance issues
when running under NewPM. The problem is that, unlike the legacy
pass manager, NewPM will not immediately discard the LVI analysis
if the following pass does not need it. This is a problem, because
LVI has a high memory requirement, and mass invalidation of LVI
values is very inefficient. LVI should only be alive during passes
that actively interact with it.
This patch addresses the issue by explicitly abandoning LVI after CVP,
which gets us back to the LegacyPM behavior.
Differential Revision: https://reviews.llvm.org/D84959
Negating the input doesn't matter. I left a FIXME to copy the nsw flag if it's present on the neg but not on the abs.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D85055
formLCSSAForInstructions is used by SCEVExpander, which tracks all
inserted instructions including LCSSA phis using asserting value
handles. This means cleanup needs to happen in the caller.
Extend formLCSSAForInstructions to take an optional pointer to a
vector. If this argument is non-nullptr, instead of directly deleting
the phis, add them to the vector, so the caller can process them.
This should address various PPC buildbot failures, including
http://lab.llvm.org:8011/builders/clang-ppc64be-linux-lnt/builds/40567
Use IRBuilder instead of PHINode::Create. This should not impact the
generated code, but IRBuilder provides a way to register callbacks for
inserted instructions, which is convenient for some users.
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D85037
Querying getSCEV() for incomplete phis leads to a wrong cache value in `ExprToIVMap`,
because incomplete phis may be simplified to the same value before their SCEV expressions are obtained.
Reviewed By: lebedev.ri, mkazantsev
Differential Revision: https://reviews.llvm.org/D77560
Summary: This patch separates the Loop Peeling Utilities from Loop Unrolling.
The reason for this change is that Loop Peeling is no longer only being used by
loop unrolling; Patch D82927 introduces loop peeling with fusion, such that
loops can be modified to have the same trip count, making them legal to be
peeled.
Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D83056
I found that propagateAttributes was ~23% of a thin link's run time
(almost 4x higher than the second hottest function). The main reason is
that it re-examines a global var each time it is referenced. This
becomes unnecessary once it is marked both non read only and non write
only. I added a set to avoid doing redundant work, which dropped the
runtime of that thin link by almost 15%.
I made a smaller efficiency improvement (no measurable impact) to skip
all summaries for a VI if the first copy is dead. I added an assert to
ensure that all copies are dead if any is. The code in
computeDeadSymbols marks all summaries for a VI as live. There is one
corner case where it was skipping marking an alias as live, that I
fixed. However, since the code earlier marked all copies of a preserved
GUID's VI as live, and each 'visit' marks all copies live, the only case
where this could make a difference is summaries that were marked live
when they were built initially, and that is only a few special compiler
generated symbols and inline assembly symbols, so it likely is never
provoked in practice.
Differential Revision: https://reviews.llvm.org/D84985
A function call can be replicated by optimizations like loop unrolling and jump threading, and the replicas end up sharing the same nested callee profile. Therefore, when it comes to merging samples for uninlined callees in the sample profile inliner, a callee profile can be merged multiple times, which will cause an assert to fire.
This change avoids merging the same callee profile for duplicate callsites by filtering out callee profiles with a non-zero head sample count.
Reviewed By: wenlei, wmi
Differential Revision: https://reviews.llvm.org/D84997
This patch allows SimplifyPartiallyRedundantLoad work when
the branch condition was frozen.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D84944
We can preserve make.implicit metadata in the split block if it is
guaranteed that after following the branch we always reach the block
where processing of null case happens, which is equivalent to
"initial condition must execute if the loop is entered".
Differential Revision: https://reviews.llvm.org/D84925
Reviewed By: asbirlea
Non-trivial unswitching simply moves the terminator being unswitched from the loop
up to the switch block. It also preserves all metadata that was there. This might not
be a correct thing to do for `make.implicit` metadata. Consider this case:
```
for (...) {
cond = // computed in loop
if (cond) return X;
if (p == null) throw_npe(); !make implicit
}
```
Before the unswitching, if `p` is null and we reach this check, we are guaranteed
to go to the `throw_npe()` block. Now we unswitch on the `p == null` condition:
```
if (p == null) !make implicit {
for (...) {
if (cond) return X;
throw_npe()
}
} else {
for (...) {
if (cond) return X;
}
}
```
Now, following the `true` branch of `p == null` does not always lead us to
`throw_npe()` because the loop has a side exit. Now, if we run the ImplicitNullCheck
pass on this code, it may end up making the unswitch condition implicit. This may
lead us to turn the normal path to `return X` into a signal-throwing path, which is
not efficient.
Note that this does not happen during trivial unswitch: it guarantees that we do not
have side exits before the condition being unswitched.
This patch fixes this situation by unconditional dropping of `make.implicit` metadata
when we perform non-trivial unswitch. We could preserve it if we could prove that the
condition always executes. This can be done as a follow-up.
Differential Revision: https://reviews.llvm.org/D84916
Reviewed By: asbirlea
is enabled.
When -sample-profile-merge-inlinee is enabled, new FunctionSamples may be
created during profile merge without GUIDToFuncNameMap being initialized.
That will occasionally cause a compiler crash. The patch fixes it.
Differential Revision: https://reviews.llvm.org/D84994
findAllocaForValue uses AllocaForValue to cache resolved values.
The function is used only to resolve arguments of lifetime
intrinsics, which usually are not far from allocas, so result reuse
is likely unnoticeable.
In follow-up patches I'd like to replace the function with
GetUnderlyingObjects.
Depends on D84616.
Differential Revision: https://reviews.llvm.org/D84617
This patch adds time-trace functionality to have a better understanding
of the analysis times.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D84980
Determine whether switch edges are feasible based on range information,
and remove non-feasible edges later on.
This does not try to determine whether the default edge is dead,
as we'd have to determine that the range is fully covered by the
cases for that.
Another limitation here is that we don't remove dead cases that
have the same successor as a live case. I'm not handling this
because I wanted to keep the edge removal based on feasible edges
only, rather than inspecting ranges again there -- this does not
seem like a particularly useful case to handle.
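A hedged sketch of a non-feasible edge (illustrative IR):
```
%x = and i32 %v, 3
switch i32 %x, label %def [
  i32 1, label %bb1
  i32 7, label %bb2    ; non-feasible: %x is known to be in [0, 4)
]
```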
Differential Revision: https://reviews.llvm.org/D84270
Problem:
Right now, our "Running pass" is not accurate when passes are wrapped in adaptor because adaptor is never skipped and a pass could be skipped. The other problem is that "Running pass" for a adaptor is before any "Running pass" of passes/analyses it depends on. (for example, FunctionToLoopPassAdaptor). So the order of printing is not the actual order.
Solution:
Doing things like PassManager::DebugLogging is very intrusive, because we need to specify DebugLogging whenever an adaptor is created. (Actually, right now we're not specifying DebugLogging for some sub-PassManagers. Check PassBuilder.)
This patch moves debug logging for passes into a PassInstrumentation callback. This way we can be sure that all running passes are logged and in the correct order.
This could also be used to implement hierarchical pass logging in the legacy PM. We could also move the logging of pass managers to this if we want.
The test fixes look messy. They include these changes:
- Remove PassInstrumentationAnalysis
- Remove PassAdaptor
- If a PassAdaptor is for a real pass, the pass is added
- Pass reorder (to the correct order), related to PassAdaptor
- Add missing passes (due to Debuglogging not passed down)
Reviewed By: asbirlea, aeubanks
Differential Revision: https://reviews.llvm.org/D84774
This removes some unneeded block masks when we don't have any
reductions. It should not have any effect on codegen as the values
created are dead anyway.
Differential Revision: https://reviews.llvm.org/D81415
As far as I know, ipconstprop has not been used in years and ipsccp has
been used instead. This has the potential for confusion and sometimes
leads people to spend time finding & reporting bugs as well as
updating it to work with the latest API changes.
This patch moves the tests over to SCCP. There's one functional difference
I am aware of: ipconstprop propagates for each call-site individually, so
for functions that are called with different constant arguments it can sometimes
produce better results than ipsccp (at a much higher compile-time cost). But
IPSCCP can be thought to do so as well for internal functions, and as mentioned
earlier, the pass seems unused in practice (and there are no plans on working
towards enabling it anytime).
Also discussed on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2020-July/143773.html
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D84447
This patch makes JumpThreading fold br(freeze(undef)) if the freeze instruction
is only used by the branch.
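A hedged sketch (illustrative IR):
```
%x = freeze i1 undef
br i1 %x, label %bb1, label %bb2
  =>
br label %bb1   ; freeze(undef) may be folded to an arbitrary constant
```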
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D84818
This reverts the revert commit dc28675768.
It includes a fix for Polly, which uses SCEVExpander on IR that is not
in LCSSA form. Set PreserveLCSSA = false in that case, to ensure we do
not introduce LCSSA phis where there were none before.
Adds the -fast-16-labels flag, which enables efficient instrumentation
for DFSan when the user needs <=16 labels. The instrumentation
eliminates most branches and most calls to __dfsan_union or
__dfsan_union_load.
Reviewed By: vitalybuka
Differential Revision: https://reviews.llvm.org/D84371
This reverts commit 99166fd4fb, because it
breaks the polly builders.
polly/test/Isl/CodeGen/invariant_load_escaping_second_scop.ll fails
because an apparently unnecessary LCSSA phi node is introduced.
Make the bots green again, while I take a closer look.
I've been looking at missed vectorizations in one codebase.
One particular thing that stands out is that some of the loops
reach the vectorizer in a rather mangled form, with weird PHIs,
and some of the loops aren't even in a rotated form.
After taking a more detailed look, that happened because
the loop's headers were too big by then. It is evident that
SimplifyCFG's common code hoisting transform is at fault there,
because the pattern it handles is precisely the unrotated
loop basic block structure.
Surprisingly, `SimplifyCFGOpt::HoistThenElseCodeToIf()` is enabled
by default, and is always run, unlike its friend, the common code sinking
transform, `SinkCommonCodeFromPredecessors()`, which is not enabled
by default and is only run once, very late in the pipeline.
I'm proposing to harmonize this, and disable common code hoisting
until //late// in the pipeline. The definition of //late// may vary;
here I've currently picked the same one as for code sinking,
but I suppose we could enable it as soon as right after
loop rotation happens.
Experimentation shows that this does indeed, unsurprisingly, help:
more loops got rotated, although other issues remain elsewhere.
Now, this undoubtedly seriously shakes up phase ordering.
This will be a mixed bag in terms of both compile- and
run-time performance, and code size. Since we no longer aggressively
hoist+deduplicate common code, we don't pay the price of said hoisting
(which wasn't big). That may allow more loops to be rotated,
so we pay that price. That, in turn, may enable all the transforms
that require canonical (rotated) loop form, including but not limited to
vectorization, so we pay that too. And in general, no deduplication means
more [duplicate] instructions going through the optimizations. But there's still
late hoisting; some of them will be caught late.
As per benchmarks I've run {F12360204}, this is mostly within the noise;
there are some small improvements, some small regressions.
One big regression I saw I fixed in rG8d487668d09fb0e4e54f36207f07c1480ffabbfd, but I'm sure
this will expose many more pre-existing missed optimizations, as usual :S
llvm-compile-time-tracker.com thoughts on this:
http://llvm-compile-time-tracker.com/compare.php?from=e40315d2b4ed1e38962a8f33ff151693ed4ada63&to=c8289c0ecbf235da9fb0e3bc052e3c0d6bff5cf9&stat=instructions
* this does regress compile-time by +0.5% geomean (unsurprisingly)
* size impact varies; for ThinLTO it's actually an improvement
The largest fallout appears to be in GVN's load partial redundancy
elimination, it spends *much* more time in
`MemoryDependenceResults::getNonLocalPointerDependency()`.
Non-local `MemoryDependenceResults` is widely-known to be, uh, costly.
There does not appear to be a proper solution to this issue,
other than silencing the compile-time performance regression
by tuning cut-off thresholds in `MemoryDependenceResults`,
at the cost of potentially regressing run-time performance.
D84609 attempts to move in that direction, but the path is unclear
and is going to take some time.
If we look at stats before/after diffs, some excerpts:
* RawSpeed (the target) {F12360200}
* -14 (-73.68%) loops not rotated due to the header size (yay)
* -272 (-0.67%) `"Number of live out of a loop variables"` - good for vectorizer
* -3937 (-64.19%) common instructions hoisted
* +561 (+0.06%) x86 asm instructions
* -2 basic blocks
* +2418 (+0.11%) IR instructions
* vanilla test-suite + RawSpeed + darktable {F12360201}
* -36396 (-65.29%) common instructions hoisted
* +1676 (+0.02%) x86 asm instructions
* +662 (+0.06%) basic blocks
* +4395 (+0.04%) IR instructions
It is likely to be sub-optimal when optimizing for code size,
so one might want to tune the pipeline by enabling sinking/hoisting
when optimizing for size.
Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D84108
In vectorizeChainsInBlock we try to collect chains of PHI nodes
that have the same element type, but the code is relying upon
the implicit conversion from TypeSize -> uint64_t. For now, I have
modified the code to ignore PHI nodes with scalable types.
Differential Revision: https://reviews.llvm.org/D83542
This patch teaches SCEVExpander to directly preserve LCSSA.
As it is currently, SCEV does not look through PHI nodes in loops,
as it might break LCSSA form. Once SCEVExpander can preserve
LCSSA form, it should be safe for SCEV to look through PHIs.
To preserve LCSSA form, this patch uses formLCSSAForInstructions
on operands of newly created instructions, if the definition is inside
a different loop than the new instruction.
The final value we return from expandCodeFor may also need LCSSA
phis, depending on the insert point. As no user for it exists there yet,
create a temporary instruction at the insert point, which can be passed
to formLCSSAForInstructions. This temporary instruction is removed
after LCSSA construction.
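A rough sketch of that trick, with illustrative names (V is the expanded value, defined inside a loop that does not contain the insert point):
```
// Create a throwaway user of V at the insert point, so that
// formLCSSAForInstructions has a use it can rewrite.
Instruction *Tmp = CastInst::CreateBitOrPointerCast(
    V, V->getType(), "lcssa.tmp", &*Builder.GetInsertPoint());
SmallVector<Instruction *, 1> Defs = {cast<Instruction>(V)};
formLCSSAForInstructions(Defs, DT, LI);
// The operand was rewritten to an LCSSA phi if one was needed.
Value *Result = Tmp->getOperand(0);
Tmp->eraseFromParent();
```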
Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D71538
Currently, getCastInstrCost has limited information about the cast it's
rating, often just the opcode and types. Sometimes there is a context
instruction as well, but it isn't trustworthy: for instance, when the
vectorizer is rating a plan, it calls getCastInstrCost with the old
instructions when, in fact, it's trying to evaluate the cost of the
instruction post-vectorization. Thus, the current system can get the
cost of certain casts wrong, as the correct cost can vary greatly
based on the context in which the cast is used.
For example, if the vectorizer queries getCastInstrCost to evaluate the
cost of a sext(load) with tail predication enabled, getCastInstrCost
will think it's free most of the time, but it's not always free. On ARM
MVE, a VLD2 group cannot be extended like a normal VLDR can. Similar
situations can come up with how masked loads can be extended when being
split.
To fix that, this patch adds a new parameter to getCastInstrCost to give
it a hint about the context of the cast. It adds a CastContextHint enum
which contains the type of the load/store being created by the
vectorizer - one for each of the types it can produce.
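The hint roughly takes the following shape (an illustrative subset of values, not necessarily the exact definition):
```
// Context of a cast: what kind of memory operation produces or
// consumes the casted value.
enum class CastContextHint : uint8_t {
  None,          // No context (plain cast).
  Normal,        // Used with a normal load/store.
  Masked,        // Used with a masked load/store.
  GatherScatter, // Used with a gather/scatter.
  Interleave,    // Used with an interleaved load/store.
  Reversed,      // Used with a reversed load/store.
};
```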
Original patch by Pierre van Houtryve
Differential Revision: https://reviews.llvm.org/D79162
In addition to removing phi nodes, this patch removes any
landing pad that the dead exit block might have. Without
this fix, the Verifier complains that a new switch instruction
jumps to a block with a landing pad.
Differential Revision: https://reviews.llvm.org/D84320
This patch adds a basic support for freeze instruction to JumpThreading
by making ComputeValueKnownInPredecessorsImpl look into its operand.
Reviewed By: efriedma, nikic
Differential Revision: https://reviews.llvm.org/D84598
To match NewPM pass name, and also for readability.
Also rename rpo-functionattrs -> rpo-function-attrs while we're here.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D84694
This patch adds a dependency graph to the Attributor so that we can dump the dependencies between AAs more easily. We can also apply general graph algorithms to the graph, making it easier to create deep wrappers.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D78861
While this doesn't appear to help with the perf issue being exposed by
D84108, the function as-is is very weird, convoluted, and what's worse,
recursive.
There was no need for `SpeculativelyAvailableAndUsedForSpeculation`;
a tri-state choice is enough. We never even check for that state.
The basic idea here is that we need to perform a depth-first traversal
of the predecessors of the basic block in question, either finding a
preexisting state for the block in a map, or inserting a "placeholder"
`SpeculativelyAvailable` state.
If we encounter an `Unavailable` block, then we need to give up the search
and back-propagate the `Unavailable` state to each successor of
said block, more specifically to each `SpeculativelyAvailable` entry
we've just created.
However, if we have traversed the entirety of the predecessors and have not
encountered an `Unavailable` block, then it must mean the value is fully
available. We could update each inserted `SpeculativelyAvailable` entry into
an `Available` one, but we don't need to, as an assertion exercises:
we can assume that if we see a `SpeculativelyAvailable` entry,
it is actually `Available`, because at the time we produced it,
had we found that it had an `Unavailable` predecessor,
we would have updated its successors, including this block,
to `Unavailable`.
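A hypothetical sketch of that traversal (names are illustrative, not the exact code):
```
enum class AvailabilityState : char {
  Unavailable,
  Available,
  SpeculativelyAvailable, // Optimistic placeholder during the DFS.
};

bool isValueFullyAvailableInBlock(
    BasicBlock *BB, DenseMap<BasicBlock *, AvailabilityState> &State) {
  SmallVector<BasicBlock *, 32> Worklist{BB};
  SmallVector<BasicBlock *, 32> NewSpeculative;
  while (!Worklist.empty()) {
    BasicBlock *CurBB = Worklist.pop_back_val();
    auto IV =
        State.insert({CurBB, AvailabilityState::SpeculativelyAvailable});
    if (IV.second) {
      // First visit: record the placeholder, recurse into predecessors.
      NewSpeculative.push_back(CurBB);
      for (BasicBlock *Pred : predecessors(CurBB))
        Worklist.push_back(Pred);
    } else if (IV.first->second == AvailabilityState::Unavailable) {
      // Give up: poison every placeholder we optimistically created.
      for (BasicBlock *SpecBB : NewSpeculative)
        State[SpecBB] = AvailabilityState::Unavailable;
      return false;
    }
    // Available / SpeculativelyAvailable: nothing more to do here.
  }
  return true; // Never saw Unavailable: the value is fully available.
}
```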
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D84181
PGO profile is usually more precise than sample profile. However, PGO profile
needs to be collected from a loadtest, and the loadtest may not be representative
enough of the production workload. Sample profile collected from production
can be used as a supplement -- for functions cold in the loadtest but warm/hot
in production, we can scale up the related function in the PGO profile if the
function is warm or hot in the sample profile.
The implementation contains changes on the compiler side and the llvm-profdata side.
Given an instr profile and a sample profile, for a function cold in the PGO
profile but warm/hot in the sample profile, llvm-profdata will either mark
all the counters in the profile as -1 or scale up the max count in the
function to be above the hot threshold, depending on the zero counter ratio in
the profile. The assumption is that if too many counters in the
function profile are zero, the profile is more likely to cause harm than good,
so llvm-profdata will mark all the counters as -1, indicating the
function is hot but the profile is unaccountable. On the compiler side, if a
function profile with all -1 counters is seen, the function entry count will
be set above the hot threshold but its internal profile will be dropped.
In the long run, it may be useful to let compiler support using PGO profile
and sample profile at the same time, but that requires more careful design
and more substantial changes to make two profiles work seamlessly. The patch
here serves as a simple intermediate solution.
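A hypothetical sketch of the llvm-profdata side decision (all names are illustrative, not the actual API):
```
#include <algorithm>
#include <cstdint>
#include <vector>

// For a function cold in the instr profile but warm/hot in the sample
// profile, decide between "mark unaccountable" and "scale up".
void supplementFunction(std::vector<uint64_t> &Counters,
                        uint64_t HotThreshold, double ZeroRatioCutoff) {
  size_t NumZero =
      std::count(Counters.begin(), Counters.end(), uint64_t(0));
  double ZeroRatio = double(NumZero) / double(Counters.size());
  if (ZeroRatio > ZeroRatioCutoff) {
    // Too many zero counters: the body profile likely does more harm
    // than good. Mark everything -1: "hot, but profile unaccountable".
    std::fill(Counters.begin(), Counters.end(), uint64_t(-1));
    return;
  }
  // Otherwise scale counts so the max lands above the hot threshold.
  uint64_t MaxCount =
      *std::max_element(Counters.begin(), Counters.end());
  if (MaxCount > 0 && MaxCount < HotThreshold) {
    double Scale = double(HotThreshold) / double(MaxCount);
    for (uint64_t &C : Counters)
      C = uint64_t(C * Scale);
  }
}
```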
Differential Revision: https://reviews.llvm.org/D81981
Summary:
This seems obvious in hindsight, but the result is surprising.
I've measured the compile-time of the `-openmpopt` pass standalone
on the RawSpeed unity build, and while there is some OpenMP stuff,
most is not OpenMP. Nonetheless, the pass does a lot of costly
preparation before ever trying to look for OpenMP stuff in the SCC.
Numbers (n=25): 0.094624s -> 0.005976s, a -93.68% improvement, or ~16x.
Reviewers: jdoerfert
Reviewed By: jdoerfert
Subscribers: yaxunl, hiraditya, guansong, llvm-commits, sstefan1
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D84689
SplitBlockPredecessors() cannot split blocks that have such terminators,
and in two other places we already ensure that we don't end up calling
SplitBlockPredecessors() on such blocks. Do so in one more place.
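A minimal sketch of the kind of guard involved, assuming for illustration that the unsplittable terminators are indirect branches:
```
// Bail out instead of calling SplitBlockPredecessors() when some
// predecessor ends in a terminator whose edges cannot be redirected.
if (llvm::any_of(predecessors(BB), [](BasicBlock *Pred) {
      return isa<IndirectBrInst>(Pred->getTerminator());
    }))
  return false; // Can't split; leave this block alone.
```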
Fixes https://bugs.llvm.org/show_bug.cgi?id=46857
This patch adds folding of freeze into a phi if the phi has only one operand to target.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D84601
We can happily turn function definitions into declarations,
thus obscuring their arguments from being elided by this pass.
I don't believe there is a good reason to just ignore declarations,
likely even proper llvm intrinsic ones;
at worst the input becomes uninteresting.
The other question here is that all these transforms are all-or-nothing.
In some cases, should we be treating each use separately?
The main blocker here seemed to be that llvm::CloneFunctionInto()
does `&OldFunc->front()`, which inserts a nullptr into a DenseMap,
which is not happy about that and asserts.
Reapply with DTU update moved after CFG update, which is a
requirement of the API.
-----
Non-feasible control-flow edges are currently removed by replacing
the branch condition with a constant and then calling
ConstantFoldTerminator. This happens in a rather roundabout manner,
by inspecting the users (effectively: predecessors) of unreachable
blocks, and further complicated by the need to explicitly materialize
the condition for "forced" edges. I would like to extend SCCP to
discard switch conditions that are non-feasible based on range
information, but this is incompatible with the current approach
(as there is no single constant we could use.)
Instead, this patch explicitly removes non-feasible edges. It
currently only needs to handle the case where there is a single
feasible edge. The llvm_unreachable() branch will need to be
implemented for the aforementioned switch improvement.
Differential Revision: https://reviews.llvm.org/D84264
This patch updates IPSCCP to drop argmemonly and
inaccessiblemem_or_argmemonly if it replaces a pointer argument.
Fixes PR46717.
Reviewers: efriedma, davide, nikic, jdoerfert
Reviewed By: efriedma, jdoerfert
Differential Revision: https://reviews.llvm.org/D84432
The function entry count might be zero after a profile counts reset and
before reentry to the function.
A zero profile entry count is very bad, as the profile counts from BFI will
be wrong.
A simple fix is to set the profile entry count to 1 if there are
non-zero profile counts in this function.
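A minimal sketch of that fix (hasNonZeroCounters is a hypothetical helper):
```
// If the entry count was reset to zero but the body still has counts,
// pretend the function was entered once so BFI stays sane.
Function::ProfileCount EC = F.getEntryCount();
if (EC.hasValue() && EC.getCount() == 0 &&
    hasNonZeroCounters(F)) // hypothetical helper
  F.setEntryCount(1);
```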
Differential Revision: https://reviews.llvm.org/D84378
Skip profile count promotion if any of the ExitBlocks contains a ret
instruction. This is to prevent dumping an incomplete profile -- if
the loop is a long-running loop and dump is called in the middle
of the loop, the resulting profile is incomplete.
An ExitBlock containing a ret instruction is an indication of a long-running
loop -- an early exit to error handling code.
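A minimal sketch of the check (illustrative):
```
// Don't promote counters for loops that can return directly from the
// loop body: a dump may run mid-loop and record a partial profile.
for (BasicBlock *ExitBlock : LoopExitBlocks)
  if (isa<ReturnInst>(ExitBlock->getTerminator()))
    return false; // Skip count promotion for this loop.
```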
Differential Revision: https://reviews.llvm.org/D84379
This is the second of two patches to address PR46753. We basically allow
SROA to promote allocas that are used in droppable instructions, for
now that means `llvm.assume`. The (transitive) uses are replaced by
`undef` in the droppable instructions.
See also D83976.
Reviewed By: Tyker
Differential Revision: https://reviews.llvm.org/D83978
This is the first of two patches to address PR46753. We basically allow
mem2reg to promote allocas that are used in droppable instructions, for
now that means `llvm.assume`. The uses of the alloca (or a bitcast or
zero offset GEP from there) are replaced by `undef` in the droppable
instructions.
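A rough sketch of the mechanism, assuming droppable uses are simply blanked out before promotion:
```
// Replace the alloca's uses inside droppable instructions (currently
// just llvm.assume) with undef, so promotion can proceed.
for (Use &U : llvm::make_early_inc_range(AI->uses())) {
  auto *UserI = cast<Instruction>(U.getUser());
  if (UserI->isDroppable())
    U.set(UndefValue::get(U.get()->getType()));
}
```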
Reviewed By: Tyker
Differential Revision: https://reviews.llvm.org/D83976
SROA knows that it can look through addrspacecast but
PromoteMemoryToRegister did not handle them. This caused an assertion
error for the test case, exposed while running
`Transforms/PhaseOrdering/inlining-alignment-assumptions.ll` with D83978
applied.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D84085
PassManager.h is one of the top headers in the ClangBuildAnalyzer frontend worst offenders list.
This exposes a large number of implicit dependencies on various forward declarations/includes in other headers that need addressing.
It breaks the stage-2 build. Clang crashed when compiling
llvm/lib/Target/Hexagon/HexagonFrameLowering.cpp:
llvm/Support/GenericDomTree.h eraseNode: Node is not a leaf node
This patch adds the ability to peel off iterations of the first loop in loop
fusion. This can allow for both loops to have the same trip count, making it
legal for them to be fused together.
Here is a simple scenario where peeling can be used in loop fusion:
for (i = 0; i < 10; ++i)
  a[i] = a[i] + 3;
for (j = 1; j < 10; ++j)
  b[j] = b[j] + 5;
Here we can make use of peeling, and then fuse the two loops together. We
can peel off the 0th iteration of loop i, and then combine loops i and j for
i = 1 to 10.
a[0] = a[0] + 3;
for (i = 1; i < 10; ++i) {
  a[i] = a[i] + 3;
  b[i] = b[i] + 5;
}
Currently peeling with loop fusion is only supported for loops with constant
trip counts and a single exit point. Both unguarded and guarded loops are
supported.
Reviewed By: bmahjour (Bardia Mahjour), MaskRay (Fangrui Song)
Differential Revision: https://reviews.llvm.org/D82927
As long as RenamedOp is not guaranteed to be accurate, we cannot
assert here and should just return false. This was already done
for the other conditions in this function.
Fixes https://bugs.llvm.org/show_bug.cgi?id=46814.
This patch clarifies the point of failure when input or output vectors
have differing types. Before, lowering would fail elsewhere (e.g. in
`fmul` creation), which may not have been immediately clear.
As a side effect, the `getElementType` and `getVectorTy` functions
required the `const` qualifier to be added.
Reviewers: fhahn
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D84374
Currently there are plenty of instructions that SCEVExpander creates but
does not track as created. IRBuilder allows specifying a callback
whenever an instruction is inserted. Use this to call
rememberInstruction automatically for each created instruction.
There are still a few rememberInstruction calls remaining, because in
some cases Inst::Create functions are used to construct instructions.
Suggested by @lebedev.ri in D75980.
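For illustration, the callback-based builder can be set up roughly like this (a sketch, not the exact code):
```
// An IRBuilder whose inserter invokes a callback for every inserted
// instruction; use it to call rememberInstruction automatically.
IRBuilder<TargetFolder, IRBuilderCallbackInserter> Builder(
    SE.getContext(), TargetFolder(DL),
    IRBuilderCallbackInserter(
        [this](Instruction *I) { rememberInstruction(I); }));
```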
Reviewers: mkazantsev, reames, sanjoy.google, lebedev.ri
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D84326
Summary:
This is the next patch of [[ https://reviews.llvm.org/D76210 | D76210 ]].
This patch adds a map to `InformationCache` for caching results.
Reviewers: jdoerfert, sstefan1, uenoku, homerdin, baziotis
Reviewed By: jdoerfert
Subscribers: hiraditya, uenoku, kuter, bbn, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D83246
The revert was a misfire.
Remove the temporary flag PGSOIRPassOrTestOnly and the guard code which was used
for the staged rollout. This is a cleanup (NFC) as it's now false by default.
Differential Revision: https://reviews.llvm.org/D84057
This reverts commit 4a539faf74.
There is a __llvm_profile_instrument_range related crash in PGO-instrumented clang:
```
(gdb) bt
llvm::ConstantRange const&, llvm::APInt const&, unsigned int, bool) ()
llvm::ScalarEvolution::getRangeForAffineAR(llvm::SCEV const*, llvm::SCEV
const*, llvm::SCEV const*, unsigned int) ()
```
(The body of __llvm_profile_instrument_range is inlined, so we can only find __llvm_profile_instrument_target in the trace.)
```
23│ 0x000055555dba0961 <+65>: nopw %cs:0x0(%rax,%rax,1)
24│ 0x000055555dba096b <+75>: nopl 0x0(%rax,%rax,1)
25│ 0x000055555dba0970 <+80>: mov %rsi,%rbx
26│ 0x000055555dba0973 <+83>: mov 0x8(%rsi),%rsi # %rsi=-1 -> SIGSEGV
27│ 0x000055555dba0977 <+87>: cmp %r15,(%rbx)
28│ 0x000055555dba097a <+90>: je 0x55555dba0a76 <__llvm_profile_instrument_target+342>
```
This patch includes the supporting code that enables always
instrumenting the function entry block by default.
This patch will NOT change the default behavior.
It adds a variant bit in the profile version, adds new directives in
the text profile format, and changes the llvm-profdata tool accordingly.
This patch is a split of D83024 (https://reviews.llvm.org/D83024)
Many test changes from D83024 are also included.
Differential Revision: https://reviews.llvm.org/D84261
This reverts commit e64afefdf8. It caused
a PGO bootstrapped clang to crash on many source files.
`__llvm_profile_instrument_range` seems to trigger a null pointer dereference.
Call stack:
__llvm_profile_instrument_range
llvm::APInt::udiv(llvm::APInt const&) const
getRangeForAffineARHelper
`__llvm_profile_instrument_memop` transitively calls calloc, thus calloc
should not be instrumented.
I saw a
`calloc -> __llvm_profile_instrument_memop -> calloc -> __llvm_profile_instrument_memop -> ...`
infinite loop leading to a stack overflow
when the malloc implementation (e.g. tcmalloc) is built and instrumented along with the application.
We should figure out the library calls which may be instrumented and disable
their instrumentation before rolling out this change.
Reviewed By: yamauchi
Differential Revision: https://reviews.llvm.org/D84358
These calls are not intercepted by compiler-rt, nor is libatomic.a
naturally instrumented.
This patch uses the existing libcall mechanism to detect a call
to atomic_load or atomic_store, and instruments them much like
the preexisting instrumentation for atomics.
Calls to _load are modified to have at least Acquire ordering, and
calls to _store at least Release ordering. Because this needs to be
converted at runtime, msan injects a LUT (implemented as a vector
with extractelement).
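A sketch of the LUT idea, with illustrative names (the encoding of orderings is an assumption for the example):
```
// The ordering is only known at runtime, so strengthen it through a
// table lookup: a constant vector indexed via extractelement.
Value *addAcquireOrdering(IRBuilder<> &IRB, Value *RuntimeOrdering) {
  // Index = incoming C11 memory order (relaxed=0 .. seq_cst=5);
  // value = the same ordering, raised to at least acquire.
  uint32_t Map[] = {2, 2, 2, 4, 4, 5};
  SmallVector<Constant *, 8> Elems;
  for (uint32_t V : Map)
    Elems.push_back(IRB.getInt32(V));
  Value *LUT = ConstantVector::get(Elems);
  return IRB.CreateExtractElement(LUT, RuntimeOrdering);
}
```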
Differential Revision: https://reviews.llvm.org/D83337
For a long time, the InstCombine pass handled target-specific
intrinsics, even though having target-specific code in generic passes
had long been noted as an area for improvement.
D81728 moves most target specific code out of the InstCombine pass.
Applying the target specific combinations in an extra pass would
probably result in inferior optimizations compared to the current
fixed-point iteration, therefore the InstCombine pass resorts to newly
introduced functions in the TargetTransformInfo when it encounters
unknown intrinsics.
The patch should not have any effect on generated code (under the
assumption that code never uses intrinsics from a foreign target).
This introduces three new functions:
TargetTransformInfo::instCombineIntrinsic
TargetTransformInfo::simplifyDemandedUseBitsIntrinsic
TargetTransformInfo::simplifyDemandedVectorEltsIntrinsic
A few target specific parts are left in the InstCombine folder, where
it makes sense to share code. The largest left-over part in
InstCombineCalls.cpp is the code shared between arm and aarch64.
This allows moving about 3000 lines out of InstCombine and into the targets.
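Conceptually, the dispatch in InstCombine looks roughly like this (a sketch):
```
// Ask the target first; if it handled (or erased) the intrinsic,
// the generic code is done with it.
if (auto *II = dyn_cast<IntrinsicInst>(&CI))
  if (Optional<Instruction *> V = TTI.instCombineIntrinsic(*this, *II))
    return *V;
// ... otherwise fall through to the generic intrinsic combines ...
```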
Differential Revision: https://reviews.llvm.org/D81728
v3i16 and v3f16 currently cannot be legalized and lowered, so they should
not be emitted by instcombine.
Moved the check down to still allow extracting 1 or 2 elements via the dmask.
Fixes image intrinsics being combined to return v3x16.
Differential Revision: https://reviews.llvm.org/D84223
The LowerMatrixIntrinsics pass wasn't yet running under the new pass
manager; this adds LowerMatrixIntrinsics to the NPM pipeline (in the
same place where it runs in the old PM).
Differential Revision: https://reviews.llvm.org/D84180
We do not thread blocks with convergent calls, but this check was missing
when we decide to insert PR Phis into them (which we only do for threading).
Differential Revision: https://reviews.llvm.org/D83936
Reviewed By: nikic
This reverts commit bb8850d34d.
It broke 3 check-llvm-transforms-loopfusion tests in an ASAN build.
LoopFuse.cpp `for (BasicBlock *Pred : predecessors(BB)) {` may operate on a deleted BB.
Summary:
This patch adds the ability to peel off iterations of the first loop in loop
fusion. This can allow for both loops to have the same trip count, making it
legal for them to be fused together.
Here is a simple scenario where peeling can be used in loop fusion:
for (i = 0; i < 10; ++i)
  a[i] = a[i] + 3;
for (j = 1; j < 10; ++j)
  b[j] = b[j] + 5;
Here we can make use of peeling, and then fuse the two loops together. We can
peel off the 0th iteration of loop i, and then combine loops i and j for
i = 1 to 10.
a[0] = a[0] + 3;
for (i = 1; i < 10; ++i) {
  a[i] = a[i] + 3;
  b[i] = b[i] + 5;
}
Currently peeling with loop fusion is only supported for loops with constant
trip counts and a single exit point. Both unguarded and guarded loops are
supported.
Author: sidbav (Sidharth Baveja)
Reviewers: kbarton, Meinersbur, bkramer, Whitney, skatkov, ashlykov, fhahn, bmahjour
Reviewed By: bmahjour
Subscribers: bmahjour, mgorny, hiraditya, zzheng
Tags: LLVM
Differential Revision: https://reviews.llvm.org/D82927
If we inferred a range for the function return value, we can add !range
at all call-sites of the function, if the range does not include undef.
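A minimal sketch of attaching the metadata, assuming CR is the inferred ConstantRange and CB is a call site of the function:
```
// Annotate the call with the inferred return-value range.
if (!CR.isFullSet() && !CR.isEmptySet()) {
  MDBuilder MDB(CB->getContext());
  CB->setMetadata(LLVMContext::MD_range,
                  MDB.createRange(CR.getLower(), CR.getUpper()));
}
```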
Reviewers: efriedma, davide, nikic
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D83952
This patch uses the TileInfo introduced in D77550 to generate a loop
nest for tiled matrix multiplication, instead of generating the
unrolled code for the whole multiplication. This makes code-generation
more scalable for larger matrices.
Initially loops are only used if both the number of rows and columns are
divisible by the tile size. Other cases will be added as follow-up.
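Conceptually, code generation moves from one fully unrolled block to a tiled loop nest of this shape (scalar pseudocode; the three outer loops are the generated nest, the inner tile computation stays unrolled):
```
// 3-level loop nest over TileSize x TileSize tiles of C = A * B.
for (unsigned I = 0; I < Rows; I += TileSize)
  for (unsigned J = 0; J < Cols; J += TileSize)
    for (unsigned K = 0; K < Inner; K += TileSize)
      // Multiply the A[I..][K..] and B[K..][J..] tiles, accumulating
      // into the C[I..][J..] tile (this part is emitted unrolled).
      for (unsigned i = I; i < I + TileSize; ++i)
        for (unsigned j = J; j < J + TileSize; ++j)
          for (unsigned k = K; k < K + TileSize; ++k)
            C[i][j] += A[i][k] * B[k][j];
```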
Reviewers: anemet, Gerolf, hfinkel, andrew.w.kaylor, LuoYuanke, nicolasvasilache
Reviewed By: anemet
Differential Revision: https://reviews.llvm.org/D81308
This patch adds a TileInfo abstraction and utilities to
create a 3-level loop nest for tiling.
Reviewers: anemet
Reviewed By: anemet
Differential Revision: https://reviews.llvm.org/D77550
This allows tracking the in-memory type of a pointer argument to a
function for ABI purposes. This is essentially a stripped down version
of byval to remove some of the stack-copy implications in its
definition.
This includes the base IR changes, and some tests for places where it
should be treated similarly to byval. Codegen support will be in a
future patch.
My original attempt at solving some of these problems was to repurpose
byval with a different address space from the stack. However, it is
technically permitted for the callee to introduce a write to the
argument, although nothing does this in reality. There is also talk of
removing and replacing the byval attribute, so a new attribute would
need to take its place anyway.
This is intended to avoid some optimization issues with the current
handling of aggregate arguments, as well as to fix inflexibility in how
frontends can specify the kernel ABI. The most honest representation
of the amdgpu_kernel convention is to expose all kernel arguments as
loads from constant memory. Today, these are raw, SSA Argument values
and codegen is responsible for turning these into loads.
Background:
There currently isn't a satisfactory way to represent how arguments
for the amdgpu_kernel calling convention are passed. In reality,
arguments are passed in a single, flat, constant memory buffer
implicitly passed to the function. It is also illegal to call this
function in the IR, and this is only ever invoked by a driver of some
kind.
It does not make sense to have a stack-passed parameter in this
context, as byval implies. It is never valid to write to the
kernel arguments, as this would corrupt the inputs seen by other
dispatches of the kernel. These arguments are also not in the same
address space as the stack, so a copy is needed to an alloca. From a
source C-like language, the kernel parameters are invisible.
Semantically, a copy is always required from the constant argument
memory to a mutable variable.
The current clang calling convention lowering emits raw values,
including aggregates into the function argument list, since using
byval would not make sense. This has some unfortunate consequences for
the optimizer. In the aggregate case, we end up with an aggregate
store to alloca, which both SROA and instcombine turn into a store of
each aggregate field. The optimizer never pieces this back together to
see that this is really just a copy from constant memory, so we end up
stuck with expensive stack usage.
This also means the backend dictates the alignment of arguments, and
arbitrarily picks the LLVM IR ABI type alignment. By allowing an
explicit alignment, frontends can make better decisions. For example,
there's no real advantage to an alignment higher than 4, so a frontend
could choose to compact the argument layout. Similarly, there is a
high penalty to using an alignment lower than 4, so a frontend could
opt into more padding for small arguments.
Another design consideration is when it is appropriate to expose the
fact that these arguments are all really passed in adjacent
memory. Currently we have a late IR optimization pass in codegen to
rewrite the kernel argument values into explicit loads to enable
vectorization. In most programs, unrelated argument loads can be
merged together. However, exposing this property directly from the
frontend has some disadvantages. We still need a way to track the
original argument sizes and alignments to report to the driver. I find
using some side-channel, metadata mechanism to track this
unappealing. If the kernel arguments were exposed as a single buffer
to begin with, alias analysis would be unaware that the padding bits
between arguments are meaningless. Another family of problems is that there
are still some gaps in replacing all of the available parameter
attributes with metadata equivalents once lowered to loads.
The immediate plan is to start using this new attribute to handle all
aggregate arguments for kernels. Long term, it makes sense to migrate
all kernel arguments, including scalars, to be passed indirectly in
the same manner.
Additional context is in D79744.